Metrics
How to track and display metrics
Creating the Namespace
We create namespaces by writing to them inside a processor.
let dimensions = {
  namespace: "cloudwatch",
  app: "flows",
  action: obj.event.action,
  sender: obj.event.src_ip,
  destination: obj.event.dest_ip
}
Platform_Metric_Counter("cloudwatch_flow_count", dimensions, 1)
While we call the namespace cloudwatch, the metric will be called cloudwatch_flow_count.
In this example, the incoming object looks something like this:
obj: {
  event: {
    account_id: "999954988160",
    action: "ACCEPT",
    bytes: "1542",
    dest_ip: "18.118.216.36",
    dest_port: "50221",
    end_time: "1726663030",
    id: "38505871963542404529999865800232899601069883806077878278",
    interface_id: "eni-0739b0dca912ae857",
    logStream: "eni-0739b0dca912ae857-all",
    log_status: "OK",
    owner: "999904988160",
    packets: "7",
    parserName: "AWS CloudWatch Network Log",
    protocol: "6",
    src_ip: "10.54.1.244",
    src_port: "3389",
    start_time: "1726663016",
    subscriptionFilters: [
      "passToKinesis"
    ],
    version: "2"
  },
  sourcetype: "cloudwatch",
  time: 1726663138987
}
We are defining a Pod, which is a holder for that data. We want to keep this Pod simple, because that keeps the size of the structure down. When we define the dimensions for a metric, the objective is to focus them on the queries we are going to ask.
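As a minimal sketch of keeping dimensions focused, the helper below copies only the fields a query is likely to group or filter on. The name buildFlowDimensions is hypothetical and not part of the platform; the field names come from the example object above.

```javascript
// Hypothetical helper (not a platform function): build a small,
// query-focused set of dimensions from a flow event like the one above.
function buildFlowDimensions(obj) {
  const event = obj.event || {};
  return {
    namespace: "cloudwatch",
    app: "flows",
    action: event.action,
    sender: event.src_ip,
    destination: event.dest_ip
  };
}

// Example with a trimmed-down copy of the sample object.
const sample = {
  event: {
    action: "ACCEPT",
    src_ip: "10.54.1.244",
    dest_ip: "18.118.216.36",
    bytes: "1542"
  },
  sourcetype: "cloudwatch"
};

const dimensions = buildFlowDimensions(sample);
console.log(dimensions);
// Inside a processor, these dimensions would then be passed to
// Platform_Metric_Counter("cloudwatch_flow_count", dimensions, 1).
```

Fields such as id are deliberately left out in this sketch: a value that is unique per event would produce a separate series for nearly every event, which works against keeping the structure small.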
Creating a Table and Histogram
To create a table and histogram, we tend to use two common functions: Platform_Metric_Sort and Platform_Metric_Sort_Histogram. These functions create the Prometheus queries, execute them, and provide a results object.
function main() {
  let from = "-1h@h"
  let to = "@h"
  let groupBy = ["sender", "destination"]
  // let groupBy = "sender"
  let options = {
    metric: "cloudwatch_flow_count",
    from: from,
    to: to,
    select: `action = "REJECT"`,
    groupBy: groupBy,
    sort: "topk",
    limit: 20
  }
  let table = Platform_Metric_Sort(options)
  // Reuse the same options for the histogram, adding an interval
  options.interval = "5m"
  let histogram = Platform_Metric_Sort_Histogram(options)
  return {table, histogram}
}
The above code returns a table and a histogram, which appear on the Output tab.
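As a variation on the task above, the sketch below builds the table options once and derives the histogram options as a copy, so the table options are not modified. The helper name buildSortOptions is hypothetical; the option names are the ones used in the example, and it is an assumption that the task runtime supports object spread syntax.

```javascript
// Hypothetical helper: collect the options used by the sort functions.
// The option names mirror the example task; the default limit of 20 is
// taken from that example.
function buildSortOptions({ metric, from, to, select, groupBy, limit = 20 }) {
  return { metric, from, to, select, groupBy, sort: "topk", limit };
}

const tableOptions = buildSortOptions({
  metric: "cloudwatch_flow_count",
  from: "-1h@h",
  to: "@h",
  select: `action = "REJECT"`,
  groupBy: ["sender", "destination"]
});

// Copy the table options and add the interval used by the histogram.
const histogramOptions = { ...tableOptions, interval: "5m" };

console.log(tableOptions);
console.log(histogramOptions);
// In a task, tableOptions would be passed to Platform_Metric_Sort and
// histogramOptions to Platform_Metric_Sort_Histogram.
```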
How to Execute a Task/Report
There are two places where we execute tasks. On the menu, click the Report option:
The two options are:
- Task History
- Task Editor
The Task Editor is for creating new tasks, while Task History shows the results of previous tasks. The history combines all tasks run by users and by the system.
New Tasks
We use the Task Editor to create a new task. It starts with a blank editor. There are three tabs:
- Editor: Where we place the code
- Output: The values placed in the return of the main code body
- Report: Where you add a presentation layer to organize how a report looks.
You place your code in the editor and run it. Any errors or printf output (console prints) appear at the bottom of the Editor tab, while the results appear on the Output tab.
Useful Queries
In these examples, we will use the metric cloudwatch_flow_count. You will create your own metric names.
We can write a simple process that will allow us to query the metric database and see results:
function main() {
  let query = `<PLACE QUERY HERE>`
  let table = Platform_Metric_Query(query, "@m")
  return {table}
}
The remainder of the manual provides some useful queries to use in this code snippet.
For a more complete understanding of queries, refer to the VictoriaMetrics documentation. VictoriaMetrics is the underlying engine, but many more examples can be found by searching for Prometheus queries.
What are the Available namespaces?
Sometimes you are looking at a system that you are not very familiar with. In this case, we can request all namespaces:
group by(__name__) ({__name__!=""})
What are the dimensions of a namespace?
We can simply query the metric, and the columns will show the dimensions.
topk(5, sort_desc(cloudwatch_flow_count))
Or add a time limit:
topk(5, sort_desc(cloudwatch_flow_count[5m]))
Count of pods per cluster and namespace
A list of how many pods each namespace has in your cluster can be useful for detecting an unusually high or low number of pods in a namespace.
sum by (namespace) (cloudwatch_flow_count)
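To make the grouping concrete, here is a plain-JavaScript model of what a sum by aggregation does: series that share the same value for the chosen dimension are collapsed into one total. This is an illustration only, not how the engine is implemented; the function name sumBy and the sample values are made up.

```javascript
// Illustrative model of `sum by (label)`: each series is a set of labels
// plus a current value, and series with the same label value are added up.
function sumBy(series, label) {
  const totals = {};
  for (const s of series) {
    const key = s.labels[label];
    totals[key] = (totals[key] || 0) + s.value;
  }
  return totals;
}

// Made-up sample series for illustration.
const series = [
  { labels: { namespace: "cloudwatch", action: "ACCEPT" }, value: 7 },
  { labels: { namespace: "cloudwatch", action: "REJECT" }, value: 3 },
  { labels: { namespace: "other", action: "ACCEPT" }, value: 5 }
];

const totalsByNamespace = sumBy(series, "namespace");
console.log(totalsByNamespace); // { cloudwatch: 10, other: 5 }
```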
Aggregate request rates
One of the key health indicators you want to measure about a service is how many requests it currently serves. The rate function tells us how fast a value is increasing:
rate(cloudwatch_flow_count[5m])
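As a rough intuition for the query above, rate can be thought of as the per-second increase of a counter across the window. The sketch below is a simplified model with made-up samples; the real function also deals with counter resets and extrapolation at the window edges, which are ignored here, and the name simpleRate is hypothetical.

```javascript
// Simplified model of rate(): the per-second increase between the first and
// last sample in the window. Returns 0 when there are too few samples,
// purely to keep the sketch short.
function simpleRate(samples) {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const seconds = last.time - first.time;
  return seconds > 0 ? (last.value - first.value) / seconds : 0;
}

// Made-up samples: a counter that grows from 100 to 400 over 300 seconds.
const samples = [
  { time: 0, value: 100 },
  { time: 150, value: 220 },
  { time: 300, value: 400 }
];

const perSecond = simpleRate(samples);
console.log(perSecond); // 1
```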
But often you care about a particular dimension or group of dimensions. In this case, we use the sum aggregation:
sum by(sender, action)(rate(cloudwatch_flow_count[5m]))
Often, we get rates of zero for attributes with no data. We can remove them by only listing those attributes with a value:
sum by(destination)(rate(cloudwatch_flow_count[5m])) > 0
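The rate queries in this section all follow the same pattern, so they can be assembled from a few parameters. The helper below is a hypothetical sketch (buildRateQuery is not a platform function) that only produces the query string.

```javascript
// Hypothetical helper: assemble the aggregated rate queries used above.
// metric:   metric name, e.g. "cloudwatch_flow_count"
// groupBy:  array of dimension names to keep
// window:   lookbehind window, e.g. "5m"
// dropZero: when true, append "> 0" to hide series with a zero rate
function buildRateQuery(metric, groupBy, window, dropZero) {
  let query = `sum by(${groupBy.join(", ")})(rate(${metric}[${window}]))`;
  if (dropZero) {
    query += " > 0";
  }
  return query;
}

const bySenderAction = buildRateQuery("cloudwatch_flow_count", ["sender", "action"], "5m", false);
const byDestination = buildRateQuery("cloudwatch_flow_count", ["destination"], "5m", true);

console.log(bySenderAction);
// sum by(sender, action)(rate(cloudwatch_flow_count[5m]))
console.log(byDestination);
// sum by(destination)(rate(cloudwatch_flow_count[5m])) > 0
```

The resulting string can take the place of <PLACE QUERY HERE> in the query snippet shown earlier.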