AWS ECS Cluster by HTTP
Overview
This template monitors an AWS ECS cluster over HTTP with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
NOTE: This template uses GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
Additional information about the metrics and the API methods used:
- Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-metrics-ECS.html
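To make the bulk collection concrete, the sketch below builds the body of a single GetMetricData request that asks for several cluster metrics at once. It is only an illustration, not the template's actual script: the function names are hypothetical, the `AWS/ECS` namespace, `ClusterName` dimension, and example metric names are assumptions about what would be queried, and request signing and sending are left out entirely.

```python
from datetime import datetime, timedelta, timezone


def build_metric_queries(cluster_name, metric_names, namespace="AWS/ECS",
                         period=300, stat="Average"):
    """Return one MetricDataQueries entry per metric name (hypothetical helper)."""
    queries = []
    for index, metric_name in enumerate(metric_names):
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,  # assumed namespace
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "ClusterName", "Value": cluster_name},
                    ],
                },
                "Period": period,
                "Stat": stat,
            },
        })
    return queries


def build_request(cluster_name, metric_names, now=None, window_minutes=10):
    """Build a GetMetricData request body covering the last few minutes."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=window_minutes)
    return {
        # Epoch seconds are used here as an assumption about the wire format.
        "StartTime": int(start.timestamp()),
        "EndTime": int(end.timestamp()),
        "MetricDataQueries": build_metric_queries(cluster_name, metric_names),
    }


if __name__ == "__main__":
    body = build_request("demo-cluster", ["CPUUtilization", "MemoryUtilization"])
    print(len(body["MetricDataQueries"]), "metrics requested in one call")
```

Because every metric travels in the same request, adding a metric adds one query entry rather than one more API call.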
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- AWS ECS Cluster by HTTP
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
The template gets AWS ECS metrics and uses the script item to make HTTP requests to the CloudWatch API.
Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.
Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "ecs:ListServices",
        "ecs:ListTasks"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
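Before attaching a policy, it can help to check that it really grants every action the template needs, since a misspelled service prefix (for example `esc:` instead of `ecs:`) is accepted as text but grants nothing useful. The following is a minimal sketch of such a check. The function name is hypothetical, and the check deliberately ignores wildcards, `Deny` statements, and resource scoping.

```python
import json

# Actions the template needs for access key-based authorization.
REQUIRED_ACTIONS = {
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
    "ecs:ListServices",
    "ecs:ListTasks",
}


def missing_actions(policy_json):
    """Return the required actions that no Allow statement lists explicitly."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    allowed = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        allowed.update(actions)
    return sorted(REQUIRED_ACTIONS - allowed)


if __name__ == "__main__":
    policy = json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["cloudwatch:DescribeAlarms", "cloudwatch:GetMetricData",
                       "ecs:ListServices", "esc:ListTasks"],
            "Resource": "*",
        }],
    })
    print(missing_actions(policy))
```

An empty list means every required action is listed explicitly; anything else names the actions that still need to be added.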
If you are using role-based authorization, set the appropriate permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "ecs:ListServices",
        "ecs:ListTasks",
        "ec2:AssociateIamInstanceProfile",
        "ec2:ReplaceIamInstanceProfileAssociation"
      ],
      "Resource": "*"
    }
  ]
}
Set the following macros: "{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.ECS.CLUSTER.NAME}".
If you are using access key-based authorization, also set the following macros: "{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}".
For more information about managing access keys, see the official documentation.
Refer to the Macros section for a list of macros used for LLD filters.
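The macro requirements above depend on the chosen authorization method, which is easy to get wrong when configuring a host by hand. As a rough sketch (the function name and the dictionary representation of host macros are assumptions made for illustration, not part of Zabbix), the rule can be expressed as:

```python
def missing_macros(macros):
    """Return the names of required macros that are unset or empty.

    `macros` is a plain dict of macro name -> value, used here only as a
    stand-in for the macros configured on a Zabbix host.
    """
    required = ["{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.ECS.CLUSTER.NAME}"]
    # Access key-based authorization additionally needs the key pair.
    if macros.get("{$AWS.AUTH_TYPE}") == "access_key":
        required += ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"]
    return [name for name in required if not macros.get(name)]


if __name__ == "__main__":
    host_macros = {
        "{$AWS.AUTH_TYPE}": "access_key",
        "{$AWS.REGION}": "us-west-1",
        "{$AWS.ECS.CLUSTER.NAME}": "demo-cluster",
    }
    print(missing_macros(host_macros))
```

With role-based authorization (`role_base`), only the first three macros are checked.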
Additional information about the metrics and the API methods used:
- Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html
Macros used
Name | Description | Default |
---|---|---|
{$AWS.PROXY} | Sets HTTP proxy value. If this macro is empty, then no proxy is used. | |
{$AWS.ACCESS.KEY.ID} | Access key ID. | |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. | |
{$AWS.REGION} | Amazon ECS Region code. | us-west-1 |
{$AWS.AUTH_TYPE} | Authorization method. Possible values: role_base, access_key. | access_key |
{$AWS.ECS.CLUSTER.NAME} | ECS cluster name. | |
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. | .* |
{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. | CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. | .* |
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. | CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES} | Filter of discoverable services by name. | .* |
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES} | Filter to exclude discovered services by name. | CHANGE_IF_NEEDED |
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} | The warning threshold of the cluster CPU utilization expressed in %. | 70 |
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} | The warning threshold of the cluster memory utilization expressed in %. | 70 |
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} | The warning threshold of the cluster service CPU utilization expressed in %. | 80 |
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} | The warning threshold of the cluster service memory utilization expressed in %. | 80 |
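The MATCHES and NOT_MATCHES macros work as a pair: an entity is discovered only if its name matches the first pattern and does not match the second. The sketch below illustrates that combination with Python's `re` module, which is only an approximation of the regular expression engine Zabbix uses. The function name is hypothetical.

```python
import re


def passes_lld_filter(value, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Return True if `value` would be kept by a MATCHES/NOT_MATCHES pair.

    The defaults mirror the macro defaults: keep everything, and exclude only
    names containing the literal placeholder CHANGE_IF_NEEDED.
    """
    included = re.search(matches, value) is not None
    excluded = re.search(not_matches, value) is not None
    return included and not excluded


if __name__ == "__main__":
    for name in ["web-api", "web-admin", "batch-worker"]:
        kept = passes_lld_filter(name, matches="^web-", not_matches="admin")
        print(name, "->", "discovered" if kept else "filtered out")
```

The `CHANGE_IF_NEEDED` default is unlikely to match any real name, so with the defaults nothing is excluded until the macro is overridden.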
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Get cluster metrics | Get cluster metrics. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html | Script | aws.ecs.get_metrics |
Get cluster services | Get cluster services. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html | Script | aws.ecs.get_cluster_services |
Get alarms data | Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html | Script | aws.ecs.get_alarms |
Get metrics check | Data collection check. | Dependent item | aws.ecs.metrics.check |
Get alarms check | Data collection check. | Dependent item | aws.ecs.alarms.check |
Container Instance Count | The number of EC2 instances running the Amazon ECS agent that are registered with a cluster. | Dependent item | aws.ecs.container_instance_count |
Task Count | The number of tasks running in the cluster. | Dependent item | aws.ecs.task_count |
Service Count | The number of services in the cluster. | Dependent item | aws.ecs.service_count |
CPU Reserved | A number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. | Dependent item | aws.ecs.cpu_reserved |
CPU Utilization | Cluster CPU utilization. | Dependent item | aws.ecs.cpu_utilization |
Memory Utilization | The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. | Dependent item | aws.ecs.memory_utilization |
Network rx bytes | The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. | Dependent item | aws.ecs.network.rx |
Network tx bytes | The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. | Dependent item | aws.ecs.network.tx |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Failed to get metrics data | | length(last(/AWS ECS Cluster by HTTP/aws.ecs.metrics.check))>0 | Warning | |
Failed to get alarms data | | length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarms.check))>0 | Warning | |
High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/AWS ECS Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} | Warning | |
High memory utilization | The system is running out of free memory. | min(/AWS ECS Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} | Warning | |
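The utilization triggers use `min(...,15m)`, which means a single spike is not enough: every value collected during the last 15 minutes has to be above the threshold. The sketch below mimics that behavior on a list of `(timestamp, value)` samples. The function name and the sample format are assumptions for illustration, and treating an empty window as "not firing" is a simplification of how Zabbix handles missing data.

```python
def min_exceeds(samples, now, window_seconds, threshold):
    """Return True if the minimum value within the window is above threshold.

    `samples` is an iterable of (unix_timestamp, value) pairs.
    """
    recent = [value for ts, value in samples
              if now - window_seconds < ts <= now]
    if not recent:
        # Simplification: with no data in the window, do not fire.
        return False
    return min(recent) > threshold


if __name__ == "__main__":
    now = 1_700_000_000
    # One sample per 5 minutes; the middle one dips below the 70% threshold.
    cpu = [(now - 600, 85.0), (now - 300, 65.0), (now, 90.0)]
    print(min_exceeds(cpu, now, 15 * 60, 70))
```

In the example the dip to 65% keeps the trigger from firing, even though the latest value is 90%.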
LLD rule Cluster Alarms discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster Alarms discovery | Discovery of cluster alarms. | Dependent item | aws.ecs.alarms.discovery |
Item prototypes for Cluster Alarms discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#ALARM_NAME}]: Get metrics | Get alarm metrics about the state and its reason. | Dependent item | aws.ecs.alarm.get_metrics["{#ALARM_NAME}"] |
[{#ALARM_NAME}]: State reason | An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION} | Dependent item | aws.ecs.alarm.state_reason["{#ALARM_NAME}"] |
[{#ALARM_NAME}]: State | The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION} | Dependent item | aws.ecs.alarm.state["{#ALARM_NAME}"] |
Trigger prototypes for Cluster Alarms discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
[{#ALARM_NAME}] has 'Alarm' state | Alarm "{#ALARM_NAME}" has 'Alarm' state. | last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0 | Average | |
[{#ALARM_NAME}] has 'Insufficient data' state | | last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1 | Info | |
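The two alarm trigger prototypes can be read as a small decision rule over the numeric state and the state reason. The following sketch restates that rule. The function name is hypothetical, and the returned strings simply reuse the severity names from the table.

```python
# Numeric alarm states as described for the State item.
ALARM_STATES = {0: "OK", 1: "INSUFFICIENT_DATA", 2: "ALARM"}


def alarm_severity(state, state_reason):
    """Return the severity of the trigger that would fire, or None."""
    # 'Alarm' state: requires state 2 and a non-empty state reason.
    if state == 2 and len(state_reason) > 0:
        return "Average"
    # 'Insufficient data' state: state 1, regardless of the reason text.
    if state == 1:
        return "Info"
    return None


if __name__ == "__main__":
    for state, reason in [(0, ""), (1, "Not enough datapoints"), (2, "Threshold crossed")]:
        print(ALARM_STATES[state], "->", alarm_severity(state, reason))
```

Note that state 2 with an empty reason fires nothing in this reading, because the first expression requires both conditions.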
LLD rule Cluster Services discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster Services discovery | Discovery {$AWS.ECS.CLUSTER.NAME} services. | Dependent item | aws.ecs.services.discovery |
Item prototypes for Cluster Services discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#AWS.ECS.SERVICE.NAME}]: Running Task | The number of tasks currently in the running state. | Dependent item | aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: Pending Task | The number of tasks currently in the pending state. | Dependent item | aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: Desired Task | The desired number of tasks for the {#AWS.ECS.SERVICE.NAME} service. | Dependent item | aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: Task Set | The number of task sets in the {#AWS.ECS.SERVICE.NAME} service. | Dependent item | aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: CPU Reserved | A number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. | Dependent item | aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: CPU Utilization | A number of CPU units used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. | Dependent item | aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: Memory utilized | The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. | Dependent item | aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: Memory utilization | The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. | Dependent item | aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: Memory reserved | The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. | Dependent item | aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: Network rx bytes | The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. | Dependent item | aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: Network tx bytes | The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. | Dependent item | aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"] |
[{#AWS.ECS.SERVICE.NAME}]: Get metrics | Get metrics of ECS services. Full metrics list related to ECS: https://docs.aws.amazon.com/ecs/index.html | Script | aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"] |
Trigger prototypes for Cluster Services discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
[{#AWS.ECS.SERVICE.NAME}]: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/AWS ECS Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} | Warning | |
[{#AWS.ECS.SERVICE.NAME}]: High memory utilization | The system is running out of free memory. | min(/AWS ECS Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} | Warning | |
Feedback
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.