AWS - CloudWatch Enum
Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE) Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing a unified view of AWS resources, applications, and services. Each CloudWatch log event is limited to 256 KB. CloudWatch can set high-resolution alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to optimize applications.
For example, you can monitor logs from CloudTrail. Events worth monitoring include:
Changes to Security Groups and NACLs
Starting, stopping, rebooting and terminating EC2 instances
Changes to Security Policies within IAM and S3
Failed login attempts to the AWS Management Console
API calls that resulted in failed authorization
Filter and pattern syntax for searching CloudWatch logs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html
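As a rough sketch of how the filter syntax relates to the events listed above, the snippet below pairs three of them with JSON filter patterns for CloudTrail logs and builds the arguments for a metric filter. The patterns follow the style commonly seen in CloudTrail monitoring baselines; the log group name, the metric namespace and the filter names are hypothetical, and `logs_client` is assumed to be a boto3 CloudWatch Logs client.

```python
# Filter patterns for CloudTrail events delivered to a CloudWatch log group.
# All names below (log group, namespace, filter names) are hypothetical.
FILTER_PATTERNS = {
    "FailedConsoleLogins": '{ ($.eventName = ConsoleLogin) && ($.errorMessage = "Failed authentication") }',
    "UnauthorizedApiCalls": '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }',
    "SecurityGroupChanges": '{ ($.eventName = AuthorizeSecurityGroupIngress) || ($.eventName = RevokeSecurityGroupIngress) }',
}


def metric_filter_args(name, log_group="cloudtrail-logs", namespace="SecurityMetrics"):
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter call."""
    return {
        "logGroupName": log_group,
        "filterName": name,
        "filterPattern": FILTER_PATTERNS[name],
        # Emit a value of 1 for every log event that matches the pattern.
        "metricTransformations": [
            {"metricName": name, "metricNamespace": namespace, "metricValue": "1"}
        ],
    }


def create_filters(logs_client):
    """Create one metric filter per pattern.

    Assumes logs_client is boto3.client("logs"); it is not called in this sketch.
    """
    for name in FILTER_PATTERNS:
        logs_client.put_metric_filter(**metric_filter_args(name))
```

Each resulting metric can then back an alarm, so a match on any of these patterns becomes a notification.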
A namespace is a container for CloudWatch metrics. It helps to categorize and isolate metrics, making it easier to manage and analyze them.
Examples: AWS/EC2 for EC2-related metrics, AWS/RDS for RDS metrics.
Metrics are data points collected over time that represent the performance or utilization of AWS resources. Metrics can be collected from AWS services, custom applications, or third-party integrations.
Example: CPUUtilization, NetworkIn, DiskReadOps.
Dimensions are key-value pairs that are part of a metric's identity. They help to uniquely identify a metric and provide additional context; a metric can have at most 30 dimensions. Dimensions also allow metrics to be filtered and aggregated based on specific attributes.
Example: For EC2 instances, dimensions might include InstanceId, InstanceType, and AvailabilityZone.
Statistics are mathematical calculations performed on metric data to summarize it over time. Common statistics include Average, Sum, Minimum, Maximum, and SampleCount.
Example: Calculating the average CPU utilization over a period of one hour.
Units are the measurement type associated with a metric. Units help to provide context and meaning to the metric data. Common units include Percent, Bytes, Seconds, Count.
Example: CPUUtilization might be measured in Percent, while NetworkIn might be measured in Bytes.
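To see how namespace, metric, dimensions, statistic and unit fit together, here is a minimal sketch that builds the parameters for a GetMetricStatistics request covering the last hour. The instance ID is hypothetical, and `cloudwatch_client` is assumed to be a boto3 CloudWatch client.

```python
from datetime import datetime, timedelta, timezone


def cpu_average_query(instance_id, end=None):
    """Parameters asking for the average CPU utilization over the last hour."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",                # container for EC2 metrics
        "MetricName": "CPUUtilization",        # the metric itself
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=1),
        "EndTime": end,
        "Period": 3600,                        # one datapoint for the whole hour
        "Statistics": ["Average"],             # statistic applied to the period
        "Unit": "Percent",                     # unit of the returned values
    }


def fetch(cloudwatch_client, instance_id):
    """Assumes cloudwatch_client is boto3.client("cloudwatch"); not called here."""
    return cloudwatch_client.get_metric_statistics(**cpu_average_query(instance_id))
```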
CloudWatch Dashboards provide customizable views of your AWS CloudWatch metrics. It is possible to create and configure dashboards to visualize data and monitor resources in a single view, combining different metrics from various AWS services.
Key Features:
Widgets: Building blocks of dashboards, including graphs, text, alarms, and more.
Customization: Layout and content can be customized to fit specific monitoring needs.
Example Use Case:
A single dashboard showing key metrics for your entire AWS environment, including EC2 instances, RDS databases, and S3 buckets.
Metric Streams in AWS CloudWatch enable you to continuously stream CloudWatch metrics to a destination of your choice in near real-time. This is particularly useful for advanced monitoring, analytics, and custom dashboards using tools outside of AWS.
Metric Data inside Metric Streams refers to the actual measurements or data points that are being streamed. These data points represent various metrics like CPU utilization, memory usage, etc., for AWS resources.
Example Use Case:
Sending real-time metrics to a third-party monitoring service for advanced analysis.
Archiving metrics in an Amazon S3 bucket for long-term storage and compliance.
CloudWatch Alarms monitor your metrics and perform actions based on predefined thresholds. When a metric breaches a threshold, the alarm can perform one or more actions such as sending notifications via SNS, triggering an auto-scaling policy, or running an AWS Lambda function.
Key Components:
Threshold: The value at which the alarm triggers.
Evaluation Periods: The number of periods over which data is evaluated.
Datapoints to Alarm: The number of evaluated periods that must breach the threshold to trigger the alarm.
Actions: What happens when an alarm state is triggered (e.g., notify via SNS).
Example Use Case:
Monitoring EC2 instance CPU utilization and sending a notification via SNS if it exceeds 80% for 5 consecutive minutes.
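The relationship between Threshold, Evaluation Periods and Datapoints to Alarm can be illustrated with a simplified "M out of N" check. This is only a sketch: it assumes a greater-than comparison and ignores missing data and the INSUFFICIENT_DATA state, so it is not how CloudWatch is implemented internally.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return "ALARM" if enough of the most recent datapoints breach the threshold.

    datapoints: metric values, oldest first, one per period.
    """
    window = datapoints[-evaluation_periods:]            # the N most recent periods
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Five 1-minute datapoints, all above 80%: the use case described above.
print(alarm_state([85, 90, 88, 92, 81], threshold=80,
                  evaluation_periods=5, datapoints_to_alarm=5))
```

With one datapoint below the threshold, the same call stays in OK unless Datapoints to Alarm is lowered to 4.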
Anomaly Detectors use machine learning to automatically detect anomalies in your metrics. You can apply anomaly detection to any CloudWatch metric to identify deviations from normal patterns that might indicate issues.
Key Components:
Model Training: CloudWatch uses historical data to train a model and establish what normal behavior looks like.
Anomaly Detection Band: A visual representation of the expected range of values for a metric.
Example Use Case:
Detecting unusual CPU utilization patterns in an EC2 instance that might indicate a security breach or application issue.
Insight Rules allow you to identify trends, detect spikes, or other patterns of interest in your metric data using powerful mathematical expressions to define the conditions under which actions should be taken. These rules can help you identify anomalies or unusual behaviors in your resource performance and utilization.
Managed Insight Rules are pre-configured insight rules provided by AWS. They are designed to monitor specific AWS services or common use cases and can be enabled without needing detailed configuration.
Example Use Case:
Monitoring RDS Performance: Enable a managed insight rule for Amazon RDS that monitors key performance indicators such as CPU utilization, memory usage, and disk I/O. If any of these metrics exceed safe operational thresholds, the rule can trigger an alert or automated mitigation action.
CloudWatch Logs lets you aggregate and monitor logs from AWS services (including CloudTrail) and from your own applications and systems (the CloudWatch Agent can be installed on a host). Logs can be stored indefinitely (depending on the Log Group settings) and can be exported.
Elements:
Log Group
A collection of log streams that share the same retention, monitoring, and access control settings
Log Stream
A sequence of log events that share the same source
Subscription Filters
Define a filter pattern that matches events in a particular log group and send them to a Kinesis Data Firehose stream, a Kinesis stream, or a Lambda function
Basic monitoring aggregates data every 5 minutes (detailed monitoring does so every minute). After the aggregation, CloudWatch checks the alarm thresholds to see whether any alarm needs to be triggered. If so, CloudWatch can send an event and perform automatic actions (AWS Lambda functions, SNS topics, SQS queues, Kinesis streams).
You can install agents inside your machines/containers to automatically send the logs back to CloudWatch.
Create a role and attach it to the instance, with permissions that allow CloudWatch to collect data from the instance and to interact with AWS Systems Manager (SSM) (CloudWatchAgentAdminPolicy & AmazonEC2RoleforSSM)
Download and install the agent on the EC2 instance (https://s3.amazonaws.com/amazoncloudwatch-agent/linux/amd64/latest/AmazonCloudWatchAgent.zip). You can download it from inside the instance or install it automatically through AWS Systems Manager by selecting the package AWS-ConfigureAWSPackage
Configure and start the CloudWatch Agent
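The configuration step usually comes down to a JSON file that tells the agent which files to ship. The sketch below generates a minimal one, assuming the agent's JSON configuration format with a `logs` section; the file path and log group name are hypothetical.

```python
import json


def agent_config(file_path, log_group):
    """Minimal agent configuration that ships one log file to one log group."""
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": file_path,
                            "log_group_name": log_group,
                            # Placeholder the agent replaces with the instance ID,
                            # giving one log stream per instance.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        }
    }


# Hypothetical example: ship the SSH authentication log.
print(json.dumps(agent_config("/var/log/auth.log", "ec2-auth-logs"), indent=2))
```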
A log group has many streams. A stream has many events. And inside of each stream, the events are guaranteed to be in order.
cloudwatch:DeleteAlarms, cloudwatch:PutMetricAlarm, cloudwatch:PutCompositeAlarm
An attacker with these permissions could significantly undermine an organization's monitoring and alerting infrastructure. By deleting existing alarms, an attacker could disable crucial alerts that notify administrators of critical performance issues, security breaches, or operational failures. Furthermore, by creating or modifying metric alarms, the attacker could mislead administrators with false alerts or silence legitimate alarms, effectively masking malicious activities and preventing timely responses to actual incidents.
In addition, with the cloudwatch:PutCompositeAlarm permission, an attacker could create a loop or cycle of composite alarms, where composite alarm A depends on composite alarm B, and composite alarm B also depends on composite alarm A. In this scenario, no composite alarm that is part of the cycle can be deleted, because there is always another composite alarm that depends on the one you want to delete.
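A composite alarm cycle of this kind can be spotted by following the alarm names referenced in each `AlarmRule`. The sketch below is an illustration only: the alarm names are hypothetical, and the parser handles just the quoted `ALARM("name")`, `OK("name")` and `INSUFFICIENT_DATA("name")` forms of the rule syntax.

```python
import re

# Hypothetical composite alarms: each AlarmRule references the other alarm.
COMPOSITE_ALARMS = {
    "composite-A": 'ALARM("composite-B")',
    "composite-B": 'ALARM("composite-A")',
}


def referenced_alarms(rule):
    """Extract the alarm names referenced by quoted state functions in a rule."""
    return re.findall(r'(?:ALARM|OK|INSUFFICIENT_DATA)\("([^"]+)"\)', rule)


def find_cycle(alarms):
    """Return one dependency cycle as a list of alarm names, or None."""
    def visit(name, path):
        if name in path:
            # Reached an alarm already on the current path: close the loop.
            return path[path.index(name):] + [name]
        for child in referenced_alarms(alarms.get(name, "")):
            cycle = visit(child, path + [name])
            if cycle:
                return cycle
        return None

    for name in alarms:
        cycle = visit(name, [])
        if cycle:
            return cycle
    return None


print(find_cycle(COMPOSITE_ALARMS))
```

In practice the rules would come from listing the account's composite alarms rather than from a hard-coded dictionary.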
The following example shows how to make a metric alarm ineffective:
This metric alarm monitors the average CPU utilization of a specific EC2 instance, evaluates the metric every 300 seconds and requires 6 evaluation periods (30 minutes total). If the average CPU utilization exceeds 60% for at least 4 of these periods, the alarm will trigger and send a notification to the specified SNS topic.
By modifying the Threshold to be more than 99%, setting the Period to 10 seconds, the Evaluation Periods to 8640 (since 8640 periods of 10 seconds equal 1 day), and the Datapoints to Alarm to 8640 as well, it would be necessary for the CPU utilization to be over 99% every 10 seconds throughout the entire 24-hour period to trigger an alarm.
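The two configurations described above can be written down as PutMetricAlarm parameters. This is a sketch that reuses the values from the text; the alarm name, instance ID and SNS topic ARN are hypothetical, and `cloudwatch_client` is assumed to be a boto3 CloudWatch client.

```python
# Original alarm: average CPU above 60% in 4 of 6 periods of 300 seconds.
# Alarm name, instance ID and SNS topic ARN are hypothetical.
original_alarm = {
    "AlarmName": "EC2-CPU-High",
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    "Statistic": "Average",
    "Period": 300,
    "EvaluationPeriods": 6,
    "DatapointsToAlarm": 4,
    "Threshold": 60.0,
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:example-topic"],
}

# Modified alarm: same name, so the put call overwrites the existing definition.
neutralized_alarm = dict(
    original_alarm,
    Period=10,
    EvaluationPeriods=8640,
    DatapointsToAlarm=8640,
    Threshold=99.0,
)


def window_seconds(alarm):
    """Total time span the alarm evaluates before it can change state."""
    return alarm["Period"] * alarm["EvaluationPeriods"]


def apply(cloudwatch_client, alarm):
    """Assumes cloudwatch_client is boto3.client("cloudwatch"); not called here."""
    cloudwatch_client.put_metric_alarm(**alarm)


print(window_seconds(original_alarm), window_seconds(neutralized_alarm))
```

The first window is 1800 seconds (30 minutes) and the second is 86400 seconds (24 hours), every datapoint of which would have to exceed 99%.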
Potential Impact: Lack of notifications for critical events, undetected issues, false alerts, suppressed genuine alerts, and missed detections of real incidents.
cloudwatch:DeleteAlarmActions, cloudwatch:EnableAlarmActions, cloudwatch:SetAlarmState
By deleting alarm actions, the attacker could prevent critical alerts and automated responses from being triggered when an alarm state is reached, such as notifying administrators or triggering auto-scaling activities. Enabling or re-enabling alarm actions inappropriately could also lead to unexpected behaviors, either by reactivating previously disabled actions or by modifying which actions are triggered, potentially causing confusion and misdirection in incident response.
In addition, an attacker with the cloudwatch:SetAlarmState permission could manipulate alarm states, creating false alarms to distract and confuse administrators, or silencing genuine alarms to hide ongoing malicious activities or critical system failures.
If you use SetAlarmState on a composite alarm, the composite alarm is not guaranteed to return to its actual state. It returns to its actual state only once any of its child alarms changes state. It is also reevaluated if you update its configuration.
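As an illustration of the state manipulation described above, the sketch below builds the parameters for a SetAlarmState request. The alarm name and the reason text are hypothetical, and `cloudwatch_client` is assumed to be a boto3 CloudWatch client.

```python
VALID_STATES = ("OK", "ALARM", "INSUFFICIENT_DATA")


def forced_state_args(alarm_name, state="OK", reason="Manual state change"):
    """Build the parameters for a SetAlarmState request."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown alarm state: {state}")
    return {"AlarmName": alarm_name, "StateValue": state, "StateReason": reason}


def force_state(cloudwatch_client, alarm_name, state="OK"):
    """Assumes cloudwatch_client is boto3.client("cloudwatch"); not called here."""
    cloudwatch_client.set_alarm_state(**forced_state_args(alarm_name, state))


# Silencing a firing alarm, or raising a false one (alarm name is hypothetical).
print(forced_state_args("EC2-CPU-High", "OK"))
print(forced_state_args("EC2-CPU-High", "ALARM"))
```

For a metric alarm the forced state only lasts until the next evaluation, which is why the composite alarm behavior described above matters more to an attacker.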
Potential Impact: Lack of notifications for critical events, undetected issues, false alerts, suppressed genuine alerts, and missed detections of real incidents.
cloudwatch:DeleteAnomalyDetector, cloudwatch:PutAnomalyDetector
An attacker would be able to compromise the ability to detect and respond to unusual patterns or anomalies in metric data. By deleting existing anomaly detectors, an attacker could disable critical alerting mechanisms; by creating or modifying them, the attacker could either misconfigure them or generate false positives in order to distract or overwhelm the monitoring.
The following example shows how to make a metric anomaly detector ineffective. This metric anomaly detector monitors the average CPU utilization of a specific EC2 instance. Adding the "ExcludedTimeRanges" parameter with the desired time range is enough to ensure that the anomaly detector does not analyze or alert on any relevant data during that period.
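A sketch of such a detector definition, expressed as PutAnomalyDetector parameters, could look like the following. The instance ID and the excluded dates are hypothetical, and `cloudwatch_client` is assumed to be a boto3 CloudWatch client.

```python
from datetime import datetime, timezone

# Detector on the average CPU utilization of one instance (hypothetical ID),
# with an excluded range that swallows a whole year of data.
detector = {
    "SingleMetricAnomalyDetector": {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        "Stat": "Average",
    },
    "Configuration": {
        "ExcludedTimeRanges": [
            {
                "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc),
                "EndTime": datetime(2024, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
            }
        ]
    },
}


def is_excluded(timestamp, configuration):
    """True if the timestamp falls inside any excluded time range."""
    return any(
        r["StartTime"] <= timestamp <= r["EndTime"]
        for r in configuration["ExcludedTimeRanges"]
    )


def apply(cloudwatch_client):
    """Assumes cloudwatch_client is boto3.client("cloudwatch"); not called here."""
    cloudwatch_client.put_anomaly_detector(**detector)


print(is_excluded(datetime(2024, 6, 15, tzinfo=timezone.utc), detector["Configuration"]))
```

Reviewing the `ExcludedTimeRanges` of existing detectors with the same helper is a quick way to spot a range that is suspiciously wide.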
Potential Impact: Direct effect on the detection of unusual patterns or security threats.
cloudwatch:DeleteDashboards, cloudwatch:PutDashboard
An attacker would be able to compromise the monitoring and visualization capabilities of an organization by creating, modifying or deleting its dashboards. These permissions could be leveraged to remove critical visibility into the performance and health of systems, or to alter dashboards so that they display incorrect data or hide malicious activities.
Potential Impact: Loss of monitoring visibility and misleading information.
cloudwatch:DeleteInsightRules, cloudwatch:PutInsightRule, cloudwatch:PutManagedInsightRule
Insight rules are used to detect anomalies, optimize performance, and manage resources effectively. By deleting existing insight rules, an attacker could remove critical monitoring capabilities, leaving the system blind to performance issues and security threats. Additionally, an attacker could create or modify insight rules to generate misleading data or hide malicious activities, leading to incorrect diagnostics and inappropriate responses from the operations team.
Potential Impact: Difficulty detecting and responding to performance issues and anomalies, misinformed decision-making, and potentially hidden malicious activities or system failures.
cloudwatch:DisableInsightRules, cloudwatch:EnableInsightRules
By disabling critical insight rules, an attacker could effectively blind the organization to key performance and security metrics. Conversely, by enabling or configuring misleading rules, it could be possible to generate false data, create noise, or hide malicious activity.
Potential Impact: Confusion among the operations team, leading to delayed responses to actual issues and unnecessary actions based on false alerts.
cloudwatch:DeleteMetricStream, cloudwatch:PutMetricStream, cloudwatch:PutMetricData
An attacker with the cloudwatch:DeleteMetricStream and cloudwatch:PutMetricStream permissions would be able to create and delete metric streams, compromising security, monitoring and data integrity:
Create malicious streams: Create metric streams to send sensitive data to unauthorized destinations.
Resource manipulation: The creation of new metric streams with excessive data could produce a lot of noise, causing incorrect alerts, masking true issues.
Monitoring disruption: By deleting metric streams, attackers would disrupt the continuous flow of monitoring data, effectively hiding their malicious activities.
Similarly, with the cloudwatch:PutMetricData permission, it would be possible to publish arbitrary data points to a metric. A large volume of improper data could lead to a DoS, making the metric completely useless.
Example of adding a data point that reports 70% CPU utilization for a given EC2 instance:
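A minimal sketch of such a request, expressed as PutMetricData parameters, is shown below. The namespace and instance ID are hypothetical; the namespace is left as a parameter because AWS advises against publishing custom data into namespaces that begin with `AWS/`. `cloudwatch_client` is assumed to be a boto3 CloudWatch client.

```python
from datetime import datetime, timezone


def cpu_datapoint(namespace, instance_id, value=70.0, timestamp=None):
    """Build PutMetricData parameters for one CPUUtilization datapoint."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": "CPUUtilization",
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Timestamp": timestamp or datetime.now(timezone.utc),
                "Value": value,
                "Unit": "Percent",
            }
        ],
    }


def publish(cloudwatch_client, namespace, instance_id, value=70.0):
    """Assumes cloudwatch_client is boto3.client("cloudwatch"); not called here."""
    cloudwatch_client.put_metric_data(**cpu_datapoint(namespace, instance_id, value))


# Hypothetical namespace and instance ID.
print(cpu_datapoint("Custom/EC2", "i-0123456789abcdef0"))
```

Calling `publish` in a loop with arbitrary values is all it takes to drown the real signal for that metric.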
Potential Impact: Disruption in the flow of monitoring data, impacting the detection of anomalies and incidents, resource manipulation, and increased costs due to the creation of excessive metric streams.
cloudwatch:StopMetricStreams, cloudwatch:StartMetricStreams
An attacker would control the flow of the affected metric streams (every metric stream if there is no resource restriction). With the cloudwatch:StopMetricStreams permission, attackers could hide their malicious activities by stopping critical metric streams.
Potential Impact: Disruption in the flow of monitoring data, impacting the detection of anomalies and incidents.
cloudwatch:TagResource, cloudwatch:UntagResource
An attacker would be able to add, modify, or remove tags on CloudWatch resources (currently only alarms and Contributor Insights rules). This could disrupt your organization's tag-based access control policies.
Potential Impact: Disruption of tag-based access control policies.