Ed. Note: We’ve put together this three-part series to discuss what you need to know about anomaly detection, the typical adoption cycle of analytics in DevOps monitoring, and how anomaly detection adds value to cloud monitoring for DevOps teams. This is Part 1. Part 2 explores the three types of monitoring tools used by DevOps teams, and Part 3 discusses how to fit anomaly detection into a DevOps workflow.
What is Anomaly Detection?
Anomaly detection is the process of identifying observations or patterns of observations in a data set that do not conform to expected behavior. “One of these things is not like the other” – sounds easy, right? Of course, when you’re working with tens of thousands of system and application metrics that change from minute to minute, the game becomes far more difficult. At Metricly, we tend to characterize this as “humanly impossible.”
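To make the definition concrete, here is a minimal sketch of one very simple approach: flag any observation that sits more than a chosen number of standard deviations from the mean (a z-score check). This is only an illustration, not how Metricly or any particular tool works. The function name `find_anomalies`, the `threshold` parameter, and the sample CPU readings are all made up for this example.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=3.0):
    """Return indices of values whose z-score exceeds `threshold`.

    Assumes `values` holds at least two numeric observations.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        # Every value is identical, so nothing deviates from "expected".
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical CPU utilization readings (percent), one per minute.
cpu_percent = [41, 43, 40, 42, 44, 41, 43, 42, 40, 97, 42, 41]

# A lower threshold suits a short series like this one.
anomalies = find_anomalies(cpu_percent, threshold=2.5)
print(anomalies)
```

A single static rule like this is workable for one metric. Repeating it by hand across tens of thousands of metrics whose normal behavior keeps shifting is where it stops being practical.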
Four Possible Outcomes of Anomaly Detection
When talking about anomaly detection, there are four possible results: true positives, true negatives, false positives, and false negatives. Here’s a quick reference chart with explanations below:
| |Positive|Negative|
|True|You have a problem and get an alarm.|You don’t have a problem and don’t get an alarm.|
|False|You don’t have a problem but you do get an alarm.|You do have a problem and don’t get an alarm.|
True Positive
This is the ideal scenario and exactly how anomaly detection is supposed to work. Unfortunately, it’s not always that simple.
True Negative
Congrats! Your anomaly detection method wasn’t fooled into a false alarm – and you weren’t woken up at 3 a.m. for a problem that doesn’t exist.
False Positive
This is sometimes called “crying wolf.” The alarms are false alarms. They waste time and undermine confidence in the monitoring system. This is bad, but not the worst outcome.
False Negative
This is the worst. A problem is occurring that could lead to a serious outage, and your team is blissfully ignorant because your monitoring system is “asleep at the switch.” Adding insult to injury, these missed problems are often caught by impacted users (or your boss!).
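The four outcomes described above can be sketched in a few lines of code. The example below labels each monitoring interval from two facts: whether a problem actually occurred and whether an alarm fired. It then tallies the results. The `classify` function and the `history` data are hypothetical and exist only for illustration. Precision and recall are the standard names for "how many alarms were real" and "how many real problems raised an alarm."

```python
from collections import Counter

def classify(problem_occurred, alarm_fired):
    """Map one (problem, alarm) observation to one of the four outcomes."""
    if problem_occurred and alarm_fired:
        return "true positive"
    if not problem_occurred and not alarm_fired:
        return "true negative"
    if alarm_fired:
        return "false positive"   # alarm without a problem: "crying wolf"
    return "false negative"       # problem without an alarm: the worst case

# Hypothetical history of (problem_occurred, alarm_fired) pairs.
history = [
    (True, True),
    (False, False),
    (False, True),
    (True, False),
    (False, False),
    (True, True),
]

counts = Counter(classify(problem, alarm) for problem, alarm in history)

# Share of alarms that pointed at a real problem.
precision = counts["true positive"] / (counts["true positive"] + counts["false positive"])
# Share of real problems that actually raised an alarm.
recall = counts["true positive"] / (counts["true positive"] + counts["false negative"])

print(counts)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision corresponds to too many false alarms, and low recall corresponds to too many missed problems.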
Luckily, plenty of tools exist to help you catch the true positives while cutting down on false positives and false negatives.
Check out Part 2, which explores the three types of monitoring tools!
About Metricly Monitoring and Analytics
See how machine learning and anomaly detection impact your alarm quality and inform mission-critical decisions in dynamic environments. Metricly is available as a 21-day free trial.