Anomaly Detection at the Edge

As IoT becomes ubiquitous, there has been great interest in the industry in developing novel techniques for anomaly detection at the Edge. Example applications include, but are not limited to, smart cities/grids of sensors, industrial process control in manufacturing, smart homes, wearables, connected vehicles, and agriculture (sensing for soil moisture and nutrients). What makes anomaly detection at the Edge different? The following constraints, whether due to the sensors or the applications, necessitate the development of new algorithms for AD:

  • Very low power and low compute/memory resources
  • High data volume making centralized AD infeasible owing to the communication overhead
  • Need for low latency to enable fast action
  • Guaranteeing privacy

In this talk we shall discuss the above in detail. Subsequently, we shall walk through the algorithm design process for anomaly detection at the Edge. Specifically, we shall dive into the need to build small models/ensembles owing to the limited memory on the sensors, and how to train on data in an online fashion, as long-term historical data is not available due to limited storage. Given the need for data compression to contain the communication overhead, can one carry out anomaly detection on compressed data? We shall discuss building small models, sequential and one-shot learning algorithms, compressing the data with the models, and limiting the communication to only the data corresponding to the anomalies plus a model description, as sketched below. We shall illustrate the above with concrete examples from the wild!
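As a minimal illustration of the last point, the following Python sketch (all names illustrative, and the EWMA detector merely a stand-in for whichever small model fits the sensor) keeps a tiny online model on the device and queues only anomalous points, plus a compact model description, for transmission:

```python
# A minimal sketch of the "transmit only anomalies + model description"
# pattern. The detector is an illustrative EWMA model; any small online
# model could take its place.

class EdgeReporter:
    """EWMA detector that uploads only anomalies plus a compact model summary."""

    def __init__(self, alpha=0.05, k=4.0):
        self.alpha, self.k = alpha, k
        self.mean = None     # EWMA estimate of the signal level
        self.var = 0.0       # EWMA estimate of the squared deviation
        self.outbox = []     # only anomalous points are queued for upload

    def observe(self, t, x):
        if self.mean is None:           # first sample seeds the model
            self.mean = x
            return
        if self.var > 0 and abs(x - self.mean) > self.k * self.var ** 0.5:
            self.outbox.append((t, x))  # anomaly: worth transmitting
        d = x - self.mean               # standard EWMA mean/variance updates
        self.mean += self.alpha * d
        self.var = (1 - self.alpha) * (self.var + self.alpha * d * d)

    def model_summary(self):
        # Compact "model description" the backend can use instead of raw data.
        return {"alpha": self.alpha, "mean": self.mean, "var": self.var}
```

Under this scheme, only the outbox contents and an occasional model_summary() cross the network, so communication scales with the number of anomalies rather than with the raw data volume.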

Video Transcript

– Hi everyone, welcome to our talk, Anomaly Detection at the Edge. I’m Arun Kejariwal, presenting with my colleague, Ira Cohen. We have known each other for several years and have worked in the space of anomaly detection in different capacities. In my previous life, I worked on anomaly detection in the context of marketing and in the context of infrastructure. Ira, on the other hand, co-founded a company called Anodot, which has built a great product around this topic. He will talk more about the company, the product, and the customers later in the talk.

If you have any questions, please feel free to reach out to us via Twitter; our Twitter handles are on the slide deck. In general, feel free to ask questions after the talk via chat; we’ll be more than happy to answer them. Without any further ado, let’s dive straight in.

Cloud computing has enabled scale, innovation, and connection over the last decade or so. Going forward, edge computing is supposed to be the next big thing: it will complement cloud computing by providing more real-time value and more immersive experiences, via more data production and intelligence at the front end, where the people and the things are.

In fact, IDC forecasts that there will be around 46 billion devices generating around 80 zettabytes of data in the next five years. Now, you may wonder what will make edge computing successful. There are new technologies like 5G coming up very soon. On the application side, the Internet of Things, which broadly speaking comprises sensors and actuators, is ramping up pretty fast. And as we all know, during the COVID pandemic we are all stuck in our homes. So what if there is a network outage, or machine downtime? In such scenarios it’s critical to still be able to provide value to the customer, and that is where edge computing plays a very vital role.

Now, Spark Summit is an industry-centered forum.

So one may wonder: what is the business opportunity behind edge computing? By certain estimates from Gartner and McKinsey, it is expected to be around a $9 billion market in the next four years. From a use case perspective, there are plenty of them, like real-time automated decision making. The criticality of edge computing stems from the fact that we need to be able to extract insights at the edge, owing to the ever-increasing data volume; I’m going to talk more about that in the subsequent slides. And, of course, there is an element of efficiency, because as helpful as insights may be, cost is also a very practical matter. So, broadly speaking, there are four driving factors guiding the growth of edge computing. One is how to minimize the latency of insight extraction. Then, how to address data growth and bandwidth limitations. As I already mentioned, the system should keep working even in the presence of connection issues or machine failures. And last but not least, how do we guarantee a high level of privacy and security for the end customer?

So broadly speaking, there are several aspects associated with edge computing. One is on the compute side: as the data volume grows, it’s not tenable to keep growing the data centers, because they are very power hungry and very expensive. So over the last few years, federated learning has emerged as a very promising paradigm where the computation is done on the edge itself. Now, to make that possible, we need high-density storage with ultra-low read/write latency on the devices themselves, because the data volume keeps increasing; as you can imagine, video consumption is on a steep rise. If one were to provide video analytics on the edge, this will definitely require high-density storage.

Moving on, as I briefly talked about earlier, data security is also a critical challenge. In the context of edge devices, given their limited compute capability, the challenge is how to provide advanced authentication and encryption support. Last but not least, on the dependability side of things, ensuring continuous operation even in the wake of a network outage is going to be a big challenge.

And it’s not only about network outages. In general, users may be on 5G, on 4G, or on 3G; network connectivity is not uniform across the globe. So how do we ensure a good experience for the end customer? Now, unlike most talks, where the core idea is presented without any insight into the application, Ira and I will walk the audience through the different use cases, so that the audience can walk away with the core techniques and explore how to apply them in their respective domains. In the next few slides, I’ll walk through some of the use cases in the context of edge computing. As illustrated on the slide, there are several domains where edge computing can play a vital role. For example, in autonomous vehicles it can increase situational awareness, which essentially corresponds to sharing data about weather, road conditions, and peer traffic. This can improve the driving experience and also help reduce the number of accidents.

Similarly, on the manufacturing side, it can help with providing insights around predictive maintenance of turbines, motors, and other heavy equipment.

Likewise, there are plenty of applications in the realm of telecommunications and retail: how do you do hyper-targeting, and more, in a real-time context? Let’s say you’re downtown and there is a concert going on; can you be recommended discounted tickets, given your location, in real time?

And, of course, on a more personal basis, healthcare and life sciences present a tremendous opportunity for edge computing. Given the world we live in today, where we have all been locked in our homes due to COVID-19, this is a great example of where edge computing can help: how do we do remote sensing to identify clusters of people who may be asymptomatic, or who may be exhibiting certain symptoms? Likewise, how do we monitor the onset of the disease? How do we detect anomalies in lab test reports? Here, on the bottom right, I show an example anomaly report from Anodot, where they applied their product to detect a sharp increase in the number of new cases for one of the countries.

Continuing on the different use cases in different industries, we have energy and utilities. Agriculture is a tremendous opportunity: today, around the globe, a large percentage of production gets wasted because of a lack of insights around crop quality, insects, water availability, and so on. For the last decade and a half, data centers have grown by leaps and bounds; they constitute a significant percentage of energy consumption around the globe. So how can we make these data centers more efficient? Can we monitor metrics like humidity, airflow, and other data center metrics to improve their efficiency? In the realm of finance, we all know that most payments are done digitally today, especially in the COVID-19 world. So how do we make ourselves more robust against potential fraud?

We all may be making payments through our phones, so can we have some sort of a payment bot on our phones that can detect any potentially fraudulent activity?

Broadly speaking, the use cases can be categorized into three buckets: the business side, the people side, and what we know as the Internet of Things. Each of these can have self-interactions; for example, a business can have a self-interaction in the form of distributed business processing. The different domains can also talk to, or interact with, each other. For example, businesses can have a more immersive experience with people via online content delivery: how do we personalize that content on the edge itself? Likewise, people may interact with things: how do we provide a more immersive experience? In that context, one of the key aspects is AR/VR. It’s still a way out before it gets to the masses, but how do you personalize that experience on the device itself?

Moving on, there’s a plethora of use cases ranging from video analytics to security to productivity. I just talked about virtual reality: in a multiplayer game, how do we tailor the game based on how the different players are playing, so it’s more dynamic instead of being static? In the augmented reality realm, one can customize the shopping experience on the device itself, and this ensures that we are not limited by any connectivity issues the end user may experience.

Just to quickly wrap up the suite of use cases: closer to home, smart homes provide a great opportunity. How can we build intelligence into smart meters? How do we increase the energy efficiency of our homes? Smart transportation, data reporting, and environmental monitoring present another few use cases. Again, on the personal side, how do we track air quality and water quality? For instance, during the COVID-19 pandemic there were certain anxieties: “Hey, is the water drinkable? Has it gotten contaminated?” So, conceivably, we can sprinkle sensors in the water reservoir and monitor the water quality in real time, at the edge itself.

Again, closer to us, privacy is a big deal.

So can we support ideas like differential privacy on the device itself, so that while providing a customized experience to the end user, privacy is not compromised? Automation in industry: how do we monitor the performance of robots and drones? We talked about health and safety earlier. And then, last but not least, conversational interfaces. Many of us have gotten used to using Siri, Google Assistant, or Cortana. How do we tailor these digital assistants to the respective use cases? My use case can be different from Ira’s. Can we optimize the digital assistant on a per-user basis, on their device itself, so that there is no information leakage from one device to another?

At a broader scale, we have plenty of opportunities in the realm of smart cities, around logistics. Insurance is another big opportunity: my driving style and the number of miles I drive may significantly vary from how much Ira drives out there in Israel. So can we tailor insurance policies based on our different driving styles, so that we pay accordingly? Even in an infrastructure context, like railways, can we monitor the condition of the different railway cars and then provide insights around predictive maintenance?

So far we have talked mostly about the software side, but there are plenty of opportunities on the hardware side as well; per McKinsey, the hardware side of these use cases is around a $200 billion market. Now, what is interesting here is that I would have expected the public sector, or healthcare, to be among the top two potential markets, but it turns out, based on the research, that travel and transport is at the top. This is sort of understandable, because in the context of healthcare we have regulations such as HIPAA, and in the public sector and utilities context there are challenges around regulation, so these will take a while to grow. But we do expect these two segments to provide major opportunities for edge computing.

So switching gears to artificial intelligence: this is where AI and the edge come together. Broadly speaking, there are three flavors: you can have a visual response, an auditory response, or a tactile response. So here, one of the use cases is around…

As I mentioned, one of the use cases is facial recognition, which can be used for authentication. Similarly, there are other applications around vision and AR/VR, and language translation is big.

In document and language translation there are different flavors. You can have text-to-speech, which is commonly used, for example, for reading emails while you’re driving; and there are other flavors where text is converted into acoustic features, which are then converted into waveforms. There are several challenges in this regard: the model you develop for text-to-speech, for instance, has to be really small, because the memory you have on these devices is pretty limited. Also, as I mentioned early in the talk, one of the key aspects is how to provide insights in real time. In that context, one has to make a space-time trade-off and a speed-accuracy trade-off, and there are other challenges, as mentioned on this slide: devices have a small form factor and have to withstand rugged environments. So these do have implications from an algorithm design perspective. Jumping right into the AI realm, broadly speaking you have either training workloads or inference workloads. Training workloads are typically very compute heavy, so one has to design robust algorithms which are not compute and memory intensive. On the inference side, there can be many metrics to predict, like velocity, orientation, trajectory, and activity, and there are many signals one can use, from the accelerometer, gyroscope, and so on.

But the main premise here is that the availability of data is huge, and the number of metrics you want to predict is also pretty large. So how do we facilitate this? Federated learning, as I mentioned earlier, is one paradigm, where we federate the training across different devices to make this happen.

On the federated learning side, we have a wide suite of applications, like mobile keyboards, vocal classifiers, and so on. The challenges, as I mentioned earlier, owe to the limited compute and memory on the devices. Convergence time can be a potential issue, and if you are leveraging federated learning, communication between devices can be a challenge. A minimal sketch of the core idea follows.
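Here is a minimal Python sketch of federated averaging (FedAvg-style), under the simplifying assumptions of a linear model and synchronous devices; the function names and hyperparameters are illustrative, and the key property is that only model weights, never raw data, cross the network:

```python
import numpy as np

# A minimal sketch of federated averaging: each device trains on its
# local data and only the resulting weights are communicated and averaged.

def local_update(weights, X, y, lr=0.01, epochs=5):
    """A few epochs of on-device gradient descent for a linear model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def federated_round(global_w, devices):
    """devices: list of (X, y) local datasets that never leave the device."""
    updates, sizes = [], []
    for X, y in devices:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    # Weight each device's contribution by its local sample count.
    return np.average(np.stack(updates), axis=0, weights=np.asarray(sizes, float))
```

The communication cost per round is one weight vector per device, which is exactly why limited bandwidth and convergence time, rather than raw data volume, become the bottlenecks mentioned above.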

Given these challenges, to quickly brush through: one way to design algorithms is to make them one-pass, so that they are fast.

The other flavor is that the algorithms have to be incremental in nature, so that we can meet the real-time constraints.

In the incremental context, there may be challenges around numerical stability. And how do you build incremental algorithms when you have a wide set of signals? Then you essentially have data in a high-dimensional space, which may be a big challenge in itself.
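The one-pass and numerical-stability points are exactly what Welford-style updates address: naive running sum-of-squares accumulation can lose precision badly in floating point, while the recurrence below stays stable. A minimal sketch, with an illustrative k-sigma anomaly test bolted on:

```python
# Welford's one-pass recurrence: a numerically stable way to maintain the
# running mean and variance incrementally, one sample at a time.

def welford_update(state, x):
    """state = (n, mean, M2), with M2 the running sum of squared deviations."""
    n, mean, m2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)   # uses the updated mean; this is the stable form
    return n, mean, m2

def is_anomalous(state, x, k=3.0):
    """Illustrative k-sigma test against the statistics accumulated so far."""
    n, mean, m2 = state
    if n < 2:
        return False
    std = (m2 / (n - 1)) ** 0.5
    return std > 0 and abs(x - mean) > k * std
```

Starting from state = (0, 0.0, 0.0), each sample costs O(1) time and O(1) memory, which is what makes this flavor of algorithm viable on a sensor.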

So far we have talked about the use cases and the applications. Now, everything rests on the incoming data being of high fidelity; otherwise, as most of you may have heard, it becomes a case of garbage in, garbage out. Data fidelity is an especially big challenge in the context of edge devices because, as I mentioned, these devices have to withstand rugged environments and there may be connectivity issues. So you may bump into issues like missing data, or anomalies in the data. And that’s where anomaly detection comes into play in the context of edge computing. Now you may wonder, “Hey, what’s new in this?” After all, anomaly detection has been studied for over 125 years.

This slide highlights some of the flavors of techniques which have been used in a wide variety of contexts outside of edge computing. Now, these techniques are typically not viable at the edge: there is concept drift, where the underlying distribution may change; connectivity may drop in real time; you have a wide variety of data; and you can have communication bottlenecks between devices. And, most importantly, the algorithms proposed over the last hundred years didn’t have real-time constraints in mind. One of the common ways to develop such techniques is called sketching. There are different flavors of sketching depending on the application; some of these are listed here on the slide, and one is sketched below. Ira will walk you through some of these techniques in a couple of minutes. In the context of privacy and security, there are techniques which have been proposed more recently to provide tamper resistance; tampering can be thought of as an anomaly in the data, so that’s another use case for anomaly detection at the edge. These are some of the algorithms which have been proposed specifically for anomaly detection at the edge. They are broadly variants of techniques which have been in use for a while, but they’ve been sliced and diced to fit on the devices, because of low memory, and to run faster on the devices, due to low compute. Now I will hand over to Ira to walk us through some concrete use cases based on his interaction with the customers of Anodot. – So thank you, Arun.
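As one concrete flavor of sketching, here is a textbook Count-Min sketch in Python (a generic illustration, not tied to any specific system on the slide): it estimates item frequencies in fixed memory, which makes frequency-anomaly and heavy-hitter checks feasible on a memory-constrained device.

```python
import hashlib

# A minimal Count-Min sketch: frequencies are estimated in
# O(width * depth) memory, no matter how many distinct items the
# stream contains.

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent-ish hash per row, derived from a keyed digest.
        h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counts, so the min across rows is best.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))
```

Width and depth control the memory/accuracy trade-off: roughly, estimates are within εN of the truth with probability 1−δ for width ≈ e/ε and depth ≈ ln(1/δ).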

To introduce myself again, I’m Ira Cohen, the chief data scientist of Anodot. We’re a company that developed a product that does anomaly detection, both as a service and also potentially at the edge. I’m gonna talk about a few of these use cases right now.

Anodot’s Anomaly Detection Steps

So first, I’m very happy to be at this virtual conference. I didn’t have to travel far, I didn’t have to shave. But I do have to wear my reading glasses in order to see my laptop screen, because if I look at the big screen, it looks like I’m looking somewhere else. So I hope it’s not weird for you listening to this talk. Let me talk about Anodot’s anomaly detection and how we do it. We do it in a sequential manner, not just for the edge anomaly detection cases: even when you want to scale a non-edge case to millions and billions of time series, doing it with non-sequential algorithms is very expensive computationally, and then the benefit of the anomalies is smaller than the cost of running the platform, so you have to weigh those in as well. The way we do it is to collect the data continuously as a streaming service and analyze all the data on the stream itself to learn the normal patterns. Then, based on those normal patterns, we can detect anomalies whenever they arrive. These models of normal behavior have to keep getting updated sequentially, so whatever algorithm you take, you have to adapt it to be sequential if it isn’t designed that way from the beginning; otherwise, it’s very hard to scale this up. Now, another important point, which matters at the edge but is even more critical in the non-edge cases: you want to correlate the anomalies. Suppose you have a host of sensors, and you detect anomalies on the edge; that can be done very fast. You still want to be able to correlate them into a concise story that tells you all the anomalies that occurred that are related to each other, because those correlations often lead to an actionable insight. And then the algorithms can get feedback, if it’s available, and improve themselves.

So this is how the sequential updating of the models looks, learning one particular time series as time goes on. In this case, you see on the left that when the time series has very few samples, the normal pattern, or baseline, is very wide, which means the pattern is not learned well yet. But as time goes on, the algorithms adapt and gather more and more information about the time series. Here, the model first detects that there is a daily pattern and applies it, then detects that there is a weekly pattern and applies that too. At the end, you get a very tight baseline, which is the visual output of the machine learning model that learned the normal behavior.
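To illustrate that widening-then-tightening behavior (and emphatically not as Anodot’s actual algorithm), here is a toy Python baseline that keeps one running estimate per hour-of-week bucket; its band stays wide open until a bucket has seen data, and tightens as the weekly pattern fills in:

```python
from collections import defaultdict

# A toy seasonal baseline, not a production algorithm: one running
# (mean, variance) estimate per hour-of-week bucket, updated sequentially.

class SeasonalBaseline:
    def __init__(self, alpha=0.1, k=3.0):
        self.alpha, self.k = alpha, k
        self.buckets = defaultdict(lambda: [None, 0.0])  # bucket -> [mean, var]

    def band(self, hour_of_week):
        mean, var = self.buckets[hour_of_week % 168]
        if mean is None:
            return None               # no data yet: the band is "wide open"
        half = self.k * var ** 0.5
        return mean - half, mean + half

    def update(self, hour_of_week, x):
        b = self.buckets[hour_of_week % 168]
        if b[0] is None:              # first sample seeds the bucket
            b[0] = x
            return
        d = x - b[0]                  # sequential EWMA mean/variance update
        b[0] += self.alpha * d
        b[1] = (1 - self.alpha) * (b[1] + self.alpha * d * d)
```

A production system would add trend handling, holiday effects, and model selection; the point here is only the shape of the sequential update.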

And the thing is, especially when we’re talking about sensors from multiple sources, you get a lot of different behaviors of time series, so there isn’t really a single model that fits them all. Arun listed several different models, and I can tell you, some of them work on some data, and some work better on other data. We have found that the right approach is ensembling of models: fitting the right model to the right behavior of the data, or assembling multiple models together in some cases. These are real examples of both time series and anomalies that our system detected, which have very different behaviors and require different algorithms to learn their patterns.

Now, the correlation helps in two regards. Firstly, it helps you understand what happened, and then where it happened: you might have a lot of sensors firing various anomalies, and the correlations themselves, assuming some of them are not just symptoms but also indicative of root causes, can help you understand the root cause much faster. That’s another very important aspect, and it is typically more expensive computationally, which means that if you do it at the edge, you actually do need to make your algorithms much faster. Now, anomaly detection by itself is meaningless unless you inform somebody that the anomaly happened. And once you start sending alerts, minimizing false positives becomes very critical. A lot of that minimization is done, first of all, by applying the right algorithm to the data so it captures the normal patterns as accurately as possible. But we have found that by itself is not enough, because not every anomaly is created equal, and not every anomaly is as interesting as another. So we baked into our product (indistinct), and these kinds of things have to be baked into any type of product that does alerting, various filtering mechanisms that allow the user or the system to automatically discard some anomalies as not interesting and send only the right alerts. It starts from scoring anomalies on various parameters, from the duration of the anomaly to its delta; allowing the user to see simulations of past anomalies, so they can tune the filters based on them; and adding influencing factors that help filter anomalies, or decide whether they’re important or not, which lets you add context and correlation. At the end of the day, if the users can provide feedback for alerts in the form of “this was good” or “this was bad”, it can loop back and, in a semi-supervised fashion, improve the naturally unsupervised algorithms used in anomaly detection.
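A hedged Python sketch of that scoring-and-filtering idea, with entirely illustrative weights and thresholds: the point is that an alert is gated on a composite score rather than fired for every statistical deviation.

```python
# An illustrative scoring-and-filtering gate, not Anodot's actual scoring:
# each anomaly is scored on deviation size and duration, and only anomalies
# that clear a user-chosen bar become alerts.

def anomaly_score(delta_sigmas, duration_points, w_delta=0.6, w_dur=0.4):
    """Combine deviation size (in sigmas) and duration into a 0..1 score."""
    delta_part = min(delta_sigmas / 6.0, 1.0)        # saturate at 6 sigma
    duration_part = min(duration_points / 12.0, 1.0)  # saturate at 12 points
    return w_delta * delta_part + w_dur * duration_part

def should_alert(anomaly, min_score=0.7, min_duration=3):
    """Discard short or weak anomalies instead of paging anyone."""
    if anomaly["duration"] < min_duration:
        return False
    return anomaly_score(anomaly["delta_sigmas"], anomaly["duration"]) >= min_score
```

User feedback (“good alert” / “bad alert”) would then be used to tune weights and thresholds like these, which is the semi-supervised loop described above.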

So with all of these, basically, we have a product that is widely used by a lot of different companies in various industries and use cases.

Who is using it?

And the reason they can use it is that we really baked into the product a lot of different algorithms for capturing many types of behaviors, along with the focus on false positives.

Why are they using it?

So this is NICE. These are the main use cases people use it for today: revenue and cost monitoring, partner monitoring, and customer experience monitoring. But what does this have to do with the edge, you must be asking yourself.

But what about the edge?

So let’s talk about edge use cases as well. And I think the context of COVID-19 is actually pushing this along quite fast, in a variety of ways.

The first way is monitoring health. Remote monitoring of health is not new; it’s been talked about for a long time, and in this day and age of COVID-19, it’s actually being pushed quite significantly. Here you have watches that can measure a lot of things about your health and detect whether something becomes anomalous, be it a sleep pattern, blood oxygen, or anything like that, whether you’re sick or not. And especially if you are a carrier of COVID-19, you might not want to go out anywhere, and you might want your device to inform your doctor that they should call you to see whether you need to go to the hospital or not.

In the hospitals themselves, COVID-19 presented a new type of issue, especially in ICUs, where it’s dangerous for the doctors and nurses to be present in close proximity to the patients they have to take care of. In a normal ICU, you have a lot of staff for each patient; it depends on the geography, but typically at least one staff member for one or two patients. With COVID-19, some hospitals in some regions saw an overload of patients in the ICU, and the time the doctors and nurses spend inside the ICU next to the patient should be minimized to only the cases where they’re absolutely needed. So that lends itself quite nicely to edge cases. And I’m going to talk about monitoring patients at volume, which is the case in a pandemic. You want to monitor patients at home, and you want to monitor them in the hospital on the ventilator. The problem is that the existing techniques in healthcare for alerting on the deterioration of patients are actually quite lacking. This was well known even before, but here it exploded a lot more. For example, blood oxygen: if you’re ventilated and it goes below 90%, all the alarm bells will go off. But by the time it reaches that point, you might have missed a deterioration that happened over a few hours, or days, for that patient.

And at that point, it might already be too late. The other side of it is that a lot of the static thresholds built into the devices generate many alerts. We see it in the IT space: anybody working on IT systems knows they get a lot of false positives. It turns out that in ICUs and in healthcare it’s the same, because while specific thresholds on specific parameters might be correct, they often lead to too many alerts and alert fatigue. The requirement is remote monitoring with an early warning score, to identify deteriorating conditions while minimizing false positives, just like the business use case. So let’s look at some examples. This is an example of real data from a watch, looking at the respiratory rate, which is important in the COVID case: it has been shown that deviations in respiratory rate sometimes indicate deterioration of the disease. The problem is that each person has their own respiratory rate that is normal for them, and it might even change throughout the day and at night. So you have to adjust the model to the data of that particular person; you cannot assume static thresholds here. This is an example of an anomaly in respiratory rate that went down for one person being monitored.

Now, this is an example from an ICU patient, and here the correlation comes into play: multiple parameters are monitored for the ICU patient, the device itself looks at them and finds the anomalies, and that information is sent to the NOC or the patient (indistinct).

A lot of hospitals just created an outside room where they can observe these vitals but also get alerted. But again, when there are a lot of patients, the alerts are really critical, because there isn’t an eyeball looking at every single patient and their health situation right now, and a lot of times they miss a deterioration, as we see in this anomaly here.

Early Warning Score

Now, what we learned is that when you show graphs like that to doctors and nurses, oftentimes they don’t know how to interpret them, looking at graphs over time.

The right way to show it is to give some score, and there’s been a lot of work, especially in the last few months, on providing an early warning score for the deterioration of ICU patients. The scores that you see here are for a patient over a 36-hour period, where anything above seven is already critical. What the doctor can see at any given time is the score of the patient, from zero to any number above seven, based on all the anomalies that were detected in the vital signs of that patient. A toy sketch of such an aggregation follows.
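As an illustration only, and not a clinical algorithm, aggregating per-vital anomaly strengths into a single patient-level score with a critical bar around seven could look like the following Python sketch; the vitals and weights are made up for the example.

```python
# An illustrative aggregation, not a clinical algorithm: per-vital anomaly
# strengths (0..1, e.g. from the detectors sketched earlier) are weighted
# and summed into one early warning score, where anything above 7 is
# treated as critical, as described above.

VITAL_WEIGHTS = {"resp_rate": 3.0, "spo2": 3.0, "heart_rate": 2.0, "temp": 1.0}

def early_warning_score(active_anomalies):
    """active_anomalies maps a vital sign to its anomaly strength in [0, 1]."""
    return sum(VITAL_WEIGHTS.get(vital, 1.0) * strength
               for vital, strength in active_anomalies.items())

# Strong respiratory and SpO2 anomalies plus an elevated heart rate push
# the score to the critical boundary: 3*0.9 + 3*0.9 + 2*0.8 = 7.0.
print(early_warning_score({"resp_rate": 0.9, "spo2": 0.9, "heart_rate": 0.8}))
```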

So what are the benefits? Early detection, so you improve outcomes; the system is constantly monitoring; and you can scale it, so you reduce the load on the medical staff by using this type of autonomous, machine-learning-based monitoring, and you reduce the risk to the staff. When you monitor people at home, you also reduce the risk of them going out and infecting other people. So this is quite beneficial and working (indistinct). With this, we’ve come to the end of the talk. We’ve gone a long way, from the anomaly detection engines to edge computing and being able to actually run these machine learning models at the edge. And we see the benefits of these edge computing use cases; as I demonstrated here, in a pandemic these benefits become even more important for keeping everybody safe. So thank you very much. And I think now we can take questions.

Arun Kejariwal
About Arun Kejariwal

Independent

Until recently, Arun was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers working on novel techniques for install-and-click fraud detection, assessing the efficacy of TV campaigns, and optimization of marketing campaigns; his team also built novel methods for bot detection, intrusion detection, and real-time anomaly detection. Previously, he developed and open-sourced techniques for anomaly detection and breakout detection at Twitter. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance and availability.

About Ira Cohen

Anodot

Ira Cohen is a cofounder and chief data scientist at Anodot, where he's responsible for developing and inventing the company's real-time multivariate anomaly detection algorithms that work with millions of time series signals. He holds a PhD in machine learning from the University of Illinois at Urbana-Champaign and has over 12 years of industry experience.