Cybercriminals have stepped up their game and now use advanced techniques to penetrate organisations' defences and steal critical data, causing millions in losses. To be more precise, an IBM study puts the average total cost of a data breach at $3.62 million. Organisations' cyber defence departments have to reinvent their defence mechanisms to keep up with the new threats, and the regular SIEM systems in use are not enough. These systems were designed neither to deal with enormous volumes of data nor to make use of artificial intelligence.
These limitations converge into a significant problem for an organisation's last line of defence, the security analyst team. At Marionete, we have been helping one of our clients (a German multinational company) to expand and enhance their current defences. The legacy systems thwart the cyber analysts' efforts to identify threats in time. Hence, we have designed and implemented a next-generation data analytics platform which leverages the current SIEM by extending its limited storage with a data lake and by strengthening the alarm system with AI capabilities. In this talk, we will explore the solution architecture and how we have leveraged Apache Spark for (A) complex ETL jobs and (B) ML pipelines to detect cyber threats. The lake collects 6 TB of log data per day, so decoupling storage from processing allows each component to scale independently.
With this architecture, security analysts can perform their forensic analyses over years of historical data without compromising query performance or availability. In addition, Apache Spark empowers our Data Scientists to train models on years of historical data in only a few hours. As a result, the models deployed to production have greatly reduced the number of false positives in comparison to the SIEM alerting system.
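To put the 6 TB per day figure in perspective, the following is a rough back-of-the-envelope sketch of how much raw log data would accumulate over several years. Only the daily ingest rate comes from the abstract; the constant daily rate, the decimal units, the absence of compression or retention policies, and the function name `raw_volume_tb` are all assumptions made for illustration.

```python
# Back-of-the-envelope storage growth for the data lake.
# Only the 6 TB/day figure comes from the abstract; the rest is assumed.

TB_PER_DAY = 6        # stated daily ingest of log data
DAYS_PER_YEAR = 365   # assumption: ingest is constant every day of the year
TB_PER_PB = 1000      # assumption: decimal units (1 PB = 1000 TB)


def raw_volume_tb(years: int) -> int:
    """Uncompressed log volume accumulated after `years` of ingest, in TB."""
    return TB_PER_DAY * DAYS_PER_YEAR * years


for years in (1, 3, 5):
    tb = raw_volume_tb(years)
    print(f"{years} year(s): {tb} TB (~{tb / TB_PER_PB:.2f} PB)")
```

Under these assumptions the lake grows by roughly 2 PB of raw data per year, which suggests why storage needs to scale separately from the compute used for queries and model training.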
Session hashtag: #SAISExp13
Carlos works as Lead Cloud Engineer at Marionete.co.uk. Certified as an AWS Solutions Architect, he is currently helping the Siemens Cyber Defence Department to build a next-generation analytics platform, leading the technical team, which consists of both DevOps engineers and Data Scientists. Previously, Carlos worked at a financial institution that manages more than £20 billion of assets, helping them to design a data-driven strategy, amongst other things. In his spare time, Carlos teaches postgraduates in Data Science and is a local speaker and co-organiser of the Lisbon Apache Spark Meetup group.