Joao Da Silva

Lead Data Engineer, Avast

Lead Data Engineer at Avast and long time Scala developer with big focus on Big Data and Machine Learning pipelines.

Past sessions

At Avast we complete over 17 million phishing detections a day, providing crucial online protection for this type of attacks.

In this talk Joao Da Silva and Yury Kasimov will present the MATS stack for productionisation of Machine Learning and their journey into integrating model tracking, storage, cross-system orchestration and model deployments for a complete and modern machine learning pipeline.

One can integrate MATS stack into their existing ecosystem without disruption, no need to migrate to clean AWS all of a sudden.

MATS stack consists of adopting MLFLow, Airflow, Tensorflow and Spark to form a cross-system orchestrated ML pipeline into a standard set of well integrated tools which data scientists at Avast can adopt.

They will use Angler, an internal machine learning project for detecting phishing URLs to demonstrate how MATS stack was leveraged for this ML Pipeline, walking the audience through all stages of the Angler pipeline: data transformations and enrichments in Spark, training of models, experiment tracking and serving of the models. The pipeline is useful for fast and reproducible experiments and it allows a fast progression from research to production.

Speakers: Joao Da Silva and Yury Kasimov

Summit Europe 2019 AI on Spark for Malware Analysis and Anomalous Threat Detection

October 16, 2019 05:00 PM PT

At Avast, we believe everyone has the right to be safe. We are dedicated to creating a world that provides safety and privacy for all, not matter where you are, who you are, or how you connect. With over 1.5 billion attacks stopped and 30 million new executable files monthly, big data pipelines are crucial for the security of our customers. At Avast we are leveraging Apache Spark machine learning libraries and TensorflowOnSpark for a variety of tasks ranging from marketing and advertisement, through network security to malware detection.

This talk will cover our main cybersecurity usecases of Spark. After describing our cluster environment we will first demonstrate anomaly detection on time series of threats. Having thousands of types of attacks and malware, AI helps human analysts select and focus on most urgent or dire threats. We will walk through our setup for distributed training of deep neural networks with Tensorflow to deploying and monitoring of a streaming anomaly detection application with trained model.

Next we will show how we use Spark for analysis and clustering of malicious files and large scale experimentation to automatically process and handle changes in malware. In the end, we will give comparison to other tools we used for solving those problems.