Jörg Schad is VP of Engineering and Machine Learning at ArangoDB. In a previous life, he has worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research around distributed databases and data analytics. He’s a frequent speaker at meetups, international conferences, and lecture halls.
May 26, 2021 12:05 PM PT
Many powerful Machine Learning algorithms are based on graphs, e.g., Page Rank (Pregel), Recommendation Engines (collaborative filtering), text summarization, and other NLP tasks. Also, the recent developments with Graph Neural Networks connect the worlds of Graphs and Machine Learning even further.
Considering data pre-processing and feature engineering which are both vital tasks in Machine Learning Pipelines extends this relationship across the entire ecosystem. In this session, we will investigate the entire range of Graphs and Machine Learning with many practical exercises.
October 26, 2016 05:00 PM PT
The current craze of Docker has everyone sticking their processes inside a container... but do you really understand cgroups and how they work? Do you understand the difference between CPU Sets and CPU Shares? Spark is a Scala application that lives inside a Java Runtime, do you understand the consequence of what impact the cgroup constraints have on the JRE? This talk starts with a deep understand of Java's memory management and GC characteristics and how JRE characteristics change based on core count. We will continue the talk looking at containers and how resource isolation works. The session will detail specifically the difference between CPU sets and CPU shares and memory management. The session will close with a deep understanding of the consequences of running the JRE in a CPU share environment and the potential for pseudo-random behavior of running in a heterogeneous datacenter.
October 25, 2017 05:00 PM PT
There are an ever increasing number of use cases, like online fraud detection, for which the response times of traditional batch processing are too slow. In order to be able to react to such events in close to real-time, you need to go beyond classical batch processing and utilize stream processing systems such as Apache Spark Streaming, Apache Flink, or Apache Storm. These systems, however, are not sufficient on their own. For an efficient and fault-tolerant setup, you also need a message queue and storage system. One common example for setting up a fast data pipeline is the SMACK stack. SMACK stands for Spark (Streaming) - the stream processing system Mesos - the cluster orchestrator Akka - the system for providing custom actors for reacting upon the analyses Cassandra - the storage system Kafka - the message queue Setting up this kind of pipeline in a scalable, efficient and fault-tolerant manner is not trivial. First, this workshop will discuss the different components in the SMACK stack. Then, participants will get hands-on experience in setting up and maintaining data pipelines.
Session hashtag: #EUeco1