Sarah Hantosi Albertsson

Development Engineer, Scania CV

Sarah works as a development engineer in the Connected Intelligence team at Scania’s R&D section, a team that creates innovative and scalable data products that enable actionable insights. Together with her team, she works out the technical architecture and implementation needed to realise business opportunities, whether that means implementing big data pipelines, tuning Kafka throughput, optimising algorithms with Spark or building data visualisations for logistics optimisation. With a degree in Cognitive Science, she is at her best where she can combine her interest in people, service design and technology, and where cross-functional teamwork and working end-to-end are part of her daily routine.

Past sessions

At Scania we have more than 400 000 connected vehicles, trucks and buses, roaming the Earth and continuously collecting and sending us GPS coordinates from around the globe. To create value from this data, we need scalable data pipelines that enable actionable insights and applications for our company and our customers. Building scalable big data pipelines that achieve this requires constant improvement and transformation. In this talk we will show how we have architected a continuously delivered Spark Streaming pipeline that we iteratively improve by refining our code, our algorithms and our data deliverables. The pipeline runs on our data lake using both stateful and stateless Spark Streaming, and the resulting data products are pushed to Apache Kafka. In the build pipeline we use Jenkins, Artifactory and Ansible to test, deploy and run. We will present the technical architecture of the pipeline and the major changes it has gone through: what triggered each change, how it was implemented and adapted, and what it taught us.

Today we deploy new generations of our pipeline with just a mouse click, but it has been a journey getting here. We believe that anyone with a general awareness of the challenges of big data, the possibilities of the streaming paradigm and the need for CI/CD will find this talk enlightening. The talk will highlight:

  • What each deployed generation of our pipeline has taught us about Spark Streaming and Kafka.
  • How we utilise the power of DevOps tools.
  • How we enable and ensure the delivery of quality-assured data products at scale.