Embark is on a mission to move the entire trucking industry forward, by building self-driving technology to make highways safer and the transport of goods more efficient. With a unified approach to data and machine learning, Databricks is helping Embark’s vision come to life. They are now able to analyze dense sensor data to safely and swiftly iterate on the models that lend the trucks their autonomy. Once their mission is achieved, Embark’s trucks will be able to pay attention to their surroundings 24 hours a day without worry of getting tired or distracted, and ultimately move freight safely and more efficiently.
According to the US Department of Transportation, approximately 94% of vehicular accidents can be attributed to human error. Addressing this challenge began with capturing the world around the trucks, which translated to a massive amount of recorded high-definition video, LiDAR, and RADAR data to train their models. Unsurprisingly, this amount of uniquely dense data presented a variety of technical challenges. For starters, Embark was working with in-house machines and individual laptops to access the data and run queries and analysis. But because of the data’s scale, the team could analyze only 30 seconds of it at a time. When you have 37k hours of recorded data, analyzing 30-second clips just doesn’t work.
In addition to limited data access, the use of single laptops prevented the team from storing enough meaningful data in a single location, and transferring it to and from their office was slow and costly. Any small failure or change meant a ton of engineering time and multiple days of machine time just to get the results they needed. “We could only view maybe 30 seconds of data at a time. And that made it really difficult to make informed decisions,” said Jason Snell, Lead Software Engineer of Embark’s Developer Platform team.
Once Embark implemented Databricks, everything changed for the better. The unified approach to data analytics made ingestion and ETL easy to manage, and the distributed platform was able to scale to meet their data needs. Better yet, the Embark team went from single laptops with limited memory to hundreds of machines in the cloud, simplifying cluster management and ultimately allowing data scientists to focus on data rather than infrastructure.
Cross-team collaboration also received a boost with interactive Notebooks, which allowed the teams to write code and visualize all the results in one place, as well as easily share and collaborate with various teams (data engineers, product management, data, and data science).
“Databricks has allowed us to unlock over 35,000 hours of recorded data from our trucks,” said Jason. “Our engineers can essentially access this data whenever they want, in whatever size they want, with whatever resolution they want.”
From an analytical standpoint, Embark’s data analysts are able to generate dashboards — powered by Databricks — via Tableau to better understand how their software and motion sensors are performing.
Databricks has ultimately enabled Embark to achieve the massive scale they need for data processing and model training, all while remaining as conscious of safety as ever. Before they deploy a new model on the road, for example, the Embark team measures its performance offline and ensures that it meets their safety criteria. Prior to Databricks, evaluating each change took days. Databricks has allowed Embark to cut this offline analysis to just minutes, powering significantly faster and safer iteration.
“Building a self-driving truck has never been done before,” added Jason. “It requires a huge amount of creativity, problem-solving, and intuition, and without Databricks we simply wouldn’t be able to do this.”
Technical Talk at Spark + AI Summit EU 2019