customer story
Designing the 5th mode of transportation with data

INDUSTRY: Transportation

SOLUTION: Autonomous transportation safety and efficiency

TECHNICAL USE CASE: Data ingest & ETL, Machine learning

Virgin Hyperloop is pushing the boundaries of rapid mass transportation with the goal of making high-speed travel more accessible than ever before, while reducing pollution, greenhouse gas emissions, and transit times. To start, they needed to answer key technical and business questions around safety, cost, operational routes, and passenger demand, and to answer them accurately enough to earn buy-in from regulators, operators, and governments. With a startup-sized data team and massive volumes of data, the task seemed impossible. Once they adopted Databricks, they were able to massively reduce time-to-insight and iterate faster to deliver on the promise of ultra-fast hyperloops.

Bringing a mode of transportation to life before building it

The folks at Virgin Hyperloop are working on making the ultimate transportation dream a reality: moving passengers and cargo at airline speeds, but on the ground and at a fraction of the cost. Introduced conceptually back in 2012 by Tesla and SpaceX, a successful Hyperloop system is destined to rule the future. It is highly energy efficient: electromagnetic motors provide levitation and propulsion, and the system runs on battery power, so it produces no direct emissions into the environment. The data team is the backbone driving this revolutionary project forward, with their eyes on a better, healthier future.

In order to one day build a commercially viable system, Virgin Hyperloop’s data teams must start by collecting and analyzing a large and diverse body of data, including Devloop Test Track runs, numerous test rigs, and various simulation, infrastructure, and socioeconomic data. The goal is to ensure the system meets safety standards while optimizing daily operational costs. Essentially, they’re showing how the Hyperloop system will perform even without building any hardware.

“The data team is crucial in driving forth and supporting success in product development, project development, safety, and certification,” said Jerome Wei, Sr. Director, Machine Intelligence & Analytics. “It fuels our ability to transform our insights and results into visualizations and data narratives that help decision makers, executives, our board, and prospective customers understand what we bring to the table.”

As one can imagine, the amount of data needed at this stage is not insignificant. When the volume grew from megabytes to gigabytes and processing time went from minutes to hours, the teams started losing steam. So the decision was made to take a more scalable, enterprise-grade approach to handling data. The catch? As is the case for most startups, Virgin Hyperloop had to take resource constraints into consideration, specifically around development, maintenance, and general support.

Virgin Hyperloop could have built an entire on-premises system to handle its substantial processing needs, Wei said. However, an on-premises system would not only take up a lot of space but would also sit underutilized most of the time, creating unnecessary cost.

Enabling next-generation transportation with Databricks

Virgin Hyperloop ultimately decided to use Databricks for its big data processing needs and its ability to support analytics and ML workloads. With the release of Koalas, an open source tool that lets data scientists scale their existing pandas workloads to big data, Virgin Hyperloop quickly scaled its pandas code with very few code changes, reducing its data processing time by as much as 95%. Meanwhile, MLflow was employed to track experiments and assess the outputs of simulation runs to ensure optimal safety and demand forecasting.
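
To give a feel for what "very few code changes" can look like, here is a minimal sketch, not Virgin Hyperloop's actual code. Koalas has since been folded into Apache Spark as the pandas API on Spark (pyspark.pandas), so in many cases the main change is the import line. The USE_SPARK switch, the column names, and the sample values below are all hypothetical.

```python
import os

# Hypothetical switch: set USE_SPARK=1 on a cluster with PySpark 3.2+ installed.
if os.environ.get("USE_SPARK") == "1":
    import pyspark.pandas as pd  # pandas API on Spark (formerly Koalas)
else:
    import pandas as pd  # plain single-machine pandas

# Hypothetical test-run records; the column names are illustrative only.
runs = pd.DataFrame({
    "run_id": ["a", "a", "b", "b"],
    "speed_mps": [10.0, 30.0, 20.0, 40.0],
})

# The same groupby/aggregate call is written once and works on either backend.
mean_speed = runs.groupby("run_id")["speed_mps"].mean().sort_index()
result = {run_id: float(value) for run_id, value in mean_speed.to_dict().items()}
print(result)
```

The analysis code itself stays the same on both backends; only the import decides whether it runs on one machine or is distributed across a Spark cluster.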

The Hyperloop data team uses Databricks to run scenarios that help predict passenger demand, for example by hour or day for particular origins and destinations. Through these simulations, they were able to show how Hyperloop would be operationally more cost-effective than other modes of mass transportation by using that passenger demand data to reduce the number of operating vehicles by 70%.
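
The idea of matching vehicles to demand can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the hourly demand figures, the seat count, and the trips per hour are invented, and the comparison it computes is not the analysis behind the 70% figure above.

```python
import math

def vehicles_needed(passengers_per_hour, seats_per_vehicle, trips_per_vehicle_per_hour):
    """Smallest whole number of vehicles that can carry one hour's demand."""
    hourly_capacity = seats_per_vehicle * trips_per_vehicle_per_hour
    return math.ceil(passengers_per_hour / hourly_capacity)

# Hypothetical demand forecast: hour of day -> passengers on one route.
hourly_demand = {6: 400, 8: 2000, 12: 600, 18: 1800, 23: 100}
SEATS_PER_VEHICLE = 25          # hypothetical
TRIPS_PER_VEHICLE_PER_HOUR = 2  # hypothetical

per_hour = {
    hour: vehicles_needed(demand, SEATS_PER_VEHICLE, TRIPS_PER_VEHICLE_PER_HOUR)
    for hour, demand in hourly_demand.items()
}

# Baseline: run a peak-sized fleet every hour. Alternative: run only what is needed.
peak_fleet = max(per_hour.values())
fixed_vehicle_hours = peak_fleet * len(per_hour)
on_demand_vehicle_hours = sum(per_hour.values())
saving = 1 - on_demand_vehicle_hours / fixed_vehicle_hours

print(per_hour)
print(f"vehicle-hours saved vs. fixed peak fleet: {saving:.0%}")
```

A real study would work from forecast demand for each origin and destination pair and would have to account for vehicle repositioning, headways, and safety margins, none of which are modeled here.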

“Instead of developing our own solution to track and analyze simulation runs, we use MLflow Tracking as a generic experimental logging and visualization tool,” explained Patryk Oleniuk, Data Engineer. “We just treat every simulation as an experiment and log it as such. And we find it super convenient and very cost effective. We actually saved so much time by not having to develop the simulation tracking tool ourselves.”
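
The pattern of treating every simulation as an experiment can be sketched in a few lines. This is a hedged illustration, not the team's code: log_simulation and its tracker argument are hypothetical names, while start_run, log_params, and log_metrics are standard MLflow Tracking calls.

```python
def log_simulation(name, params, metrics, tracker=None):
    """Record one simulation as a single tracked run.

    `tracker` is any object exposing MLflow's start_run / log_params /
    log_metrics calls; when omitted, the real mlflow package is used.
    """
    if tracker is None:
        import mlflow  # requires `pip install mlflow`
        tracker = mlflow
    with tracker.start_run(run_name=name):
        tracker.log_params(params)    # simulation inputs, e.g. route and fleet size
        tracker.log_metrics(metrics)  # simulation outputs, e.g. mean wait time
```

With MLflow installed, a call such as log_simulation("route-a", {"pods": 12}, {"mean_wait_s": 41.5}) would record one run that can be compared against others in the MLflow UI. The parameter and metric names in that call are invented for illustration.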

In addition to saving Virgin Hyperloop precious time and money, Databricks has also allowed them to work more closely together as a team. Notebooks provide an interactive space where data scientists, analysts, and engineers alike can easily access data, quickly build computations, run them at scale, and then share both the work and the results with each other. Team productivity has seen a significant boost as a result of the improved collaboration.

“As a small team, we don’t have the bandwidth or resources to be able to manage and develop and integrate an analytics platform. Databricks has allowed us to really leapfrog past those roadblocks and deliver value to our executives and our shareholders in a timeline that would not be possible otherwise,” Wei adds.

Faster, more efficient, and cost-effective everything

The Virgin Hyperloop team is acutely aware of how important their work is. It’s more than a nice-to-have; an emissions-free mode of rapid transportation is necessary for the survival of our planet, which makes the Databricks partnership and its ability to speed up operations mission-critical. In fact, the Hyperloop data team was able to deliver their platform with just three people in six months. Today they run hundreds of data-heavy experiments with processing time reduced by 95%, which not only accelerates innovation but also greatly reduces overall infrastructure costs. As a result, new analytics projects that used to take up to six months to bring to market now take only a few days.

“Databricks is an integral part of our team’s work because it has made it possible for us to run and analyze thousands of runs with such a quick turnaround time, and easily pull valuable data insights,” said Sandhya Raghavan, senior data engineer at Virgin Hyperloop.

They’ve also saved precious time by not having to develop any of their simulation tracking tools, and are, as a result, well on their way to realizing their ambitious goals within years instead of decades.

  • 70%
    Projected cost reduction in Hyperloop operations
  • 95%
    Reduction in data processing time
  • 10X
    Reduction in execution time; from hours to minutes


Meet the great data team that’s behind Virgin Hyperloop



“Databricks has allowed our small data team to innovate quickly, and deliver insights and proof points that help turn Hyperloop skeptics into believers.”

– Jerome Wei, senior director of machine intelligence and analytics, Virgin Hyperloop