Virgin Hyperloop One is the leader in realizing a Hyperloop mass transportation system (VHOMTS), which will bring cities and people closer together than ever before while reducing pollution, greenhouse gas emissions, and transit times. To build a safe and user-friendly Hyperloop, we need to answer key technical and business questions, including: – ‘What is the maximum safe speed the Hyperloop can go?’ – ‘How many pods (the vehicles that carry people) do we need to fulfill a given demand?’ These questions need to be answered accurately to convince regulators, operators, and governments so that we may realize our ambitious goals within years instead of decades. To answer them, we have built a large-scale, configurable simulation framework that takes a diverse set of configurations, such as route information, demand and population data, and pod performance parameters. How do we reduce time-to-insight so we can iterate on Hyperloop models faster? We have developed a generic execution and analytics framework around our core system simulation to achieve the key objectives of scale, concurrency, and speed. In this presentation, we will discuss the design of this framework, the challenges encountered, and how these challenges were addressed.
We will showcase the following points in detail:
– Hi, my name is Patryk, and together with Sandhya, we work at Virgin Hyperloop One. Today we will present how we managed to scale our simulations and analytics for the purpose of Generative Hyperloop Design. So let’s jump into the agenda of our presentation. We will first tell you what we are doing at Virgin Hyperloop One and what our subgroup, Machine Intelligence and Analytics, is doing. We will then show you an example of our simulation, and the design challenges of building a data pipeline for running hundreds of these simulations on the cloud. We’ll show you the tools we use, with short demos of each, followed by a demo of a sample simulation run with our pipeline. Then we’ll quickly jump into one of the conclusions we reached using the pipeline, which is related to demand modeling.
So first, Virgin Hyperloop One is a startup with around 250 employees in Downtown Los Angeles. We are building a new transportation system based on a vacuum tube and small passenger or cargo vehicles that we call pods. It offers zero direct emissions because we use electromagnetic levitation and propulsion, and because of that, we can provide very short travel times. But not only that: we are also on-demand. So imagine ordering your Hyperloop ride using the web or a phone app. We’ve run hundreds of tests at scale, we have a 500-meter test track in the Nevada desert, and we are constantly planning and building new tests and enhancements. So please follow us to see how Hyperloop is becoming a reality and learn about the future of transportation we are building for you.
So with this new transportation system, people are asking questions about safety, cost, real travel times, and many other things. Our group, Machine Intelligence and Analytics, does operational research for Hyperloop. We provide analytics products for engineers and business teams, so that they can show governments, partners, and investors that Hyperloop is the solution they’re looking for. For example, we compare Hyperloop with high-speed rail, airplanes, and other modes of transport. We answer all these questions using simulation and internal and external data. For example, we can answer: what is the optimal vehicle capacity for a given geography? And how many passengers can we realistically handle in a given scenario? And again, we answer all these questions using data.
So a sample data flow would be: gather demand, that is, where people want to travel, for a given geography, then fit route alignments to the land using our 3D alignment optimizer and geospatial data. Then we schedule trips for the passengers using the demand and the Hyperloop alignments, and run the simulation. After the simulation, we compute performance and cost metrics, and usually we discover something that could be improved. For example, maybe we can sweep some more parameters. So over time, we run more and more simulations for more and more parameter sweeps, and after a few weeks of running we have a bunch of answers to the questions asked. So, I mentioned running the simulation. We call our simulation software SimLoop, and it’s an agent-based, high-fidelity transportation system simulation that we developed in-house. It can simulate the Hyperloop alignment; the pod, which is the vehicle; the physical behavior of the pod; the control systems’ interactions with the outside world; as well as passengers and the Hyperloop stations that we call portals. So let’s take a short tour of our simulation software, after which Sandhya will tell you more about the pipeline. (upbeat music) – Hi, I’m Sandhya. We just saw the Hyperloop simulation video; now I’ll walk you through the flow of the Hyperloop simulation. We first start by capturing the demand and ridership patterns of the existing routes under consideration, and then we use ML models to predict the future demand. We send that demand data into a trip scheduler, which then does the pod allocation and departure scheduling for us. We also have a track optimizer, which generates possible optimized alignments for the route configuration based on the geography and cost. We have additional sweep parameters as well, for example whether ride-sharing is allowed.
And those are all fed into the simulation software, which then runs the simulations for us, as we just saw. The simulation results are then analyzed to give us metrics that answer our questions. So imagine doing this again and again for different sets of parameters, different sets of alignments and files. We have to make sure that our analytics platform can handle this volume and is robust enough to run this again and again.
So why do we need to be flexible, and why do we need to be fast? We need accurate answers, and we need them as soon as possible. We also have data volume changes, depending on the different alignments and use cases. We are getting data from multiple engineering teams, in different file formats and via different processes, so we have multiple data sources. When we have to add new applications to our system, say a pricing-model prediction, it has to be seamless. And being a one-of-a-kind technology, we have to keep up with the latest technologies and tools, and we might need to migrate between platforms. For all these reasons, we need our platform to be flexible and fast.
Setting up such a robust and dynamic platform is not something new. A lot of big companies, like Google and Uber, have already shown us the way. Specifically, they adopt cloud technologies, they use a microservice architecture for flexibility, and they use distributed computing to be fast enough. They are also auto-scalable to meet large demands, and they use the Lambda/Kappa architectures to be as dynamic as possible. In implementing those architectural guidelines, we had a lot of challenges as a startup. The first challenge is resource constraints, specifically around development and maintenance. We also have limited DevOps support for such a platform. And in addition, being a one-of-a-kind technology, we need to have security implementations in place.
And lastly, we need traceability and governance in our platform, so that the quality of our product is ensured, because we are directly carrying passengers. So this is our analytics platform tech stack. We have a couple of tools, primarily open-source tools, which make up our stack. NiFi is our data flow manager, which helps us a lot with flexibility and the microservice architecture. Parquet is a columnar data format, which we use for our large and growing volume of data. Spark is our compute engine; it is open source, but we use the Databricks platform to ease the DevOps and other limitations we had. And we use MLflow for our analysis. In addition to all these, we also use other cloud-based technologies like Docker, Amazon EC2, S3, and Kubernetes. Together these make a very flexible and stable platform for us.
So this is the overall architecture of our analytics platform. On the left-hand side, we have a number of files which serve as inputs, for example the velocity profiles, the portal models, and the vehicle models. The information about, and relationships between, these files are persisted in our database. We also have a configuration application, which our users use primarily for picking and choosing the different files for a simulation run. Once they hit Run, NiFi takes control: it does some transformations, cleanses the files, and runs the simulations on our EC2 instances. Once the simulations are completed, control comes back to NiFi and is then handed to Spark for the analysis jobs. Once the analysis jobs are complete, the results are stored as experiments in MLflow. We use MLflow to analyze the different batch-run results, compare them, and see which questions we have answered, and from that we produce a report. So this is the overall flow.
So now my colleague Patryk will explain how we use Spark and Databricks in our platform. – So, let me tell you our Hyperloop data story. About a year ago, our data sizes started to grow from megabytes to gigabytes, and our processing times started to go from minutes to hours. Our Python scripts, especially pandas scripts, were running out of memory, so we decided we needed a more enterprise-grade, scalable approach to handling our data. We tried different solutions, and we found that Spark and its family is the de-facto standard solution, and it’s great, with so many connectors and tools around it.
So we decided we would have a bunch of Spark workers and drivers, a Hadoop file system, and S3 mounted in the file system. This is already a big infrastructure, and we knew we didn’t have enough DevOps to manage it. This is why we are using Databricks. But then we also realized that our previous code was written in pandas, and to use it with Spark, we would have to use either PySpark or Scala. We, and particularly me, since I would have to do it, weren’t particularly happy about that, since we would have to rewrite thousands of lines of pandas code, and we weren’t experts in PySpark or Scala.
And this is where the Koalas package comes in, exactly when we needed it. More or less one year ago, around April, Databricks open-sourced the Koalas package, which does exactly what we needed: it translates the pandas API into PySpark, so we can scale our pandas code very easily with the familiar pandas API.
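To make the "drop-in" idea concrete, here is a minimal sketch (the DataFrame and column names are made up for illustration). The chain below is plain pandas; on a Spark cluster, Koalas mirrors the pandas API closely enough that swapping the import is often the only change needed.

```python
# Plain pandas today; under Koalas the same chain runs distributed by
# swapping the import for:  import databricks.koalas as pd
# (hypothetical example data; column names are illustrative only)
import pandas as pd

trips = pd.DataFrame({
    "origin": ["LA", "LA", "SF"],
    "destination": ["SF", "SF", "LA"],
    "trip_time_min": [43.0, 47.0, 45.0],
})

# The groupby/agg chain is unchanged between pandas and Koalas.
mean_times = (
    trips.groupby(["origin", "destination"])["trip_time_min"]
    .mean()
    .reset_index()
)
print(mean_times)
```

This is why no rewrite of the existing pandas code base was needed: the code stays pandas-shaped, and only the backing engine changes.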
So now let’s talk about another part of our system which is called MLflow. We use it to log, track and analyze our simulation runs and results.
So for those of you who don’t know, MLflow is an open-source tool that allows you to log machine learning models, their parameters, and metrics. For example, using this UI, if you run a bunch of models and log them, you can see the parameters and metrics for all of them and then select the best model.
Then you can deploy it to production using MLflow Projects and Models, shown on the top right of the slide. But in this presentation, we are particularly interested in MLflow Tracking, so let’s talk about it more. We found that MLflow Tracking serves its purpose very well beyond machine learning. Our simulation runs also expect a lot of input parameters that we can sweep, and they have a lot of outputs that we score to create numerical metrics. So instead of developing our own solution to track and analyze simulation runs, we use MLflow Tracking as a generic experiment logging and visualization tool. We just treat every simulation as an experiment and log it as such. We find it super convenient and very cost-effective; we saved a lot of time by not having to develop a simulation tracking tool ourselves. It also integrates very well with external tools through its APIs. For example, there is an API to query and export experiments to a pandas DataFrame, which we use a lot. So MLflow serves this purpose very well for tracking our simulations, and since our simulations actually contain a lot of AI algorithms, instead of MLflow we actually call it AIflow.
Now, I will let my colleague Sandhya explain NiFi, our data integration and pipeline tool.
– So, Apache NiFi. NiFi is a powerful data flow manager, and for any data-driven analytics, you need a very good data flow. NiFi is data agnostic, supports transformations and data routing, and has 250-plus built-in components. It is also extensible with MiNiFi and the NiFi Registry. So for example, say we want to merge two JSON files that match the same file pattern, perform transformations, and put them in S3. First you choose the GetFile component and give it the parameters for the file pattern you want to pick up. Then you can use the MergeContent component to give the header and footer information and specify how you want to merge the two files. On the connector between two components, you can configure prioritization and buffering mechanisms. In addition, for each component you can set up scheduling. You can use a transformation component for the transformations, add the S3 attributes, and use the PutS3Object component with AWS credentials configured as a controller service. Now, if I want to run this, I can pick and choose whichever component I want to run. Here I have run just the first component, and I can see the files that were picked up queued in the flow, and I can inspect the flow file content and attributes for each file. Once I confirm, I can run the remaining part of the flow as well. After the flow has completed, I can check the provenance to see whether the file has been put in S3, whether it was successful, and what the content of the file was. So this is the complete flow.
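Conceptually, the merge step above does something like the following plain-Python sketch: pick up JSON files matching a pattern, join their bodies with a header and footer, and hand the result to an uploader (stubbed out here; in the real flow, NiFi's PutS3Object does the upload). File names and patterns are illustrative only.

```python
# A plain-Python sketch of NiFi's GetFile -> MergeContent step.
import glob
import json
import os
import tempfile

def merge_json_files(pattern: str, header: str = "[", footer: str = "]") -> str:
    """Merge the bodies of all files matching `pattern` into one JSON array."""
    bodies = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            bodies.append(f.read().strip())
    return header + ",".join(bodies) + footer

# Demo with two throwaway files (paths/contents are made up).
tmp = tempfile.mkdtemp()
for i, body in enumerate(['{"origin": "LA"}', '{"origin": "SF"}']):
    with open(os.path.join(tmp, f"demand_{i}.json"), "w") as f:
        f.write(body)

merged = merge_json_files(os.path.join(tmp, "demand_*.json"))
print(json.loads(merged))  # two records merged into one JSON array
```

The value of doing this in NiFi rather than in code is everything around the transformation: scheduling, back-pressure, retries, and the provenance trail described below.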
Apache NiFi is also good at real-time command and control. For example, you can set up concurrency control for a particular component: whether the tasks should run concurrently, how concurrent they should be, whether execution should run on all nodes, and so on, and the schedule can be adjusted as well. You can also set up queue prioritization, such as first-in-first-out, and buffering. In addition, you can add controller services for external connectors, like Postgres or AWS, and on top of all this, NiFi gives you the power to set up good data security using SSO or SSL.
The most important feature of NiFi is the power of provenance. It gives you built-in provenance, and we need provenance and lineage for efficient traceability in our analytics platform. This screenshot shows a particular component’s provenance event, with the details of that event. On the right-hand side, you can see a tree-like icon; when you click it, you can see the complete lineage of that provenance event. Each circle here is a component’s provenance data, and once you click any one of the circles, you can see a detailed description of what the event was, what the flow file was, its byte size, and the attributes of that particular flow file. In addition, you can see the content, that is, the input or output of that flow file or connection, and you can replay the provenance event if you need to check whether anything is going wrong. This gives us good traceability.
We just saw all the different components used in our pipeline, so now we would like to show you a small demo of the complete pipeline. We will first show you the configurator application, which a user uses to pick and choose the files. Then we will show how control is transferred to NiFi, how the simulations are executed in parallel, and how we use MLflow to analyze the results.
So first we start with the configurator application, where the user picks and chooses the files. As you can see, this is the configurator window. You can use the Add button to add a particular batch and pick the alignment for that batch. Based on the executables available, you can select whichever executables you want to run. When you hit the dependency button, you can see the dependencies between the executables and which files they need. When you accept that, you can pick and choose among the different files available for those file sets. On top of the tabs you can see some numbers coming up; those are the numbers of different scenarios that are going to be executed as a result of the file choices you are making. So if you choose different sets of files, the combinations mean more scenarios. You can also sweep different parameters, and the more parameters you choose, the more those contribute to the combinations of scenarios. When you hit Run, you see the total number of scenarios calculated, and then you see that the batch has been successfully created. Once this batch has been created, control then transfers to NiFi.
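The scenario counts in the tabs come from the Cartesian product of the chosen files and swept parameters. A minimal sketch of that counting (the file and parameter names are made up for illustration):

```python
# Each choice axis multiplies the number of scenarios to simulate.
from itertools import product

choices = {
    "alignment": ["route_a.json", "route_b.json"],     # 2 options
    "demand_model": ["baseline.parquet"],              # 1 option
    "pod_capacity": [16, 28],                          # 2 options
    "headway_s": [30, 60, 120],                        # 3 options
}

# One dict per scenario, covering every combination of the choices.
scenarios = [dict(zip(choices, combo)) for combo in product(*choices.values())]
print(len(scenarios))  # 2 * 1 * 2 * 3 = 12 scenarios
```

This is also why the counts grow so quickly as more files and sweep parameters are added, and why the pipeline has to run so many simulations per batch.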
Now we will see how NiFi picks up control seamlessly after the batch is created, and how the flow progresses. This is a sample flow. Once NiFi takes control, you can see the data flowing through, and you can see the provenance of the data coming in. You can monitor and view the content of the data; for example, this is a content view. You can also replay the content in order to check for any bugs. And as you can see in this screen, there are multiple simulations running on EC2, all kicked off by NiFi. Once they are kicked off, you can check what led to a particular execution by looking at the details of that flow file: its attributes and its content. And the best part is the provenance lineage: you can see where a specific command came from, which files were needed for that command to execute, and how the simulation was executed.
So this is basically how the flow works.
So once the simulations are executed and our analysis metrics job has run on Spark, all the results are saved in MLflow as experiments, and Patryk will explain that part. – So thanks, Sandhya. Now I’ll talk about MLflow, our metrics listing and analytics platform. As you can see here, this is the MLflow window, and every row is a simulation run. For every simulation run, we have some metrics and parameters, and if we click on a row, we can drill down into a single simulation run. There we again see the parameters and metrics for that simulation. But the cool part is that for every simulation we compute a bunch of artifacts. For this simulation, for example, I created a demand scatterplot, and it’s interactive here in the window; you can explore and interact with the Plotly plot. I also created a histogram of the trip times, so here, for example, we can see that trip times differ for passengers traveling between different cities.
But then, what if you want to compare a bunch of experiments? To compare a bunch of simulations, we select all of them and click Compare. Again, we get a big table where each column is a simulation run and we see parameters and metrics. But here we also have a way of exploring that data: if we select a sample parameter and sample metrics, we can see which simulation is the best, and we can also see contour plots, like 3D plots.
So you can see two parameters against one metric, and here we can see that one of these two simulations performed the best, so that is where our further exploration will go.
So I hope you enjoyed our pipeline run for our simulation. Now I would like to share one of the conclusions we drew using this pipeline. As I told you before, Hyperloop is on-demand, and because it’s on-demand, we found that we can provide increased efficiency and passenger convenience. But there is one particular challenge with the on-demand nature of Hyperloop that I would like to discuss and analyze with you.
So the challenge is: we would like to predict areas of high or low demand so that we can redistribute our vehicles ahead of time, from the areas of low demand to the areas of high demand. Imagine a Hyperloop network like in the picture on the bottom left. If there’s high demand in the center, we would like to redistribute the vehicles so that they are able to pick up passengers right when they need to. For that purpose, we feed a demand prediction model into our simulation. We train these models using historical demand data we gathered for our analysis; we trained multiple models using Keras LSTMs and GRUs, and we actually trained them using Horovod on Spark. We also ran ARIMA and Prophet models and swept the input parameters to these models using Spark UDFs as distributed sweeps.
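To illustrate the sweep mechanics, here is a toy version of the kind of grid search we distribute with Spark UDFs, shown serially and with a seasonal-naive forecaster standing in for ARIMA/Prophet (all data, parameter names, and the scoring function are made up for illustration):

```python
# Toy parameter sweep over a forecaster's "season length" hyperparameter.
# With Spark, each grid point becomes one UDF call instead of a loop step.

demand = [100, 120, 90, 100, 125, 95] * 4  # made-up hourly demand, period 6

def seasonal_naive_mse(series, season, horizon):
    """Score: predict each held-out point as the value one season earlier."""
    errors = [
        (series[t] - series[t - season]) ** 2
        for t in range(len(series) - horizon, len(series))
    ]
    return sum(errors) / len(errors)

grid = [3, 6, 12]  # candidate season lengths to sweep
scores = {s: seasonal_naive_mse(demand, s, horizon=6) for s in grid}
best_season = min(scores, key=scores.get)
print(best_season, scores[best_season])
```

The real sweeps work the same way in spirit: a grid of model parameters, one score per grid point, and the best-scoring model is the one fed into the simulation.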
This is a rough example of one of our models. The blue line shows the ground truth for the demand for a given origin-destination pair. The model prediction is shown with a green line, and the prediction history is shown with a red line. This model is trying to predict hourly demand three days in advance, for a single origin-destination pair.
As a conclusion from that analysis, the best model we fed into our simulations using the pipeline was able to improve the number of required vehicles, and the costs, by up to 70%, and this is a really great finding. We also found that simpler models, like ARIMA or Prophet, sometimes outperform Keras models. Additionally, to improve the models, we correlated weather data and surrounding-events data from PredictHQ with the transportation demand, and we definitely found correlations that provided a great improvement to our demand models. As a general conclusion, I hope you learned a lot about our Hyperloop project, and I hope you can see we are doing really cool stuff. And by the way, we are still hiring, so please check out our careers page. Our story with Databricks is an amazing partnership, but also a very lucky coincidence with Koalas; it’s a great company, and we love using their tools. We also showed you the design of our system that runs hundreds of data-heavy experiments, and how we achieved it with minimal development effort: we actually built this platform with two to three people in six months. Here I’d like to give a shout-out to our front-end engineer Justin; he is great and really helped us figure all this out. It was actually the integration between the tools that took the most time, and the tools that did the job for us were NiFi, Spark, MLflow, and Parquet.
If you want to learn more about how Hyperloop is reinventing transportation, feel free to reach us on Twitter.
Virgin Hyperloop One
Patryk is a Data Engineer at Virgin Hyperloop One, a company building the 5th mode of transportation. He graduated from EPFL (the Swiss Federal Institute of Technology in Lausanne) with a major in Information Technology. Previously, he worked at CERN, where he wrote test software for the world's biggest particle accelerator, as well as at National Instruments and Samsung R&D. When he isn't glued to a computer screen, he spends time road-tripping California with his friends.
Sandhya Raghavan is a Senior Data Engineer at Virgin Hyperloop One, where she helps build the data analytics platform for the organization. She has 13 years of experience working with leading organizations to build scalable data architectures, integrating relational and big data technologies. She also has experience implementing large-scale, distributed machine learning algorithms. Sandhya holds a bachelor's degree in Computer Science from Anna University, India. When Sandhya is not building data pipelines, you can see her traveling the world with her family or pedaling a bike.