iPass and Databricks

iPass is the world’s largest Wi-Fi provider, yet we don’t own a single hotspot. You can think of us as the Uber of Wi-Fi. And though it may sound simple, it is actually quite complicated. That is because integrating more than 160 commercial Wi-Fi providers from around the world into a single network is difficult. Not only do hotspots require constant monitoring to ensure a consistent user experience, but as a technology, Wi-Fi is also more fragile than most people think. You can expect a consistent user experience from your home Wi-Fi network, but commercial-grade hotspots do not share the same characteristics.

Signal strength at home is usually good. Bandwidth is in the tens of megabits, with only a handful of devices usually attaching to the home network. Single access point architecture prevails, and there is no provisioning requirement beyond the access key.

Public and commercial hotspots are usually farther away from the user. Their signal strength is worse, and their bandwidth is more limited and shared across many devices. Often, the hotspots themselves break and need to be replaced, or at least regularly maintained. Add community hotspots to the mix, and this Wi-Fi ecosystem becomes fairly unmanageable.

Although Wi-Fi can be unpredictable, here at iPass, we had a big idea. What if we could monitor every single, solitary hotspot in the world, measuring speed, availability, performance and location, so that we could recommend which hotspot a device should connect to and which one it should not? Note that I'm saying a device, not a user. Think about cellular networks for a second: you as the user do not decide which cell tower to attach to; the device does. In terms of the user experience, it just happens.

Luckily for us, our years of connectivity experience and our own cutting-edge technology came together. iPass industry knowledge and experience met “big data” and our Wi-Fi service platform, iPass SmartConnect™, was born.

With iPass SmartConnect, we aimed to keep our users always best connected, not only to Wi-Fi but to the best available network, Wi-Fi or cellular. That was the company’s founding mission, after all. Our backend measures and analyzes connectivity and quality data to ensure a device is connected to the best network every time. Our iPass SmartConnect SDK and iPass SmartConnect backend can make the same recommendation for IoT devices, ensuring they are also always best connected.
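To make the idea of a backend recommendation concrete, here is a minimal Python sketch of how per-hotspot measurements might be turned into a "connect" or "stay on cellular" decision. It is an illustration, not the actual iPass SmartConnect logic: the `HotspotStats` fields, the weights, the speed cap and the threshold are all assumptions picked for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HotspotStats:
    """Aggregated measurements for one hotspot (hypothetical schema)."""
    hotspot_id: str
    success_rate: float  # fraction of connection attempts that succeeded, 0..1
    median_mbps: float   # median measured throughput in Mbit/s


def score(stats: HotspotStats, speed_cap_mbps: float = 20.0) -> float:
    """Blend availability and speed into one number in [0, 1].

    The 0.6 / 0.4 weights and the speed cap are illustrative assumptions.
    """
    speed = min(stats.median_mbps, speed_cap_mbps) / speed_cap_mbps
    return 0.6 * stats.success_rate + 0.4 * speed


def recommend(candidates: List[HotspotStats],
              min_score: float = 0.5) -> Optional[str]:
    """Return the id of the best hotspot, or None to stay on cellular."""
    if not candidates:
        return None
    best = max(candidates, key=score)
    return best.hotspot_id if score(best) >= min_score else None
```

With this shape, the decision stays on the device side of the API: the device passes in the hotspots it can see and gets back either one id or `None`, which mirrors how a phone picks a cell tower without asking the user.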

First off, we needed to group and categorize individual access points. For example, hotels can have many access points that share common infrastructure. So, instead of learning about every hotspot separately, we wanted to group them into logical “venues” and monitor them as an entity.

Easier said than done, though, especially if you don't actually know where the venue is or how many hotspots there are at any given location.
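One simple way to approximate venues, when all you have is what devices report, is to link access points that broadcast the same network name and sit within a short distance of each other. The sketch below does that with a small union-find. The tuple layout, the `radius_m` default and the "same SSID" rule are assumptions for illustration, and real data would need handling for missing or noisy locations.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (bssid, ssid, latitude, longitude): a hypothetical record layout
AccessPoint = Tuple[str, str, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def group_into_venues(aps: List[AccessPoint],
                      radius_m: float = 150.0) -> List[List[str]]:
    """Group BSSIDs that share an SSID and lie within radius_m of each other."""
    parent: Dict[str, str] = {ap[0]: ap[0] for ap in aps}

    def find(x: str) -> str:
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    by_ssid: Dict[str, List[AccessPoint]] = defaultdict(list)
    for ap in aps:
        by_ssid[ap[1]].append(ap)

    # Pairwise comparison inside each SSID bucket: fine for a sketch,
    # too slow for very large buckets.
    for bucket in by_ssid.values():
        for i, a in enumerate(bucket):
            for b in bucket[i + 1:]:
                if haversine_m(a[2], a[3], b[2], b[3]) <= radius_m:
                    parent[find(a[0])] = find(b[0])

    venues: Dict[str, List[str]] = defaultdict(list)
    for bssid in parent:
        venues[find(bssid)].append(bssid)
    return sorted(sorted(v) for v in venues.values())
```

Because the links are transitive, a long hotel corridor of access points ends up as one venue even when the two ends are farther apart than `radius_m`.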

Moreover, from experience, we knew that many connection failures are the result of a user simply walking away from an access point. Here in the Bay Area, for instance, you’ll find access points at virtually every traffic light. Once you connect to one of them, your device will remember it and try to connect you every time you are nearby.

So here you are, driving along, listening to Pandora or talking on Skype, while your phone decides to jump onto a hotspot. But in the meantime, you pull away. The Wi-Fi connection will fail, and the cellular interface will need to renegotiate its data connection.

Nowadays, your device knows when you’re on the move, because it has an accelerometer. But your device can’t tell whether the hotspot is moving as well. If iPass SmartConnect can detect motion patterns and tag hotspots as moving, we can recommend hotspots that are moving with you, like hotspots on a plane or train.
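A rough way to picture the tagging step: a fixed hotspot is only ever seen within Wi-Fi range of one spot, so if devices report the same access point at places that are far apart, it is probably riding on something. The Python sketch below applies that rule. It is a simplification under stated assumptions: the `(bssid, lat, lon)` sighting format and the 1 km `max_spread_m` threshold are hypothetical, and it ignores GPS error and reused hardware addresses.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# (bssid, latitude, longitude) as reported by a device: a hypothetical format
Sighting = Tuple[str, float, float]


def distance_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle (haversine) distance in meters between two (lat, lon) points."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dphi = p2 - p1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def tag_moving(sightings: Iterable[Sighting],
               max_spread_m: float = 1000.0) -> Dict[str, bool]:
    """Map each BSSID to True if any two of its sightings are too far apart."""
    points: Dict[str, List[Tuple[float, float]]] = {}
    for bssid, lat, lon in sightings:
        points.setdefault(bssid, []).append((lat, lon))

    # A hotspot with a single sighting has no pairs, so it is tagged False.
    return {
        bssid: any(distance_m(p, q) > max_spread_m
                   for p, q in combinations(locs, 2))
        for bssid, locs in points.items()
    }
```

A hotspot tagged `True` here is not automatically a bad choice. Combined with the device's own motion, it is exactly the one you want on a train, and the one to skip when you are standing on the platform.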

That was our plan, technically challenging and innovative. And in early 2016, iPass asked me to relocate to headquarters in Redwood Shores, California to lead this exciting project.

We had a ton of technical debt and the company was not architected with big data in mind. We were a hardcore RDBMS shop.

The team started small, just an engineer and myself. So in the spirit of a true startup, we rolled up our sleeves and got to work.

We had some previous experience playing with Kafka, Storm, Hadoop and HBase, but being a team of two, we did not have the time to build an in-house big data platform.

It was obvious to us that in order to build and grow our platform, we had to move out of our self-managed data center and into the cloud. So in order to allow for rapid development and on-demand scaling, I started vetting cloud big data vendors.

At that time, iPass was already using AWS for newly released products and services. Being small and smart, we did not see a lot of value in managing two “clouds.”

Platform-wise, one solution stood out above all the others: Apache Spark. It can batch, it can stream, and it is the most actively developed of the options we looked at. The offices of Databricks, the company founded by Spark's creators, are also close by, which comes in handy. And in our view, it leads the way in efficiency among big data technologies.

So Apache Spark it was.

A new wave of business meant we needed to turn iPass SmartConnect from concept to reality, and quickly. We needed to start developing right away.

We could not afford to build and maintain Spark clusters, or manage EMR. Instead, we needed to focus on writing business logic. So we scheduled our first call with Databricks.

The Databricks platform was very promising - almost too good to be true. Full separation of storage and computation, easy to use, high-level API, each job creates its own cluster, no more library dependency conflicts, various cluster sizes and instance types, all Spark versions, web-based development and a fantastic team of very smart people to help us out.

We signed on the dotted line. And shortly thereafter, we got our Databricks instance up and running.

It took us a few weeks to write the first iteration of the tracking and ranking job. iPass SmartConnect was launched.

Since then, we’ve developed all of the features mentioned above. Some of them are in beta, but the speed of development has been like nothing we had experienced or managed before.

Apart from iPass SmartConnect, we also use Databricks for ETLs, streaming, analytics, machine learning, data lake processing and reporting. It has become a central point of our entire data flow.

The team has grown, and we are now delivering new data products in very agile, short sprint cycles. We have accomplished all of this with a small team. And we no longer worry about hardware, maintenance, scalability or (lack of) resource challenges, like maintaining and expanding old infrastructure in our own datacenter, not to mention hardware and licensing costs.

Our success story would not be complete if I did not mention the tremendous help we received from the engineering team at Databricks. We have been using Spark in a fully mature production environment, and the team’s response was truly incredible. We are very happy to have the brainpower and dedication of Databricks’ engineers on our side.

So what have we learned? Simply this. If you have a vision and need to get to market quickly, do not try to reinvent the wheel by building everything yourself. Instead, focus on business logic and the delivery of sellable and scalable products. And let the Spark experts at Databricks do what they do best.