Meet the Data Team
As one of the world’s largest digital libraries, Scribd is on a mission to change the way the world reads. Scribd leverages data and AI to uncover interesting ways to motivate its users to read and explore more.
Behind the magic of Scribd’s recommendations is a visionary Data Team. With unusual foresight, they knew the only way forward for their data organization was to unify, democratize and make a full move to the cloud. It may have taken an imminent breaking point, but the Data Team (Tyler, platform owner; Amir, data scientist; Alex, data architect and engineer; and Stas, data engineer) was ready to usher the company into the future.
As Scribd’s digital library grew to 60 million titles, operational costs skyrocketed. Data storage grew complex. The entire business relied on data-backed insights to deliver a delightful reading experience to customers. The Data Team worked as fast as they could to keep up with the greater demand.
“We basically had a multi-cloud, multi-vendor big data setup,” says Alex, citing it as the source of many of the team’s pains.
It’s typical for data teams to be siloed, with bottlenecks in their workflows and heavy dependencies on one another to get work done. Team members often rely on each other just to access information, yet find it extremely difficult to collaborate or even look at the same piece of code at the same time. Roles blur, too: data scientists often have to moonlight as data engineers to get data into the right shape for their own experiments.
Just after Amir’s first day at Scribd, Tyler and Amir had a conversation that would change the course of the company. Since Scribd’s data was stored in several different locations, collaboration came at a high cost of time and effort.
But Tyler and Amir envisioned a team that worked better together, more independently and more collaboratively, in the cloud, with access to data for all.
Tyler thought back on the conversation that changed it all.
“I think it’s very rare to find someone in a data science role who has intuition about what good infrastructure should be and can be. Since Amir started, I had a champion for Databricks in data science saying we need something a lot better than we have.
“I left the meeting thinking, ‘Yes. He gets it. We’re going to build it together.’ And it happened.”
Scribd was able to call upon Databricks to unify and operationalize their data, thereby democratizing the data for the entire team. They immediately saw the benefits of simplifying the management of their data analytics workflows, significantly improving development velocity and cross-team collaboration. The machine learning lifecycle, which used to be a highly manual and error-prone process, is now automated and time-saving. Operational costs dropped by 30%–50%.
Having come this far on the strength of their vision, the team is not letting up now. They see a future in which data is available as streams, with a freshness of 10 seconds instead of 24 hours, at a scale they never imagined before Databricks. “It’ll be a game changer for what we can do analytics-wise,” Stas said.
They also see a future built on machine learning models. Databricks MLflow has allowed the Scribd Data Team not only to develop machine learning models but also to package, wrap, prepare and version them, then hand them off to the engineering team to serve and deploy. This process has recovered the time that used to be wasted in the handoff, where data scientists would traditionally give research code to the engineering team, who would then redo all of it.
Most importantly, these advancements will change the game for the way the world reads. The Data Team’s vision of moving to the cloud, where they could be more open, collaborative and dynamic, has been instrumental in enabling one of the world’s largest digital libraries to deliver a personalized reading experience to its customers.
Check out Scribd’s job openings and apply to join their killer Data Team.