We are happy to announce that Elsevier Labs has deployed Databricks as its unified content analysis platform, providing significant productivity gains for the entire team and reducing typical project lengths from weeks to just days.

Elsevier Labs is the advanced R&D group within Elsevier, a global provider of scientific information that publishes over 2,500 journals and 33,000 book titles and builds web-based information solutions for professionals in science, technology, and medicine.

The team needed a fast and scalable analytics platform to develop new methods for extracting insights from published content. Their development process frequently required applying complex natural language processing (NLP) algorithms to millions of articles and then interpreting the results. Prior to Databricks, Elsevier Labs’ productivity was severely hampered because:

  • There was substantial manual data movement during the analytics workflow
  • The steep learning curve of their legacy analytics platform prevented code reuse
  • Presenting findings required significant additional time to build reports and UIs

Databricks enabled Elsevier Labs to effortlessly manage Apache Spark clusters, access their data, collaboratively develop cutting-edge algorithms, and present their findings in a single platform. With the Databricks integrated workspace, the Elsevier Labs team was able to:

  • Create, scale, and terminate Spark clusters without relying on specialists in big data DevOps.
  • Directly access data in S3 buckets and collaboratively perform analysis in a notebook environment, using Python, Scala, SQL, or even R.
  • Present findings to senior management and share results across the entire organization with account-based access control of notebooks.

As a result of deploying Databricks, Elsevier Labs enabled five times as many people to contribute to content analysis algorithm development, growing the number of contributors from 3 to 15. Moreover, the people who use Databricks are significantly more productive, reducing typical project lengths from weeks to just days.

To try out Databricks for yourself, sign up for a 14-day free trial today!
