Skip to main content

This is a guest blog from our partners at MongoDB
Bryan Reinero and Dana Groce

We are happy to announce that the MongoDB Connector for Apache Spark is now officially certified for Azure Databricks. MongoDB Atlas users can integrate Spark and MongoDB in the cloud for advanced analytics and machine learning workloads by using the MongoDB Connector for Apache Spark which is fully supported and maintained by MongoDB.

The MongoDB Connector for Apache Spark exposes all of Spark’s libraries, including Scala, Java, Python, and R. MongoDB data is materialized as DataFrames and Datasets for analysis with machine learning, graph, streaming, and SQL APIs. The MongoDB Connector for Apache Spark can take advantage of MongoDB’s aggregation pipeline and rich secondary indexes to extract, filter, and process only the range of data it needs – for example, analyzing all customers located in a specific geography. This is very different from simple NoSQL data stores that do not offer secondary indexes or in-database aggregations and require the extraction of all data based on a simple primary key, even if only a subset of that data is needed for the Spark process. This results in more processing overhead, more hardware, and longer time-to-insight for data scientists and engineers.

Additionally, MongoDB’s workload isolation makes it easy for users to efficiently process data drawn from multiple sources into a single database with zero impact on other business-critical database operations. Running Spark on MongoDB reduces operational overhead as well by greatly simplifying your architecture and increasing the speed at which analytics can be executed.

MongoDB Atlas, our on-demand, fully-managed cloud database service for MongoDB, makes it even easier to run sophisticated analytics processing by eliminating the operational overhead of managing database clusters directly. By combining Azure Databricks and MongoDB, Atlas users can make benefit of a fully managed analytics platform, freeing engineering resources to focus on their core business domain and deliver actionable insights quickly.

Try Databricks for free

Related posts

The Quest for Hidden Treasure: An Apache Spark Connector for the Riak NoSQL database

August 11, 2016 by Pavel Hardak in
View this notebook in Databricks This is a guest blog from our friends at Basho. Pavel Hardak is a director of product management...

Accelerate AI-Driven Innovation in Insurance with Databricks and MongoDB

November 8, 2023 by Marcela Granados and Jeff Needham in
Insurance companies have seen a tremendous shift in modernization. Traditionally known for the use of legacy systems, leading carriers are modernizing their infrastructure...

Burning Through Electronic Health Records in Real Time With Smolder

Check out the solution accelerator to download the notebook referred throughout this blog. In previous blogs , we looked at two separate workflows...
See all Company Blog posts