Apache Spark Earns Datanami Awards for Machine Learning, Real-time Analytics, and More
Today, the Datanami Readers’ and Editors’ Choice Awards recognized the sweeping changes Apache Spark is bringing to the Big Data landscape with four awards:
- Readers' Choice – Best Big Data Product or Technology: Machine Learning
- Readers' Choice – Best Big Data Product or Technology: Real-Time Analytics
- Readers' and Editors' Choice – Top 5 Open Source Projects to Watch
- Readers' Choice – Best Big Data Startup: Databricks
Determined through a nomination and voting process with input from the global Big Data Community and Datanami editors, the awards highlight key trends, shine a spotlight on technical breakthroughs, and capture a critical cross section of the state of the industry.
While at UC Berkeley, Databricks Chief Technologist Matei Zaharia created Spark to unify different types of workloads under a fast and flexible engine. He believed that solving Big Data problems needed a simple way to merge a multitude of analytical and data processing techniques under one platform.
These awards validate our recent efforts with the Spark community. We’ve focused on simplifying Spark Streaming by creating a unified interface across all components, including the MLlib machine learning library, so that users can build end-to-end continuous applications incrementally. We see a need among users to combine streaming with online machine learning, bringing real-time training, periodic batch training, and prediction serving together behind the same unified API.
Our team is working hard with the community toward the next Apache Spark release, with many more innovations in machine learning and real-time analytics planned. Over the coming months, we will share more details about the projects we have been working on here on the Databricks blog, so stay tuned.
Meanwhile, you can try Apache Spark 2.0, which lays the foundation for Structured Streaming APIs and new DataFrame-based APIs for model persistence and machine learning pipelines in MLlib.