Helping businesses get insights out of their data, fast, is core to the mission of Elasticsearch. Being able to live wherever a business stores its data is critical to that mission, and Hadoop is one of the leaders in providing a way for businesses to store massive amounts of data at scale. Over the past year, we have been working hard to bring the power of our real-time search and analytics engine to the Hadoop ecosystem. Our Hadoop connector, Elasticsearch for Apache Hadoop, is compatible with the top three Hadoop distributions – Cloudera, Hortonworks and MapR – and today it has achieved another exciting milestone: Spark certification.
Spark is rapidly emerging as a popular processing and analysis tool for Hadoop-like and other data stores. We continue to see it in many of our customers' Hadoop distributions and beyond, and have been working together with Databricks as well as our respective open source communities to bring better connectivity between the two technologies. Combining Elasticsearch with Spark adds the capabilities of a full-blown search engine to Spark's unified processing engine, enhancing data discovery and exploration whether in a live, customer-facing environment or behind the scenes for internal analysis. Through the Map/Reduce support in Elasticsearch for Apache Hadoop, Spark applications can interact with Elasticsearch just as they would with an HDFS resource, allowing them to index and analyze data transparently, in real time. Our data visualization tool, Kibana, can also be used to explore massive amounts of data in Elasticsearch through easy-to-generate pie charts, bar graphs, scatter plots, histograms and more.
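To give a feel for what "just as they would with an HDFS resource" can look like in practice, here is a minimal PySpark sketch that reads an index through the connector's Map/Reduce input format. Treat it as an illustration under assumptions rather than the official recipe: the helper names `es_read_conf` and `read_from_es`, the `logs/event` resource and the `localhost:9200` address are all placeholders of ours, and the Elasticsearch for Apache Hadoop jar is assumed to be on the Spark classpath.

```python
# Sketch only: helper names, index name and host below are hypothetical.

def es_read_conf(resource, nodes="localhost:9200", query=None):
    """Build the Hadoop-style settings the connector reads."""
    conf = {"es.resource": resource, "es.nodes": nodes}
    if query is not None:
        # Optional query to restrict which documents are read
        conf["es.query"] = query
    return conf


def read_from_es(sc, resource, **kwargs):
    """Return an RDD of key/value pairs, one per matching document.

    `sc` is an existing SparkContext; the connector jar must be
    available to Spark for the class names below to resolve.
    """
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_read_conf(resource, **kwargs),
    )
```

With a running cluster, something like `read_from_es(sc, "logs/event", query="?q=error").take(5)` would then pull back a handful of matching documents for further processing in Spark.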
Businesses continue to adopt Elasticsearch to help them get to the last mile of their Hadoop deployments by providing the ability to ask, iterate and extract actionable insights from their data. Many of them are in industries like healthcare, finance and telecommunications and have extremely large amounts of sensitive data they need to mine. Elasticsearch for Apache Hadoop lets them access data, like log files, in minutes instead of hours, so they can detect fraud, identify service issues and analyze customer behavior. That lets them come to resolutions faster and gives their rockstar developers the tools they need to directly impact the bottom line of their business.
We couldn’t be more thrilled to be officially “Certified on Spark”. Our Hadoop connector is the first step in our roadmap to make the two more natively integrated, bringing even more advanced search and analytics capabilities to businesses and their data.
If you’re going to Spark Summit, Holden Karau from Databricks will be showing how to streamline search indexing with Elasticsearch and Spark in this session on Monday, June 30 at 3:00pm.
We are also holding a webinar on Wednesday, August 20th about how Elasticsearch can be used for real-time insights on your Hadoop and Spark deployments. You can register for that here.
And last but not least, if you’d like to get started, download Elasticsearch for Apache Hadoop here and let us know what you think!
