Elasticsearch And Apache Lucene For Apache Spark And MLlib - Databricks

Spark’s MLlib makes it a snap to apply machine-learning algorithms to huge datasets. However, input data, especially unstructured text, always requires some preprocessing before it can be fed to your ML algorithms. So how do you prepare the unstructured text you want to process? And what if it is not just in English, but also in Mandarin, Thai, or Arabic? Elasticsearch’s rich analysis capabilities, all powered by Lucene, make it well suited to processing and tokenizing text for machine-learning tasks in real time, whatever the language, in addition to searching through it. So how do we marry Spark with Elasticsearch? Costin Leau gives an overview of Elastic’s current efforts to enhance Elasticsearch’s existing integration with Spark, going beyond Spark core and Spark SQL by focusing on text processing and machine learning. Attendees will leave with a thorough understanding of how Elasticsearch, Spark, and Spark’s MLlib can make it much easier to search through and analyze data, whatever form the text input takes.
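
To make the preprocessing step concrete, here is a minimal, dependency-free sketch of what "analysis" followed by feature extraction can look like. It is not the Elasticsearch or MLlib API and not the method presented in the talk: the `analyze` and `hashing_tf` functions, the stop-word list, and the vector size are all hypothetical names and values chosen for illustration. The sketch lowercases and tokenizes English text, drops stop words, and maps the tokens to a fixed-length term-frequency vector using the hashing trick.

```python
import re
import zlib

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "is"}


def analyze(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def hashing_tf(tokens, num_features=16):
    """Build a fixed-length term-frequency vector using the hashing trick."""
    vector = [0.0] * num_features
    for token in tokens:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] += 1.0
    return vector


tokens = analyze("The quick brown fox jumps over the lazy dog")
features = hashing_tf(tokens)
print(tokens)
print(features)
```

A regex tokenizer like this one relies on spaces and punctuation to find word boundaries, so it would not segment Mandarin or Thai text, which is written without spaces between words. That gap is where language-aware analyzers such as those in Lucene and Elasticsearch come in.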

Learn more:

  • Elasticsearch
  • Application Spotlight: Elasticsearch
  • Using Spark and Elasticsearch for real-time data analysis
  • Building a Dataset Search Engine with Spark and Elasticsearch