Elasticsearch - Databricks

Elasticsearch

Source Databricks

Elasticsearch is a distributed NoSQL database that stores, retrieves, and manages document-oriented and semi-structured data. It is also a RESTful search engine built on top of Apache Lucene. It was originally released as open source under the terms of the Apache License; releases from version 7.11 onward use different license terms. Because it is written in Java, it runs on many platforms, and it can index and search documents in diverse formats.

Elasticsearch stores data as schema-less JSON documents, as many other NoSQL databases do.
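As a minimal sketch of what "schema-less JSON documents" means in practice, the Python below prepares two differently shaped documents for the same index using the `PUT /<index>/_doc/<id>` endpoint. The host `http://localhost:9200`, the index name `glossary-demo`, and every field name are assumptions made for illustration; the requests are only built and printed, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node without authentication


def build_index_request(index: str, doc_id: str, document: dict) -> Request:
    """Prepare (but do not send) a PUT /<index>/_doc/<id> request."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical documents: they share no fixed schema, yet target the same index.
article = {"title": "Intro to search", "tags": ["search", "lucene"]}
review = {"title": "Great product", "rating": 5, "author": {"name": "Ana"}}

requests_to_send = [
    build_index_request("glossary-demo", "1", article),
    build_index_request("glossary-demo", "2", review),
]
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Against a running cluster, each prepared request could be passed to `urllib.request.urlopen`; that step is left out here so the sketch runs without a server.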

Its flexible data model suits applications such as building and updating visitor profiles, where real-time engagement demands heavy workloads at low latency.

Elasticsearch Main Use Cases

Elasticsearch can be used for multiple purposes, such as:

  • Logging and Log Analysis: The ecosystem built up around Elasticsearch has made it one of the easiest logging solutions to implement and scale.
  • Scraping and Combining Public Data: Elasticsearch has the flexibility needed to ingest data from multiple sources and keep it all manageable and searchable.
  • Full-Text Search: Elasticsearch is document-oriented: it stores and indexes whole documents, and indexing creates or updates them. Once indexing is finished, you can search, sort, and filter complete documents, not rows of columnar data.
  • Event Data and Metrics: Elasticsearch is also known for working well with time-series data such as metrics and application events. Whatever technology you are using, the Elasticsearch ecosystem probably has the components needed to collect data from common applications; in the rare case that it doesn't, adding that capability is usually straightforward.
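For the logging and event-data use cases above, documents are typically sent in batches. The sketch below builds a request body for the `_bulk` endpoint, which expects newline-delimited JSON: one action line followed by one document line per event, ending with a newline. The index name `logs-demo`, the event fields, and the use of `@timestamp` as the time field are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def to_bulk_body(index: str, events: Iterable[dict]) -> str:
    """Serialize events as NDJSON for POST /_bulk (action line + source line)."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                         # document line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Hypothetical application events with a fixed timestamp for reproducibility.
stamp = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat()
events = [
    {"@timestamp": stamp, "level": "INFO", "message": "service started"},
    {"@timestamp": stamp, "level": "ERROR", "message": "db timeout", "latency_ms": 5012},
]

print(to_bulk_body("logs-demo", events), end="")
```

The body would be posted to `/_bulk` with the `Content-Type: application/x-ndjson` header. Note that the second event carries an extra `latency_ms` field, which is fine for a schema-less index.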

Data Visualization

Elasticsearch lets you search and filter all sorts of data through a simple API. Because the API is RESTful, you can use it not only for data analysis but also in production, for example behind web-based applications.
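To illustrate searching and filtering over the REST API, here is a hedged sketch that prepares a `POST /<index>/_search` request combining a full-text `match` clause with a `term` filter. The host, the index name, and the `title` and `status` fields are hypothetical, and the request is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node without authentication


def build_search_request(index: str, text: str, status: str, size: int = 10) -> Request:
    """Prepare a search that scores on `title` and filters on `status`."""
    body = {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],       # full-text, scored
                "filter": [{"term": {"status": status}}],   # exact match, not scored
            }
        },
        "size": size,  # maximum number of hits to return
    }
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_search_request("glossary-demo", "search engine", "published")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Because the interface is plain HTTP and JSON, the same request could be issued from a web backend, a notebook, or a command-line tool.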

Elasticsearch also supports faceted search: its aggregations let you compute summaries of your data, such as counts per category.
Here are some of the most relevant features:

  • Provides a scalable search solution.
  • Performs near-real-time searches.
  • Provides support for multi-tenancy.
  • Streamlines backup processes and helps ensure data integrity.
  • Allows an index to be recovered after a server crash.
  • Uses JavaScript Object Notation (JSON) as well as Java application programming interfaces (APIs).
  • Automatically indexes JSON documents.
  • Indexes documents under unique type-level identifiers.
  • Allows each index to have its own settings.
  • Supports searches written as Lucene-based query strings.
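Two of the features above, aggregations and Lucene-based query strings, can be combined in a single search body. The following is a minimal sketch under stated assumptions: the field names (`title`, `status`, `tags.keyword`) and the aggregation name `top_tags` are hypothetical, and `tags.keyword` assumes the default dynamic mapping created a keyword sub-field for `tags`.

```python
import json


def build_tag_report(query_string: str, field: str = "tags.keyword", top_n: int = 5) -> dict:
    """Build a search body that counts the most common values of `field`."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {"query_string": {"query": query_string}},  # Lucene query syntax
        "aggs": {
            "top_tags": {"terms": {"field": field, "size": top_n}}
        },
    }


body = build_tag_report("title:elasticsearch AND status:published")
print(json.dumps(body, indent=2))
```

Posted to an index's `_search` endpoint, a body like this asks for the `top_n` most frequent tag values among the documents matching the query string.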