Hosted Spark - Databricks

Hosted Spark

Source Databricks

Apache Spark is a fast, general-purpose cluster computing system for big data, designed for speed, ease of use, and advanced analytics. It was originally developed in 2009 at UC Berkeley. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also includes higher-level tools: Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

Spark provides two modes for data exploration:

  • Interactive
  • Batch

To simplify end-user interaction, Spark is also offered to organizations as part of a unified hosted data platform. Previously, remote applications had no direct access to Spark resources, so users faced a longer route to production.

To overcome this obstacle, services were created that let remote applications connect to a Spark cluster over a REST API from anywhere. These interfaces execute code snippets or whole programs in a Spark context that runs either locally or on Apache Hadoop YARN.
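The REST interaction described above can be sketched in Python. The text does not name a specific service, so the endpoint paths and payload fields here are assumptions modeled on the Apache Livy REST API (`/sessions` and `/sessions/{id}/statements`), and the host name is hypothetical. The helpers only build the requests; `send` needs a live server.

```python
import json
import urllib.request

# Hypothetical address of a hosted Spark REST endpoint (assumption, not from the text).
BASE_URL = "http://spark-gateway.example.com:8998"


def build_post(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the REST service."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(kind="pyspark"):
    """Request asking the service to start an interactive session (Livy-style path)."""
    return build_post("/sessions", {"kind": kind})


def run_statement_request(session_id, code):
    """Request submitting a code snippet to run in an existing session."""
    return build_post(f"/sessions/{session_id}/statements", {"code": code})


def send(request):
    """Send a prepared request and decode the JSON reply (requires a reachable server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A client would typically send the session request first, read the session identifier from the reply, and then post statements against that identifier. The exact reply format depends on the service in use.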

Hosted Spark interfaces serve as turnkey solutions: they handle the interaction between Spark and application servers, which streamlines the architecture required by interactive web and mobile apps.

Hosted Spark services provide the following features:

  • Interactive Scala, Python, and R shells
  • Batch submissions in Scala, Java, and Python
  • Multiple users can share the same server
  • Users can submit jobs from anywhere through REST
  • No code changes are required in your programs
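As a rough illustration of the batch-submission feature, the sketch below builds a request that hands an unmodified program to the service. The `/batches` path and the `file`, `className`, and `args` fields are assumptions borrowed from the Apache Livy REST API; the host and file paths are hypothetical, and a given hosted service may expose a different interface.

```python
import json
import urllib.request


def batch_request(base_url, app_file, class_name=None, args=()):
    """Build (but do not send) a POST that submits a whole program as a batch job."""
    payload = {"file": app_file, "args": list(args)}
    if class_name is not None:
        # Only JVM applications (Scala/Java) need a main class.
        payload["className"] = class_name
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and application path, for illustration only.
request = batch_request(
    "http://spark-gateway.example.com:8998/",
    "hdfs:///apps/wordcount.py",
    args=["hdfs:///data/input.txt"],
)
```

The program itself is referenced by path and passed through as-is, which is what allows submission without code changes.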

Organizations can now easily overcome the existing bottlenecks that impede their ability to operationalize Spark, and instead, focus on capturing the value promised by big data.

 
