Alternative Data - Databricks

Alternative Data

Source Databricks

Alternative data is information gathered from non-traditional sources that others in an industry are not typically using. Analysis of alternative data can provide insights beyond those that an industry's regular data sources are capable of providing.

However, what exactly counts as alternative data varies from one industry to another, because it depends on the traditional data sources that you and your competitors already use.

Typical Alternative Data Types

When we talk about alternative data, four data types are mainly used:

  • Satellite data
  • Mobile data
  • Sensor data
  • Web data

However, alternative data can also include:

  • Geolocation (foot traffic)
  • Credit card transactions
  • Email receipts
  • Point-of-sale transactions
  • Social media posts
  • Online browsing activity
  • Shipping container receipts
  • Product reviews
  • Price trackers
  • Weather and micro-climates
  • Flight and shipping trackers


In recent years, the growing volume of data coming from mobile devices, satellites, sensors, and websites has produced vast amounts of structured, semi-structured, and unstructured data, which we refer to under the generic term big data.

Using alternative data can help you gain unique insights and a competitive advantage in your industry, and can potentially boost profits. You can combine data sets from different sources to get a clear overview of company-specific and competitive market landscapes.
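As an illustration of combining data sets, the sketch below joins two alternative sources on a shared key. The store IDs, dates, field names, and values are hypothetical, and the derived "spend per visit" metric is only an example of what a combined view can expose that neither source shows alone.

```python
# Hypothetical records: daily foot-traffic counts (e.g. from geolocation data)
# and card-transaction totals for the same stores. All values are made up.
foot_traffic = [
    {"store": "S1", "date": "2024-01-01", "visits": 120},
    {"store": "S1", "date": "2024-01-02", "visits": 95},
    {"store": "S2", "date": "2024-01-01", "visits": 60},
]
transactions = [
    {"store": "S1", "date": "2024-01-01", "amount": 2400.0},
    {"store": "S1", "date": "2024-01-02", "amount": 1900.0},
    {"store": "S2", "date": "2024-01-02", "amount": 800.0},
]


def combine(traffic, txns):
    """Inner-join the two sources on (store, date) and derive spend per visit."""
    # Index transactions by the join key so each lookup is a dict access.
    amounts = {(t["store"], t["date"]): t["amount"] for t in txns}
    combined = []
    for row in traffic:
        key = (row["store"], row["date"])
        if key in amounts:  # keep only keys present in both sources
            combined.append({
                "store": row["store"],
                "date": row["date"],
                "visits": row["visits"],
                "amount": amounts[key],
                "spend_per_visit": amounts[key] / row["visits"],
            })
    return combined


for r in combine(foot_traffic, transactions):
    print(r["store"], r["date"], r["spend_per_visit"])
```

Because this is an inner join, the S2 rows are dropped: the two sources never report S2 on the same date. A real pipeline would also need to reconcile differing time zones, store identifiers, and reporting delays between providers.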

There are three main ways to access alternative data:

  • Acquisition of raw data
  • Third-party licensing
  • Web scraping (also called web harvesting or web data extraction). A web scraper is a software tool, sometimes offered as an application programming interface (API), that extracts data from a website and can gather key insights on the topics you need to track to thrive in your industry. Newer forms of web scraping involve listening to data feeds from web servers. For example, JSON is commonly used as the data-interchange format between the client and the web server.
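When a site loads its data through a JSON feed, a scraper can often read that feed directly instead of parsing the rendered page. The minimal sketch below assumes a hypothetical payload with a `products` array; in practice the text would be fetched over HTTP (for example with `urllib.request.urlopen`) from whatever endpoint the page calls, and the real field names would differ.

```python
import json

# Hypothetical JSON payload, shaped like a response a web page might fetch
# from its server in the background. The schema is an assumption.
SAMPLE_FEED = """
{
  "products": [
    {"sku": "A100", "name": "Kettle", "price": 29.99, "in_stock": true},
    {"sku": "B200", "name": "Toaster", "price": 49.5, "in_stock": false}
  ]
}
"""


def extract_prices(payload: str) -> dict:
    """Parse a JSON feed and map each product SKU to its price."""
    data = json.loads(payload)
    # A missing "products" key yields an empty mapping rather than an error.
    return {item["sku"]: item["price"] for item in data.get("products", [])}


print(extract_prices(SAMPLE_FEED))
```

Reading the feed avoids depending on the page's HTML layout, which tends to change more often than the underlying data format.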

Automated scraping techniques

  • HTML Parsing: HTML parsing is often done with JavaScript or other scripting languages, and targets linear or nested HTML pages.
  • DOM Parsing: The Document Object Model, or DOM, is an interface for accessing the content, structure, and style of HTML and XML documents, which it represents as a tree of nodes that a scraper can navigate.
  • Vertical Aggregation: Vertical aggregation platforms are built by organizations with large-scale computing power to target specific verticals (industries or market segments).
  • XPath: XML Path Language, or XPath, is a query language for selecting nodes in XML documents.
  • Google Sheets: Google Sheets (for example, through its IMPORTXML function) can be used in much the same way as a scraper written in a programming language like Python or Ruby, which makes it a good, quick way to learn the basics of certain types of scrapers.
  • Text Pattern Matching: This technique matches regular expressions against page text, using tools such as the UNIX grep command or the regular-expression support in popular programming languages like Perl or Python.
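HTML parsing does not have to be done in JavaScript. As a swapped-in alternative, the sketch below uses Python's standard `html.parser` module to collect the text of every `<span class="price">` element from a small, made-up page. The markup and class name are assumptions for illustration.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; this exact-match check
        # ignores spans whose class attribute lists several classes.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price and data.strip():
            self.prices.append(data.strip())


# Hypothetical page fragment.
page = """
<ul>
  <li><span class="name">Kettle</span> <span class="price">$29.99</span></li>
  <li><span class="name">Toaster</span> <span class="price">$49.50</span></li>
</ul>
"""

parser = PriceParser()
parser.feed(page)
print(parser.prices)
```

This event-driven style streams through the markup without building a full tree, which keeps memory use low but makes nested structures harder to handle than with DOM parsing.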
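DOM parsing loads the whole document into a tree first and then navigates it. A minimal sketch using Python's standard `xml.dom.minidom` module, with a hypothetical product catalog as input:

```python
from xml.dom import minidom

# Hypothetical XML document; element and attribute names are assumptions.
xml_doc = """<catalog>
  <product sku="A100"><name>Kettle</name><price>29.99</price></product>
  <product sku="B200"><name>Toaster</name><price>49.50</price></product>
</catalog>"""


def product_names(xml_text: str) -> dict:
    """Build a DOM tree, then walk it to map each SKU to a product name."""
    dom = minidom.parseString(xml_text)
    result = {}
    for product in dom.getElementsByTagName("product"):
        name_node = product.getElementsByTagName("name")[0]
        # An element's text is stored in its child text node.
        result[product.getAttribute("sku")] = name_node.firstChild.nodeValue
    return result


print(product_names(xml_doc))
```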
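An XPath expression selects nodes by path and predicate. Python's standard `xml.etree.ElementTree` supports only a limited subset of XPath, which is enough for the hypothetical query below (every `<price>` under a `<product>` whose `category` attribute is `kitchen`). Libraries such as lxml offer full XPath 1.0 through their `xpath()` method.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; element and attribute names are assumptions.
catalog = ET.fromstring("""<catalog>
  <product category="kitchen"><name>Kettle</name><price>29.99</price></product>
  <product category="books"><name>Atlas</name><price>15.00</price></product>
  <product category="kitchen"><name>Toaster</name><price>49.50</price></product>
</catalog>""")

# Path step "product", attribute predicate [@category='kitchen'], then child "price".
kitchen_prices = [
    float(p.text) for p in catalog.findall("./product[@category='kitchen']/price")
]
print(kitchen_prices)
```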
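Text pattern matching works on raw text rather than on document structure. The sketch below uses Python's `re` module to pull dollar amounts out of a made-up snippet; a roughly equivalent shell command would be `grep -oE '\$[0-9]+(\.[0-9]{2})?' file.txt`. The pattern is a simplifying assumption and will miss formats such as thousands separators.

```python
import re

# Hypothetical scraped text.
text = """Kettle now $29.99 (was $34.99).
Toaster: $49.50, free shipping on orders over $100."""

# A dollar sign, one or more digits, and an optional two-digit cents part.
# The single capturing group makes findall return just the numeric strings.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")

prices = [float(m) for m in PRICE_RE.findall(text)]
print(prices)
```

Regular expressions are quick to write but brittle: they match text wherever it appears, so context (such as "was $34.99" versus the current price) has to be handled separately.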