The What and Why of Big Data Analytics - Databricks

Big Data Analytics

Source Databricks

Before Hadoop, both storage and compute technology were limited; as a result, the analytics process was long and rigid.

To make a new data source ready for storage, it had to go through a lengthy process, usually known as ETL (extract, transform, load). Once the data was ready, it had to be stored in a database or data warehouse and fitted into a static data model. The main problem with that approach is that this 3-part process could take up to 18 months to implement or change. Integrating a new data source alone could take 3 months.

Businesses can no longer operate at this pace.

Big data analytics is the solution that came with a different approach.

Big data analytics helps organizations harness their data and use it to uncover information, including hidden patterns, unknown correlations, market trends, and customer preferences, that can help them identify new opportunities and make informed business decisions.

Advantages of using big data analytics:

  • Cost reduction. Big data technologies such as Hadoop and cloud-based analytics can help companies reduce the cost of storing large amounts of data.
  • Improved decision making. The speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, gives businesses the data-driven insights they need to analyze information on the spot.
  • New products and services. With the help of big data analytics, companies can analyze customer needs and give customers the products and services they want.
  • Fraud detection. Big data analytics is also used to prevent fraud, mainly in the financial services industry.

Big data analytics is the often complex process of examining large and varied data sets, or big data, generated by various sources such as Artificial Intelligence (AI), mobile, social, and the Internet of Things (IoT). The amount of digital data that exists is growing at a fast pace, doubling every two years.
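As a rough illustration of what "doubling every two years" implies, the sketch below projects a data volume forward under that assumption. The function name and the starting volume of 1 zettabyte are hypothetical values chosen only for the example.

```python
def projected_volume(initial, years, doubling_period_years=2.0):
    """Project a data volume forward, assuming it doubles every fixed period."""
    return initial * 2 ** (years / doubling_period_years)


# Hypothetical starting point: 1 zettabyte (ZB) today.
for years in (2, 4, 10):
    print(f"after {years} years: {projected_volume(1.0, years):.0f} ZB")
# after 2 years: 2 ZB
# after 4 years: 4 ZB
# after 10 years: 32 ZB
```

Under this assumption, a volume grows 32-fold in a decade, which is why storage and compute that scale out matter for this kind of workload.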

Big data analytics uses advanced analytic techniques to analyze very large data sets that include structured, semi-structured, and unstructured data from various sources, in sizes ranging from terabytes to zettabytes.

The most common data types involved in big data analytics:

  • Web data. Customer-level web behavior data such as page views, searches, review
    reading, and purchases.
  • Text data. Text (email, news, Facebook feeds, documents, etc.) is one of the biggest
    and most widely applicable types of big data.
  • Time and location data. GPS and mobile phones, as well as Wi-Fi connections, make
    time and location information a growing source of data.
  • Geographic data: This type of data relates to roads, buildings, lakes, addresses, people, workplaces, and transportation routes, and is generated from geographic information systems.
  • Real-time media: real-time streaming of live or stored media data.
  • Smart grid and sensor data. Sensor data is now collected from cars, oil pipes, and
    wind turbines, at extremely high frequency.
  • Social network data. Within social network sites like Facebook, LinkedIn, and
    Instagram, link analysis can uncover the network of a given user.
  • Network data: data related to very large networks, for example social networks such as Facebook and Twitter, or technological networks such as the Internet, telephone, and transportation networks.
  • Linked data: this type of data is published using standard Web technologies like HTTP, RDF, SPARQL, and URIs.
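The link analysis mentioned under social network data can be sketched as a breadth-first search over a graph of connections. This is a minimal illustration, not how any particular platform does it: the function name `network_within` and the connection data are made up for the example.

```python
from collections import deque


def network_within(graph, user, max_hops):
    """Return {other_user: hop_distance} for users reachable within max_hops."""
    distances = {user: 0}
    queue = deque([user])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[user]  # the user is not part of their own network
    return distances


# Hypothetical connection data (undirected, so each link is listed both ways).
connections = {
    "ana": ["ben", "cho"],
    "ben": ["ana", "dev"],
    "cho": ["ana"],
    "dev": ["ben", "eli"],
    "eli": ["dev"],
}

print(network_within(connections, "ana", 2))
# {'ben': 1, 'cho': 1, 'dev': 2}
```

Here "eli" is three hops from "ana" and so falls outside the two-hop network. At real social-network scale the same idea would run on a distributed graph engine rather than an in-memory dictionary.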
