Data Lakes Support All Data Types
A data lake holds big data from many sources in a raw, granular format. It can store structured, semi-structured, or unstructured data, which means data can be kept in a more flexible format so we can transform it when we’re ready to use it.
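To make the "keep it raw, transform it when needed" idea concrete, here is a minimal Python sketch. It assumes a toy in-memory store; the names `raw_store` and `read_clicks` and the sample records are hypothetical and do not come from any particular data lake product. Records are kept exactly as they arrived, and a schema is applied only at read time.

```python
import json

# Hypothetical raw store: each entry is kept exactly as it arrived.
raw_store = [
    '{"user": "a1", "event": "click", "ts": "2021-03-01T10:00:00"}',  # semi-structured JSON
    "a2,click,2021-03-01T10:05:00",                                   # structured CSV row
    "free-text note: user a3 reported a login issue",                 # unstructured text
]

def read_clicks(store):
    """Apply a (user, event, ts) schema at read time; skip records that do not fit."""
    rows = []
    for raw in store:
        if raw.startswith("{"):
            rec = json.loads(raw)
            rows.append((rec["user"], rec["event"], rec["ts"]))
        elif raw.count(",") == 2:
            user, event, ts = raw.split(",")
            rows.append((user, event, ts))
        # Anything else stays in the store untouched for other uses.
    return [row for row in rows if row[1] == "click"]

clicks = read_clicks(raw_store)
```

The store itself is never modified, so a different reader could later apply a different schema to the same raw records.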
Benefits of a Data Lake
Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question. You can apply various types of analytics to your data, such as SQL queries, big data analytics, full-text search, real-time analytics, and even machine learning, to uncover insights. Data lakes are usually configured on a cluster of scalable commodity hardware. As a result, data can be dumped in the lake in case it is needed at a future date, without worrying about storage capacity. In addition, the clusters can exist on-premises or in the cloud. The term data lake is usually associated with Hadoop-oriented object storage.
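The identifier-and-tags approach described at the start of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `catalog` dictionary, the `ingest` and `query` helpers, and the tag names `source` and `kind` are all hypothetical, and a real data lake would keep this metadata in a dedicated catalog service rather than a dictionary.

```python
import uuid

catalog = {}  # hypothetical in-memory catalog: identifier -> entry

def ingest(payload, **tags):
    """Store a raw payload under a new unique identifier with its metadata tags."""
    element_id = str(uuid.uuid4())
    catalog[element_id] = {"payload": payload, "tags": tags}
    return element_id

def query(**wanted):
    """Return identifiers of elements whose tags match every requested key/value."""
    return [
        element_id
        for element_id, entry in catalog.items()
        if all(entry["tags"].get(key) == value for key, value in wanted.items())
    ]

# Ingest three elements of different types (sample payloads are made up).
log_id = ingest(b"GET /index.html 200", source="web", kind="log")
img_id = ingest(b"\x89PNG...", source="mobile", kind="image")
post_id = ingest("Great product!", source="web", kind="social_post")

# A business question narrows the lake down to a smaller, relevant set.
web_ids = query(source="web")
web_logs = query(source="web", kind="log")
```

Querying by tags returns only identifiers, so the smaller result set can then be handed to whichever analysis tool suits the question.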
Hadoop Data Lakes
Hadoop is a compelling choice for data systems because it provides a low-cost approach to data storage, and it has proven to work well even for very large organizations. A Hadoop data lake is a data management platform that stores data in the Hadoop Distributed File System (HDFS) across a set of clustered compute nodes. Its main use is to process and store nonrelational data. Some of the types of data that can be processed are log files, internet clickstream records, sensor data, JSON objects, images, and social media posts.