Advanced Analytics with HyperLogLog Functions in Apache Spark
This is a community guest blog from Sim Simeonov, the founder & CTO of Swoop and IPM.ai.

Pre-aggregation is a common technique in the high-performance analytics toolbox. For example, 10 billion rows of website visitation data per hour may be reducible to 10 million rows of visit counts, aggregated by the superset of dimensions used...
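
To make the idea of pre-aggregation concrete, here is a minimal sketch in plain Python rather than Spark. The sample rows, the choice of dimensions (hour, site, country), and the `rollup` helper are all hypothetical and chosen only for illustration. It collapses raw visit rows into one row of visit counts per combination of dimensions, then answers a coarser query from the much smaller pre-aggregated data.

```python
from collections import Counter

# Hypothetical raw visit rows: (hour, site, country).
# Assumption: real data would have billions of rows and more dimensions.
raw_visits = [
    ("2019-05-01T10", "a.com", "US"),
    ("2019-05-01T10", "a.com", "US"),
    ("2019-05-01T10", "a.com", "DE"),
    ("2019-05-01T10", "b.com", "US"),
    ("2019-05-01T11", "a.com", "US"),
    ("2019-05-01T11", "b.com", "US"),
    ("2019-05-01T11", "b.com", "US"),
]

# Pre-aggregate: keep one visit count per distinct combination of dimensions.
pre_aggregated = Counter(raw_visits)


def rollup(pre_agg, keep):
    """Re-aggregate pre-aggregated counts by a subset of dimension positions.

    Hypothetical helper: `keep` lists the tuple positions of the dimensions
    to group by. Counts are additive, so summing them gives the same result
    as counting the raw rows directly.
    """
    out = Counter()
    for dims, count in pre_agg.items():
        out[tuple(dims[i] for i in keep)] += count
    return out


# Answer "visits per site" from the pre-aggregated rows, not the raw rows.
visits_by_site = rollup(pre_aggregated, keep=[1])
```

The rollup works here because plain counts can be summed: a query over any subset of the pre-aggregated dimensions never needs to touch the raw rows again.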