CUSTOMER STORY

Scaling data capabilities at the speed of business growth

1 million+

Videos analyzed in the database

98%

Improvement in query performance

25%

Overall cost savings

Products: Databricks SQL, Lakeflow Spark Declarative Pipelines

AnyClip is a visual intelligence company whose AI-powered platform turns traditional video into interactive content that is fully searchable, measurable, personalized and monetized for consumer and corporate audiences. AnyClip’s client roster includes Fortune 500 companies with huge video libraries, which AnyClip’s tools analyze frame by frame, recognizing people, brands and topics among other data points, all of which are stored and processed for reporting and querying. As the company grew rapidly, the volume of data and its history outgrew the initial Amazon Redshift data stack, leading to painfully slow query times and an unstable environment. After evaluating Snowflake, BigQuery and Databricks SQL, AnyClip migrated from Redshift to Databricks SQL to solve these and other scaling issues. Today the system is stable and highly responsive, enabling easy and unlimited report gathering, hyperefficient workflows and faster, better performance insights.

Time-consuming data queries lead to frustration

AnyClip’s innovative AI platform transforms how video is produced and consumed around the globe. Every video AnyClip receives is analyzed to the millisecond. The data is then presented to internal and external stakeholders via customized dashboards that reveal overall performance, audience engagement, revenue tracking and a plethora of other viewership metrics. That data can then be analyzed or further queried to drive future content strategies and video investment decisions.

Accomplishing this wasn’t always easy. With the previous data stack using Redshift, queries yielded widespread frustration rather than insights. “People would write to me and say things like, ‘What’s going on?’” explained Gal Doron, Head of Data at AnyClip. “‘Why am I not getting any results for my query? It’s just running endlessly.’”

Doron understood the frustration firsthand. He and his team were mired in their own manual tasks, such as rewriting tables when rogue data arrived late or from disparate sources, which sometimes brought the system to a halt. At the same time, they were trying to sustain a toolset that had worked adequately when the company was smaller but proved insufficient as AnyClip grew rapidly.

With more videos arriving daily — total video inventory is now in the millions — costs and inefficiency were growing. “We very quickly got to a point where we had two options: Either scale enormously in terms of machines in Redshift, which would double and triple our costs, or find a different solution,” said Doron. “That’s when I started looking, and that’s where Databricks hugely solved that issue.”

Making data work for end users

In their hunt for a new solution, the AnyClip team quickly realized that leveraging materialized views in Databricks SQL could address their most pressing issues and transform data querying from a tedious task into a simple and empowering activity.

A materialized view is a data warehousing construct that stores the results of a precomputed query. By precomputing results, materialized views accelerate SQL analytics and BI reports while also reducing costs. Databricks SQL materialized views are especially powerful because they’re built on Spark Declarative Pipelines, bringing query incrementalization and data engineering best practices into the world of data warehousing.
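To make the idea of a precomputed query concrete, here is a minimal sketch in Python that emulates a materialized view with SQLite. It is an illustration only, not AnyClip's pipeline: SQLite has no materialized views, so the stored result is an ordinary table that is rebuilt in full on each refresh, whereas Databricks SQL provides dedicated `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW` statements and can refresh incrementally. The table names, columns and sample rows (`video_events`, `brand_summary` and so on) are hypothetical.

```python
import sqlite3

# Hypothetical source table: one row per analyzed video (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_events (video_id TEXT, brand TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO video_events VALUES (?, ?, ?)",
    [("v1", "acme", 120), ("v2", "acme", 80), ("v3", "globex", 50)],
)

# The aggregation whose results we want to precompute and store.
SUMMARY_QUERY = """
    SELECT brand, COUNT(*) AS videos, SUM(views) AS total_views
    FROM video_events
    GROUP BY brand
"""


def refresh_brand_summary(conn):
    """Rebuild the stored results (a full refresh, not an incremental one)."""
    conn.execute("DROP TABLE IF EXISTS brand_summary")
    conn.execute("CREATE TABLE brand_summary AS " + SUMMARY_QUERY)
    conn.commit()


def read_summary(conn):
    """Reports read the small precomputed table instead of re-aggregating."""
    return conn.execute(
        "SELECT brand, videos, total_views FROM brand_summary ORDER BY brand"
    ).fetchall()


refresh_brand_summary(conn)
summary = read_summary(conn)

# New data arrives: the stored results stay unchanged until the next refresh.
conn.execute("INSERT INTO video_events VALUES ('v4', 'globex', 70)")
stale = read_summary(conn)

refresh_brand_summary(conn)
fresh = read_summary(conn)
```

In this sketch, reports only ever query `brand_summary`, so the cost of the aggregation is paid once per refresh rather than once per report. The trade-off is that results reflect the data as of the last refresh, which is why refresh scheduling matters.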

Using materialized views in Databricks SQL had immediate and profound impacts for AnyClip. “We’ve seen query performances improve by 98% with some of our tables that have several terabytes of data,” affirmed Doron. “Previously when users tried to run queries, it could take hours. With Databricks, it takes something between half a minute to three minutes.”

The Monday morning reporting rush, when everyone comes back from the weekend and wants the latest data, is another example of the change. “All the reports were scheduled for the exact same time, and previously, everything got stuck,” Doron related. “Now it’s one hundred percent more stable in terms of performance.”

Perhaps most notably, Doron no longer worries about the data stack keeping pace with business growth. “Databricks hugely solved that issue,” he proclaimed. “Sometimes I have very small data models and I don’t need to scale there. But when I have a huge data model, I can scale and it’ll be much cheaper — overall about 25% cheaper and close to 50% faster — and very accurate. I don’t need to scale the whole warehouse. I can just scale in one place.”

Delta Lake Liquid Clustering, a data layout technique that replaces table partitioning and ZORDER, also helped AnyClip with query performance. According to Doron, the team saw “orders of magnitude improvement in performance with Liquid Clustering” versus the traditional approach of partitioning plus ZORDER as they migrated data warehouse and analytics workloads from Redshift to Databricks SQL.
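Why a data layout technique affects query speed can be shown with a small simulation. The Python sketch below is a simplified illustration of the general idea behind clustering, namely that co-locating rows with similar key values lets an engine skip files whose minimum and maximum values rule out a match. It does not reproduce the actual Liquid Clustering algorithm, which the source does not describe, and it uses a plain one-dimensional sort as a stand-in. The IDs, file size and function names are assumptions made for the example; in Databricks SQL itself, clustering keys are declared with a `CLUSTER BY` clause on the table.

```python
import random


def split_into_files(rows, rows_per_file):
    """Group rows into fixed-size 'files', preserving their current order."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_to_scan(files, key):
    """Count files whose [min, max] range could contain the key."""
    return sum(1 for f in files if min(f) <= key <= max(f))


random.seed(0)
# Hypothetical distinct video IDs in arrival order (illustrative data only).
video_ids = random.sample(range(100_000), 10_000)

# Unclustered: files hold IDs in arrival order, so each file spans a wide range.
unclustered = split_into_files(video_ids, 500)
# Clustered (simplified): similar IDs are stored together, so ranges are disjoint.
clustered = split_into_files(sorted(video_ids), 500)

target = video_ids[0]
scanned_unclustered = files_to_scan(unclustered, target)
scanned_clustered = files_to_scan(clustered, target)

print(f"files scanned without clustering: {scanned_unclustered} of {len(unclustered)}")
print(f"files scanned with clustering:    {scanned_clustered} of {len(clustered)}")
```

Because the clustered files cover disjoint ID ranges, a lookup for a single ID needs to read only one of them, while the unclustered layout usually cannot rule out most files. Real workloads filter on several columns at once, which is where a purpose-built layout technique goes beyond this single-key sort.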

Putting power in the hands of stakeholders

Doron runs a lean engineering team. Therefore, his business users, both internal and external, are the de facto data analysts. With Databricks, Doron can let curiosity guide his team. “If they need a report, they just create it,” he stated. “They’re doing it on their own and we’re seeing some amazing stuff.”

For example, a vice president of marketing recently decided to analyze election videos to determine if audience engagement changed when female candidates were on screen versus male candidates. “She did it alone — a whole dashboard with comparison charts and graphs and more,” Doron noted. “That’s how easy it is for our business users who are using my system.”

Ultimately, Doron knows the new system is doing its job because he’s not receiving any of the complaints he used to. “It’s much quieter now,” he shared.