In today's data-driven world, the fusion of visual assets and analytical capabilities unlocks a realm of untapped potential. Image datasets are crucial in developing and training Generative AI (GenAI) technologies. We are thrilled to announce a groundbreaking collaboration that brings the vast collection of Shutterstock imagery to the Databricks Marketplace — our first listing of Volume (aka non-tabular) datasets on our Marketplace. This free sample dataset, which consists of 1,000 images and accompanying metadata sourced from Shutterstock's 550+ million image library, is available for immediate access. This blog will explore Shutterstock's image library on Databricks Marketplace and the industry use cases.
Traditional data marketplaces are restrictive: they offer only tabular data or simple applications, which limits their value to data collaborators, and they don't provide tools for evaluating datasets. Databricks Marketplace is an open marketplace that enables you to share and exchange data assets such as tabular datasets, volumes, notebooks, and AI models across clouds, regions, and platforms. Since launching in June, Databricks Marketplace has grown to more than 1,800 listings from more than 180 providers.
"Shutterstock is bringing its vast collection of nearly a billion creative content assets to the Databricks Marketplace, a platform renowned for fostering open data and AI collaboration," said Aimee Egan, Chief Enterprise Officer, Shutterstock. "This integration provides unparalleled access to our extensive library of ethically-sourced visual content, propelling responsible AI and ML initiatives forward across various industries. We are excited to add Delta Sharing as a method to deliver data. Customers utilizing our rich dataset on Databricks can tap into new opportunities, catalyze product innovations, and secure a competitive advantage."
Shutterstock's datasets include the image metadata, such as keywords, descriptions, geo-locations, and categories, which makes organizing and searching for images easier. The datasets span a wide range of industry categories, including food and beverage, transportation and autonomous vehicles, animals and wildlife, clothing and apparel, and travel, tourism and hospitality. [1] Shutterstock's image library plays a pivotal role in GenAI, serving as a foundational resource for training advanced AI models, including multimodal models like OpenAI's DALL-E.
Watch the demo below to learn more about Shutterstock's listing, how to access it and query it using a notebook.
With Shutterstock's listing on the marketplace, here are common use cases across industries that drive innovation:
Volumes are a type of object in Unity Catalog that simplifies the integration of non-tabular data as a collection of directories and files that you can access, store and manage in your governance framework.
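To make the "directories and files" idea concrete, here is a minimal Python sketch of how files in a Volume could be addressed and listed. It relies on the Unity Catalog path convention `/Volumes/<catalog>/<schema>/<volume>/<path>`; the helper names and the set of image suffixes are illustrative assumptions, and the `root` parameter exists only so the sketch can also run outside a Databricks workspace.

```python
from pathlib import Path

# Assumption: file suffixes treated as images in this sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def volume_path(catalog, schema, volume, *parts, root="/Volumes"):
    """Build a path of the form /Volumes/<catalog>/<schema>/<volume>/<path>."""
    return Path(root, catalog, schema, volume, *parts)


def list_images(directory):
    """Return image files found under `directory` (recursively), sorted by path."""
    return sorted(
        p
        for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

On a Databricks cluster with Unity Catalog enabled, a call such as `list_images(volume_path("my_catalog", "my_schema", "my_volume"))` would walk the Volume like an ordinary directory tree; the catalog, schema and volume names here are placeholders, not the names used by the Shutterstock listing.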
As we recently announced, you can now share Volumes through Delta Sharing, a capability available in Public Preview. With Volume Sharing, you can securely share extensive collections of non-tabular data such as PDFs, images, videos, audio files and other documents, along with tables, notebooks and AI models, across clouds, regions and accounts.
This free sample dataset from Shutterstock represents the first Volume-based listing offered on the Databricks Marketplace. With access to Shutterstock's diverse collection of images and accompanying metadata, you can use Volume Sharing to incorporate this dataset into Generative AI applications using a Retrieval Augmented Generation (RAG) technique without copying the data.
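The retrieval step of a RAG workflow over image metadata can be sketched in a few lines of Python. This is a simplified illustration, not the approach used in the listing's notebook: it ranks records by plain keyword overlap, whereas production systems typically use embeddings and a vector index. The record fields (`path`, `description`, `keywords`) and the sample values are hypothetical.

```python
# Hypothetical metadata records; each one points at a file in the shared
# Volume by path, so the images themselves are never copied.
RECORDS = [
    {"path": "images/0001.jpg",
     "description": "Fresh red apples in a wooden crate",
     "keywords": ["apple", "fruit", "food"]},
    {"path": "images/0002.jpg",
     "description": "City bus on a rainy street",
     "keywords": ["bus", "transportation"]},
    {"path": "images/0003.jpg",
     "description": "Green apple sliced on a plate",
     "keywords": ["apple", "fruit", "snack"]},
]


def tokenize(text):
    """Lower-case a string and split it into a set of punctuation-free words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query, records, k=2):
    """Rank records by keyword overlap with the query and return the top k."""
    query_terms = tokenize(query)
    scored = []
    for rec in records:
        terms = tokenize(rec["description"]) | {kw.lower() for kw in rec["keywords"]}
        score = len(query_terms & terms)
        if score > 0:
            scored.append((score, rec))
    # list.sort is stable, so records with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:k]]


def build_prompt(query, hits):
    """Place the retrieved metadata ahead of the question for a generative model."""
    context = "\n".join(f"- {h['path']}: {h['description']}" for h in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt(q, retrieve(q, RECORDS))` yields the augmented prompt that would be passed to a generative model; the model call itself is left out because the choice of model is outside the scope of this sketch.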
Volume Sharing accelerates collaboration between business units or partners and helps onboard new collaborators across clouds, platforms, and regions. Data providers on Databricks Marketplace, such as Shutterstock, can now easily share any non-tabular data with consumers. This approach democratizes data access and significantly reduces the time and resources required to obtain and use high-quality datasets.
Let's walk through an example of a fictitious retailer, Berkeley FoodMart, that wants to improve the product descriptions on its website. Well-optimized product listings are more likely to appear prominently in search engine results, attracting potential customers and increasing organic traffic. Additionally, optimized titles and descriptions compel users to click on the listings, resulting in higher click-through rates and more visitors exploring products.
The challenge? Like other grocers, Berkeley FoodMart carries 50,000 products in its stores with 20% turnover each year, which translates into hundreds of thousands or even millions of items needing appropriate descriptions. Manually maintaining descriptions for all of these products is cost-prohibitive, so existing descriptions are often limited in breadth.
Berkeley FoodMart will use Shutterstock's diverse image datasets, retrieved from Databricks Marketplace, to help automate this work. By combining Shutterstock's immense library of images, including brand and product data, with its own internal images, the retailer can run image-to-text analytics that generate the metadata and descriptions for products on its website.
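The shape of such an image-to-text pipeline might look like the Python sketch below. It is an assumption-laden outline rather than Berkeley FoodMart's actual implementation: `captioner` is a hypothetical stand-in for whatever image-to-text model is chosen, `ProductListing` is an illustrative record type, and the 160-character limit is an arbitrary example value.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ProductListing:
    """Illustrative record for one product's generated description."""
    sku: str
    image_path: str
    description: str


def describe_products(
    images: Iterable[tuple[str, str]],
    captioner: Callable[[str], str],
    max_chars: int = 160,  # assumption: arbitrary length limit for a listing
) -> list[ProductListing]:
    """Run a captioning callable over (sku, image_path) pairs.

    `captioner` is a placeholder for an image-to-text model; it receives an
    image path and returns a text caption.
    """
    listings = []
    for sku, image_path in images:
        # Collapse runs of whitespace in the raw model output.
        caption = " ".join(captioner(image_path).split())
        # Truncate over-long captions, leaving room for the ellipsis.
        if len(caption) > max_chars:
            caption = caption[: max_chars - 3].rstrip() + "..."
        listings.append(ProductListing(sku, image_path, caption))
    return listings
```

Keeping the model behind a plain callable means the same loop works whether the caption comes from a hosted API or a locally served model, and it lets the pipeline be exercised with a stub before any model is wired in.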
The future of AI and data-driven innovation is bright, and with tools like these at our disposal, there's no limit to what we can achieve together. Let's embark on this exciting journey and transform the landscape of technology and creativity.
Sources