Special thanks to Bill Abramovich and David Gray at Epsilon, Tanishq Bhalla at HealthVerity, Itai Weiss at Nimble, and JB Kole at Mostly.ai for their valuable insights and contributions to this blog.
This blog is the second installment in a new quarterly series that will showcase the latest listings, introduce new providers, and highlight exciting notebooks. The series reflects the impressive growth of the Databricks Marketplace.
Introducing Our New Data Providers
In Q2 2024, Databricks Marketplace continued to expand its offerings, welcoming 47 new data providers and adding over 115 new listings. This brings the total to more than 230 data providers and over 2,200 listings. This quarterly update highlights four new data providers: HealthVerity, Epsilon, Nimble, and Mostly.ai. Each brings unique and valuable datasets to the Marketplace.
Spotlight on Four New Additions to Databricks Marketplace
All of these providers offer valuable data products that span multiple industries, but we cannot cover them all in a single blog. We are particularly excited to highlight four new launches: Nimble, HealthVerity, Epsilon, and Mostly.ai. These four were hand-picked for their unique offerings, supporting notebooks and demos, and overall business impact.
1. HealthVerity: Advancing Healthcare Analytics
Use Case: Healthcare Claims Data Analysis
The HealthVerity taXonomy dataset on the Databricks Marketplace is the nation's most comprehensive closed claims dataset, encompassing over 245 million patient journeys from more than 225 payers, including commercial, Medicare, and Medicaid. This dataset is highly curated, de-duplicated, and HIPAA-certified, ensuring that it is research-ready from day one. It includes detailed patient data across all age groups, races, and geographies, and offers a superior volume of data on rare and orphan diseases.
Once accessed through the Databricks Marketplace, data scientists can use this dataset to enhance AI models and develop predictive analytics and machine learning algorithms. For example, a Databricks customer could use this dataset to look for patterns in breast cancer treatments, see how different treatments affect patient outcomes, or examine how patient characteristics influence treatment success. By using this data, healthcare providers can make better decisions, tailor treatments to individual patients, and ultimately improve care and outcomes for those battling breast cancer.
Explore the dataset and notebook here: HealthVerity Dataset.
2. Epsilon: Revolutionizing Marketing Strategies
Use Case: Fill in Missing Contact Data on Your Customers
Epsilon’s Contact Complete enables marketers to fill in missing contact information in their customer file and identify duplicate records. It does this by populating names and addresses for records where only a phone number, an email address, or a name and ZIP code is known.
Accurate and robust customer information is critical in multiple facets of customer relationships. This added level of foundational data enables clients to:
- Identify duplicate records.
- Increase platform activation rates via improved match rates.
- Know their customers better via higher append rates for data enrichment.
- Improve measurement across channels via improved customer recognition.
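To make the two core ideas concrete (filling in missing contact fields and flagging duplicate records), here is a minimal Python sketch. It is not Epsilon's implementation: the record layout, the `reference` lookup table, and all helper names are hypothetical, and production identity resolution is considerably more involved than exact matching on a normalized email or phone number.

```python
from collections import defaultdict

# Hypothetical customer file: some records only carry an email or a phone.
customers = [
    {"id": 1, "name": "Ana Ruiz", "address": "12 Elm St",
     "email": "ana@example.com", "phone": None},
    {"id": 2, "name": None, "address": None,
     "email": "ANA@example.com ", "phone": None},
    {"id": 3, "name": None, "address": None,
     "email": None, "phone": "555-0100"},
]

# Hypothetical reference data keyed by normalized email or phone,
# standing in for an external contact-enrichment source.
reference = {
    "ana@example.com": {"name": "Ana Ruiz", "address": "12 Elm St"},
    "5550100": {"name": "Ben Cho", "address": "9 Oak Ave"},
}

def normalize_email(email):
    return email.strip().lower() if email else None

def normalize_phone(phone):
    return "".join(ch for ch in phone if ch.isdigit()) if phone else None

def match_key(record):
    """Return the first available normalized identifier for a record."""
    return normalize_email(record["email"]) or normalize_phone(record["phone"])

def fill_missing(record):
    """Copy name and address from the reference source where missing."""
    filled = dict(record)
    match = reference.get(match_key(record), {})
    for field in ("name", "address"):
        if filled[field] is None:
            filled[field] = match.get(field)
    return filled

def find_duplicates(records):
    """Group record ids that share the same name and address."""
    groups = defaultdict(list)
    for r in records:
        if r["name"] and r["address"]:
            groups[(r["name"].lower(), r["address"].lower())].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

enriched = [fill_missing(r) for r in customers]
duplicates = find_duplicates(enriched)
print(duplicates)  # [[1, 2]]
```

In this toy example, record 2 only becomes recognizable as a duplicate of record 1 after its name and address are filled in, which is why enrichment and de-duplication are usually run together.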
Imagine a retail company that aims to enhance its marketing strategies through more meaningful engagement with customers. Here is how the data engineer, data scientist, business analyst, and marketing manager would collaborate using Epsilon’s Contact Complete service:
- Data Engineer: Responsible for the initial setup of data flows to and from Epsilon via Delta Sharing. Also responsible for ingesting the enhanced contact information into their customer data platform.
- Data Scientist: Responsible for applying this improved customer identity to enhance all aspects of the customer journey via duplicate record identification, enhanced matching to third-party data sources, increased platform onboarding rates, and more accurate measurement.
- Business Analyst: Analyzes the results of improved customer identity to generate better-informed insights and support strategic decision-making.
- Marketing Manager: Uses insights from a more robust ecosystem to develop and implement targeted marketing strategies, create personalized content, manage campaigns, and measure their effectiveness.
Discover the dataset here: Epsilon Dataset
3. Nimble: Optimizing Retail Operations
Use Case: Improve Pricing Strategy and Inventory Management
With Nimble's integration into the Databricks Marketplace, businesses can now enhance their Databricks Data Intelligence Platform with real-time, domain-specific web data. This connection enables users to extract more value from their AI and BI applications, generating prescriptive insights that drive business success.
Imagine a supermarket chain aiming to refine its pricing strategy and improve inventory management. With Nimble's dataset now accessible through the Databricks Marketplace, the chain can leverage real-time competitor pricing and inventory data across millions of SKUs and multiple channels. By integrating this data through Databricks Delta Sharing, the supermarket can ensure it is always working with the freshest, most accurate information. This integration allows for dynamic price adjustments and optimized inventory levels, and it minimizes instances of overstock and stockouts. As a result, the retailer stays competitive and responsive to market changes, quickly adapting to pricing trends and inventory demands.
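As a rough illustration of what a dynamic price adjustment driven by competitor data might look like, here is a small Python sketch. The pricing rule, the function name `suggest_price`, and the `min_margin` and `undercut` parameters are all assumptions made for this example; they are not part of Nimble's product or any Databricks API.

```python
def suggest_price(current_price, competitor_prices, unit_cost,
                  min_margin=0.10, undercut=0.01):
    """Suggest a price slightly below the cheapest competitor,
    without dropping below unit cost plus a minimum margin.

    This is a hypothetical rule for illustration only.
    """
    if not competitor_prices:
        # No market signal available, so keep the current price.
        return current_price
    floor = unit_cost * (1 + min_margin)
    target = min(competitor_prices) * (1 - undercut)
    return round(max(floor, target), 2)


# Competitors are at 4.79 and 5.10; undercut the cheapest by 1%.
print(suggest_price(4.99, [4.79, 5.10], unit_cost=3.50))  # 4.74

# A very cheap competitor would push us below margin, so the floor applies.
print(suggest_price(4.99, [3.60], unit_cost=3.50))  # 3.85
```

The margin floor is the important design choice here: fresh competitor data makes it easy to react quickly, so a rule like this needs a guard that stops automated updates from pricing below cost.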
See what other customers are saying about Nimble and Databricks
"By leveraging Nimble’s capabilities and the power of Databricks Delta Sharing, we reduced the time needed to respond to negative customer sentiment from weeks to mere hours. Nimble provides comprehensive, real-time visibility into customer opinions about our products and brands across all online channels, empowering us to act swiftly and effectively with data ready to use at any moment."— Leading consumer packaged goods (CPG) company
"Nimble's solution, combined with Databricks Delta Sharing, empowers us to surpass our financial targets by enriching our data and updating dashboards faster than any competitor tracking the same 140 tech stocks. With automatic feeds of signals from across the public web, Nimble uncovers insights in places others overlook or cannot access, ensuring our data is ready and actionable, giving us a competitive edge."—Leading financial services (Buy Side) firm
Discover the dataset here: Nimble Datasets
4. Mostly.ai: Enhancing Data Privacy
Use Case: Synthetic Data Generation
MOSTLY AI’s Solution Accelerator on the Databricks Marketplace leverages GenAI to create high-quality, privacy-preserving synthetic data. Synthetic data helps maintain privacy and compliance without sacrificing data utility and ensures faster and safer data access.
Imagine you are a data scientist at a bank who needs to analyze sensitive transaction data without risking privacy breaches. Traditional methods of data anonymization are unsafe and cumbersome to implement. With MOSTLY AI’s synthetic data, you can generate realistic, anonymous datasets that closely mirror your original data.
The workflow starts with a data scientist training a synthetic data generator using the MOSTLY AI package, where the model learns the statistical properties of the original data. Critical configuration details, such as the generator ID and API key, are securely saved in Unity Catalog. The synthetic data model is then registered in Unity Catalog, making it accessible without exposing sensitive production data. Finally, the registered model is used to generate synthetic data, which is stored in Unity Catalog for easy access and downstream use. This approach ensures privacy, maintains data utility, and accelerates the development of AI and machine learning projects.
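To give a feel for the "train, then generate" pattern described above, here is a deliberately simplified Python sketch. It does not use the MOSTLY AI package and is not how MOSTLY AI's generators work: it is a toy stand-in that learns only per-column statistics, ignores relationships between columns, and offers no privacy guarantees. The function names, column names, and sample rows are hypothetical, and the Unity Catalog registration steps are not shown.

```python
import random
import statistics
from collections import Counter

def train_generator(rows):
    """Learn simple per-column statistics from the original rows."""
    amounts = [r["amount"] for r in rows]
    categories = Counter(r["category"] for r in rows)
    return {
        "amount_mean": statistics.mean(amounts),
        "amount_stdev": statistics.stdev(amounts),
        "categories": list(categories.keys()),
        "weights": list(categories.values()),
    }

def generate(generator, n, seed=None):
    """Sample n synthetic rows from the learned statistics."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        amount = rng.gauss(generator["amount_mean"], generator["amount_stdev"])
        category = rng.choices(generator["categories"],
                               weights=generator["weights"])[0]
        # Clamp at zero so the toy model never emits negative amounts.
        rows.append({"amount": round(max(0.0, amount), 2),
                     "category": category})
    return rows

# Hypothetical "sensitive" transactions used only to fit the toy model.
original = [
    {"amount": 12.50, "category": "grocery"},
    {"amount": 80.00, "category": "travel"},
    {"amount": 23.10, "category": "grocery"},
    {"amount": 5.75, "category": "coffee"},
    {"amount": 41.30, "category": "grocery"},
]

generator = train_generator(original)
synthetic = generate(generator, n=5, seed=42)
for row in synthetic:
    print(row)
```

The point of the sketch is the separation of concerns: once the generator is trained, downstream users only need the generator, not the original rows, to produce new data.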
Take a look at the demo here
Discover the Solution Accelerator and MOSTLY AI Assets here: Mostly.ai Listings
Additional New Providers on Databricks Marketplace
Below are some additional new providers representing a facet of the diverse offerings available on Databricks Marketplace.
Domain | Providers and Offerings |
Marketing & Consumer Insights | Gain Dynamics provides public and open data sources in Spain and Latin America in the consumer behavior space. They monitor the behavior of over 2 million households in Spain and Latin America. NCSolutions helps marketers and media companies enhance advertising performance by providing CPG insights. |
Financial and Economic Analysis | OptionMetrics distributes its options, futures, beta, and dividend forecast databases to enable organizations to construct and test investment strategies, perform empirical research, and assess risk. Stocktwits distributes products to help users monitor messages and sentiment across their platform - a large investing community. |
Real Estate and Moving | GapMaps provides location intelligence and demographic data, which empowers decision makers to refine their network strategies with greater confidence and reduced risk. Reomnify provides comprehensive geospatial, real estate and web datasets, driving unique insights for companies worldwide. |
Healthcare and Life Sciences | Symmetric Information offers a dataset detailing the Anthem PPO negotiated rates with Internal Medicine providers. Shaip provides 20+ datasets that include physician dictation and de-identified EHR data. |
AI and Machine Learning | Kobai is a graph-based semantic layer. Their Genie Spaces Accelerator Kit demo enables quick setup of Genie Spaces for conversational chat. Bitext offers pretrained verticalized models designed to fine-tune and enhance the performance of LLMs in various applications, particularly in customer support. |
Conclusion
To get started with the Databricks Marketplace, visit marketplace.databricks.com. You can also learn more about how partners and customers are driving innovation with Databricks Marketplace by watching the recent sessions at Data + AI Summit 2024.