Simplifying Product On-Boarding with Generative AI
Summary
- Generative AI enhances product onboarding by improving data accuracy, consistency, and completeness for both suppliers and retailers.
- AI can process and cleanse product descriptions, analyze images, and generate relevant search terms to streamline the process.
- Databricks’ platform supports seamless integration of AI workflows, offering batch or real-time processing for scalable solutions.
Registering new products can be a complex and time-consuming process for both suppliers and retailers. Retailers often report issues with incomplete, inaccurate, or low-quality product information, which hinders the onboarding process. Suppliers, on the other hand, often find themselves overwhelmed by redundant or overlapping requests for information and struggle to provide the extensive details required by their retail partners. With the number of products available, especially on online sites, continually expanding, the need to improve this process for both parties is only growing, and through the use of generative AI, we can do just that.
Using Generative AI to Tackle Common Product Data Challenges
How we might approach this opportunity depends on the particular challenges we face during product on-boarding. At a minimum, we might inspect various elements like product names and descriptions and ask a generative AI model if these details are consistent and, if not, why. We might also look for common issues like the inclusion of misspelled words, abbreviations and technical specs that belong in other sections and ask the model to cleanse these for us (Figure 1).
Description Before Applying Gen AI |
Description After Applying Gen AI |
58-inch gas grill features 4 tube burners and 1 side burner Stainless-steel construction in satin finish with painted sides and back 60,000 BTUs of LP gas; cast-iron grill panels 706 square inches of cooking surface; rear rack for buns, etc. Measures 64 by 21 by 37-1/2 inches; 1-year warranty |
This 58-inch gas grill features a stainless-steel construction with a satin finish, four tube burners, and a side burner, providing 60,000 BTUs of power. It has 706 square inches of cooking space, a rear rack for storage, and a durable cast-iron grill panel. |
Figure 1. A sample product’s before and after description after the Llama 3.1 8B Instruct model was asked to make the text more accessible.
Taking things a step further, we might request a model to examine the images associated with a product and extract an item description with which we might compare other elements to again check for consistency (Figure 2).
Product Image |
Generated Description |
The product in the image is a stainless steel grill with a lid, four burners, and a side shelf. The grill has a rectangular shape with a rounded top and a flat bottom. It features four burners along the top, each with a knob for adjusting the flame. A side shelf provides additional space for food preparation or storage. The grill is supported by a stand with wheels, allowing for easy mobility. The overall design suggests a high-quality, durable grill suitable for outdoor cooking. |
Figure 2. A product’s image and a description extracted using the Llama 2.3 11B Vision model.
To assist with searches, we might ask the model to use the provided as well as the extracted descriptions (and related metadata) to suggest keywords and search terms (Figure 3).
Suggested Keywords & Phrases |
stainless-steel | 58-inch | gas | grill | four-burner | side-burner | 60,000-BTU | 706-square-inch | cast-iron | grill-panel | silver | satin-finish | cooking-space | rear-rack | storage | outdoor-kitchen | patio-grill | large-grill | heavy-duty-grill | commercial-grade-grill | high-power-grill |
Figure 3. Search terms generated for the grill described in Figures 1 and 2 using the Llama 3.1 8B Instruct model.
We might also ask the model to determine key properties from the image, such as the item’s primary and use that information to address any details a supplier may not have provided during registration (Figure 4).
Product Image |
Extracted Color |
Silver |
Figure 4. A product’s image and the primary color as determined using the Llama 2.3 11B Vision model.
One of the core challenges with using these models these ways is that the outputs may not always conform to the constraints we may define for a field. For example, we might extract a value of Silver for the primary color of an appliance when we require the color to align with supported choices of either Grey or Metallic. In these scenarios, we might provide the model with a list of acceptable choices and ask it to limit its response to the one best aligned with the item being inspected.
Still another approach might be to use various properties to perform a semantic search, a generative AI technique where in text or images are converted into numerical indices where conceptually similar items tend to be positioned close to one another. Using this technique with a pre-approved set of high-quality item details, we might identify closely related items and retrieve relevant properties, such as their position in a product hierarchy, from them.
Armed with a wide range of approaches, we have choices to make as to how we will structure the application as well. In early implementations, we are seeing organizations implement batch processes, validating and correcting data inputs after supplier submittal, so that existing product on-boarding procedures aren’t disrupted. Once prompts and models are adequately tuned to provide reliable results, we often see interest in moving towards the development of new onboarding applications where generative AI is employed at the time of data entry, identifying issues as they emerge and prompting suppliers with suggested alternatives. Both approaches can be effective but differ in terms of the change management involved.
Employing the Databricks Platform to Build the Solution
Whether batch or real-time, the implementation of these generative AI workflows is simplified by the Databricks Data Intelligence Platform. With support for a wide variety of data formats, Databricks can process the structured and unstructured data inputs with ease. Due to its open nature, the platform supports a wide range of generative AI models, many of the most popular of which are pre-integrated for easier access. Peripheral technologies such as a vector store, a specialized database enabling semantic search, is also pre-integrated, simplifying implementation.
Regarding the application to be constructed, Databricks also provides support for batch and real-time workflows allowing data to be processed behind the scenes as new information arrives. For those instances where an interactive, user-facing application is preferred, the built-in application capabilities of the platform simplify the construction and deployment of scalable, integrated solutions to both internal and external audiences.
The breadth of capabilities in the Databricks Data Intelligence Platform allows organizations looking to build product on-boarding solutions to focus on the details of what they want to enable and not how they might bring together the pieces needed to build it.
Want to See This in Action?
To help demonstrate how organizations might use generative AI on the Databricks Data Intelligence Platform to solve common product on-boarding problems, we’ve built a new solution accelerator demonstrating numerous techniques. Using product images and metadata from the Amazon Berkeley Objects (ABO) Dataset, we demonstrate how these techniques may be employed in a batch processing workflow to identify and correct numerous issues. Withholding some details from the generative AI models, we are able to spot check the corrections being made in order to gain confidence that our selected models are performing as expected. We encourage those organizations interested in using gen AI to solve product on-boarding challenges to review our code, take inspiration from the techniques shown, borrow any code which works for them and get started building their product on-boarding solutions today with Databricks.
Download our Solution Accelerator for Prodcut Onboarding with Generative AI.