Image-generating technologies offer significant benefits for retail and consumer goods companies. By using generative models that produce both stylized and photo-realistic images from user prompts, marketing professionals, designers, and product development teams can quickly and effectively explore new ideas and designs. The primary requirement for using this AI technology is the ability of the user to clearly articulate a concept. Small teams of individuals focused on a shared objective can then pass prompts to the AI, generating visualizations that help them evaluate ideas and spark new ones. In a process facilitated by such technology, teams can reduce upfront investment costs, accelerate time to feedback and ultimately engage in a more creative process that leads to new, innovative and differentiating content and design concepts.
But while using models pre-trained on large volumes of generic images is great for producing cohesive imagery, most organizations seek to mimic patterns, designs and aesthetics specific to a particular brand or domain. In these instances, fine-tuning a model to understand these elements can be helpful in producing outputs better aligned with the needs of the organization. In this blog post, we will introduce the core concepts of how a model might be aligned in this manner with the hope that this helps our customers achieve more of the immediate benefits of this amazing technology.
Fine-Tuning a Model with Custom Imagery
To illustrate how a model might be fine-tuned to reflect brand and domain knowledge, let's imagine a scenario where a furniture designer wishes to ideate on some new chair designs. In this scenario, the designer may have selected a well-regarded image-generating model such as Stable Diffusion XL which has been trained on a large body of images assembled from the internet.
While this model is capable of producing a wide range of images, the designer may wish to enhance the model's understanding of the chairs the company has produced in the past. Knowledge of these items will help the model produce images aligned with the general direction of the brand, which is very important to the company as it seeks to establish a specific sense of design with its customers.
To enable this, the designer has their team photograph some of their key products. Each item is captured from different angles so that the model can learn how the items should be rendered in different configurations. Critically, a large number of images is not needed because the designer is building on the general knowledge already baked into the Stable Diffusion model.
For each of the images associated with a given style of chair, a description is provided. Each description contains a unique name (token) for the item that is the subject of the picture. This token helps the model not only identify the specific item in the image but also learn how it differs from the items in the other images on which the model has been trained. The remainder of the description is kept succinct so as not to interfere with knowledge the model has already accumulated from prior training on other images.
Chair | Token | Description |
---|---|---|
[image] | BCNCHAR | A photo of a BCNCHAR chair taken from the side. |
[image] | EMSLNG | A photo of an EMSLNG chair taken from the front. |
[image] | HSMNCHR | A photo of a HSMNCHR chair taken from the side. |
[image] | EMSRCK | A photo of a EMSRCK chair taken from the side. |
[image] | NRMCHR | A photo of a NRMCHR chair taken from the front. |
Figure 2. Descriptions for each of the five chairs selected by the sample furniture design company
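As a rough illustration of how such descriptions might be assembled, the sketch below pairs each photo with a short caption containing its token. The file names, the `text` key and the caption template are assumptions made for this example; the `file_name` key follows the `metadata.jsonl` convention used by Hugging Face image folders, which may or may not match the tooling you choose.

```python
import json

def caption(token: str, view: str) -> str:
    """Build a short caption that names the chair by its unique token.

    The wording is modeled on the table above; the a/an choice is a
    simplification made for this sketch.
    """
    article = "an" if token[0].upper() in "AEIOU" else "a"
    return f"A photo of {article} {token} chair taken from the {view}."

def metadata_lines(shots):
    """Turn (file_name, token, view) tuples into JSON Lines records."""
    return [
        json.dumps({"file_name": file_name, "text": caption(token, view)})
        for file_name, token, view in shots
    ]

# Hypothetical photos: the same chair captured from different angles.
shots = [
    ("emslng_01.jpg", "EMSLNG", "front"),
    ("emslng_02.jpg", "EMSLNG", "side"),
    ("bcnchar_01.jpg", "BCNCHAR", "side"),
]

for line in metadata_lines(shots):
    print(line)
```

Keeping the caption to little more than the token and the viewing angle mirrors the point made above: the token carries the new information, and everything else relies on what the base model already knows.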
The team then fine-tunes the off-the-shelf Stable Diffusion XL model using DreamBooth, a framework for fine-tuning image-generating models. The resulting model is saved for re-use and can now produce outputs better aligned with the needs of the designer and their team (Figure 3).
Original Stable Diffusion XL | Fine-Tuned Stable Diffusion XL |
---|---|
[image] | [image] |
Figure 3. Output images from the original Stable Diffusion XL model and a version of the model fine-tuned with the images in Figure 1, both given the prompt "A photo of a brown leather (EMSLNG) chair"
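This post does not say which DreamBooth implementation or hyperparameters were used. As one possibility, the sketch below assembles a launch command for the `train_dreambooth_lora_sdxl.py` example script from the Hugging Face diffusers repository, which trains LoRA weights rather than a full copy of the model. The directory paths are hypothetical and the hyperparameter values are illustrative placeholders, not tuned recommendations.

```python
import shlex

def dreambooth_sdxl_command(instance_dir, output_dir, instance_prompt,
                            max_train_steps=500):
    """Assemble an `accelerate launch` command for the diffusers example script.

    Assumes train_dreambooth_lora_sdxl.py has been copied into the working
    directory. All values below are placeholders for illustration.
    """
    args = {
        "pretrained_model_name_or_path": "stabilityai/stable-diffusion-xl-base-1.0",
        "instance_data_dir": instance_dir,      # folder holding the chair photos
        "instance_prompt": instance_prompt,     # caption containing the token
        "output_dir": output_dir,               # where the LoRA weights are saved
        "resolution": 1024,
        "train_batch_size": 1,
        "gradient_accumulation_steps": 4,
        "learning_rate": 1e-4,
        "max_train_steps": max_train_steps,
        "mixed_precision": "fp16",
        "seed": 0,
    }
    cmd = ["accelerate", "launch", "train_dreambooth_lora_sdxl.py"]
    cmd += [f"--{name}={value}" for name, value in args.items()]
    return cmd

cmd = dreambooth_sdxl_command(
    instance_dir="/tmp/chairs/EMSLNG",          # hypothetical path
    output_dir="/tmp/models/emslng-lora",       # hypothetical path
    instance_prompt="A photo of an EMSLNG chair",
)
print(shlex.join(cmd))
```

The command can be run from a terminal or passed to `subprocess.run(cmd, check=True)` on a GPU-equipped machine. Note that this form applies a single prompt to every image in the folder; using a separate description per image, as in Figure 2, would require supplying the script with a captioned dataset instead.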
Armed with this model, the design team can now explore new variations of their products (Figure 4) and even produce altogether new items that reflect the designs of previously produced items in their portfolio (Figure 5).
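To make this exploration concrete, the following sketch shows how a prompt containing a learned token might be passed to a fine-tuned model. It assumes the fine-tuning produced LoRA weights loadable by the Hugging Face diffusers library and that a CUDA GPU is available; the `lora_dir` path and the helper names are hypothetical, and the accelerator notebooks may take a different approach.

```python
def build_prompt(token: str, description: str) -> str:
    """Combine free-form attributes with a learned token, as in Figure 3."""
    return f"A photo of a {description} ({token}) chair"

def generate(prompt: str, lora_dir: str, seed: int = 0,
             base_model: str = "stabilityai/stable-diffusion-xl-base-1.0"):
    """Generate one image with the base model plus fine-tuned LoRA weights.

    Heavy dependencies are imported here so the prompt helper above can be
    used on machines without a GPU or the diffusers library installed.
    """
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        base_model, torch_dtype=torch.float16
    )
    pipe.load_lora_weights(lora_dir)  # assumption: LoRA weights saved here
    pipe.to("cuda")
    # A fixed seed makes it easier to compare variations of the same prompt.
    generator = torch.Generator("cuda").manual_seed(seed)
    return pipe(prompt=prompt, generator=generator).images[0]

prompt = build_prompt("EMSLNG", "brown leather")
print(prompt)
# On a GPU machine (hypothetical path):
# image = generate(prompt, lora_dir="/tmp/models/emslng-lora")
# image.save("emslng_brown_leather.png")
```

Varying only the description while holding the token and seed fixed is one simple way to explore variations of a single product.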
Enabling Model Customization with Databricks
Fine-tuning an image-generating model provides organizations with a powerful tool for exploring new ideas and designs. But to deliver this capability, they must be able to bring together a generative AI model with proprietary information assets, perform the heavy computational work of model fine-tuning and deploy the updated model in a manner that supports integration with a wide range of user applications. All of these capabilities and more are made available through the Databricks Data Intelligence Platform.
With Databricks, organizations have the ability to store, process and query both structured and unstructured information assets. Managed behind a centralized data governance layer, this data can be exposed to report consumers, analysts and data scientists to enable the widest range of consumption while preserving consistent controls over its utilization. With elastic scalability and support for the latest in GPU architectures, high performance workloads can be scaled effectively to ensure that organizations can turn around critical workloads operating on this data in a timely manner. And as an open platform, organizations can leverage both open source and proprietary models and enabling technologies, helping to ensure that as the organization's needs evolve, the platform can evolve with them.
Using built-in model management capabilities, off-the-shelf and customized models can be captured, evaluated, and transitioned to production deployment. Through native model serving, these models can be exposed using open and secure interfaces widely supported by modern applications and user interface technologies. With the Databricks Data Intelligence Platform, the process of turning your information assets into differentiating capabilities is greatly simplified, which is why so many organizations are adopting it for the full breadth of their data and AI needs.
Want to see how Databricks can be used to fine-tune an image-generating model to deliver brand-aligned images such as the ones shown above? Check out our latest solution accelerator. In the free-to-access notebooks, you will find step-by-step instructions and documented code illustrating the end-to-end process of turning an off-the-shelf model into a customized solution tailored to your needs.