Artists vs. Scientists: The Evolution of AI Workflows for Creatives

At first, GenAI creative tools focused on highly abstracted, prompt-only interfaces. Now, we’re seeing that start to change.

Image generation is much more of an art than a science. Unlike text generation, where a single prompt generates a response, generating the right image can require multiple models and manipulations in sequence, each bringing the generation closer to the desired output. It’s similar to how an artist would layer a painting: starting with an outline, then adding shading and color until the end image resembles her vision.

Early tools in the image generation ecosystem abstract much of this away. On this side of the spectrum are single-shot tools like DALL-E and Microsoft Designer that work best for text-to-image (txt2img) and try to generate an acceptable image from a single prompt. They use large, high-quality models and LLM parsing to pick up the semantic nuance of the prompt in one run.

The Stable Diffusion (SD) community, in contrast, is full of mad scientists (artists?) linking together techniques to create their own distinctive styles. As this multi-step workflow pattern emerges, an ecosystem of tools like ComfyUI, A1111, and others has cropped up to support it. They’ve taken the hacky work of up-/downscaling, model chaining, and U-Net cross-attention tweaking and enabled creators to turn it into a templated, replicable process (see the sketch at the end of this post for a rough sense of what that chaining looks like in code). The results are pretty cool.

Ultimately, we see space for great companies across all levels of abstraction. If you’ve used one of these frameworks or are building in the space, drop me a line at dlabruna@baincapital.com! I’d love to learn more about what you’re working on.
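For the curious, here’s a minimal sketch of the multi-step pattern described above: a txt2img pass, a crude upscale, and an img2img refinement pass, written with Hugging Face’s diffusers library. This isn’t how any particular tool implements it, and the model ID, prompt, resolution, and strength value are just illustrative placeholders; node-based tools like ComfyUI wrap this same idea (and far more elaborate graphs) in a visual interface.

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# Step 1: text-to-image. Generate a rough first pass from the prompt alone.
txt2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder model; swap in your own
    torch_dtype=torch.float16,
).to("cuda")
prompt = "a watercolor painting of a lighthouse at dusk"
draft = txt2img(prompt).images[0]

# Step 2: a simple upscale of the draft. In a real workflow this might be a
# dedicated upscaler model chained in, rather than a plain resize.
draft = draft.resize((768, 768))

# Step 3: image-to-image. Reuse the same weights, feed the draft back in at
# low strength to add detail while preserving the overall composition.
img2img = StableDiffusionImg2ImgPipeline(**txt2img.components)
refined = img2img(prompt=prompt, image=draft, strength=0.4).images[0]
refined.save("lighthouse.png")
```

The strength knob is the interesting part: low values keep the composition from the first pass and only layer in detail, which is exactly the outline-then-shading process the painter analogy describes.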