Introduction to MuLan and the Multi-Object Generation Challenge
Explore how an agentic system design can solve complex, multi-object image generation challenges by adding a layer of planning, control, and feedback on top of standard text-to-image models.
The problem space: Text-to-image generation
The one-shot process vs. an agentic architecture
In recent years, we’ve seen an explosion in the capabilities of text-to-image (T2I) models. These AI systems can take a simple text prompt and produce a visually appealing, high-quality image in one shot. As the underlying models have matured, they have also become remarkably better at handling compositional requests, meaning prompts that combine several objects, attributes, and spatial relationships in a single scene.
However, as agentic system designers, our goal is to think beyond the capabilities of ...