Introduction to MuLan and the Multi-Object Generation Challenge

Explore how an agentic system design can solve complex, multi-object image generation challenges by adding a layer of planning, control, and feedback on top of standard text-to-image models.
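To make the contrast concrete, here is a toy sketch of the idea. It is not MuLan's implementation. It assumes hypothetical stand-ins throughout: `toy_t2i` plays a text-to-image model that can reliably draw only two objects per call, `toy_plan` plays a planner that splits the request into one object per step, and `toy_check` plays a feedback model that verifies each step. Here the "image" is just a tuple of object names. The `agentic` loop layers planning, control (bounded retries), and feedback on top of the same generator that `one_shot` calls once.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical toy setting: the "T2I model" reliably draws at most this many
# objects from a single prompt, mimicking one-shot failures on crowded prompts.
MAX_OBJECTS_PER_CALL = 2


@dataclass(frozen=True)
class Canvas:
    objects: tuple = ()  # objects drawn so far (stands in for an image)


def toy_t2i(prompt: str, canvas: Canvas) -> Canvas:
    """Hypothetical T2I model: draws only the first few requested objects."""
    requested = [p.strip() for p in prompt.split(" and ") if p.strip()]
    return Canvas(canvas.objects + tuple(requested[:MAX_OBJECTS_PER_CALL]))


def one_shot(prompt: str, t2i: Callable[[str, Canvas], Canvas]) -> Canvas:
    # Standard pipeline: one prompt in, one image out, no checking.
    return t2i(prompt, Canvas())


def toy_plan(prompt: str) -> List[str]:
    # Planning (hypothetical): decompose the request into one object per step.
    return [p.strip() for p in prompt.split(" and ") if p.strip()]


def toy_check(sub_prompt: str, canvas: Canvas) -> bool:
    # Feedback (hypothetical): is the object requested at this step present?
    return sub_prompt in canvas.objects


def agentic(prompt, plan, t2i, check, max_retries: int = 2) -> Canvas:
    canvas = Canvas()
    for sub_prompt in plan(prompt):
        # Control: try each step a bounded number of times.
        for _ in range(max_retries + 1):
            candidate = t2i(sub_prompt, canvas)
            if check(sub_prompt, candidate):
                canvas = candidate  # accept this stage and build on it
                break
        # If every attempt fails the check, the previous canvas is kept.
    return canvas


prompt = "a red cube and a blue ball and a green cone"
print(one_shot(prompt, toy_t2i).objects)
# → ('a red cube', 'a blue ball')
print(agentic(prompt, toy_plan, toy_t2i, toy_check).objects)
# → ('a red cube', 'a blue ball', 'a green cone')
```

The generator is identical in both paths; only the surrounding loop differs, which is the point of adding an agentic layer rather than replacing the underlying model.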

The problem space: Text-to-image generation

The one-shot process vs. an agentic architecture

In recent years, we’ve seen an explosion in the capabilities of text-to-image (T2I) models. These AI systems can take a simple text prompt and produce visually appealing, high-quality images in a single step. As the underlying models have grown stronger, they have also become markedly better at handling compositional requests.

However, as agentic system designers, our goal is to think beyond the capabilities of ...