VLM-Feedback Control and Human-in-the-Loop Interaction
Explore MuLan’s self-correction mechanism powered by a VLM-based feedback loop, and understand how its step-by-step process enables powerful human-AI collaboration.
In our last lesson, we saw how MuLan’s planner and progressive generator work together to build a complex image step-by-step. But what happens if the diffusion model makes a mistake in an early stage? Without a mechanism to catch and correct errors, those mistakes would cascade through every later stage, ruining the final image.
A painter doesn’t just paint without looking; they constantly step back, critique their own work, and make corrections. To make its process robust, the MuLan system needs an internal “critic” that can do the same. This lesson explores that critic and how its stage-by-stage operation unlocks powerful human-AI collaboration.
VLM-feedback for self-correction
This is the third and final pillar of MuLan’s architecture: a VLM-feedback control loop. After each object is generated, a Vision Language Model (VLM), such as LLaVA-1.5, is used as a critic.
Its job is to perform a number of ...
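The control flow of such a feedback loop can be sketched as follows. This is a minimal illustration, not MuLan’s actual implementation: `generate_object` and `vlm_critic_check` are hypothetical stand-ins for the diffusion stage and the VLM critic, with stub logic so the loop can run end-to-end.

```python
# Sketch of a VLM-feedback control loop (hypothetical stand-ins, not
# MuLan's actual API). After each stage, a critic checks the result;
# on failure the stage is retried before errors can cascade.

def generate_object(canvas, obj, attempt):
    """Stand-in for one progressive-generation stage (a diffusion pass).
    Simulates a model that fails its first attempt at the "cat" object."""
    ok = not (obj == "cat" and attempt == 0)
    return canvas + [(obj, ok)]

def vlm_critic_check(canvas, obj):
    """Stand-in for the VLM critic (e.g. LLaVA-1.5) answering a
    question like: 'Is <obj> present and correctly rendered?'"""
    return dict(canvas)[obj]

def generate_with_feedback(objects, max_retries=2):
    """Generate objects one stage at a time; accept a stage only
    after the critic approves it, retrying on failure."""
    canvas = []
    for obj in objects:
        for attempt in range(max_retries + 1):
            candidate = generate_object(canvas, obj, attempt)
            if vlm_critic_check(candidate, obj):
                canvas = candidate  # accept this stage and move on
                break
        else:
            raise RuntimeError(f"could not correct stage for {obj!r}")
    return canvas

result = generate_with_feedback(["cat", "hat"])
print(result)
```

The key design point is that rejection is local: a failed attempt is discarded and only the current stage is redone, so earlier, already-approved stages are never disturbed.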