Problem
Many language-conditioned control methods rely on rendering the simulator and computing rewards with vision-language models (e.g., CLIP). That render-to-CLIP pipeline is expensive and makes real-time human-in-the-loop control impractical.
Research paper
Replacing visual rewards with motion-language alignment to enable fast, real-time instruction following in MuJoCo, with no vision pipeline at all.
What the project is about (and what it achieved).
Compute reward directly from joint trajectories: use motion-language similarity as the reward signal, bypassing visual rendering entirely.
Convert MuJoCo motion features to a HumanML3D-style representation and score how well the motion matches the instruction using MotionGPT’s pretrained motion encoder.
Train locomotion policies with PPO using a simple hierarchy: a high-level policy outputs target joint positions, and a low-level PD controller executes stable torques.
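The reward computation in the steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_encoder` is a hypothetical stand-in for MotionGPT's pretrained motion encoder, and the HumanML3D-style feature conversion is omitted.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-8
    return float(np.dot(a, b) / denom)

def motion_language_reward(joint_traj, text_emb, encode_motion):
    """Score a joint trajectory against an instruction embedding.

    joint_traj:    (T, D) array of per-step motion features.
    text_emb:      (E,) embedding of the language instruction.
    encode_motion: maps the trajectory to an (E,) motion embedding
                   (MotionGPT's pretrained encoder in the project).
    """
    motion_emb = encode_motion(joint_traj)
    return cosine_similarity(motion_emb, text_emb)

def toy_encoder(traj):
    """Hypothetical stand-in encoder: mean-pool features over time."""
    return traj.mean(axis=0)
```

Because no frames are rendered and no vision model runs, each reward reduces to one encoder forward pass plus a dot product, which is what makes sub-millisecond reward times plausible.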
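The low-level controller can be sketched as a standard PD law. The gains and the unit-mass toy dynamics below are illustrative assumptions, not the project's tuned values:

```python
import numpy as np

def pd_torques(q, qdot, q_target, kp=50.0, kd=10.0):
    """PD control: pull joints toward target positions, damp velocity."""
    return kp * (q_target - q) - kd * qdot

def settle(q_target, steps=2000, dt=0.002):
    """Integrate a unit-mass joint under PD control (semi-implicit Euler)."""
    q = np.zeros_like(q_target)
    qdot = np.zeros_like(q_target)
    for _ in range(steps):
        tau = pd_torques(q, qdot, q_target)
        qdot = qdot + tau * dt   # unit mass: acceleration equals torque
        q = q + qdot * dt
    return q
```

In the hierarchy, the high-level PPO policy emits `q_target` while this loop runs at the simulator rate, which keeps the executed torques stable.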
Motion-language rewards compute in ~0.52 ms (≈1,938 rewards/s) versus ~14.85 ms for CLIP with rendering: roughly a 28× speedup, with ~32.5× lower GPU memory usage.
The speedup enables live instruction following: users can issue natural language commands and observe immediate agent responses at interactive rates.
Policies trained on a single instruction retain high motion-language similarity on paraphrases such as “sprint forward” and “dash forward”, indicating semantic generalization rather than rote memorization of one command.
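A minimal way to probe this paraphrase robustness is to compare one motion embedding against several instruction embeddings. The vectors below are toy stand-ins for MotionGPT text and motion features, chosen so that paraphrases point in nearly the same direction:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def rank_instructions(motion_emb, text_embs):
    """Return instruction names sorted by similarity to the motion, best first."""
    scored = {name: cosine(motion_emb, emb) for name, emb in text_embs.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Toy embeddings: the two "forward" paraphrases are nearly parallel.
motion = np.array([1.0, 0.1, 0.0])
texts = {
    "run forward":  np.array([0.9, 0.2, 0.0]),
    "dash forward": np.array([0.95, 0.15, 0.1]),
    "stand still":  np.array([0.0, 0.0, 1.0]),
}
```

If the trained motion scores high against both paraphrases but low against an unrelated command, the policy's behavior tracks instruction meaning rather than surface wording.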
Tested on Humanoid, Ant, HalfCheetah, Walker2d, and Hopper. Alignment is strong for planar and quadruped locomotion; Humanoid and Hopper are limited by stability and morphology mismatch.