Research paper

Continuous Control from Open-Vocabulary Feedback

Replacing visual rewards with motion-language alignment to enable fast, real-time instruction following in MuJoCo, without vision.

Snapshot

What the project is about (and what it achieved).

Problem

Slow reward pipelines

Many language-conditioned control methods render the simulator and compute rewards from the rendered frames with a vision-language model (e.g., CLIP). Paying for rendering plus a CLIP forward pass at every reward evaluation makes this render-to-CLIP pipeline expensive and real-time human-in-the-loop control impractical.

Key idea

No vision needed

Compute reward directly from joint trajectories: use motion-language similarity as the reward signal, bypassing visual rendering entirely.
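A minimal sketch of such a reward, assuming it is the cosine similarity between a motion embedding and an instruction embedding (a common choice for contrastive encoders, not confirmed by this page). `encode_motion` and the toy `mean_pool` encoder are hypothetical stand-ins for a pretrained motion encoder; only joint-space data flows through the function, with no rendering step.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate embedding: treat as no alignment
    return dot / (norm_a * norm_b)


def motion_language_reward(
    joint_trajectory: Sequence[Vector],
    text_embedding: Vector,
    encode_motion: Callable[[Sequence[Vector]], Vector],
) -> float:
    """Reward = similarity between the encoded motion window and the instruction.

    encode_motion is a hypothetical stand-in for a pretrained motion encoder.
    """
    motion_embedding = encode_motion(joint_trajectory)
    return cosine_similarity(motion_embedding, text_embedding)


def mean_pool(trajectory: Sequence[Vector]) -> Vector:
    """Toy 'encoder' for illustration only: average the per-frame features."""
    n = len(trajectory)
    return [sum(frame[i] for frame in trajectory) / n for i in range(len(trajectory[0]))]


# Usage with toy data: the motion points the same way as the text embedding.
reward = motion_language_reward([[1.0, 0.0], [1.0, 0.0]], [1.0, 0.0], mean_pool)
print(reward)
```

Because the text embedding depends only on the instruction, it can be computed once and reused, so per-step cost is dominated by the motion encoder.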

Reward model

MotionGPT

Convert MuJoCo motion features to a HumanML3D-style representation and score how well the motion matches the instruction using MotionGPT’s pretrained motion encoder.
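The page does not spell out the conversion, so the following is only a hypothetical, simplified sketch of a HumanML3D-inspired feature vector built from MuJoCo data. It assumes per-frame root and joint positions in world coordinates (MuJoCo is z-up) and a fixed timestep `dt`. It keeps ground-plane root velocity, root height, root-relative joint positions, and finite-difference joint velocities, and omits parts of the full HumanML3D format such as joint rotations, foot contacts, root angular velocity, and facing-direction canonicalization. How non-humanoid bodies (Ant, HalfCheetah) are mapped onto a human skeleton is not described here, and the function name is illustrative.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def humanml3d_style_features(
    root_positions: Sequence[Vec3],             # per-frame root xyz (MuJoCo z-up)
    joint_positions: Sequence[Sequence[Vec3]],  # per-frame list of joint xyz, world frame
    dt: float,                                  # time between frames in seconds
) -> List[List[float]]:
    """Simplified per-frame features loosely following HumanML3D.

    For each frame t >= 1 the vector is:
      [root vx, root vy]                 ground-plane root velocity
      [root z]                           root height
      3 values per joint                 joint position relative to the root
      3 values per joint                 joint velocity (finite difference)
    """
    if len(root_positions) != len(joint_positions):
        raise ValueError("root and joint sequences must have the same length")
    features: List[List[float]] = []
    for t in range(1, len(root_positions)):
        root, prev_root = root_positions[t], root_positions[t - 1]
        frame = [
            (root[0] - prev_root[0]) / dt,
            (root[1] - prev_root[1]) / dt,
            root[2],
        ]
        for joint in joint_positions[t]:
            frame.extend(joint[k] - root[k] for k in range(3))
        for joint, prev_joint in zip(joint_positions[t], joint_positions[t - 1]):
            frame.extend((joint[k] - prev_joint[k]) / dt for k in range(3))
        features.append(frame)
    return features


# Usage: two frames, one joint, root and joint both moving forward along x.
feats = humanml3d_style_features(
    root_positions=[(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)],
    joint_positions=[[(0.0, 0.0, 0.5)], [(0.1, 0.0, 0.5)]],
    dt=0.1,
)
print(feats)
```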

Learning setup

Hierarchical RL

Train locomotion policies with PPO using a simple hierarchy: a high-level policy outputs target joint positions, and a low-level PD controller converts them into stable joint torques.
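A sketch of the low-level controller, assuming a standard PD law with one gain pair shared across joints and symmetric torque limits (the actual gains, limits, and control rates are not given on this page). `simulate_1dof` is a hypothetical toy: a single unit-inertia joint, integrated with semi-implicit Euler, used only to show the PD loop tracking a target that the high-level policy might emit.

```python
from typing import List, Sequence


def pd_torques(
    q_target: Sequence[float],  # high-level policy output: target joint positions
    q: Sequence[float],         # current joint positions
    qdot: Sequence[float],      # current joint velocities
    kp: float,
    kd: float,
    torque_limit: float,
) -> List[float]:
    """PD law tau = kp * (q_target - q) - kd * qdot, clipped to [-limit, limit]."""
    torques = []
    for qt, qi, vi in zip(q_target, q, qdot):
        tau = kp * (qt - qi) - kd * vi
        torques.append(max(-torque_limit, min(torque_limit, tau)))
    return torques


def simulate_1dof(
    q_target: float,
    steps: int,
    dt: float = 0.01,
    kp: float = 100.0,
    kd: float = 20.0,
    torque_limit: float = 200.0,
) -> float:
    """Toy unit-inertia joint starting at rest at q = 0; returns the final position."""
    q, qdot = 0.0, 0.0
    for _ in range(steps):
        (tau,) = pd_torques([q_target], [q], [qdot], kp, kd, torque_limit)
        qdot += tau * dt  # unit inertia: acceleration equals torque
        q += qdot * dt    # semi-implicit Euler: use the updated velocity
    return q


print(simulate_1dof(q_target=1.0, steps=300))
```

With `kd = 2 * sqrt(kp)` the toy joint is critically damped in continuous time. One common arrangement, assumed here rather than taken from the paper, is to hold each high-level action fixed while the PD loop runs for several simulator substeps.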

Efficiency

28.8× faster

Motion-language rewards are computed in ~0.52 ms (≈1,938 rewards/s) versus ~14.85 ms for CLIP + rendering, with ~32.5× lower GPU memory usage.
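A quick arithmetic check of these figures. The ~0.52 ms latency is rounded: the latency implied by 1,938 rewards/s is about 0.516 ms, and dividing the 14.85 ms CLIP + rendering latency by that value reproduces the 28.8× figure, whereas dividing by the rounded 0.52 ms gives about 28.6×.

```python
def per_second_to_ms(rate_per_s: float) -> float:
    """Convert a throughput (items per second) into a per-item latency in ms."""
    return 1000.0 / rate_per_s


motion_ms = per_second_to_ms(1938)     # latency implied by ~1,938 rewards/s
clip_ms = 14.85                        # reported CLIP + rendering latency
speedup = clip_ms / motion_ms          # uses the unrounded latency
speedup_from_rounded = clip_ms / 0.52  # uses the rounded ~0.52 ms instead

print(f"motion reward latency: {motion_ms:.3f} ms")
print(f"speedup: {speedup:.1f}x")
print(f"speedup from rounded latency: {speedup_from_rounded:.1f}x")
```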

Real-time control

Interactive loop

The speedup enables live instruction following: users issue natural-language commands and see the agent respond at interactive rates.
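The page does not describe how the interactive loop is wired, so this is one plausible structure, not the project's implementation. `InstructionSwitcher` and `encode_text` are hypothetical: commands arrive on a `queue.Queue` (for example from a UI thread), the control loop polls without blocking, the newest command wins, and each distinct instruction is encoded only once so per-step work stays on the cheap motion side.

```python
import queue
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


class InstructionSwitcher:
    """Hypothetical helper that tracks the user's latest command without blocking."""

    def __init__(self, encode_text: Callable[[str], Vector]) -> None:
        self._encode_text = encode_text  # stand-in for a text encoder
        self._commands: "queue.Queue[str]" = queue.Queue()
        self._cache: Dict[str, Vector] = {}
        self.current: Optional[str] = None

    def submit(self, command: str) -> None:
        """Called from an input/UI thread when the user types a new instruction."""
        self._commands.put(command)

    def poll(self) -> Optional[Vector]:
        """Drain pending commands, keep the newest, and return its (cached) embedding."""
        while True:
            try:
                self.current = self._commands.get_nowait()
            except queue.Empty:
                break
        if self.current is None:
            return None  # no instruction issued yet
        if self.current not in self._cache:
            self._cache[self.current] = self._encode_text(self.current)
        return self._cache[self.current]


# Usage with a fake encoder that records which texts were encoded.
calls: List[str] = []


def fake_encode(text: str) -> Vector:
    calls.append(text)
    return [float(len(text))]


switcher = InstructionSwitcher(fake_encode)
switcher.submit("walk forward")
switcher.submit("sprint forward")
emb = switcher.poll()        # newest command wins; "walk forward" is never encoded
emb_again = switcher.poll()  # served from the cache, no re-encoding
```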

Generalization

97–99% transfer

Policies trained on a single instruction retain 97–99% of their motion-language similarity when evaluated on paraphrases such as “sprint forward” and “dash forward”, indicating semantic generalization rather than memorization of one exact phrasing.
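Assuming the 97–99% figure is each paraphrase's similarity expressed as a percentage of the similarity achieved on the training instruction (the page does not state the exact formula), the metric can be sketched as below. The function name is hypothetical, and the numbers in the usage lines are illustrative placeholders, not results from the paper.

```python
from typing import Dict


def paraphrase_retention(
    trained_similarity: float,
    paraphrase_similarities: Dict[str, float],
) -> Dict[str, float]:
    """Each paraphrase's similarity as a percentage of the trained instruction's similarity."""
    if trained_similarity <= 0.0:
        raise ValueError("trained-instruction similarity must be positive")
    return {
        text: 100.0 * sim / trained_similarity
        for text, sim in paraphrase_similarities.items()
    }


# Illustrative placeholder values only.
retention = paraphrase_retention(0.80, {"sprint forward": 0.784, "dash forward": 0.792})
for text, pct in retention.items():
    print(f"{text}: {pct:.1f}%")
```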

Evaluation

5 MuJoCo envs

Tested on Humanoid, Ant, HalfCheetah, Walker2d, and Hopper. Alignment is strong for planar and quadruped locomotion, while Humanoid and Hopper are limited by stability and morphology.