Research notes

Motion-language / motion control

Open-vocabulary natural-language instruction following without vision, using motion-language alignment + hierarchical RL.

Snapshot

High-level overview.

Goal

Instruction-following

Enable MuJoCo agents to follow open-vocabulary natural-language instructions without vision by combining motion-language alignment with hierarchical reinforcement learning.

Motion representation

Motion tokens

Use a MotionGPT-style pipeline: VQ-VAE motion tokenization + motion-language alignment to connect text to motion.
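The core of the tokenization step is vector quantization: each continuous per-frame motion feature is snapped to its nearest entry in a learned codebook, yielding discrete motion tokens that a language model can consume. A minimal sketch of that lookup, assuming hypothetical codebook size K=512 and feature dimension D=64 (the real VQ-VAE would also include encoder/decoder networks and a training loss):

```python
import numpy as np

def quantize(features, codebook):
    """Map each continuous motion feature vector to its nearest codebook entry.

    features: (T, D) array of per-frame motion features (assumed shape).
    codebook: (K, D) array of learned code vectors (assumed shape).
    Returns discrete token ids (T,) and the quantized vectors (T, D).
    """
    # Squared Euclidean distance from every frame to every code.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    tokens = d.argmin(axis=1)           # discrete motion tokens
    return tokens, codebook[tokens]     # quantized reconstruction targets

rng = np.random.default_rng(0)
codes = rng.normal(size=(512, 64))      # K=512 codes, D=64 dims (illustrative)
motion = rng.normal(size=(30, 64))      # a 30-frame motion clip
tokens, quantized = quantize(motion, codes)
```

Once motion is discrete, text-to-motion alignment reduces to sequence modeling over a shared token vocabulary, which is the key idea MotionGPT exploits.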

Hierarchy

High/low level

High-level policy selects skills conditioned on language; low-level controller executes atomic motions learned from mocap.
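The control loop implied here can be sketched as a two-timescale decision process: the high-level policy picks a skill id from the language embedding every few steps, and the low-level controller runs at every step. Everything below is a placeholder (the skill table, the decision horizon, and both policies are stand-ins for learned networks):

```python
import numpy as np

# Hypothetical skill library: each skill id maps to a low-level behavior
# (here, a fixed action direction standing in for a mocap-trained controller).
SKILLS = {0: np.array([0.5, 0.0]), 1: np.array([0.0, 0.5])}

def high_level_policy(instruction_embedding, step, horizon=10):
    """Re-select a skill every `horizon` steps, conditioned on language.
    Placeholder logic: index derived from the embedding's argmax."""
    if step % horizon == 0:
        return int(np.argmax(instruction_embedding) % len(SKILLS))
    return None  # keep the current skill between decision points

def low_level_policy(skill_id, obs):
    """Execute the atomic motion for the active skill (stub controller)."""
    return SKILLS[skill_id] + 0.01 * obs[:2]

obs = np.zeros(4)
skill = 0
for t in range(20):
    choice = high_level_policy(np.array([0.1, 0.9]), t)
    if choice is not None:
        skill = choice
    action = low_level_policy(skill, obs)
    # obs would be updated here by stepping the MuJoCo environment
```

The two-timescale split is what lets the language-conditioned policy operate over abstract skills while the low-level controller handles joint-level actuation.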

Evaluation

MuJoCo

Evaluate on Humanoid, HalfCheetah, Ant, and manipulation tasks using motion-language alignment as the reward signal.
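Using alignment as the reward typically means embedding the rollout's motion and the instruction into the shared space and scoring their similarity. A minimal sketch, assuming cosine similarity over pre-computed embeddings (the embedding models themselves are not shown):

```python
import numpy as np

def alignment_reward(motion_embedding, text_embedding):
    """Cosine similarity between the embedded rollout motion and the
    instruction, used as a dense reward (one common choice; both
    embedding models are assumed to exist and share a space)."""
    m = motion_embedding / (np.linalg.norm(motion_embedding) + 1e-8)
    t = text_embedding / (np.linalg.norm(text_embedding) + 1e-8)
    return float(m @ t)

# A rollout whose motion embedding matches the instruction scores near 1.
r_match = alignment_reward(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
r_orth = alignment_reward(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

This avoids hand-designed per-task rewards: the same scoring function covers locomotion and manipulation instructions alike, which is what makes open-vocabulary evaluation feasible.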