Autonomous RL environment factory
Create a job. The backend researches anchors, builds worlds, verifies rewards, and packages the gym.
End-to-end pipeline: data injection, company modeling, sandbox, database, tools, testing, task synthesis, baseline gate, RL loop, rewards, GRPO, KL guardrail, arena evolution, packaging.
12+ Stages
Graph+Prog Tasks
VCode Rewards
Docker Output
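
The pipeline above names GRPO and a KL guardrail among its stages. As a rough illustration only, and not this product's actual implementation, the sketch below shows the usual group-relative advantage computation (each sampled rollout's reward is normalized against the other rollouts for the same task) and a simple threshold check on an estimated KL divergence. The function names, the `kl_limit` value, and the skip-the-update policy are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task.

    Each advantage is (reward - group mean) / (group std + eps), which is
    the group-relative baseline commonly associated with GRPO.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_guardrail_ok(kl_estimate, kl_limit=0.05):
    """Hypothetical guardrail: allow a policy update only while the
    estimated KL divergence from the reference policy stays within
    kl_limit (an assumed threshold, not a documented default)."""
    return kl_estimate <= kl_limit


# Example: four rollouts of one task, two passing and two failing.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
proceed = kl_guardrail_ok(kl_estimate=0.01)
```

With a binary, verified reward, passing rollouts get positive advantages and failing ones negative. When every rollout in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.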