I’m an LLM researcher with a passion for explaining scientific concepts to others.

Rubric-Based Rewards for RL by Cameron R. Wolfe, Ph.D.

Extending the benefits of large-scale RL training to non-verifiable domains...
