GRPO Llama-1B: This GitHub demo shows how to use GRPO (Group Relative Policy Optimization) to train custom LLMs.
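GRPO does not train a separate value (critic) model. Instead, it samples a group of completions for each prompt, scores each one with a reward function, and uses the group's mean reward as the baseline. The Python sketch below shows only that advantage step, under some assumptions: the function name `group_relative_advantages` and the example rewards are hypothetical, and it uses the population standard deviation, although implementations differ on population versus sample standard deviation. The full GRPO objective also adds a clipped policy ratio and a KL penalty, which this sketch leaves out.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Hypothetical helper: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    # Population standard deviation over the group (an assumption; some
    # implementations use the sample standard deviation instead).
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for 4 completions of one prompt (1.0 = correct answer).
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

Completions that score above their group's average get a positive advantage, and those below get a negative one. If every completion in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.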