GRPO

References

GRPO Llama-1B: This GitHub demo shows how to use GRPO to train custom LLMs.
