Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation
The paper will be released later.
Clone this repository and install the packages:

```bash
git clone https://github.com/baoteng-li/CGPO.git
cd CGPO
conda create -n cgpo python=3.10.16
conda activate cgpo
pip install -e .
```

We adopt the same reward-model handling approach as Flow-GRPO. Since each reward model may depend on different package versions, installing them all in a single Conda environment can cause version conflicts. To avoid this, we use a remote-server setup inspired by ddpo-pytorch: you only need to install the specific reward models you plan to use. For details, please refer to Flow-GRPO.
Please create a new Conda virtual environment and install the corresponding dependencies according to the instructions in reward-server.
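As an illustration of how training code might query such a remote reward server, here is a minimal sketch. The port, endpoint path, payload keys, and response schema below are assumptions for illustration only; the actual interface is defined by the reward-server code.

```python
# Minimal sketch of querying a remote reward server over HTTP.
# The port, endpoint path, payload format, and response schema are
# illustrative assumptions; see reward-server for the real interface.
import base64
import io

import requests
from PIL import Image


def query_reward(image: Image.Image, prompt: str,
                 url: str = "http://127.0.0.1:18085/reward") -> float:
    # Encode the generated image as base64 PNG for transport.
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    payload = {
        "image": base64.b64encode(buf.getvalue()).decode("utf-8"),
        "prompt": prompt,
    }
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    return float(resp.json()["reward"])  # assumed response field
```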
Please install PaddleOCR:

```bash
pip install paddlepaddle-gpu==2.6.2
pip install paddleocr==2.9.1
pip install python-Levenshtein
```

Then, pre-download the model weights from the Python command line:

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=False, lang="en", use_gpu=False, show_log=False)
```

PickScore requires no additional installation.
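For reference, here is a minimal sketch of an OCR-based text reward built on the PaddleOCR setup above: it scores an image by one minus the normalized Levenshtein distance between the target text and what PaddleOCR reads back. The function name and exact reward formula are illustrative assumptions; see the Flow-GRPO reward code for the actual implementation.

```python
# Illustrative OCR text reward (an assumption, not the shipped implementation):
# 1 minus the normalized Levenshtein distance between the target text and
# the text PaddleOCR recognizes in the generated image.
import Levenshtein
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=False, lang="en", use_gpu=False, show_log=False)


def ocr_text_reward(image_path: str, target_text: str) -> float:
    result = ocr.ocr(image_path, cls=False)
    lines = result[0] or []  # each entry: [box, (text, confidence)]
    recognized = " ".join(entry[1][0] for entry in lines)
    dist = Levenshtein.distance(recognized.lower(), target_text.lower())
    return max(0.0, 1.0 - dist / max(len(target_text), 1))
```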
Single-node training:

```bash
bash scripts/single_node/cgpo.sh
```

You can adjust the hyperparameters in config/grpo.py. An empirical finding is that `config.sample.train_batch_size * num_gpu / config.sample.num_image_per_prompt * config.sample.num_batches_per_epoch = 48`, i.e., group_number = 48 with group_size = 24.

Additionally, setting `config.train.gradient_accumulation_steps = config.sample.num_batches_per_epoch // 2` also yields good performance.
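To make the two relations above concrete, the snippet below plugs in one illustrative configuration (the values are examples, not the shipped defaults) and checks that it satisfies group_number = 48 with group_size = 24:

```python
# Illustrative values only; they mirror the fields in config/grpo.py.
num_gpu = 8
train_batch_size = 12        # config.sample.train_batch_size
num_image_per_prompt = 24    # config.sample.num_image_per_prompt (group_size)
num_batches_per_epoch = 12   # config.sample.num_batches_per_epoch

# group_number = train_batch_size * num_gpu / num_image_per_prompt
#                * num_batches_per_epoch
group_number = train_batch_size * num_gpu // num_image_per_prompt * num_batches_per_epoch
assert group_number == 48  # 12 * 8 / 24 * 12 = 48

# Gradient-accumulation tip from above:
gradient_accumulation_steps = num_batches_per_epoch // 2  # = 6
```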
This project is built on Flow-GRPO; we thank its authors for their outstanding contributions to the community.