Reward Modelling
Reward models generate the reward signal used during RL training of an image generation model. Typically, a reward model takes an image and returns a reward. Some also use the prompt when computing the reward, but this is not necessary. The library includes some toy rewards intended primarily for debugging.
Toy Rewards
Toy reward models for testing purposes
- class drlx.reward_modelling.toy_rewards.AverageBlueReward
Bases: RewardModel
Rewards “blue-ness” of image
- forward(images, prompts)
Processes raw data of any form (not necessarily tensors, not necessarily batched) into reward scores. All inputs must be iterable.
- training: bool
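To make the idea of a toy reward concrete, here is a minimal sketch of a "blue-ness" score. It is not the library's implementation: the function name `average_blue_reward`, the H x W x 3 RGB layout with values in 0..255, and the exact formula (mean blue channel minus mean of red and green, scaled to roughly [-1, 1]) are all assumptions made for illustration.

```python
import numpy as np


def average_blue_reward(images):
    """Hypothetical stand-in for a blue-ness reward: one score per image."""
    scores = []
    for img in images:
        # Assumption: each image is H x W x 3, RGB order, values in 0..255.
        arr = np.asarray(img, dtype=np.float32)
        blue = arr[..., 2].mean()    # average of the blue channel
        other = arr[..., :2].mean()  # average of red and green together
        scores.append(float(blue - other) / 255.0)
    return np.array(scores)


# A pure blue image should score higher than a pure red one.
blue_img = np.zeros((8, 8, 3), dtype=np.uint8)
blue_img[..., 2] = 255
red_img = np.zeros((8, 8, 3), dtype=np.uint8)
red_img[..., 0] = 255
scores = average_blue_reward([blue_img, red_img])
```

Because the function only iterates over its input, it accepts a list of arrays as readily as a batched array, which matches the "inputs must be iterable" requirement stated for `forward`.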
- class drlx.reward_modelling.toy_rewards.JPEGCompressability(quality=10)
Bases: RewardModel
Rewards JPEG compression potential of image (from https://arxiv.org/pdf/2305.13301.pdf)
- encode_jpeg(x, quality=95)
- forward(images, prompts)
Processes raw data of any form (not necessarily tensors, not necessarily batched) into reward scores. All inputs must be iterable.
- training: bool
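A compressibility reward of this kind can be sketched as follows, assuming the score is the negative size in kilobytes of the image after JPEG encoding, so that more compressible images score higher. The helper name `jpeg_size_reward`, the uint8 RGB input layout, and the kilobyte scaling are assumptions for illustration; the library's `encode_jpeg` and `forward` may differ in detail. The sketch relies on NumPy and Pillow.

```python
import io

import numpy as np
from PIL import Image


def jpeg_size_reward(images, quality=95):
    """Hypothetical helper: negative JPEG size in kB, one score per image."""
    rewards = []
    for img in images:
        # Assumption: each image is an H x W x 3 uint8 RGB array.
        pil_img = Image.fromarray(np.asarray(img, dtype=np.uint8))
        buffer = io.BytesIO()
        pil_img.save(buffer, format="JPEG", quality=quality)
        size_kb = len(buffer.getvalue()) / 1000.0
        rewards.append(-size_kb)  # smaller file means higher reward
    return np.array(rewards)


# A flat grey image compresses far better than random noise.
rng = np.random.default_rng(0)
flat = np.full((64, 64, 3), 128, dtype=np.uint8)
noise = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
rewards = jpeg_size_reward([flat, noise], quality=10)
```

A low `quality` setting discards more detail, so it tends to widen the gap between smooth and noisy images.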
Aesthetics
- class drlx.reward_modelling.aesthetics.Aesthetics(device=None)
Bases: RewardModel
Reward model that rewards images with a higher aesthetic score. Uses CLIP and an MLP (not placed on any device by default).
- Parameters:
device (torch.device) – Device to load model on
- forward(images: ndarray, prompts: Iterable[str])
Processes raw data of any form (not necessarily tensors, not necessarily batched) into reward scores. All inputs must be iterable.
- training: bool
Pickscore (WIP)
- class drlx.reward_modelling.pickscore.PickScoreModel(**kwargs)
Bases: NNRewardModel
Reward model using the PickScore model from PickAPic
- preprocess(images: Iterable[Image], prompts: Iterable[str])
Preprocesses images and prompts into tensors, moving them to the correct device and data type.
- training: bool