Reward Modelling

Reward models generate the reward signal used during RL training of an image generation model. Typically, a reward model takes an image and returns a reward score. Some also use the prompt when computing the reward, but this is not necessary. The library includes a few toy rewards intended primarily for debugging.

Toy Rewards

Toy reward models for testing purposes

class drlx.reward_modelling.toy_rewards.AverageBlueReward

Bases: RewardModel

Rewards the “blue-ness” of an image.

forward(images, prompts)

Given any form of raw data (not necessarily tensors, not necessarily batched), processes it into reward scores. All inputs must be iterable.
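
As an illustration of the forward(images, prompts) contract, here is a minimal sketch of a "blue-ness" score. It is not the library's implementation: the function name average_blue_reward, the assumption that images are H x W x 3 RGB arrays with values in 0..255, and the choice of "mean blue minus mean of red and green" as the score are all hypothetical.

```python
import numpy as np

def average_blue_reward(images, prompts=None):
    """Hypothetical stand-in: one score per image, higher when blue dominates."""
    rewards = []
    for image in images:
        # Assumption: each image is an H x W x 3 RGB array with values in 0..255.
        pixels = np.asarray(image, dtype=np.float32) / 255.0
        blue = pixels[..., 2].mean()        # mean of the blue channel
        red_green = pixels[..., :2].mean()  # mean over the red and green channels
        rewards.append(float(blue - red_green))
    return rewards  # prompts are accepted but ignored, as a toy reward may do

# Usage: a pure blue image should score higher than a pure red one.
blue_image = np.zeros((4, 4, 3), dtype=np.uint8)
blue_image[..., 2] = 255
red_image = np.zeros((4, 4, 3), dtype=np.uint8)
red_image[..., 0] = 255
scores = average_blue_reward([blue_image, red_image], ["a prompt", "another prompt"])
```

Because the score depends only on pixel values, a reward like this makes it easy to check by eye whether RL training is moving the generator in the intended direction.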

training: bool

class drlx.reward_modelling.toy_rewards.JPEGCompressability(quality=10)

Bases: RewardModel

Rewards the JPEG compressibility of an image (from https://arxiv.org/pdf/2305.13301.pdf).

encode_jpeg(x, quality=95)

forward(images, prompts)

Given any form of raw data (not necessarily tensors, not necessarily batched), processes it into reward scores. All inputs must be iterable.
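
The idea of a compressibility reward can be sketched as "encode each image, then reward smaller encodings". The example below is only a rough illustration under stated assumptions: it uses zlib from the Python standard library as a stand-in for JPEG encoding so that it has no third-party dependencies, it assumes the reward is the negative compressed size in kilobytes, and the name compressibility_reward and the level parameter are hypothetical rather than part of the library.

```python
import os
import zlib

def compressibility_reward(images, prompts=None, level=6):
    """Hypothetical stand-in: negative compressed size (kB) of each image buffer."""
    rewards = []
    for image in images:
        # Assumption: each image is a bytes-like raw pixel buffer
        # (for example, the output of ndarray.tobytes()).
        raw = bytes(image)
        compressed_kb = len(zlib.compress(raw, level)) / 1000.0
        rewards.append(-compressed_kb)  # smaller encoding -> higher reward
    return rewards

# Usage: a flat image compresses far better than random noise.
flat_image = bytes(64 * 64 * 3)        # all-zero pixels
noisy_image = os.urandom(64 * 64 * 3)  # incompressible noise
scores = compressibility_reward([flat_image, noisy_image], ["a", "b"])
```

Swapping the zlib call for a real JPEG encoder changes the absolute values but not the shape of the computation: one encode per image, one scalar per encode.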

training: bool

Aesthetics

class drlx.reward_modelling.aesthetics.Aesthetics(device=None)

Bases: RewardModel

Reward model that rewards images with a higher aesthetic score. Uses CLIP and an MLP (not placed on any device by default).

Parameters:

device (torch.device) – Device to load model on

forward(images: ndarray, prompts: Iterable[str])

Given any form of raw data (not necessarily tensors, not necessarily batched), processes it into reward scores. All inputs must be iterable.
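
To make the "CLIP and an MLP" structure concrete, the sketch below scores precomputed embeddings with a small MLP head. Everything in it is an assumption made for illustration: random vectors stand in for CLIP image embeddings, the weights are random rather than trained, the layer sizes are arbitrary, the embeddings are L2-normalised before scoring, and mlp_score is a hypothetical helper, not a function from the library.

```python
import numpy as np

def mlp_score(embeddings, w1, b1, w2, b2):
    """Hypothetical scoring head: embeddings -> one hidden ReLU layer -> scalar."""
    x = np.asarray(embeddings, dtype=np.float32)
    # Assumption: embeddings are L2-normalised before being scored.
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    hidden = np.maximum(x @ w1 + b1, 0.0)   # ReLU hidden layer
    return (hidden @ w2 + b2).squeeze(-1)   # one score per embedding

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 8, 4  # hypothetical sizes, far smaller than a real encoder's
w1 = rng.normal(size=(embed_dim, hidden_dim)).astype(np.float32)
b1 = np.zeros(hidden_dim, dtype=np.float32)
w2 = rng.normal(size=(hidden_dim, 1)).astype(np.float32)
b2 = np.zeros(1, dtype=np.float32)

# Random vectors stand in for the image embeddings an encoder would produce.
fake_embeddings = rng.normal(size=(3, embed_dim)).astype(np.float32)
scores = mlp_score(fake_embeddings, w1, b1, w2, b2)
```

In this arrangement the image encoder does the heavy lifting and the MLP head is a cheap final step that maps each embedding to a single score.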

training: bool

PickScore (WIP)

class drlx.reward_modelling.pickscore.PickScoreModel(**kwargs)

Bases: NNRewardModel

Reward model using the PickScore model from PickAPic.

preprocess(images: Iterable[Image], prompts: Iterable[str])

Preprocesses images and prompts into tensors, making sure to move them to the correct device and data type.

training: bool