Denoisers
DRLX generally uses conditioned denoisers for diffusion modelling. The library is currently built with text conditioning in mind, but the base classes are designed to generalize: the conditional denoiser supports any kind of conditioning signal that produces an embedding.
BaseConditionalDenoiser
- class drlx.denoisers.BaseConditionalDenoiser(config: ModelConfig, sampler_config: SamplerConfig | None = None, sampler: Sampler | None = None)
Bases: Module
Base class for any denoiser that takes a conditioning signal during the denoising process, including text-conditioned denoisers.
- Parameters:
config (ModelConfig) – Configuration for model
sampler_config (SamplerConfig) – Configuration for sampler (optional). If provided, will create a default sampler.
sampler (Sampler) – Can be provided as an alternative to sampler_config (also optional). If neither is provided, a default sampler will be used.
- abstract decode(latent: Tensor) → Tensor[Tensor]
Decode latent vector into an image (typically called in postprocess)
- abstract encode(pixel_values: Tensor[Tensor]) → Tensor
Encode image into latent vector
- abstract forward(*inputs) → Tensor[Tensor]
Forward pass for denoiser. Output varies based on prediction type.
- abstract get_input_shape() → Tuple
Get the input shape for the denoiser. Useful during training and sampling, when the shape of the input noise to the denoiser is needed.
- Returns:
Input shape as a tuple
- Return type:
Tuple[int]
- abstract postprocess(output) → ndarray
Called on the output from the model after sampling to produce the final image.
- Returns:
Final denoised image as uint8 numpy array
- Return type:
np.ndarray
- abstract preprocess(*inputs) → Tensor[Tensor]
Called on the conditioning input (typically, this tokenizes the text prompt).
- Returns:
Conditioning input embeddings (e.g. text embeddings) as tensors
- Return type:
torch.Tensor
- sample(**kwargs)
Use the sampler to sample an image. The result must be passed through postprocess to obtain an image. Note that different samplers have different outputs.
- Parameters:
kwargs – Keyword arguments to sampler
- Returns:
Varies per sampler but always includes denoised latent/images
- training: bool
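The way the abstract methods above fit together can be sketched as follows. This is only an illustration under stated assumptions: ConditionalDenoiserSketch and ToyDenoiser are hypothetical stand-ins written with the standard library, not drlx.denoisers.BaseConditionalDenoiser. Plain Python lists stand in for tensors and arrays, the run helper performs a single forward call rather than using a real sampler, and the [-1, 1] output range assumed in postprocess is not stated by the documentation above.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class ConditionalDenoiserSketch(ABC):
    """Hypothetical stand-in mirroring the documented method names.

    NOT the DRLX base class; lists stand in for tensors and arrays.
    """

    @abstractmethod
    def get_input_shape(self) -> Tuple[int, ...]:
        """Shape of the noise input the denoiser expects."""

    @abstractmethod
    def preprocess(self, prompts: Sequence[str]) -> List[List[float]]:
        """Turn the conditioning input into embeddings."""

    @abstractmethod
    def forward(self, latent: List[float], cond: List[float]) -> List[float]:
        """One conditioned denoising prediction."""

    @abstractmethod
    def postprocess(self, output: List[float]) -> List[int]:
        """Map model output to pixel values in the uint8 range."""


class ToyDenoiser(ConditionalDenoiserSketch):
    def get_input_shape(self) -> Tuple[int, ...]:
        return (4,)

    def preprocess(self, prompts: Sequence[str]) -> List[List[float]]:
        # Toy "embedding": the prompt length, repeated to match the input shape.
        n = self.get_input_shape()[0]
        return [[float(len(p))] * n for p in prompts]

    def forward(self, latent: List[float], cond: List[float]) -> List[float]:
        # Toy prediction: shrink the latent and add a small conditioning shift.
        return [0.5 * x + 0.01 * c for x, c in zip(latent, cond)]

    def postprocess(self, output: List[float]) -> List[int]:
        # Assumption: outputs lie in [-1, 1]; clamp, then rescale to 0..255.
        pixels = []
        for x in output:
            x = max(-1.0, min(1.0, x))
            pixels.append(round((x + 1.0) / 2.0 * 255.0))
        return pixels


def run(denoiser: ConditionalDenoiserSketch,
        prompts: Sequence[str],
        noise: List[float]) -> List[List[int]]:
    """Single-step stand-in for sampling: preprocess, forward, postprocess."""
    cond = denoiser.preprocess(prompts)
    return [denoiser.postprocess(denoiser.forward(noise, c)) for c in cond]


if __name__ == "__main__":
    toy = ToyDenoiser()
    print(run(toy, ["a cat"], [0.0] * toy.get_input_shape()[0]))
```

In this sketch the conditioning signal only needs to become an embedding in preprocess; nothing else in the flow depends on it being text.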
LDMUNet
- class drlx.denoisers.ldm_unet.LDMUNet(config: ModelConfig, sampler_config: SamplerConfig | None = None, sampler: Sampler | None = None)
Bases: BaseConditionalDenoiser
Class for the Latent Diffusion Model UNet denoiser. Sampler information can optionally be passed, though it is not required. Generally used in tandem with a diffusers pipeline.
- Parameters:
config (ModelConfig) – Configuration for model
sampler_config (SamplerConfig) – Configuration for sampler (optional). If provided, will create a default sampler.
sampler (Sampler) – Can be provided as an alternative to sampler_config (also optional). If neither is provided, a default sampler will be used.
- forward(pixel_values: Tensor[Tensor], time_step: Tensor[Tensor] | int, input_ids: Tensor[Tensor] | None = None, attention_mask: Tensor[Tensor] | None = None, text_embeds: Tensor[Tensor] | None = None) → Tensor[Tensor]
For a text-conditioned UNet, the inputs are assumed to be pixel_values, time_step, input_ids and attention_mask. Per the signature, text_embeds is also accepted as an optional argument.
- from_pretrained_pipeline(cls: Type, path: str)
Get the UNet from a pretrained model pipeline.
- Parameters:
cls (Type) – Class to use for the pipeline (e.g. StableDiffusionPipeline)
path (str) – Path to pretrained pipeline
- Returns:
An LDMUNet object with the UNet, text encoder, VAE, tokenizer and scheduler from the pretrained pipeline. Also returns the pretrained pipeline in case the caller needs it.
- Return type:
- get_input_shape() → Tuple[int]
Determine the latent noise input shape for the UNet. Requires that unet and vae are defined.
- Returns:
Input shape as a tuple
- Return type:
Tuple[int]
- postprocess(output: Tensor[Tensor], vae_device=None)
Post-process the sampled output into a final image (see BaseConditionalDenoiser.postprocess).
- preprocess(text: Iterable[str], mode='tokens', **embed_kwargs)
Preprocess text input, either into tokens or into embeddings.
- Parameters:
mode (str) – Either “tokens” or “embeds”
text (Iterable[str]) – Text to preprocess
- Returns:
Either a tuple of tensors for input_ids and attention_mask or a tensor of embeddings
- Return type:
Union[Tuple[Tensor, Tensor], Tensor]
- training: bool
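The two preprocess modes documented above return differently shaped results: "tokens" gives a pair (input_ids, attention_mask), while "embeds" gives a single batch of embeddings. A minimal sketch of that contract follows. It does not use LDMUNet or any real tokenizer: toy_preprocess, its whitespace tokenizer, the padding id 0 and the two-number "embedding" are all hypothetical, chosen only so the example runs with the standard library.

```python
from typing import Dict, Iterable, List, Tuple, Union

Tokens = Tuple[List[List[int]], List[List[int]]]
Embeds = List[List[float]]


def toy_preprocess(text: Iterable[str], mode: str = "tokens") -> Union[Tokens, Embeds]:
    """Hypothetical stand-in for the documented tokens/embeds switch."""
    if mode not in ("tokens", "embeds"):
        raise ValueError(f"mode must be 'tokens' or 'embeds', got {mode!r}")

    # Toy tokenizer: one id per whitespace-separated word. Ids start at 1
    # so that 0 can serve as the padding id.
    vocab: Dict[str, int] = {}
    batch = [[vocab.setdefault(w, len(vocab) + 1) for w in t.split()] for t in text]

    if mode == "tokens":
        # Pad every sequence to the longest one and mark real tokens with 1.
        max_len = max((len(ids) for ids in batch), default=0)
        input_ids = [ids + [0] * (max_len - len(ids)) for ids in batch]
        attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
        return input_ids, attention_mask

    # Toy "embedding": (mean token id, token count) per prompt.
    return [
        [sum(ids) / len(ids) if ids else 0.0, float(len(ids))]
        for ids in batch
    ]


if __name__ == "__main__":
    prompts = ["a cat", "a big cat"]
    print(toy_preprocess(prompts, mode="tokens"))
    print(toy_preprocess(prompts, mode="embeds"))
```

Callers that accept both modes need to branch on the return type, which is why the documented return type is a Union.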