zae_engine.models.builds package¶
Submodules¶
zae_engine.models.builds.autoencoder module¶
- class zae_engine.models.builds.autoencoder.AutoEncoder(block: ~typing.Type[~zae_engine.nn_night.blocks.unet_block.UNetBlock | ~torch.nn.modules.module.Module], ch_in: int, ch_out: int, width: int, layers: ~typing.Sequence[int], groups: int = 1, dilation: int = 1, norm_layer: ~typing.Callable[[...], ~torch.nn.modules.module.Module] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, skip_connect: bool = False)[source]¶
Bases:
Module
A flexible AutoEncoder architecture with optional skip connections for U-Net style implementations.
- Parameters:
block (Type[Union[blk.UNetBlock, nn.Module]]) – The basic building block for the encoder and decoder (e.g., ResNet block or UNetBlock).
ch_in (int) – Number of input channels.
ch_out (int) – Number of output channels.
width (int) – Base width for the encoder and decoder layers.
layers (Sequence[int]) – Number of blocks in each stage of the encoder and decoder.
groups (int, optional) – Number of groups for group normalization in the block. Default is 1.
dilation (int, optional) – Dilation rate for convolutional layers. Default is 1.
norm_layer (Callable[..., nn.Module], optional) – Normalization layer to use. Default is nn.BatchNorm2d.
skip_connect (bool, optional) – If True, adds skip connections for U-Net style. Default is False.
- encoder¶
The encoder module that encodes the input image.
- Type:
nn.Module
- bottleneck¶
The bottleneck layer between the encoder and decoder.
- Type:
nn.Module
- decoder¶
The decoder module that reconstructs the input image.
- Type:
nn.ModuleList
- feature_vectors¶
Stores intermediate feature maps for skip connections when skip_connect is True.
- Type:
list
- up_pools¶
List of transposed convolution layers for upsampling in the decoder.
- Type:
nn.ModuleList
- fc¶
The final output convolutional layer.
- Type:
nn.Conv2d
- sig¶
Sigmoid activation function for the output.
- Type:
nn.Sigmoid
- hook_handles¶
Handles for the forward hooks registered on the encoder and bottleneck.
- Type:
OrderedDict
- feature_hook(module, input_tensor, output_tensor)[source]¶
Hooks intermediate feature maps for skip connections.
- feature_output_hook(module, input_tensor, output_tensor)[source]¶
Hooks the final feature map before bottleneck.
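Examples
A minimal usage sketch; the channel, width, and layer values below are illustrative, not prescribed by the API:
```python
import torch

from zae_engine.models.builds.autoencoder import AutoEncoder
from zae_engine.nn_night.blocks.unet_block import UNetBlock

# U-Net-style configuration: skip connections enabled, two blocks per stage.
model = AutoEncoder(
    block=UNetBlock,
    ch_in=3,
    ch_out=1,
    width=64,
    layers=[2, 2, 2, 2],
    skip_connect=True,
)
x = torch.randn(1, 3, 256, 256)
y = model(x)  # reconstruction; spatial size should match the input
```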
- class zae_engine.models.builds.autoencoder.VAE(block: ~typing.Type[~zae_engine.nn_night.blocks.unet_block.UNetBlock | ~torch.nn.modules.module.Module], ch_in: int, ch_out: int, width: int, layers: ~typing.Sequence[int], encoder_output_shape: ~typing.Sequence[int], condition_dim: int | None = None, groups: int = 1, dilation: int = 1, norm_layer: ~typing.Callable[[...], ~torch.nn.modules.module.Module] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, skip_connect: bool = False, latent_dim: int = 128)[source]¶
Bases:
AutoEncoder
Variational AutoEncoder (VAE) architecture extending AutoEncoder, with optional conditional (CVAE) functionality.
- Parameters:
block (Type[Union[blk.UNetBlock, nn.Module]]) – The basic building block for the encoder and decoder (e.g., ResNet block or UNetBlock).
ch_in (int) – Number of input channels.
ch_out (int) – Number of output channels.
width (int) – Base width for the encoder and decoder layers.
layers (Sequence[int]) – Number of blocks in each stage of the encoder and decoder.
encoder_output_shape (Sequence[int]) – The shape of the encoder’s output (excluding batch size), e.g., [channels, height, width].
condition_dim (int, optional) – Dimension of the condition vector (e.g., number of classes for one-hot encoding). Default is None.
groups (int, optional) – Number of groups for group normalization in the block. Default is 1.
dilation (int, optional) – Dilation rate for convolutional layers. Default is 1.
norm_layer (Callable[..., nn.Module], optional) – Normalization layer to use. Default is nn.BatchNorm2d.
skip_connect (bool, optional) – If True, adds skip connections for U-Net style. Default is False.
latent_dim (int, optional) – Dimension of the latent space. Default is 128.
- encoder¶
The encoder module that encodes the input image.
- Type:
nn.Module
- bottleneck¶
The bottleneck layer between the encoder and decoder.
- Type:
nn.Module
- decoder¶
The decoder module that reconstructs the input image.
- Type:
nn.ModuleList
- feature_vectors¶
Stores intermediate feature maps for skip connections when skip_connect is True.
- Type:
list
- up_pools¶
List of transposed convolution layers for upsampling in the decoder.
- Type:
nn.ModuleList
- fc¶
The final output convolutional layer.
- Type:
nn.Conv2d
- sig¶
Sigmoid activation function for the output.
- Type:
nn.Sigmoid
- fc_mu¶
Fully connected layer to generate mean of latent distribution.
- Type:
nn.Linear
- fc_logvar¶
Fully connected layer to generate log variance of latent distribution.
- Type:
nn.Linear
- fc_z¶
Fully connected layer to map sampled latent variable back to encoder channels.
- Type:
nn.Linear
- encoder_output_shape¶
The shape of the encoder’s output (excluding batch size), e.g., [channels, height, width].
- Type:
List[int]
- encoder_output_features¶
Total number of features after flattening the encoder’s output.
- Type:
int
- condition_dim¶
Dimension of the condition vector. If None, operates as standard VAE.
- Type:
int or None
- forward(x: Tensor, c: Tensor | None = None) → Tuple[Tensor, Tensor, Tensor][source]¶
Defines the forward pass of the VAE or CVAE.
- Parameters:
x (torch.Tensor) – The input tensor. Shape: (batch_size, channels, height, width).
c (torch.Tensor or None, optional) – The condition tensor. Shape: (batch_size, condition_dim). If None, operates as standard VAE.
- Returns:
The reconstructed output tensor, along with mu and logvar.
- Return type:
Tuple[Tensor, Tensor, Tensor]
- reparameterize(mu: Tensor, logvar: Tensor) → Tensor[source]¶
Reparameterization trick: sample z = mu + exp(0.5 * logvar) * eps with eps ~ N(0, I), so that z ~ N(mu, var) while the sampling step remains differentiable with respect to mu and logvar.
- Parameters:
mu (torch.Tensor) – Mean of the latent distribution.
logvar (torch.Tensor) – Log variance of the latent distribution.
- Returns:
Sampled latent variable z.
- Return type:
torch.Tensor
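Examples
A minimal sketch of a plain (unconditional) VAE with the standard reconstruction-plus-KL objective. The encoder_output_shape values are placeholders: they must match the encoder's actual output for your input size (cf. CNNBase.get_output_shape below):
```python
import torch
import torch.nn.functional as F

from zae_engine.models.builds.autoencoder import VAE
from zae_engine.nn_night.blocks.unet_block import UNetBlock

vae = VAE(
    block=UNetBlock,
    ch_in=3,
    ch_out=3,
    width=64,
    layers=[2, 2, 2, 2],
    encoder_output_shape=[512, 4, 4],  # placeholder; must match the real encoder output
    latent_dim=128,
)
x = torch.randn(4, 3, 64, 64)
recon, mu, logvar = vae(x)

# Standard VAE objective: reconstruction term plus KL divergence to N(0, I).
recon_loss = F.mse_loss(recon, x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + kl
```
For the conditional (CVAE) variant, pass condition_dim at construction and supply c of shape (batch_size, condition_dim) to forward.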
zae_engine.models.builds.cnn module¶
- class zae_engine.models.builds.cnn.CNNBase(block: ~typing.Type[~zae_engine.nn_night.blocks.resblock.BasicBlock | ~zae_engine.nn_night.blocks.resblock.Bottleneck | ~torch.nn.modules.module.Module], ch_in: int, ch_out: int, width: int, layers: ~typing.Sequence[int], groups: int = 1, dilation: int = 1, norm_layer: ~typing.Callable[[...], ~torch.nn.modules.module.Module] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>)[source]¶
Bases:
Module
A configurable CNN backbone assembled from residual-style blocks (e.g., BasicBlock or Bottleneck).
- forward(x: Tensor) → Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- get_output_shape(input_size: Tuple[int, int, int, int]) → Tuple[int, int, int, int][source]¶
Calculate the encoder’s output shape based on a dummy input.
- Parameters:
input_size (Tuple[int, int, int, int]) – The size of the input tensor (batch_size, channels, height, width).
- Returns:
The shape of the encoder’s output tensor.
- Return type:
Tuple[int, int, int, int]
- make_body(blocks: List[Type[BasicBlock | Bottleneck]] | tuple[Type[BasicBlock | Bottleneck]], ch_in: int, ch_out: int, stride: int = 1) → Sequential [source]¶
Builds a sequential stage from the given block types with the specified input/output channels and stride.
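Examples
A minimal sketch; with BasicBlock and layers=[2, 2, 2, 2] this mirrors a ResNet-18-style configuration (the input size below is illustrative):
```python
import torch

from zae_engine.models.builds.cnn import CNNBase
from zae_engine.nn_night.blocks.resblock import BasicBlock

net = CNNBase(block=BasicBlock, ch_in=3, ch_out=1000, width=64, layers=[2, 2, 2, 2])
print(net.get_output_shape((1, 3, 224, 224)))  # output shape for a dummy input
logits = net(torch.randn(1, 3, 224, 224))
```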
zae_engine.models.builds.nested_autoencoder module¶
- class zae_engine.models.builds.nested_autoencoder.NestedUNet(in_ch=3, out_ch=1, width: int | Sequence = 32, heights: Sequence[int] = (7, 6, 5, 4, 4), dilation_heights: Sequence[int] = (2, 2, 2, 2, 4), middle_width: int | Sequence = (32, 32, 64, 128, 256))[source]¶
Bases:
Module
Implementation of the U²-Net architecture.
- Parameters:
in_ch (int, optional) – Number of input channels. Default is 3.
out_ch (int, optional) – Number of output channels. Default is 1.
width (Union[int, Sequence], optional) – Initial number of middle channels. Default is 32.
heights (Sequence[int], optional) – List of RSU block heights for each encoder layer. Default is (7, 6, 5, 4, 4).
dilation_heights (Sequence[int], optional) – List of dilation heights for each encoder layer. Default is (2, 2, 2, 2, 4).
middle_width (Union[int, Sequence], optional) – List of middle channels for each RSU block. Default is (32, 32, 64, 128, 256).
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
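Examples
A minimal sketch using the default U²-Net configuration. The exact return structure of forward (a single map vs. per-level side outputs) is not specified above, so the output is left untyped here:
```python
import torch

from zae_engine.models.builds.nested_autoencoder import NestedUNet

u2net = NestedUNet(in_ch=3, out_ch=1)  # defaults mirror the published U²-Net
out = u2net(torch.randn(1, 3, 320, 320))
```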
zae_engine.models.builds.transformer module¶
- class zae_engine.models.builds.transformer.BertBase(encoder_embedding: Module, encoder: Module, sep_token_id: int = 102, **kwargs)[source]¶
Bases:
Module
BertBase is a specialized version of TransformerBase, including a pooler for processing the [CLS] token.
This class adds a pooler layer that processes the first token ([CLS]) from the encoder output, similar to the original BERT architecture. If a hidden dimension is provided during initialization, the pooler will be applied. Otherwise, only the encoder output is returned.
- Parameters:
encoder_embedding (nn.Module) – The embedding layer for the encoder input.
encoder (nn.Module) – The encoder module responsible for transforming the input sequence.
dim_hidden (int, optional) – The hidden dimension used by the pooler layer. If provided, a pooler layer will be applied to the [CLS] token (first token) of the encoder output. Otherwise, only the encoder output is returned.
sep_token_id (int, optional) – The ID representing the [SEP] token, used to identify sentence boundaries. The default value is 102, which is the standard for Hugging Face’s BERT tokenizer. In BERT, the [SEP] token separates different sentences or segments, and is expected to be present once or twice in the input. An error will be raised if more than two [SEP] tokens are found in the input.
Notes
The default value for sep_token_id is 102, which corresponds to the [SEP] token in Hugging Face’s pre-trained BERT models. This token is used to separate sentences or indicate the end of a sentence. If you are using a different tokenizer or model, you may need to adjust this value accordingly.
If input_sequence is precomputed embeddings (dtype is float), the embedding layer is skipped, and position_ids and token_type_ids are not generated, as these are already embedded.
- forward(input_sequence: Tensor, src_mask=None, src_key_padding_mask=None)[source]¶
Forward pass through the BERT model with an optional pooler.
If a hidden dimension is provided, the pooler is applied to the first token of the encoder output. Otherwise, the encoder output is returned as-is.
- Parameters:
input_sequence (torch.Tensor) – The input tensor representing either input_ids (token IDs) or input embeddings. If dtype is int, it is assumed to be token IDs (input_ids). If dtype is float, it is assumed to be precomputed embeddings (inputs_embeds), and the embedding layer is skipped. In this case, position_ids and token_type_ids are not generated.
src_mask (torch.Tensor, optional) – Source mask for masking certain positions in the encoder input. Shape: (batch_size, seq_len).
src_key_padding_mask (torch.Tensor, optional) – Mask for padding tokens in the source sequence. Shape: (batch_size, seq_len).
- Returns:
If dim_hidden is provided, returns the pooled output from the [CLS] token. Otherwise, returns the encoder output for the entire sequence. Shape: (batch_size, dim_hidden) if pooled, or (batch_size, seq_len, dim_hidden) if not.
- Return type:
torch.Tensor
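Examples
A minimal sketch that feeds precomputed embeddings (float dtype), which skips the embedding layer as described in the notes; nn.Identity() stands in for a real embedding module, and dim_hidden is passed through **kwargs to enable the pooler:
```python
import torch
import torch.nn as nn

from zae_engine.models.builds.transformer import BertBase, EncoderBase

d_model = 768
bert = BertBase(
    encoder_embedding=nn.Identity(),  # placeholder; unused for float inputs
    encoder=EncoderBase(d_model=d_model, num_layers=4),
    dim_hidden=d_model,  # enables the [CLS] pooler
)
embeds = torch.randn(2, 16, d_model)  # (batch_size, seq_len, d_model)
pooled = bert(embeds)                 # (batch_size, dim_hidden)
```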
- class zae_engine.models.builds.transformer.CoderBase(d_model: int, num_layers: int, layer_factory: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.transformer.TransformerEncoderLayer'>, dim_feedforward: int = 2048, dropout: float = 0.1, num_heads: int = 8, **factory_kwargs)[source]¶
Bases:
Module
Base class for both Encoder and Decoder that defines the core structure of the transformer layers.
- Parameters:
d_model (int) – The dimension of the embedding space (output size of each layer).
num_layers (int) – The number of layers in the encoder/decoder.
layer_factory (nn.Module, optional) – Custom layer module. Defaults to nn.TransformerEncoderLayer for encoders and nn.TransformerDecoderLayer for decoders.
norm_layer (str or nn.Module, optional) – The normalization layer to apply. Can be a string or custom nn.Module. Default is 'LayerNorm'.
dim_feedforward (int, optional) – The dimension of the feedforward network. Default is 2048.
dropout (float, optional) – Dropout rate for regularization. Default is 0.1.
num_heads (int, optional) – Number of attention heads in multi-head attention. Default is 8.
factory_kwargs (dict, optional) – Additional arguments to pass to layer_factory when creating layers.
- class zae_engine.models.builds.transformer.DecoderBase(d_model: int, num_layers: int, layer_factory: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.transformer.TransformerDecoderLayer'>, dim_feedforward: int = 2048, dropout: float = 0.1, num_heads: int = 8, **factory_kwargs)[source]¶
Bases:
CoderBase
Decoder class that builds on CoderBase for decoding sequences based on the encoder’s memory.
- Parameters:
d_model (int) – The dimension of the embedding space (output size of each layer).
num_layers (int) – The number of layers in the decoder.
layer_factory (nn.Module, optional) – Custom layer module. Defaults to nn.TransformerDecoderLayer.
norm_layer (str or nn.Module, optional) – The normalization layer to apply. Can be a string or custom nn.Module. Default is ‘LayerNorm’.
dim_feedforward (int, optional) – The dimension of the feedforward network. Default is 2048.
dropout (float, optional) – Dropout rate for regularization. Default is 0.1.
num_heads (int, optional) – Number of attention heads in multi-head attention. Default is 8.
factory_kwargs (dict, optional) – Additional arguments to pass to layer_factory when creating layers.
- forward(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)[source]¶
Forward pass through the decoder.
- Parameters:
tgt (torch.Tensor) – The input tensor representing the target sequence. Shape: (batch_size, seq_len, d_model).
memory (torch.Tensor) – The encoded memory output from the encoder. Shape: (batch_size, seq_len_src, d_model).
tgt_mask (torch.Tensor, optional) – A mask tensor to prevent attention to certain positions in the target sequence.
memory_mask (torch.Tensor, optional) – A mask tensor to prevent attention to certain positions in the memory sequence (from the encoder).
tgt_key_padding_mask (torch.Tensor, optional) – A mask tensor to prevent attention to padding tokens in the target sequence.
memory_key_padding_mask (torch.Tensor, optional) – A mask tensor to prevent attention to padding tokens in the memory sequence.
- Returns:
The decoded output of the target sequence. Shape: (batch_size, seq_len_tgt, d_model).
- Return type:
torch.Tensor
- class zae_engine.models.builds.transformer.EncoderBase(d_model: int, num_layers: int, layer_factory: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.transformer.TransformerEncoderLayer'>, dim_feedforward: int = 2048, dropout: float = 0.1, num_heads: int = 8, **factory_kwargs)[source]¶
Bases:
CoderBase
Encoder class that builds on CoderBase for encoding the input sequences.
- Parameters:
d_model (int) – The dimension of the embedding space (output size of each layer).
num_layers (int) – The number of layers in the encoder.
layer_factory (nn.Module, optional) – Custom layer module. Defaults to nn.TransformerEncoderLayer.
norm_layer (str or nn.Module, optional) – The normalization layer to apply. Can be a string or custom nn.Module. Default is 'LayerNorm'.
dim_feedforward (int, optional) – The dimension of the feedforward network. Default is 2048.
dropout (float, optional) – Dropout rate for regularization. Default is 0.1.
num_heads (int, optional) – Number of attention heads in multi-head attention. Default is 8.
factory_kwargs (dict, optional) – Additional arguments to pass to layer_factory when creating layers.
- forward(src, src_mask=None, src_key_padding_mask=None)[source]¶
Forward pass through the encoder.
- Parameters:
src (torch.Tensor) – The input tensor representing the source sequence. Shape: (batch_size, seq_len, d_model).
src_mask (torch.Tensor, optional) – A mask tensor to prevent attention to certain positions in the source sequence.
src_key_padding_mask (torch.Tensor, optional) – A mask tensor to prevent attention to padding tokens in the source sequence.
- Returns:
The encoded output of the source sequence. Shape: (batch_size, seq_len, d_model).
- Return type:
torch.Tensor
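Examples
A minimal sketch; inputs are already-embedded sequences of shape (batch_size, seq_len, d_model), and the padding mask shown is optional:
```python
import torch

from zae_engine.models.builds.transformer import EncoderBase

encoder = EncoderBase(d_model=512, num_layers=6, num_heads=8)
src = torch.randn(2, 32, 512)
padding_mask = torch.zeros(2, 32, dtype=torch.bool)  # True marks padding positions
memory = encoder(src, src_key_padding_mask=padding_mask)  # (2, 32, 512)
```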
- class zae_engine.models.builds.transformer.TransformerBase(encoder_embedding: Module, decoder_embedding: Module = None, encoder: Module = Identity(), decoder: Module = None)[source]¶
Bases:
Module
A flexible Transformer model that supports both encoder-only and encoder-decoder architectures.
- Parameters:
encoder_embedding (nn.Module) – The embedding layer for the encoder input.
decoder_embedding (nn.Module, optional) – The embedding layer for the decoder input. If not provided, encoder_embedding is used for both encoder and decoder.
encoder (nn.Module, optional) – The encoder module. Defaults to nn.Identity(), which can be replaced with any custom encoder (e.g., TransformerEncoder).
decoder (nn.Module, optional) – The decoder module. If None, the model operates as an encoder-only model (e.g., BERT). Otherwise, uses a decoder (e.g., for translation models).
Notes
If decoder is None, the model acts as an encoder-only transformer (similar to BERT).
If decoder is provided, the model functions as an encoder-decoder transformer (e.g., for translation tasks).
The forward pass adjusts based on the presence of the decoder.
- forward(src, tgt=None, src_mask=None, tgt_mask=None, src_key_padding_mask=None, tgt_key_padding_mask=None)[source]¶
Forward pass through the Transformer model.
- Parameters:
src (torch.Tensor) – The input tensor representing the source sequence (e.g., for BERT-style models). Shape: (batch_size, seq_len).
tgt (torch.Tensor, optional) – The input tensor representing the target sequence (for models with a decoder). Shape: (batch_size, seq_len).
src_mask (torch.Tensor, optional) – Source mask for masking certain positions in the encoder input.
tgt_mask (torch.Tensor, optional) – Target mask for masking certain positions in the decoder input.
src_key_padding_mask (torch.Tensor, optional) – Mask for padding tokens in the source sequence.
tgt_key_padding_mask (torch.Tensor, optional) – Mask for padding tokens in the target sequence.
- Returns:
If a decoder is provided, returns the output of the decoder. Otherwise, returns the output of the encoder.
- Return type:
torch.Tensor
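Examples
A minimal encoder-decoder sketch for a translation-style setup. The nn.Embedding layers are illustrative stand-ins; any nn.Module mapping inputs to d_model-dimensional embeddings works:
```python
import torch
import torch.nn as nn

from zae_engine.models.builds.transformer import DecoderBase, EncoderBase, TransformerBase

d_model, vocab_size = 512, 10000
model = TransformerBase(
    encoder_embedding=nn.Embedding(vocab_size, d_model),
    decoder_embedding=nn.Embedding(vocab_size, d_model),
    encoder=EncoderBase(d_model=d_model, num_layers=6),
    decoder=DecoderBase(d_model=d_model, num_layers=6),
)
src = torch.randint(0, vocab_size, (2, 32))  # source token IDs
tgt = torch.randint(0, vocab_size, (2, 24))  # target token IDs
out = model(src, tgt)  # decoder output: (batch_size, seq_len_tgt, d_model)
```
Constructing the model with decoder=None and calling it with src alone yields encoder-only, BERT-style behavior.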
Module contents¶
- class zae_engine.models.builds.DummyModel(*args, **kwargs)[source]¶
Bases:
Module
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.