Models

ViTExtractor

class oml.models.vit_dino.extractor.ViTExtractor(weights: Optional[Union[Path, str]], arch: str, normalise_features: bool, use_multi_scale: bool = False)[source]

Bases: IExtractor

The base class for extractors that follow the Vision Transformer (ViT) architecture.

__init__(weights: Optional[Union[Path, str]], arch: str, normalise_features: bool, use_multi_scale: bool = False)[source]
Parameters
  • weights – Path to weights, or a special key to download a pretrained checkpoint; use None to randomly initialize the model’s weights. You can check the available pretrained checkpoints in self.pretrained_models.

  • arch – Might be one of vits8, vits16, vitb8, vitb16. You can check all the available options in self.constructors

  • normalise_features – Set True to normalise output features

  • use_multi_scale – Set True to use multi-scale inference (an analogue of test-time augmentation)
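The two flags above can be illustrated with a minimal sketch in plain PyTorch (not the library’s code): normalise_features L2-normalises each output embedding, and use_multi_scale is assumed here to average embeddings computed at several input scales. The toy backbone and the scale set are hypothetical.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a ViT backbone: any module mapping images to embeddings.
backbone = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 8))

def extract(images, normalise_features=True, use_multi_scale=False):
    if use_multi_scale:
        # Assumed multi-scale scheme: average embeddings over resized copies.
        scales = [0.5, 1.0, 1.5]  # hypothetical scale set
        embs = []
        for s in scales:
            resized = F.interpolate(images, scale_factor=s, mode="bilinear")
            # the toy backbone needs a fixed input size, so resize back
            resized = F.interpolate(resized, size=images.shape[-2:], mode="bilinear")
            embs.append(backbone(resized))
        emb = torch.stack(embs).mean(dim=0)
    else:
        emb = backbone(images)
    if normalise_features:
        emb = F.normalize(emb, p=2, dim=1)  # unit L2 norm per sample
    return emb

images = torch.randn(4, 3, 32, 32)
emb = extract(images)
print(emb.norm(dim=1))  # each norm is close to 1.0
```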

draw_attention(image: Union[Image, ndarray]) ndarray[source]

Visualizes the multi-head attention maps for a particular image.

Parameters

image – An image with pixel values in the range of [0..255].

Returns

An image with the attention maps drawn on it.

property feat_dim: int

The only method that is obligatory to implement.

ViTCLIPExtractor

class oml.models.vit_clip.extractor.ViTCLIPExtractor(weights: Optional[str], arch: str, normalise_features: bool = True)[source]

Bases: IExtractor

__init__(weights: Optional[str], arch: str, normalise_features: bool = True)[source]
Parameters
  • weights – Path to weights or special key for pretrained ones or None for random initialization. You can check available pretrained checkpoints in ViTCLIPExtractor.pretrained_models.

  • arch – Might be one of vitb16_224, vitb32_224, vitl14_224, vitl14_336.

  • normalise_features – Set True to normalise output features

property feat_dim: int

The only method that is obligatory to implement.

ResnetExtractor

class oml.models.resnet.extractor.ResnetExtractor(weights: Optional[Union[Path, str]], arch: str, gem_p: Optional[float], remove_fc: bool, normalise_features: bool)[source]

Bases: IExtractor

The base class for extractors that follow the ResNet architecture.

__init__(weights: Optional[Union[Path, str]], arch: str, gem_p: Optional[float], remove_fc: bool, normalise_features: bool)[source]
Parameters
  • weights – Path to weights, or a special key to download a pretrained checkpoint; use None to randomly initialize the model’s weights. You can check the available pretrained checkpoints in self.pretrained_models.

  • arch – One of the supported ResNet architectures; please check self.constructors for the available options

  • gem_p – Power value for the Generalized Mean (GeM) Pooling that replaces the default pooling (if gem_p == 1 or None it is ordinary average pooling, and as gem_p -> inf it approaches max pooling)

  • remove_fc – Set True if you want to remove the last fully connected layer. Note that keeping this layer is obligatory for calling the draw_gradcam() method

  • normalise_features – Set True to normalise output features
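The gem_p parameter refers to Generalized Mean (GeM) Pooling, which can be sketched in a few lines of plain PyTorch. This illustrates the formula and its limiting cases, not the library’s implementation:

```python
import torch

def gem(x: torch.Tensor, p: float = 3.0, eps: float = 1e-6) -> torch.Tensor:
    # Generalized Mean Pooling over the spatial dims of a (B, C, H, W) map.
    # p == 1 is plain average pooling; as p -> inf it approaches max pooling.
    return x.clamp(min=eps).pow(p).mean(dim=(-2, -1)).pow(1.0 / p)

fmap = torch.rand(2, 4, 7, 7)  # non-negative activations, as after a ReLU
avg = fmap.mean(dim=(-2, -1))
mx = fmap.amax(dim=(-2, -1))

print(torch.allclose(gem(fmap, p=1.0), avg, atol=1e-4))  # True: reduces to average pooling
print(bool((gem(fmap, p=3.0) <= mx + 1e-5).all()))       # True: bounded above by max pooling
```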

draw_gradcam(image: Union[ndarray, Image]) Union[ndarray, Image][source]

Visualizes the gradients on a particular image using Grad-CAM.

Parameters

image – An image with pixel values in the range of [0..255].

Returns

An image with the gradients drawn on it.

property feat_dim: int

The only method that is obligatory to implement.

ExtractorWithMLP

class oml.models.meta.projection.ExtractorWithMLP(extractor: IExtractor, mlp_features: List[int], weights: Optional[Union[Path, str]] = None, train_backbone: bool = False)[source]

Bases: IExtractor, IFreezable

A wrapper class that adds an extra MLP projection on top of a given extractor.

__init__(extractor: IExtractor, mlp_features: List[int], weights: Optional[Union[Path, str]] = None, train_backbone: bool = False)[source]
Parameters
  • extractor – Instance of IExtractor (e.g. ViTExtractor)

  • mlp_features – Sizes of projection layers

  • weights – Path to weights file or None for random initialization

  • train_backbone – Set False if you want to train only the MLP head
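The wrapper’s idea can be sketched in plain PyTorch; the toy backbone, its feature dimension of 16, and the mlp_features values below are hypothetical, not the library’s code:

```python
import torch
from torch import nn

backbone = nn.Linear(16, 16)  # toy extractor with feat_dim = 16
mlp_features = [32, 8]        # hypothetical projection layer sizes

# Build the projection head: prepend the backbone's output dimension,
# then chain linear layers between consecutive sizes.
dims = [16] + mlp_features
layers = [nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])]
mlp = nn.Sequential(*layers)

train_backbone = False
for p in backbone.parameters():  # train_backbone=False: freeze the extractor
    p.requires_grad = train_backbone

x = torch.randn(4, 16)
out = mlp(backbone(x))
print(out.shape)  # torch.Size([4, 8])
```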

LinearTrivialDistanceSiamese

class oml.models.meta.siamese.LinearTrivialDistanceSiamese(feat_dim: int, identity_init: bool)[source]

Bases: IPairwiseModel

This model is a useful tool mostly for development.

__init__(feat_dim: int, identity_init: bool)[source]
Parameters
  • feat_dim – Expected size of each input.

  • identity_init – If True, the model’s weights are initialised so that the model simply estimates the L2 distance between the original embeddings.
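A minimal sketch of the concept in plain PyTorch (not the library’s code): a shared linear projection followed by an L2 distance, where identity initialisation reproduces the plain distance between the original embeddings:

```python
import torch
from torch import nn

feat_dim = 6

# Shared projection applied to both inputs before measuring distance.
proj = nn.Linear(feat_dim, feat_dim, bias=False)
with torch.no_grad():
    proj.weight.copy_(torch.eye(feat_dim))  # mirrors identity_init=True

def pairwise_distance(x1, x2):
    # L2 distance between the projected embeddings, one value per pair
    return torch.norm(proj(x1) - proj(x2), p=2, dim=1)

x1, x2 = torch.randn(8, feat_dim), torch.randn(8, feat_dim)
# With the identity projection this equals the plain L2 distance:
print(torch.allclose(pairwise_distance(x1, x2), torch.norm(x1 - x2, dim=1), atol=1e-5))  # True
```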

forward(x1: Tensor, x2: Tensor) Tensor[source]
Parameters
  • x1 – Embedding with the shape of [batch_size, feat_dim]

  • x2 – Embedding with the shape of [batch_size, feat_dim]

Returns

Distance between transformed inputs.

TrivialDistanceSiamese

class oml.models.meta.siamese.TrivialDistanceSiamese(extractor: IExtractor)[source]

Bases: IPairwiseModel

This model is a useful tool mostly for development.

__init__(extractor: IExtractor) None[source]
Parameters

extractor – Instance of IExtractor (e.g. ViTExtractor)

forward(x1: Tensor, x2: Tensor) Tensor[source]
Parameters
  • x1 – The first input.

  • x2 – The second input.

Returns

Distance between inputs.

ConcatSiamese

class oml.models.meta.siamese.ConcatSiamese(extractor: IExtractor, mlp_hidden_dims: List[int], use_tta: bool = False, weights: Optional[Union[Path, str]] = None)[source]

Bases: IPairwiseModel, IFreezable

This model concatenates two inputs, passes the result through a given backbone, and then applies a head.

__init__(extractor: IExtractor, mlp_hidden_dims: List[int], use_tta: bool = False, weights: Optional[Union[Path, str]] = None) None[source]
Parameters
  • extractor – Instance of IExtractor (e.g. ViTExtractor)

  • mlp_hidden_dims – Hidden dimensions of the head

  • use_tta – Set True if you want to average the results obtained for the two different orders of concatenating the input images. Affects only the self.predict() method.

  • weights – Path to weights file or None for random initialization
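The concatenation and TTA ideas can be sketched in plain PyTorch; the choice of concatenation axis and the toy head below are assumptions, not the library’s exact scheme:

```python
import torch
from torch import nn

# Toy "backbone + head": scores a pair after concatenating the inputs
# along dim=1 (an assumed axis; the library may concatenate differently).
head = nn.Sequential(nn.Flatten(), nn.Linear(2 * 4, 1))

def score(x1, x2, use_tta=False):
    out = head(torch.cat([x1, x2], dim=1)).squeeze(-1)
    if use_tta:
        # use_tta: also score the reversed concatenation order and average
        out_rev = head(torch.cat([x2, x1], dim=1)).squeeze(-1)
        out = (out + out_rev) / 2
    return out

x1, x2 = torch.randn(3, 4), torch.randn(3, 4)
print(score(x1, x2, use_tta=True).shape)  # torch.Size([3])
```

A side effect of the averaging is that the TTA score is symmetric in its arguments, which a single concatenation order does not guarantee.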

forward(x1: Tensor, x2: Tensor) Tensor[source]
Parameters
  • x1 – The first input.

  • x2 – The second input.

predict(x1: Tensor, x2: Tensor) Tensor[source]

While self.forward() is called during training, this method is called at inference or validation time. For example, it allows applying an activation that was part of the loss function during training.

Parameters
  • x1 – The first input.

  • x2 – The second input.
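The forward/predict split can be illustrated with a hypothetical pairwise model whose loss expects raw logits during training, while inference applies the matching sigmoid (a sketch, not the library’s code):

```python
import torch

def forward(x1, x2):
    # toy similarity logit, consumed by a logistic loss during training
    return (x1 * x2).sum(dim=1)

def predict(x1, x2):
    # activation applied only at inference, yielding probabilities
    return torch.sigmoid(forward(x1, x2))

x1, x2 = torch.randn(5, 4), torch.randn(5, 4)
probs = predict(x1, x2)
print(bool(((probs >= 0) & (probs <= 1)).all()))  # True
```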

freeze() None[source]

Function for freezing. You can use it to partially freeze a model.

unfreeze() None[source]

Function for unfreezing. You can use it to unfreeze a model.
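Behind freeze() and unfreeze() is the common PyTorch idiom of toggling requires_grad on the relevant parameters; a minimal sketch with a toy module standing in for the model part:

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # toy stand-in for the part of the model to freeze

def freeze(module: nn.Module) -> None:
    # Excluded from gradient computation, so the optimizer won't update it.
    for p in module.parameters():
        p.requires_grad = False

def unfreeze(module: nn.Module) -> None:
    for p in module.parameters():
        p.requires_grad = True

freeze(model)
print(all(not p.requires_grad for p in model.parameters()))  # True
unfreeze(model)
print(all(p.requires_grad for p in model.parameters()))      # True
```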