Models

ViTExtractor

class oml.models.vit_dino.extractor.ViTExtractor(weights: Optional[Union[Path, str]], arch: str, normalise_features: bool, use_multi_scale: bool = False)[source]

Bases: IExtractor

The base class for extractors that follow the Vision Transformer (ViT) architecture.

__init__(weights: Optional[Union[Path, str]], arch: str, normalise_features: bool, use_multi_scale: bool = False)[source]
Parameters
  • weights – Path to weights, or a special key to download a pretrained checkpoint; use None to randomly initialize the model’s weights. You can check the available pretrained checkpoints in self.pretrained_models.

  • arch – Might be one of vits8, vits16, vitb8, vitb16. You can check all the available options in self.constructors

  • normalise_features – Set True to normalise output features

  • use_multi_scale – Set True to use multi-scale inference (an analogue of test-time augmentation)
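The two flags above can be illustrated with a minimal sketch in plain PyTorch (not the library’s code): normalise_features L2-normalises each output embedding, and use_multi_scale is assumed here to average embeddings computed at several input scales. The toy backbone and the scale set are hypothetical.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a ViT backbone: any module mapping images to embeddings.
backbone = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 8))

def extract(images, normalise_features=True, use_multi_scale=False):
    if use_multi_scale:
        # Assumed multi-scale scheme: average embeddings over resized copies.
        scales = [0.5, 1.0, 1.5]  # hypothetical scale set
        embs = []
        for s in scales:
            resized = F.interpolate(images, scale_factor=s, mode="bilinear")
            # the toy backbone needs a fixed input size, so resize back
            resized = F.interpolate(resized, size=images.shape[-2:], mode="bilinear")
            embs.append(backbone(resized))
        emb = torch.stack(embs).mean(dim=0)
    else:
        emb = backbone(images)
    if normalise_features:
        emb = F.normalize(emb, p=2, dim=1)  # unit L2 norm per sample
    return emb

images = torch.randn(4, 3, 32, 32)
emb = extract(images)
print(emb.norm(dim=1))  # each norm is close to 1.0
```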

draw_attention(image: Union[Image, ndarray]) ndarray[source]

Visualizes the multi-head attention maps for a particular image.

Parameters

image – An image with pixel values in the range of [0..255].

Returns

An image with the attention maps drawn on it.

property feat_dim: int

The only method that is obligatory to implement.

ViTCLIPExtractor

class oml.models.vit_clip.extractor.ViTCLIPExtractor(weights: Optional[str], arch: str, normalise_features: bool = True)[source]

Bases: IExtractor

__init__(weights: Optional[str], arch: str, normalise_features: bool = True)[source]
Parameters
  • weights – Path to weights or special key for pretrained ones or None for random initialization. You can check available pretrained checkpoints in ViTCLIPExtractor.pretrained_models.

  • arch – Might be one of vitb16_224, vitb32_224, vitl14_224, vitl14_336.

  • normalise_features – Set True to normalise output features

property feat_dim: int

The only method that is obligatory to implement.

ResnetExtractor

class oml.models.resnet.extractor.ResnetExtractor(weights: Optional[Union[Path, str]], arch: str, gem_p: Optional[float], remove_fc: bool, normalise_features: bool)[source]

Bases: IExtractor

The base class for extractors that follow the ResNet architecture.

__init__(weights: Optional[Union[Path, str]], arch: str, gem_p: Optional[float], remove_fc: bool, normalise_features: bool)[source]
Parameters
  • weights – Path to weights, or a special key to download a pretrained checkpoint; use None to randomly initialize the model’s weights. You can check the available pretrained checkpoints in self.pretrained_models.

  • arch – One of the supported ResNet architectures; please check self.constructors for the available options

  • gem_p – Power value for the Generalized Mean (GeM) Pooling that replaces the default pooling (if gem_p == 1 or None it is ordinary average pooling, and as gem_p -> inf it approaches max pooling)

  • remove_fc – Set True if you want to remove the last fully connected layer. Note that keeping this layer is obligatory for calling the draw_gradcam() method

  • normalise_features – Set True to normalise output features
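The gem_p parameter refers to Generalized Mean (GeM) Pooling, which can be sketched in a few lines of plain PyTorch. This illustrates the formula and its limiting cases, not the library’s implementation:

```python
import torch

def gem(x: torch.Tensor, p: float = 3.0, eps: float = 1e-6) -> torch.Tensor:
    # Generalized Mean Pooling over the spatial dims of a (B, C, H, W) map.
    # p == 1 is plain average pooling; as p -> inf it approaches max pooling.
    return x.clamp(min=eps).pow(p).mean(dim=(-2, -1)).pow(1.0 / p)

fmap = torch.rand(2, 4, 7, 7)  # non-negative activations, as after a ReLU
avg = fmap.mean(dim=(-2, -1))
mx = fmap.amax(dim=(-2, -1))

print(torch.allclose(gem(fmap, p=1.0), avg, atol=1e-4))  # True: reduces to average pooling
print(bool((gem(fmap, p=3.0) <= mx + 1e-5).all()))       # True: bounded above by max pooling
```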

draw_gradcam(image: Union[ndarray, Image]) Union[ndarray, Image][source]

Visualizes the gradients on a particular image using Grad-CAM.

Parameters

image – An image with pixel values in the range of [0..255].

Returns

An image with the gradients drawn on it.

property feat_dim: int

The only method that is obligatory to implement.

ExtractorWithMLP

class oml.models.meta.projection.ExtractorWithMLP(extractor: IExtractor, mlp_features: List[int], weights: Optional[Union[Path, str]] = None, train_backbone: bool = False)[source]

Bases: IExtractor, IFreezable

A wrapper class that adds an extra MLP projection on top of a given extractor.

__init__(extractor: IExtractor, mlp_features: List[int], weights: Optional[Union[Path, str]] = None, train_backbone: bool = False)[source]
Parameters
  • extractor – Instance of IExtractor (e.g. ViTExtractor)

  • mlp_features – Sizes of projection layers

  • weights – Path to weights file or None for random initialization

  • train_backbone – Set False if you want to train only the MLP head
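The wrapper’s idea can be sketched in plain PyTorch; the toy backbone, its feature dimension of 16, and the mlp_features values below are hypothetical, not the library’s code:

```python
import torch
from torch import nn

backbone = nn.Linear(16, 16)  # toy extractor with feat_dim = 16
mlp_features = [32, 8]        # hypothetical projection layer sizes

# Build the projection head: prepend the backbone's output dimension,
# then chain linear layers between consecutive sizes.
dims = [16] + mlp_features
layers = [nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])]
mlp = nn.Sequential(*layers)

train_backbone = False
for p in backbone.parameters():  # train_backbone=False: freeze the extractor
    p.requires_grad = train_backbone

x = torch.randn(4, 16)
out = mlp(backbone(x))
print(out.shape)  # torch.Size([4, 8])
```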

LinearTrivialDistanceSiamese

class oml.models.meta.siamese.LinearTrivialDistanceSiamese(feat_dim: int, identity_init: bool)[source]

Bases: IPairwiseModel

This model is a useful tool mostly for development.

__init__(feat_dim: int, identity_init: bool)[source]
Parameters
  • feat_dim – Expected size of each input.

  • identity_init – If True, the model’s weights are initialised so that the model simply estimates the L2 distance between the original embeddings.
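A minimal sketch of the concept in plain PyTorch (not the library’s code): a shared linear projection followed by an L2 distance, where identity initialisation reproduces the plain distance between the original embeddings:

```python
import torch
from torch import nn

feat_dim = 6

# Shared projection applied to both inputs before measuring distance.
proj = nn.Linear(feat_dim, feat_dim, bias=False)
with torch.no_grad():
    proj.weight.copy_(torch.eye(feat_dim))  # mirrors identity_init=True

def pairwise_distance(x1, x2):
    # L2 distance between the projected embeddings, one value per pair
    return torch.norm(proj(x1) - proj(x2), p=2, dim=1)

x1, x2 = torch.randn(8, feat_dim), torch.randn(8, feat_dim)
# With the identity projection this equals the plain L2 distance:
print(torch.allclose(pairwise_distance(x1, x2), torch.norm(x1 - x2, dim=1), atol=1e-5))  # True
```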

forward(x1: Tensor, x2: Tensor) Tensor[source]
Parameters
  • x1 – Embedding with the shape of [batch_size, feat_dim]

  • x2 – Embedding with the shape of [batch_size, feat_dim]

Returns

Distance between transformed inputs.

TrivialDistanceSiamese

class oml.models.meta.siamese.TrivialDistanceSiamese(extractor: IExtractor)[source]

Bases: IPairwiseModel

This model is a useful tool mostly for development.

__init__(extractor: IExtractor) None[source]
Parameters

extractor – Instance of IExtractor (e.g. ViTExtractor)

forward(x1: Tensor, x2: Tensor) Tensor[source]
Parameters
  • x1 – The first input.

  • x2 – The second input.

Returns

Distance between inputs.

ConcatSiamese

class oml.models.meta.siamese.ConcatSiamese(extractor: IExtractor, mlp_hidden_dims: List[int], use_tta: bool = False, weights: Optional[Union[Path, str]] = None)[source]

Bases: IPairwiseModel, IFreezable

This model concatenates two inputs, passes the result through a given backbone, and then applies a head.

__init__(extractor: IExtractor, mlp_hidden_dims: List[int], use_tta: bool = False, weights: Optional[Union[Path, str]] = None) None[source]
Parameters
  • extractor – Instance of IExtractor (e.g. ViTExtractor)

  • mlp_hidden_dims – Hidden dimensions of the head

  • use_tta – Set True if you want to average the results obtained for the two different orders of concatenating the input images. Affects only the self.predict() method.

  • weights – Path to weights file or None for random initialization
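The concatenation and TTA ideas can be sketched in plain PyTorch; the choice of concatenation axis and the toy head below are assumptions, not the library’s exact scheme:

```python
import torch
from torch import nn

# Toy "backbone + head": scores a pair after concatenating the inputs
# along dim=1 (an assumed axis; the library may concatenate differently).
head = nn.Sequential(nn.Flatten(), nn.Linear(2 * 4, 1))

def score(x1, x2, use_tta=False):
    out = head(torch.cat([x1, x2], dim=1)).squeeze(-1)
    if use_tta:
        # use_tta: also score the reversed concatenation order and average
        out_rev = head(torch.cat([x2, x1], dim=1)).squeeze(-1)
        out = (out + out_rev) / 2
    return out

x1, x2 = torch.randn(3, 4), torch.randn(3, 4)
print(score(x1, x2, use_tta=True).shape)  # torch.Size([3])
```

A side effect of the averaging is that the TTA score is symmetric in its arguments, which a single concatenation order does not guarantee.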

forward(x1: Tensor, x2: Tensor) Tensor[source]
Parameters
  • x1 – The first input.

  • x2 – The second input.

predict(x1: Tensor, x2: Tensor) Tensor[source]

While self.forward() is called during training, this method is called at inference or validation time. For example, it allows applying an activation that was part of the loss function during training.

Parameters
  • x1 – The first input.

  • x2 – The second input.
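The forward/predict split can be illustrated with a hypothetical pairwise model whose loss expects raw logits during training, while inference applies the matching sigmoid (a sketch, not the library’s code):

```python
import torch

def forward(x1, x2):
    # toy similarity logit, consumed by a logistic loss during training
    return (x1 * x2).sum(dim=1)

def predict(x1, x2):
    # activation applied only at inference, yielding probabilities
    return torch.sigmoid(forward(x1, x2))

x1, x2 = torch.randn(5, 4), torch.randn(5, 4)
probs = predict(x1, x2)
print(bool(((probs >= 0) & (probs <= 1)).all()))  # True
```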

freeze() None[source]

Function for freezing. You can use it to partially freeze a model.

unfreeze() None[source]

Function for unfreezing. You can use it to unfreeze a model.
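Behind freeze() and unfreeze() is the common PyTorch idiom of toggling requires_grad on the relevant parameters; a minimal sketch with a toy module standing in for the model part:

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # toy stand-in for the part of the model to freeze

def freeze(module: nn.Module) -> None:
    # Excluded from gradient computation, so the optimizer won't update it.
    for p in module.parameters():
        p.requires_grad = False

def unfreeze(module: nn.Module) -> None:
    for p in module.parameters():
        p.requires_grad = True

freeze(model)
print(all(not p.requires_grad for p in model.parameters()))  # True
unfreeze(model)
print(all(p.requires_grad for p in model.parameters()))      # True
```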