Losses

TripletLoss

class oml.losses.triplet.TripletLoss(margin: Optional[float], reduction: str = 'mean', need_logs: bool = False)[source]

Bases: Module

Class which combines the classical TripletMarginLoss and SoftTripletLoss. The idea of SoftTripletLoss is the following: instead of using the classical formula loss = relu(margin + positive_distance - negative_distance), we use loss = log1p(exp(positive_distance - negative_distance)). This may help to solve the common problem of TripletMarginLoss converging to its margin value (also known as dimension collapse).

__init__(margin: Optional[float], reduction: str = 'mean', need_logs: bool = False)[source]
Parameters
  • margin – Margin value, set None to use SoftTripletLoss

  • reduction – mean, sum or none

  • need_logs – Set True if you want to store logs

forward(anchor: Tensor, positive: Tensor, negative: Tensor) → Tensor[source]
Parameters
  • anchor – Anchor features with the shape of (batch_size, feat)

  • positive – Positive features with the shape of (batch_size, feat)

  • negative – Negative features with the shape of (batch_size, feat)

Returns

Loss value

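Example (a minimal sketch; the embeddings, feature size and margin below are arbitrary placeholders):

    import torch

    from oml.losses.triplet import TripletLoss

    anchor = torch.randn(4, 128)
    positive = torch.randn(4, 128)
    negative = torch.randn(4, 128)

    # Classical variant: relu(margin + positive_distance - negative_distance)
    criterion = TripletLoss(margin=0.2)
    loss = criterion(anchor, positive, negative)

    # Soft variant: log1p(exp(positive_distance - negative_distance)); no margin is needed
    criterion_soft = TripletLoss(margin=None)
    loss_soft = criterion_soft(anchor, positive, negative)
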
TripletLossPlain

class oml.losses.triplet.TripletLossPlain(margin: Optional[float], reduction: str = 'mean', need_logs: bool = False)[source]

Bases: Module

The same as TripletLoss, but works with anchor, positive and negative features stacked together.

__init__(margin: Optional[float], reduction: str = 'mean', need_logs: bool = False)[source]
Parameters
  • margin – Margin value, set None to use SoftTripletLoss

  • reduction – mean, sum or none

  • need_logs – Set True if you want to store logs

forward(features: Tensor) → Tensor[source]
Parameters

features – Features with the shape of [batch_size, feat] and the following structure: 0, 1, 2 are the indices of the 1st triplet; 3, 4, 5 are the indices of the 2nd triplet; and so on. Thus, the batch contains batch_size / 3 triplets, and batch_size must be divisible by 3

Returns

Loss value

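Example (a minimal sketch; feature size and margin are placeholders, and the batch size must be divisible by 3):

    import torch

    from oml.losses.triplet import TripletLossPlain

    # Two triplets stacked as [anchor_1, positive_1, negative_1, anchor_2, positive_2, negative_2]
    features = torch.randn(6, 128)

    criterion = TripletLossPlain(margin=0.2)
    loss = criterion(features)
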
TripletLossWithMiner

class oml.losses.triplet.TripletLossWithMiner(margin: ~typing.Optional[float], miner: ~oml.interfaces.miners.ITripletsMiner = <oml.miners.inbatch_all_tri.AllTripletsMiner object>, reduction: str = 'mean', need_logs: bool = False)[source]

Bases: ITripletLossWithMiner

This class combines Miner and TripletLoss.

__init__(margin: ~typing.Optional[float], miner: ~oml.interfaces.miners.ITripletsMiner = <oml.miners.inbatch_all_tri.AllTripletsMiner object>, reduction: str = 'mean', need_logs: bool = False)[source]
Parameters
  • margin – Margin value, set None to use SoftTripletLoss

  • miner – A miner that implements the logic of picking triplets to pass to the triplet loss.

  • reduction – mean, sum or none

  • need_logs – Set True if you want to store logs

forward(features: Tensor, labels: Union[Tensor, List[int]]) → Tensor[source]
Parameters
  • features – Features with the shape [batch_size, feat]

  • labels – Labels with the size of batch_size

Returns

Loss value

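Example (a minimal sketch using the default AllTripletsMiner; the labels only need to allow at least one valid triplet inside the batch):

    import torch

    from oml.losses.triplet import TripletLossWithMiner

    features = torch.randn(4, 128)
    labels = torch.tensor([0, 0, 1, 1])  # two samples per class, so triplets can be mined

    criterion = TripletLossWithMiner(margin=0.2)  # uses AllTripletsMiner by default
    loss = criterion(features, labels)
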
SurrogatePrecision

class oml.losses.surrogate_precision.SurrogatePrecision(k: int, temperature1: float = 1.0, temperature2: float = 0.01, reduction: str = 'mean')[source]

Bases: Module

This loss is a differentiable approximation of the Precision@k metric.

The loss is described in the following paper under a slightly different name: Recall@k Surrogate Loss with Large Batches and Similarity Mixup.

The idea is that we express the formula for Precision@k using two step functions (aka Heaviside functions) and then approximate them with two temperature-scaled sigmoids. The smaller the temperature, the closer the sigmoid is to the step function, but the sparser the gradients, and vice versa. In the original paper t1 = 1.0 and t2 = 0.01 were used.

__init__(k: int, temperature1: float = 1.0, temperature2: float = 0.01, reduction: str = 'mean')[source]
Parameters
  • k – Parameter of Precision@k.

  • temperature1 – Scaling factor for the 1st sigmoid, see docs above.

  • temperature2 – Scaling factor for the 2nd sigmoid, see docs above.

  • reduction – mean, sum or none

forward(features: Tensor, labels: Tensor) → Tensor[source]
Parameters
  • features – Features with the shape of [batch_size, feature_size]

  • labels – Labels with the size of batch_size

Returns

Loss value

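Example (a minimal sketch with the default temperatures from the paper; features and labels are placeholders):

    import torch

    from oml.losses.surrogate_precision import SurrogatePrecision

    features = torch.randn(8, 128)
    labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])

    criterion = SurrogatePrecision(k=3, temperature1=1.0, temperature2=0.01)
    loss = criterion(features, labels)
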
ArcFaceLoss

class oml.losses.arcface.ArcFaceLoss(in_features: int, num_classes: int, m: float = 0.5, s: float = 64, smoothing_epsilon: float = 0, label2category: Optional[Dict[Any, Any]] = None, reduction: str = 'mean')[source]

Bases: Module

ArcFace loss from the paper, with the option to use label smoothing. It contains a projection of size in_features x num_classes inside itself. Please make sure that class labels start at 0 and end at num_classes - 1.

__init__(in_features: int, num_classes: int, m: float = 0.5, s: float = 64, smoothing_epsilon: float = 0, label2category: Optional[Dict[Any, Any]] = None, reduction: str = 'mean')[source]
Parameters
  • in_features – Input feature size

  • num_classes – Number of classes in train set

  • m – Margin parameter for ArcFace loss. Values of 0.3-0.5 usually work well

  • s – Scaling parameter for ArcFace loss. Values of 30-64 usually work well

  • smoothing_epsilon – Label smoothing effect strength

  • label2category – Optional, mapping from label to its category. If provided, label smoothing will redistribute smoothing_epsilon only inside the category corresponding to the sample’s ground truth label

  • reduction – CrossEntropyLoss reduction

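Example (a minimal sketch; the forward signature is not listed above, so the call below assumes the criterion takes embeddings and integer labels, which is the usual convention for such heads):

    import torch

    from oml.losses.arcface import ArcFaceLoss

    embeddings = torch.randn(8, 128)
    labels = torch.randint(0, 10, (8,))  # labels must lie in [0, num_classes)

    criterion = ArcFaceLoss(in_features=128, num_classes=10, m=0.4, s=64)
    loss = criterion(embeddings, labels)
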
ArcFaceLossWithMLP

class oml.losses.arcface.ArcFaceLossWithMLP(in_features: int, num_classes: int, mlp_features: List[int], m: float = 0.5, s: float = 64, smoothing_epsilon: float = 0, label2category: Optional[Dict[Any, Any]] = None, reduction: str = 'mean')[source]

Bases: Module

Almost the same as ArcFaceLoss, but with an MLP projector before the loss. You may want to use ArcFaceLossWithMLP to boost the expressive power of the ArcFace loss during training (for example, in a multi-head setup it may be a good idea to have task-specific projectors in each of the losses). Note that the criterion does not exist at validation time, so if you want to keep your MLP layers, you should create them as part of the model you train.

__init__(in_features: int, num_classes: int, mlp_features: List[int], m: float = 0.5, s: float = 64, smoothing_epsilon: float = 0, label2category: Optional[Dict[Any, Any]] = None, reduction: str = 'mean')[source]
Parameters
  • in_features – Input feature size

  • num_classes – Number of classes in train set

  • mlp_features – Layer sizes of the MLP before ArcFace

  • m – Margin parameter for ArcFace loss. Values of 0.3-0.5 usually work well

  • s – Scaling parameter for ArcFace loss. Values of 30-64 usually work well

  • smoothing_epsilon – Label smoothing effect strength

  • label2category – Optional, mapping from label to its category. If provided, label smoothing will redistribute smoothing_epsilon only inside the category corresponding to the sample’s ground truth label

  • reduction – CrossEntropyLoss reduction

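Example (a minimal sketch; as above, it assumes the criterion is called with embeddings and integer labels, and the MLP sizes are placeholders):

    import torch

    from oml.losses.arcface import ArcFaceLossWithMLP

    embeddings = torch.randn(8, 512)
    labels = torch.randint(0, 10, (8,))

    # The MLP presumably projects 512 -> 256 -> 128 before the ArcFace head
    criterion = ArcFaceLossWithMLP(in_features=512, num_classes=10, mlp_features=[256, 128], m=0.4, s=64)
    loss = criterion(embeddings, labels)
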
label_smoothing

oml.functional.label_smoothing.label_smoothing(y: Tensor, num_classes: int, epsilon: float = 0.2, categories: Optional[Tensor] = None) → Tensor[source]

This function applies label smoothing. You can also use a modified version, where the label is smoothed only within the category corresponding to the sample’s ground truth label. To use this, provide the categories argument: a vector whose i-th entry is the category for label i.

Parameters
  • y – Ground truth labels with the size of batch_size where each element is from 0 (inclusive) to num_classes (exclusive).

  • num_classes – Number of classes in total

  • epsilon – Strength of smoothing. The biggest value in the OHE vector will be 1 - epsilon + epsilon / num_classes after the transformation

  • categories – Vector whose i-th entry is the category for label i. Optional, used for category-based label smoothing. In that case the biggest value in the OHE vector will be 1 - epsilon + epsilon / num_classes_of_the_same_category; labels outside of the category will not change
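
Example (a minimal sketch of plain and category-based smoothing; the category mapping is illustrative):

    import torch

    from oml.functional.label_smoothing import label_smoothing

    y = torch.tensor([0, 2, 3])

    # Plain smoothing over all 4 classes
    smoothed = label_smoothing(y, num_classes=4, epsilon=0.2)

    # Category-based smoothing: labels 0, 1 belong to category 0 and labels 2, 3 to category 1,
    # so the smoothing mass is redistributed only within the true label's category
    categories = torch.tensor([0, 0, 1, 1])
    smoothed_cat = label_smoothing(y, num_classes=4, epsilon=0.2, categories=categories)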