Miners

AllTripletsMiner

class oml.miners.inbatch_all_tri.AllTripletsMiner(max_output_triplets: int = 9223372036854775807, device: str = 'cpu')

Bases: ITripletsMinerInBatch

This miner selects all the possible triplets for the given batch.

__init__(max_output_triplets: int = 9223372036854775807, device: str = 'cpu')
Parameters
  • max_output_triplets – The number of possible triplets in a batch can be very large, so this parameter limits how many of them are returned.

  • device – Device on which to perform computations.
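To make the size concrete, here is a minimal plain-Python sketch (not the library's implementation) that enumerates every valid (anchor, positive, negative) index triplet and keeps at most max_output_triplets of them. The helpers all_triplets and sample_all_triplets are hypothetical, and keeping the first triplets in enumeration order is an assumption; the miner itself works on Tensor features and may choose which triplets to keep differently. The default of 9223372036854775807 is 2**63 - 1 (sys.maxsize on 64-bit builds), i.e. effectively no limit.

```python
from itertools import islice
from typing import Iterator, List, Tuple


def all_triplets(labels: List[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield every (anchor, positive, negative) index triplet in a batch."""
    n = len(labels)
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's label but is a different sample.
            if p == a or labels[p] != labels[a]:
                continue
            for neg in range(n):
                # A negative has a label different from the anchor's.
                if labels[neg] != labels[a]:
                    yield a, p, neg


def sample_all_triplets(labels: List[int], max_output_triplets: int) -> List[Tuple[int, int, int]]:
    # Assumption: simply keep the first max_output_triplets in enumeration order.
    return list(islice(all_triplets(labels), max_output_triplets))


labels = [0, 0, 1, 1]
print(len(sample_all_triplets(labels, max_output_triplets=10**9)))  # 8
print(sample_all_triplets(labels, max_output_triplets=2))  # [(0, 1, 2), (0, 1, 3)]
```

For a batch of n_labels labels with n_instances samples each (N = n_labels * n_instances), every anchor has n_instances - 1 positives and N - n_instances negatives, giving N * (n_instances - 1) * (N - n_instances) triplets: already 2688 for 8 labels with 4 instances each, which is why a cap can be useful.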

sample(features: Tensor, labels: Union[List[int], Tensor]) → Tuple[Tensor, Tensor, Tensor]
Parameters
  • features – Features with the shape of [batch_size, feature_size]

  • labels – Labels with the size of batch_size

Returns

Batch of triplets

HardTripletsMiner

class oml.miners.inbatch_hard_tri.HardTripletsMiner

Bases: ITripletsMinerInBatch

This miner selects the hardest triplets based on the distances between the features:

  • The hardest positive sample has the maximal distance to the anchor sample

  • The hardest negative sample has the minimal distance to the anchor sample
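The selection rule above can be sketched in plain Python as follows. This is an illustration, not the library's code: it assumes Euclidean distance, works on lists instead of Tensors, and returns index triplets (anchor, positive, negative); hardest_triplets is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def hardest_triplets(
    features: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """For every anchor, pick the hardest positive and the hardest negative (by index)."""
    triplets = []
    for a, (fa, la) in enumerate(zip(features, labels)):
        positives = [i for i, l in enumerate(labels) if l == la and i != a]
        negatives = [i for i, l in enumerate(labels) if l != la]
        if not positives or not negatives:
            continue  # this anchor cannot form a valid triplet
        # Hardest positive: same label, maximal distance to the anchor.
        p = max(positives, key=lambda i: math.dist(fa, features[i]))
        # Hardest negative: different label, minimal distance to the anchor.
        n = min(negatives, key=lambda i: math.dist(fa, features[i]))
        triplets.append((a, p, n))
    return triplets


features = [[0.0], [1.0], [4.0], [2.0], [9.0]]
labels = [0, 0, 0, 1, 1]
print(hardest_triplets(features, labels))
# [(0, 2, 3), (1, 2, 3), (2, 0, 3), (3, 4, 1), (4, 3, 2)]
```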

__init__()
sample(features: Tensor, labels: Union[List[int], Tensor]) → Tuple[Tensor, Tensor, Tensor]
Parameters
  • features – Features with the shape of [batch_size, feature_size]

  • labels – Labels with the size of batch_size

Returns

Batch of triplets

TripletMinerWithMemory

class oml.miners.cross_batch.TripletMinerWithMemory(bank_size_in_batches: int, tri_expand_k: int)

Bases: ITripletsMiner

This miner keeps a memory bank, which allows it to sample not only triplets from the original batch but also additional triplets combined from samples in the bank and in the original batch.

__init__(bank_size_in_batches: int, tri_expand_k: int)
Parameters
  • bank_size_in_batches – The size of the bank, measured in number of batches.

  • tri_expand_k – Defines how many triplets are sampled from the bank. Specifically, the miner returns tri_expand_k * (number of original triplets) triplets in total. In particular, if tri_expand_k == 1, no triplets are sampled from the bank.
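A quick worked example of the tri_expand_k arithmetic, as a plain-Python sketch. The helper triplet_counts is hypothetical, and modelling the bank as a FIFO deque that holds the most recent bank_size_in_batches batches is an assumption; the text above only states the bank's size.

```python
from collections import deque
from typing import Deque, List, Tuple


def triplet_counts(n_original: int, tri_expand_k: int) -> Tuple[int, int]:
    """Return (total triplets returned, triplets that involve the bank)."""
    if tri_expand_k < 1:
        raise ValueError("tri_expand_k must be at least 1")
    total = tri_expand_k * n_original
    return total, total - n_original


print(triplet_counts(n_original=100, tri_expand_k=3))  # (300, 200)
print(triplet_counts(n_original=100, tri_expand_k=1))  # (100, 0)

# Assumed bank model: keep only the last bank_size_in_batches batches (FIFO).
bank_size_in_batches = 3
bank: Deque[List[int]] = deque(maxlen=bank_size_in_batches)
for batch_id in range(5):
    bank.append([batch_id])  # placeholder standing in for one batch of features
print(list(bank))  # [[2], [3], [4]]
```

With 100 original triplets and tri_expand_k = 3, the miner would return 300 triplets, 200 of which involve the bank.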

sample(features: Tensor, labels: Tensor) → Tuple[Tensor, Tensor, Tensor, Tensor]
Parameters
  • features – Features with the shape of (batch_size, feat_dim)

  • labels – Labels with the size of batch_size

Returns

Triplets made from the original batch, plus those combined from the bank and the batch. The miner also returns an indicator of whether each triplet was obtained from the original batch, so the output is (anchor, positive, negative, indicators).
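Because of the indicators output, in-batch triplets and bank triplets can be handled separately, for example to log their losses apart. The sketch below uses plain lists and made-up placeholder values; split_by_indicator is a hypothetical helper. If indicators is a boolean Tensor, the same split would be plain boolean masking, such as anchor[indicators] and anchor[~indicators].

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_indicator(
    anchors: Sequence[T],
    positives: Sequence[T],
    negatives: Sequence[T],
    indicators: Sequence[bool],
) -> Tuple[List[Tuple[T, T, T]], List[Tuple[T, T, T]]]:
    """Split triplets into those from the original batch and those involving the bank."""
    in_batch: List[Tuple[T, T, T]] = []
    with_bank: List[Tuple[T, T, T]] = []
    for a, p, n, from_batch in zip(anchors, positives, negatives, indicators):
        # True means the triplet was obtained from the original batch.
        (in_batch if from_batch else with_bank).append((a, p, n))
    return in_batch, with_bank


# Hypothetical output of sample(): 2 in-batch triplets and 2 bank triplets.
anchors = ["a0", "a1", "a2", "a3"]
positives = ["p0", "p1", "p2", "p3"]
negatives = ["n0", "n1", "n2", "n3"]
indicators = [True, True, False, False]

in_batch, with_bank = split_by_indicator(anchors, positives, negatives, indicators)
print(in_batch)   # [('a0', 'p0', 'n0'), ('a1', 'p1', 'n1')]
print(with_bank)  # [('a2', 'p2', 'n2'), ('a3', 'p3', 'n3')]
```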

HardClusterMiner

class oml.miners.inbatch_hard_cluster.HardClusterMiner

Bases: ITripletsMiner

This miner selects the hardest triplets based on the distances to mean vectors: the anchor is the mean vector of the features of the i-th label in the batch, the hardest positive is the sample of the anchor's label that is most distant from the anchor, and the hardest negative is the closest mean vector among the other labels.

The batch must contain n_instances samples for each of n_labels labels, where both values are greater than 1.
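The procedure can be sketched in plain Python as follows, assuming Euclidean distance and list inputs instead of Tensors; hard_cluster_triplets and mean are hypothetical helpers, not part of oml. It returns one (mean_vector, positive, negative_mean_vector) triplet per label, matching the return description of sample().

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hard_cluster_triplets(
    features: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[Tuple[Vector, Vector, Vector]]:
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for f, l in zip(features, labels):
        groups[l].append(f)
    if len(groups) < 2 or any(len(g) < 2 for g in groups.values()):
        raise ValueError("need n_labels > 1 and n_instances > 1")
    means = {l: mean(g) for l, g in groups.items()}
    triplets = []
    for l, anchor in means.items():
        # Hardest positive: instance of label l that is farthest from its own mean.
        positive = max(groups[l], key=lambda f: math.dist(anchor, f))
        # Hardest negative: the closest mean vector of any other label.
        negative = min(
            (m for k, m in means.items() if k != l), key=lambda m: math.dist(anchor, m)
        )
        triplets.append((anchor, list(positive), negative))
    return triplets


features = [[0.0], [1.0], [5.0], [10.0], [11.0], [15.0], [30.0], [31.0], [35.0]]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(hard_cluster_triplets(features, labels))
# [([2.0], [5.0], [12.0]), ([12.0], [15.0], [2.0]), ([32.0], [35.0], [12.0])]
```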

__init__()
sample(features: Tensor, labels: Union[List[int], Tensor]) → Tuple[Tensor, Tensor, Tensor]

This method samples the hardest triplets in the batch.

Parameters
  • features – Tensor with the shape of [batch_size, embed_dim] that contains n_instances for each of n_labels

  • labels – Labels with the size of batch_size

Returns

n_labels triplets in the form of (mean_vector, positive, negative_mean_vector)

NHardTripletsMiner

class oml.miners.inbatch_nhard_tri.NHardTripletsMiner(n_positive: Union[Tuple[int, int], List[int], int] = 1, n_negative: Union[Tuple[int, int], List[int], int] = 1)

Bases: ITripletsMinerInBatch

This miner selects hard triplets based on distances between features:

  • hard positive samples have large distance to the anchor sample

  • hard negative samples have small distance to the anchor sample

Toward the end of training, annotation errors can affect the final metric. If you are not sure about the quality of your dataset, you can pass a range instead of an integer value for these parameters to exclude the very hardest combinations, such as the positives with the largest distances. For example, instead of picking the 5 hardest positive examples, you can use the examples from the 2nd hardest to the 5th one.
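A plain-Python sketch of rank-range selection. It assumes that a pair (first, last) means the 1-based, inclusive hardness ranks first..last, as the 2nd-to-5th example suggests, and that an integer n means ranks 1..n; the library's exact convention may differ. to_rank_range and pick_hard are hypothetical helpers.

```python
from typing import List, Sequence, Tuple, Union

RankSpec = Union[int, Tuple[int, int], List[int]]


def to_rank_range(spec: RankSpec) -> Tuple[int, int]:
    """Normalize an n_positive / n_negative value to 1-based inclusive ranks (first, last)."""
    if isinstance(spec, int):
        return 1, spec  # an integer n means "the n hardest"
    first, last = spec
    if not 1 <= first <= last:
        raise ValueError(f"invalid range: {spec}")
    return first, last


def pick_hard(distances: Sequence[float], spec: RankSpec, hardest_is_far: bool) -> List[int]:
    """Return candidate indices whose hardness rank lies in the requested range."""
    first, last = to_rank_range(spec)
    # Positives are hard when far (sort descending); negatives when close (ascending).
    order = sorted(range(len(distances)), key=lambda i: distances[i], reverse=hardest_is_far)
    return order[first - 1:last]


pos_dist = [0.9, 0.1, 0.5, 0.7, 0.3]
print(pick_hard(pos_dist, 5, hardest_is_far=True))       # [0, 3, 2, 4, 1]
print(pick_hard(pos_dist, (2, 5), hardest_is_far=True))  # [3, 2, 4, 1]
print(pick_hard(pos_dist, 1, hardest_is_far=True))       # [0]
```

The range (2, 5) drops only the single most distant positive, the one most likely to be a labelling mistake, while n = 1 keeps just the hardest candidate.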

__init__(n_positive: Union[Tuple[int, int], List[int], int] = 1, n_negative: Union[Tuple[int, int], List[int], int] = 1)
Parameters
  • n_positive – Number of positive samples with large distances to keep. If the value is a range, its minimal value has to be less than the number of labels available in the batch.

  • n_negative – Number of negative samples with small distances to keep.

Note

If both parameters are 1, the miner is equivalent to HardTripletsMiner. If both parameters are large enough, the miner can be equivalent to AllTripletsMiner.

sample(features: Tensor, labels: Union[List[int], Tensor]) → Tuple[Tensor, Tensor, Tensor]
Parameters
  • features – Features with the shape of [batch_size, feature_size]

  • labels – Labels with the size of batch_size

Returns

Batch of triplets

MinerWithBank

class oml.miners.miner_with_bank.MinerWithBank(bank_size_in_batches: int, miner: NHardTripletsMiner, need_logs: bool = True)

Bases: ITripletsMiner

This class implements cross-batch memory. Only samples from the current batch are used as anchors; their positives and negatives are found among both the bank and the current batch using miner.
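To illustrate the anchor/candidate split, here is a toy, self-contained sketch. Everything in it is hypothetical: ToyMinerWithBank is not oml's MinerWithBank; it assumes Euclidean distance and a FIFO bank of whole batches that is updated after mining, and it keeps only the single hardest positive and negative per anchor (the choice n_positive = n_negative = 1 would give). Returned triplets are indices into that call's candidate pool, with bank samples first and the current batch last.

```python
import math
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[Tuple[float, ...], int]  # (feature vector, label)


class ToyMinerWithBank:
    """Hypothetical sketch: anchors come from the current batch only."""

    def __init__(self, bank_size_in_batches: int) -> None:
        # Assumed FIFO bank that holds whole past batches.
        self.bank: Deque[List[Sample]] = deque(maxlen=bank_size_in_batches)

    def sample(self, batch: List[Sample]) -> List[Tuple[int, int, int]]:
        # Candidate pool: all banked samples followed by the current batch.
        pool = [s for past in self.bank for s in past] + batch
        offset = len(pool) - len(batch)  # where the current batch starts in the pool
        triplets = []
        for i, (fa, la) in enumerate(batch):
            a = offset + i
            pos = [j for j, (_, l) in enumerate(pool) if l == la and j != a]
            neg = [j for j, (_, l) in enumerate(pool) if l != la]
            if pos and neg:
                # Hardest positive (farthest) and hardest negative (closest).
                p = max(pos, key=lambda j: math.dist(fa, pool[j][0]))
                n = min(neg, key=lambda j: math.dist(fa, pool[j][0]))
                triplets.append((a, p, n))
        self.bank.append(batch)  # assumption: the bank is updated after mining
        return triplets


miner = ToyMinerWithBank(bank_size_in_batches=2)
print(miner.sample([((0.0,), 0), ((1.0,), 1)]))  # []
print(miner.sample([((0.5,), 0), ((3.0,), 1)]))  # [(2, 0, 1), (3, 1, 2)]
```

In the second call each label appears only once in the current batch, yet every anchor still finds a positive in the bank; without the bank, such a batch would yield no triplets at all.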

__init__(bank_size_in_batches: int, miner: NHardTripletsMiner, need_logs: bool = True)
Parameters
  • bank_size_in_batches – Size of the bank, measured in number of batches.

  • miner – Miner used to select positives and negatives; currently only NHardTripletsMiner is supported.

  • need_logs – Set to True if you want to track logs.

sample(features: Tensor, labels: Tensor) → Tuple[Tensor, Tensor, Tensor]
Parameters
  • features – Features with the shape [batch_size, features_dim]

  • labels – Labels with the size of batch_size

Returns

Batch of triplets in the following order: anchor, positive, negative