Samplers

BalanceSampler

class oml.samplers.balance.BalanceSampler(labels: Union[List[int], ndarray], n_labels: int, n_instances: int)

Bases: IBatchSampler

This sampler takes n_instances instances for each of n_labels labels to form a batch. Thus, the batch size is n_instances x n_labels. This type of sampling is used in the classical Person Re-Id paper "In Defense of the Triplet Loss for Person Re-Identification".

The strategy for the dataset with L unique labels is the following:

  • Select n_labels of the L labels for the 1st batch

  • Select n_instances instances for each label for the 1st batch

  • Select n_labels of the L - n_labels remaining labels for the 2nd batch

  • Select n_instances instances for each label for the 2nd batch

  • The epoch ends after L // n_labels batches.

Thus, in each epoch, every label is selected at most once (exactly once when L is divisible by n_labels), but this does not mean that all the instances will be picked.

Behavior in corner cases:

  • If a label has fewer than n_instances instances, its instances are sampled with repetition.

  • If L % n_labels != 0, the last (incomplete) batch is dropped.
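The strategy and corner cases above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the library's actual implementation: the function name balanced_batches and the seed parameter are hypothetical, and shuffling the unique labels once per epoch is an assumption about how "remaining labels" are chosen.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List


def balanced_batches(
    labels: List[int], n_labels: int, n_instances: int, seed: int = 0
) -> Iterator[List[int]]:
    """Yield batches of dataset indices (hypothetical helper, illustration only)."""
    rng = random.Random(seed)

    # Map every label to the dataset indices that carry it.
    label_to_indices: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_indices[label].append(idx)

    # Assumption: shuffle the unique labels once, then consume them in chunks,
    # so that no label appears in more than one batch of the epoch.
    unique_labels = list(label_to_indices)
    rng.shuffle(unique_labels)

    # L // n_labels batches; the L % n_labels leftover labels are dropped.
    n_batches = len(unique_labels) // n_labels
    for b in range(n_batches):
        batch: List[int] = []
        for label in unique_labels[b * n_labels:(b + 1) * n_labels]:
            pool = label_to_indices[label]
            if len(pool) >= n_instances:
                # Enough instances: sample without repetition.
                batch.extend(rng.sample(pool, n_instances))
            else:
                # Too few instances: sample with repetition.
                batch.extend(rng.choices(pool, k=n_instances))
        yield batch


# 5 unique labels; labels 2 and 4 have fewer than 3 instances.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 4]
batches = list(balanced_batches(labels, n_labels=2, n_instances=3))
```

With L = 5 and n_labels = 2, the sketch produces 5 // 2 = 2 batches of 3 x 2 = 6 indices each, and one label is left out of the epoch.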

__init__(labels: Union[List[int], ndarray], n_labels: int, n_instances: int)
Parameters
  • labels – List of the labels for each element in the dataset

  • n_labels – The desired number of labels in a batch, should be > 1

  • n_instances – The desired number of instances of each label in a batch, should be > 1

CategoryBalanceSampler

class oml.samplers.category_balance.CategoryBalanceSampler(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, resample_labels: bool = False, weight_categories: bool = True)

Bases: IBatchSampler

This sampler takes n_instances instances for each of n_labels labels in each of n_categories categories to form a batch. Thus, the batch size is n_instances x n_labels x n_categories.

Note: to form an epoch, we simply sample L / n_labels batches with repetition, where L is the number of unique labels.

__init__(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, resample_labels: bool = False, weight_categories: bool = True)
Parameters
  • labels – Labels to sample from

  • label2category – Mapping from label to category

  • n_categories – The desired number of categories to sample for each batch

  • n_labels – The desired number of labels to sample for each category in a batch

  • n_instances – The desired number of instances to sample for each label in a batch

  • resample_labels – If True, sample labels with repetition when a category has fewer than n_labels labels; otherwise, raise an error in that case

  • weight_categories – If True, sample categories for each batch with weights proportional to the number of unique labels in the categories
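How the parameters above interact when a single batch is built can be sketched as follows. This is an illustrative standard-library sketch, not the library's code: the function sample_category_batch and its rng argument are hypothetical. It assumes that the categories within one batch are distinct, that labels with too few instances are sampled with repetition, and that the error for a category lacking labels is raised at sampling time (the real sampler may validate earlier).

```python
import random
from collections import defaultdict
from typing import Dict, List, Union

Category = Union[str, int]


def sample_category_batch(
    labels: List[int],
    label2category: Dict[int, Category],
    n_categories: int,
    n_labels: int,
    n_instances: int,
    rng: random.Random,
    resample_labels: bool = False,
    weight_categories: bool = True,
) -> List[int]:
    """Build one batch of dataset indices (hypothetical helper, illustration only)."""
    label_to_indices: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_indices[label].append(idx)

    category_to_labels: Dict[Category, List[int]] = defaultdict(list)
    for label in label_to_indices:
        category_to_labels[label2category[label]].append(label)

    if n_categories > len(category_to_labels):
        raise ValueError("n_categories exceeds the number of categories")

    # Assumption: draw distinct categories one at a time; with weight_categories
    # the weights are the numbers of unique labels, otherwise they are uniform.
    remaining = list(category_to_labels)
    chosen: List[Category] = []
    for _ in range(n_categories):
        weights = [
            len(category_to_labels[c]) if weight_categories else 1 for c in remaining
        ]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)

    batch: List[int] = []
    for category in chosen:
        pool = category_to_labels[category]
        if len(pool) >= n_labels:
            batch_labels = rng.sample(pool, n_labels)
        elif resample_labels:
            # Too few labels in the category: sample labels with repetition.
            batch_labels = rng.choices(pool, k=n_labels)
        else:
            raise ValueError(f"category {category!r} has fewer than {n_labels} labels")
        for label in batch_labels:
            instances = label_to_indices[label]
            if len(instances) >= n_instances:
                batch.extend(rng.sample(instances, n_instances))
            else:
                # Assumption: too few instances are handled by repetition.
                batch.extend(rng.choices(instances, k=n_instances))
    return batch


labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
label2category = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}
batch = sample_category_batch(
    labels, label2category, n_categories=2, n_labels=2, n_instances=2,
    rng=random.Random(0), resample_labels=True,
)
```

The batch always holds n_instances x n_labels x n_categories = 8 indices here. Category "c" has a single label, so it can only contribute because resample_labels=True; with resample_labels=False the sketch raises a ValueError whenever "c" is drawn.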

DistinctCategoryBalanceSampler

class oml.samplers.distinct_category_balance.DistinctCategoryBalanceSampler(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, epoch_size: int)

Bases: IBatchSampler

This sampler takes n_instances instances for each of n_labels labels in each of n_categories categories to form a batch. Thus, the batch size is n_instances x n_labels x n_categories.

The strategy for the dataset with L unique labels and C unique categories is the following:

  • Select n_categories of C for the 1st batch

  • Select n_labels for each of the chosen categories for the 1st batch

  • Select n_instances for each of the chosen labels for the 1st batch

  • Define the set of labels available for the 2nd batch, L^: these are all the labels L except the ones chosen for the 1st batch

  • Define the set of available categories C^: these are all the categories corresponding to labels from L^

  • Select n_categories from C^ for the 2nd batch

  • Select n_labels for each category from L^ for the 2nd batch

  • Select n_instances for each label for the 2nd batch

  • Epoch ends after epoch_size steps

Behavior in corner cases:

  • If all the categories were chosen before epoch_size steps, the sampler resets its state and goes on sampling from the first step.

  • If a label has fewer than n_instances instances, its instances are sampled with repetition.

  • If the chosen category does not contain n_labels unused labels, all the unused labels will be added to a batch and the missing ones will be sampled from the used labels without repetition.

  • If L % n_labels == 1 then one of the labels must be dropped because we always want to have more than 1 label in a batch to be able to form positive pairs later on.
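The bookkeeping of used labels described above can be sketched as below. This is a simplified standard-library illustration, not the library's implementation: the function distinct_category_batches and the seed parameter are hypothetical. The sketch assumes every category holds at least n_labels labels, interprets "all the categories were chosen" as "fewer than n_categories categories still have unused labels", and does not model the L % n_labels == 1 case.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Union

Category = Union[str, int]


def distinct_category_batches(
    labels: List[int],
    label2category: Dict[int, Category],
    n_categories: int,
    n_labels: int,
    n_instances: int,
    epoch_size: int,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield epoch_size batches of indices (hypothetical helper, illustration only)."""
    rng = random.Random(seed)

    label_to_indices: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_indices[label].append(idx)

    category_to_labels: Dict[Category, List[int]] = defaultdict(list)
    for label in label_to_indices:
        category_to_labels[label2category[label]].append(label)

    # Simplifying assumptions of this sketch, checked up front.
    if n_categories > len(category_to_labels):
        raise ValueError("n_categories exceeds the number of categories")
    if any(len(ls) < n_labels for ls in category_to_labels.values()):
        raise ValueError("every category needs at least n_labels labels")

    used: Set[int] = set()
    for _ in range(epoch_size):
        # C^: categories that still own at least one unused label.
        available = [
            c for c, ls in category_to_labels.items() if any(l not in used for l in ls)
        ]
        if len(available) < n_categories:
            # Assumed reset rule: forget the used labels and start over.
            used.clear()
            available = list(category_to_labels)

        batch: List[int] = []
        for category in rng.sample(available, n_categories):
            unused = [l for l in category_to_labels[category] if l not in used]
            if len(unused) >= n_labels:
                chosen = rng.sample(unused, n_labels)
            else:
                # Take every unused label, then fill up from the used ones
                # without repetition.
                seen = [l for l in category_to_labels[category] if l in used]
                chosen = unused + rng.sample(seen, n_labels - len(unused))
            used.update(chosen)
            for label in chosen:
                instances = label_to_indices[label]
                if len(instances) >= n_instances:
                    batch.extend(rng.sample(instances, n_instances))
                else:
                    # Too few instances: sample with repetition.
                    batch.extend(rng.choices(instances, k=n_instances))
        yield batch


labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
label2category = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
batches = list(
    distinct_category_batches(
        labels, label2category, n_categories=2, n_labels=2, n_instances=2, epoch_size=3
    )
)
```

In this example the first batch uses two labels per category, the second batch takes the one unused label per category plus one already used label, and the third batch triggers the reset because no category has unused labels left.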

__init__(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, epoch_size: int)
Parameters
  • labels – Labels to sample from

  • label2category – Mapping from label to category

  • n_categories – The desired number of categories to sample for each batch

  • n_labels – The desired number of labels to sample for each category in a batch

  • n_instances – The desired number of instances to sample for each label in a batch

  • epoch_size – The desired number of batches in an epoch