Samplers

BalanceSampler
CategoryBalanceSampler
DistinctCategoryBalanceSampler

BalanceSampler 

class oml.samplers.balance.BalanceSampler(labels: Union[List[int], ndarray], n_labels: int, n_instances: int)[source]

Bases: IBatchSampler

This sampler takes n_instances instances for each of n_labels labels to form a batch. Thus, the batch size is n_instances x n_labels. This type of sampling is used in the classical Person Re-Id paper "In Defense of the Triplet Loss for Person Re-Identification".

The strategy for the dataset with L unique labels is the following:

Select n_labels of the L labels for the 1st batch
Select n_instances instances for each label for the 1st batch
Select n_labels of the L - n_labels remaining labels for the 2nd batch
Select n_instances instances for each label for the 2nd batch
…
The epoch ends after L // n_labels batches.

Thus, in each epoch, all the labels will be selected once, but this does not mean that all the instances will be picked.

Behavior in corner cases:

If a label has fewer than n_instances instances, they are sampled with repetition.
If L % n_labels != 0, the last (incomplete) batch is dropped.
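
The strategy and corner cases above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the function name balanced_batches and its seed argument are hypothetical, and only the standard library is used.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Optional


def balanced_batches(
    labels: List[int],
    n_labels: int,
    n_instances: int,
    seed: Optional[int] = None,  # hypothetical argument, for reproducibility only
) -> Iterator[List[int]]:
    """Yield batches of dataset indices: n_instances for each of n_labels labels."""
    rng = random.Random(seed)

    # Group dataset indices by their label.
    label2idx: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label2idx[label].append(idx)

    # Shuffle the unique labels once, then consume them in chunks of n_labels,
    # so that every label is used at most once per epoch.
    unique = list(label2idx)
    rng.shuffle(unique)

    n_batches = len(unique) // n_labels  # an incomplete last batch is dropped
    for b in range(n_batches):
        batch: List[int] = []
        for label in unique[b * n_labels:(b + 1) * n_labels]:
            pool = label2idx[label]
            if len(pool) >= n_instances:
                batch.extend(rng.sample(pool, n_instances))
            else:
                # Too few instances for this label: sample with repetition.
                batch.extend(rng.choices(pool, k=n_instances))
        yield batch


labels = [0, 0, 0, 1, 1, 2, 2, 2, 3, 4]
batches = list(balanced_batches(labels, n_labels=2, n_instances=2, seed=0))
```

With 5 unique labels and n_labels=2, the sketch produces 5 // 2 = 2 batches of 4 indices each, and one label is left out of the epoch.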

__init__(labels: Union[List[int], ndarray], n_labels: int, n_instances: int)[source]

Parameters

labels – List of the labels for each element in the dataset
n_labels – The desired number of labels in a batch, should be > 1
n_instances – The desired number of instances of each label in a batch, should be > 1

CategoryBalanceSampler 

class oml.samplers.category_balance.CategoryBalanceSampler(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, resample_labels: bool = False, weight_categories: bool = True)[source]

Bases: IBatchSampler

This sampler takes n_instances for each of the n_labels for each of the n_categories to form the batches. Thus, the batch size is n_instances x n_labels x n_categories.

Note: to form an epoch, we simply sample L / n_labels batches with repetition, where L is the number of unique labels.

__init__(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, resample_labels: bool = False, weight_categories: bool = True)[source]

Parameters

labels – Labels to sample from
label2category – Mapping from label to category
n_categories – The desired number of categories to sample for each batch
n_labels – The desired number of labels to sample for each category in batch
n_instances – The desired number of samples to sample for each label in batch
resample_labels – If True, sample labels with repetition when a category has too few labels; otherwise, raise an error in that case
weight_categories – If True, sample categories for each batch with weights proportional to the number of unique labels in the categories
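
To illustrate how these parameters interact, here is a minimal standard-library sketch that forms a single batch. It is not the library's code: the function name sample_category_batch and its rng argument are hypothetical, and sampling instances with repetition when a label has too few of them is an assumption borrowed from BalanceSampler.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Union


def sample_category_batch(
    labels: List[int],
    label2category: Dict[int, Union[str, int]],
    n_categories: int,
    n_labels: int,
    n_instances: int,
    resample_labels: bool = False,
    weight_categories: bool = True,
    rng: Optional[random.Random] = None,  # hypothetical argument
) -> List[int]:
    """Return dataset indices for one batch of n_categories x n_labels x n_instances."""
    rng = rng or random.Random()

    # Group dataset indices by label, and labels by category.
    label2idx: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label2idx[label].append(idx)
    category2labels = defaultdict(list)
    for label in label2idx:
        category2labels[label2category[label]].append(label)

    if n_categories > len(category2labels):
        raise ValueError("n_categories exceeds the number of categories")

    # Draw distinct categories one at a time; with weight_categories=True the
    # weights are proportional to the number of unique labels per category.
    remaining = list(category2labels)
    chosen = []
    for _ in range(n_categories):
        weights = [len(category2labels[c]) if weight_categories else 1 for c in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)

    batch: List[int] = []
    for category in chosen:
        candidates = category2labels[category]
        if len(candidates) >= n_labels:
            picked = rng.sample(candidates, n_labels)
        elif resample_labels:
            picked = rng.choices(candidates, k=n_labels)  # with repetition
        else:
            raise ValueError(f"Category {category!r} has fewer than {n_labels} labels")
        for label in picked:
            pool = label2idx[label]
            if len(pool) >= n_instances:
                batch.extend(rng.sample(pool, n_instances))
            else:
                # Assumption: too few instances means sampling with repetition.
                batch.extend(rng.choices(pool, k=n_instances))
    return batch


labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
label2category = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}
batch = sample_category_batch(
    labels, label2category, n_categories=2, n_labels=2, n_instances=2,
    resample_labels=True, rng=random.Random(0),
)
```

Category "c" has a single label here, so requesting it with resample_labels=False would raise an error in this sketch, while resample_labels=True repeats that label.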

DistinctCategoryBalanceSampler 

class oml.samplers.distinct_category_balance.DistinctCategoryBalanceSampler(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, epoch_size: int)[source]

Bases: IBatchSampler

This sampler takes n_instances for each of the n_labels for each of the n_categories to form the batches. Thus, the batch size is n_instances x n_labels x n_categories.

The strategy for the dataset with L unique labels and C unique categories is the following:

Select n_categories of C for the 1st batch
Select n_labels for each of the chosen categories for the 1st batch
Select n_instances for each of the chosen labels for the 1st batch
Define the set L^ of labels available for the 2nd batch: these are all the labels except the ones chosen for the 1st batch
Define the set C^ of available categories: these are all the categories corresponding to labels from L^
Select n_categories from C^ for the 2nd batch
Select n_labels for each category from L^ for the 2nd batch
Select n_instances for each label for the 2nd batch
…
The epoch ends after epoch_size steps.

Behavior in corner cases:

If all the categories were chosen before epoch_size steps, the sampler resets its state and goes on sampling from the first step.
If a label has fewer than n_instances instances, they are sampled with repetition.
If the chosen category does not contain n_labels unused labels, all the unused labels will be added to a batch and the missing ones will be sampled from the used labels without repetition.
If L % n_labels == 1, then one of the labels must be dropped because we always want to have more than 1 label in a batch to be able to form positive pairs later on.
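
The label-selection part of this strategy can be sketched as follows. This is a simplified illustration, not the library's implementation: the function name distinct_category_label_batches and its seed argument are hypothetical, instance picking is omitted, the reset condition (fewer than n_categories categories with unused labels) is one possible reading of the first corner case, and every category is assumed to hold at least n_labels labels.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Set, Union


def distinct_category_label_batches(
    label2category: Dict[int, Union[str, int]],
    n_categories: int,
    n_labels: int,
    epoch_size: int,
    seed: Optional[int] = None,  # hypothetical argument, for reproducibility only
) -> Iterator[List[int]]:
    """Yield the labels chosen for each batch (instance picking is omitted)."""
    rng = random.Random(seed)

    category2labels = defaultdict(list)
    for label, category in label2category.items():
        category2labels[category].append(label)

    used: Set[int] = set()  # labels already chosen since the last reset
    for _ in range(epoch_size):
        # Categories that still have at least one unused label.
        available = [
            c for c, ls in category2labels.items() if any(l not in used for l in ls)
        ]
        if len(available) < n_categories:
            # Assumed reset rule: start over once too few categories remain.
            used.clear()
            available = list(category2labels)

        batch_labels: List[int] = []
        for category in rng.sample(available, n_categories):
            all_labels = category2labels[category]
            unused = [l for l in all_labels if l not in used]
            if len(unused) >= n_labels:
                picked = rng.sample(unused, n_labels)
            else:
                # Take every unused label, then fill up from the used ones
                # without repetition (assumes len(all_labels) >= n_labels).
                already_used = [l for l in all_labels if l in used]
                picked = unused + rng.sample(already_used, n_labels - len(unused))
            used.update(picked)
            batch_labels.extend(picked)
        yield batch_labels


label2category = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c", 7: "c"}
label_batches = list(
    distinct_category_label_batches(
        label2category, n_categories=2, n_labels=2, epoch_size=4, seed=0
    )
)
```

Each yielded batch holds n_categories x n_labels distinct labels; choosing n_instances per label would then follow the same rule as in BalanceSampler.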

__init__(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, epoch_size: int)[source]

Parameters

labels – Labels to sample from
label2category – Mapping from label to category
n_categories – The desired number of categories to sample for each batch
n_labels – The desired number of labels to sample for each category in batch
n_instances – The desired number of samples to sample for each label in batch
epoch_size – The desired number of batches in epoch
