Samplers
BalanceSampler
- class oml.samplers.balance.BalanceSampler(labels: Union[List[int], ndarray], n_labels: int, n_instances: int)[source]
Bases: IBatchSampler
This sampler takes n_instances for each of the n_labels to form the batches. Thus, the batch size is n_instances x n_labels. This type of sampling can be found in the classical Person Re-Id paper, In Defense of the Triplet Loss for Person Re-Identification.
The strategy for the dataset with L unique labels is the following:
Select n_labels of L labels for the 1st batch
Select n_instances for each label for the 1st batch
Select n_labels of L - n_labels remaining labels for the 2nd batch
Select n_instances for each label for the 2nd batch
…
The epoch ends after L // n_labels batches.
Thus, in each epoch, every label is selected at most once (exactly once when L is divisible by n_labels), but this does not mean that all the instances will be picked.
Behavior in corner cases:
If some label does not contain n_instances, a choice will be made with repetition.
If L % n_labels != 0 then we drop the last batch.
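The strategy and corner cases above can be sketched in plain Python. This is an illustration built only from the description, not the library's implementation: the function name balanced_batches, the seed argument, and the use of the standard random module are assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List


def balanced_batches(
    labels: List[int], n_labels: int, n_instances: int, seed: int = 0
) -> Iterator[List[int]]:
    """Yield one epoch of batches of dataset indices (illustrative sketch only)."""
    rng = random.Random(seed)  # hypothetical seeding, for reproducibility

    # Group dataset indices by their label.
    label_to_ids: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_ids[label].append(idx)

    # Shuffle the unique labels once; each label is then used at most once.
    remaining = list(label_to_ids)
    rng.shuffle(remaining)

    # L // n_labels batches per epoch; leftover labels (last batch) are dropped.
    for _ in range(len(remaining) // n_labels):
        chosen, remaining = remaining[:n_labels], remaining[n_labels:]
        batch: List[int] = []
        for label in chosen:
            ids = label_to_ids[label]
            if len(ids) >= n_instances:
                batch.extend(rng.sample(ids, n_instances))
            else:
                # Too few instances for this label: sample with repetition.
                batch.extend(rng.choices(ids, k=n_instances))
        yield batch


labels = [0, 0, 0, 1, 1, 2, 2, 2, 3, 4]  # L = 5 unique labels
batches = list(balanced_batches(labels, n_labels=2, n_instances=2))
```

With 5 unique labels and n_labels=2 the sketch yields 2 batches of 4 indices each, and one label is left out of the epoch.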
- __init__(labels: Union[List[int], ndarray], n_labels: int, n_instances: int)[source]
- Parameters
labels – List of the labels for each element in the dataset
n_labels – The desired number of labels in a batch, should be > 1
n_instances – The desired number of instances of each label in a batch, should be > 1
CategoryBalanceSampler
- class oml.samplers.category_balance.CategoryBalanceSampler(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, resample_labels: bool = False, weight_categories: bool = True)[source]
Bases: IBatchSampler
This sampler takes n_instances for each of the n_labels for each of the n_categories to form the batches. Thus, the batch size is n_instances x n_labels x n_categories.
Note, to form an epoch of batches we simply sample L / n_labels batches with repetition, where L is the number of unique labels.
- __init__(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, resample_labels: bool = False, weight_categories: bool = True)[source]
- Parameters
labels – Labels to sample from
label2category – Mapping from label to category
n_categories – The desired number of categories to sample for each batch
n_labels – The desired number of labels to sample for each category in batch
n_instances – The desired number of samples to sample for each label in batch
resample_labels – If True, sample labels with repetition when a category has fewer than n_labels labels; otherwise raise an error in that case
weight_categories – If True, sample categories for each batch with weights proportional to the number of unique labels in the categories
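How the parameters above interact when a single batch is drawn can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function name sample_category_batch and the rng argument are hypothetical, categories are assumed to be distinct within a batch, and labels with fewer than n_instances samples are assumed to be sampled with repetition, as in BalanceSampler.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional


def sample_category_batch(
    labels: List[int],
    label2category: Dict[int, Hashable],
    n_categories: int,
    n_labels: int,
    n_instances: int,
    resample_labels: bool = False,
    weight_categories: bool = True,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Draw one batch of dataset indices (illustrative sketch only)."""
    rng = rng or random.Random(0)

    # Group dataset indices by label, and unique labels by category.
    label_to_ids: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_ids[label].append(idx)
    category_to_labels: Dict[Hashable, List[int]] = defaultdict(list)
    for label in label_to_ids:
        category_to_labels[label2category[label]].append(label)

    # Pick categories one at a time so they stay distinct (an assumption).
    pool = list(category_to_labels)
    chosen_categories = []
    for _ in range(n_categories):
        # Weight by the number of unique labels, or uniformly.
        weights = [
            len(category_to_labels[c]) if weight_categories else 1 for c in pool
        ]
        category = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(category)
        chosen_categories.append(category)

    batch: List[int] = []
    for category in chosen_categories:
        available = category_to_labels[category]
        if len(available) >= n_labels:
            chosen_labels = rng.sample(available, n_labels)
        elif resample_labels:
            # Not enough labels in the category: sample with repetition.
            chosen_labels = rng.choices(available, k=n_labels)
        else:
            raise ValueError(
                f"Category {category!r} has fewer than {n_labels} labels"
            )
        for label in chosen_labels:
            ids = label_to_ids[label]
            if len(ids) >= n_instances:
                batch.extend(rng.sample(ids, n_instances))
            else:
                batch.extend(rng.choices(ids, k=n_instances))
    return batch


labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
label2category = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}
batch = sample_category_batch(
    labels, label2category, n_categories=2, n_labels=2, n_instances=2,
    resample_labels=True,
)
```

In this toy data category "c" owns a single label, so drawing it with n_labels=2 succeeds only because resample_labels=True; with resample_labels=False the sketch raises a ValueError instead.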
DistinctCategoryBalanceSampler
- class oml.samplers.distinct_category_balance.DistinctCategoryBalanceSampler(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, epoch_size: int)[source]
Bases: IBatchSampler
This sampler takes n_instances for each of the n_labels for each of the n_categories to form the batches. Thus, the batch size is n_instances x n_labels x n_categories.
The strategy for the dataset with L unique labels and C unique categories is the following:
Select n_categories of C for the 1st batch
Select n_labels for each of the chosen categories for the 1st batch
Select n_instances for each of the chosen labels for the 1st batch
Define the set of labels available for the 2nd batch, L^: these are all the labels L except the ones chosen for the 1st batch
Define the set of available categories C^: these are all the categories corresponding to labels from L^
Select n_categories from C^ for the 2nd batch
Select n_labels for each category from L^ for the 2nd batch
Select n_instances for each label for the 2nd batch
…
The epoch ends after epoch_size steps.
Behavior in corner cases:
If all the categories were chosen before epoch_size steps, the sampler resets its state and goes on sampling from the first step.
If some label does not contain n_instances, a choice will be made with repetition.
If the chosen category does not contain n_labels unused labels, all the unused labels will be added to a batch and the missing ones will be sampled from the used labels without repetition.
If L % n_labels == 1 then one of the labels must be dropped because we always want to have more than 1 label in a batch to be able to form positive pairs later on.
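The bookkeeping of used labels described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the function name distinct_category_batches and the seed argument are hypothetical, every category is assumed to own at least n_labels labels, the state is assumed to reset whenever fewer than n_categories categories still have unused labels, and the L % n_labels == 1 corner case is left out.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Set


def distinct_category_batches(
    labels: List[int],
    label2category: Dict[int, Hashable],
    n_categories: int,
    n_labels: int,
    n_instances: int,
    epoch_size: int,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield epoch_size batches of dataset indices (illustrative sketch only)."""
    rng = random.Random(seed)

    label_to_ids: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_ids[label].append(idx)
    category_to_labels: Dict[Hashable, List[int]] = defaultdict(list)
    for label in label_to_ids:
        category_to_labels[label2category[label]].append(label)

    used: Set[int] = set()  # labels already placed in a batch since last reset
    for _ in range(epoch_size):
        # C^: categories that still own at least one unused label.
        open_categories = [
            c for c, ls in category_to_labels.items()
            if any(label not in used for label in ls)
        ]
        if len(open_categories) < n_categories:
            # Reset the state (assumed trigger: too few categories left).
            used.clear()
            open_categories = list(category_to_labels)

        batch: List[int] = []
        for category in rng.sample(open_categories, n_categories):
            in_category = category_to_labels[category]
            unused = [label for label in in_category if label not in used]
            if len(unused) >= n_labels:
                chosen = rng.sample(unused, n_labels)
            else:
                # Take every unused label, fill the rest from used ones
                # without repetition.
                spent = [label for label in in_category if label in used]
                chosen = unused + rng.sample(spent, n_labels - len(unused))
            used.update(chosen)
            for label in chosen:
                ids = label_to_ids[label]
                if len(ids) >= n_instances:
                    batch.extend(rng.sample(ids, n_instances))
                else:
                    # Too few instances: sample with repetition.
                    batch.extend(rng.choices(ids, k=n_instances))
        yield batch


labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
label2category = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c", 7: "c"}
batches = list(
    distinct_category_batches(
        labels, label2category,
        n_categories=2, n_labels=2, n_instances=2, epoch_size=4,
    )
)
```

Every batch in this toy run holds 8 indices drawn from 2 categories and 4 distinct labels; the reset branch is reached once the categories with unused labels run short.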
- __init__(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, epoch_size: int)[source]
- Parameters
labels – Labels to sample from
label2category – Mapping from label to category
n_categories – The desired number of categories to sample for each batch
n_labels – The desired number of labels to sample for each category in batch
n_instances – The desired number of samples to sample for each label in batch
epoch_size – The desired number of batches in epoch