Samplers
BalanceSampler
- class oml.samplers.balance.BalanceSampler(labels: Union[List[int], ndarray], n_labels: int, n_instances: int)
Bases: IBatchSampler
This sampler takes n_instances for each of the n_labels to form the batches. Thus, the batch size is n_instances x n_labels. This type of sampling can be found in the classical Person Re-Id paper "In Defense of the Triplet Loss for Person Re-Identification".
The strategy for the dataset with L unique labels is the following:
- Select n_labels of L labels for the 1st batch
- Select n_instances for each label for the 1st batch
- Select n_labels of L - n_labels remaining labels for the 2nd batch
- Select n_instances instances for each label for the 2nd batch
- …
The epoch ends after L // n_labels batches.
Thus, in each epoch, all the labels will be selected once, but this does not mean that all the instances will be picked.
Behavior in corner cases:
- If some label contains fewer than n_instances instances, a choice will be made with repetition.
- If L % n_labels != 0 then we drop the last batch.
- __init__(labels: Union[List[int], ndarray], n_labels: int, n_instances: int)
- Parameters
labels – List of the labels for each element in the dataset
n_labels – The desired number of labels in a batch, should be > 1
n_instances – The desired number of instances of each label in a batch, should be > 1
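The strategy and corner cases described above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the implementation behind BalanceSampler; the function name balance_batches and the seed argument are hypothetical and exist only for this example.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Optional


def balance_batches(
    labels: List[int], n_labels: int, n_instances: int, seed: Optional[int] = None
) -> Iterator[List[int]]:
    """Yield batches of dataset indices: n_instances for each of n_labels."""
    rng = random.Random(seed)

    # Group dataset indices by their label.
    label_to_indices: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_indices[label].append(idx)

    # Shuffle the unique labels once; consecutive chunks then use each label once.
    unique_labels = list(label_to_indices)
    rng.shuffle(unique_labels)

    # L // n_labels batches per epoch: an incomplete last batch is dropped.
    n_batches = len(unique_labels) // n_labels
    for b in range(n_batches):
        batch: List[int] = []
        for label in unique_labels[b * n_labels:(b + 1) * n_labels]:
            pool = label_to_indices[label]
            if len(pool) >= n_instances:
                batch.extend(rng.sample(pool, n_instances))
            else:
                # Too few instances for this label: sample with repetition.
                batch.extend(rng.choices(pool, k=n_instances))
        yield batch


# Toy dataset: 5 unique labels, label 4 has a single instance.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 4]
batches = list(balance_batches(labels, n_labels=2, n_instances=2, seed=0))
```

With 5 unique labels and n_labels=2, the sketch yields 5 // 2 = 2 batches of 2 x 2 = 4 indices each, and the label left over after chunking is skipped for that epoch.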
CategoryBalanceSampler
- class oml.samplers.category_balance.CategoryBalanceSampler(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, resample_labels: bool = False, weight_categories: bool = True)
Bases: IBatchSampler
This sampler takes n_instances for each of the n_labels for each of the n_categories to form the batches. Thus, the batch size is n_instances x n_labels x n_categories.
Note, to form an epoch of batches we simply sample L / n_labels batches with repetition.
- __init__(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, resample_labels: bool = False, weight_categories: bool = True)
- Parameters
labels – Labels to sample from
label2category – Mapping from label to category
n_categories – The desired number of categories to sample for each batch
n_labels – The desired number of labels to sample for each category in batch
n_instances – The desired number of samples to sample for each label in batch
resample_labels – If True, sample labels with repetition; otherwise raise an error if any category lacks labels
weight_categories – If True, sample categories for each batch with weights proportional to the number of unique labels in the categories
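To make the roles of resample_labels and weight_categories concrete, here is a rough standard-library sketch of how a single batch could be drawn. It is an illustration under stated assumptions, not the code of CategoryBalanceSampler: the function sample_category_batch and its rng argument are hypothetical, and drawing instances with repetition for short labels is an assumption the description above does not state for this sampler.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Union

Category = Union[str, int]


def sample_category_batch(
    labels: List[int],
    label2category: Dict[int, Category],
    n_categories: int,
    n_labels: int,
    n_instances: int,
    resample_labels: bool = False,
    weight_categories: bool = True,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return one batch of n_categories * n_labels * n_instances dataset indices."""
    rng = rng or random.Random()

    label_to_indices: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_indices[label].append(idx)

    category_to_labels: Dict[Category, List[int]] = defaultdict(list)
    for label in label_to_indices:
        category_to_labels[label2category[label]].append(label)

    # Draw distinct categories one at a time; with weight_categories the chance
    # of a category is proportional to its number of unique labels.
    remaining = list(category_to_labels)
    chosen: List[Category] = []
    for _ in range(n_categories):
        weights = [len(category_to_labels[c]) if weight_categories else 1 for c in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)

    batch: List[int] = []
    for category in chosen:
        pool = category_to_labels[category]
        if len(pool) >= n_labels:
            picked = rng.sample(pool, n_labels)
        elif resample_labels:
            picked = rng.choices(pool, k=n_labels)  # labels may repeat
        else:
            raise ValueError(f"Category {category!r} has fewer than {n_labels} labels")
        for label in picked:
            # Assumption: instances are drawn with repetition when a label is short.
            inst = label_to_indices[label]
            if len(inst) >= n_instances:
                batch.extend(rng.sample(inst, n_instances))
            else:
                batch.extend(rng.choices(inst, k=n_instances))
    return batch


labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
label2category = {0: "cat", 1: "cat", 2: "dog", 3: "dog", 4: "bird"}
batch = sample_category_batch(
    labels, label2category, n_categories=2, n_labels=2, n_instances=2,
    resample_labels=True, rng=random.Random(0),
)
```

In this toy setup the batch always holds 2 x 2 x 2 = 8 indices. The "bird" category has a single label, so it can only contribute when resample_labels=True; with resample_labels=False the sketch raises a ValueError as soon as that category is drawn.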
DistinctCategoryBalanceSampler
- class oml.samplers.distinct_category_balance.DistinctCategoryBalanceSampler(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, epoch_size: int)
Bases: IBatchSampler
This sampler takes n_instances for each of the n_labels for each of the n_categories to form the batches. Thus, the batch size is n_instances x n_labels x n_categories.
The strategy for the dataset with L unique labels and C unique categories is the following:
- Select n_categories of C for the 1st batch
- Select n_labels for each of the chosen categories for the 1st batch
- Select n_instances for each of the chosen labels for the 1st batch
- Define the set L^ of labels available for the 2nd batch: these are all the labels L except the ones chosen for the 1st batch
- Define the set of available categories C^: these are all the categories corresponding to labels from L^
- Select n_categories from C^ for the 2nd batch
- Select n_labels for each category from L^ for the 2nd batch
- Select n_instances for each label for the 2nd batch
- …
Epoch ends after epoch_size steps.
Behavior in corner cases:
- If all the categories were chosen before epoch_size steps, the sampler resets its state and goes on sampling from the first step.
- If some label contains fewer than n_instances instances, a choice will be made with repetition.
- If the chosen category does not contain n_labels unused labels, all the unused labels will be added to a batch and the missing ones will be sampled from the used labels without repetition.
- If L % n_labels == 1 then one of the labels must be dropped because we always want to have more than 1 label in a batch to be able to form positive pairs later on.
- __init__(labels: Union[List[int], ndarray], label2category: Dict[int, Union[str, int]], n_categories: int, n_labels: int, n_instances: int, epoch_size: int)
- Parameters
labels – Labels to sample from
label2category – Mapping from label to category
n_categories – The desired number of categories to sample for each batch
n_labels – The desired number of labels to sample for each category in batch
n_instances – The desired number of samples to sample for each label in batch
epoch_size – The desired number of batches in epoch
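The distinct-category strategy can be sketched roughly as follows. This is a simplified standard-library illustration, not the code of DistinctCategoryBalanceSampler: the function distinct_category_batches and its seed argument are hypothetical, the reset condition (fewer than n_categories categories with unused labels) is an assumption, and the sketch assumes every category owns at least n_labels labels.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Set, Union

Category = Union[str, int]


def distinct_category_batches(
    labels: List[int],
    label2category: Dict[int, Category],
    n_categories: int,
    n_labels: int,
    n_instances: int,
    epoch_size: int,
    seed: Optional[int] = None,
) -> Iterator[List[int]]:
    """Yield epoch_size batches, preferring labels not used in earlier batches."""
    rng = random.Random(seed)

    label_to_indices: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_indices[label].append(idx)

    category_to_labels: Dict[Category, List[int]] = defaultdict(list)
    for label in label_to_indices:
        category_to_labels[label2category[label]].append(label)

    used: Set[int] = set()
    for _ in range(epoch_size):
        # C^: categories that still own at least one unused label.
        available = [c for c, ls in category_to_labels.items() if set(ls) - used]
        if len(available) < n_categories:
            # Assumption: reset once too few categories have unused labels left.
            used.clear()
            available = list(category_to_labels)

        batch: List[int] = []
        for category in rng.sample(available, n_categories):
            cat_labels = category_to_labels[category]
            unused = [lab for lab in cat_labels if lab not in used]
            if len(unused) >= n_labels:
                picked = rng.sample(unused, n_labels)
            else:
                # Take every unused label, then fill up from used ones without repetition.
                already_used = [lab for lab in cat_labels if lab in used]
                picked = unused + rng.sample(already_used, n_labels - len(unused))
            used.update(picked)
            for label in picked:
                inst = label_to_indices[label]
                if len(inst) >= n_instances:
                    batch.extend(rng.sample(inst, n_instances))
                else:
                    batch.extend(rng.choices(inst, k=n_instances))  # with repetition
        yield batch


# Toy dataset: 8 labels with 2 instances each, 4 categories with 2 labels each.
labels = [lab for lab in range(8) for _ in range(2)]
label2category = {lab: "abcd"[lab // 2] for lab in range(8)}
batches = list(
    distinct_category_batches(
        labels, label2category, n_categories=2, n_labels=2, n_instances=2,
        epoch_size=3, seed=0,
    )
)
```

In this toy run the first two batches cover all four categories with disjoint labels; by the third step no unused labels remain, so the sketch resets its state and starts sampling again.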