DDP

IMetricDDP
EmbeddingMetricsDDP
ModuleDDP
ExtractorModuleDDP
MetricValCallbackDDP
DDPSamplerWrapper
patch_dataloader_to_ddp
sync_dicts_ddp

Note, that this is an advanced section for developers or curious users. Normally, you don’t even need to know about the existence of the classes and functions below.

IMetricDDP 

class oml.interfaces.metrics.IMetricDDP[source]

Bases: IBasicMetric

This is an extension of a base metric interface to work in DDP mode

abstract sync() → None[source]: Method aggregates data in DDP mode before metrics calculations

class oml.metrics.embeddings.EmbeddingMetricsDDP(embeddings_key: str = 'embeddings', labels_key: str = 'labels', is_query_key: str = 'is_query', is_gallery_key: str = 'is_gallery', extra_keys: Tuple[str, ...] = (), cmc_top_k: Tuple[int, ...] = (5,), precision_top_k: Tuple[int, ...] = (5,), map_top_k: Tuple[int, ...] = (5,), fmr_vals: Tuple[float, ...] = (), pcf_variance: Tuple[float, ...] = (0.5,), categories_key: Optional[str] = None, sequence_key: Optional[str] = None, postprocessor: Optional[IDistancesPostprocessor] = None, metrics_to_exclude_from_visualization: Iterable[str] = (), return_only_overall_category: bool = False, visualize_only_overall_category: bool = True, verbose: bool = True)[source]

Bases: EmbeddingMetrics, IMetricDDP

__init__(embeddings_key: str = 'embeddings', labels_key: str = 'labels', is_query_key: str = 'is_query', is_gallery_key: str = 'is_gallery', extra_keys: Tuple[str, ...] = (), cmc_top_k: Tuple[int, ...] = (5,), precision_top_k: Tuple[int, ...] = (5,), map_top_k: Tuple[int, ...] = (5,), fmr_vals: Tuple[float, ...] = (), pcf_variance: Tuple[float, ...] = (0.5,), categories_key: Optional[str] = None, sequence_key: Optional[str] = None, postprocessor: Optional[IDistancesPostprocessor] = None, metrics_to_exclude_from_visualization: Iterable[str] = (), return_only_overall_category: bool = False, visualize_only_overall_category: bool = True, verbose: bool = True)

Parameters

embeddings_key – Key to take the embeddings from the batches
labels_key – Key to take the labels from the batches
is_query_key – Key to take the information whether every batch sample belongs to the query
is_gallery_key – Key to take the information whether every batch sample belongs to the gallery
extra_keys – Keys to accumulate some additional information from the batches
cmc_top_k – Values of k to calculate cmc@k (Cumulative Matching Characteristic)
precision_top_k – Values of k to calculate precision@k
map_top_k – Values of k to calculate map@k (Mean Average Precision)
fmr_vals – Values of fmr (measured in quantiles) to calculate fnmr@fmr (False Non Match Rate at the given False Match Rate). For example, if fmr_values is (0.2, 0.4) we will calculate fnmr@fmr=0.2 and fnmr@fmr=0.4. Note, computing this metric requires additional memory overhead, that is why it’s turned off by default.
pcf_variance – Values in range [0, 1]. Find the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by pcf_variance.
categories_key – Key to take the samples’ categories from the batches (if you have ones)
sequence_key – Key to take sequence ids from the batches (if you have ones)
postprocessor – Postprocessor which applies some techniques like query reranking
metrics_to_exclude_from_visualization – Names of the metrics to exclude from the visualization. It will not affect calculations.
return_only_overall_category – Set True if you want to return only the aggregated metrics
visualize_only_overall_category – Set False if you want to visualize each category separately
verbose – Set True if you want to print metrics

setup(num_samples: int) → None: Method for preparing metrics to work: memory allocation, placeholder preparation, etc. Has to be called before the first call of self.update_data().

update_data(data_dict: Dict[str, Any], indices: List[int]) → None[source]

Parameters

data_dict – Batch of data containing records of the same size: bs.
indices – Global indices of the elements in your records within the range of (0, dataset_size - 1). Indices are needed in DDP (because data is gathered shuffled, additionally you may also get some duplicates due to padding). In the single device regime it’s may be useful if you accumulate data in shuffled order.

compute_metrics() → Dict[Union[str, int], Dict[str, Dict[Union[int, float], Union[float, Tensor]]]]

The output must be in the following format:

{
    "self.overall_categories_key": {"metric1": ..., "metric2": ...},
    "category1": {"metric1": ..., "metric2": ...},
    "category2": {"metric1": ..., "metric2": ...}
}

Where category1 and category2 are optional.

sync() → None[source]: Method aggregates data in DDP mode before metrics calculations

ModuleDDP 

class oml.lightning.modules.ddp.ModuleDDP(loaders_train: Optional[Union[DataLoader, Sequence[DataLoader], Dict[str, DataLoader]]] = None, loaders_val: Optional[Union[DataLoader, Sequence[DataLoader]]] = None)[source]

Bases: LightningModule

The module automatically patches training and validation dataloaders to DDP mode by splitting available indices between devices. Note, don’t use trainer.fit(...) or trainer.validate(...), because in this case, PytorchLightning will ignore our patching.

__init__(loaders_train: Optional[Union[DataLoader, Sequence[DataLoader], Dict[str, DataLoader]]] = None, loaders_val: Optional[Union[DataLoader, Sequence[DataLoader]]] = None)[source]

ExtractorModuleDDP 

class oml.lightning.modules.extractor.ExtractorModuleDDP(loaders_train: Optional[Any] = None, loaders_val: Optional[Any] = None, *args: Any, **kwargs: Any)[source]

Bases: ExtractorModule, ModuleDDP

This is a base module for the training of your model with Lightning in DDP.

__init__(loaders_train: Optional[Any] = None, loaders_val: Optional[Any] = None, *args: Any, **kwargs: Any)[source]

Parameters

extractor – Extractor to train
criterion – Criterion to optimize
optimizer – Optimizer
scheduler – Learning rate scheduler
scheduler_interval – Interval of calling scheduler (must be step or epoch)
scheduler_frequency – Frequency of calling scheduler
input_tensors_key – Key to get tensors from the batches
labels_key – Key to get labels from the batches
embeddings_key – Key to get embeddings from the batches
scheduler_monitor_metric – Metric to monitor for the schedulers that depend on the metric value
freeze_n_epochs – number of epochs to freeze model (for n > 0 model has to be a successor of IFreezable interface). When current_epoch >= freeze_n_epochs model is unfreezed. Note that epochs are starting with 0.

MetricValCallbackDDP 

class oml.lightning.callbacks.metric.MetricValCallbackDDP(metric: IMetricDDP, *args: Any, **kwargs: Any)[source]

Bases: MetricValCallback

This is an extension to the regular callback that takes into account data reduction and padding on the inference for each device in DDP setup

__init__(metric: IMetricDDP, *args: Any, **kwargs: Any)[source]

Parameters

metric – Metric
log_images – Set True if you want to have visual logging
loader_idx – Idx of the loader to calculate metric for
samples_in_getitem – Some of the datasets return several samples when calling __getitem__, so we need to handle it for the proper calculation. For most of the cases this value equals to 1, but for the dataset which explicitly return triplets, this value must be equal to 3, for a dataset of pairs it must be equal to 2.

DDPSamplerWrapper 

class oml.ddp.patching.DDPSamplerWrapper(sampler: Union[BatchSampler, Sampler, IBatchSampler], shuffle_samples_between_gpus: bool = True, pad_data_to_num_gpus: bool = True)[source]

Bases: DistributedSampler

This is a wrapper to allow using custom sampler in DDP mode.

Default DistributedSampler allows us to build a sampler for a dataset in DDP mode. Usually we can easily replace default SequentialSampler [when DataLoader(shuffle=False, ...)] and RandomSampler [when DataLoader(shuffle=True, ...)] with DistributedSampler. But for the custom sampler, we need an extra wrapper.

Thus, this wrapper distributes indices produced by sampler among several devices for further usage.

__init__(sampler: Union[BatchSampler, Sampler, IBatchSampler], shuffle_samples_between_gpus: bool = True, pad_data_to_num_gpus: bool = True)[source]

Parameters

sampler – Sequential or batch sampler
pad_data_to_num_gpus – When using DDP we should manage behavior with the last batch, because each device should have the same amount of data. If the sampler length is not evenly divisible by the number of devices, we must duplicate part of the data (pad_data_to_num_gpus=True), or discard part of the data (pad_data_to_num_gpus=False).
shuffle_samples_between_gpus – shuffle available indices before feeding them to GPU. Note, that shuffle inside GPU after the feeding will be used according to behavior of the sampler.

Note: Wrapper can be used with both the default SequentialSampler or RandomSampler from PyTorch and with some custom sampler.

_reload() → None[source]: We need to re-instantiate the wrapper in order to update the available indices for the new epoch. We don’t perform this step on the epoch 0, because we want to be comparable with no DDP setup there.

patch_dataloader_to_ddp 

oml.ddp.patching.patch_dataloader_to_ddp(loader: DataLoader) → DataLoader[source]: Function inspects loader and modifies sampler for working in DDP mode.

Note

We ALWAYS use the padding of samples (in terms of the number of batches or number of samples per epoch) in order to use the same amount of data for each device in DDP. Thus, the behavior with and without DDP may be slightly different (e.g. metrics values).

sync_dicts_ddp 

oml.ddp.utils.sync_dicts_ddp(outputs_from_device: Dict[str, Any], world_size: int, device: Union[device, str] = 'cpu') → Dict[str, Any][source]: The function allows you to combine and merge data stored in dictionaries from all devices. You can place this function in your code and all the devices upon reaching this function will wait for each other to synchronize and merge dictionaries.

Note

The function under the hood pickles all object, converts bytes to tensor, then unpickles them after syncing. With NCCL DDP backend (the default one) intermediate tensors are stored on CUDA.