# Zoo

## Zoo: Images

You can use an image model from our Zoo or any other arbitrary model, as long as it inherits from `IExtractor`.

**See how to use models**
```python
from oml.const import CKPT_SAVE_ROOT as CKPT_DIR, MOCK_DATASET_PATH as DATA_DIR
from oml.models import ViTExtractor
from oml.registry import get_transforms_for_pretrained

model = ViTExtractor.from_pretrained("vits16_dino").eval()
transforms, im_reader = get_transforms_for_pretrained("vits16_dino")

img = im_reader(DATA_DIR / "images" / "circle_1.jpg")  # put the path to your image here
img_tensor = transforms(img)
# img_tensor = transforms(image=img)["image"]  # for transforms from Albumentations

features = model(img_tensor.unsqueeze(0))

# Check other available models:
print(list(ViTExtractor.pretrained_models.keys()))

# Load a checkpoint saved on disk:
model_ = ViTExtractor(weights=CKPT_DIR / "vits16_dino.ckpt", arch="vits16", normalise_features=False)
```
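To plug in your own network, inherit from `IExtractor`. Below is a minimal sketch; it assumes that `IExtractor` (from `oml.interfaces.models`) only requires implementing a `feat_dim` property on top of a regular `forward`, and the toy backbone is purely illustrative:

```python
import torch
from torch import nn

from oml.interfaces.models import IExtractor


class MyExtractor(IExtractor):
    def __init__(self, out_dim: int = 128):
        super().__init__()
        # a toy backbone; replace it with your real network
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(out_dim))
        self._out_dim = out_dim

    @property
    def feat_dim(self) -> int:
        return self._out_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)


extractor = MyExtractor().eval()
features = extractor(torch.randn(1, 3, 224, 224))  # shape: (1, 128)
```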
### Image models zoo

Models trained by us. The metrics below are for 224 x 224 images:
| model | cmc1 | dataset | weights | experiment |
|---|---|---|---|---|
| `vits16_inshop` | 0.921 | DeepFashion Inshop | | |
| `vits16_sop` | 0.866 | Stanford Online Products | | |
| `vits16_cars` | 0.907 | CARS 196 | | |
| `vits16_cub` | 0.837 | CUB 200 2011 | | |
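Each of these models can be loaded by name via `from_pretrained`, as in the snippet above; a one-line sketch (assuming the table names, which should also be listed by `ViTExtractor.pretrained_models`):

```python
from oml.models import ViTExtractor

extractor = ViTExtractor.from_pretrained("vits16_inshop").eval()  # any name from the table above
```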
Models trained by other researchers. Note that some metrics on particular benchmarks are high because those benchmarks were part of the corresponding training data (for example, for unicom). The metrics below are for 224 x 224 images:
| model | Stanford Online Products | DeepFashion InShop | CUB 200 2011 | CARS 196 |
|---|---|---|---|---|
| | 0.700 | 0.734 | 0.847 | 0.916 |
| | 0.690 | 0.722 | 0.796 | 0.893 |
| | 0.726 | 0.790 | 0.868 | 0.922 |
| | 0.745 | 0.810 | 0.875 | 0.924 |
| | 0.547 | 0.514 | 0.448 | 0.618 |
| | 0.565 | 0.565 | 0.524 | 0.648 |
| | 0.512 | 0.555 | 0.606 | 0.707 |
| | 0.612 | 0.491 | 0.560 | 0.693 |
| | 0.648 | 0.606 | 0.665 | 0.767 |
| | 0.670 | 0.675 | 0.745 | 0.844 |
| | 0.648 | 0.509 | 0.627 | 0.265 |
| | 0.651 | 0.524 | 0.661 | 0.315 |
| | 0.658 | 0.514 | 0.541 | 0.288 |
| | 0.689 | 0.599 | 0.506 | 0.313 |
| | 0.566 | 0.334 | 0.797 | 0.503 |
| | 0.566 | 0.332 | 0.795 | 0.740 |
| | 0.565 | 0.342 | 0.842 | 0.644 |
| | 0.557 | 0.324 | 0.833 | 0.828 |
| | 0.576 | 0.352 | 0.844 | 0.692 |
| | 0.571 | 0.340 | 0.840 | 0.871 |
| | 0.493 | 0.267 | 0.264 | 0.149 |
| | 0.515 | 0.284 | 0.455 | 0.247 |
The metrics may differ from those reported in the papers, because the versions of the train/val splits and the usage of bounding boxes may differ.
## Zoo: Texts

Here is a lightweight integration with HuggingFace Transformers models. You can replace it with any other arbitrary model, as long as it inherits from `IExtractor`.

```shell
pip install open-metric-learning[nlp]
```

**See how to use models**
```python
from transformers import AutoModel, AutoTokenizer

from oml.models import HFWrapper

model = AutoModel.from_pretrained('bert-base-uncased').eval()
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
extractor = HFWrapper(model=model, feat_dim=768)

inp = tokenizer(text="Hello world", return_tensors="pt", add_special_tokens=True)
embeddings = extractor(inp)
```
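The same wrapper works on batches. A short follow-up sketch, assuming the wrapper accepts the tokenizer's output dict exactly as in the snippet above (the example texts are made up):

```python
import torch

batch = tokenizer(["first query", "second, longer query"], return_tensors="pt", padding=True)
with torch.no_grad():
    features = extractor(batch)
print(features.shape)  # torch.Size([2, 768])
```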
Note that we don't have our own text models zoo at the moment.
## Zoo: Audios

You can use an audio model from our Zoo or any other arbitrary model, as long as it inherits from `IExtractor`.

```shell
pip install open-metric-learning[audio]
```

**See how to use models**
```python
import torchaudio

from oml.const import CKPT_SAVE_ROOT as CKPT_DIR, MOCK_AUDIO_DATASET_PATH as DATA_DIR
from oml.models import ECAPATDNNExtractor

# replace these with your actual paths
ckpt_path = CKPT_DIR / "ecapa_tdnn_taoruijie.pth"
file_path = DATA_DIR / "voices" / "voice0_0.wav"

model = ECAPATDNNExtractor(weights=ckpt_path, arch="ecapa_tdnn_taoruijie", normalise_features=False).to("cpu").eval()

audio, sr = torchaudio.load(file_path)
if audio.shape[0] > 1:
    audio = audio.mean(dim=0, keepdim=True)  # average channels to get mono audio
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)  # the model expects 16 kHz input

embeddings = model.extract(audio)
```
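Such embeddings are typically compared with cosine similarity for speaker verification. A small sketch; the second file name is hypothetical and the 0.5 threshold is an arbitrary placeholder, not a tuned value:

```python
import torch.nn.functional as F

audio2, sr2 = torchaudio.load(DATA_DIR / "voices" / "voice0_1.wav")  # hypothetical second recording (assumed mono)
if sr2 != 16000:
    audio2 = torchaudio.functional.resample(audio2, sr2, 16000)
embeddings2 = model.extract(audio2)

similarity = F.cosine_similarity(embeddings, embeddings2)  # shape: (1,)
print(f"same speaker? {similarity.item() > 0.5}")  # placeholder threshold
```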
### Audio models zoo

| model | Vox1_O | Vox1_E | Vox1_H |
|---|---|---|---|
| `ecapa_tdnn_taoruijie` | 0.86 | 1.18 | 2.17 |
The metrics above represent Equal Error Rate (EER). Lower is better.
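For reference, here is a minimal sketch of how EER can be computed from verification scores; the labels and scores are made-up toy values, and `roc_curve` comes from scikit-learn, which is not an OML dependency:

```python
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([1, 1, 1, 0, 0, 0])               # toy same-speaker (1) / different-speaker (0) labels
scores = np.array([0.9, 0.7, 0.45, 0.5, 0.3, 0.1])  # toy similarity scores
fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer = fpr[np.argmin(np.abs(fnr - fpr))]  # the rate where false accepts ≈ false rejects
print(f"EER = {eer:.2%}")
```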