Self-Supervised Learning: Code Series
# 01、MoCo: Momentum Contrast for Unsupervised Visual Representation Learning
# 1.1 MoCo v1 & MoCo v2
https://github.com/facebookresearch/moco
Models
Our pre-trained ResNet-50 models can be downloaded as follows:
model | epochs | mlp | aug+ | cos | top-1 acc. | download | md5 |
---|---|---|---|---|---|---|---|
MoCo v1 | 200 | | | | 60.6 | download | b251726a |
MoCo v2 | 200 | ✓ | ✓ | ✓ | 67.7 | download | 59fd9945 |
MoCo v2 | 800 | ✓ | ✓ | ✓ | 71.1 | download | a04e12f8 |
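The two mechanisms behind the numbers above are a slowly updated momentum (key) encoder and a FIFO queue of negative keys. A minimal NumPy sketch of both, assuming L2-normalized features; class and function names are illustrative, not from the repo:

```python
import numpy as np

def momentum_update(q_params, k_params, m=0.999):
    """EMA update of the key encoder from the query encoder (MoCo)."""
    return [m * k + (1.0 - m) * q for q, k in zip(q_params, k_params)]

class FeatureQueue:
    """Fixed-size FIFO queue of negative keys, stored as a (dim, K) matrix."""
    def __init__(self, dim, K):
        self.queue = np.random.randn(dim, K)
        self.queue /= np.linalg.norm(self.queue, axis=0, keepdims=True)
        self.ptr = 0
        self.K = K

    def dequeue_and_enqueue(self, keys):
        """keys: (batch, dim), L2-normalized; overwrite the oldest slots."""
        b = keys.shape[0]
        assert self.K % b == 0  # simplifying assumption, as in the paper
        self.queue[:, self.ptr:self.ptr + b] = keys.T
        self.ptr = (self.ptr + b) % self.K
```

With m = 0.999 the key encoder changes very slowly, which keeps the keys in the queue consistent across iterations.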
# 1.2 MoCo v3
https://github.com/facebookresearch/moco-v3
ResNet-50, linear classification
pretrain epochs | pretrain crops | linear acc |
---|---|---|
100 | 2x224 | 68.9 |
300 | 2x224 | 72.8 |
1000 | 2x224 | 74.6 |
ViT, linear classification
model | pretrain epochs | pretrain crops | linear acc |
---|---|---|---|
ViT-Small | 300 | 2x224 | 73.2 |
ViT-Base | 300 | 2x224 | 76.7 |
ViT, end-to-end fine-tuning
model | pretrain epochs | pretrain crops | e2e acc |
---|---|---|---|
ViT-Small | 300 | 2x224 | 81.4 |
ViT-Base | 300 | 2x224 | 83.2 |
# 02、SimCLR - A Simple Framework for Contrastive Learning of Visual Representations
https://github.com/google-research/simclr
# Pre-trained models for SimCLRv1
The pre-trained models (base network with linear classifier layer) can be found below. Note that for these SimCLRv1 checkpoints, the projection head is not available.
Model checkpoint and hub-module | ImageNet Top-1 |
---|---|
ResNet50 (1x) | 69.1 |
ResNet50 (2x) | 74.2 |
ResNet50 (4x) | 76.6 |
# Pre-trained models for SimCLRv2
Depth | Width | SK | Param (M) | F-T (1%) | F-T (10%) | F-T (100%) | Linear eval | Supervised |
---|---|---|---|---|---|---|---|---|
50 | 1X | False | 24 | 57.9 | 68.4 | 76.3 | 71.7 | 76.6 |
50 | 1X | True | 35 | 64.5 | 72.1 | 78.7 | 74.6 | 78.5 |
50 | 2X | False | 94 | 66.3 | 73.9 | 79.1 | 75.6 | 77.8 |
50 | 2X | True | 140 | 70.6 | 77.0 | 81.3 | 77.7 | 79.3 |
101 | 1X | False | 43 | 62.1 | 71.4 | 78.2 | 73.6 | 78.0 |
101 | 1X | True | 65 | 68.3 | 75.1 | 80.6 | 76.3 | 79.6 |
101 | 2X | False | 170 | 69.1 | 75.8 | 80.7 | 77.0 | 78.9 |
101 | 2X | True | 257 | 73.2 | 78.8 | 82.4 | 79.0 | 80.1 |
152 | 1X | False | 58 | 64.0 | 73.0 | 79.3 | 74.5 | 78.3 |
152 | 1X | True | 89 | 70.0 | 76.5 | 81.3 | 77.2 | 79.9 |
152 | 2X | False | 233 | 70.2 | 76.6 | 81.1 | 77.4 | 79.1 |
152 | 2X | True | 354 | 74.2 | 79.4 | 82.9 | 79.4 | 80.4 |
152 | 3X | True | 795 | 74.9 | 80.1 | 83.1 | 79.8 | 80.5 |
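All SimCLR variants above are trained with the NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss over two augmented views per image. A minimal NumPy sketch, assuming the two views of image i sit at rows i and i+N; the function name is illustrative:

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent loss for 2N embeddings, where z[i] and z[i+N] are the two
    augmented views of the same image (SimCLR)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
    n2 = z.shape[0]
    N = n2 // 2
    sim = z @ z.T / tau                    # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)         # exclude self-similarity
    # positive index for each row: the other view of the same image
    pos = np.concatenate([np.arange(N, n2), np.arange(0, N)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(n2), pos] - logsumexp)
    return loss.mean()
```

The loss pulls the two views of each image together while pushing away the other 2N-2 examples in the batch, which is why SimCLR benefits from large batch sizes.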
# 03、SimSiam: Exploring Simple Siamese Representation Learning
https://github.com/facebookresearch/simsiam
# Models and Logs
Our pre-trained ResNet-50 models and logs:
pre-train epochs | batch size | pre-train ckpt | pre-train log | linear cls. ckpt | linear cls. log | top-1 acc. |
---|---|---|---|---|---|---|
100 | 512 | link | link | link | link | 68.1 |
100 | 256 | link | link | link | link | 68.3 |
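SimSiam reaches the accuracies above without negative pairs, a momentum encoder, or large batches; the key ingredients are a predictor head and a stop-gradient on the other branch. A minimal NumPy sketch of the symmetrized loss, where treating the z inputs as constants stands in for stop-gradient; names are illustrative:

```python
import numpy as np

def neg_cosine(p, z):
    """Negative cosine similarity; z is treated as a constant
    (the stop-gradient branch in SimSiam)."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -(p * z).sum(axis=1).mean()

def simsiam_loss(p1, p2, z1, z2):
    """Symmetrized SimSiam loss: predictor outputs p of one view are
    compared against (stop-gradient) projector outputs z of the other."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```

The loss is bounded below by -1, reached when predictor and projector outputs are perfectly aligned; the stop-gradient is what prevents the trivial collapsed solution.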
# 04、Understanding Dimensional Collapse in Contrastive Self-supervised Learning
# 05、Improving Contrastive Learning by Visualizing Feature Transformation
https://github.com/DTennant/CL-Visualizing-Feature-Transformation
Models
For your convenience, we provide the following pre-trained models on ImageNet-1K and ImageNet-100.
pre-train method | pre-train dataset | backbone | #epoch | ImageNet-1K | VOC det AP50 | COCO det AP | Link |
---|---|---|---|---|---|---|---|
Supervised | ImageNet-1K | ResNet-50 | - | 76.1 | 81.3 | 38.2 | download |
MoCo-v1 | ImageNet-1K | ResNet-50 | 200 | 60.6 | 81.5 | 38.5 | download |
MoCo-v1+FT | ImageNet-1K | ResNet-50 | 200 | 61.9 | 82.0 | 39.0 | download |
MoCo-v2 | ImageNet-1K | ResNet-50 | 200 | 67.5 | 82.4 | 39.0 | download |
MoCo-v2+FT | ImageNet-1K | ResNet-50 | 200 | 69.6 | 83.3 | 39.5 | download |
MoCo-v1+FT | ImageNet-100 | ResNet-50 | 200 | 77.2 (IN-100) | - | - | download |
# 06、Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
# 6.1 Pascal VOC object detection
# Faster-RCNN with C4
Method | Epochs | Arch | AP | AP50 | AP75 | Download |
---|---|---|---|---|---|---|
Scratch | - | ResNet-50 | 33.8 | 60.2 | 33.1 | - |
Supervised | 100 | ResNet-50 | 53.5 | 81.3 | 58.8 | - |
MoCo | 200 | ResNet-50 | 55.9 | 81.5 | 62.6 | - |
SimCLR | 1000 | ResNet-50 | 56.3 | 81.9 | 62.5 | - |
MoCo v2 | 800 | ResNet-50 | 57.6 | 82.7 | 64.4 | - |
InfoMin | 200 | ResNet-50 | 57.6 | 82.7 | 64.6 | - |
InfoMin | 800 | ResNet-50 | 57.5 | 82.5 | 64.0 | - |
PixPro (ours) | 100 | ResNet-50 | 58.8 | 83.0 | 66.5 | config / model |
PixPro (ours) | 400 | ResNet-50 | 60.2 | 83.8 | 67.7 | config / model |
# 6.2 COCO object detection
# Mask-RCNN with FPN
Method | Epochs | Arch | Schedule | bbox AP | mask AP | Download |
---|---|---|---|---|---|---|
Scratch | - | ResNet-50 | 1x | 32.8 | 29.9 | - |
Supervised | 100 | ResNet-50 | 1x | 39.7 | 35.9 | - |
MoCo | 200 | ResNet-50 | 1x | 39.4 | 35.6 | - |
SimCLR | 1000 | ResNet-50 | 1x | 39.8 | 35.9 | - |
MoCo v2 | 800 | ResNet-50 | 1x | 40.4 | 36.4 | - |
InfoMin | 200 | ResNet-50 | 1x | 40.6 | 36.7 | - |
InfoMin | 800 | ResNet-50 | 1x | 40.4 | 36.6 | - |
PixPro (ours) | 100 | ResNet-50 | 1x | 40.8 | 36.8 | config / model |
PixPro (ours) | 100* | ResNet-50 | 1x | 41.3 | 37.1 | - |
PixPro (ours) | 400* | ResNet-50 | 1x | 41.4 | 37.4 | - |
* Indicates methods with instance branch.
# Mask-RCNN with C4
Method | Epochs | Arch | Schedule | bbox AP | mask AP | Download |
---|---|---|---|---|---|---|
Scratch | - | ResNet-50 | 1x | 26.4 | 29.3 | - |
Supervised | 100 | ResNet-50 | 1x | 38.2 | 33.3 | - |
MoCo | 200 | ResNet-50 | 1x | 38.5 | 33.6 | - |
SimCLR | 1000 | ResNet-50 | 1x | 38.4 | 33.6 | - |
MoCo v2 | 800 | ResNet-50 | 1x | 39.5 | 34.5 | - |
InfoMin | 200 | ResNet-50 | 1x | 39.0 | 34.1 | - |
InfoMin | 800 | ResNet-50 | 1x | 38.8 | 33.8 | - |
PixPro (ours) | 100 | ResNet-50 | 1x | 40.0 | 34.8 | config / model |
PixPro (ours) | 400 | ResNet-50 | 1x | 40.5 | 35.3 | config / model |
# 07、CVPR2021 | Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning
https://github.com/valeoai/obow
# 7.1 ResNet50 pre-trained model
Method | Epochs | Batch-size | Dataset | ImageNet linear acc. | Links to pre-trained weights |
---|---|---|---|---|---|
OBoW | 200 | 256 | ImageNet | 73.8 | entire model / feature extractor only |
# 08、NeurIPS 2020 | Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
https://github.com/facebookresearch/swav
# 09、ECCV 2020 | Learning to Classify Images without Labels
https://github.com/wvangansbeke/Unsupervised-Classification
We also train SCAN on ImageNet with 1000 clusters, using 10 clustering heads and keeping the head with the lowest loss. The reported metrics are accuracy (ACC), normalized mutual information (NMI), adjusted mutual information (AMI), and adjusted Rand index (ARI):
Method | ACC | NMI | AMI | ARI | Download link |
---|---|---|---|---|---|
SCAN (ResNet50) | 39.9 | 72.0 | 51.2 | 27.5 | Download |
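The ACC metric above requires matching predicted cluster indices to ground-truth classes, usually with the Hungarian algorithm, since cluster labels are arbitrary permutations. A minimal sketch using SciPy; the function name is illustrative (NMI, AMI, and ARI are the standard scikit-learn metrics and need no matching):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy (ACC): best one-to-one mapping
    between predicted clusters and ground-truth classes."""
    D = max(y_pred.max(), y_true.max()) + 1
    w = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1                      # confusion counts
    row, col = linear_sum_assignment(-w)  # maximize matched counts
    return w[row, col].sum() / y_true.size
```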
# 10、ICML 2020 | Self-Supervised Prototypical Transfer Learning for Few-Shot Classification
https://github.com/indy-lab/ProtoTransfer
# 11、NeurIPS 2020 | Bootstrap Your Own Latent
https://github.com/deepmind/deepmind-research/tree/master/byol
Using this implementation, you should reach an ImageNet top-1 accuracy between 74.0% and 74.5% after about 8 hours of training on 512 Cloud TPU v3 cores.
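BYOL's target network is an exponential moving average (EMA) of the online network, with the momentum coefficient annealed toward 1.0 over training by a cosine schedule. A minimal sketch; the base value 0.996 follows the paper's default, but the function names are illustrative:

```python
import math

def byol_target_momentum(step, total_steps, base_tau=0.996):
    """Cosine schedule for the BYOL target-network EMA coefficient:
    tau goes from base_tau at step 0 to 1.0 at the end of training."""
    progress = step / total_steps
    return 1.0 - (1.0 - base_tau) * (math.cos(math.pi * progress) + 1) / 2

def ema_update(online, target, tau):
    """EMA update of target-network parameters from the online network."""
    return [tau * t + (1.0 - tau) * o for o, t in zip(online, target)]
```

As tau approaches 1.0 late in training, the target network freezes, which stabilizes the regression targets for the online predictor.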
# 12、Efficient Self-Supervised Vision Transformers
https://github.com/microsoft/esvit
# 12.1 Pretrained models
You can download the full checkpoint (trained on ImageNet-1K with both view-level and region-level tasks, batch size 512), which contains backbone and projection-head weights for both the student and teacher networks.
- EsViT (Swin) at increasing model capacities, pre-trained with both view-level and region-level tasks; ResNet-50 trained with both tasks is shown as a reference.
# 13、Emerging Properties in Self-Supervised Vision Transformers.
https://github.com/facebookresearch/dino
# 13.1 Pretrained models
You can choose to download only the weights of the pretrained backbone used for downstream tasks, or the full checkpoint, which contains backbone and projection-head weights for both the student and teacher networks. We also provide the backbone in ONNX format, as well as detailed arguments and training/evaluation logs. Note that the DeiT-S and ViT-S names refer to exactly the same architecture.
We also release XCiT models ([arXiv] [code]) trained with DINO:
arch | params | k-nn | linear | download |
---|---|---|---|---|
xcit_small_12_p16 | 26M | 76.0% | 77.8% | backbone only / full ckpt / args / logs / eval |
xcit_small_12_p8 | 26M | 77.1% | 79.2% | backbone only / full ckpt / args / logs / eval |
xcit_medium_24_p16 | 84M | 76.4% | 78.8% | backbone only / full ckpt / args / logs / eval |
xcit_medium_24_p8 | 84M | 77.9% | 80.3% | backbone only / full ckpt / args / logs / eval |
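DINO trains the student to match a centered and sharpened teacher distribution via cross-entropy, with the center maintained as an EMA of the teacher's batch outputs to avoid collapse. A minimal NumPy sketch; the temperatures are the paper's defaults and the names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dino_loss(student_logits, teacher_logits, center,
              tau_s=0.1, tau_t=0.04):
    """Cross-entropy between the centered, sharpened teacher distribution
    and the student distribution. The teacher is treated as a constant
    (no gradient flows through it in the real implementation)."""
    t = softmax((teacher_logits - center) / tau_t)   # sharpen + center
    log_s = np.log(softmax(student_logits / tau_s))
    return -(t * log_s).sum(axis=-1).mean()

def update_center(center, teacher_logits, m=0.9):
    """EMA of the teacher output batch mean, used for centering."""
    return m * center + (1 - m) * teacher_logits.mean(axis=0)
```

Sharpening (small tau_t) and centering push in opposite directions, and together they prevent both uniform and one-hot collapsed outputs.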