# Transformer Series Code

Muyun99, 2021-10-16

Interpretability: https://github.com/hila-chefer/Transformer-Explainability
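
The linked repository implements a relevance-propagation method for explaining Transformer predictions. As a simpler, self-contained illustration of attention-based attribution (and a common baseline such methods are compared against), here is a minimal attention-rollout sketch; collecting the per-layer attention maps from a ViT via forward hooks is assumed:

```python
import torch

def attention_rollout(attentions):
    """Attention rollout: multiply the per-layer attention maps, averaged
    over heads and mixed with the identity to account for residual paths.

    attentions: list of tensors of shape (batch, heads, tokens, tokens),
                e.g. collected from a ViT with forward hooks.
    Returns a (batch, tokens) relevance of every token as seen from CLS.
    """
    result = None
    for attn in attentions:
        attn = attn.mean(dim=1)                       # average over heads
        eye = torch.eye(attn.size(-1), device=attn.device)
        attn = 0.5 * attn + 0.5 * eye                 # residual connection
        attn = attn / attn.sum(dim=-1, keepdim=True)  # re-normalize rows
        result = attn if result is None else attn @ result
    return result[:, 0]                               # row of the CLS token

# usage: rollout = attention_rollout(list_of_attention_maps)
```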

# CrossFormer: https://github.com/cheerss/CrossFormer

# ADE20K

| Backbone | Segmentation Head | Iterations | Params | FLOPs | IOU | MS IOU |
| --- | --- | --- | --- | --- | --- | --- |
| CrossFormer-S | FPN | 80K | 34.3M | 209.8G | 46.4 | - |
| CrossFormer-B | FPN | 80K | 55.6M | 320.1G | 48.0 | - |
| CrossFormer-L | FPN | 80K | 95.4M | 482.7G | 49.1 | - |
| ResNet-101 | UPerNet | 160K | 86.0M | 1029.G | 44.9 | - |
| CrossFormer-S | UPerNet | 160K | 62.3M | 979.5G | 47.6 | 48.4 |
| CrossFormer-B | UPerNet | 160K | 83.6M | 1089.7G | 49.7 | 50.6 |
| CrossFormer-L | UPerNet | 160K | 125.5M | 1257.8G | 50.4 | 51.4 |
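
CrossFormer's central component is the cross-scale embedding layer (CEL), which builds each token from patches sampled at several kernel sizes with a shared stride. A minimal sketch of such a layer is below; the kernel sizes and the even channel split are illustrative assumptions rather than the repo's exact configuration:

```python
import torch
import torch.nn as nn

class CrossScaleEmbedding(nn.Module):
    """Cross-scale embedding: project the image with several convolution
    kernels of different sizes but the same stride, then concatenate the
    results along the channel dimension so every token mixes multiple scales."""
    def __init__(self, in_chans=3, embed_dim=96, kernel_sizes=(4, 8, 16, 32), stride=4):
        super().__init__()
        dims = [embed_dim // len(kernel_sizes)] * len(kernel_sizes)
        dims[0] += embed_dim - sum(dims)  # make the split add up to embed_dim
        self.projs = nn.ModuleList(
            nn.Conv2d(in_chans, d, kernel_size=k, stride=stride, padding=(k - stride) // 2)
            for k, d in zip(kernel_sizes, dims)
        )

    def forward(self, x):  # x: (B, 3, H, W)
        return torch.cat([proj(x) for proj in self.projs], dim=1)  # (B, embed_dim, H/4, W/4)

tokens = CrossScaleEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 96, 56, 56])
```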

# Swin-Transformer: https://github.com/SwinTransformer/Swin-Transformer-Semantic-Segmentation

# ADE20K

| Backbone | Method | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) | #params | FLOPs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | UPerNet | 512x512 | 160K | 44.51 | 45.81 | 60M | 945G |
| Swin-S | UPerNet | 512x512 | 160K | 47.64 | 49.47 | 81M | 1038G |
| Swin-B | UPerNet | 512x512 | 160K | 48.13 | 49.72 | 121M | 1188G |
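
Swin computes self-attention inside non-overlapping local windows and alternates with a cyclically shifted window layout so that neighbouring windows exchange information. A minimal sketch of the window partition/reverse helpers and the cyclic shift (the actual repo additionally masks attention across the wrapped-around borders):

```python
import torch

def window_partition(x, window_size):
    """Split a feature map (B, H, W, C) into (num_windows*B, ws, ws, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

def window_reverse(windows, window_size, H, W):
    """Inverse of window_partition."""
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

x = torch.randn(2, 56, 56, 96)
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))   # cyclic shift (SW-MSA)
windows = window_partition(shifted, window_size=7)      # (2*64, 7, 7, 96)
# ... window attention would run here on each 7x7 window ...
restored = torch.roll(window_reverse(windows, 7, 56, 56), shifts=(3, 3), dims=(1, 2))
assert torch.allclose(restored, x)
```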

# Residual Attention: A Simple but Effective Method for Multi-Label Recognition: https://github.com/Kevinz-code/CSRA

| Dataset | Backbone | Head nums | mAP(%) | Resolution | Download |
| --- | --- | --- | --- | --- | --- |
| VOC2007 | ResNet-101 | 1 | 94.7 | 448x448 | download |
| VOC2007 | ResNet-cut | 1 | 95.2 | 448x448 | download |
| COCO | ResNet-101 | 4 | 83.3 | 448x448 | download |
| COCO | ResNet-cut | 6 | 85.6 | 448x448 | download |
| Wider | VIT_B16_224 | 1 | 89.0 | 224x224 | download |
| Wider | VIT_L16_224 | 1 | 90.2 | 224x224 | download |
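
The CSRA head is essentially global average pooling plus a class-specific spatial attention term per class. A minimal single-head sketch in the simplest (max-pooling, i.e. temperature T → ∞) form is below; the repo's multi-head variant ("Head nums" above) combines several such heads with different temperatures, and `lam` here is an assumed value:

```python
import torch
import torch.nn as nn

class CSRAHead(nn.Module):
    """Single CSRA head: per-class logit = average-pooled class score map
    plus lam * max-pooled class score map (the T -> infinity special case)."""
    def __init__(self, in_dim, num_classes, lam=0.1):
        super().__init__()
        self.classifier = nn.Conv2d(in_dim, num_classes, kernel_size=1, bias=False)
        self.lam = lam

    def forward(self, feat):                       # feat: (B, in_dim, H, W)
        score = self.classifier(feat)              # class score maps: (B, num_classes, H, W)
        base = score.mean(dim=(2, 3))              # global average pooling
        attn = score.flatten(2).max(dim=2).values  # class-specific max pooling
        return base + self.lam * attn              # multi-label logits

logits = CSRAHead(2048, 20)(torch.randn(4, 2048, 14, 14))
print(logits.shape)  # torch.Size([4, 20])
```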

# CMT: Convolutional Neural Networks Meet Vision Transformers

# Pre-Trained Image Processing Transformer (IPT)

https://github.com/huawei-noah/Pretrained-IPT

# HRFormer: High-Resolution Transformer for Dense Prediction, NeurIPS 2021

https://github.com/HRNet/HRFormer

# ADE20K

| Methods | Backbone | Window Size | Train Set | Test Set | Iterations | Batch Size | OHEM | mIoU | mIoU (Multi-Scale) | Log | ckpt | script |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OCRNet | HRFormer-S | 7x7 | Train | Val | 150000 | 8 | Yes | 44.0 | 45.1 | log | ckpt | script |
| OCRNet | HRFormer-B | 7x7 | Train | Val | 150000 | 8 | Yes | 46.3 | 47.6 | log | ckpt | script |
| OCRNet | HRFormer-B | 13x13 | Train | Val | 150000 | 8 | Yes | 48.7 | 50.0 | log | ckpt | script |
| OCRNet | HRFormer-B | 15x15 | Train | Val | 150000 | 8 | Yes | - | - | - | - | - |
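
HRFormer keeps HRNet's multi-resolution parallel branches but builds its blocks from local-window self-attention followed by an FFN with a 3x3 depthwise convolution, which lets information flow across window borders. A minimal sketch of that depthwise-conv FFN, with an assumed expansion ratio and without the normalization layers the repo uses:

```python
import torch
import torch.nn as nn

class DWConvFFN(nn.Module):
    """HRFormer-style FFN: 1x1 conv -> 3x3 depthwise conv -> 1x1 conv.
    The depthwise convolution exchanges information between neighbouring
    local-attention windows. Expansion ratio 4 is an assumed default."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.net = nn.Sequential(
            nn.Conv2d(dim, hidden, 1), nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden), nn.GELU(),
            nn.Conv2d(hidden, dim, 1),
        )

    def forward(self, x):          # x: (B, dim, H, W), e.g. one HRNet branch
        return x + self.net(x)     # residual connection around the FFN

out = DWConvFFN(64)(torch.randn(1, 64, 128, 128))
print(out.shape)  # torch.Size([1, 64, 128, 128])
```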

# DeiT: Data-efficient Image Transformers

https://github.com/facebookresearch/deit

# Model Zoo

We provide baseline DeiT models pretrained on ImageNet 2012.

| name | acc@1 | acc@5 | #params | url |
| --- | --- | --- | --- | --- |
| DeiT-tiny | 72.2 | 91.1 | 5M | model |
| DeiT-small | 79.9 | 95.0 | 22M | model |
| DeiT-base | 81.8 | 95.6 | 86M | model |
| DeiT-tiny distilled | 74.5 | 91.9 | 6M | model |
| DeiT-small distilled | 81.2 | 95.4 | 22M | model |
| DeiT-base distilled | 83.4 | 96.5 | 87M | model |
| DeiT-base 384 | 82.9 | 96.2 | 87M | model |
| DeiT-base distilled 384 (1000 epochs) | 85.2 | 97.2 | 88M | model |
| CaiT-S24 distilled 384 | 85.1 | 97.3 | 47M | model |
| CaiT-M48 distilled 448 | 86.5 | 97.7 | 356M | model |
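
The checkpoints above are exposed through torch.hub (see the DeiT README); a minimal loading sketch, assuming `timm` is installed:

```python
import torch

# Load a pretrained DeiT-base from the facebookresearch/deit repo via torch.hub.
# The hub entry point requires timm and downloads the checkpoint on first use.
model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_224', pretrained=True)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))   # ImageNet-1k logits
print(logits.shape)  # torch.Size([1, 1000])
```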

# Efficient Vision Transformers via Fine-Grained Manifold Distillation

https://arxiv.org/abs/2107.01378
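
The idea is to distill the patch-level manifold: the student is trained so that its patch-to-patch similarity structure matches the teacher's, rather than matching logits alone. A minimal, hedged sketch of one such relation-matching term (the paper decomposes the relation maps more finely, e.g. into intra-image and inter-image parts):

```python
import torch
import torch.nn.functional as F

def manifold_distill_loss(student_patches, teacher_patches):
    """Match patch-to-patch similarity structure between student and teacher.

    student_patches: (B, N, Ds) patch tokens of the student
    teacher_patches: (B, N, Dt) patch tokens of the teacher
    Embedding dims may differ; only the N x N relation maps are compared.
    """
    s = F.normalize(student_patches, dim=-1)
    t = F.normalize(teacher_patches, dim=-1)
    rel_s = s @ s.transpose(1, 2)          # (B, N, N) student relation map
    rel_t = t @ t.transpose(1, 2)          # (B, N, N) teacher relation map
    return F.mse_loss(rel_s, rel_t)

loss = manifold_distill_loss(torch.randn(2, 196, 192), torch.randn(2, 196, 768))
print(loss.item())
```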

# Augmented Shortcuts for Vision Transformers

https://arxiv.org/abs/2106.15941

Key point: the rank and diversity of attention maps.
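
The paper observes that with only the identity shortcut, the rank and diversity of attention/feature maps collapse as ViT depth grows, so it adds extra learnable "augmented" shortcut branches in parallel. A minimal sketch, using plain linear projections where the paper uses more efficient block-circulant ones:

```python
import torch
import torch.nn as nn

class AugmentedShortcutMHSA(nn.Module):
    """Self-attention block whose residual path is augmented with extra
    learnable shortcut branches: out = MHSA(x) + x + sum_i T_i(x).
    Plain Linear layers stand in for the paper's block-circulant projections."""
    def __init__(self, dim, num_heads=8, num_shortcuts=2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.shortcuts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(num_shortcuts)
        )

    def forward(self, x):                              # x: (B, N, dim)
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        aug = sum(branch(x) for branch in self.shortcuts)
        return attn_out + x + aug                      # identity + augmented shortcuts

out = AugmentedShortcutMHSA(192)(torch.randn(2, 197, 192))
print(out.shape)  # torch.Size([2, 197, 192])
```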

# SOFT: Softmax-free Transformer with Linear Complexity

https://github.com/fudan-zvg/SOFT

# Image Classification

# ImageNet-1K

| Model | Resolution | Params | FLOPs | Top-1 % | Config |
| --- | --- | --- | --- | --- | --- |
| SOFT-Tiny | 224 | 13M | 1.9G | 79.3 | SOFT_Tiny.yaml, SOFT_Tiny_cuda.yaml |
| SOFT-Small | 224 | 24M | 3.3G | 82.2 | SOFT_Small.yaml, SOFT_Small_cuda.yaml |
| SOFT-Medium | 224 | 45M | 7.2G | 82.9 | SOFT_Meidum.yaml, SOFT_Meidum_cuda.yaml |
| SOFT-Large | 224 | 64M | 11.0G | 83.1 | SOFT_Large.yaml, SOFT_Large_cuda.yaml |
| SOFT-Huge | 224 | 87M | 16.3G | 83.3 | SOFT_Huge.yaml, SOFT_Huge_cuda.yaml |
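
SOFT drops the softmax entirely: keys are tied to queries, the attention matrix is a Gaussian kernel, and linear complexity comes from a low-rank (Nyström-style) decomposition through a small set of bottleneck tokens. The sketch below is a simplified version of that idea: landmarks are average-pooled queries and `torch.linalg.pinv` stands in for the paper's Newton-iteration pseudoinverse, both assumptions made for brevity:

```python
import torch

def gaussian_kernel(a, b):
    """exp(-||a_i - b_j||^2 / 2) for all pairs; a: (B, n, d), b: (B, m, d)."""
    return torch.exp(-0.5 * torch.cdist(a, b).pow(2))

def soft_attention(q, v, num_landmarks=49):
    """Softmax-free attention sketch with linear complexity in sequence length.

    The full n x n Gaussian kernel matrix is approximated by a low-rank
    decomposition built from m landmark tokens:
        K_full ~= K_qm @ pinv(K_mm) @ K_qm^T
    q: (B, n, d), v: (B, n, d_v); n must be divisible by num_landmarks.
    """
    B, n, d = q.shape
    landmarks = q.reshape(B, num_landmarks, n // num_landmarks, d).mean(dim=2)  # (B, m, d)
    k_qm = gaussian_kernel(q, landmarks)              # (B, n, m)
    k_mm = gaussian_kernel(landmarks, landmarks)      # (B, m, m)
    # (K_qm @ pinv(K_mm)) @ (K_qm^T @ V): every product is O(n * m), never n x n.
    return (k_qm @ torch.linalg.pinv(k_mm)) @ (k_qm.transpose(1, 2) @ v)

out = soft_attention(torch.randn(2, 196, 64), torch.randn(2, 196, 64))
print(out.shape)  # torch.Size([2, 196, 64])
```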