Overview
What is the timm library?
PyTorch Image Models, timm for short, is a large collection of PyTorch code that includes a range of:
image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and training / validation scripts
It aims to pull together a wide variety of SOTA models, with the ability to reproduce ImageNet training results.
Author: Ross Wightman
timm repository: https://github.com/rwightman/pytorch-image-models
Official guide: https://rwightman.github.io/pytorch-image-models/
timm implements nearly all of the recent influential vision models. It provides not only model weights but also an excellent framework for distributed training and evaluation that is convenient to build on. Better still, it keeps being updated with new training methods, new vision models, and optimized code.
That said, training, testing, and maintaining these codebases and model weights undeniably takes a large amount of GPU (or TPU) resources and substantial power/cooling costs.
Usage tutorial
Install the library (Python 3, PyTorch 1.4+):
pip install timm
Load the pretrained weights of the model you need:
import timm
m = timm.create_model('mobilenetv3_large_100', pretrained=True)
m.eval()
加载所有的预训练模型列表 (pprint 是美化打印的标准库):
import timm
from pprint import pprint
model_names = timm.list_models(pretrained=True)
pprint(model_names)
>>> ['adv_inception_v3',
'cspdarknet53',
'cspresnext50',
'densenet121',
'densenet161',
'densenet169',
'densenet201',
'densenetblur121d',
'dla34',
'dla46_c',
...
]
List pretrained models matching a wildcard:
import timm
from pprint import pprint
model_names = timm.list_models('*resne*t*')
pprint(model_names)
>>> ['cspresnet50',
'cspresnet50d',
'cspresnet50w',
'cspresnext50',
...
]
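The wildcard filter can also be combined with pretrained=True, to list only the matching architectures that actually ship pretrained weights:
import timm
from pprint import pprint
# wildcard filter restricted to models with pretrained weights available
model_names = timm.list_models('*resne*t*', pretrained=True)
pprint(model_names)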
Full list of supported models, with links to the papers and official implementations:
https://rwightman.github.io/pytorch-image-models/models/
How to use a specific model
Take the well-known MobileNet v3 as an example. MobileNetV3 is a convolutional neural network designed for mobile-phone CPUs. Its design uses the hard-swish activation function and squeeze-and-excitation modules inside its MBConv blocks.
hard swish activation:https://paperswithcode.com/method/hard-swish
squeeze-and-excitation:https://paperswithcode.com/method/squeeze-and-excitation-block
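For intuition, here is a minimal PyTorch sketch of these two building blocks. This is a simplified illustration, not timm's actual implementation (MobileNetV3's SE variant, for instance, gates with a hard sigmoid rather than the plain sigmoid below):
import torch
import torch.nn as nn
import torch.nn.functional as F

def hard_swish(x):
    # hard_swish(x) = x * ReLU6(x + 3) / 6, a cheap piecewise approximation of swish
    return x * F.relu6(x + 3.0) / 6.0

class SqueezeExcite(nn.Module):
    """Simplified squeeze-and-excitation: global pool -> bottleneck MLP -> channel gate."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1)

    def forward(self, x):
        s = x.mean((2, 3), keepdim=True)   # squeeze: global average pool
        s = self.fc2(F.relu(self.fc1(s)))  # excite: bottleneck MLP
        return x * torch.sigmoid(s)        # gate: reweight channels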
Load the MobileNet v3 pretrained model:
import timm
model = timm.create_model('mobilenetv3_large_100', pretrained=True)
model.eval()
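If your goal is fine-tuning on your own dataset rather than ImageNet inference, create_model also accepts num_classes, which keeps the pretrained backbone but reinitializes the classifier head (the 10 classes below are an arbitrary example):
import timm
# pretrained backbone with a freshly initialized 10-way classifier head
model = timm.create_model('mobilenetv3_large_100', pretrained=True, num_classes=10)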
Load an image and preprocess it:
import urllib.request
from PIL import Image
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform
config = resolve_data_config({}, model=model)
transform = create_transform(**config)
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)
img = Image.open(filename).convert('RGB')
tensor = transform(img).unsqueeze(0) # transform and add batch dimension
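resolve_data_config merges the model's default data config with any overrides you pass in; printing it shows exactly what preprocessing the transform will apply. The values below are roughly what this model resolves to, shown for illustration:
print(config)
# roughly: {'input_size': (3, 224, 224), 'interpolation': 'bicubic',
#           'mean': (0.485, 0.456, 0.406), 'std': (0.229, 0.224, 0.225),
#           'crop_pct': 0.875}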
Get the model's predictions:
import torch
with torch.no_grad():
    out = model(tensor)
probabilities = torch.nn.functional.softmax(out[0], dim=0)
print(probabilities.shape)
# prints: torch.Size([1000])
Get the names of the top-5 predicted classes:
# Get ImageNet class mappings
url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
urllib.request.urlretrieve(url, filename)
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]
# Print top categories per image
top5_prob, top5_catid = torch.topk(probabilities, 5)
for i in range(top5_prob.size(0)):
    print(categories[top5_catid[i]], top5_prob[i].item())
# prints class names and probabilities like:
# [('Samoyed', 0.6425196528434753), ('Pomeranian', 0.04062102362513542), ('keeshond', 0.03186424449086189), ('white wolf', 0.01739676296710968), ('Eskimo dog', 0.011717947199940681)]
Summary of all timm models' results on ImageNet:
https://rwightman.github.io/pytorch-image-models/results/
Start training your model
For the training dataset folder, specify the base folder that contains the train and validation subfolders.
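For reference, a typical ImageNet-style layout uses one subfolder per class (the class folder names below are illustrative):
/data/imagenet/
    train/
        n01440764/
            image_1.JPEG
            ...
        n01443537/
            ...
    validation/
        n01440764/
            ...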
To train an SE-ResNet34 on ImageNet with 4 GPUs, distributed training, and a cosine learning-rate schedule, the command is:
./distributed_train.sh 4 /data/imagenet --model seresnet34 --sched cosine --epochs 150 --warmup-epochs 5 --lr 0.4 --reprob 0.5 --remode pixel --batch-size 256 --amp -j 4
Note: --amp uses native AMP by default; --apex-amp will force use of the Apex AMP components instead.
Some training examples
To train EfficientNet-B2 with RandAugment - 80.4 top-1, 95.1 top-5:
These params are for dual Titan RTX cards with NVIDIA Apex installed:
./distributed_train.sh 2 /imagenet/ --model efficientnet_b2 -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .016
To train MixNet-XL with RandAugment - 80.5 top-1, 94.9 top-5:
These params are for dual Titan RTX cards with NVIDIA Apex installed:
./distributed_train.sh 2 /imagenet/ --model mixnet_xl -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .969 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.3 --amp --lr .016 --dist-bn reduce
To train SE-ResNeXt-26-D and SE-ResNeXt-26-T:
These hparams (or similar) work well for a wide range of ResNet architectures. It is generally a good idea to increase the number of epochs as the model size increases, i.e. approx. 180-200 for ResNe(X)t50 and 220+ for larger models. Increase the batch size and LR proportionally for better GPUs or with AMP enabled. These params were for 2 1080Ti cards:
./distributed_train.sh 2 /imagenet/ --model seresnext26t_32x4d --lr 0.1 --warmup-epochs 5 --epochs 160 --weight-decay 1e-4 --sched cosine --reprob 0.4 --remode pixel -b 112
To train EfficientNet-B3 with RandAugment - 81.5 top-1, 95.7 top-5:
The training of this model started with the same command line as EfficientNet-B2 w/ RA above. After almost three weeks of training, the process crashed. The results weren't looking amazing, so I resumed the training several times with tweaks to a few params (increased RE prob, decreased rand-aug, increased ema-decay). Nothing looked great. I ended up averaging the best checkpoints from all restarts. The result is mediocre at the default res/crop but oddly performs much better with a full image test crop of 1.0.
To train EfficientNet-B0 with RandAugment - 77.7 top-1, 95.3 top-5:
Michael Klachko achieved these results with the B2 command line adapted for a larger batch size, plus the recommended B0 dropout rate of 0.2.
./distributed_train.sh 2 /imagenet/ --model efficientnet_b0 -b 384 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .048
To train ResNet50 with JSD loss and RandAugment (clean + 2x RA augs) - 79.04 top-1, 94.39 top-5:
./distributed_train.sh 2 /imagenet -b 64 --model resnet50 --sched cosine --epochs 200 --lr 0.05 --amp --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 --resplit --split-bn --jsd --dist-bn reduce
To train EfficientNet-ES (EdgeTPU-Small) with RandAugment - 78.066 top-1, 93.926 top-5:
./distributed_train.sh 8 /imagenet --model efficientnet_es -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064
To train MobileNetV3-Large-100 - 75.766 top-1, 92.542 top-5:
./distributed_train.sh 2 /imagenet/ --model mobilenetv3_large_100 -b 512 --sched step --epochs 600 --decay-epochs 2.4 --decay-rate .973 --opt rmsproptf --opt-eps .001 -j 7 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064 --lr-noise 0.42 0.9
To train ResNeXt-50 32x4d w/ RandAugment - 79.762 top-1, 94.60 top-5:
./distributed_train.sh 8 /imagenet --model resnext50_32x4d --lr 0.6 --warmup-epochs 5 --epochs 240 --weight-decay 1e-4 --sched cosine --reprob 0.4 --recount 3 --remode pixel --aa rand-m7-mstd0.5-inc1 -b 192 -j 6 --amp --dist-bn reduce
Validate / run inference with your model
For the validation set, specify the location of the validation folder.
Validate a model with pretrained weights:
python validate.py /imagenet/validation/ --model seresnext26_32x4d --pretrained
Run forward inference from a given checkpoint:
python inference.py /imagenet/validation/ --model mobilenetv3_large_100 --checkpoint ./output/train/model_best.pth.tar
Feature extraction
All timm models can produce various types of features for tasks other than classification.
Getting penultimate-layer features:
"Penultimate layer features" are the features of the second-to-last layer, i.e. the features just before the classifier. timm can obtain them in several ways, without any surgery on the model.
import torch
import timm
m = timm.create_model('resnet50', pretrained=True, num_classes=0)
o = m(torch.randn(2, 3, 224, 224))
print(f'Pooled shape: {o.shape}')
Output:
Pooled shape: torch.Size([2, 2048])
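Another option shown in the timm docs is forward_features, which bypasses the head (pooling and classifier) entirely and returns the raw, unpooled feature map:
import torch
import timm
m = timm.create_model('resnet50', pretrained=True)
# skip global pooling and the classifier, keep the spatial feature map
o = m.forward_features(torch.randn(2, 3, 224, 224))
print(f'Unpooled shape: {o.shape}')
Output:
Unpooled shape: torch.Size([2, 2048, 7, 7])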
Removing the classifier after creating the model:
import torch
import timm
m = timm.create_model('ese_vovnet19b_dw', pretrained=True)
o = m(torch.randn(2, 3, 224, 224))
print(f'Original shape: {o.shape}')
m.reset_classifier(0)
o = m(torch.randn(2, 3, 224, 224))
print(f'Pooled shape: {o.shape}')
Output:
Pooled shape: torch.Size([2, 1024])
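Per the timm docs, reset_classifier also takes a global-pool argument; passing an empty string removes the pooling layer as well, leaving the unpooled feature map:
import torch
import timm
m = timm.create_model('ese_vovnet19b_dw', pretrained=True)
m.reset_classifier(0, '')  # remove both the classifier and global pooling
o = m(torch.randn(2, 3, 224, 224))
print(f'Unpooled shape: {o.shape}')
Output:
Unpooled shape: torch.Size([2, 1024, 7, 7])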
Multi-scale feature outputs:
By default, most models output features at 5 strides (not all models have that many), with the first starting at stride 2 (some start at 1 or 4).
import torch
import timm
m = timm.create_model('resnest26d', features_only=True, pretrained=True)
o = m(torch.randn(2, 3, 224, 224))
for x in o:
    print(x.shape)
Output:
torch.Size([2, 64, 112, 112])
torch.Size([2, 256, 56, 56])
torch.Size([2, 512, 28, 28])
torch.Size([2, 1024, 14, 14])
torch.Size([2, 2048, 7, 7])
The .feature_info attribute is a class that encapsulates feature-extraction information.
For example, printing the number of channels of each feature:
import torch
import timm
m = timm.create_model('regnety_032', features_only=True, pretrained=True)
print(f'Feature channels: {m.feature_info.channels()}')
o = m(torch.randn(2, 3, 224, 224))
for x in o:
    print(x.shape)
Output:
Feature channels: [32, 72, 216, 576, 1512]
torch.Size([2, 32, 112, 112])
torch.Size([2, 72, 56, 56])
torch.Size([2, 216, 28, 28])
torch.Size([2, 576, 14, 14])
torch.Size([2, 1512, 7, 7])
Selecting specific feature levels or limiting the stride:
out_indices: selects which feature indices to output.
output_stride: limits the maximum stride of the output features, achieved with dilated convolutions.
import torch
import timm
m = timm.create_model('ecaresnet101d', features_only=True, output_stride=8, out_indices=(2, 4), pretrained=True)
print(f'Feature channels: {m.feature_info.channels()}')
print(f'Feature reduction: {m.feature_info.reduction()}')
o = m(torch.randn(2, 3, 320, 320))
for x in o:
    print(x.shape)
Output:
Feature channels: [512, 2048]
Feature reduction: [8, 8]
torch.Size([2, 512, 40, 40])
torch.Size([2, 2048, 40, 40])
In this example, output_stride=8 limits the output features to stride 8, while out_indices=(2, 4) selects the features at indices 2 and 4, whose channel counts are 512 and 2048 respectively.
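To see why this is useful downstream, here is a rough sketch of plugging such a backbone into a toy segmentation head. The 1x1-conv head and the 21-class count are hypothetical illustrations, not part of timm:
import torch
import torch.nn as nn
import timm
# dilated backbone: stride-8 features are a common choice for dense prediction
backbone = timm.create_model('ecaresnet101d', features_only=True, output_stride=8, out_indices=(4,), pretrained=True)
head = nn.Conv2d(backbone.feature_info.channels()[-1], 21, kernel_size=1)  # hypothetical 21-class head
x = torch.randn(2, 3, 320, 320)
feat = backbone(x)[-1]  # [2, 2048, 40, 40]
logits = head(feat)     # [2, 21, 40, 40]
# upsample predictions back to the input resolution
out = nn.functional.interpolate(logits, size=x.shape[2:], mode='bilinear', align_corners=False)
print(out.shape)
Output:
torch.Size([2, 21, 320, 320])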
Code walkthrough
The above covers the basics of using timm; by this point you should already be able to train your own classification model with it. If you also want to understand how the framework works internally and modify it, this section may help.
Summary
This article briefly introduced the excellent PyTorch Image Models (timm) library: how to use it and how its framework is organized. Image classification, as the name suggests, is the problem of taking an image as input and outputting a description of that image's content class. It is at the core of computer vision and widely applied in practice. The traditional approach to image classification is feature description and detection, which may work for simple images, but real-world conditions are complex enough that traditional methods struggle. The aim of this article is to introduce readers to a series of excellent PyTorch implementations of deep-learning vision classification models, so that related experiments can be started more quickly.