最新超简单解读torchvision

torchvision

https://pytorch.org/docs/stable/torchvision/index.html#module-torchvision

The torchvision package consists of popular datasets(数据集), model architectures（模型结构）, and common image transformations（通用图像转换） for computer vision.

torchvision.get_image_backend():Gets the name of the package used to load images

torchvision.set_image_backend(backend): Specifies the package used to load images.

torchvision.set_video_backend(backend): Specifies the package used to decode videos.

torchvision.datasets(目前共24个数据集)：

MNIST;Fashion-MNIST;KMNIST;EMNIST;QMNIST;FakeData;COCO;LSUN;ImageFolder;DatasetFolder;ImageNet;CIFAR;STL10;SVHN;PhotoTour;SBU;Flickr;VOC;Cityscapes;SBD;USPS;Kinetics-400;HMDB51;UCF101.

torchvision.io(目前只支持video):

Video

torchvision.io.read_video(filename, start_pts=0, end_pts=None, pts_unit='pts')

Reads a video from a file, returning both the video frames as well as the audio frames.

torchvision.models(目前只支持Classification, Semantic Segmentation, Object Detection, Instance Segmentation and Person Keypoint Detection和Video classification三类模型)：

Classification：

The models subpackage contains definitions for the following model architectures for image classification:

AlexNet

VGG

ResNet

SqueezeNet

DenseNet

Inception v3

GoogLeNet

ShuffleNet v2

MobileNet v2

ResNeXt

Wide ResNet

MNASNet

You can construct a model with random weights by calling its constructor:

import torchvision.models as models

resnet18 = models.resnet18()

alexnet = models.alexnet()

vgg16 = models.vgg16()

squeezenet = models.squeezenet1_0()

densenet = models.densenet161()

inception = models.inception_v3()

googlenet = models.googlenet()

shufflenet = models.shufflenet_v2_x1_0()

mobilenet = models.mobilenet_v2()

resnext50_32x4d = models.resnext50_32x4d()

wide_resnet50_2 = models.wide_resnet50_2()

mnasnet = models.mnasnet1_0()

pre-trained models, using the PyTorch torch.utils.model_zoo. These can be constructed by passing pretrained=True:

import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)

alexnet = models.alexnet(pretrained=True)

squeezenet = models.squeezenet1_0(pretrained=True)

vgg16 = models.vgg16(pretrained=True)

densenet = models.densenet161(pretrained=True)

inception = models.inception_v3(pretrained=True)

googlenet = models.googlenet(pretrained=True)

shufflenet = models.shufflenet_v2_x1_0(pretrained=True)

mobilenet = models.mobilenet_v2(pretrained=True)

resnext50_32x4d = models.resnext50_32x4d(pretrained=True)

wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)

mnasnet = models.mnasnet1_0(pretrained=True)

Instancing a pre-trained model will download its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.

Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],

std=[0.229, 0.224, 0.225])

Semantic Segmentation：

The models subpackage contains definitions for the following model architectures for semantic segmentation:

FCN ResNet101

DeepLabV3 ResNet101

As with image classification models, all pre-trained models expect input images normalized in the same way. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. They have been trained on images resized such that their minimum size is 520.

The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset. You can see more information on how the subset has been selected in references/segmentation/coco_utils.py. The classes that the pre-trained model outputs are the following, in order:

['__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',

'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',

'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

Object Detection, Instance Segmentation and Person Keypoint Detection：

The models subpackage contains definitions for the following model architectures for detection:

Faster R-CNN ResNet-50 FPN

Mask R-CNN ResNet-50 FPN

The pre-trained models for detection, instance segmentation and keypoint detection are initialized with the classification models in torchvision.

The models expect a list of Tensor[C, H, W], in the range 0-1. The models internally resize the images so that they have a minimum size of 800. This option can be changed by passing the option min_size to the constructor of the models.

For object detection and instance segmentation, the pre-trained models return the predictions of the following classes:

COCO_INSTANCE_CATEGORY_NAMES = [

'__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',

'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',

'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',

'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',

'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',

'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',

'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',

'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',

'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',

'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',

'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',

'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'

]

For person keypoint detection, the pre-trained model return the keypoints in the following order:

COCO_PERSON_KEYPOINT_NAMES = [

'nose',

'left_eye',

'right_eye',

'left_ear',

'right_ear',

'left_shoulder',

'right_shoulder',

'left_elbow',

'right_elbow',

'left_wrist',

'right_wrist',

'left_hip',

'right_hip',

'left_knee',

'right_knee',

'left_ankle',

'right_ankle'

]

Video classification：

We provide models for action recognition pre-trained on Kinetics-400. They have all been trained with the scripts provided in references/video_classification.

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB videos of shape (3 x T x H x W), where H and W are expected to be 112, and T is a number of video frames in a clip. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.43216, 0.394666, 0.37645] and std = [0.22803, 0.22145, 0.216989].

NOTE

The normalization parameters are different from the image classification ones, and correspond to the mean and std from Kinetics-400.

NOTE

For now, normalization code can be found in references/video_classification/transforms.py, see the Normalizefunction there. Note that it differs from standard normalization for images because it assumes the video is 4d.

Kinetics 1-crop accuracies for clip length 16 (16x112x112)

Network	Clip acc@1	Clip acc@5
ResNet 3D 18	52.75	75.45
ResNet MC 18	53.90	76.29
ResNet (2+1)D	57.50	78.81

torchvision.ops（操作符）：

torchvision.ops implements operators that are specific for Computer Vision.

支持：

torchvision.ops.nms(boxes, scores, iou_threshold)：Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

torchvision.ops.roi_align(input, boxes, output_size, spatial_scale=1.0, sampling_ratio=-1): Performs Region of Interest (RoI) Align operator described in Mask R-CNN

torchvision.ops.roi_pool(input, boxes, output_size, spatial_scale=1.0): Performs Region of Interest (RoI) Pool operator described in Fast R-CNN

torchvision.transforms（转换操作）

torchvision.utils.make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0), Make a grid of images.

torchvision.utils.save_image(tensor, fp, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0, format=None), Save a given Tensor into an image file.

随机推荐

Codeforces 1038 D. Slime
[传送门] 其实就是这些数字前面能加正负号,在满足正负号均出现的情况下价值最大.那么就可以无脑DP$f[i][j][k]$表示到了第$i$位,正号是否出现($j$.$k$为$0$或$1$)能得到的最大 ...
SPU、SKU、ARPU
在涂涂商城开发之前,发现一篇关于电商中 SPU.SKU.ARPU 的介绍,转至博客,原文地址:http://www.ikent.me/blog/3017 什么是SPU.SKU.ARPU 首先,搞清楚商 ...
SSFOJ P1453 子序列（一）题解
每日一题 day61 打卡 Analysis las数组表示的是最近一个为j的位置为是什么. dp数组的含义是以str[i]为结尾的子序列数量. 于是有状态转移方程: dp[las[i][j]]+=d ...
OKR案例——不同类型的OKR实例
OKR是一种能将团队调动起来一起向着一个方向去努力的绝佳目标管理法,它让我们的团队去挑战自己的极限,去实现更大的价值,去将我们的战略最完美的转化为成果. 然而,想要让OKR在我们的团队中发挥作用,制定 ...
ps制作马赛克图片
ES5新增的数组方法
ES5新增:(IE9级以上支持)1.forEach():遍历数组,无返回值,不改变原数组.2.map():遍历数组,返回一个新数组,不改变原数组.3.filter():过滤掉数组中不满足条件的值,返回 ...
【搜索】$P1092$虫食算
题目链接首先,我们只考虑加法的虫食算.这里的加法是N进制加法,算式中三个数都有N位,允许有前导的0. 其次,虫子把所有的数都啃光了,我们只知道哪些数字是相同的,我们将相同的数字用相同的字母表示,不同 ...
mac切图
1.按住command键位, 两只手指点击需要切的图 2.再在右边栅格化图层 3.选中需要剪切的图层.command+c 和command+n和 command+v OK 切整张图.先 option ...
<每日 1 OJ> -LeetCode 13 . 罗马数字转正数
题目: 罗马数字包含以下七种字符: I, V, X, L,C,D 和 M. 字符数值I 1V 5X 10L 50C 100D 500M 1000例如, 罗马数字 2 写做 II ,即为两个并列的 1 ...
abp 中log4net 集成Kafka
1.安装包 Install-Package log4net.Kafka.Core 2.修改log4net.config 配置文件 <?xml version="1.0" en ...

最新超简单解读torchvision

最新超简单解读torchvision的更多相关文章

随机推荐

热门专题