TorchVision Image Transforms

Transforms are common image transformations. They can be chained together using `Compose`:

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])
```

Transforms on PIL Image

`torchvision.transforms.CenterCrop`(size)

Crops the given PIL Image at the center.

Parameters: size (int or sequence). If it is a sequence such as (h, w), the output is an h × w crop.

If it is an int, a square crop of size (size, size) is made.

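`torchvision.transforms.FiveCrop`(size)

Crops the given PIL Image at the four corners and at the center, and returns the five crops as a tuple.

The `size` parameter behaves the same as for `CenterCrop` above.

The result is a tuple of images, not a single image, so it cannot be passed directly to `ToTensor`. A `Lambda` can convert the crops and stack them: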

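```python
import torch
from torchvision.transforms import Compose, FiveCrop, Lambda, ToTensor

size = 224  # example crop size
transform = Compose([
    FiveCrop(size),  # returns a tuple of 5 PIL Images
    Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops])),  # returns a 4D tensor
])
```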

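`torchvision.transforms.Pad`(padding, fill=0, padding_mode='constant')

Pads all four sides of the given PIL Image with the given pad value.

Parameters: padding: the amount of padding on each border.

If a single int is given, all borders are padded by the same amount.

If a tuple of length 2 is given, it specifies the padding on the left/right and on the top/bottom, respectively.

If a tuple of length 4 is given, it specifies the padding on the left, top, right, and bottom borders, respectively.

fill: the pixel fill value used when padding_mode is constant. The default is 0. A tuple of length 3 fills the R, G, B channels respectively.

padding_mode: the type of padding, one of:

constant: pads with a constant value, specified by fill

edge: pads with the last value on the edge of the image

reflect: pads with reflection of the image, without repeating the last value on the edge

symmetric: pads with reflection of the image, repeating the last value on the edge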

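`torchvision.transforms.Grayscale`(num_output_channels=1)

Converts the image to grayscale.

Parameters: num_output_channels (int): the number of channels in the output image, either 1 (default) or 3.

Returns: a grayscale version of the input. If num_output_channels == 1, the returned image is single-channel; if it is 3, the returned image has three channels with r == g == b.

Return type: PIL Image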

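`torchvision.transforms.Resize`(size, interpolation=2)

Resizes the input PIL Image to the given size.

Parameters: size (sequence or int): the desired output size. If size is an int, the smaller edge of the image is matched to this number. For example, if height > width, the image is rescaled to (size * height / width, size). If size is a sequence (h, w), the output size is matched to it exactly.

interpolation: the interpolation method. The default is PIL.Image.BILINEAR (the integer 2).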

Transforms on torch.*Tensor

`torchvision.transforms.Normalize`(mean, std, inplace=False)

Normalizes a tensor image with the given mean and standard deviation. For each channel it computes:

$\text{output}[\text{channel}] = \frac{\text{input}[\text{channel}] - \text{mean}[\text{channel}]}{\text{std}[\text{channel}]}$

Parameters: mean (sequence): the mean of each channel.

std (sequence): the standard deviation of each channel.

Returns: the normalized Tensor image.

Return type: Tensor

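Note: by default this transform does not modify the input tensor in place; it returns a new tensor. Pass `inplace=True` to normalize the input tensor itself.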

Conversion Transforms

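`torchvision.transforms.ToPILImage`(mode=None)

Converts a tensor of shape (C, H, W) or an ndarray of shape (H, W, C) to a PIL Image.

Parameters: mode: the color space and pixel depth of the input data (optional).

If mode is not given (None):

If the input has 4 channels, the mode is assumed to be RGBA.

If the input has 3 channels, the mode is assumed to be RGB.

If the input has 2 channels, the mode is assumed to be LA.

If the input has 1 channel, the mode is determined by the data type (e.g. int, float, short).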

`torchvision.transforms.ToTensor`

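Converts a PIL Image or ndarray to a tensor.

A PIL Image or numpy.ndarray of shape (H x W x C) with values in the range [0, 255] is converted to a torch.FloatTensor of shape (C x H x W) with values in the range [0.0, 1.0]. This scaling happens when the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or when the numpy.ndarray has dtype = np.uint8.

In all other cases, the tensors are returned without scaling.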

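FiveCrop and TenCrop

After these operations, one image becomes five images (FiveCrop) or ten images (TenCrop). How do we keep the crops matched to their labels during training or testing?
The idea is that all crops of an image share that image's label. For a classification task, you can compute the cross-entropy as usual and then average over the 10 crops.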

```python
import torch
from torchvision.transforms import Compose, Lambda, TenCrop, ToTensor

size = 224  # example crop size
transform = Compose([
    TenCrop(size),  # returns a tuple of 10 PIL Images
    Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops])),  # returns a 4D tensor
])

# In your test loop you can do the following:
@torch.no_grad()
def predict_ten_crop(model, batch):
    inputs, target = batch  # inputs is a 5D tensor: (bs, ncrops, c, h, w)
    bs, ncrops, c, h, w = inputs.size()
    result = model(inputs.view(-1, c, h, w))  # fuse batch size and ncrops
    return result.view(bs, ncrops, -1).mean(1)  # average over crops
```
