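Getting Started with PyTorch

Introduction

Installation

Installing Anaconda

Download the installer from the Anaconda link and follow the prompts to install it into a directory of your choice. If downloads are slow from mainland China, you can switch to a domestic mirror.

Then create a new environment named pytorch (the name is arbitrary), switch to it, and install the packages PyTorch needs: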

(base)$ conda create -n pytorch python=3.10
(base)$ conda activate pytorch

# GPU build
(pytorch)$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
# CPU build
(pytorch)$ conda install pytorch torchvision torchaudio cpuonly -c pytorch

Once the installation finishes, verify it:

(pytorch)$ python
Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
# No error here means PyTorch was installed successfully
>>> torch.cuda.is_available()
# True means the GPU is usable; this Linux machine has no NVIDIA driver installed, so the result is False
False
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.7296, 0.8000, 0.3431],
        [0.9990, 0.2202, 0.0889],
        [0.1130, 0.9352, 0.8562],
        [0.4942, 0.4214, 0.4962],
        [0.3023, 0.7504, 0.6801]])
>>>
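
The same check can be folded into a script so that code runs on whichever device is present. This is a minimal sketch rather than a required pattern; the variable name `device` is my own choice.

```python
import torch

# Pick the GPU when one is usable, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor directly on the selected device.
x = torch.rand(5, 3, device=device)
print(torch.__version__, device, x.shape)
```

Tensors and models created this way do not need separate GPU and CPU code paths.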

Using TensorBoard

Here is a simple example that plots a few trigonometric curves:

import numpy as np
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter("./test_logs")    # output directory is configurable; the default is ./runs
r = 5
for i in range(100):
    writer.add_scalars('trigonometric function', 
                       {'xsinx':i*np.sin(i/r),
                       'xcosx':i*np.cos(i/r),
                       'tanx': np.tan(i/r)}, i)
writer.close()

Then run tensorboard --logdir=test_logs and open the URL it prints to view the result in a browser, as shown in the figure below.

Transforms: image transformations

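from torchvision import transforms

# The constructor arguments below are illustrative values only.
transforms.CenterCrop(224)              # crop the center of the image
transforms.FiveCrop(224)                # crop the four corners and the center, yielding five images
transforms.Grayscale()                  # convert the image to grayscale
transforms.Pad(4)                       # pad the borders with a fixed value
transforms.RandomResizedCrop(224)       # random crop of varying size and aspect ratio, then resize to a fixed size
transforms.RandomCrop(32)               # crop a random region
transforms.RandomRotation(15)           # rotate by a random angle
transforms.RandomHorizontalFlip()       # random horizontal flip
transforms.Normalize((0.5,), (0.5,))    # normalize with a given mean and standard deviation
...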

transforms.Compose chains several transforms into one pipeline

transform = transforms.Compose([
    # transforms.Random  (this entry is truncated in the source)
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

torchvision and DataLoader

nn.Module

Convolutional layers

Activation layers

Pooling layers

Fully connected layers

Loss functions

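L1Loss and MSELoss

Suppose the expected result (the target) is [1, 2, 8] but the actual output is [1, 2, 3].

Definition:

$$ \ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{reduction} = \text{mean} \\ \operatorname{sum}(L), & \text{reduction} = \text{sum} \end{cases} $$

where L = {l_1, ..., l_N} is the set of per-element losses:

$$ l_n = |x_n - y_n| \ (\text{L1Loss}), \qquad l_n = (x_n - y_n)^2 \ (\text{MSELoss}) $$

loss_l1 = nn.L1Loss(reduction='mean')  # or reduction='sum'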

Result for L1Loss:

$$ \ell(x, y) = \begin{cases} \frac{|1 - 1| + |2 - 2| + |8 - 3|}{3} \approx 1.67, & \text{reduction} = \text{mean} \\ |1 - 1| + |2 - 2| + |8 - 3| = 5, & \text{reduction} = \text{sum} \end{cases} $$

loss_mse = nn.MSELoss()

Result for MSELoss:

$$ \ell(x, y) = \begin{cases} \frac{(1 - 1)^2 + (2 - 2)^2 + (8 - 3)^2}{3} \approx 8.33, & \text{reduction} = \text{mean} \\ (1 - 1)^2 + (2 - 2)^2 + (8 - 3)^2 = 25, & \text{reduction} = \text{sum} \end{cases} $$
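
The hand-computed values can be checked directly in PyTorch. The snippet below is a small verification sketch; the variable names are mine.

```python
import torch
from torch import nn

target = torch.tensor([1.0, 2.0, 8.0])  # expected result
output = torch.tensor([1.0, 2.0, 3.0])  # actual result

l1_mean = nn.L1Loss(reduction="mean")(output, target)
l1_sum = nn.L1Loss(reduction="sum")(output, target)
mse_mean = nn.MSELoss(reduction="mean")(output, target)
mse_sum = nn.MSELoss(reduction="sum")(output, target)

print(l1_mean.item(), l1_sum.item())    # approximately 1.67 and 5.0
print(mse_mean.item(), mse_sum.item())  # approximately 8.33 and 25.0
```

Both losses expect floating-point tensors, which is why the values are written as 1.0, 2.0 and so on.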

CrossEntropyLoss

torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean', label_smoothing=0.0)

Definition:

$$ \text{loss}(x, class) = -\ln\left(\frac{e^{x[class]}}{\sum_j e^{x[j]}}\right) = -x[class] + \ln\left(\sum_j e^{x[j]}\right) $$

import torch
from torch import nn

# classes: [person, dog, cat]
# inputs:  [0.1,    0.2, 0.3]
inputs = torch.tensor([0.1, 0.2, 0.3])
target = torch.tensor([1])  # 1 is the index of "dog"
inputs = torch.reshape(inputs, (1, 3))  # shape (batch_size, num_classes)

loss_cross = nn.CrossEntropyLoss()
result_cross = loss_cross(inputs, target)
print(result_cross)
# Output:
# tensor(1.1019)

Result for CrossEntropyLoss:

$$ \text{loss}(x, class) = -0.2 + \ln(e^{0.1} + e^{0.2} + e^{0.3}) \approx 1.1019 $$
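
As a cross-check, the formula can be evaluated by hand and compared with the built-in loss. This is a sketch under the same inputs as above; `manual` and `builtin` are names I introduced.

```python
import torch
from torch import nn

logits = torch.tensor([[0.1, 0.2, 0.3]])  # shape (1, 3): one sample, three classes
target = torch.tensor([1])                # class index 1

# -x[class] + ln(sum_j exp(x[j])), written out by hand
manual = -logits[0, target[0]] + torch.logsumexp(logits[0], dim=0)
builtin = nn.CrossEntropyLoss()(logits, target)
print(manual.item(), builtin.item())  # both approximately 1.1019
```

Note that the inputs are raw scores (logits): CrossEntropyLoss applies the softmax internally, so the model should not apply one beforehand.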

A simple example (CIFAR-10 classification)

# -*- coding: utf-8 -*-
# Imports
import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter
from torch import nn
from torch.utils.data import DataLoader

# Load the datasets
train_data = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          transform=torchvision.transforms.ToTensor(), download=True)
test_data = torchvision.datasets.CIFAR10(root="./data", train=False,
                                         transform=torchvision.transforms.ToTensor(), download=True)

# Print the training and test set sizes
train_data_size = len(train_data)
test_data_size = len(test_data)
print("Training set size: {}".format(train_data_size))
print("Test set size: {}".format(test_data_size))

batch_size = 64
train_dataloader = DataLoader(train_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=1, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5, stride=1, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, stride=1, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64*4*4, 64),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        x = self.model(x)
        return x

# Instantiate the model
my_model = MyModel()

# Move the model to the GPU if one is available
if torch.cuda.is_available():
    my_model = my_model.cuda()

# Loss function
loss_fn = nn.CrossEntropyLoss()  # cross entropy
if torch.cuda.is_available():
    loss_fn = loss_fn.cuda()

# Optimizer
learning_rate = 0.01  # learning rate
optimizer = torch.optim.SGD(my_model.parameters(), lr=learning_rate)

# Training parameters
total_train_step = 0
total_test_step = 0
epoch = 16

# TensorBoard
writer = SummaryWriter("./training_logs")

for i in range(epoch):
    print("---- Starting epoch {} ----".format(i + 1))

    my_model.train()
    for data in train_dataloader:
        imgs, labels = data
        if torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
        outputs = my_model(imgs)
        loss = loss_fn(outputs, labels)
        # Optimization step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_train_step = total_train_step + 1
        # Log every 100 steps
        if total_train_step % 100 == 0:
            print("Step {}, training loss: {}".format(total_train_step, loss.item()))
            writer.add_scalar("train_loss", loss.item(), total_train_step)

    # Evaluation on the test set
    my_model.eval()
    total_test_loss = 0
    total_accuracy = 0
    # No gradients are needed on the test set
    with torch.no_grad():
        for data in test_dataloader:
            imgs, labels = data
            if torch.cuda.is_available():
                imgs = imgs.cuda()
                labels = labels.cuda()
            outputs = my_model(imgs)
            loss = loss_fn(outputs, labels)
            total_test_loss = total_test_loss + loss.item()
            # Number of correct predictions in this batch, as a Python int
            accuracy = (outputs.argmax(1) == labels).sum().item()
            total_accuracy = total_accuracy + accuracy

        print("Total loss on the test set: {}".format(total_test_loss))
        print("Accuracy on the test set: {}".format(total_accuracy / test_data_size))
        writer.add_scalar("test_loss", total_test_loss, total_test_step)
        writer.add_scalar("test_accuracy", total_accuracy / test_data_size, total_test_step)
        total_test_step = total_test_step + 1

torch.save(my_model, "my_cifar10.pth")
print("Model saved")
writer.close()
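
The 64*4*4 input size of the first Linear layer follows from the three 2x2 poolings applied to a 32x32 image. The sketch below rebuilds the same layer stack and prints the output shape after each layer for a dummy input; it is only a shape check and does not train anything.

```python
import torch
from torch import nn

# Same layer stack as MyModel above, rebuilt here so the snippet runs on its own.
layers = nn.Sequential(
    nn.Conv2d(3, 32, 5, stride=1, padding=2),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 32, 5, stride=1, padding=2),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 5, stride=1, padding=2),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 4 * 4, 64),
    nn.Linear(64, 10),
)

x = torch.zeros(1, 3, 32, 32)  # one dummy CIFAR-10-sized image
shapes = []
for layer in layers:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{layer.__class__.__name__:10s} -> {tuple(x.shape)}")
```

Each convolution keeps the spatial size (kernel 5, padding 2) and each pooling halves it, so 32 becomes 16, then 8, then 4, leaving 64 channels of 4x4 before flattening.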

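PyTorch model file formats

Format	Description	Typical use	File extension
`.pt` or `.pth`	PyTorch's default model file	Saving and loading a complete PyTorch model	`.pt` or `.pth`
`.bin`	Generic binary format	Converting a PyTorch model to a generic binary file	`.bin`
`.onnx`	Open, cross-framework model exchange format	Converting a PyTorch model into a form usable by other deep learning frameworks or hardware platforms	`.onnx`
TorchScript	PyTorch's mechanism for serializing and optimizing models	Serializing and optimizing a PyTorch model so that it can run without a Python environment	`.pt` or `.pth`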
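
As a rough illustration of two rows of the table, the sketch below saves a tiny placeholder model once as a state_dict and once as a traced TorchScript module, then reloads both. In-memory buffers stand in for the .pth/.pt files, and the model and names are my own assumptions.

```python
import io
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 2))  # tiny placeholder model

# 1) state_dict: save only the parameters, then load them into a fresh instance.
buffer = io.BytesIO()  # in-memory stand-in for a "model.pth" file
torch.save(model.state_dict(), buffer)
buffer.seek(0)
restored = nn.Sequential(nn.Linear(4, 2))
restored.load_state_dict(torch.load(buffer))

# 2) TorchScript: trace with an example input to get a serializable module.
example = torch.rand(1, 4)
scripted = torch.jit.trace(model, example)
ts_buffer = io.BytesIO()  # in-memory stand-in for a "model_ts.pt" file
torch.jit.save(scripted, ts_buffer)
ts_buffer.seek(0)
reloaded = torch.jit.load(ts_buffer)

same_weights = torch.allclose(model(example), restored(example))
same_script = torch.allclose(model(example), reloaded(example))
print(same_weights, same_script)
```

The state_dict route needs the model class to be available when loading, whereas the TorchScript file carries the computation graph with it.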

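Object detection datasets

The VOC dataset

Annotations: XML files describing each image, in particular the coordinates of the objects in it
ImageSets: the Main folder is the important part; its files list the image names in the training/validation sets for each object class
JPEGImages: the original image files
SegmentationClass / SegmentationObject: masks used for segmentation (per class and per object instance, respectively)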
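
To make the Annotations folder more concrete, here is a minimal sketch that parses a VOC-style XML annotation with the standard library. The XML snippet is hand-written and abbreviated, its values are made up, and `parse_voc` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A hand-written, abbreviated annotation in the VOC layout (values are made up).
VOC_XML = """
<annotation>
  <filename>000001.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return root.find("filename").text, boxes

filename, boxes = parse_voc(VOC_XML)
print(filename, boxes)
```

VOC stores boxes as corner coordinates (xmin, ymin, xmax, ymax), which differs from the COCO convention of a corner plus width and height.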

The COCO dataset

Creating your own COCO dataset:

Directory structure:
.........
my_data
    ├───imgs
    │  ├───1.jpg
    │  ├───1.json
    │  ├───2.jpg
    │  ├───2.json
    │  │...........
    │  └───nnn.json
    └───coco
labelme2coco.py
labels.txt
.........

Install labelme, then run the following command in the current directory:

python labelme2coco.py --labels labels.txt my_data/imgs/ my_data/coco/

The labelme2coco.py script is available at the linked file, and an example labels.txt is shown below (this demo has a single label, raccoon):

__ignore__
_background_
raccoon

A usable COCO-format dataset is then generated in the coco directory.

The following code loads the dataset and visualizes it:

import torchvision
from PIL import ImageDraw

# Load the dataset with torchvision's CocoDetection (it relies on the COCO API from pycocotools)
coco_dataset = torchvision.datasets.CocoDetection(
    root=r"./my_data/coco/",
    annFile=r"./my_data/coco/annotations.json",
)
# Read the first image together with its annotations
image, info = coco_dataset[0]
image_handler = ImageDraw.Draw(image)
# Draw the annotated bounding boxes on the image
for annotation in info:
    x_min, y_min, width, height = annotation['bbox']
    image_handler.rectangle(((x_min, y_min), (x_min + width, y_min + height)),
                            fill=None, outline='red', width=2)
# Display the result
image.show()
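
The drawing loop relies on COCO storing each box as [x_min, y_min, width, height]. The sketch below shows an abbreviated, made-up annotation structure of that kind and the conversion to corner coordinates; the field values and the helper name `xywh_to_xyxy` are assumptions for illustration, not output from labelme2coco.py.

```python
# Abbreviated COCO-style annotation content (values are made up).
coco = {
    "images": [{"id": 0, "file_name": "JPEGImages/1.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 0, "name": "raccoon"}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 0, "bbox": [100.0, 50.0, 200.0, 120.0]},
    ],
}

def xywh_to_xyxy(bbox):
    """Convert COCO [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

# Map category ids to names, then print each box in corner form.
names = {c["id"]: c["name"] for c in coco["categories"]}
for ann in coco["annotations"]:
    print(names[ann["category_id"]], xywh_to_xyxy(ann["bbox"]))
```

Many detection models in torchvision expect the corner form, so a conversion like this is typically applied when building training targets.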

To be continued