One Article to Take You into the World of PyTorch | Python Theme Month

This article is an entry in the "Python Theme Month" event; see the event link for details.

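Machine learning means describing a real-world task in mathematical language, based on our understanding of that task.
Starting from a simple problem, this article moves step by step from a NumPy-based solution to a PyTorch one. The goal is to show you what a Tensor is (a PyTorch Tensor can be thought of as an n-dimensional array that can run on a GPU) and the general steps for building and training a neural network with PyTorch. The examples and part of the content come from the official PyTorch documentation, which is a good place both to get started and to dig deeper into the framework.

The data is sampled uniformly from -3.14 to 3.14 (that is, from -π to π) and passed through sin(x) to obtain y. We then design a model that learns to approximate the sin function.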

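What you need to know

To follow this article you should have some understanding of deep learning, know basic Python syntax, and be able to write code in Python. It also helps to be familiar with how models are defined and trained in a deep learning project and to have a rough idea of the PyTorch framework.

Implementing a neural network with NumPy

Before introducing PyTorch, we first implement the network with NumPy. NumPy is a general-purpose framework for scientific computing that provides vector and matrix representations and operations on those data structures. It was not designed for deep learning, however, so it has no built-in support for things like computation graphs or gradients. Here we fit the sine function with a third-order polynomial, implementing the forward and backward passes of the network by hand with NumPy operations.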

import numpy as np
import math
import matplotlib.pyplot as plt

def dataset():
	x = np.linspace(-math.pi, math.pi, 2000)
	y = np.sin(x)

	return x,y

class Model:
	def __init__(self):

		# Initialize the model weights randomly
		self.a = np.random.randn()
		self.b = np.random.randn()
		self.c = np.random.randn()
		self.d = np.random.randn()

		self.epoches = 2000
		self.learning_rate = 1e-6
	def train(self,x,y):
		for epoch in range(self.epoches):
			y_pred = self.a + self.b *x + self.c*x **2 + self.d* x**3

			# Compute the loss (sum of squared errors)
			loss = np.square(y_pred - y).sum()
			if epoch % 100 == 99:
				print(epoch,loss)
			
			grad_y_pred = 2.0 * (y_pred -y)
			
			grad_a = grad_y_pred.sum()		
			grad_b = (grad_y_pred*x).sum()		
			grad_c = (grad_y_pred*x**2).sum()		
			grad_d = (grad_y_pred*x**3).sum()		

			self.a -= self.learning_rate * grad_a
			self.b -= self.learning_rate * grad_b
			self.c -= self.learning_rate * grad_c
			self.d -= self.learning_rate * grad_d
		print(f"Result: y= {self.a} + {self.b}x + {self.c}x^2 + {self.d}x^3")	

if __name__ == "__main__":
	x,y = dataset()
#	plt.plot(x,y)
#	plt.show()

	model = Model()
	model.train(x,y)


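The code above follows four steps:
Prepare the dataset.
Define the model, that is, choose a set of candidate functions that is neither too complex nor too simple; here we fit with a third-order polynomial.
Define the loss function, which measures the gap between the answer our function gives and the correct answer, i.e. how closely our function approximates the true one; it also tells us in which direction to train.
Run gradient descent: step by step, move the parameters in the direction opposite to the gradient until the function's output is close to the true values.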
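
The hand-written gradients in the training loop can be sanity-checked against finite differences. The sketch below is not part of the original code: it uses plain Python lists instead of NumPy, a smaller sample of 50 points, and arbitrary parameter values, and the helper names (`loss_fn`, `analytic_grads`, `numeric_grads`) are made up for illustration.

```python
import math

def loss_fn(params, xs, ys):
    """Sum of squared errors of the cubic a + b*x + c*x^2 + d*x^3."""
    a, b, c, d = params
    return sum((a + b * x + c * x**2 + d * x**3 - y) ** 2 for x, y in zip(xs, ys))

def analytic_grads(params, xs, ys):
    """Hand-derived gradients, using the same formulas as the training loop."""
    a, b, c, d = params
    residuals = [2.0 * (a + b * x + c * x**2 + d * x**3 - y) for x, y in zip(xs, ys)]
    # d(loss)/d(coefficient of x^k) = sum(2 * (y_pred - y) * x^k)
    return [sum(r * x**k for r, x in zip(residuals, xs)) for k in range(4)]

def numeric_grads(params, xs, ys, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss_fn(plus, xs, ys) - loss_fn(minus, xs, ys)) / (2 * eps))
    return grads

# A small sample of the same data: x in [-pi, pi], y = sin(x)
xs = [-math.pi + i * (2 * math.pi / 49) for i in range(50)]
ys = [math.sin(x) for x in xs]
params = [0.1, -0.2, 0.3, -0.4]  # arbitrary illustrative values for a, b, c, d

exact = analytic_grads(params, xs, ys)
approx = numeric_grads(params, xs, ys)
for name, g_exact, g_approx in zip("abcd", exact, approx):
    print(name, g_exact, g_approx)
```

If the two columns agree closely, the manual backward pass is most likely correct.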

99 1889.940637866053
199 1279.4185079879578
299 867.9029409104106
399 590.2627387918794
499 402.7629925812673
599 276.0117451685029
699 190.23937687341535
799 132.13673749120156
899 92.73579912145074
999 65.98802589121729
1099 47.81001788773136
1199 35.4423216183395
1299 27.018280854221068
1399 21.27387063441539
1499 17.352238178693227
1599 14.671917339753634
1699 12.83788922530275
1799 11.581499958306198
1899 10.719831814469643
1999 10.128200865575229
Result: y= 0.030062902508905868 + 0.8349093658197444x + -0.005186350931643299x^2 + -0.09022505151138818x^3

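Introducing Tensors

NumPy is a classic library for matrix operations and a reasonable choice for building neural networks, but it cannot use a GPU to accelerate its numerical computation. For deep neural networks, training without a GPU is rarely acceptable: a GPU often provides a speedup of 50x or more. We therefore replace the arrays in the code above with Tensors.

Tensor is the most fundamental concept in PyTorch. A PyTorch Tensor is conceptually similar to a NumPy array: it can be seen as an n-dimensional array, and PyTorch provides many functions that operate on Tensors. Beyond being an array, the Tensor was designed from the start with deep learning in mind, so it comes with extra capabilities such as tracking the computation graph and gradients.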

Most importantly, and unlike NumPy, PyTorch Tensors can use a GPU to accelerate computation. To run a PyTorch Tensor on a GPU, you only need to specify the right device.
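
As a minimal sketch of what "specifying the device" can look like, the snippet below picks a CUDA device when one is available and otherwise falls back to the CPU. The fallback logic is an assumption added here for illustration; the article's own code hard-codes the device.

```python
import math
import torch

# Use the GPU if PyTorch can see one, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created directly on the chosen device
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=torch.float)
y = torch.sin(x)  # the result lives on the same device as x

print(x.device, y.device)
```

All tensors that take part in one operation need to be on the same device, which is why the device is passed when the data is created.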

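Here we use PyTorch Tensors to fit the sin function with a third-order polynomial. As in the NumPy example above, we still implement the forward and backward passes of the network by hand, this time with Tensors.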

import torch
import math


def dataset(device):
	dtype = torch.float
	
	x = torch.linspace(-math.pi,math.pi,2000,device=device,dtype=dtype)
	y = torch.sin(x)
	
	return x,y

class Model:
	def __init__(self):
		self.dtype = torch.float
		self.device = torch.device("cpu")

		
		self.a = torch.randn((), device=self.device,dtype=self.dtype)
		self.b = torch.randn((), device=self.device,dtype=self.dtype)
		self.c = torch.randn((), device=self.device,dtype=self.dtype)
		self.d = torch.randn((), device=self.device,dtype=self.dtype)

		self.learning_rate = 1e-6
		self.epoches = 2000
	def train(self,x,y):
		for epoch in range(self.epoches): 
			y_pred = self.a + self.b *x + self.c*x **2 + self.d* x**3

			# Compute the loss (sum of squared errors) as a Python number
			loss = (y_pred - y).pow(2).sum().item()

			if epoch % 100 == 99:
				print(epoch,loss)

			grad_y_pred = 2.0 *  (y_pred - y)	
			grad_a = grad_y_pred.sum()		
			grad_b = (grad_y_pred*x).sum()		
			grad_c = (grad_y_pred*x**2).sum()		
			grad_d = (grad_y_pred*x**3).sum()		

			self.a -= self.learning_rate * grad_a
			self.b -= self.learning_rate * grad_b
			self.c -= self.learning_rate * grad_c
			self.d -= self.learning_rate * grad_d
		print(f"Result: y= {self.a.item()} + {self.b.item()}x + {self.c.item()}x^2 + {self.d.item()}x^3")	

if __name__ == "__main__":

	device = torch.device("cpu")

	x,y = dataset(device)


	model = Model()
	model.train(x,y)



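Automatic gradient computation (Autograd)

In the two examples above we implemented both the forward and backward passes by hand. For a simple, shallow two-layer network, writing backpropagation by hand is not hard, but for large, complex networks it quickly becomes difficult.

Fortunately, frameworks such as PyTorch and TensorFlow provide automatic differentiation to compute the gradients of a network's parameters during backpropagation. In PyTorch this is the job of the autograd package. When autograd is used, the forward pass of the network defines a computation graph: nodes in the graph are Tensors, and edges are the functions that produce output Tensors from input Tensors. Backpropagating through this graph then yields the gradients with little effort.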

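This sounds complicated but is simple in practice: developers build the network with PyTorch and can spend their effort on architecture design rather than on computing gradients in the backward pass. Each Tensor represents a node in the computation graph. For example, if x is a Tensor with x.requires_grad=True, then x.grad is another Tensor holding the gradient of some scalar value (such as the loss) with respect to x.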
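
A minimal sketch of this behaviour on a single scalar, using a made-up function y = x^3 + 2x rather than the article's model:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # autograd will track operations on x
c = torch.tensor(5.0)                      # requires_grad defaults to False

y = x ** 3 + 2 * x   # the forward pass records the graph that produced y
y.backward()         # backpropagate from the scalar y

print(x.grad)        # dy/dx = 3 * x^2 + 2 = 14 at x = 2
print(c.grad)        # None: no gradient is tracked for c
print(y.grad_fn)     # the graph node that produced y
```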

Here we use PyTorch Tensors and autograd to implement the third-order polynomial fit of sin; we no longer need to implement the backward pass by hand.

import torch
import math


def dataset(device):
	dtype = torch.float
	
	x = torch.linspace(-math.pi,math.pi,2000,device=device,dtype=dtype)
	y = torch.sin(x)
	
	return x,y

class Model:
	def __init__(self):
		self.dtype = torch.float
		self.device = torch.device("cpu")

		
		self.a = torch.randn((), device=self.device,dtype=self.dtype,requires_grad=True)
		self.b = torch.randn((), device=self.device,dtype=self.dtype,requires_grad=True)
		self.c = torch.randn((), device=self.device,dtype=self.dtype,requires_grad=True)
		self.d = torch.randn((), device=self.device,dtype=self.dtype,requires_grad=True)

		self.learning_rate = 1e-6
		self.epoches = 2000

	def train(self,x,y):
		for epoch in range(self.epoches): 
			y_pred = self.a + self.b *x + self.c*x **2 + self.d* x**3

			# Compute the loss; it stays a Tensor so that autograd can backpropagate from it
			loss = (y_pred - y).pow(2).sum()

			if epoch % 100 == 99:
				print(epoch,loss)


			loss.backward()
			with torch.no_grad():
				self.a -= self.learning_rate * self.a.grad
				self.b -= self.learning_rate * self.b.grad
				self.c -= self.learning_rate * self.c.grad
				self.d -= self.learning_rate * self.d.grad

				self.a.grad = None
				self.b.grad = None
				self.c.grad = None
				self.d.grad = None

			

		print(f"Result: y= {self.a.item()} + {self.b.item()}x + {self.c.item()}x^2 + {self.d.item()}x^3")	

if __name__ == "__main__":

	device = torch.device("cpu")

	x,y = dataset(device)


	model = Model()
	model.train(x,y)


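Setting device = torch.device("cuda:0") selects the GPU as the device.
The dataset, i.e. the inputs and labels, is now represented as Tensors. Tensors are created with requires_grad=False by default, which means no gradient is computed for them during backpropagation.
Autograd computes the backward pass: calling backward() computes the gradient of the loss with respect to every Tensor in the graph that has requires_grad=True. Afterwards, a.grad, b.grad, c.grad and d.grad are Tensors holding the gradient of the loss with respect to a, b, c and d respectively.
The manual gradient-descent update is wrapped in torch.no_grad() because we do not need to track gradients while updating the weights; this keeps autograd from recording the update operations in the graph.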
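
The training loop also resets each `.grad` to `None` after the update. The following sketch, on a made-up one-parameter example, illustrates why: gradients from successive `backward()` calls are added together unless they are cleared.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# Two backward passes without a reset: the gradients are summed
(3 * w).backward()
(3 * w).backward()
accumulated = w.grad.item()  # 3 + 3

# Clear the gradient, then run a single backward pass
w.grad = None
(3 * w).backward()
fresh = w.grad.item()        # 3

# The in-place update is done under no_grad so it is not recorded in the graph
with torch.no_grad():
    w -= 0.1 * w.grad

print(accumulated, fresh, w.item(), w.requires_grad)
```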

99 tensor(3604.7258, grad_fn=<SumBackward0>)
199 tensor(2449.5571, grad_fn=<SumBackward0>)
299 tensor(1667.3009, grad_fn=<SumBackward0>)
399 tensor(1137.0117, grad_fn=<SumBackward0>)
499 tensor(777.1427, grad_fn=<SumBackward0>)
599 tensor(532.6580, grad_fn=<SumBackward0>)
699 tensor(366.3775, grad_fn=<SumBackward0>)
799 tensor(253.1588, grad_fn=<SumBackward0>)
899 tensor(175.9821, grad_fn=<SumBackward0>)
999 tensor(123.3137, grad_fn=<SumBackward0>)
1099 tensor(87.3292, grad_fn=<SumBackward0>)
1199 tensor(62.7157, grad_fn=<SumBackward0>)
1299 tensor(45.8604, grad_fn=<SumBackward0>)
1399 tensor(34.3047, grad_fn=<SumBackward0>)
1499 tensor(26.3733, grad_fn=<SumBackward0>)
1599 tensor(20.9233, grad_fn=<SumBackward0>)
1699 tensor(17.1741, grad_fn=<SumBackward0>)
1799 tensor(14.5921, grad_fn=<SumBackward0>)
1899 tensor(12.8121, grad_fn=<SumBackward0>)
1999 tensor(11.5835, grad_fn=<SumBackward0>)
Result: y= 0.04615207761526108 + 0.8281480669975281x + -0.007962002418935299x^2 + -0.08926331996917725x^3

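Under the hood, each primitive autograd operator consists of two functions that operate on Tensors: a forward function that computes the outputs from the inputs, and a backward function that computes the gradient with respect to the inputs from the gradient with respect to the outputs.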

import torch
import math

def dataset():
    dtype = torch.float
    device = torch.device("cpu")

    x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
    y = torch.sin(x)

    return x,y
# A custom autograd function subclasses `torch.autograd.Function` and implements `forward` and `backward`.
class LegendrePolynomial3(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input):
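        """
        In the forward pass we receive a tensor containing the input and return a tensor containing the output. ctx is a context object that can be used to stash information for the backward computation; ctx.save_for_backward caches tensors for use in the backward pass.
        """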
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
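        """
        In the backward pass we receive a tensor containing the gradient of the loss with respect to the output, and we need to compute the gradient of the loss with respect to the input.
        """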
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

class Model:

    def __init__(self):
        self.dtype = torch.float
        self.device = torch.device("cpu")

        self.a = torch.full((), 0.0, device=self.device, dtype=self.dtype, requires_grad=True)
        self.b = torch.full((), -1.0, device=self.device, dtype=self.dtype, requires_grad=True)
        self.c = torch.full((), 0.0, device=self.device, dtype=self.dtype, requires_grad=True)
        self.d = torch.full((), 0.3, device=self.device, dtype=self.dtype, requires_grad=True)

        self.learning_rate = 5e-6
        self.epoches = 2000

    def train(self,x,y):
        for t in range(self.epoches):
            P3 = LegendrePolynomial3.apply

            y_pred = self.a + self.b * P3(self.c + self.d * x)
            loss = (y_pred - y).pow(2).sum()

            if t % 100 == 99:
                print(t, loss.item())

            # Backward pass: compute the gradient of the loss for every tensor that requires one
            loss.backward()

            with torch.no_grad():
                # Update the weights with gradient descent
                self.a -= self.learning_rate * self.a.grad
                self.b -= self.learning_rate * self.b.grad
                self.c -= self.learning_rate * self.c.grad
                self.d -= self.learning_rate * self.d.grad

                # Reset the gradients manually after the update
                self.a.grad = None
                self.b.grad = None
                self.c.grad = None
                self.d.grad = None 

        print(f'Result: y = {self.a.item()} + {self.b.item()} * P3({self.c.item()} + {self.d.item()} x)')

if __name__ == "__main__":
    x,y = dataset()

    model = Model()
    model.train(x,y)

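In PyTorch we can easily define our own autograd operator by writing a subclass of torch.autograd.Function that implements the forward and backward functions. We then use the new operator by calling its apply method like a function and passing it Tensors containing the input data.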

Here we define the model as $y = a + bP_3(c + dx)$ instead of the earlier $y = a + bx + cx^2 + dx^3$, where $P_3(x) = \frac{1}{2}(5x^3 - 3x)$ is the Legendre polynomial of degree three.
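
The backward function of the custom operator relies on the derivative $P_3'(x) = \frac{1}{2}(15x^2 - 3) = 1.5(5x^2 - 1)$. As a quick sanity check, the sketch below compares that formula with a central finite difference in plain Python; the function names and sample points are chosen here for illustration only.

```python
def p3(x):
    """Legendre polynomial of degree three."""
    return 0.5 * (5 * x ** 3 - 3 * x)

def p3_derivative(x):
    """Analytic derivative, the factor used in the backward pass."""
    return 1.5 * (5 * x ** 2 - 1)

def central_difference(f, x, eps=1e-5):
    """Numerical derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

points = (-1.0, -0.3, 0.0, 0.5, 2.0)
for x in points:
    print(x, p3_derivative(x), central_difference(p3, x))
```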

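Models in PyTorch

Computation graphs and autograd are a very powerful paradigm for defining complex operators and taking derivatives automatically, but for large neural networks raw autograd can be a bit too low-level.

When building neural networks, we often think of arranging the computation into layers, some of which have learnable parameters that are optimized by gradient descent during training.

In TensorFlow, packages such as Keras, TensorFlow-Slim and TFLearn provide higher-level abstractions over the raw computation graph that are convenient for building neural networks.

In PyTorch, the nn package serves the same purpose. It defines a set of modules, which are roughly equivalent to neural network layers. A module receives input Tensors and computes output Tensors, and it may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of commonly used loss functions for training neural networks.

In this example we use the nn package to implement our polynomial model.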

p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

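In this example, y is a linear function of (x, x^2, x^3), so we can treat the model as a linear neural network layer. We turn each input value x into the vector (x, x^2, x^3) by raising x to the powers (1, 2, 3), which gives a 3-dimensional feature vector for every sample.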

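The code above may look unfamiliar. If you are not yet comfortable with tensor shape transformations, you can watch the video 《》. unsqueeze adds a dimension to a tensor, turning the (2000,) tensor into a (2000, 1) tensor. p is a tensor of shape (3,), so when x.unsqueeze(-1) is raised to the power p, broadcasting produces a result of shape (2000, 3).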
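
The shapes involved can be checked on a small example. This sketch uses 5 points instead of 2000 so the result is easy to print; the variable name `col` is introduced here for illustration.

```python
import torch

x = torch.linspace(-1.0, 1.0, 5)  # shape (5,)
p = torch.tensor([1, 2, 3])       # shape (3,)

col = x.unsqueeze(-1)             # shape (5, 1): one column
xx = col.pow(p)                   # (5, 1) broadcast against (3,) gives (5, 3)

print(x.shape, col.shape, p.shape, xx.shape)
print(xx)  # each row holds (x, x^2, x^3) for one sample
```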

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

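Next we define the model as an nn.Sequential container, in which layers are stacked one after another. Here we use two layers. The first, Linear(3, 1), applies a linear transformation to the input: it holds three weights, one for each of the three input features x, x^2 and x^3, plus one bias. The second, Flatten(0, 1), flattens the (2000, 1) output of the linear layer into a 1-dimensional tensor of shape (2000,) so that it matches the shape of y.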
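
To make the role of each layer concrete, the sketch below applies the two layers one at a time to random input of the same shape as xx and prints the intermediate shapes. The random input and the variable names are assumptions made for illustration.

```python
import torch

xx = torch.randn(2000, 3)         # stand-in for the (x, x^2, x^3) features

linear = torch.nn.Linear(3, 1)    # 3 input features -> 1 output
flatten = torch.nn.Flatten(0, 1)  # merge dimensions 0 and 1

out_linear = linear(xx)           # shape (2000, 1)
out_flat = flatten(out_linear)    # shape (2000,)

print(linear.weight.shape, linear.bias.shape)  # three weights and one bias
print(out_linear.shape, out_flat.shape)
```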

loss_fn = torch.nn.MSELoss(reduction='sum')

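The loss function is the squared error summed over all samples (reduction='sum'). This time we do not implement it ourselves but use the one PyTorch provides; PyTorch ships implementations of the mainstream loss functions.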
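
With reduction='sum', the built-in loss matches the hand-written expression used in the earlier examples. A small check on made-up values:

```python
import torch

y_pred = torch.tensor([0.5, 1.0, -2.0])
y = torch.tensor([0.0, 1.0, 1.0])

loss_sum = torch.nn.MSELoss(reduction='sum')(y_pred, y)    # 0.25 + 0 + 9
loss_mean = torch.nn.MSELoss(reduction='mean')(y_pred, y)  # the sum divided by 3
manual = (y_pred - y).pow(2).sum()                         # the earlier hand-written loss

print(loss_sum.item(), loss_mean.item(), manual.item())
```

Because the summed loss grows with the number of samples, it goes together with the small learning rate of 1e-6 used in the examples.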

y_pred = model(xx)

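Forward pass: pass the input to the model to compute the predicted y. Module objects implement Python's __call__ method, so an instance can be called like a function; passing the input to model runs it through the two layers and returns the prediction.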

loss = loss_fn(y_pred, y)
if t % 100 == 99:
    print(t, loss.item())

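The predictions and the true values are passed to the loss function to compute the loss, which is then printed. The return value is also a tensor, so we call loss.item() to get a Python number.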

model.zero_grad()

Before the backward pass computes new gradients, the existing gradients must be reset to zero.

loss.backward()

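Backward pass: compute the gradient of the loss with respect to the model's learnable parameters. Calling this function computes gradients for all Tensors with requires_grad=True.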

with torch.no_grad():
    for param in model.parameters():
        param -= learning_rate * param.grad

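The remaining work is to update the weights with the computed gradients. Every parameter is a Tensor; the parameters are obtained by calling model.parameters(), and each one is updated by subtracting the product of its gradient and the learning rate.

Complete code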

# -*- coding: utf-8 -*-
import torch
import math


# Create the input and output tensors
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# y is a linear function of (x, x^2, x^3), so the model can be a linear layer.
# Raise each input x to the powers (1, 2, 3) to get the feature vector (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)


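# The code above may look unfamiliar. If you are not yet comfortable with tensor shape transformations,
# you can watch the video 《》. unsqueeze adds a dimension, turning the (2000,) tensor into a (2000, 1)
# tensor. p has shape (3,), so raising x.unsqueeze(-1) to the power p broadcasts to shape (2000, 3).


# Next we define the model as an nn.Sequential container, in which layers are stacked one after another.
# Linear(3, 1) holds three weights, one per input feature x, x^2, x^3, plus one bias;
# Flatten(0, 1) flattens its (2000, 1) output into a 1-dimensional tensor of shape (2000,) to match y.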
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)


# Use the summed squared error provided by PyTorch instead of implementing the loss by hand
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):

    # Forward pass: pass the input to the model to compute the predicted y. Module objects implement
    # __call__, so an instance can be called like a function and runs the input through both layers.
    y_pred = model(xx)

    # Pass the predictions and the true values to the loss function to compute the loss, then print it.
    # The return value is also a tensor, so loss.item() is needed to get a Python number.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())


    # Reset the gradients to zero before the backward pass computes new ones
    model.zero_grad()

    # Backward pass: compute the gradient of the loss with respect to every learnable parameter,
    # i.e. every Tensor with `requires_grad=True`
    loss.backward()


    # Update the weights with the computed gradients. Every parameter is a Tensor obtained from
    # `model.parameters()`; subtract the product of its gradient and the learning rate.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# The weights and bias of the first layer of `model` are the coefficients we are looking for
linear_layer = model[0]


print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

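Optimizers

So far we have updated the learnable parameters (the model's weights) by hand inside torch.no_grad(). For a simple optimization algorithm such as stochastic gradient descent this is not hard, but in real projects we often train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp or Adam. The optim package in PyTorch provides implementations of the commonly used optimization algorithms.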
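
To give a feel for what such an optimizer does beyond plain gradient descent, the sketch below writes out one RMSprop update by hand and compares it with one step of torch.optim.RMSprop. It assumes the optimizer's default settings (alpha=0.99, eps=1e-8, no momentum, no weight decay, not centered); the helper `rmsprop_step` and the toy loss are made up for illustration.

```python
import torch

def rmsprop_step(param, grad, square_avg, lr=1e-3, alpha=0.99, eps=1e-8):
    """One plain RMSprop update: scale the step by a running average of squared gradients."""
    square_avg = alpha * square_avg + (1 - alpha) * grad ** 2
    param = param - lr * grad / (square_avg.sqrt() + eps)
    return param, square_avg

w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
optimizer = torch.optim.RMSprop([w], lr=1e-3)

loss = (w ** 2).sum()  # toy loss with gradient 2 * w
optimizer.zero_grad()
loss.backward()

# Hand-written update, starting from a zero running average
expected, _ = rmsprop_step(w.detach().clone(), w.grad.clone(), torch.zeros_like(w))

optimizer.step()       # the library's update of w
print(w.detach(), expected)
```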

# -*- coding: utf-8 -*-
import torch
import math


# Create the input and output tensors
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)


p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)


model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')



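# The different optimizers and their characteristics are covered in the video 《》; PyTorch provides them out of the box.
# An optimizer is a concrete strategy for applying gradients to the parameters during training,
# so that training converges quickly toward a minimum of the loss. Most commonly used optimizers are already implemented in PyTorch.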
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    
    y_pred = model(xx)

    
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())


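    # Before the backward pass, use the optimizer to zero the gradients of all the variables it will
    # update (the learnable weights of the model). By default, gradients accumulate in buffers
    # (they are not overwritten) whenever .backward() is called.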
    optimizer.zero_grad()



    # Backward pass: compute the gradient of the loss with respect to each model parameter
    loss.backward()

    # Call the optimizer's step function to update the learnable parameters
    optimizer.step()


linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')


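Custom models

Often the model you want to define is more complex than a sequence of existing modules. In these cases you can define your own module by subclassing nn.Module.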
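
Before the full example, here is a minimal sketch of what subclassing gives you. The class `Line` and its attribute names are hypothetical; the point is that attributes wrapped in nn.Parameter are registered automatically and are returned by parameters(), while plain tensors are not.

```python
import torch

class Line(torch.nn.Module):  # hypothetical toy module: y = slope * x + intercept
    def __init__(self):
        super().__init__()
        self.slope = torch.nn.Parameter(torch.tensor(2.0))
        self.intercept = torch.nn.Parameter(torch.tensor(-1.0))
        self.scale = torch.tensor(10.0)  # plain tensor: not a learnable parameter

    def forward(self, x):
        return self.slope * x + self.intercept

model = Line()
names = [name for name, _ in model.named_parameters()]
output = model(torch.tensor([0.0, 1.0]))  # calling the module runs forward()

print(names)
print(output)
```

This is why an optimizer can be given model.parameters() directly: it receives exactly the tensors that were wrapped in nn.Parameter.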

# -*- coding: utf-8 -*-
import torch
import math


class Polynomial3(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we create four parameters and assign them as member variables.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
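        In the forward function we accept a tensor of input data and return a tensor of output data. We can use modules defined in the constructor as well as arbitrary operators on tensors.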
        """
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        """
        Just like any class in Python, you can also define custom methods on PyTorch modules.
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'


# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = Polynomial3()

# Define the loss function and the optimizer
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
for t in range(2000):
    y_pred = model(x)

    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())


    # Zero the gradients, run backward to compute the new gradients, then call step to update the parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')

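Control flow and weight sharing

As an example of dynamic graphs and weight sharing, we implement a somewhat contrived model: a polynomial of order three to five. On each forward pass we draw a random number and, on top of the earlier third-order polynomial, add nothing, a fourth-order term, or both a fourth- and a fifth-order term, reusing the same parameter e for the extra terms. This works because PyTorch builds a dynamic computation graph on every forward pass, so ordinary Python control flow such as loops and conditionals can determine the structure of the graph. It also shows that reusing the same parameter several times when defining a computation graph is perfectly safe.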
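
Two details of this model can be checked in isolation. The sketch below first lists which extra orders the loop `range(4, random.randint(4, 6))` can produce, then shows on made-up scalar values that a parameter used twice in the graph receives the sum of the gradients from both uses.

```python
import torch

# random.randint(4, 6) returns 4, 5 or 6; the loop then runs over range(4, n)
extra_orders = {n: list(range(4, n)) for n in (4, 5, 6)}
print(extra_orders)  # {4: [], 5: [4], 6: [4, 5]}

# Weight sharing: e appears in two terms of the same graph
e = torch.tensor(0.5, requires_grad=True)
x = torch.tensor(2.0)

y = e * x ** 4 + e * x ** 5
y.backward()

print(e.grad)  # x^4 + x^5 = 16 + 32 = 48
```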

# -*- coding: utf-8 -*-
import random
import torch
import math


class DynamicNet(torch.nn.Module):
    def __init__(self):

        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.e = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
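        """
        On each forward pass we draw a random number and, on top of the third-order polynomial, add nothing,
        a fourth-order term, or both a fourth- and a fifth-order term, reusing the parameter e.
        PyTorch builds a dynamic computation graph on every forward pass, so Python control flow such as
        loops and conditionals can determine the structure of the graph.
        Reusing the same parameter several times when defining a computation graph is perfectly safe.
        """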
        y = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3
        for exp in range(4, random.randint(4, 6)):
            y = y + self.e * x ** exp
        return y

    def string(self):
 
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 ? + {self.e.item()} x^5 ?'


x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

model = DynamicNet()


criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8, momentum=0.9)
for t in range(30000):
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 2000 == 1999:
        print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')