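Convolutional neural networks (CNNs) are among the most widely used deep neural networks. They have achieved good results on many vision problems and have also been applied successfully in natural language processing, computer graphics, and other fields.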

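LeNet is the ancestor of today's deep convolutional neural networks. It traces back to Yann LeCun's 1989 work, which first brought convolution into neural networks, and it was one of the earliest CNNs to push the field of deep learning forward. After several successful iterations it was named LeNet5. At the time, LeNet was mainly used for character recognition.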

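In early traditional neural networks, such as BP neural networks, people usually relied on expert experience to design the input feature vectors. Decades of experience show that hand-crafted features are not always effective. Early face recognition algorithms generally combined hand-crafted features with a classifier. The peak of hand-crafted features is the CVPR 2013 paper from MSRA, "Blessing of Dimensionality: High-dimensional Feature and Its Efficient Compression for Face Verification", which describes how to use high-dimensional features for face verification. Mature classifiers were available, such as neural networks, support vector machines, and Bayesian methods. The approach looked promising, but experts are scarce, and the available computing power could not keep up with the cost of high-dimensional features.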

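Consider a simple, intuitive scheme first: since every feature is extracted from the image, use the whole image as the feature vector to train the neural network. The amount of data alone is staggering. In that case we still have to reduce the dimensionality, and this is where the concept of convolution comes in.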

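Since deep learning made its mark in ILSVRC-2012, convolutional neural networks have shown their strength in image classification, and learned convolution kernels have clearly outperformed the combination of hand-crafted features and classifiers. Face recognition researchers use convolutional neural networks (CNNs) to learn from massive numbers of face images and then extract, from an input image, feature vectors that are useful for telling different people's faces apart, replacing hand-crafted features.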

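DeepFace, proposed by Facebook at CVPR 2014, is the foundational work for deep convolutional neural networks in face recognition. It is optimized with the classic cross-entropy (Softmax) loss and finally produces a fixed-length face feature vector through feature embedding. DeepFace reached 97.35% accuracy on LFW, which is close to human level. Compared with the 1997 CNN-based work that used 400 images of 40 people, Facebook collected about 4 million images of about 4,000 people to train the model. Perhaps we can draw a conclusion: artificial intelligence driven by big data has succeeded!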

1. Basic principles of convolution

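In functional analysis, convolution is a mathematical operator that takes two functions f and g and produces a third function. It represents the area of overlap between f and g after one of them has been flipped and shifted.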

The mathematical definition of convolution is:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$
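As a small illustration of the flip-and-shift idea in its discrete form, the sketch below computes a 1-D convolution by hand and compares it with `numpy.convolve`. The helper name `conv1d_full` and the sample arrays are made up for this example.

```python
import numpy as np

def conv1d_full(f, g):
    """Discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for k in range(len(f)):
            # g is indexed "backwards" (n - k): this is the flip-and-shift.
            if 0 <= n - k < len(g):
                out[n] += f[k] * g[n - k]
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

manual = conv1d_full(f, g)
reference = np.convolve(f, g)  # numpy uses "full" mode by default

print(manual)
print(np.allclose(manual, reference))
```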
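In computer vision, convolution kernels (filters) are usually small matrices, such as 3×3 or 5×5, while a digital image is a comparatively large 2-D tensor. The relationship between image convolution and correlation is shown in the figure below, where F is the filter, x is the input image, and O is the output.
[Figure: relationship between image convolution and correlation]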
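The relationship between the two operations can also be checked numerically. The sketch below uses two hypothetical helpers, `correlate2d_valid` and `convolve2d_valid`, written only for this illustration. It shows that convolution is correlation with the filter rotated by 180 degrees.

```python
import numpy as np

def correlate2d_valid(x, F):
    """Cross-correlation: slide F over x without flipping (valid region only)."""
    k_h, k_w = F.shape
    out = np.zeros((x.shape[0] - k_h + 1, x.shape[1] - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k_h, j:j + k_w] * F)
    return out

def convolve2d_valid(x, F):
    """Convolution: the same sliding sum, with F rotated by 180 degrees first."""
    return correlate2d_valid(x, F[::-1, ::-1])

x = np.arange(25, dtype=float).reshape(5, 5)
F = np.array([[-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0]])

corr = correlate2d_valid(x, F)
conv = convolve2d_valid(x, F)
print(corr)
print(conv)  # same magnitude, opposite sign, because F is antisymmetric left to right
```

For a filter that is unchanged by a 180 degree rotation (a box blur, for instance) the two results are identical, which is why many CNN libraries implement correlation and still call it convolution.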

The discretized convolution formula is as follows:
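A standard way to write it, assuming a K×K filter F indexed from 0 and the symbols x, F, and O used for the figure (the index names i, j, m, n are introduced here for illustration):

$$O(i, j) = \sum_{m=0}^{K-1} \sum_{n=0}^{K-1} x(i + m,\, j + n)\, F(m, n)$$

This is the sliding multiply-and-accumulate (cross-correlation) form that most CNN code uses. Strict convolution flips the filter first, which replaces $F(m, n)$ with $F(K-1-m,\, K-1-n)$:

$$O(i, j) = \sum_{m=0}^{K-1} \sum_{n=0}^{K-1} x(i + m,\, j + n)\, F(K-1-m,\, K-1-n)$$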

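Scanning from left to right, row by row, the values at corresponding positions are multiplied and accumulated, which produces the result on the right side of the figure below.
[Figure: convolution computed by sliding the filter over the input]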
Following the discretized formula, the program below puts convolution filtering into practice so that we can see its effect.

import sys

import matplotlib.pyplot
import numpy
import skimage.color
import skimage.io

def conv_(img, conv_filter):
    """Slide one square 2-D filter over a 2-D image (stride 1, no padding).

    The filter is not flipped, so strictly speaking this is cross-correlation.
    """
    filter_size = conv_filter.shape[0]

    # Output shrinks by (filter_size - 1) in each dimension.
    out_rows = img.shape[0] - filter_size + 1
    out_cols = img.shape[1] - filter_size + 1
    result = numpy.zeros((out_rows, out_cols))

    # Apply the filter at every valid position of the input image.
    for row in range(out_rows):
        for col in range(out_cols):
            # Region of the image currently covered by the filter.
            curr_region = img[row:row+filter_size, col:col+filter_size]
            # Element-wise product of the region and the filter, then sum.
            result[row, col] = numpy.sum(curr_region * conv_filter)

    return result

def conv(img, conv_filter):
    # Check that the image and the filter bank have the same number of channels.
    if len(img.shape) > 2 or len(conv_filter.shape) > 3:
        if img.shape[-1] != conv_filter.shape[-1]:
            print("Error: Number of channels in both image and filter must match.")
            sys.exit()
    if conv_filter.shape[1] != conv_filter.shape[2]: # Check that the filter is square.
        print('Error: Filter must be a square matrix. I.e. number of rows and columns must match.')
        sys.exit()
    if conv_filter.shape[1]%2==0: # Check that the filter dimensions are odd.
        print('Error: Filter must have an odd size. I.e. number of rows and columns must be odd.')
        sys.exit()

    # Output feature maps, initialized to zero: one map per filter, each
    # smaller than the input by (filter size - 1) because no padding is used.
    feature_maps = numpy.zeros((img.shape[0]-conv_filter.shape[1]+1,
                                img.shape[1]-conv_filter.shape[1]+1,
                                conv_filter.shape[0]))

    # Convolve the image with each filter in the bank.
    for filter_num in range(conv_filter.shape[0]):
        print("Filter ", filter_num + 1)
        curr_filter = conv_filter[filter_num, :] # Get one filter from the bank.
        # If the filter has multiple channels, each channel convolves the
        # matching image channel and the results are summed into a single
        # feature map.
        if len(curr_filter.shape) > 2:
            conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0])
            for ch_num in range(1, curr_filter.shape[-1]):
                conv_map = conv_map + conv_(img[:, :, ch_num], curr_filter[:, :, ch_num])
        else: # There is just a single channel in the filter.
            conv_map = conv_(img, curr_filter)
        feature_maps[:, :, filter_num] = conv_map
    return feature_maps
    
img = skimage.io.imread("timg1.jpg")
img = skimage.color.rgb2gray(img)

l1_filter = numpy.zeros((2,3,3))  # Filter bank: two 3x3 convolution kernels
l1_filter[0, :, :] = numpy.array([[[-1, 0, 1], 
                                   [-1, 0, 1], 
                                   [-1, 0, 1]]])
l1_filter[1, :, :] = numpy.array([[[1,   1,  1], 
                                   [0,   0,  0], 
                                   [-1, -1, -1]]])

l1_feature_map = conv(img, l1_filter)

# Plot and save the results
fig0, ax0 = matplotlib.pyplot.subplots(nrows=1, ncols=1)
ax0.imshow(img).set_cmap("gray")
ax0.set_title("Input Image")
matplotlib.pyplot.savefig("in_img.png", bbox_inches="tight")
matplotlib.pyplot.close(fig0)

fig1, ax1 = matplotlib.pyplot.subplots(nrows=1, ncols=2)
ax1[0].imshow(l1_feature_map[:, :, 0]).set_cmap("gray")
ax1[0].set_title("Map1")

ax1[1].imshow(l1_feature_map[:, :, 1]).set_cmap("gray")
ax1[1].set_title("Map2")

matplotlib.pyplot.savefig("Out_Map.png", bbox_inches="tight")
matplotlib.pyplot.close(fig1)


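The image filtered by the program is shown below. The two filters pick out the contour features.
[Figure: output feature maps Map1 and Map2]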
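To see why these two filters pick out contours, here is a minimal sketch on a synthetic image instead of the photo. The image `step`, the helper `slide_filter`, and the other variable names are assumptions made for this illustration.

```python
import numpy as np

def slide_filter(img, kernel):
    """Multiply-and-accumulate over every valid position (stride 1, no padding)."""
    k = kernel.shape[0]
    out = np.zeros((img.shape[0] - k + 1, img.shape[1] - k + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + k, c:c + k] * kernel)
    return out

# 6x6 image: dark left half (0), bright right half (1), so one vertical edge.
step = np.zeros((6, 6))
step[:, 3:] = 1.0

vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)
horizontal_edge = np.array([[ 1,  1,  1],
                            [ 0,  0,  0],
                            [-1, -1, -1]], dtype=float)

v_map = slide_filter(step, vertical_edge)
h_map = slide_filter(step, horizontal_edge)
print(v_map)  # non-zero only where the window straddles the edge
print(h_map)  # all zeros: the image has no horizontal edge
```

The first filter responds to left-to-right brightness changes and the second to top-to-bottom changes, so flat regions produce zero and only the contours remain.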

2. Convolution kernels

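In image processing, each pixel of the output image is a weighted combination of the pixels in a small region of the input image. The weights are defined by a function, and that function is called the convolution kernel.
A convolution kernel is also commonly called a filter. Its design usually follows a few principles: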

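Kernel sizes are usually odd, such as 3×3, 5×5, or 7×7. This guarantees a center point (the pixel is the smallest unit, so there is no sub-pixel position). From the point of view of pixel encoding, an odd size emphasizes the center pixel, whereas an even size tends to balance or cancel it out and wash out that pixel's character. 3×3 is the smallest size that captures a pixel's eight-neighborhood.
The elements of a kernel usually sum to 1, so that the image keeps the same brightness after filtering (this rule is not mandatory).
In Figure 9-3, which illustrates the convolution computation, a 6×7 input matrix filtered by the kernel produces a 4×5 output. The border around it is the padding: to keep the image size unchanged after convolution, the border is usually filled with zeros.
[Figure 9-3: convolution computation]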
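The rule that kernel elements sum to 1 is easy to check numerically. The sketch below uses made-up names (`mean_after_filter`, `box_blur`, `bright`) and a constant test image. It only illustrates the principle and is not taken from the figure.

```python
import numpy as np

def mean_after_filter(img, kernel):
    """Average output value after sliding the kernel over the valid region."""
    k = kernel.shape[0]
    rows, cols = img.shape[0] - k + 1, img.shape[1] - k + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = np.sum(img[r:r + k, c:c + k] * kernel)
    return out.mean()

flat = np.full((8, 8), 0.5)        # uniform gray image
box_blur = np.ones((3, 3)) / 9.0   # elements sum to 1
bright = np.ones((3, 3)) / 4.5     # elements sum to 2

print(mean_after_filter(flat, box_blur))  # brightness is preserved
print(mean_after_filter(flat, bright))    # brightness is doubled
```

Edge-detection kernels are a deliberate exception: their elements sum to 0, so flat regions map to 0 and only changes in brightness survive.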
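Convolution shrinks the image, so zeros must be padded to keep the size unchanged. In TensorFlow, it is easy to assume from the experiments so far that padding='same' always leaves the matrix size unchanged, but the actual size is given by the formulas below.
Let the input image have height Hin and width Win, the kernel height HF and width WF, the output height Hout and width Wout, and let stride be the step size.
$$\text{padding='same'}:\quad H_{out} = \left\lceil \frac{H_{in}}{stride} \right\rceil,\qquad W_{out} = \left\lceil \frac{W_{in}}{stride} \right\rceil$$
$$\text{padding='valid'}:\quad H_{out} = \left\lceil \frac{H_{in} - H_F + 1}{stride} \right\rceil,\qquad W_{out} = \left\lceil \frac{W_{in} - W_F + 1}{stride} \right\rceil$$
When stride = 1, padding='same' keeps the image size unchanged by padding zeros, while padding='valid' shrinks it to Hout = Hin − HF + 1 and Wout = Win − WF + 1. In both modes the result is rounded up.
When stride is not 1 and padding='same', Wout = Win/stride and Hout = Hin/stride (rounded up), so the image still becomes smaller after convolution.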
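The size rules for the two padding modes can be wrapped in a small helper. This is a sketch of the arithmetic only. The function name `conv_output_size` is hypothetical and the code does not call TensorFlow.

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one dimension for 'SAME' or 'VALID' padding."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# The 6x7 input from the example above, with a 3x3 kernel.
valid_s1 = (conv_output_size(6, 3, 1, "VALID"), conv_output_size(7, 3, 1, "VALID"))
same_s1 = (conv_output_size(6, 3, 1, "SAME"), conv_output_size(7, 3, 1, "SAME"))
same_s2 = (conv_output_size(6, 3, 2, "SAME"), conv_output_size(7, 3, 2, "SAME"))

print(valid_s1)  # no padding: the output shrinks
print(same_s1)   # stride 1 with padding: the size is kept
print(same_s2)   # stride 2 with padding: the output is still smaller
```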
What kernel size is appropriate? Current industry experience is that 3×3 already gives good results and good performance.

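The early view was that bigger kernels are better: the larger the kernel, the larger the receptive field, the more of the image it sees, and the better the resulting features. AlexNet, for example, used some very large kernels, such as 11×11 and 5×5. But large kernels sharply increase the amount of computation, which makes it harder to deepen the model and lowers performance. VGG (the first to do so) and the Inception networks therefore stack two 3×3 kernels in place of one 5×5 kernel. The stack works better and also needs fewer parameters (3×3×2+1 versus 5×5×1+1), so 3×3 kernels have since been used widely in all kinds of models.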
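The arithmetic behind this comparison can be sketched as follows, assuming stride 1 and counting weights only (biases are left out, so the totals are 18 and 25). The function names are illustrative.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each layer widens the field by (k - 1)
    return field

def weight_count(kernel_sizes, channels=1):
    """Weights in the stack, with `channels` inputs and outputs per layer."""
    return sum(k * k * channels * channels for k in kernel_sizes)

two_3x3 = (stacked_receptive_field([3, 3]), weight_count([3, 3]))
one_5x5 = (stacked_receptive_field([5]), weight_count([5]))

print(two_3x3)  # (receptive field, weights) for two stacked 3x3 layers
print(one_5x5)  # (receptive field, weights) for one 5x5 layer
```

Both options see the same 5×5 region of the input, but the stacked version uses fewer weights and adds an extra non-linearity between the two layers.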

3. Convolution and images

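Filtering an image means applying a small convolution kernel to bring out the image features we need. What magic does this small kernel have that lets it characterize image features? Let's look at a few kernels that are simple yet far from trivial.
The code below uses TensorFlow's convolution function tf.nn.conv2d to filter an image with a convolution kernel. Other kernels can be tried by changing l1_filter1.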

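# The script uses the TensorFlow 1.x graph/session API. Importing it through
# tf.compat.v1 lets the same code run under TensorFlow 2.x as well.
import tensorflow.compat.v1 as tf
import skimage.color
import skimage.io
import numpy as np
import matplotlib.pyplot

tf.disable_v2_behavior()

img = skimage.io.imread("timg1.jpg")
img = skimage.color.rgb2gray(img)

# Vertical-edge kernel, the same as the first filter in section 1.
l1_filter1 = np.array([[-1, 0, 1],
                       [-1, 0, 1],
                       [-1, 0, 1]])

l1_filter1 = l1_filter1.astype("float64")

# conv2d expects the input as [batch, height, width, channels]
# and the filter as [height, width, in_channels, out_channels].
# The image size is read from img.shape instead of being hard-coded.
x_image = tf.reshape(img, [1, img.shape[0], img.shape[1], 1])
W_filter = tf.reshape(l1_filter1, [3,3,1,1])

input_tensor = tf.Variable(x_image,  name='input')
input_weight = tf.Variable(W_filter, dtype="float64",  name='weight')

op = tf.nn.conv2d(input_tensor,input_weight, strides=[1, 1, 1, 1], padding='SAME')

# tf.initialize_all_variables() is deprecated in favor of this call.
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    feature_map = sess.run(op)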

# Plot and save the results
fig0, ax0 = matplotlib.pyplot.subplots(nrows=1, ncols=1)
ax0.imshow(img).set_cmap("gray")
ax0.set_title("Input Image")
matplotlib.pyplot.savefig("in_img1.png", bbox_inches="tight")
matplotlib.pyplot.close(fig0)

fig1, ax1 = matplotlib.pyplot.subplots(nrows=1, ncols=1)
ax1.imshow(feature_map[0,:, :, 0]).set_cmap("gray")
ax1.set_title("Map1")
matplotlib.pyplot.savefig("Out_Map1.png", bbox_inches="tight")
matplotlib.pyplot.close(fig1)
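Other small kernels can be tried the same way. The sketch below uses plain NumPy instead of tf.nn.conv2d so that it runs without TensorFlow. The kernel dictionary `KERNELS`, the helper `apply_kernel`, and the random test image are assumptions for illustration, and the kernels are common textbook choices, not ones taken from this article.

```python
import numpy as np

KERNELS = {
    "identity": np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float),
    "box_blur": np.ones((3, 3)) / 9.0,
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
    "laplacian": np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float),
}

def apply_kernel(img, kernel):
    """Zero-pad by one pixel, then slide a 3x3 kernel so the output keeps the input size."""
    padded = np.pad(img, 1, mode="constant")
    out = np.zeros_like(img, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(padded[r:r + 3, c:c + 3] * kernel)
    return out

rng = np.random.default_rng(0)
test_img = rng.random((6, 5))  # stand-in for a grayscale image

results = {name: apply_kernel(test_img, k) for name, k in KERNELS.items()}
for name, out in results.items():
    print(name, out.shape)
```

The zero padding plays the same role as padding='SAME' with stride 1: every output has the shape of the input. The sharpen kernel is the identity kernel minus the Laplacian, so its output equals the image minus its Laplacian response.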