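1. Introduction
Deep learning models are mostly multi-layer networks, and the layers can be of many different types (Dense, Dropout, Flatten, Activation, BatchNormalization, GlobalAveragePooling2D, Conv2D, MaxPooling2D, ZeroPadding2D, LSTM).
Sometimes we need the output of one particular layer in such a network, for example for visualization or to use as an embedding.
The example below shows how to get the output of an intermediate layer of a neural network.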
2. Build the network and give each layer a name
The multi-layer model used in this article is built as follows:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras_self_attention import SeqSelfAttention
import keras
import numpy as np
num_classes = 6  # number of classes
x_train = np.random.randn(100, 15, 20, 3)  # x_train.shape = (100, 15, 20, 3)
# random class ids in [0, num_classes), one-hot encoded for categorical_crossentropy
y_train = keras.utils.to_categorical(np.random.randint(num_classes, size=100), num_classes)  # y_train.shape = (100, 6)
input_shape = x_train.shape[-3:]  # (15, 20, 3)
model = Sequential()
model.add(Conv2D(32, (2,11), activation='relu', padding='same', input_shape=input_shape, name='layer_conv_1'))
model.add(Conv2D(32, (2,11), activation='relu', padding='same', name='layer_conv_2'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 1), name='layer_mp_1'))
model.add(Conv2D(128, (2,7), activation='relu', padding='same', name='layer_conv_3'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 1), name='layer_mp_2'))
model.add( keras.layers.Reshape((48,128), name='layer_rsp_1') )
model.add( SeqSelfAttention( attention_type=SeqSelfAttention.ATTENTION_TYPE_MUL, name='layer_attention_1') )
model.add(Flatten(name='layer_flatten_1'))
model.add(Dense(440, activation='relu', name='layer_dense_1'))
model.add(Dropout(0.5, name='layer_dropout_1'))
model.add(Dense(num_classes, activation='softmax', name='layer_dense_2'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
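A few points to note:
- Give every layer of the model a name attribute, for example name='layer_conv_1'.
- The training data are random numbers generated with numpy; see the code comments for details.
- Versions used: tensorflow==2.4.0, keras==2.4.3, python==3.7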
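The comment on y_train in the code above gives its shape as (100, 6), which is the one-hot layout that categorical_crossentropy expects. As a minimal NumPy-only sketch (the names class_ids and y_onehot and the fixed seed are illustrative assumptions, not part of the original code), random one-hot labels of that shape could be generated like this:

```python
import numpy as np

num_classes = 6
rng = np.random.default_rng(0)  # fixed seed, only so the sketch is reproducible

# One random class id per sample, each in [0, num_classes)
class_ids = rng.integers(0, num_classes, size=100)

# Row i of the identity matrix is the one-hot vector for class i
y_onehot = np.eye(num_classes, dtype=np.float32)[class_ids]

print(y_onehot.shape)  # (100, 6)
```

Each row contains a single 1 at the position of its class id, so taking argmax along axis 1 recovers class_ids.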
Once the model is built, its layers and parameter counts are as follows:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
layer_conv_1 (Conv2D) (None, 15, 20, 32) 2144
_________________________________________________________________
layer_conv_2 (Conv2D) (None, 15, 20, 32) 22560
_________________________________________________________________
layer_mp_1 (MaxPooling2D) (None, 7, 18, 32) 0
_________________________________________________________________
layer_conv_3 (Conv2D) (None, 7, 18, 128) 57472
_________________________________________________________________
layer_mp_2 (MaxPooling2D) (None, 3, 16, 128) 0
_________________________________________________________________
layer_rsp_1 (Reshape) (None, 48, 128) 0
_________________________________________________________________
layer_attention_1 (SeqSelfAt (None, 48, 128) 16385
_________________________________________________________________
layer_flatten_1 (Flatten) (None, 6144) 0
_________________________________________________________________
layer_dense_1 (Dense) (None, 440) 2703800
_________________________________________________________________
layer_dropout_1 (Dropout) (None, 440) 0
_________________________________________________________________
layer_dense_2 (Dense) (None, 6) 2646
=================================================================
Total params: 2,805,007
Trainable params: 2,805,007
Non-trainable params: 0
_________________________________________________________________
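The shapes and parameter counts in this summary can be checked by hand. The sketch below uses the standard formulas for a Conv2D layer with bias, a Dense layer with bias, and pooling without padding (the Keras default for MaxPooling2D); the helper names are made up for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


def pool_out(size, pool, stride):
    """Output length along one axis for pooling without padding."""
    return (size - pool) // stride + 1


print(conv2d_params(2, 11, 3, 32))             # 2144     layer_conv_1
print(conv2d_params(2, 11, 32, 32))            # 22560    layer_conv_2
print(pool_out(15, 3, 2), pool_out(20, 3, 1))  # 7 18     layer_mp_1
print(conv2d_params(2, 7, 32, 128))            # 57472    layer_conv_3
print(pool_out(7, 3, 2), pool_out(18, 3, 1))   # 3 16     layer_mp_2
print(dense_params(3 * 16 * 128, 440))         # 2703800  layer_dense_1
print(dense_params(440, 6))                    # 2646     layer_dense_2
```

The convolution layers use padding='same' with the default stride of 1, so they leave the spatial size unchanged; only the pooling layers shrink it.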
3. Get the output of an intermediate layer
- After building the model, train it first, so that every layer has trained weights:
model.fit(x_train, y_train,batch_size=64,epochs=2,verbose=1)
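- Once training is done, specify the name of the intermediate layer whose output you want, and build a function object that maps the model input to that layer's output: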
import keras.backend as k
layer_name = 'layer_conv_3'
layer_output = model.get_layer(layer_name).output # get output by layer name
layer_input = model.input
output_func = k.function([layer_input], [layer_output]) # construct function
Here layer_name is set to layer_conv_3, meaning we want the output of the layer named layer_conv_3.
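A possible alternative that does not go through keras.backend is to wrap the same input and layer output in a second keras.Model. This is only a sketch, assuming tensorflow.keras is installed; the tiny two-layer model is a hypothetical stand-in for the article's network, and sub_model is an illustrative name.

```python
import numpy as np
from tensorflow import keras

# Tiny stand-in model (hypothetical, much smaller than the article's network)
inputs = keras.Input(shape=(4,))
hidden = keras.layers.Dense(3, activation='relu', name='layer_dense_1')(inputs)
outputs = keras.layers.Dense(2, activation='softmax', name='layer_dense_2')(hidden)
model = keras.Model(inputs=inputs, outputs=outputs)

# Sub-model that shares the same layers and weights but stops at the named layer
layer_name = 'layer_dense_1'
sub_model = keras.Model(inputs=model.input,
                        outputs=model.get_layer(layer_name).output)

x = np.random.randn(5, 4).astype('float32')
hidden_values = sub_model.predict(x, verbose=0)
print(hidden_values.shape)  # (5, 3)
```

Because sub_model reuses the layers of model rather than copying them, it reflects whatever weights model has at the time predict is called.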
- Get the shapes of the input and output data
(1) The original input data: 100 samples, each a three-dimensional array. x_train.shape = (100, 15, 20, 3)
x_preproc = x_train
(2) The output shape of the intermediate layer layer_conv_3
output_shape = output_func([x_preproc[0][None, ...]])[0].shape
print(output_shape)# (1, 7, 18, 128)
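The output shape obtained for the intermediate layer is (1, 7, 18, 128). This matches the (None, 7, 18, 128) listed for layer_conv_3 in the model summary in section 2; the only difference is that here the value is obtained dynamically in code.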
(3) For all 100 input samples, the shape of the layer_conv_3 output is then:
activations = np.zeros((x_preproc.shape[0],) + output_shape[1:], dtype=np.float32)
print(activations.shape) # (100, 7, 18, 128)
At this point activations is an all-zero array; it is the buffer that will hold the final intermediate-layer outputs.
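Two indexing details in steps (2) and (3) are easy to miss: x_preproc[0][None, ...] puts a batch axis back on a single sample, and the buffer shape is built by concatenating tuples. A small NumPy-only sketch, where the (1, 7, 18, 128) shape is copied from step (2) rather than computed by a model:

```python
import numpy as np

x = np.random.randn(100, 15, 20, 3)

single = x[0]              # one sample, batch axis dropped
batched = x[0][None, ...]  # None inserts a new leading axis of length 1
print(single.shape)   # (15, 20, 3)
print(batched.shape)  # (1, 15, 20, 3)

# Slicing or np.expand_dims gives the same shape
same_by_slice = x[0:1]
same_by_expand = np.expand_dims(x[0], axis=0)

# Shape tuples concatenate with +, which is how the buffer shape is built
one_output_shape = (1, 7, 18, 128)  # shape reported for a single sample
buffer_shape = (x.shape[0],) + one_output_shape[1:]
buffer = np.zeros(buffer_shape, dtype=np.float32)
print(buffer.shape)   # (100, 7, 18, 128)
```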
- Feed x_train through the model to get the intermediate-layer outputs
batch_size = 8
for batch_index in range(int(np.ceil(x_preproc.shape[0] / float(batch_size)))):
    begin, end = batch_index * batch_size, min((batch_index + 1) * batch_size, x_preproc.shape[0])
    activations[begin:end] = output_func([x_preproc[begin:end]])[0]
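The loop iterates batch by batch: each iteration feeds one slice of x_train to the function object and stores the corresponding intermediate-layer output. After all iterations, activations holds the complete intermediate-layer output.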
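The batching logic can be tried in isolation without a model. In the sketch below, batch_slices, run_in_batches and fake_layer are hypothetical names; fake_layer is a plain NumPy stand-in for the function object and simply doubles its input.

```python
import numpy as np


def batch_slices(n_samples, batch_size):
    """Yield (begin, end) index pairs that cover range(n_samples) in order."""
    for begin in range(0, n_samples, batch_size):
        yield begin, min(begin + batch_size, n_samples)


def run_in_batches(func, x, batch_size):
    """Apply func to x batch by batch and collect the results in one array."""
    first = func(x[:1])  # one-sample call, used only to learn shape and dtype
    out = np.zeros((x.shape[0],) + first.shape[1:], dtype=first.dtype)
    for begin, end in batch_slices(x.shape[0], batch_size):
        out[begin:end] = func(x[begin:end])
    return out


def fake_layer(batch):
    """Stand-in for the layer function: doubles every value."""
    return batch * 2.0


x = np.arange(100 * 3, dtype=np.float32).reshape(100, 3)
result = run_in_batches(fake_layer, x, batch_size=8)
slices = list(batch_slices(100, 8))
print(len(slices), slices[-1])          # 13 (96, 100)
print(np.array_equal(result, x * 2.0))  # True
```

With 100 samples and a batch size of 8 there are 13 batches, and the last one holds only the 4 remaining samples; the min() call is what keeps that final slice inside the array.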
- The resulting intermediate-layer output
The final intermediate-layer output is stored in activations, whose shape is (100, 7, 18, 128). The data look like this:
array([[[[0.26599178, 0.73428845, 0.16254057, ..., 0.06111438,
0.22640981, 0.35272944],
[0.283226 , 0.6876849 , 0.03790125, ..., 0.34651148,
0.10112678, 0.29799798],
[0.54103273, 0.9894899 , 0.10496318, ..., 0.7219487 ,
0.06900553, 0.38379622],
...,
[0.277767 , 0.47766227, 0.19746281, ..., 0.91875714,
0.028616 , 0.41216236],
[0.17200978, 0.47316927, 0.12632905, ..., 0.82960546,
0.12838002, 0.15124908],
[0.2677489 , 0.29046324, 0.16919036, ..., 0.86634046,
0.14427625, 0.09604399]],
...,
[0.17622252, 0.2056456 , 0.13514128, ..., 0.52697086,
0.05512847, 0.45330787],
[0.13311246, 0.13437134, 0.17722939, ..., 0.41641268,
0. , 0.35246032],
[0.03272216, 0.07479057, 0.04990054, ..., 0.29089817,
0.0585143 , 0.17479342]]]], dtype=float32)
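The introduction mentions visualization and embeddings as uses for these values. The sketch below shows one common way, not taken from the original article, to turn an array of this shape into per-sample vectors and into a single 2-D feature map. Random numbers stand in for the real activations, and the channel index 5 is arbitrary.

```python
import numpy as np

# Stand-in for the real activations, with the same shape as in the article
activations = np.random.rand(100, 7, 18, 128).astype(np.float32)

# Average over the two spatial axes: one 128-dimensional vector per sample
embeddings = activations.mean(axis=(1, 2))
print(embeddings.shape)   # (100, 128)

# One channel of one sample as a 2-D map, which an image plot could display
feature_map = activations[0, :, :, 5]
print(feature_map.shape)  # (7, 18)
```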
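4. Summary
This article walked through the steps for getting the output of an intermediate layer of a Keras model. The code is adapted from the get_activations function in reference 1; see reference 1 for the complete function.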