ENet

1. ENet

ENet is a semantic segmentation model designed for fast inference.

As a semantic segmentation model, ENet is essentially a standard encoder-decoder network, and its design borrows from ResNet and Inception.

Assume the network input is 512x512.

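The input has shape (3, 512, 512) and becomes (16, 256, 256) after the initial block. Instead of pooling directly, the initial block follows Inception's Efficient Grid Size Reduction: a strided conv and a pooling run in parallel, and their outputs are concatenated.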
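The shape change above can be checked with a little arithmetic. The following is a minimal sketch, not the reference implementation: it assumes the layout described in the ENet paper (a 3x3 conv with stride 2 and 13 filters, in parallel with a 2x2 max pooling on the raw input). The helper names `conv_out` and `initial_block_shape` are made up for this example.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def initial_block_shape(channels, height, width, conv_filters=13):
    """Output shape (C, H, W) of the parallel conv + max-pool block.

    conv_filters=13 is taken from the ENet paper, not from these notes.
    """
    # Branch 1: 3x3 convolution, stride 2, padding 1.
    conv_hw = (conv_out(height, 3, 2, 1), conv_out(width, 3, 2, 1))
    # Branch 2: 2x2 max pooling, stride 2, applied to the raw input,
    # so it keeps the input channel count.
    pool_hw = (conv_out(height, 2, 2, 0), conv_out(width, 2, 2, 0))
    if conv_hw != pool_hw:
        raise ValueError("branches disagree on spatial size; use even H and W")
    # Concatenate the two branches along the channel axis.
    return (conv_filters + channels, conv_hw[0], conv_hw[1])


print(initial_block_shape(3, 512, 512))  # (16, 256, 256)
```

Because the pooling branch passes the 3 input channels through unchanged, the conv branch only needs 13 filters to reach 16 output channels.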

The bottleneck is similar to ResNet's bottleneck block, with minor modifications.

A bottleneck can be one of several types:

downsampling

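The first 1x1 conv of the bottleneck becomes a 2x2 conv with stride=2, and a max pooling is added to the left branch, which together give a 2x downsample. This structure resembles the parallel pooling in the initial block.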
dilated

The middle conv of the bottleneck becomes a dilated Conv2D.
asymmetric

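The middle conv of the bottleneck becomes an asymmetric conv: for example, a 5x5 kernel is split into a 5x1 kernel and a 1x5 kernel, which reduces computation while keeping the receptive field the same size. This idea is borrowed from Inception.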
upsampling

The reverse of downsampling: the middle conv of the bottleneck becomes a Deconv2D, and a max unpooling is added to the left branch.
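To make the dilated and asymmetric variants more concrete, here is a rough back-of-the-envelope sketch. It counts multiply-accumulates per output pixel for a full 5x5 conv versus a 5x1 conv followed by a 1x5 conv, and computes the effective kernel extent of a dilated conv. The channel count of 32 and the dilation rate of 4 are arbitrary values chosen for illustration, and both function names are hypothetical.

```python
def conv_macs_per_pixel(kernel_h, kernel_w, in_ch, out_ch):
    """Multiply-accumulates needed to produce one output pixel (bias ignored)."""
    return kernel_h * kernel_w * in_ch * out_ch


def effective_kernel(kernel, dilation):
    """Extent covered by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


channels = 32  # hypothetical channel count, for illustration only

full = conv_macs_per_pixel(5, 5, channels, channels)
asym = (conv_macs_per_pixel(5, 1, channels, channels)
        + conv_macs_per_pixel(1, 5, channels, channels))
print(full, asym)              # 25600 10240

# A 3x3 kernel with dilation 4 covers a 9x9 window with only 9 weights.
print(effective_kernel(3, 4))  # 9
```

Under these assumptions the asymmetric pair needs 10 multiply-accumulates per channel pair instead of 25, while still covering a 5x5 window.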

There are four main ways to upsample:

interpolation
deconv
unpooling

Unpooling fills in more data by repeating the original values or padding with zeros. For example, Keras's UpSampling2D upsamples by repetition.
max unpooling

If the earlier downsample was done with max pooling, the later upsample can use max unpooling. Unlike plain unpooling, it reuses the indices recorded during the earlier max pooling.
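The difference between plain unpooling and max unpooling is easier to see on a small grid. The sketch below uses pure Python lists rather than any framework API. The function names are made up for this example, and it assumes a single-channel input with even height and width.

```python
def max_pool_2x2(grid):
    """2x2 max pooling, stride 2. Returns pooled values and argmax positions."""
    pooled, indices = [], []
    for r in range(0, len(grid), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(grid[0]), 2):
            window = [(grid[r + dr][c + dc], (r + dr, c + dc))
                      for dr in range(2) for dc in range(2)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices):
    """Put each value back at its recorded position; everything else is zero."""
    out = [[0] * (2 * len(pooled[0])) for _ in range(2 * len(pooled))]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


def repeat_unpool_2x2(pooled):
    """Plain unpooling by repetition: each value fills a 2x2 block."""
    out = []
    for row in pooled:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


grid = [[1, 2, 0, 1],
        [3, 4, 5, 0],
        [0, 1, 2, 9],
        [7, 0, 3, 4]]
pooled, indices = max_pool_2x2(grid)
print(pooled)                           # [[4, 5], [7, 9]]
print(max_unpool_2x2(pooled, indices))  # maxima return to their original cells
print(repeat_unpool_2x2(pooled))        # each value is copied into a 2x2 block
```

Max unpooling keeps the spatial location of each maximum, which repetition discards. The cost is that the encoder has to store the pooling indices for the decoder.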
