DVSNet: Dynamic Video Semantic Segmentation Network

Introduction

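The Dynamic Video Segmentation Network (DVSNet) framework was proposed to strike a balance between quality and efficiency in semantic video segmentation. DVSNet consists of two convolutional neural networks: a segmentation network (e.g., DeepLabv2) and a flow network (e.g., FlowNet2). The former produces highly accurate semantic segmentations but is deeper and slower. The latter is much faster, but its output requires further processing to produce semantic segmentations, and these are less accurate. DVSNet uses a decision network (DN) to determine which frame regions should be forwarded to which path, based on a metric called the expected confidence score. The DN thereby provides an adaptive key frame scheduling policy that adjusts the update period of key frames at runtime.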
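To make the decision step concrete, here is a minimal Python sketch of per-region routing and key frame updates, under our reading of the paper: each frame is split into regions, the DN predicts an expected confidence score for each region, regions at or above the threshold (the `--target` flag below) take the fast flow path, and the remaining regions are re-segmented and become that region's new key frame. The function names and score values are hypothetical and not taken from the repository; the DN itself is replaced by precomputed scores.

```python
from typing import List, Sequence


def choose_paths(expected_scores: Sequence[float], target: float) -> List[str]:
    # A region whose predicted (expected) confidence score reaches the threshold
    # is handled by the cheap flow path; otherwise it goes to the segmentation path.
    return ["flow" if s >= target else "segmentation" for s in expected_scores]


def key_frame_updates(score_stream: Sequence[Sequence[float]],
                      target: float) -> List[List[int]]:
    """For each region, list the frame indices at which its key frame is refreshed.

    Frame 0 is always a key frame; afterwards a region's key frame is refreshed
    whenever that region is routed to the segmentation path.
    """
    num_regions = len(score_stream[0])
    updates = [[0] for _ in range(num_regions)]
    for t, scores in enumerate(score_stream[1:], start=1):
        for r, path in enumerate(choose_paths(scores, target)):
            if path == "segmentation":
                updates[r].append(t)
    return updates


if __name__ == "__main__":
    # Made-up scores for 4 regions over 4 frames, purely illustrative.
    stream = [
        [1.0, 1.0, 1.0, 1.0],      # frame 0: always a key frame, scores unused
        [0.95, 0.80, 0.97, 0.91],
        [0.90, 0.96, 0.88, 0.93],
        [0.99, 0.85, 0.94, 0.70],
    ]
    print(key_frame_updates(stream, target=0.9))  # [[0], [0, 1, 3], [0, 2], [0, 3]]
```

Raising the threshold sends more regions to the segmentation path (more key frame updates, higher accuracy, lower fps); lowering it does the opposite.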


Disclaimer

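This is a modified TensorFlow implementation of DVSNet. Note that it differs from the original implementation in two ways: (1) the reported fps includes data I/O and image preprocessing time; (2) it uses the NHWC data format instead of the NCHW format used in the original implementation. These differences result in a lower fps than reported in the paper.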
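Difference (2) is only a matter of axis order. A minimal NumPy sketch of converting between the two layouts follows; the helper names are our own. In TensorFlow 1.x, ops such as `tf.nn.conv2d` select the layout through a `data_format` argument, and NCHW is generally the faster layout for cuDNN on NVIDIA GPUs.

```python
import numpy as np


def nhwc_to_nchw(x: np.ndarray) -> np.ndarray:
    # Move the channel axis from last (3) to second (1): (N, H, W, C) -> (N, C, H, W).
    return np.transpose(x, (0, 3, 1, 2))


def nchw_to_nhwc(x: np.ndarray) -> np.ndarray:
    # Inverse permutation: (N, C, H, W) -> (N, H, W, C).
    return np.transpose(x, (0, 2, 3, 1))


if __name__ == "__main__":
    # Small dummy batch: 2 images, 4 x 6 pixels, 3 channels (sizes are arbitrary).
    batch = np.arange(2 * 4 * 6 * 3, dtype=np.float32).reshape(2, 4, 6, 3)
    print(nhwc_to_nchw(batch).shape)  # (2, 3, 4, 6)
```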

Requirements

Checkpoint

Create a checkpoint directory and download the released checkpoints from Google Drive:
Checkpoint of the whole model (DVSNet).
Checkpoint without the decision network (fine-tuned).

Install the dependencies with pip (Python 2.7)

pip install tensorflow-gpu==1.4.1  # for Python 2.7 and GPU
pip install opencv-python
pip install Pillow
pip install scipy

Inference

To obtain segmentation results for video frames:

python inference.py --data-dir=cityscape_video_dir --data-list=cityscape_video_list

Arguments:

--data_dir:      Path to the directory containing the dataset.
--data_list:     Path to the file listing the images in the dataset.
--restore_from:  Where to restore model parameters from.
--decision_from: Where to restore decision model parameters from (defaults to restore_from).
--save_dir:      Where to save the segmented output.
--num_steps:     Number of images in the video.
--overlap:       Overlap size; must be divisible by 8.
--target:        Confidence score threshold.
--dynamic:       Whether to dynamically adjust the target.
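The README only states that `--overlap` must be divisible by 8. The sketch below shows one way overlapping regions could be cut from a frame, assuming a 2 x 2 grid of regions with each region extended by `overlap` pixels into its neighbours. The function name and the grid layout are assumptions for illustration, not the repository's code.

```python
import numpy as np


def split_regions(frame: np.ndarray, overlap: int) -> list:
    """Split an H x W x C frame into a 2 x 2 grid of regions, each extended by
    `overlap` pixels into its neighbours (clipped at the frame border)."""
    if overlap % 8 != 0:
        raise ValueError("overlap must be divisible by 8")
    h, w = frame.shape[:2]
    half_h, half_w = h // 2, w // 2
    regions = []
    for top, bottom in ((0, half_h), (half_h, h)):
        for left, right in ((0, half_w), (half_w, w)):
            regions.append(frame[max(top - overlap, 0):min(bottom + overlap, h),
                                 max(left - overlap, 0):min(right + overlap, w)])
    return regions


if __name__ == "__main__":
    # A Cityscapes-sized dummy frame (1024 x 2048 RGB) with a 64-pixel overlap.
    regions = split_regions(np.zeros((1024, 2048, 3), dtype=np.uint8), overlap=64)
    print([r.shape for r in regions])  # four regions of shape (576, 1088, 3)
```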

Inference time, including data I/O and image preprocessing:
0.05-0.1 s per frame (10-20 fps) on an Intel Xeon E5-2620 CPU and an NVIDIA GTX 1080 Ti GPU.
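Because these numbers include data I/O and preprocessing, the timer has to wrap the whole per-frame pipeline rather than only the network call. A minimal sketch, where `load`, `preprocess` and `infer` are hypothetical placeholders for the corresponding stages of `inference.py`:

```python
import time
from typing import Callable, Sequence


def measure_fps(frame_paths: Sequence[str],
                load: Callable, preprocess: Callable, infer: Callable) -> float:
    """Average frames per second, timing I/O + preprocessing + inference together."""
    start = time.perf_counter()
    for path in frame_paths:
        infer(preprocess(load(path)))
    elapsed = time.perf_counter() - start
    return len(frame_paths) / elapsed if elapsed > 0 else float("inf")


def fps_from_seconds(seconds_per_frame: float) -> float:
    # 0.05 s/frame corresponds to 20 fps, 0.1 s/frame to 10 fps.
    return 1.0 / seconds_per_frame


if __name__ == "__main__":
    # Hypothetical stand-ins: each stage just sleeps to mimic its cost.
    def fake_load(path):
        time.sleep(0.01)
        return path

    def fake_preprocess(image):
        return image

    def fake_infer(image):
        time.sleep(0.02)

    print(measure_fps(["f0.png", "f1.png", "f2.png"],
                      fake_load, fake_preprocess, fake_infer))
```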

Training

cd train/

Step 1

Generate test cases for training the decision network (X = optical flow features, Y = confidence scores):

python gentestcase.py --data-dir=cityscape_dir --data-list=cityscape_list

Arguments:

--data_dir:     Path to the directory containing the dataset.
--data_list:    Path to the file listing the images in the dataset.
--restore_from: Where to restore the fine-tuned (segmentation + flow) model parameters from.
--save_dir:     Where to save the test cases.
--num_steps:    Number of test cases to generate.
--clip:         Trim extreme test cases.
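Our reading of the paper is that the confidence score Y for a region is the fraction of pixels on which the flow path's label map agrees with the segmentation network's label map for the same region. A minimal NumPy sketch of that computation and of assembling one (X, Y) pair follows; the function names are hypothetical, and how `gentestcase.py` extracts the flow features or what `--clip` trims is not described here.

```python
import numpy as np


def confidence_score(flow_labels: np.ndarray, seg_labels: np.ndarray) -> float:
    """Fraction of pixels on which the flow path's label map agrees with the
    segmentation network's label map for the same region."""
    if flow_labels.shape != seg_labels.shape:
        raise ValueError("label maps must have the same shape")
    return float(np.mean(flow_labels == seg_labels))


def make_testcase(flow_features: np.ndarray, flow_labels: np.ndarray,
                  seg_labels: np.ndarray):
    # One (X, Y) pair: X = optical flow features, Y = confidence score.
    return flow_features, confidence_score(flow_labels, seg_labels)


if __name__ == "__main__":
    seg = np.array([[0, 1], [2, 2]])   # labels from the segmentation path
    flow = np.array([[0, 1], [2, 5]])  # labels produced by the flow path
    print(confidence_score(flow, seg))  # 0.75
```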

Step 2

Train the decision network:

python train.py --train-data-dir=train_testcase_dir --val-data-dir=val_testcase_dir

Arguments:

--train_data_dir: Path to the training test cases.
--val_data_dir:   Path to the validation test cases.
--save_dir:       Where to save the decision model.
--batch_size:     Number of test cases fed to the network per step.
--learning_rate:  Learning rate for training.
--epochs:         Number of epochs.
--decay:          Learning rate decay.
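Since the DN learns to predict the confidence score Y from the flow features X, these flags map onto an ordinary mini-batch regression loop. The sketch below is illustrative only: a linear model stands in for the decision network, the loss is assumed to be mean squared error, and `decay` is assumed to multiply the learning rate once per epoch; this README does not confirm any of these details.

```python
import numpy as np


def train_regressor(x, y, batch_size=32, learning_rate=0.1, epochs=20,
                    decay=0.9, seed=0):
    """Mini-batch SGD on a linear model predicting y from x with an MSE loss.

    Stand-in for the decision network; the per-epoch decay rule is an assumption.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    lr = learning_rate
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = x[idx] @ w + b - y[idx]          # prediction error on the batch
            w -= lr * 2.0 * x[idx].T @ err / len(idx)  # gradient of mean squared error
            b -= lr * 2.0 * err.mean()
        lr *= decay                                # assumed per-epoch learning rate decay
    return w, b


if __name__ == "__main__":
    # Synthetic data only: 256 "test cases" with 3 features each.
    rng = np.random.default_rng(1)
    x = rng.normal(size=(256, 3))
    y = x @ np.array([0.5, -0.2, 0.1]) + 0.3
    print(train_regressor(x, y))
```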

Citation

@inproceedings{xu2018dvsnet,
    author = {Yu-Shuan Xu and Hsuan-Kung Yang and Tsu-Jui Fu and Chun-Yi Lee},
    title = {Dynamic Video Segmentation Network},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2018
}

@article{chen2017deeplab,
    author = {L.-C. Chen and G. Papandreou and I. Kokkinos and K. Murphy and A. L. Yuille},
    title = {Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected {CRFs}},
    journal = {IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI)},
    year = 2017
}

@inproceedings{ilg2017flownet2,
    author = {E. Ilg and N. Mayer and T. Saikia and M. Keuper and A. Dosovitskiy and T. Brox},
    title = {FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2017
}

Reference code

DeepLabv2 TensorFlow model: tensorflow-deeplab-resnet
FlowNet2 TensorFlow model: flownet2-tf

https://github.com/XUSean0118/DVSNet
