Li Yawei, Jin Lizuo, Sun Changyin, et al. Action recognition based on optical flow constrained auto-encoder[J]. Journal of Southeast University (Natural Science Edition), 2017, 47(4): 691-696. [doi:10.3969/j.issn.1001-0505.2017.04.011]

Action recognition based on optical flow constrained auto-encoder

Journal of Southeast University (Natural Science Edition) [ISSN:1001-0505/CN:32-1178/N]

Volume: 47
Issue: No. 4, 2017
Pages: 691-696
Section: Automation
Publication date: 2017-07-20

Article Info

Title:
Action recognition based on optical flow constrained auto-encoder
Author(s):
Li Yawei¹, Jin Lizuo¹, Sun Changyin¹, Cui Tong²
1. School of Automation, Southeast University, Nanjing 210096, China
2. The 28th Research Institute of CETC, Nanjing 210007, China
Keywords:
action recognition; feature learning; regularized auto-encoder; optical flow constrained auto-encoder
CLC number:
TP181
DOI:
10.3969/j.issn.1001-0505.2017.04.011
Abstract:
To improve the capability of feature learning in extracting motion information such as amplitude and direction, and to increase action recognition accuracy, an optical flow constrained auto-encoder is proposed to learn action features. The optical flow constrained auto-encoder is an unsupervised feature learning algorithm based on a single-layer regularized auto-encoder. It uses a neural network to reconstruct video pixels and takes the optical flow of the corresponding video blocks as a regularization term. The network thus learns the appearance of an action while simultaneously encoding the object's motion, and the resulting joint codes serve as the final action features. Experimental results on several well-known benchmark datasets show that the optical flow constrained auto-encoder effectively extracts the moving parts of a target and increases the discriminative power of the action features; within the same recognition framework, the proposed algorithm outperforms classical single-layer action feature learning algorithms.
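The core idea in the abstract, a hidden code that must reconstruct a video block's pixels while an optical flow term regularizes the same code toward the block's motion, can be sketched in a few lines. The following PyTorch sketch is an illustration only: the module names, layer sizes, squared-error flow penalty, and the choice of PyTorch are all assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class FlowConstrainedAE(nn.Module):
    """Single-layer auto-encoder with an optical flow regularization head (illustrative)."""
    def __init__(self, patch_dim=1024, code_dim=256, flow_dim=2048):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, code_dim), nn.Sigmoid())
        self.pixel_decoder = nn.Linear(code_dim, patch_dim)  # appearance reconstruction
        self.flow_decoder = nn.Linear(code_dim, flow_dim)    # motion (flow) regression

    def forward(self, x):
        h = self.encoder(x)  # joint appearance/motion code
        return self.pixel_decoder(h), self.flow_decoder(h), h

def flow_constrained_loss(model, x, flow, lam=0.5):
    # Pixel reconstruction error plus a flow term acting as the regularizer:
    # the hidden code must explain both what the block looks like and how it moves.
    x_hat, flow_hat, _ = model(x)
    recon = ((x_hat - x) ** 2).mean()
    flow_reg = ((flow_hat - flow) ** 2).mean()
    return recon + lam * flow_reg

# Toy usage with random stand-in data; real inputs would be vectorized video
# blocks and their precomputed dense optical flow fields.
model = FlowConstrainedAE()
x = torch.randn(32, 1024)      # 32 vectorized video blocks
flow = torch.randn(32, 2048)   # corresponding flow vectors
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = flow_constrained_loss(model, x, flow)
optimizer.zero_grad()
loss.backward()
optimizer.step()

After training, the hidden code h would serve as the per-block action feature, feeding a standard encoding-and-classification pipeline such as the one the paper evaluates against.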

Memo:
Received: 2016-11-06.
Biographies: Li Yawei (1987—), male, Ph.D. candidate; Jin Lizuo (corresponding author), male, Ph.D., associate professor, jinlizuo@qq.com.
Foundation item: The National Natural Science Foundation of China (No. 61402426).
Citation: Li Yawei, Jin Lizuo, Sun Changyin, et al. Action recognition based on optical flow constrained auto-encoder[J]. Journal of Southeast University (Natural Science Edition), 2017, 47(4): 691-696. DOI:10.3969/j.issn.1001-0505.2017.04.011.
Last Update: 2017-07-20