[1] Xue Yongzeng, Li Sheng, Zhao Tiejun, et al. Greedy direct decoding algorithm for syntax-based tree-to-string statistical translation model[J]. Journal of Southeast University (Natural Science Edition), 2007, 37(5): 803-807. [doi:10.3969/j.issn.1001-0505.2007.05.013]

Greedy direct decoding algorithm for syntax-based tree-to-string statistical translation model

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
37
Issue:
No. 5, 2007
Pages:
803-807
Section:
Computer Science and Engineering
Publication date:
2007-09-20

Article Info

Title:
Greedy direct decoding algorithm for syntax-based tree-to-string statistical translation model
Author(s):
Xue Yongzeng, Li Sheng, Zhao Tiejun, Yang Muyun
MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China
Keywords:
statistical machine translation; syntax; greedy; decoding
CLC number:
TP391.2
DOI:
10.3969/j.issn.1001-0505.2007.05.013
Abstract:
To use syntactic information effectively in guiding the translation process, a greedy direct decoding algorithm is proposed for the syntax-based tree-to-string statistical translation model. The algorithm adopts the log-linear model as its overall framework, with the translation model probability, the language model probability, and a null translation penalty as feature functions. The decoder first generates an initial translation and then improves it by repeatedly traversing the parse tree. The work focuses on methods for scoring translation segments during decoding. Experiments were carried out on the IWSLT 2004 data set, and the translations were evaluated with the BLEU metric. The results show that the greedy direct decoding algorithm outperforms the existing reverse decoding algorithm in both translation quality and speed, which indicates that it makes more effective use of syntactic structure and is better suited to the tree-to-string statistical translation model.
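
The decoding loop summarized in the abstract (build an initial translation, then repeatedly traverse the parse tree and keep only changes that raise a log-linear score) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy tables `TM` and `LM`, the weights in `WEIGHTS`, the pinyin example sentence, the two local operations (re-choosing a leaf's target word and reordering a node's children), and all function names. The paper's actual operations and segment scoring methods are not given in the abstract.

```python
import math
from dataclasses import dataclass, field
from itertools import permutations

# Hypothetical toy models (illustrative values only, not from the paper).
TM = {  # P(target | source); "" marks a null (empty) translation
    "wo": {"me": 0.5, "I": 0.4, "": 0.1},
    "he": {"drink": 0.8, "and": 0.2},
    "cha": {"tea": 0.9, "": 0.1},
}
LM = {("<s>", "I"): 0.5, ("I", "drink"): 0.4,
      ("drink", "tea"): 0.5, ("tea", "</s>"): 0.5}
LM_FLOOR = 0.01  # assumed probability for unseen bigrams
WEIGHTS = {"tm": 1.0, "lm": 1.0, "null": -0.5}  # negative weight = penalty


@dataclass
class Node:
    word: str = ""                                 # source word (leaves only)
    children: list = field(default_factory=list)   # sub-trees (internal nodes)
    choice: str = ""                               # current target word (leaves)


def leaves(node):
    """Return the leaves in their current left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def all_nodes(node):
    """Pre-order traversal of the parse tree."""
    yield node
    for child in node.children:
        yield from all_nodes(child)


def score(root):
    """Log-linear score: weighted sum of the three feature functions."""
    ls = leaves(root)
    tm = sum(math.log(TM[l.word][l.choice]) for l in ls)
    words = ["<s>"] + [l.choice for l in ls if l.choice] + ["</s>"]
    lm = sum(math.log(LM.get(bg, LM_FLOOR)) for bg in zip(words, words[1:]))
    nulls = sum(1 for l in ls if not l.choice)
    return WEIGHTS["tm"] * tm + WEIGHTS["lm"] * lm + WEIGHTS["null"] * nulls


def improve_node(root, node, best):
    """Try each local change at one node; keep it only if the score rises."""
    if node.children:
        attr, options = "children", [list(p) for p in permutations(node.children)]
    else:
        attr, options = "choice", list(TM[node.word])
    for option in options:
        previous = getattr(node, attr)
        setattr(node, attr, option)
        new = score(root)
        if new > best:
            best = new
        else:
            setattr(node, attr, previous)  # revert a non-improving change
    return best


def greedy_decode(root):
    # Initial translation: the most probable target word for each leaf.
    for leaf in leaves(root):
        leaf.choice = max(TM[leaf.word], key=TM[leaf.word].get)
    best = score(root)
    while True:  # repeat tree traversals until a full pass brings no gain
        before = best
        for node in list(all_nodes(root)):
            best = improve_node(root, node, best)
        if best <= before:
            break
    return " ".join(l.choice for l in leaves(root) if l.choice), best


# Toy parse tree for the source sentence "wo he cha": S(wo, VP(he, cha)).
tree = Node(children=[Node(word="wo"),
                      Node(children=[Node(word="he"), Node(word="cha")])])
translation, final_score = greedy_decode(tree)
print(translation)  # → I drink tea
```

In this toy run the initial translation is "me drink tea", because "me" has the highest translation probability for "wo"; the first traversal replaces it with "I" since the language model gain outweighs the translation model loss. Greedy search of this kind stops at a local optimum, so the result depends on the initial translation and on which local operations are allowed.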

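Memo

Foundation item: National High Technology Research and Development Program of China (863 Program) (No. 2006AA010108).
Biographies: Xue Yongzeng (born 1977), male, doctoral candidate, xyz@mtlab.hit.edu.cn; Li Sheng (corresponding author), male, professor, doctoral supervisor, lisheng@hit.edu.cn.
Last Update: 2007-09-20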