[1] Lu Zhiyuan, Xie Jianming, Sun Xiao. Maximum use of reads overlap information for short reads mapping [J]. Journal of Southeast University (Natural Science Edition), 2011, 41(1): 63-66. [doi:10.3969/j.issn.1001-0505.2011.01.013]

A short-read mapping algorithm for genome sequencing based on overlap information

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
41
Issue:
2011, No. 1
Pages:
63-66
Section:
Computer Science and Engineering
Publication date:
2011-01-20

Article Info

Title:
Maximum use of reads overlap information for short reads mapping
Author(s):
Lu Zhiyuan, Xie Jianming, Sun Xiao
(State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China)
Keywords:
short reads; unique k-tuple; unique short reads; overlap information
CLC number:
TP311.51
DOI:
10.3969/j.issn.1001-0505.2011.01.013
Abstract:
A new short-read mapping algorithm, Umap, is presented. It is built on the idea of progressively extending core fragments, and it places short reads by using the overlap information among them. First, all short reads that occur exactly once in the reference genome are identified; these are called unique short reads. Then, starting from the unique short reads, a greedy algorithm extends them using the overlaps between reads, which determines the exact positions of the remaining non-unique short reads. Experiments show that Umap maps short reads more accurately than existing short-read mapping algorithms and maps more of them, with the ratio of mapped reads reaching 71%. Exploiting the overlap information that inherently exists among short reads allows them to be placed on the reference genome more accurately and reduces ambiguous matches.
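
The two stages described in the abstract can be illustrated with a small Python sketch. This is not the authors' Umap implementation, which this page does not describe in detail. It is a toy version under several assumptions: reads match the reference exactly, identical read sequences are treated as a single read, and a non-unique read is assigned to whichever candidate position shares the most reference bases with an already placed read. The names `find_all`, `overlap_len`, `map_reads`, and the `min_overlap` parameter are hypothetical and chosen only for this illustration.

```python
def find_all(text, pattern):
    """Return every start index of pattern in text, overlapping matches included."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def overlap_len(a_start, a_len, b_start, b_len):
    """Number of reference bases shared by two placed intervals."""
    return max(0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))


def map_reads(reference, reads, min_overlap=2):
    """Toy two-stage mapping: unique reads first, then greedy overlap extension.

    Assumption (not from the paper): exact matching only, and a candidate
    position is accepted when it overlaps a placed read by >= min_overlap bases.
    """
    candidates = {read: find_all(reference, read) for read in set(reads)}

    # Stage 1: reads with exactly one occurrence are the unique short reads.
    placed = {read: hits[0] for read, hits in candidates.items() if len(hits) == 1}
    pending = {read for read, hits in candidates.items() if len(hits) > 1}

    # Stage 2: repeatedly place the pending read whose candidate position
    # has the largest overlap with any read that is already placed.
    while True:
        best = None  # (overlap, read, position)
        for read in sorted(pending):
            for pos in candidates[read]:
                for other, other_pos in placed.items():
                    ov = overlap_len(pos, len(read), other_pos, len(other))
                    if ov >= min_overlap and (best is None or ov > best[0]):
                        best = (ov, read, pos)
        if best is None:
            break  # no remaining read can be anchored by overlap
        _, read, pos = best
        placed[read] = pos
        pending.discard(read)

    return placed
```

For example, with the reference `TTACGTGGCAACGTCC`, the read `GTGGC` occurs once (position 4), while `ACGT` occurs at positions 2 and 10. Calling `map_reads("TTACGTGGCAACGTCC", ["GTGGC", "ACGT", "GGGG"], min_overlap=2)` places `ACGT` at position 2, because only that candidate overlaps the unique read, and leaves `GGGG` unmapped because it does not occur in the reference. Reads that stay in the pending set at the end correspond to the ambiguous matches the abstract aims to reduce.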


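Memo

About the authors: Lu Zhiyuan (born 1981), male, PhD candidate; Sun Xiao (corresponding author), male, PhD, professor, doctoral supervisor, xsun@seu.edu.cn.
Funding: National Natural Science Foundation of China (Grants 60671018, 60771024).
Citation format: Lu Zhiyuan, Xie Jianming, Sun Xiao. Maximum use of reads overlap information for short reads mapping [J]. Journal of Southeast University (Natural Science Edition), 2011, 41(1): 63-66. [doi:10.3969/j.issn.1001-0505.2011.01.013]
Last Update: 2011-01-20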