[1] Gao Jingyang, Guan Rui. Integrated AdaBoost-based strategy for detection of genomic deletions[J]. Journal of Southeast University (Natural Science Edition), 2014, 44(5): 924-928. [doi:10.3969/j.issn.1001-0505.2014.05.009]

Integrated AdaBoost-based strategy for detection of genomic deletions

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
44
Issue:
No. 5, 2014
Pages:
924-928
Column:
Biomedical Engineering
Publication date:
2014-09-20

Article Info

Title:
Integrated AdaBoost-based strategy for detection of genomic deletions
Author(s):
Gao Jingyang, Guan Rui
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
Keywords:
deletion; next-generation sequencing; feature extraction; AdaBoost
Classification number:
Q523; TP274
DOI:
10.3969/j.issn.1001-0505.2014.05.009
Abstract:
Split-read approaches for detecting genomic deletions suffer from a relatively high false discovery rate. To address this problem, an integrated detection strategy based on detection theories and AdaBoost is proposed. First, paired-end reads undergo an initial mapping followed by a split-read alignment, yielding a set of deletion candidates at 1 bp resolution that contains as many candidates as possible. Then, following the basic principles of three classes of detection methods (read-pair mapping analysis, split-read alignment, and read-depth analysis), deletion-related sequence features are extracted from the two alignment results. Finally, an AdaBoost neural network ensemble model with high generalization performance serves as the discriminative model to filter false positives out of the candidate set, producing the final call set. Experimental results show that, compared with traditional split-read approaches, the proposed strategy reduces the false discovery rate more effectively with almost no loss of detection sensitivity.
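The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper boosts neural network base learners, whereas this sketch swaps in one-feature threshold stumps to stay short and dependency-free. The two features (split-read support and discordant read-pair count), all numeric values, and every function name are hypothetical, and read-depth features are omitted for brevity.

```python
import math

def stump_predict(stump, row):
    """Threshold rule: vote `polarity` if the feature reaches the threshold, else the opposite."""
    feature, threshold, polarity = stump
    return polarity if row[feature] >= threshold else -polarity

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for wi, row, t in zip(w, X, y)
                          if stump_predict(stump, row) != t)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost_fit(X, y, rounds):
    """Discrete AdaBoost with labels in {+1, -1} (stumps stand in for the paper's neural nets)."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight the candidates this stump misclassified, then renormalise.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Weighted vote of all stumps; +1 keeps the candidate, -1 rejects it."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score > 0 else -1

def fdr_and_sensitivity(calls, truth):
    """False discovery rate FP/(TP+FP) and sensitivity TP/(TP+FN)."""
    tp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == 1)
    fp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == -1)
    fn = sum(1 for c, t in zip(calls, truth) if c == -1 and t == 1)
    fdr = fp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    return fdr, sens

# Hypothetical labelled candidates: (split-read support, discordant read pairs).
train_X = [(5, 1), (1, 5), (5, 5), (1, 1)]
train_y = [1, 1, 1, -1]  # +1 = true deletion, -1 = false candidate
model = adaboost_fit(train_X, train_y, rounds=3)

# Hypothetical candidate set from the split-read step, with known truth for scoring.
candidates = [(6, 0), (2, 7), (7, 6), (2, 1), (3, 2), (4, 0), (1, 6)]
truth = [1, 1, 1, -1, -1, -1, -1]

unfiltered_calls = [1] * len(candidates)  # report every candidate
filtered_calls = [adaboost_predict(model, row) for row in candidates]

fdr_before, sens_before = fdr_and_sensitivity(unfiltered_calls, truth)
fdr_after, sens_after = fdr_and_sensitivity(filtered_calls, truth)
print(f"unfiltered: FDR={fdr_before:.2f}, sensitivity={sens_before:.2f}")
print(f"filtered:   FDR={fdr_after:.2f}, sensitivity={sens_after:.2f}")
```

On this toy data no single threshold separates the training labels, so the ensemble needs several rounds before the weighted vote does. The numbers only demonstrate how the false discovery rate and sensitivity are computed before and after filtering; they say nothing about the performance reported in the paper.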


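Memo:
Received: 2014-05-21.
Author biography: Gao Jingyang (born 1966), female, PhD, associate professor, gaojy@mail.buct.edu.cn.
Funding: National Natural Science Foundation of China (51275030, 61472026).
Cite this article: Gao Jingyang, Guan Rui. Integrated AdaBoost-based strategy for detection of genomic deletions[J]. Journal of Southeast University (Natural Science Edition), 2014, 44(5): 924-928. [doi:10.3969/j.issn.1001-0505.2014.05.009]
Last Update: 2014-09-20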