[1] Lin Jiazhen, Cao Jiuxin, Cheng Jie. New approach for spam sample collection[J]. Journal of Southeast University (Natural Science Edition), 2008, 38(2): 244-248. [doi:10.3969/j.issn.1001-0505.2008.02.012]

New approach for spam sample collection

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
38
Issue:
No. 2, 2008
Pages:
244-248
Section:
Computer Science and Engineering
Publication date:
2008-03-20

Article Info

Title:
New approach for spam sample collection
Author(s):
Lin Jiazhen, Cao Jiuxin, Cheng Jie
School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
Jiangsu Provincial Key Laboratory of Network and Information Security, Southeast University, Nanjing 210096, China
Key Laboratory of Computer Network and Information Integration of Ministry of Education, Southeast University, Nanjing 210096, China
Keywords:
spam; filtering; sample collection; honeypot account
CLC number:
TP393.08
DOI:
10.3969/j.issn.1001-0505.2008.02.012
Abstract:
To improve the coverage and timeliness of the spam samples used to train a filtering system, and to reduce the system's computational complexity and lag, a spam sample collection method is proposed that is based on the behavioral characteristics of spam sending and on the honeypot principle. A honeypot-account evaluation formula is introduced, and a honeypot-account selection algorithm is designed and implemented on the basis of this formula. The algorithm dynamically selects a number of accounts on the e-mail server as honeypots to form a honeypot set, and mail samples are collected periodically from this set to serve as the learning corpus of the filtering system. Experiments show that the coverage rate of the spam samples collected with this method exceeds 98%. Because samples are collected periodically, the corpus stays timely, which in turn improves the system's ability to filter spam.
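
The selection step described in the abstract can be sketched as follows. This is only a minimal illustration: the abstract does not give the honeypot-account evaluation formula, so the score used here (spam ratio weighted by owner inactivity) is a hypothetical stand-in, and the names `AccountStats`, `honeypot_score`, `select_honeypots` and `coverage_rate`, as well as the definition of coverage as a set overlap, are assumptions made for this example rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    """Per-account counters over one observation window (hypothetical fields)."""
    name: str
    spam_received: int   # messages judged to be spam in the window
    total_received: int  # all messages received in the window
    days_inactive: int   # days since the owner last used the account


def honeypot_score(account: AccountStats, inactivity_cap: int = 90) -> float:
    """Hypothetical evaluation score in [0, 1]; NOT the formula from the paper.

    Accounts that receive mostly spam and are rarely used by their owner
    score highest, since their mail is the cleanest source of spam samples.
    """
    if account.total_received == 0:
        return 0.0
    spam_ratio = account.spam_received / account.total_received
    inactivity = min(account.days_inactive, inactivity_cap) / inactivity_cap
    return spam_ratio * inactivity


def select_honeypots(accounts: list[AccountStats], k: int) -> list[str]:
    """Return the names of the k highest-scoring accounts (the honeypot set)."""
    ranked = sorted(accounts, key=honeypot_score, reverse=True)
    return [a.name for a in ranked[:k]]


def coverage_rate(collected: set[str], reference: set[str]) -> float:
    """Assumed coverage measure: share of reference spam items also collected."""
    if not reference:
        return 0.0
    return len(collected & reference) / len(reference)


if __name__ == "__main__":
    accounts = [
        AccountStats("a", spam_received=90, total_received=100, days_inactive=90),
        AccountStats("b", spam_received=10, total_received=100, days_inactive=90),
        AccountStats("c", spam_received=95, total_received=100, days_inactive=0),
        AccountStats("d", spam_received=0, total_received=0, days_inactive=30),
    ]
    print(select_honeypots(accounts, k=2))
    print(coverage_rate({"x", "y"}, {"x", "y", "z", "w"}))
```

Re-running `select_honeypots` on fresh counters at each collection interval would correspond to the dynamic, periodic behavior the abstract describes; the interval itself is not specified in the source.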

Memo:
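Biographies: Lin Jiazhen (b. 1982), male, master's student; Cao Jiuxin (corresponding author), male, Ph.D., associate professor, jx.cao@seu.edu.cn.
Foundation items: National Natural Science Foundation of China (No. 90204009); High Technology Research Program of Jiangsu Province (No. BG2004036).
Citation format: Lin Jiazhen, Cao Jiuxin, Cheng Jie. New approach for spam sample collection[J]. Journal of Southeast University (Natural Science Edition), 2008, 38(2): 244-248.
Last Update: 2008-03-20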