[1] Wen Dongxin, Dong Wenjing, Cao Rui, et al. Asynchronous storage optimization based on Alluxio[J]. Journal of Southeast University (Natural Science Edition), 2018, 48(2): 248-252. [doi:10.3969/j.issn.1001-0505.2018.02.009]

Asynchronous storage optimization based on Alluxio

Journal of Southeast University (Natural Science Edition) [ISSN:1001-0505/CN:32-1178/N]

Volume:
48
Issue:
No. 2, 2018
Pages:
248-252
Section:
Computer Science and Engineering
Publication date:
2018-03-20

Article Info

Title:
Asynchronous storage optimization based on Alluxio
Author(s):
Wen Dongxin, Dong Wenjing, Cao Rui, Zhang Zhan
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Keywords:
asynchronous storage; Alluxio; data reliability; lineage relationship; underlying resources
Classification number:
TP301
DOI:
10.3969/j.issn.1001-0505.2018.02.009
Abstract:
To relieve the network pressure of transferring data directly to the underlying storage while preserving asynchronous transfer speed and data reliability, this paper analyzes conditions such as whether the operation that generated a file can be recomputed and how long the recomputation takes, and persists data by combining two methods: sending the data and sending the operation. Compared with sending the data, sending the operation uses the computing resources of the underlying storage to persist part of the data at a small data transfer cost. For files that cannot be recovered by recomputation, a strategy that combines synchronous and asynchronous persistence (Async&Sync) is adopted to ensure file reliability. The experimental results show that the asynchronous storage strategy combined with file operations (Async-Store) shortens the running time by 41% compared with the purely synchronous strategy while guaranteeing data reliability to a certain degree. Compared with the asynchronous strategy, the Async&Sync strategy fully guarantees data reliability at a small performance cost, and its running time is 26% shorter than that of the synchronous strategy.
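The per-file decision summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `FileInfo`, `Persist`, and `choose_persist`, the bandwidth parameter, and the rule that compares estimated recomputation time with estimated transfer time are all assumptions made for illustration, since the abstract states only that recomputability and recomputation time are taken into account.

```python
from dataclasses import dataclass
from enum import Enum


class Persist(Enum):
    """Hypothetical persistence modes, named after the abstract's wording."""
    SEND_OPERATION = "send_operation"    # replay the generating operation near the storage
    SEND_DATA_ASYNC = "send_data_async"  # copy the bytes in the background
    SEND_DATA_SYNC = "send_data_sync"    # copy the bytes before acknowledging the write


@dataclass
class FileInfo:
    size_bytes: int
    recomputable: bool        # can the generating operation be replayed?
    recompute_seconds: float  # estimated replay time; ignored if not recomputable


def choose_persist(f: FileInfo,
                   bandwidth_bytes_per_s: float,
                   sync_non_recomputable: bool) -> Persist:
    """Pick a persistence mode for one file.

    sync_non_recomputable=True mimics an Async&Sync-like policy (assumed):
    files that cannot be recomputed are written synchronously.
    """
    if not f.recomputable:
        # The data cannot be rebuilt from its lineage, so only a copy protects it.
        if sync_non_recomputable:
            return Persist.SEND_DATA_SYNC
        return Persist.SEND_DATA_ASYNC

    # Assumed cost rule: send the operation when replaying it is expected to
    # finish sooner than shipping the bytes over the network.
    transfer_seconds = f.size_bytes / bandwidth_bytes_per_s
    if f.recompute_seconds < transfer_seconds:
        return Persist.SEND_OPERATION
    return Persist.SEND_DATA_ASYNC
```

Under these assumptions, a recomputable file whose replay is cheaper than its transfer is persisted by sending the operation, while a non-recomputable file falls back to a data copy whose mode depends on the `sync_non_recomputable` flag.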


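Memo:
Received: 2017-09-15.
About the author: Wen Dongxin (born 1971), female, Ph.D., associate professor, wdongxin@hit.edu.cn.
Funding: National Natural Science Foundation of China (No. 61370085); National High Technology Research and Development Program of China (863 Program) (No. 2013AA01A215).
Cite this article: Wen Dongxin, Dong Wenjing, Cao Rui, et al. Asynchronous storage optimization based on Alluxio[J]. Journal of Southeast University (Natural Science Edition), 2018, 48(2): 248-252. DOI:10.3969/j.issn.1001-0505.2018.02.009.
Last Update: 2018-03-20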