[1] Huang Tinglei, Zhang Weili, Liang Xiao, et al. Data-driven method for fine-grained property alignment between Chinese open datasets[J]. Journal of Southeast University (Natural Science Edition), 2017, 47(4): 660-666. [doi:10.3969/j.issn.1001-0505.2017.04.006]

Data-driven method for fine-grained property alignment between Chinese open datasets

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
47
Issue:
No. 4, 2017
Pages:
660-666
Section:
Automation
Publication date:
2017-07-20

Article Info

Title:
Data-driven method for fine-grained property alignment between Chinese open datasets
Author(s):
Huang Tinglei (1, 2), Zhang Weili (1, 2, 3), Liang Xiao (1, 2), Fu Kun (1, 2)
1 CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Beijing 100190, China
2 Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
Keywords:
Chinese property alignment; property data type determination; similarity of properties; heterogeneous data integration; construction of knowledge graphs
CLC number:
TP182
DOI:
10.3969/j.issn.1001-0505.2017.04.006
Abstract:
To improve the accuracy of property relation recognition between heterogeneous Chinese open datasets, a data-driven method for fine-grained alignment is proposed. It exploits the extension and domain information of properties to identify equivalence, subsumption and relevance relations between properties in a unified way. First, the data type of each property is determined using statistical theory, and a type-aware metric is defined to calculate the similarity of properties. On this basis, property relation recognition is modeled as a multi-class classification problem: effective features are extracted to describe the different relations and are used to build a random forest classifier. Experimental results show that the method reaches a precision of 94.6% in determining the data types of properties, and that the final F1 measures for recognizing equivalent, subsumptive and relevant properties are 71.3%, 57.3% and 59.9%, respectively. Compared with traditional approaches that focus only on equivalent properties, the fine-grained method not only improves the precision of equivalent-property recognition but also recognizes subsumptive and relevant properties, which demonstrates its effectiveness on Chinese open datasets.
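The first two steps of the abstract (deciding a property's data type from its values, then comparing two properties with a metric that depends on that type) can be illustrated with a minimal sketch. The abstract does not specify the statistical test or the similarity formulas, so everything below is an assumption made for illustration: the regular expressions, the 0.8 majority threshold, the Jaccard measure for strings and dates, the range-overlap measure for numbers, and all function names are hypothetical and are not the authors' method. Scores like these would presumably serve as features for the random forest classifier, which is not shown here.

```python
import re
from collections import Counter

# Assumed value patterns; the paper's actual type inventory is not given.
_NUMBER = re.compile(r"^-?\d+(\.\d+)?$")
_DATE = re.compile(r"^\d{4}-\d{1,2}-\d{1,2}$")


def value_type(value):
    """Classify one raw value as 'date', 'number', or 'string'."""
    v = value.strip()
    if _DATE.match(v):
        return "date"
    if _NUMBER.match(v):
        return "number"
    return "string"


def property_type(values, threshold=0.8):
    """Return the majority type if its share reaches `threshold`, else 'string'.

    The threshold is a hypothetical stand-in for the statistical decision
    rule mentioned in the abstract.
    """
    if not values:
        return "string"
    counts = Counter(value_type(v) for v in values)
    best, n = counts.most_common(1)[0]
    return best if n / len(values) >= threshold else "string"


def jaccard(a, b):
    """Share of distinct values that the two properties have in common."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def numeric_range_overlap(a, b):
    """Length of the shared [min, max] interval divided by the combined span."""
    xa = [float(v) for v in a if value_type(v) == "number"]
    xb = [float(v) for v in b if value_type(v) == "number"]
    if not xa or not xb:
        return 0.0
    lo, hi = max(min(xa), min(xb)), min(max(xa), max(xb))
    span = max(max(xa), max(xb)) - min(min(xa), min(xb))
    if span == 0:
        return 1.0  # every value in both properties is the same number
    return max(0.0, hi - lo) / span


def type_aware_similarity(a, b):
    """Compare two properties only if their inferred data types agree."""
    ta, tb = property_type(a), property_type(b)
    if ta != tb:
        return 0.0
    if ta == "number":
        return numeric_range_overlap(a, b)
    return jaccard(a, b)


if __name__ == "__main__":
    height_a = ["175", "180", "168"]
    height_b = ["170", "182", "165"]
    birthplace_a = ["北京", "上海", "南京"]
    birthplace_b = ["北京", "南京", "广州"]
    print(type_aware_similarity(height_a, height_b))
    print(type_aware_similarity(birthplace_a, birthplace_b))
    print(type_aware_similarity(height_a, birthplace_a))
```

Returning 0.0 for mismatched types is one simple design choice: it keeps a numeric property from matching a textual one by accident, at the cost of missing properties whose values are recorded in inconsistent formats.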


Memo

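Received: 2016-11-15.
Biography: Huang Tinglei (born 1971), male, Ph.D., professor, doctoral supervisor, tlhuang@mail.ie.ac.cn.
Foundation item: The National High Technology Research and Development Program of China (863 Program) (No. 2012AA011005).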
Last Update: 2017-07-20