Geological Journal of China Universities, 2023, Vol. 29, Issue (3): 419-428. DOI: 10.16108/j.issn1006-7493.2023026

Special Column on Text Mining and Knowledge Graphs in Solid Earth Science (Guest Editors: MA Chao, ZHU Yunqiang, LYU Hairong, HU Xiumian)

Research on the Joint Extraction Method of Entity Relations in Geological Domain

QIU Qinjun¹˒², WANG Bin¹˒², XU Dexin⁵, MA Kai³˒⁴, XIE Zhong¹˒²*, PAN Shengyong⁶, TAO Liufeng¹˒²

1. School of Computer Sciences, China University of Geosciences, Wuhan 430074, China;
2. Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan 430074, China;
3. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China;
4. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002, China;
5. Wuhan Geomatics Institute, Wuhan 430074, China;
6. Wuhan Zondy Cyber Science & Technology Co., Ltd., Wuhan 430074, China

Online: 2023-06-20; Published: 2023-06-20

Abstract: Entity relation extraction in the geological domain is the basis for building geological knowledge graphs and plays an important role in extracting information from geological texts and in constructing geological knowledge bases. Because entity relations in the geological domain are complex and manually annotated corpora are lacking, we propose a joint entity-relation extraction model for the geological domain. The model focuses on recognizing the complex overlapping relations found in geological texts and avoids the cascading errors that entity recognition mistakes cause in traditional pipeline models. We construct a high-quality corpus of geological entity relations and propose a sequence labeling model that combines the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers), a bidirectional gated recurrent unit network BiGRU (Bidirectional Gated Recurrent Units), and a conditional random field CRF (Conditional Random Field) to extract entities and relations jointly. Experiments on the constructed dataset show that the proposed joint extraction model achieves an F1 score of 0.671 for entity relation extraction, demonstrating its effectiveness for geological entity relation extraction.
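
The abstract frames joint extraction as sequence labeling but does not spell out the tag set. One common way to do this, assumed here purely for illustration, is a unified tagging scheme in which each token's tag encodes its position in an entity (B/I/E/S), the relation type, and the entity's role in the relation (1 = head, 2 = tail); triples are then decoded from the predicted tags and scored against gold triples. The tag format, the relation names `LOC_IN` and `HAS_ROCK`, the helper names, the example sentence, and the nearest-tail pairing rule are all hypothetical, not the authors' method. A one-tag-per-token scheme like this cannot represent a token that takes part in two relations, so on its own it does not handle the overlapping relations the paper targets.

```python
from typing import List, Optional, Tuple

# Hypothetical span format: (start, end_inclusive, relation_type, role); role "1" = head, "2" = tail.
Span = Tuple[int, int, str, str]
Triple = Tuple[str, str, str]  # (head text, relation type, tail text)


def extract_spans(tags: List[str]) -> List[Span]:
    """Collect B-I-E and S spans from tags shaped like 'B-LOC_IN-1' (or 'O')."""
    spans: List[Span] = []
    open_start: Optional[int] = None
    open_label: Optional[Tuple[str, str]] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            open_start, open_label = None, None
            continue
        pos, rest = tag.split("-", 1)
        rel, role = rest.rsplit("-", 1)
        label = (rel, role)
        if pos == "S":  # single-token entity
            spans.append((i, i, rel, role))
            open_start, open_label = None, None
        elif pos == "B":  # open a multi-token entity
            open_start, open_label = i, label
        elif pos == "I" and label != open_label:  # inconsistent continuation: drop the open span
            open_start, open_label = None, None
        elif pos == "E":  # close the span only if it matches its B tag
            if open_start is not None and label == open_label:
                spans.append((open_start, i, rel, role))
            open_start, open_label = None, None
    return spans


def pair_triples(tokens: List[str], spans: List[Span]) -> List[Triple]:
    """Pair each head span with the nearest tail span of the same relation type (hypothetical rule)."""
    tails = [s for s in spans if s[3] == "2"]
    triples: List[Triple] = []
    for h_start, h_end, rel, role in spans:
        if role != "1":
            continue
        candidates = [t for t in tails if t[2] == rel]
        if not candidates:
            continue
        t_start, t_end, _, _ = min(candidates, key=lambda t: abs(t[0] - h_start))
        head = " ".join(tokens[h_start:h_end + 1])
        tail = " ".join(tokens[t_start:t_end + 1])
        triples.append((head, rel, tail))
    return triples


def prf1(pred: List[Triple], gold: List[Triple]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 with exact matching on whole triples."""
    pred_set, gold_set = set(pred), set(gold)
    tp = len(pred_set & gold_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical example sentence, predicted tags, and gold triples.
tokens = ["The", "Huangling", "anticline", "lies", "in", "western", "Hubei"]
tags = ["O", "B-LOC_IN-1", "E-LOC_IN-1", "O", "O", "B-LOC_IN-2", "E-LOC_IN-2"]
gold = [("Huangling anticline", "LOC_IN", "western Hubei"),
        ("Huangling anticline", "HAS_ROCK", "granite")]

pred = pair_triples(tokens, extract_spans(tags))
print(pred)              # [('Huangling anticline', 'LOC_IN', 'western Hubei')]
print(prf1(pred, gold))  # (1.0, 0.5, 0.6666666666666666)
```

Scoring here is exact match on whole (head, relation, tail) triples; the abstract does not state which matching criterion underlies the reported F1 of 0.671.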

Key words: geological domain, joint entity-relation extraction, knowledge graph, BERT, BiGRU
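
In a BERT-BiGRU-CRF tagger of the kind the abstract describes, the encoder (BERT followed by a BiGRU) produces a score for every tag at every token, and the CRF layer adds tag-to-tag transition scores so that decoding returns the best whole sequence rather than a per-token argmax. The sketch below shows only the Viterbi decoding step of a generic linear-chain CRF in pure Python; the three-tag set (O, B, E) and every score are made-up toy values, whereas a trained model would learn the transition scores from data.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]],
                   start: List[float],
                   end: List[float]) -> Tuple[float, List[int]]:
    """Return (best score, best tag path) under a linear-chain CRF.

    emissions[t][j]   : score of tag j at token t (e.g. from a BERT+BiGRU encoder)
    transitions[i][j] : score of moving from tag i to tag j
    start[j], end[j]  : scores for starting / ending the sequence in tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # best previous tag i for landing in tag j at position t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    final = [score[j] + end[j] for j in range(n_tags)]
    best_last = max(range(n_tags), key=lambda j: final[j])
    path = [best_last]
    for ptrs in reversed(backpointers):  # follow backpointers from the last token
        path.append(ptrs[path[-1]])
    path.reverse()
    return final[best_last], path


# Toy tag set: 0 = O, 1 = B, 2 = E. NEG marks a (soft-)forbidden transition.
NEG = -100.0
transitions = [
    [0.0, 0.0, NEG],  # from O: O->O ok, O->B ok, O->E forbidden
    [NEG, NEG, 0.0],  # from B: only B->E allowed
    [0.0, 0.0, NEG],  # from E: E->O ok, E->B ok, E->E forbidden
]
start = [0.0, 0.0, NEG]  # a sequence cannot start with E
end = [0.0, NEG, 0.0]    # a sequence cannot end with B
emissions = [
    [0.0, 1.0, 0.0],  # token 1 prefers B
    [1.2, 0.0, 1.0],  # token 2 slightly prefers O over E
]

best_score, best_path = viterbi_decode(emissions, transitions, start, end)
print(best_score, best_path)  # 2.0 [1, 2]
```

Picking the highest-scoring tag for each token independently would give [1, 0] (B followed by O), which is not a valid entity; the transition scores steer decoding to the valid B-E span instead.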
