Entity relation extraction in the geological domain is the foundation for constructing geological knowledge graphs and plays an important role in information extraction from geological texts and in knowledge base construction. To address the complexity of geological entity relations and the lack of manually annotated corpora, this paper proposes a joint entity and relation extraction model for the geological domain. The model focuses on identifying the complex overlapping relations found in geological texts and avoids the cascading errors that entity recognition mistakes introduce in traditional pipeline models. We construct a high-quality corpus of geological entity relations and propose a sequence labeling model that combines the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers), a bidirectional gated recurrent unit (BiGRU), and a conditional random field (CRF) to extract entities and relations jointly. Experiments on the constructed dataset show that the proposed joint extraction model reaches an F1 score of 0.671 on entity relation extraction, which supports its effectiveness for geological entity relation extraction.
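As a point of reference for the reported F1 score, the following minimal sketch shows how an F1 score over extracted (head, relation, tail) triples is commonly computed. It assumes exact-match scoring of whole triples, which this abstract does not confirm; the function name `triple_f1` and the example triples are hypothetical and are not taken from the paper's corpus.

```python
from typing import Iterable, Set, Tuple

# A triple is (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def triple_f1(gold: Iterable[Triple], pred: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match triple scoring.

    Assumption: a predicted triple counts as correct only if its head,
    relation, and tail all match a gold triple exactly.
    """
    gold_set: Set[Triple] = set(gold)
    pred_set: Set[Triple] = set(pred)
    correct = len(gold_set & pred_set)

    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: "granite" takes part in several triples,
# which is the kind of overlapping relation the abstract refers to.
gold_triples = [
    ("granite", "contains", "quartz"),
    ("granite", "contains", "feldspar"),
    ("granite", "located_in", "study area"),
    ("fault", "cuts", "granite"),
]
pred_triples = [
    ("granite", "contains", "quartz"),
    ("fault", "cuts", "granite"),
]

precision, recall, f1 = triple_f1(gold_triples, pred_triples)
```

In this example both predictions are correct but only two of the four gold triples are recovered, so precision is high while recall, and therefore F1, is lower.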