Geological Journal of China Universities, 2023, Vol. 29, Issue (3): 429-438. DOI: 10.16108/j.issn1006-7493.2023028

Chinese Text-oriented Geological Semantic Information Annotation and Corpus Construction

ZHANG Xueying1, ZHANG Chunju2,3*, WANG Chen3, LIU Wencong3, PENG Ye4, LU Yanxu1

    1. Institute of Geographical Science, Nanjing Normal University, Nanjing 210046, China;
    2. The Key Laboratory of JiangHuai Arable Land Resources Protection and Eco-restoration, Ministry of Natural Resources, Hefei 230036, China;
    3. The School of Civil Engineering, Hefei University of Technology, Hefei 230009, China;
    4. Urban Planning and Development Institute, Yangzhou University, Yangzhou 225127, China
  • Online: 2023-06-20 Published: 2023-06-20

Abstract: Structured extraction, semantic analysis, visual expression, and knowledge graph construction of geological information in text provide a data foundation and technical support for the deep mining and utilization of geological big data. Whether a traditional statistical model or a deep learning model is used, semantic analysis of geological information requires an annotated corpus. Textual descriptions of geological information have distinct domain characteristics, so general-purpose natural language corpora cannot simply be transferred to this task. Constructing geological information annotation corpora at different levels has therefore become a key foundation for geological semantic information analysis. Based on an analysis of the language used to describe geological semantic information in Chinese text, and according to the spatial, temporal, and attribute description features of geological entities, this paper explicitly expresses the various semantic relations among geological entities and formulates a labeling system and labeling specifications for geological semantic information in Chinese text. A self-developed “interactive geological semantic information labeling tool” addresses the high error rates and heavy workload of traditional manual labeling. Using Chinese mineral resources literature and reports as data sources, a large-scale geological semantic information annotation corpus is constructed, which effectively addresses the lack of large-scale standard data.

Key words: Chinese text, geological entity, semantic relationship, labeling system, labeling specification
