Structured extraction, semantic analysis, visualization, and knowledge-graph construction of geological information in text provide a strong data foundation and technical support for the deep mining and use of geological big data. Whether a traditional statistical model or a deep learning model is used, semantic analysis of geological information depends on annotated corpora. In particular, textual descriptions of geological information have domain-specific characteristics, so this need cannot be met simply by transferring general natural-language corpora. Constructing geological information annotation corpora at different levels has therefore become a key foundation for geological semantic information analysis. Based on an analysis of how geological semantic information is described in Chinese text, and according to the spatial, temporal, and attribute description features of geological entities, this work explicitly expresses the various semantic relations among geological entities, forms a representation of geological semantic information, and formulates a labeling system and labeling specifications for Chinese text. A self-developed "interactive geological semantic information labeling tool" addresses the shortcomings of traditional manual labeling, such as high error rates and heavy workload. Using Chinese mineral resources literature and reports as data sources, a large-scale geological semantic information annotation corpus is constructed, which effectively addresses the lack of large-scale standard data.
ZHANG Xueying, ZHANG Chunju, WANG Chen, LIU Wencong, PENG Ye, LU Yanxu. Chinese Text-oriented Geological Semantic Information Annotation and Corpus Construction[J]. Geological Journal of China Universities, 2023, 29(3): 429-438.
DOI: 10.16108/j.issn1006-7493.2023028