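Chinese Named Entity Recognition Based on Lexicon Enhancement and Table Filling

Application of Electronic Technique

Chu Tianshu1, Tang Qiu1, Liang Junxue2, Xu Rui1, Wang Mingyang2, Liu Tao2

1. National Computer System Engineering Research Institute of China, Beijing 100083, China; 2. Unit 93216 of the Chinese People's Liberation Army, Beijing 100085, China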

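Abstract: Chinese named entity recognition comprises two tasks, flat named entity recognition and nested named entity recognition, of which the nested task is the more difficult. This paper proposes TLEXNER, a unified model based on lexicon enhancement and table filling that handles both tasks. To address the difficulty of word segmentation in Chinese corpora, the model first uses a lexicon adapter to fuse lexical information into the BERT pre-trained model and integrates the relative position information between characters and lexical groups into the BERT embedding layer. It then constructs and predicts a character-pair table through conditional layer normalization and a biaffine model, using the table to model the relations between characters and thereby obtain a unified representation of flat and nested entities. Finally, entity categories are determined from the values in the upper triangular region of the character-pair table. The proposed model achieves F1 scores of 97.35% on the public flat-entity dataset Resume and 91.96% on a self-annotated nested-entity dataset in the military domain, demonstrating the effectiveness of TLEXNER.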

Keywords: lexicon enhancement; Chinese named entity recognition; table filling

CLC number: TP391  Document code: A  DOI: 10.16157/j.issn.0258-7998.233939
Citation format: Chu Tianshu, Tang Qiu, Liang Junxue, et al. Chinese named entity recognition based on lexicon enhancement and table filling[J]. Application of Electronic Technique, 2024, 50(2): 23-29.

Chinese named entity recognition based on lexicon enhancement and table filling

Chu Tianshu1, Tang Qiu1, Liang Junxue2, Xu Rui1, Wang Mingyang2, Liu Tao2

1. National Computer System Engineering Research Institute of China, Beijing 100083, China; 2. People's Liberation Army 93216, Beijing 100085, China

Abstract: Chinese named entity recognition involves two tasks: Chinese flat named entity recognition and Chinese nested named entity recognition, the latter being more difficult. This paper therefore proposes a unified model, TLEXNER, based on lexicon enhancement and table filling, which can handle both tasks. To address the difficulty of Chinese word segmentation, a lexicon adapter is used to integrate lexicon information into the BERT pre-trained model, and the relative position information of characters and lexical groups is integrated into the BERT embedding layer. Conditional layer normalization and a biaffine model are then used to build and predict the character-pair table, and the relationships between character pairs are modeled by the table structure to obtain a unified representation of flat and nested entities.

Key words: lexicon enhancement; Chinese named entity recognition; table filling
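
The table-filling idea summarized in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that cell (i, j) with i <= j in the upper triangle of the character-pair table holds the label id of the span that starts at character i and ends at character j, with 0 meaning "no entity". The function name `decode_entities`, the label ids, and the example sentence are hypothetical and are not taken from the paper; the table here is filled by hand rather than predicted by a model.

```python
from typing import Dict, List, Tuple


def decode_entities(
    table: List[List[int]], id2label: Dict[int, str]
) -> List[Tuple[int, int, str]]:
    """Read (start, end, label) spans off the upper triangle of the table.

    Assumption: table[i][j] with i <= j is the label id of the span
    covering characters i..j inclusive, and 0 means "not an entity".
    """
    n = len(table)
    entities = []
    for i in range(n):
        # Only cells on or above the diagonal (j >= i) are inspected.
        for j in range(i, n):
            label_id = table[i][j]
            if label_id != 0:
                entities.append((i, j, id2label[label_id]))
    return entities


# Hand-filled toy table; a real table would come from model predictions.
text = "北京大学位于北京"
id2label = {1: "ORG", 2: "LOC"}
n = len(text)
table = [[0] * n for _ in range(n)]
table[0][3] = 1  # "北京大学" as ORG
table[0][1] = 2  # "北京" nested inside "北京大学" as LOC
table[6][7] = 2  # the second "北京" as a flat LOC

spans = decode_entities(table, id2label)
for start, end, label in spans:
    print(text[start:end + 1], label)
```

Because each span owns its own cell, a nested entity and the entity that contains it do not compete for the same position, which is why one table can represent flat and nested entities together.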

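Introduction

In the era of big data, massive amounts of text are produced every day, and extracting genuinely valuable knowledge from this highly redundant data is increasingly important. Knowledge extraction methods can automatically identify and extract the required knowledge elements, providing data support for subsequent knowledge fusion, knowledge processing, and knowledge application. Named entity recognition is a key task in knowledge extraction and underpins downstream tasks such as knowledge graphs, data mining, intelligent retrieval, and question answering systems, so research on named entity recognition has both theoretical and practical significance.

By granularity, Chinese named entity recognition can be divided into word-based, character-based, and hybrid character-word approaches. Unlike English, Chinese has no explicit word delimiters, so Chinese named entity recognition faces the problem of difficult word segmentation.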
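
The segmentation difficulty, and the reason hybrid character-word approaches pair each character with candidate words, can be seen in a minimal sketch. It is not the paper's lexicon adapter: it only shows a plain dictionary lookup that collects, for each character, the lexicon words covering it. The function name `match_lexicon_words`, the toy lexicon, and the `max_len` limit are assumptions made for this example.

```python
from typing import Dict, List, Set


def match_lexicon_words(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> Dict[int, List[str]]:
    """Map each character index to the lexicon words that cover it.

    Only substrings of 2 to max_len characters are looked up.
    """
    matches: Dict[int, List[str]] = {i: [] for i in range(len(sentence))}
    for start in range(len(sentence)):
        last_end = min(start + max_len, len(sentence))
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Every character inside the matched word records it.
                for i in range(start, end):
                    matches[i].append(word)
    return matches


# Toy lexicon (hypothetical); a real system would load a large word list.
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
sentence = "南京市长江大桥"
matches = match_lexicon_words(sentence, lexicon)
for index, char in enumerate(sentence):
    print(index, char, matches[index])
```

In this example the character "市" is covered by both "南京市" and "市长", and "长" by "市长", "长江", and "长江大桥". No delimiter tells a word-based model which reading is correct, whereas a character-based model with attached candidate words can keep all of them available.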

The full text of this article is available for download at:

https://www.chinaaet.com/resource/share/2000005850
