Journal of Computer Applications

• Artificial Intelligence and Simulation •

Power audit text classification method with enhanced domain features

Chen Ping (陈平)1, Kuang Yao (匡尧)2, Hu Jingyi (胡景懿)3, Wang Xiangyang (王向阳)2, Cai Jing (蔡静)1

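  1. Technical Training Center, State Grid Hubei Electric Power Co., Ltd.
  2. Audit Department, State Grid Hubei Electric Power Co., Ltd.
  3. Audit Department, Technical Training Center, State Grid Hubei Electric Power Co., Ltd.
  • Received: 2019-11-20 Revised: 2019-12-30 Online: 2019-12-30 Published: 2020-05-09
  • Corresponding author: Chen Ping (陈平)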

Text categorization method with enhanced domain features in power audit field

  • Received: 2019-11-20 Revised: 2019-12-30 Online: 2019-12-30 Published: 2020-05-09

Abstract: Texts in the power audit field have pronounced industry characteristics, highly similar text features, and fuzzy classification boundaries. To address this, a power audit text classification method with enhanced domain features is proposed. First, a professional dictionary for power auditing is built. Next, the EF-Doc2VecC model is proposed and combined with the professional dictionary to enhance the text features. Finally, the enhanced features are fed into a BiLSTM classifier to classify the domain-specific texts. Experimental results show that, on highly specialized power audit texts, the EF-Doc2VecC model exceeds the comparison model Doc2VecC by 4, 2, 2, and 2 percentage points in recall, specificity, precision, and F1 value, respectively. On general-domain texts, EF-Doc2VecC exceeds Doc2VecC by 3, 3, 4, and 4 percentage points on the same metrics. In addition, the classification performance of EF-Doc2VecC on power audit texts is 4, 5, 3, and 3 percentage points higher than on general-domain texts, respectively. The proposed text vector representation and text classification method therefore improve classification performance in the general domain and significantly improve fine-grained classification performance in the vertical domain.
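The abstract reports gains in percentage points on four metrics. As a reading aid, the following minimal Python sketch shows how recall, specificity, precision, and F1 are conventionally computed from confusion counts, and how a percentage-point gap between two models is obtained. It assumes a binary (one-vs-rest) view of a single class. The `Confusion` type, the function names, and all counts are hypothetical illustrations, not the paper's data or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion counts for one class in a one-vs-rest view (hypothetical type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c: Confusion) -> dict:
    """Compute recall, specificity, precision, and F1 from confusion counts."""
    recall = safe_div(c.tp, c.tp + c.fn)
    specificity = safe_div(c.tn, c.tn + c.fp)
    precision = safe_div(c.tp, c.tp + c.fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def point_gap(model: dict, baseline: dict) -> dict:
    """Difference between two metric dicts, expressed in percentage points."""
    return {k: round(100 * (model[k] - baseline[k]), 1) for k in model}


# Made-up counts, chosen only to illustrate the arithmetic.
baseline = metrics(Confusion(tp=80, fp=20, tn=80, fn=20))
enhanced = metrics(Confusion(tp=84, fp=18, tn=82, fn=16))
gap = point_gap(enhanced, baseline)
```

A percentage-point gap is the plain difference between two rates scaled by 100, so a recall of 0.84 against 0.80 is a gap of 4 points, not a 4% relative improvement. For a multi-class task, these per-class values would typically be averaged; whether the paper uses macro or micro averaging is not stated in the abstract.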

CLC number: