Jiangsu Science & Technology Information (江苏科技信息) ›› 2016, Vol. 33 ›› Issue (8): 27-29. doi: 10.3969/j.issn.1004-7530.2016.08.004

• Article

Research on Word Segmentation and Tagging for Ancient Chinese Based on CRF
(基于CRF的古汉语分词标注模型研究)

Yan Shun (严顺)

  1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
  • Published: 2016-03-15 Online: 2016-03-15

Abstract: Chinese word segmentation is an important research area in natural language processing (NLP), yet word segmentation for ancient Chinese remains largely unexplored. This article investigates automatic word segmentation of ancient Chinese texts using a conditional random field (CRF) model. It also designs two sets of comparative experiments to study part-of-speech (POS) tagging models on an ancient Chinese corpus comprising 27 classic pre-Qin texts.

Keywords: CRF, ancient Chinese corpus, part-of-speech (POS) tagging
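CRF-based word segmentation is commonly framed as character-level sequence labeling: each character receives a position tag, and the model learns to predict those tags. The sketch below illustrates only that framing. The B/M/E/S tag set, the function names `words_to_tags` and `tags_to_words`, and the sample segmentation are assumptions made for illustration; this record does not state which tag set or features the article actually used.

```python
from typing import List, Sequence, Tuple


def words_to_tags(words: Sequence[str]) -> List[Tuple[str, str]]:
    """Turn a segmented sentence into (character, tag) training pairs.

    Assumed tag set: B = word begin, M = word middle, E = word end,
    S = single-character word.
    """
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            for ch in word[1:-1]:
                pairs.append((ch, "M"))
            pairs.append((word[-1], "E"))
    return pairs


def tags_to_words(chars: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Rebuild words from characters and predicted tags.

    A word is closed whenever an E or S tag is seen; any trailing
    characters left open by a malformed tag sequence form a final word.
    """
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Hypothetical segmentation of a short classical phrase.
example = ["君子", "不", "器"]
print(words_to_tags(example))
# [('君', 'B'), ('子', 'E'), ('不', 'S'), ('器', 'S')]
```

Pairs like these would serve as training data for a CRF toolkit, and `tags_to_words` converts the predicted tags back into a segmented sentence.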