Title :
Research on Improved Algorithm for Chinese Word Segmentation Based on Markov Chain
Author :
Baomao, Pang ; Haoshan, Shi
Author_Institution :
Coll. of Electron. Inf., Northwest Polytech. Univ., Xi'an, China
Abstract :
Chinese word segmentation is an important technique for Chinese Web data mining. After surveying existing Chinese word segmentation methods, this paper proposes an improved algorithm. The algorithm updates the dictionary using a two-way Markov chain and performs segmentation with an improved forward maximum matching method based on word frequency statistics. Simulation shows that the algorithm segments a given text quickly and accurately.
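For orientation, the baseline that the abstract builds on, plain forward maximum matching, can be sketched as below. This is only a minimal illustration under stated assumptions: the function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are hypothetical, and the paper's own improvements (the word-frequency statistics and the two-way Markov chain dictionary update) are not reproduced because the abstract does not specify them.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation (baseline forward maximum matching).

    At each position, take the longest substring found in the dictionary;
    fall back to a single character when nothing matches.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(word) for word in dictionary), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, not taken from the paper.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dictionary))
```

On this toy input the greedy baseline yields 研究生 / 命 / 起源 rather than the intended 研究 / 生命 / 起源, which is the kind of ambiguity that frequency-based refinements of maximum matching are generally meant to reduce.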
Keywords :
Internet; Markov processes; data mining; natural language processing; pattern matching; text analysis; Chinese Web data mining; Chinese word segmentation; dictionary; forward maximum matching method; two-way Markov chain; word frequency statistic; Algorithm design and analysis; Data mining; Dictionaries; Educational institutions; Frequency; Information security; Natural languages; Space technology; Statistical analysis; Statistics
Conference_Title :
Information Assurance and Security, 2009. IAS '09. Fifth International Conference on
Conference_Location :
Xi'an
Print_ISBN :
978-0-7695-3744-3
DOI :
10.1109/IAS.2009.317