Title :
The Tibetan Microblog Text Representation Method Based on Shallow Parsing
Author :
Li Ailin;Yu Hongzhi;Yuan Bin
Author_Institution :
Nat. Languages Inf. Technol., Northwest Univ. for Nat., Lanzhou, China
Abstract :
Tibetan text representation is the groundwork of Tibetan text mining and has a great influence on Tibetan text categorization and clustering. Tibetan microblogs are one of the most popular Tibetan network media, and research on them is increasing. However, because of the special features of microblog text and of the Tibetan language, traditional Tibetan text representation methods cannot meet this need. This paper proposes a Tibetan microblog text representation method based on shallow parsing and evaluates it in a Tibetan microblog sentiment analysis experiment. First, the syntactic structure of the Tibetan microblog text is generated using a syntactic tree. Second, a semantic feature space is built from the semantic features of the syntactic structures. Then, semantic cluster centroids are formed in the feature space with the K-means method. Last, the cluster-based TF-IDF value is calculated. In the experiment, the proposed method is compared with the SVM+TF-IDF and Naive Bayes + Maximum Entropy methods, and its F-measure is as high as 91.4%.
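The last two steps of the abstract (K-means clustering in a feature space, then TF-IDF computed over clusters instead of surface words) can be illustrated with a minimal sketch. This is not the authors' implementation. The shallow-parsing step is omitted, the two-dimensional feature vectors and the English placeholder words are invented toy data, and the names `kmeans`, `cluster_tfidf`, `features`, and `docs` are hypothetical. The TF-IDF variant used (relative frequency times the log of the inverse document frequency) is also an assumption, since the abstract does not give a formula.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. The first k points seed the centroids (naive, deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_tfidf(docs, term_to_cluster):
    """TF-IDF in which the counted units are cluster ids rather than surface terms."""
    cluster_docs = [[term_to_cluster[t] for t in doc if t in term_to_cluster] for doc in docs]
    n_docs = len(cluster_docs)
    # Document frequency: number of documents in which each cluster appears.
    df = Counter(c for doc in cluster_docs for c in set(doc))
    vectors = []
    for doc in cluster_docs:
        tf = Counter(doc)
        vectors.append({c: (tf[c] / len(doc)) * math.log(n_docs / df[c]) for c in tf})
    return vectors


# Toy feature vectors (assumption): stand-ins for semantic features of syntactic structures.
features = {
    "good": (1.0, 0.9),
    "bad": (-1.0, -0.9),
    "great": (0.9, 1.0),
    "awful": (-0.9, -1.0),
}
terms = list(features)
labels = kmeans([features[t] for t in terms], k=2)
term_to_cluster = dict(zip(terms, labels))

# Toy "microblog posts" (assumption), each already split into units.
docs = [["good", "great"], ["bad", "awful", "good"], ["awful"]]
vectors = cluster_tfidf(docs, term_to_cluster)
print(term_to_cluster)
print(vectors)
```

Each post is reduced to a sparse vector indexed by cluster id, so units that fall into the same cluster share one dimension. Vectors of this kind could then be passed to a classifier for sentiment analysis.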
Keywords :
"Syntactics","Semantics","Text mining","Labeling","Feature extraction","Training"
Conference_Title :
Computational Intelligence and Design (ISCID), 2015 8th International Symposium on
Print_ISBN :
978-1-4673-9586-1
DOI :
10.1109/ISCID.2015.297