Title :
Words Clustering Based on Keywords Indexing from Large-scale Categorization Corpora
Author_Institution :
Coll. of Chinese Language & Culture, Jinan Univ., Guangzhou, China
Abstract :
Keywords are indexed automatically from large-scale categorization corpora. Keywords indexed in more than 20 documents are selected as seed words, which removes the subjectivity of choosing seed words by hand for clustering. At the same time, clustering is restricted to the corpus of a single category, and a feature extraction method based on the indexed keywords is used to obtain domain-specific (domanial) words automatically, which reduces noise in the similarity calculation.
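The seed-word selection and clustering steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. The abstract does not specify how keywords are indexed, which vector representation or similarity measure is used, or how non-seed words are assigned. The sketch therefore takes the indexed keywords of each document as given. It treats the union of a category's indexed keywords as the domain feature words, builds document-level co-occurrence vectors over those features, and assigns each remaining word to its most cosine-similar seed. The function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def select_seed_words(doc_keywords, min_docs=20):
    """Return keywords indexed in more than `min_docs` documents (the seed words)."""
    df = Counter()
    for keywords in doc_keywords:
        df.update(set(keywords))  # count each keyword at most once per document
    return {kw for kw, n in df.items() if n > min_docs}


def context_vectors(doc_tokens, features):
    """Assumed representation: for each word, count the documents in which it
    co-occurs with each domain feature word (a feature word counts itself)."""
    vectors = defaultdict(Counter)
    for tokens in doc_tokens:
        words = set(tokens)
        present = words & features  # domain words occurring in this document
        for w in words:
            for f in present:
                vectors[w][f] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_words(doc_tokens, doc_keywords, min_docs=20, threshold=0.1):
    """Cluster the words of ONE category corpus around its seed words.

    doc_tokens:   list of token lists, one per document in the category
    doc_keywords: list of indexed-keyword lists, aligned with doc_tokens
    threshold:    hypothetical minimum similarity for joining a cluster
    """
    seeds = select_seed_words(doc_keywords, min_docs)
    # Assumption: the category's indexed keywords serve as its domain words.
    features = set().union(*map(set, doc_keywords))
    vectors = context_vectors(doc_tokens, features)

    clusters = {s: [s] for s in sorted(seeds)}
    for w in sorted(vectors):
        if w in seeds:
            continue
        best, best_sim = None, threshold
        for s in clusters:
            sim = cosine(vectors[w], vectors.get(s, Counter()))
            if sim > best_sim:
                best, best_sim = s, sim
        if best is not None:  # words similar to no seed stay unclustered
            clusters[best].append(w)
    return clusters
```

On a toy corpus the 20-document cutoff would select no seeds, so `min_docs` is exposed as a parameter. Restricting the feature space to domain words mirrors the abstract's noise-reduction argument: words that co-occur only with non-domain vocabulary contribute nothing to the similarity scores.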
Keywords :
document handling; feature extraction; indexing; keywords indexing; large-scale categorization; words clustering; Feature extraction; Indexing; Information security; Large-scale systems; Materials science and technology; Noise reduction; Societies; Statistics; Vocabulary; Web pages; Categorization corpora; Clustering; Domanial words; Keywords indexing
Conference_Title :
Information Assurance and Security, 2009. IAS '09. Fifth International Conference on
Conference_Location :
Xi'an
Print_ISBN :
978-0-7695-3744-3
DOI :
10.1109/IAS.2009.271