Regional Information Center for Science and Technology - Learning text classifier using the domain concept hierarchy

DocumentCode :

3169451

Title :

Learning text classifier using the domain concept hierarchy

Author :

Wang, Bill B. ; McKay, Ri Bob ; Abbass, Hussein A. ; Barlow, Michael

Author_Institution :

Sch. of Comput. Sci., Univ. of New South Wales, Canberra, ACT, Australia

Volume :

fYear :

2002

fDate :

29 June-1 July 2002

Firstpage :

1230

Abstract :

Automatic text categorization is an important component of many information organization and management tasks. Research has shown that similarity-based categorization algorithms such as K-nearest neighbour (KNN) are effective for document categorization. These algorithms represent documents by their index terms. However, they suffer from some drawbacks. One major drawback is that they tend to use all features when computing similarities, which means they must search in a high-dimensional space. Another is that they tend to require a very large set of training documents so that all the terms important for identifying document content are covered. To overcome these drawbacks, this paper presents a novel method that searches a hierarchical domain ontology for the optimal concept representation, one that reflects the taxonomic standard of the pre-defined categories. Experiments show that this is a feasible way to reduce the dimensionality of the document vector space effectively and reasonably, and consequently to improve the generalisation power of the derived classifier. The result is a classification method that is both very significantly less costly in computational terms and considerably more accurate than comparable methods.
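The abstract describes the approach only at a high level. The following is a minimal sketch, not the authors' algorithm. It assumes a tiny hand-built concept hierarchy (the hypothetical `PARENT` table), cosine-similarity KNN over bag-of-concept vectors, and a simple exhaustive search over a single uniform "cut depth" scored by leave-one-out accuracy. That search stands in for the paper's heuristic search. All names (`generalise`, `best_depth`, `DOCS`, and so on) and the toy data are illustrative assumptions. The sketch shows how lifting index terms to ancestor concepts shrinks the feature space and lets a very small training set generalise.

```python
import math
from collections import Counter

# Hypothetical toy concept hierarchy (child -> parent; the root maps to None).
# This is an illustrative assumption, not the ontology used in the paper.
PARENT = {
    "topic": None,
    "sport": "topic", "finance": "topic",
    "football": "sport", "tennis": "sport",
    "goal": "football", "striker": "football",
    "racket": "tennis", "serve": "tennis",
    "market": "finance", "bank": "finance",
    "stock": "market", "shares": "market",
    "loan": "bank", "interest": "bank",
}

# Hypothetical training documents: (index terms, category label).
DOCS = [
    (["goal", "striker"], "sport"),
    (["racket", "serve"], "sport"),
    (["stock", "shares"], "finance"),
    (["loan", "interest"], "finance"),
]

def depth(term):
    """Number of edges from the term up to the root."""
    d = 0
    while PARENT[term] is not None:
        term = PARENT[term]
        d += 1
    return d

def generalise(term, max_depth):
    """Lift a term to its ancestor at max_depth (or keep it if already shallower)."""
    while depth(term) > max_depth:
        term = PARENT[term]
    return term

def to_vector(tokens, max_depth):
    # Bag of concepts; tokens missing from the hierarchy are dropped (assumption).
    return Counter(generalise(t, max_depth) for t in tokens if t in PARENT)

def cosine(a, b):
    dot = sum(a[key] * b[key] for key in a if key in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(query, train, k):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def loo_accuracy(docs, max_depth, k):
    """Leave-one-out accuracy of KNN when terms are cut at max_depth."""
    vecs = [(to_vector(toks, max_depth), label) for toks, label in docs]
    hits = 0
    for i, (vec, label) in enumerate(vecs):
        hits += knn_predict(vec, vecs[:i] + vecs[i + 1:], k) == label
    return hits / len(vecs)

def vocab_size(docs, max_depth):
    """Dimensionality of the document vector space at a given cut depth."""
    return len({c for toks, _ in docs for c in to_vector(toks, max_depth)})

def best_depth(docs, k=1):
    """Try every cut depth; on ties prefer the shallower (smaller) representation."""
    max_d = max(depth(t) for t in PARENT)
    scores = {d: loo_accuracy(docs, d, k) for d in range(max_d + 1)}
    return max(scores, key=lambda d: (scores[d], -d)), scores

if __name__ == "__main__":
    d, scores = best_depth(DOCS)
    for level, acc in scores.items():
        print(f"depth {level}: {vocab_size(DOCS, level)} features, LOO accuracy {acc:.2f}")
    print("chosen depth:", d)
```

On this toy data, the raw terms (depth 3, eight features) share no vocabulary across documents, so KNN falls back on arbitrary neighbours. Cutting at depth 1 maps every document onto two concepts and classifies all four correctly. Cutting at depth 0 collapses everything into one concept and loses the distinction again. A real system would use a much larger ontology and a smarter search than this uniform cut.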

Keywords :

classification; indexing; learning (artificial intelligence); search problems; text analysis; vocabulary; K-nearest neighbour algorithms; KNN document categorization; automatic text categorization; classifier generalisation power; document vector space dimensionality reduction; domain concept hierarchy learning text classifiers; domain ontology hierarchical structures; heuristic search algorithms; high-dimensional space search; index terms; information organization; optimal concept representations; pre-defined category taxonomic standards; semantics; similarity based categorization algorithms; training document sets; Computer science; Decision trees; Educational institutions; Information management; Internet; Natural languages; Neural networks; Ontologies; Software libraries; Text categorization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Communications, Circuits and Systems and West Sino Expositions, IEEE 2002 International Conference on

Print_ISBN :

0-7803-7547-5

Type :

conf

DOI :

10.1109/ICCCAS.2002.1179005

Filename :

1179005

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3169451