DocumentCode :
1093236
Title :
Multidimensional text classification for drug information
Author :
Lertnattee, Verayuth ; Theeramunkong, Thanaruk
Author_Institution :
Sirindhorn Int. Inst. of Technol., Thammasat Univ., Pathumthani, Thailand
Volume :
8
Issue :
3
fYear :
2004
Firstpage :
306
Lastpage :
312
Abstract :
This paper proposes a multidimensional model for classifying drug information text documents. The concept of a multidimensional category model is introduced for representing classes. In contrast with traditional flat and hierarchical category models, the multidimensional category model classifies each document using multiple predefined sets of categories, where each set corresponds to a dimension. Since a multidimensional model can be converted to flat and hierarchical models, three classification approaches are possible: classifying directly with the multidimensional model, or classifying with the equivalent flat or hierarchical model. The efficiency of these three approaches is investigated using a drug information collection with two dimensions: 1) drug topics and 2) primary therapeutic classes. In the experiments, k-nearest neighbor, naïve Bayes, and two centroid-based methods are selected as classifiers. The three approaches are compared using two-way analysis of variance, followed by Scheffé's test for post hoc comparison. The experimental results show that multidimensional-based classification performs better than the others, especially in the presence of a relatively small training set. As one application, a category-based search engine using the multidimensional category concept was developed to help users retrieve drug information.
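The conversion from a multidimensional category model to an equivalent flat one, mentioned in the abstract, can be illustrated with a minimal sketch. It assumes the flat model has one class per combination of categories, taking one category from each dimension; the abstract does not spell out the construction, so this is an illustration rather than the authors' procedure. The category names and the helper functions `to_flat_categories` and `combine_predictions` are hypothetical.

```python
from itertools import product

# Hypothetical category sets: the record names the two dimensions
# (drug topics, primary therapeutic classes) but not their members.
DIMENSIONS = {
    "drug_topic": ["indications", "adverse_reactions"],
    "therapeutic_class": ["cardiovascular", "anti_infective", "analgesic"],
}


def to_flat_categories(dimensions):
    """Return one flat label per combination of categories.

    Assumption: the equivalent flat model is the Cartesian product
    of the per-dimension category sets.
    """
    names = list(dimensions)
    return ["|".join(combo) for combo in product(*(dimensions[n] for n in names))]


def combine_predictions(per_dimension_labels, dimensions):
    """Join labels predicted separately per dimension into a flat label."""
    return "|".join(per_dimension_labels[name] for name in dimensions)


flat = to_flat_categories(DIMENSIONS)
predicted = combine_predictions(
    {"therapeutic_class": "analgesic", "drug_topic": "indications"},
    DIMENSIONS,
)
```

Under this assumption the flat model needs one class for every combination (2 × 3 = 6 here), while the multidimensional approach trains one classifier per dimension over 2 and 3 classes respectively. Each flat class therefore sees fewer training documents, which is one plausible reading of why the abstract reports an advantage for small training sets.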
Keywords :
Bayes methods; drugs; learning (artificial intelligence); natural languages; search engines; text analysis; Scheffe test; category-based search engine; centroid-based methods; drug information; drug topics; hierarchical model; machine learning; multidimensional category model; multidimensional text classification; multidimensional-based classification; naive Bayes method; natural language processing; primary therapeutic class; variance two-way analysis; Analysis of variance; Classification tree analysis; Drugs; Information retrieval; Machine learning; Markup languages; Multidimensional systems; Search engines; Testing; Text categorization; Abstracting and Indexing as Topic; Artificial Intelligence; Documentation; Drug Information Services; Information Storage and Retrieval; Internet; Natural Language Processing; Pattern Recognition, Automated; Periodicals as Topic; Pharmaceutical Preparations; Vocabulary, Controlled;
fLanguage :
English
Journal_Title :
Information Technology in Biomedicine, IEEE Transactions on
Publisher :
IEEE
ISSN :
1089-7771
Type :
jour
DOI :
10.1109/TITB.2004.832542
Filename :
1331408