Title :
Multidimensional text classification for drug information
Author :
Lertnattee, Verayuth ; Theeramunkong, Thanaruk
Author_Institution :
Sirindhorn Int. Inst. of Technol., Thammasat Univ., Pathumthani, Thailand
Abstract :
This paper proposes a multidimensional model for classifying drug information text documents. The concept of a multidimensional category model is introduced for representing classes. In contrast with traditional flat and hierarchical category models, the multidimensional category model classifies each document using multiple predefined sets of categories, where each set corresponds to a dimension. Since a multidimensional model can be converted to flat and hierarchical models, three classification approaches are possible: classifying directly with the multidimensional model, or classifying with the equivalent flat or hierarchical model. The efficiency of these three approaches is investigated using a drug information collection with two different dimensions: 1) drug topics and 2) primary therapeutic classes. In the experiments, k-nearest neighbor, naïve Bayes, and two centroid-based methods are selected as classifiers. The three classification approaches are compared using two-way analysis of variance, followed by Scheffé's test for post hoc comparison. The experimental results show that multidimensional-based classification performs better than the others, especially in the presence of a relatively small training set. As one application, a category-based search engine using the multidimensional category concept was developed to help users retrieve drug information.
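The conversion from a multidimensional category model to an equivalent flat model, as described in the abstract, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the category labels, the `|` separator, and the function names `flat_categories`, `to_flat_label`, and `from_flat_label` are all hypothetical. It assumes each document receives exactly one label per dimension, so the flat model has one class for every combination of labels.

```python
from itertools import product

# Hypothetical category sets. The paper's two dimensions are drug topics and
# primary therapeutic classes; the specific labels below are invented.
DIMENSIONS = {
    "topic": ["dosage", "adverse_effects", "interactions"],
    "therapeutic_class": ["cardiovascular", "anti_infective"],
}

def flat_categories(dimensions):
    """List every combination of labels, one label taken from each dimension."""
    names = list(dimensions)
    combos = product(*(dimensions[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]

def to_flat_label(labels, dimensions, sep="|"):
    """Join a document's per-dimension labels into a single flat class name."""
    return sep.join(labels[name] for name in dimensions)

def from_flat_label(flat, dimensions, sep="|"):
    """Split a flat class name back into one label per dimension."""
    return dict(zip(dimensions, flat.split(sep)))

doc_labels = {"topic": "dosage", "therapeutic_class": "cardiovascular"}
flat = to_flat_label(doc_labels, DIMENSIONS)
print(len(flat_categories(DIMENSIONS)), flat)
```

Under these assumptions, the flat model needs as many classes as the product of the dimension sizes, while direct multidimensional classification would train one classifier per dimension over that dimension's own, smaller category set.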
Keywords :
Bayes methods; drugs; learning (artificial intelligence); natural languages; search engines; text analysis; Scheffe test; category-based search engine; centroid-based methods; drug information; drug topics; hierarchical model; machine learning; multidimensional category model; multidimensional text classification; multidimensional-based classification; naive Bayes method; natural language processing; primary therapeutic class; variance two-way analysis; Analysis of variance; Classification tree analysis; Drugs; Information retrieval; Machine learning; Markup languages; Multidimensional systems; Search engines; Testing; Text categorization; Abstracting and Indexing as Topic; Artificial Intelligence; Documentation; Drug Information Services; Information Storage and Retrieval; Internet; Natural Language Processing; Pattern Recognition, Automated; Periodicals as Topic; Pharmaceutical Preparations; Vocabulary, Controlled
Journal_Title :
Information Technology in Biomedicine, IEEE Transactions on
DOI :
10.1109/TITB.2004.832542