DocumentCode :
3320321
Title :
A correlation-based algorithm for classifying technical articles
Author :
Kilany, Rania ; Ammar, Reda ; Rajasekaran, Sanguthevar
Author_Institution :
Comput. Sci. & Eng. Dept, Univ. of Connecticut, Storrs, CT, USA
fYear :
2011
fDate :
14-17 Dec. 2011
Abstract :
An enormous amount of information is constantly generated by scientists in various branches of science as a result of research, especially in the field of biology. These research outcomes are reported in journal and conference articles. For example, PubMed currently stores millions of abstracts and is growing at a rapid pace. Given such a large repository, one of the challenges for any biologist is to search for articles that are likely to contain the specific information that (s)he is looking for. A computational tool that can produce a short list of papers likely to contain the information of interest will be of great use to any scientist. In this paper we present generic computational techniques that can be used to build such tools. A typical tool that we envision will take as input a set of keywords (that characterize the information of interest) and will develop a learner that is capable of classifying papers into two types: a Type 1 paper contains information of interest and a Type 2 paper does not. Tools similar to what we study in this paper have been reported in the literature. An example is the TextMine algorithm of [11]. We show that our algorithms yield better results than TextMine. For each PubMed paper, the TextMine algorithm computes the likelihood of the paper containing information on minimotifs and assigns the paper a score accordingly. Papers with a score above a threshold are output for biologists to read manually. TextMine has proven to be a very valuable tool for enhancing the minimotif database of the MnM system [12] [13].
Keywords :
biology computing; data mining; pattern classification; text analysis; PubMed; TextMine algorithm; article searching; biology; computational tool; conference articles; correlation-based algorithm; generic computational technique; journal; minimotif database; paper classification; technical article classification; Accuracy; Biology; Classification algorithms; Correlation; Databases; Text mining; Training; Article Classification; Correlation Coefficient; Data Mining; Minimotif; Text Categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing and Information Technology (ISSPIT), 2011 IEEE International Symposium on
Conference_Location :
Bilbao
Print_ISBN :
978-1-4673-0752-9
Electronic_ISBN :
978-1-4673-0751-2
Type :
conf
DOI :
10.1109/ISSPIT.2011.6151534
Filename :
6151534