Title :
A novel approach to compute pattern history for trend analysis
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Asia Univ., Taichung, Taiwan
Abstract :
Observing the history of a pattern in a retrospective corpus lets one sense the trends related to that pattern efficiently, where a pattern history is defined as the frequency distribution of that pattern over time. Pattern histories can provide information analysts with valuable information and clues for trend analysis. In this study, a pattern can be a single token or a sequence of words. To extract significant patterns from a large amount of text while computing the corresponding pattern histories, a scalable external-memory approach based on bucket-like suffix sorting and push-pop stack operations is proposed. To highlight the scalability and robustness of this approach, the experimental data consisted of 3,225,549 articles (about 4 GB) downloaded from PubMed, covering the 20 years from 1990 to 2009; the total computation time for the pattern histories was about 48 hours on a single PC. Experimental results showed that specific pattern histories did reveal the variations of some events and gave hints for trend analysis.
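To make the definition of a pattern history concrete, the following is a minimal Python sketch that counts how often each word sequence occurs per year. It is a naive in-memory count and not the external-memory, bucket-like suffix sorting approach the abstract proposes; the function name `pattern_histories`, the `(year, text)` input format, the `max_len` parameter, and the sample sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def pattern_histories(docs, max_len=3):
    """Map each pattern (a tuple of words) to a Counter of year -> frequency.

    docs is an iterable of (year, text) pairs; this input format is an
    assumption for illustration, not something specified by the abstract.
    """
    histories = defaultdict(Counter)
    for year, text in docs:
        tokens = text.lower().split()
        # Count every word sequence of length 1..max_len in this document.
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                histories[tuple(tokens[i:i + n])][year] += 1
    return histories


# Hypothetical toy corpus, not data from the study.
docs = [
    (1990, "lung cancer screening"),
    (1991, "lung cancer therapy and lung cancer screening"),
]
hist = pattern_histories(docs)
print(sorted(hist[("lung", "cancer")].items()))  # [(1990, 1), (1991, 2)]
```

Holding every n-gram in memory like this would not scale to a corpus of several gigabytes, which is presumably why the abstract turns to an external-memory method.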
Keywords :
pattern classification; sorting; text analysis; word processing; bucket-like suffix sorting; frequency distribution; information analyst; pattern extraction; pattern history; push-pop stack operation; retrospective corpus; scalability; text analysis; trend analysis; word sequence; Bioinformatics; Cancer; History; Lungs; Sorting; Time frequency analysis; USA Councils;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-180-9
DOI :
10.1109/FSKD.2011.6019799