• DocumentCode
    553162
  • Title
    A novel approach to compute pattern history for trend analysis
  • Author
    Jing-Doo Wang
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Asia Univ., Taichung, Taiwan
  • Volume
    3
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    1746
  • Lastpage
    1750
  • Abstract
    It is attractive to observe the history of a pattern in a retrospective corpus so that one can efficiently sense the trends related to that pattern, where a pattern history is defined as the frequency distribution of that pattern over time. Pattern histories can provide information analysts with valuable information and clues for trend analysis. In this study, a pattern can be a token or a sequence of words. To extract significant patterns from a large amount of text while computing the corresponding pattern histories, a scalable external-memory approach based on bucket-like suffix sorting and push-pop stack operations is proposed. To highlight the scalability and robustness of this approach, the experimental data consisted of 3,225,549 articles (about 4 GB) downloaded from PubMed, covering the 20 years from 1990 to 2009; the total computation time for the pattern histories was about 48 hours using only one PC. Experimental results showed that specific pattern histories did reveal the variations of some events and gave hints for trend analysis.
  • Keywords
    pattern classification; sorting; text analysis; word processing; bucket-like suffix sorting; frequency distribution; information analyst; pattern extraction; pattern history; push-pop stack operation; retrospective corpus; scalability; trend analysis; word sequence; Bioinformatics; Cancer; History; Lungs; Sorting; Time frequency analysis; USA Councils
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-180-9
  • Type
    conf
  • DOI
    10.1109/FSKD.2011.6019799
  • Filename
    6019799
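
The abstract defines a pattern history as the frequency distribution of a pattern (a token or a word sequence) over time. The sketch below illustrates only that definition on a toy corpus. It is a naive in-memory counter, not the bucket-like suffix sorting and push-pop stack method the paper proposes; the function name `pattern_histories`, the `max_len` parameter, whitespace tokenization, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def pattern_histories(docs, max_len=2):
    """Count, per year, every word sequence of up to max_len tokens.

    docs is an iterable of (year, text) pairs (hypothetical input format).
    Returns a dict mapping each pattern to a Counter of {year: frequency}.
    """
    histories = defaultdict(Counter)
    for year, text in docs:
        tokens = text.lower().split()  # assumed tokenization: whitespace
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                pattern = " ".join(tokens[i:i + n])
                histories[pattern][year] += 1
    return histories


# Toy corpus (invented for illustration, not data from the paper).
docs = [
    (1990, "lung cancer screening"),
    (1991, "lung cancer therapy"),
    (1991, "breast cancer therapy"),
]
histories = pattern_histories(docs)

# The history of one pattern is its year-by-year frequency.
for year in sorted(histories["cancer therapy"]):
    print(year, histories["cancer therapy"][year])
```

Holding every pattern in memory like this would likely not scale to a corpus of the size described in the abstract, which is presumably why the paper turns to an external-memory approach.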