  • DocumentCode
    479073
  • Title
    Learning Domain Feature from Text Corpora
  • Author
    Yu, Juan; Dang, Yanzhong
  • Author_Institution
    Inst. of Syst. Eng., Dalian Univ. of Technol., Dalian
  • fYear
    2008
  • fDate
    12-14 Oct. 2008
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    To improve performance in automatic electronic document processing, this paper proposes the concept of a domain feature, defined as the set of terms that can represent the topics of a given domain. It then presents a non-lexicon-based approach for automatically learning domain features from text corpora. The approach combines the length first segment algorithm and the domain feature possibility (DFP) algorithm. The former segments the domain foreground corpora and extracts words and phrases with a satisfactory recall rate, while the latter raises the precision of learning by comparing the statistical properties that domain features exhibit in foreground and background corpora. Experiments show that, given appropriate foreground and background corpora, the approach significantly improves the efficiency of building domain features and yields better results than manual building. The algorithms combined in this approach can be widely applied in other research areas of knowledge management.
  • Keywords
    knowledge management; possibility theory; text analysis; domain feature possibility algorithm; electronic documents processing; length first segment algorithm; nonlexicon-based approach; text corpora; Buildings; Carbon capture and storage; Frequency; Internet; Knowledge management; Ontologies; Statistical distributions; Statistics; Systems engineering and theory; Technology management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Wireless Communications, Networking and Mobile Computing, 2008. WiCOM '08. 4th International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-2107-7
  • Electronic_ISBN
    978-1-4244-2108-4
  • Type
    conf
  • DOI
    10.1109/WiCom.2008.2670
  • Filename
    4680859