  • DocumentCode
    3143166
  • Title
    Extracting Document Semantics for Semantic Header
  • Author
    Wang, Tao; Desai, Bipin C.
  • Author_Institution
    Dept. of Comput. Sci., Concordia Univ., Montreal, Que.
  • fYear
    2006
  • fDate
    38838
  • Firstpage
    1878
  • Lastpage
    1883
  • Abstract
    Accurate indexing and cataloguing of electronic information on the Internet is the foundation for precise retrieval. Most existing search systems, however, tend to generate misses and false hits because they match the specified search terms against the target information resources without considering context. Traditional keyword-based methods for representing the semantics of information items have thus become a major obstacle to high precision. The previously proposed semantic header captures the semantics of information resources while taking into account the logical structure of an information item. Modern search systems may use the contents of the semantic header to help locate an appropriate information item with minimum effort. In this paper, we present a system, called the automatic semantic header generator (ASHG), that generates five key components of the semantic header. Finally, we evaluate the system with two sets of documents and analyze the corresponding results.
  • Keywords
    Internet; information retrieval; text analysis; automatic semantic header generator; document semantic extraction; text categorization; Computer science; Data mining; Indexing; Information analysis; Information resources; Search engines; Web search; Semantics extraction; meta-data structure
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Electrical and Computer Engineering, 2006. CCECE '06. Canadian Conference on
  • Conference_Location
    Ottawa, Ont.
  • Print_ISBN
    1-4244-0038-4
  • Electronic_ISBN
    1-4244-0038-4
  • Type
    conf
  • DOI
    10.1109/CCECE.2006.277719
  • Filename
    4054998