• DocumentCode
    1850051
  • Title

    Domain-specific keyphrase extraction and near-duplicate article detection based on ontology

  • Author

    Nhon Do ; LongVan Ho

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Inf. Technol. - VNU HCMC, Ho Chi Minh City, Vietnam
  • fYear
    2015
  • fDate
    25-28 Jan. 2015
  • Firstpage
    123
  • Lastpage
    126
  • Abstract
    The significant increase in the number of online newspapers has given web users a vast source of information. However, users find it difficult to manage this content and to check the correctness of articles. In this paper, we introduce algorithms for extracting keyphrases and matching signatures for near-duplicate article detection. Based on an ontology, keyphrases of articles are extracted automatically, and the similarity of two articles is calculated using the extracted keyphrases. The algorithms are applied to Vietnamese online newspapers in the Labor & Employment domain. Experimental results show that our proposed methods are effective.
  • Keywords
    feature extraction; ontologies (artificial intelligence); text analysis; text detection; domain-specific keyphrase extraction; near-duplicate article detection; ontology; Algorithm design and analysis; Data mining; Educational institutions; Employment; Information technology; Ontologies; Semantics; document retrieval system; keyphrase extraction; near-duplicate detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing & Communication Technologies - Research, Innovation, and Vision for the Future (RIVF), 2015 IEEE RIVF International Conference on
  • Conference_Location
    Can Tho
  • Print_ISBN
    978-1-4799-8043-7
  • Type

    conf

  • DOI
    10.1109/RIVF.2015.7049886
  • Filename
    7049886