• DocumentCode
    2456883
  • Title
    Automatic User Comment Detection in Flat Internet Fora
  • Author
    Bank, Mathias; Mattes, Michael
  • Author_Institution
    Fac. for Math. & Econ., Univ. of Ulm, Ulm, Germany
  • fYear
    2009
  • fDate
    Aug. 31 2009-Sept. 4 2009
  • Firstpage
    373
  • Lastpage
    377
  • Abstract
    Millions of people use the World Wide Web and publish content online. This user-generated content contains much information relevant not only to marketing but also to companies in general (customer-oriented products), governments (direct democracy), and many others. Analysis of such data is becoming increasingly important. This paper deals with a prerequisite: we propose an algorithm that automatically detects posting structures in flat internet fora in order to extract user comments. The algorithm can handle a wide range of forum systems, including nested structures. The approach first detects the main content section by applying a modified version of the SST algorithm and then detects the posting structure by using several posting properties found in internet fora. It creates XPath expressions for faster data extraction in further steps.
  • Keywords
    Internet; data analysis; information retrieval; SST algorithm; World Wide Web; XPath expressions; automatic posting structures detection; automatic user comment detection; data extraction; flat internet fora; user generated content; Algorithm design and analysis; Data mining; Databases; Expert systems; Mathematics; Publishing; Web pages; Web sites; crawler; extraction; forum; internet community; social media; web 2.0
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Application, 2009. DEXA '09. 20th International Workshop on
  • Conference_Location
    Linz
  • ISSN
    1529-4188
  • Print_ISBN
    978-0-7695-3763-4
  • Type
    conf
  • DOI
    10.1109/DEXA.2009.14
  • Filename
    5337102