DocumentCode
2456883
Title
Automatic User Comment Detection in Flat Internet Fora
Author
Bank, Mathias ; Mattes, Michael
Author_Institution
Fac. for Math. & Econ., Univ. of Ulm, Ulm, Germany
fYear
2009
fDate
Aug. 31 2009-Sept. 4 2009
Firstpage
373
Lastpage
377
Abstract
Millions of people use the World Wide Web and publish content online. This user-generated content contains much information relevant not only to marketing but also to companies in general (customer-oriented products), governments (direct democracy) and many more. Analysis of such data is becoming increasingly important. This paper deals with a prerequisite: we propose an algorithm to automatically detect posting structures in flat internet fora in order to extract user comments. The algorithm is able to handle a wide range of different fora systems - even nested structures. The approach first detects the main content section by applying a modified version of the SST algorithm and then detects the posting structure by using several posting properties found in internet fora. It creates XPath expressions for faster data extraction in further steps.
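The final step mentioned in the abstract, reusing a generated XPath expression to pull out postings, can be illustrated with a minimal sketch. This is not the authors' implementation: the page markup, the class names, the `POSTING_XPATH` value and the `extract_comments` helper are all hypothetical, and the sketch assumes the page is well-formed XML so that Python's standard `xml.etree.ElementTree` (which supports only a subset of XPath) can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical forum page; markup and class names are invented for illustration.
PAGE = """
<html><body>
  <div id="nav">Home | Forum</div>
  <div id="content">
    <div class="post"><span class="author">alice</span><p>First comment.</p></div>
    <div class="post"><span class="author">bob</span><p>Second comment.</p></div>
  </div>
</body></html>
"""

# Assumed result of the detection step: one expression per forum layout.
POSTING_XPATH = ".//div[@id='content']/div[@class='post']"


def extract_comments(page, posting_xpath):
    """Return (author, text) pairs for every node matched by posting_xpath."""
    root = ET.fromstring(page)
    comments = []
    for post in root.findall(posting_xpath):
        # Paths are relative to each matched posting node.
        author = post.findtext("span[@class='author']", default="")
        text = post.findtext("p", default="")
        comments.append((author.strip(), text.strip()))
    return comments


if __name__ == "__main__":
    for author, text in extract_comments(PAGE, POSTING_XPATH):
        print(author, ":", text)
```

Once such an expression is known for a forum, further pages with the same layout can be processed by evaluating the expression alone, without repeating the structure detection. Real forum pages are rarely well-formed XML, so a tolerant HTML parser would be needed in practice.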
Keywords
Internet; data analysis; information retrieval; SST algorithm; World Wide Web; XPath expressions; automatic posting structures detection; automatic user comment detection; data extraction; flat internet fora; user generated content; Algorithm design and analysis; Data mining; Databases; Expert systems; Mathematics; Publishing; User-generated content; Web pages; Web sites; crawler; extraction; forum; internet community; social media; web 2.0
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Application, 2009. DEXA '09. 20th International Workshop on
Conference_Location
Linz
ISSN
1529-4188
Print_ISBN
978-0-7695-3763-4
Type
conf
DOI
10.1109/DEXA.2009.14
Filename
5337102