Title :
Extracting Academic Information from Conference Web Pages
Author :
Wang, Peng ; You, Yue ; Xu, Baowen ; Zhao, Jianyu
Author_Institution :
Sch. of Comput. Sci. & Eng., Southeast Univ., Nanjing, China
Abstract :
Conference Web pages are the main platforms for sharing conference information and organizing conference events. Discovering academic knowledge from such pages, for example to build academic ontologies or social networks, requires extracting the academic information they contain. This paper proposes an approach for doing so. First, Web pages are segmented into text blocks by analyzing their visual features and DOM structure. Then a Bayes network classifies these text blocks into predefined categories, and post-processing improves the quality of the initial classification results. Finally, the academic information is extracted from the classified text blocks. Experimental results on real-world datasets show that the proposed method is effective and efficient for extracting academic information from conference Web pages, achieving on average 90% precision and 89% recall.
Keywords :
Internet; belief networks; data mining; information retrieval; ontologies (artificial intelligence); social networking (online); text analysis; Bayes network; DOM structure; academic information extraction; academic knowledge discovery; academic ontologies; conference Web pages; conference event organization; conference information; social networks; text blocks; visual feature; Algorithm design and analysis; Classification algorithms; Feature extraction; Semantics; Web pages; Web Information Extraction
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4577-2068-0
ISSN :
1082-3409
DOI :
10.1109/ICTAI.2011.164