Title :
Title extraction from Loosely Structured Data Records
Author :
Wu, Yi-pu ; Zhang, Xue-Jie ; Li, Qing ; Chen, Jing
Author_Institution :
Dept. of Comput. Sci. & Eng., Yunnan Univ., Kunming
Abstract :
In this paper, we present a novel method for extracting titles from loosely structured data records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in every data record, we select, from the candidate titles, the one with the longest 'same content' as the accurate title. For a Web page whose title occurs before the first data record, the candidate title with the longest 'different content' is taken as the accurate title. Our experiments demonstrate that this automatic algorithm is robust and effective on two databases collected from the Internet.
Keywords :
Internet; feature extraction; text analysis; Web page; loosely structured data record; title extraction; Computer science; Cybernetics; Data engineering; Data mining; Databases; HTML; Machine learning; Robustness; Web pages; Forum data; Loosely structured data records; Structured data records; Title extraction;
Conference_Title :
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2095-7
Electronic_ISBN :
978-1-4244-2096-4
DOI :
10.1109/ICMLC.2008.4620851