Title :
HT2X[ML]: An HTML converter
Author :
Baghdadi, Hossein Shahsavand ; Ranaivo-Malançon, Bali
Author_Institution :
Fac. of Inf. Technol., Multimedia Univ., Cyberjaya, Malaysia
Abstract :
Capturing specific data from an HTML file and encapsulating it so that other tools can use it is a significant challenge in web mining. This paper introduces HT2X[ML], a tool that extracts information from HTML files, either automatically or according to user-defined customization, and converts it into well-formed XML and plain text. The output is suitable for use by other tools for any purpose.
Keywords :
XML; data encapsulation; data mining; hypermedia markup languages; text analysis; HT2X[ML]; HTML converter; automatic way; data capturing; information extract; plain text format; user-customized way; web mining; well-formed XML; Converters; Graphical user interfaces; HTML; Web pages; Plain Text
Conference_Title :
Electronics and Information Engineering (ICEIE), 2010 International Conference On
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-7679-4
Electronic_ISBN :
978-1-4244-7681-7
DOI :
10.1109/ICEIE.2010.5559899