Title :
Metadata Extraction from Semi-structured Email Documents
Author :
Sharma, Ashok ; Chaudhary, B.D. ; Gore, M.M.
Author_Institution :
Satyam Comput. Services Ltd., Hyderabad
fDate :
July 27 2008-Aug. 1 2008
Abstract :
This paper presents a technique for extracting metadata from email documents. Emails are characterized by keywords extracted from the body of the mail using frequency, average similarity, and term discrimination value measures. The email metadata is defined as a document type definition (DTD) in extensible markup language (XML) that captures both the structure and the content-characterizing keywords with their attributes (weights). A Perl application has been designed and implemented to extract the keywords with their attributes (weights) and generate an XML document for the email metadata. Practical applications of the metadata extraction technique are also discussed briefly.
Keywords :
Perl; XML; electronic mail; meta data; PERL application; XML document; average similarity; document type definition; email metadata; extensible markup language; metadata extraction; semi-structured email documents; term discrimination value measures; Content management; Data mining; Frequency measurement; Indexing; Information technology; Postal services; Spine; Text analysis; DTD; Term discrimination values; email; index words; metadata
Conference_Titel :
Computing in the Global Information Technology, 2008. ICCGI '08. The Third International Multi-Conference on
Conference_Location :
Athens
Print_ISBN :
978-0-7695-3275-2
Electronic_ISBN :
978-0-7695-3275-2
DOI :
10.1109/ICCGI.2008.52