Title :
Language-independent information extraction based on formal concept analysis
Author :
Mironczuk, Marcin ; Czerski, Dariusz ; Sydow, Marcin ; Klopotek, Mieczyslaw A.
Author_Institution :
Inst. of Comput. Sci., Warsaw, Poland
Abstract :
This paper proposes the application of Formal Concept Analysis (FCA) to creating character-level information extraction patterns and presents BigGrams, a prototype of a language-independent information extraction system. The main goal of the system is to recognise and extract named entities belonging to particular semantic classes (e.g. cars, actors, pop stars) from semi-structured text (web page documents).
Keywords :
formal concept analysis; information retrieval; text analysis; BigGrams; FCA; Web page documents; character-level information extraction patterns; language-independent information extraction; named entity extraction; named entity recognition; semistructured text; Context; Data mining; Formal concept analysis; HTML; Information retrieval; Lattices; Seals;
Conference_Title :
Informatics and Applications (ICIA), 2013 Second International Conference on
Conference_Location :
Lodz
Print_ISBN :
978-1-4673-5255-0
DOI :
10.1109/ICoIA.2013.6650277