• DocumentCode
    566745
  • Title
    Mining bilingual linguistic patterns with aligned and parsed bilingual corpus
  • Author
    Wang, Bo ; Meng, Fanqi ; Hou, Yuexian
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
  • Volume
    1
  • fYear
    2012
  • fDate
    26-28 June 2012
  • Firstpage
    123
  • Lastpage
    127
  • Abstract
    Classical grammar for natural languages, as defined in linguistics, is widely used in many natural language processing (NLP) tasks, such as information extraction, machine translation, and parsing. Classical grammar is well defined, but it is context-free and does not include complex patterns that contain multiple linguistic units. There are also many simple patterns that are not included in classical grammar but are useful in NLP tasks. Recognizing special linguistic patterns in natural language is therefore an important step in various NLP systems. We propose an unsupervised method to automatically discover complex monolingual linguistic patterns from a classically parsed and aligned bilingual corpus, in which all the patterns in one language are qualified by the other, parallel language. A specialized and efficient algorithm is applied to mine the frequent bilingual subtrees in the forest, and the subtrees found are formalized as linguistic patterns.
  • Keywords
    data mining; grammars; linguistics; natural language processing; program compilers; trees (mathematics); NLP systems; aligned bilingual corpus; bilingual linguistic pattern mining; classical grammar; complex monolingual linguistic patterns; context free; forest subtrees; found subtrees; frequent bilingual subtrees; information extraction; machine translation; natural languages processing; parallel language; parsed bilingual corpus; unsupervised method; Data mining; Pragmatics; alignment; linguistic patterns; parsing; subtree mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Digital Content Technology (ICIDT), 2012 8th International Conference on
  • Conference_Location
    Jeju
  • Print_ISBN
    978-1-4673-1288-2
  • Type
    conf
  • Filename
    6269240