A method of mining bilingual resources from Web Based on Maximum Frequent Sequential Pattern

Author

Zhang, Guiping ; Luo, Yang ; Ji, Duo

Author_Institution

Knowledge Eng. Res. Center, Shenyang Aerosp. Univ., Shenyang, China

fYear

2010

fDate

21-23 Aug. 2010

Firstpage

1

Lastpage

8

Abstract

Bilingual resources are indispensable in NLP fields such as machine translation. The Internet holds a large amount of electronic text that can serve as a source for large-scale multilingual corpora, so mining genuine bilingual resources from the Web is a feasible approach. This paper proposes a method of mining bilingual resources from the Web based on Maximum Frequent Sequential Pattern. The method uses a heuristic approach to search for and filter candidate bilingual web pages, then mines maximum frequent sequential patterns, and uses a machine learning method to extend the pattern base and verify bilingual resources according to the Japanese-to-Chinese word proportion. The experimental results indicate that the method extracts bilingual resources efficiently, with a precision rate over 90%.
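
The abstract does not describe the mining step in detail. As a rough illustration only, and not the authors' implementation, the general idea of a maximal frequent sequential pattern can be sketched in Python: a pattern is frequent if it occurs as an ordered subsequence in at least `min_support` sequences, and maximal if no longer frequent pattern contains it. The function names, the token labels (`JA`, `ZH`, punctuation) and the sample data below are all hypothetical.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def maximal_frequent_patterns(
    database: List[Sequence[str]], min_support: int
) -> List[Pattern]:
    """Level-wise search: grow frequent patterns one item at a time,
    then keep only those not contained in a longer frequent pattern."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")

    items = sorted({item for seq in database for item in seq})
    level: List[Pattern] = [
        (item,) for item in items if support((item,), database) >= min_support
    ]
    frequent_items = [p[0] for p in level]

    frequent: List[Pattern] = []
    while level:
        frequent.extend(level)
        next_level: List[Pattern] = []
        for pattern in level:
            for item in frequent_items:
                candidate = pattern + (item,)
                if support(candidate, database) >= min_support:
                    next_level.append(candidate)
        level = next_level

    maximal = [
        p for p in frequent
        if not any(p != q and is_subsequence(p, q) for q in frequent)
    ]
    return sorted(maximal)


if __name__ == "__main__":
    # Hypothetical token sequences: JA = Japanese term, ZH = Chinese term.
    pages = [
        ["JA", "(", "ZH", ")"],
        ["JA", "(", "ZH", ")", "<br>"],
        ["JA", ":", "ZH"],
    ]
    print(maximal_frequent_patterns(pages, min_support=2))
```

This brute-force search recomputes support for every candidate and is meant only to make the definition concrete; a practical miner would use a dedicated sequential pattern algorithm.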

Keywords

Internet; data mining; language translation; natural language processing; Japanese to Chinese word proportion; NLP fields; bilingual Web pages; bilingual resources mining; machine translation; maximum frequent sequential pattern; multilanguage corpus; Aerospace engineering; Artificial neural networks; Information filters; Knowledge engineering; Bilingual corpus; Maximum Frequent Sequential Pattern; Pattern base; Web mining;

fLanguage

English

Publisher

IEEE

Conference_Titel

Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4244-6896-6

Type

conf

DOI

10.1109/NLPKE.2010.5587831

Filename

5587831