DocumentCode
2348686
Title
A method of mining bilingual resources from Web Based on Maximum Frequent Sequential Pattern
Author
Zhang, Guiping ; Luo, Yang ; Ji, Duo
Author_Institution
Knowledge Eng. Res. Center, Shenyang Aerosp. Univ., Shenyang, China
fYear
2010
fDate
21-23 Aug. 2010
Firstpage
1
Lastpage
8
Abstract
Bilingual resources are indispensable in NLP fields such as machine translation. The Internet holds a large amount of electronic text that can serve as a source for large-scale multilingual corpora, so mining genuine bilingual resources from the Web is a feasible approach. This paper proposes a method of mining bilingual resources from the Web based on Maximum Frequent Sequential Pattern. The method uses a heuristic approach to search for and filter candidate bilingual web pages, then mines maximum frequent sequential patterns, and uses a machine learning method to extend the pattern base and to verify bilingual resources according to the Japanese-to-Chinese word proportion. The experimental results indicate that the method extracts bilingual resources efficiently, with a precision rate over 90%.
Keywords
Internet; data mining; language translation; natural language processing; Japanese to Chinese word proportion; NLP fields; bilingual Web pages; bilingual resources mining; machine translation; maximum frequent sequential pattern; multilanguage corpus; Aerospace engineering; Artificial neural networks; Information filters; Knowledge engineering; Bilingual corpus; Maximum Frequent Sequential Pattern; Pattern base; Web mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-6896-6
Type
conf
DOI
10.1109/NLPKE.2010.5587831
Filename
5587831