Abstract:
The CALES corpus (Corpus Archive of Learner English in Sabah/Sarawak) is a text database containing more than 481,000 words of essay writing produced by university undergraduates taking English courses at four higher education institutions in Sarawak and Sabah. Among the project's aims are to use this corpus to identify the most frequently occurring features of the interlanguage of university students in East Malaysian institutions of higher learning, and to relate these features to student variables such as native language background, ethnicity, age, gender and level of English proficiency. Finally, the project aims to compare the interlanguage features found in essays by East Malaysian university students with those found in a comparable sample of essays written by native-speaker students. This paper briefly discusses selected findings concerning the interlanguage of university students in East Malaysia.