Keywords:
Validation, speaking skill, scoring rubric, Rasch, factor analysis, Ferdowsi comprehensive Persian language test
Persian abstract:
Speaking skill makes up a very important part of a person's language ability. Command of this skill is also of great importance in the university environment; however, assessing speaking is not a simple task and faces problems such as difficulty in scoring. This study attempts to examine the validity of the scoring rubric for speaking skill in the comprehensive Persian language test of the International Center of Ferdowsi University of Mashhad. To this end, the results of one of the tests administered at this center were analyzed using the Rasch and factor analysis statistical models. The results showed that scorer reliability is 97 percent. This figure indicates that the scorers have a relatively uniform understanding of the scoring rubric. In addition, in this test the scorers were able to apply the scoring scale appropriately to test takers of different abilities, because the thresholds obtained from the Rasch model rose in a regular ascending order. The test taker-item map also shows that the scoring rubric was able to distinguish weak, intermediate, and strong language learners from one another. Nevertheless, at the top of the test takers' ability range there are eight test takers for whom no score matching their ability level is seen; that is, the scoring rubric was not effective in distinguishing them. On the other hand, the factor loadings obtained for the three constructs of delivery, language use, and topic development were 76, 78, and 74 percent, respectively. This shows that dividing speaking ability into the three factors mentioned is appropriate and accurate, and that each of these constructs measures a different ability. Among them, language use had the highest factor loading and topic development the lowest.
English abstract:
The ability to speak is an important part of everybody's language
proficiency. This ability plays an important role in the academic life of
students. However, scoring and assessing speaking is not easy. In this research, we
study the validity of the speaking scoring rubric in Ferdowsi University's Persian proficiency test.
Every test has a certain amount of error; but in scoring speaking
ability, if the scoring rubric is designed in a scientific way, the score attributed to
the test takers' speaking ability is likely to be very close to their actual language
ability. In other words, an appropriate scoring rubric can have a significant
effect on reducing the error rate of the test. In norm-referenced tests, this can be
achieved only when test designers can say what scoring constructs they intend to
measure and how successful they are in achieving that goal. Also, it should be
clear whether the scoring scale can distinguish weak, medium, and strong test
takers. On the other hand, in applying the scoring rubric, the level of consensus
among the scorers should be clear. In order to see how successful the scoring
rubric in the Ferdowsi Persian proficiency test is in measuring the test takers'
speaking ability, the authors analyzed the results of one of the proficiency
tests administered at Ferdowsi University with the Rasch model and factor
analysis. The results showed that scorer reliability is 0.97, which is very high. It
showed that scorers have the same understanding of the scoring rubric. This
means that the scorers have given the test takers a relatively stable score,
which is a strong point for the test. Also, the scorers have used the scoring
rubric properly, because the thresholds go up in an orderly way as the
ability of test takers increases. Each of the four thresholds obtained from the
Rasch statistical model differs from the previous one by approximately 5 degrees. A regular increase in thresholds is commensurate with the ability of the test
takers. This indicates that the scorers correctly understand the 5 grades
specified in the scoring rubric; in other words, scorers have a good
understanding of the level of competence of test takers and its relationship
with the grades in the scoring rubric. The Wright map shows that the scoring
rubric can differentiate basic, intermediate and advanced test-takers well.
However, at the top of the map there are 8 test takers for whom there is no
matching score, which means the rubric needs higher score levels to distinguish them. On the other
hand, the factor loadings for the three constructs, delivery, language use and topic
development, are 0.76, 0.78 and 0.74, respectively. This shows that dividing speaking
ability into these three constructs is appropriate, while language use has the
highest factor loading and topic development has the lowest.
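
The abstract treats an orderly rise in the Rasch thresholds as evidence that the scorers used the rubric's grades properly. A minimal sketch of that check follows, assuming the four thresholds are available as plain numbers. The function names and the example values are hypothetical illustrations, not figures or code from the study.

```python
def thresholds_are_ordered(thresholds):
    """Return True if every threshold is strictly higher than the one before it."""
    return all(lower < upper for lower, upper in zip(thresholds, thresholds[1:]))


def threshold_gaps(thresholds):
    """Return the distance between each pair of adjacent thresholds."""
    return [upper - lower for lower, upper in zip(thresholds, thresholds[1:])]


# Hypothetical thresholds for a 5-grade rubric (4 thresholds), spaced about
# 5 units apart as the abstract describes. These are NOT the study's estimates.
example_thresholds = [-7.5, -2.5, 2.5, 7.5]

print(thresholds_are_ordered(example_thresholds))
print(threshold_gaps(example_thresholds))
```

If the thresholds were out of order, for example `[1, 3, 2, 4]`, the first function would return `False`. That would suggest that at least one grade in the rubric is not being applied consistently along the ability scale.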