RELIABILITY & VALIDITY
Arthur Hughes, Testing for Language Teachers
"Even without considering the possibility of bias, we have to recognise the need for a common yardstick, which tests provide, in order to make meaningful comparisons."
Sound assessment starts with reliability (the ability to measure consistently across contexts) and validity (the ability to measure exactly what is intended to be measured). As professional educators, we are not creating large-scale, standardized tests; however, it is our job to assess our students fairly and appropriately.
How do your assessments measure up?
Examine the graphic organizer above and complete the following activity to check your understanding!
Decide whether the scenario concerns reliability or validity.
1. Students complete a writing exam that tests their understanding of a class reading. The teacher grades their papers for errors in grammar.
2. Two sections of Beginning Spanish take the same test. Section A takes the test at 10am on Wednesday and Section B takes the test at 1pm on Thursday. The majority of Section A scores between 80% and 100% and the majority of Section B scores between 50% and 70%.
3. A teacher assigns a 50-item multiple-choice exam to their three sections of Advanced Spanish. Sections A and B receive 60 minutes, but Section C receives 30 minutes due to a fire drill.
Think of these two concepts as the two parts of an egg.
Validity is the egg yolk (the content, what's measured)
and reliability is the egg white (it surrounds the content and maintains consistency).
When we combine these two parts, we get consistent, delicious scrambled eggs. In other words, reliability and validity, together, provide consistent and sound assessment.
You're probably already using some pieces of each in your existing language assessments. One scrambled egg at a time (looking at writing, reading, listening, and speaking), we can make quick fixes to your existing methods to create sounder, more consistent assessments for your students. Let's scramble!
Activity Answers: 1) Validity (the exam is meant to measure understanding of the reading, but it is graded on grammar), 2) Reliability (testing conditions differ: morning vs. afternoon, on different days), 3) Reliability (all sections need equal time)