Ofqual’s Absolute Error
In science lessons we teach students about the two main categories of error when taking readings. (And yes, I know that it’s a little more complicated than that.) We teach about random and systematic error.
Random errors are the ones due to inherently changing and unpredictable variables. They give readings which may be above or below the so-called ‘true value’. We can make allowances for them by repeating the reading, keeping all control variables the same, then finding a mean value. The larger the range, the bigger the potential random error – this is now described as the precision of the reading. I sometimes have my students plot this range as an error bar.
A systematic error is an artefact of the measuring system. It is consistent in direction and size (or sometimes proportional to the reading, rather than a fixed offset). A common type is a ‘zero error’, where the measuring device does not start at zero, so every reading is offset from the true value by the same amount. We sometimes calibrate our readings to account for this.
You can consider spelling errors due to sloppy typing as being random, while persistently misspelling a particular word is systematic.
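The distinction above can be sketched in a few lines of code. This is a toy simulation with invented numbers (the true value, the zero error and the noise range are all made up for illustration): repeat readings scatter randomly around an offset value, the range of the repeats indicates the precision, and calibration removes only the systematic part.

```python
import random

random.seed(1)  # reproducible illustration

true_value = 20.0   # hypothetical 'true' length in cm
zero_error = 0.3    # systematic offset: the instrument reads 0.3 cm high

# Five repeat readings: the same systematic offset plus random scatter
readings = [true_value + zero_error + random.uniform(-0.2, 0.2)
            for _ in range(5)]

mean = sum(readings) / len(readings)
spread = max(readings) - min(readings)   # the range indicates precision

print(f"mean reading: {mean:.2f} cm, range: {spread:.2f} cm")

# Calibrating (subtracting the known zero error) removes the systematic
# part; the random scatter, and hence the range, remains
calibrated_mean = mean - zero_error
print(f"calibrated mean: {calibrated_mean:.2f} cm")
```

Averaging more repeats shrinks the effect of the random scatter, but no amount of repetition removes the zero error: only calibration does that.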
So what does this have to do with Ofqual?
The recent issues with the scoring of GCSE English coursework – discussed on Twitter with the hashtag #gcsefiasco – are a good example of errors causing problems. But if we use the scientific approach to errors, it is much harder to blame teachers as Stacey has done.
Coursework is marked by teachers according to a markscheme provided by the exam board. (It’s worth remembering that, apart from multiple choice papers, all external exams are marked in this way too.) An issue with controlled assessments is that teachers are unavoidably familiar with the marking guidelines, so can ensure students gain skills that should help them demonstrate their knowledge. This is, after all, the point of the classroom: to learn how it’s done. To complain that we ‘teach to the test’ is like criticising driving instructors for teaching teenagers how to drive on British roads.
Once the work of all students in a cohort has been marked, the department will spend some time on ‘internal moderation’. This means checking a random sample, making sure everyone has marked in the same way, and to the standard specified by the markscheme. Once the school has committed to the accuracy of the marks, they are sent to the exam board who will specify a new random sample to be remarked externally. If the new scores match those awarded by the school, within a narrow tolerance, then all the scores are accepted. If not, then all will be adjusted, up or down, to correct for a systematic error by the department. There will still be a few random errors – deviations from the ‘correct’ score on specific essays – but these will be fairly rare.
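The moderation logic described above can be sketched as follows. Everything here is assumed for illustration – the sample marks, the tolerance, and the adjustment rule are invented, not the boards’ actual procedure – but it shows the principle: a consistent gap between school and moderator marks is treated as a systematic error and corrected across the whole cohort.

```python
# Invented sample: the school's marks and the external remarks
# for the same randomly selected essays
school_marks    = [38, 42, 35, 47, 40]
moderator_marks = [36, 40, 34, 45, 38]

tolerance = 3  # assumed allowable mean difference before adjustment

diffs = [s - m for s, m in zip(school_marks, moderator_marks)]
mean_diff = sum(diffs) / len(diffs)

if abs(mean_diff) <= tolerance:
    # any remaining deviations on individual essays are random error
    print("marks accepted as submitted")
else:
    # a consistent gap is a systematic error: shift every mark in the cohort
    adjustment = round(mean_diff)
    print(f"all cohort marks adjusted by {adjustment}")
```

Note that the check is on the average difference, not on individual essays: a few random deviations either way are expected and tolerated, while a consistent drift triggers a correction for everyone.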
The exam board then converts the coursework score, using a top secret table, into a percentage of the available marks. You may not need to get everything perfect to get an ‘effective’ 100% on the coursework element of the course. And dropping 2 of 50 on the raw score, as marked by the teachers, may mean more than a 4% decrease after conversion. This table will be different for different papers because some exams are harder than others, but changes should be minimal if we want to be able to compare successive years.
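A toy conversion table makes the point about the non-linear mapping. The real tables are set per paper by the boards and are not public; every number below is invented for illustration only.

```python
# Hypothetical table: raw coursework mark (out of 50) -> percentage
# of the available uniform marks. Invented numbers, not a real table.
conversion = {50: 100, 49: 100, 48: 100, 47: 96, 46: 92, 45: 88}

# An 'effective' 100%: full uniform marks without a perfect raw score
effective_full = conversion[48]
print(effective_full)  # full marks despite dropping 2 raw marks

# Dropping 2 raw marks from 48 to 46 is 4% of the raw total, but in
# this table it costs 8 percentage points after conversion
penalty = conversion[48] - conversion[46]
print(penalty)
```

In this made-up table, a student on 48/50 gets the same converted result as one on 50/50, while the next two raw marks lost cost double their face value – which is exactly why shifting the table between sittings matters so much.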
So what happened last summer?
Students who had gained the same raw score on the same coursework task, which had been marked to the same standard as confirmed by the exam boards during external moderation, were awarded different percentages by the exam boards depending on when the work was sent in. This was after sustained pressure from Ofqual, possibly because using the same boundaries in June as they had in January would have resulted in ‘too many’ higher grades. This was not about a small number of random errors in marking. This was not about a systematic error by some or all schools, because the boards had procedures to identify that. This was about a failure by the exam boards and Ofqual to discreetly fix the results the way they intended to.
It is a basic principle in science that you cannot adjust your results based on what you want or expect them to be. You might be surprised, you might recheck your working, but you can’t change the numbers because of wishful thinking. If there was an error, it was by the exam boards and Ofqual, who showed that they could not specify what work was equivalent to a C grade.
The procedures were followed in schools. The exam boards agreed that the controlled assessments were marked to their own standards. And yet Ofqual still claim that it is the fault of us teachers, who prepared our students so well for the controlled assessment that we are being called cheats.
I’ve blogged before about the weaknesses built into the science ISAs. The exam board and Ofqual are either too busy to read what one teacher has to say – perfectly reasonable – or don’t have an answer. I don’t understand how it is our fault when their system approved what teachers did and how they marked.
So maybe we shouldn’t be marking controlled assessments at all.
PS (This is the cue for the unions to step in. And they won’t. This is why we need one national professional body representing teachers, using evidence rather than political rhetoric.)