Measurable Outcomes

Following a conversation on Twitter about the phonics screening check administered in primary school, I have a few thoughts about how it's relevant to secondary science. First, a little context – especially for colleagues who have only the vaguest idea of what I'm talking about. I should point out that all I know about synthetic phonics comes from glancing at materials online and helping my own kids with reading.

Synthetic Phonics and the Screening Check

This is an approach to teaching reading which relies on breaking words down into parts. These parts, and how they are pronounced, follow rules; admittedly English spelling is less regular than that of many other languages! But the rules are useful enough to be a good stepping stone. So far, so good – that's true of so many models I'm familiar with from the secondary science classroom.

The phonics screen is intended, on the face of it, to check whether individual students can correctly apply these rules to a sequence of words. To ensure they are relying on the process, not their recall of familiar words, nonsense words are included. There are arguments that some students may try to 'correct' those to approximate something they recognise – the same way I automatically read 'int eh' as 'in the' because I know it's one of my characteristic typing mistakes. I'm staying away from those discussions – out of my area of competence! I'm more interested in the results.

Unusual Results

We'd expect most attributes to follow a predictable pattern over a population. Think about height in humans, or hair colour. There are many possibilities but some are more common than others. If the distribution isn't smooth – and I'm sure there are more scientific ways to describe it, but I'm using student language because of familiarity – then any thresholds are interesting by definition. They tell us: something interesting is happening here.
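To make 'smooth' concrete, here's a minimal Python sketch – entirely my own toy model with invented numbers, not real screening data. It simulates scores on a 40-word check for children whose ability varies, then prints the counts around a handful of marks. In a smooth distribution, neighbouring scores have similar counts; no single mark is special.

```python
import random

random.seed(1)

def simulate_scores(n_children=10_000, n_words=40):
    """Toy model: each child has an ability p (chance of reading any
    given word correctly); their score is how many of the 40 words
    they get right. All figures are invented for illustration."""
    scores = []
    for _ in range(n_children):
        p = min(max(random.gauss(0.8, 0.12), 0.0), 1.0)
        scores.append(sum(random.random() < p for _ in range(n_words)))
    return scores

scores = simulate_scores()

# Neighbouring marks should have similar counts - nothing special
# happens at any particular score.
for mark in range(28, 37):
    print(mark, scores.count(mark))
```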

The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka!” but “That’s funny …”

Possibly Isaac Asimov. Or possibly not.

It turns out that with the phonics screen, there is indeed a threshold. And that threshold just so happens to be at the nominal ‘pass mark’. Funny coincidence, huh?

The esteemed Dorothy Bishop, better known to me and many others as @deevybee, has written about this several times. A very useful post from 2012 sums up the issue. I recommend you read that properly – and the follow-up in 2013, which showed the issue continued to be of concern – but I’ve summarised my own opinion below.

[Image: phonics plot 2013 – D. Bishop, used with permission.]

More kids were being given a score of 32 – just passing – than should have been. We can speculate on the reasons for this, but a few leading candidates are fairly obvious (a quick sketch after the list shows how little nudging it takes to produce the spike):

  • teachers don’t want pupils who they ‘know’ are generally good with phonics to fail by one mark on a bad day.
  • teachers ‘pre-test’ students and give extra support to those pupils who are just below the threshold – like C/D revision clubs at GCSE.
  • teachers know that the class results may have an impact on them or the school.
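For what it's worth, here's a rough sketch of how that plays out – again my own toy model with made-up probabilities, not a claim about what any teacher actually did. It takes the same smooth scores as the earlier sketch and assumes some children landing just below the pass mark get nudged up to it.

```python
import random

random.seed(1)

PASS_MARK = 32

def simulate_scores(n_children=10_000, n_words=40):
    # Same toy model as the earlier sketch: ability varies from child
    # to child; score = words read correctly out of 40. Invented numbers.
    scores = []
    for _ in range(n_children):
        p = min(max(random.gauss(0.8, 0.12), 0.0), 1.0)
        scores.append(sum(random.random() < p for _ in range(n_words)))
    return scores

def nudge(scores, nudge_prob=0.6, window=2):
    # Hypothetical mechanism: a teacher 'finds' the extra marks for
    # some children landing just below the pass mark (within `window`
    # marks of it). Both parameters are made up for illustration.
    nudged = []
    for s in scores:
        if PASS_MARK - window <= s < PASS_MARK and random.random() < nudge_prob:
            s = PASS_MARK
        nudged.append(s)
    return nudged

true_scores = simulate_scores()
reported = nudge(true_scores)

print("mark  true  reported")
for mark in range(28, 37):
    print(f"{mark:4d} {true_scores.count(mark):5d} {reported.count(mark):9d}")
```

The signature is a dip in the marks just below 32 and a pile-up exactly on it – the shape in Bishop's plot. No individual has to do anything dramatic for it to show up in national data.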

This last one is the issue I want to focus on. If the class or school results are used in any kind of judgement or comparison, inside or outside the school, then it is only sensible to recognise that human nature will come into play. And the pass rate is important. It might be a factor when it comes time for internal roles. It might be relevant to performance management discussions and/or pay progression. (All 1% of it.)

“The teaching of phonics (letters and the sounds they make) has improved since the last inspection and, as a result, pupils’ achievement in the end of Year 1 phonics screening check has gradually risen.”

From an Ofsted report

Would the inspector in that case have been confident that the teaching of phonics had improved if the scores had not risen?

Assessment vs Accountability

The conclusion here is obvious, I think. Most of the assessment we do in school is intended to be used in two ways: formatively or summatively. We want to know what kids know so we can provide the right support for them to take the next step. And we want to know where a kid is compared to some external standard, or to their peers.

Both of those have their place, of course. Effectively, we can think of them as tools for diagnosis. In some cases, literally that: I had a student whose written work varied greatly depending on where he sat. His writing was good, but words were spelt phonetically (or fonetically) if he sat anywhere other than the first two rows. It turned out he was short-sighted and needed glasses. The phonics screen is, or was, intended to flag up those students who might need extra support; further testing would then, I assume, identify the reason for their difficulty and suggest routes for improvement.

If the scores are also being used as an accountability measure, then there is pressure on teachers to minimise failure among their students. (This is not just seen in teaching; an example I'm familiar with is ambulance response times, which I first read about in Blastland and Dilnot's The Tiger That Isn't, though the issues have continued – e.g. this from the Independent.) Ideally, this would mean ensuring a high level of teaching and so high scores. But if a child has an unrecognised problem, it might not matter how well we teach them; they're still going to struggle. It is only the results telling us that – and in some cases, telling parents reluctant to believe it – that let us help them find the individual tactics which work.

And so teachers, reacting in a human way, sabotage the diagnosis of their students so as not to risk problems with accountability. Every time a HoD put on revision classes, every time students were put in for resits because they were below a boundary, every time an ISA graph was handed back to a student with a post-it suggesting a 'change', every time a PSA mysteriously changed from an okay 4 to a full-marks 6, we did this. We may also have wanted the best for 'our' kids, even if they didn't believe it! But think back to when the league tables changed so that BTECs weren't counted any more. Did the kids keep doing them, or did it all change overnight?

And was that change for the kids?

Any test which is high-stakes invites participants to try to influence the results. It's worth remembering that GCSE results are not just high-stakes for the students; they make a big difference to us as teachers, too! We are not neutral in this, and we sometimes need to remind ourselves of that.


With thanks to @oldandrewuk, @deevybee and @tom_hartley for the Twitter discussion which informed and inspired this post. All arguments are mine, not theirs.
