Performance-Related Pay as an ISA

I’ve just been reading that the government (in the form of the Education Select Committee) is recommending a return to the idea of performance-related pay for teachers. Now, this is interesting, to say the least – and more than a little political. Because, of course, we all know how well a bonus-led culture worked in banking. So I’m going to sublimate my anger and approach this from a scientific point of view. Not just by looking at the data, but by treating it like a GCSE science problem in experimental design.

Background Research

You can find news reports at the Guardian and the Telegraph, among others. It might be an interesting Politics/Media lesson to compare the reporting of this story in different publications, perhaps? The news stories I’ve seen completely fail to mention that this will presumably only apply to schools governed by national agreements, so academies and free schools may not even care. I’m still checking out research (the actual data that governments like to claim backs up their case) but this from the famous Ted Wragg is interesting.

Confounding Factors

It’s not that long ago that the government stopped collecting what we call ‘contextual value-added’ data – where the students’ circumstances, social background etc are taken into account. So if we don’t know about all of these things, how can we account for them? An obvious example is that in some schools and areas it’s much more likely that students will access a tutor. And what about kids whose parents help them out, talk them through homework, share study techniques? Who’s responsible for any improvement?

Subjects overlap too. If I teach a student who’s doing badly in Maths, and this affects their Physics scores, who gets the blame? I’m imagining wars between Maths and Science, between English and Humanities, as teachers accuse each other of causing them problems. Not a pretty image. How are we supposed to work together when we’re also competing? Nobody wants to be at the bottom. Will teachers in one department stop sharing resources with each other?

Measuring the Dependent Variable

Is this going to be based solely on exam results? What about subjects which don’t do an external exam, such as PSHE? The equality or otherwise of subjects is always a huge issue, especially when different types of qualifications are considered. Will it apply to all key stages – what about teachers who only or mainly teach at Key Stage 3, for example?

What happens if one class does ‘well’ (although I’m still not sure how we’ll be able to tell) and another doesn’t? What about when a class is shared between two or more teachers? Or when a teacher is ill or on maternity leave? Do good A-level results matter more or less than good GCSEs? Should absolute scores or percentages matter? For example, if I have 14 students at A2 Physics, 7 of whom achieve an A grade, is this better or worse than, say, Spanish, who have 4 students and 3 A grades?
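One way to put numbers on that Physics versus Spanish comparison is to attach an uncertainty range to each proportion. The sketch below is my own illustration rather than anything proposed by the committee: it uses the standard Wilson score interval at roughly 95% confidence, and it assumes each class can be treated as a random sample of students, which is itself a big assumption.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Figures taken from the example above: A grades out of students entered.
physics = wilson_interval(7, 14)
spanish = wilson_interval(3, 4)

print(f"Physics: 50% A grades, plausible range {physics[0]:.0%} to {physics[1]:.0%}")
print(f"Spanish: 75% A grades, plausible range {spanish[0]:.0%} to {spanish[1]:.0%}")

# If the two ranges overlap, the data cannot separate the two classes.
print("Ranges overlap:", physics[1] > spanish[0] and spanish[1] > physics[0])
```

With classes this small the two ranges overlap heavily, so neither absolute scores nor percentages can say which department did 'better'.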


Many courses rely to at least some extent on teacher-assessed work. Will the existing pressure on teachers to give students the ‘best possible chance’ be increased? Should only externally-assessed work be used for the judgements? In theory this could lead to ethical teachers being penalised when those colleagues who are more ‘supportive’ – and yes, that was sarcastic – benefit personally from the better results of their students.

What about those students who happen to be taught by their Head of Year? How will their level of support vary compared to others? Or the students mentored by members of SMT, who so often seem to get extra chances or have the rules ‘stretched’ for them? Teaching the children of other staff members may suddenly be a bigger perk than before.

And who chooses which teachers get the more promising students? It’s already true in many schools that timetabling causes problems when particular teachers are perceived to get ‘easier’ classes. Sometimes this is unavoidable – imagine two A-level Physics classes which, due to timetabling, are split depending on whether they also study Further Maths. I know which one I’d rather have.


It’s so easy to forget with the rhetoric from politicians, but at a school level the sample sizes are small. Too small, really, for any such judgements to be made on a class-by-class basis. If we drew error bars on the results to account for the confounding factors – many of which we don’t know about, let alone have the ability to control – they would be huge. Yes, we can look at the effects of various interventions on students, and many of us are trying to use this data (see the fantastic work by Geoff Petty for example, the What Works Clearinghouse, and Dr Mark Evans’ Teachitso website). Linking research to educators working in the classroom is surprisingly difficult, though see #SciTeachJC for one such effort.

But the useful data comes from large studies, reviews of many classrooms and many teachers. If I have a class of twenty-five (chance would be a fine thing) then every child’s results make up 4% of the total. How many students in the average classroom will lose a relative during exam season? How many will have health problems? You don’t need many to affect the class results hugely, and these factors are unpredictable. As with decaying atoms, we can predict how many of these events will happen – probably with high accuracy – in any particular cohort. But in any one class the number will vary hugely.
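The decaying-atoms comparison can be made concrete with a little binomial arithmetic. This is only a sketch: the 5% chance that any one student is hit by a disruptive event during exam season is a figure I have made up for illustration, not real data, and it assumes students are affected independently of each other.

```python
from math import comb, sqrt

P_EVENT = 0.05    # ASSUMED chance per student; illustrative only, not real data
CLASS_SIZE = 25   # the class of twenty-five mentioned above

def prob_exactly(k, n, p):
    """Binomial probability that exactly k of n students are affected."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def relative_spread(n, p):
    """Standard deviation of the affected count divided by its expected value."""
    return sqrt(n * p * (1 - p)) / (n * p)

for k in range(4):
    share = prob_exactly(k, CLASS_SIZE, P_EVENT)
    print(f"{k} affected students: {share:.0%} of classes")

print(f"Relative spread, one class of 25: {relative_spread(CLASS_SIZE, P_EVENT):.0%}")
print(f"Relative spread, cohort of 500,000: {relative_spread(500_000, P_EVENT):.1%}")
```

Under these assumptions a bit over a quarter of classes would see no such event at all while others see three or more, yet the national total is predictable to within about one percent. That is the gap between a cohort and a classroom.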


Our results aren’t even very detailed. Grade boundaries change, and we can’t often break results down into more detail than an A or a B. Will it matter if students meet a decimalised target, or does just the grade matter? How many subjects will we need to look at? If it’s just about meeting a boundary, those who get over it will be ignored even more than we’ve already seen with the wonderfully-named ‘C-chasing’ strategy.


Sadly, it seems to me that performance-related pay fails the test according to what we teach our students. It seems a shame that the MPs haven’t done an ISA recently…


A New Exam Board?

We’ve seen a lot of problems with exams recently – just look at last summer’s mistakes in a wide range of exam papers. Today I’ve found that AQA have spent so little time checking that suitable research sources are online that the only good Google results are their own teacher notes, and a primary science investigative cartoon. On top of this, a new specification inevitably means a lack of practice material, which means students and teachers don’t really know what to expect. If you have to explain why this is unfair to non-teachers, perhaps this analogy might help: we wouldn’t expect to have a driving test on the road having only practised in car parks, would we?

I have an idea.

In fact, I have two ideas, neither of which is mine. If we take the ‘backward design’ principle (originated by Wiggins and McTighe, introduced to me by Robin Millar’s work) and combine it with a ‘curated crowdsourced’ model, maybe there’s a way to do a better job. 

Backward Design

My apologies to Robin and other experts if I miss the subtleties – I’m just a classroom teacher with delusions of writing grandeur. Instead of beginning a syllabus with the content that we want to teach, backward design asks what we want students to be able to do at the end – how will they be tested? How will we know if the course was successful or not (or more precisely, how successfully the student has completed it)? If we create assessment tasks that will allow us to differentiate between students – ideally including, but not limited to, written exams – then we can develop a list of what students should learn, which gives us a list of possible learning/teaching activities. As Robin and others point out, ‘teaching to the test’ is only a problem if the test is not fit for purpose. If we produce a realistic, useful test then being prepared for it is a positive thing.


So who better to contribute possible questions than teachers? Imagine a Google form set up by a new exam board; let’s call it CCEB. Exemplar material, based on accepted good practice, shows how to lay out mathematical working. Questions are entered, with a markscheme. Dropdown boxes allow those entering the question to define the marks available, and to choose from key words describing the area(s) of science being tested. Active teachers, retired staff, academics – even students – can all contribute. The contributions are freely given on the basis that the results will be freely available as far as practical, probably via Creative Commons licensing.


When a certain threshold is reached – which, if every science teacher in the UK supplies a single question, won’t take long – the submissions are sorted by category and checked by CCEB staff. Because the questions are being proofread rather than written, it will be quicker and easier. If you have some of the original contributors – determined by random allocation – paid for a day’s work, the submissions can be pre-moderated as well. Mathematical questions can be kept in the same form but with different numbers substituted. A large pool of questions is now complete, ready for the exam, which can be balanced between topics. There will be enough questions, all produced at the same time, for several specimen papers to be made available. With a large enough pool, you could even make all the questions open source, like those for the theory element of the UK driving test.
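To show how little machinery the 'same form, different numbers' idea needs, here is a rough sketch. Everything in it is hypothetical: the Question fields, the speed_question template, the topic names and the build_paper helper are my own inventions for illustration, not anything an exam board actually uses.

```python
import random
from dataclasses import dataclass

@dataclass
class Question:
    topic: str    # key word describing the area of science tested
    text: str
    answer: str   # stands in for a full markscheme
    marks: int

def speed_question(rng):
    """Hypothetical template: same structure, different numbers each time."""
    time_s = rng.choice([4, 5, 8, 10])
    speed = rng.choice([3, 6, 12, 15])
    distance_m = speed * time_s  # chosen so the answer is a whole number
    return Question(
        topic="forces and motion",
        text=f"A trolley travels {distance_m} m in {time_s} s. "
             "Calculate its average speed.",
        answer=f"{speed} m/s",
        marks=2,
    )

def build_paper(pool, per_topic, rng):
    """Pick the same number of questions from every topic in the pool.

    Raises ValueError if any topic has fewer than per_topic questions.
    """
    by_topic = {}
    for question in pool:
        by_topic.setdefault(question.topic, []).append(question)
    paper = []
    for topic in sorted(by_topic):
        paper.extend(rng.sample(by_topic[topic], per_topic))
    return paper

rng = random.Random(2012)  # fixed seed so the same paper can be regenerated
pool = [speed_question(rng) for _ in range(5)]
pool += [
    Question("electricity", f"Placeholder electricity question {i}", "n/a", 1)
    for i in range(5)
]
paper = build_paper(pool, per_topic=2, rng=rng)
for question in paper:
    print(f"[{question.topic}, {question.marks} marks] {question.text}")
```

Changing the seed gives a different but structurally equivalent paper, which is also the mechanism a different-exam-per-student scheme would need.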

One Day 

It’s feasible that in the future, with enough questions available, every student could get a different but equivalent exam, as described in John Barnes’ book Orbital Resonance.

In the meantime, maybe we as science educators can get involved with setting better exams than the ones we complain about. The exam boards could ask for submissions in this way now. The cynic in me thinks that this would make it much harder for them to justify their existence. Maybe they would like to prove me wrong.