Assessment is something we do in all aspects of our lives. We do annual assessments on the roadworthiness of our cars, we have medical check-ups, we examine the suitability of candidates applying for jobs, we determine guilt or innocence in courts, we assess the impact of a piece of legislation, we judge the pros and cons of political parties before we cast our votes. For much of history in Europe we even looked ahead to a ‘Final Judgment’ in which we would be assessed on how well we had lived our lives, an assessment whose result would determine our residence for all eternity in either heaven or hell.
In each case we (or God in the latter example) need to be clear why, what and how we are assessing. In some types of assessment the answers are straightforward. In others they can be complex and contested. Assessment in and of schools is very much in the latter category. Why and how it is done can have huge consequences for good or ill. This is above all true for the lives of those being assessed, though the impact on the responsible adults is also clear when a head teacher’s suicide following a negative school assessment launches a national debate about how assessments are reported, or when an education minister, for whom I had been an adviser, loses his job because of protests by teachers against the introduction of new tests [1].
Three main types of assessment
The three main types of assessment in education, differentiated by their purposes, are (1) diagnostic or formative (‘formative’ meaning ‘designed to inform the way something is learned’), (2) summative, meaning assessment at the end of a phase of learning designed to show what has been learned during that phase, and (3) evaluative. Both diagnostic and summative assessments can also be used to evaluate the effectiveness of the educational programme on which they have been focused. I will look at diagnostic and summative assessment in turn, focusing on their use in schools, drawing where appropriate on my experience over the years in a range of educational roles, largely in England but also in Scotland, France and Switzerland, and indeed on my own experience of being assessed.
I shall not be looking at evaluative assessments used to produce international league tables such as PISA. Their chief use is for political parties either to boost their reputations by showing how they have presided over an improved ranking or to attack their opponents when there has been a deterioration. Governments that need the excuse of a PISA ranking to start changing their educational arrangements have signally failed to make evaluative use of the kinds of assessments that they ought to have domestically, and it is those domestic assessments that will be my theme in this essay. PISA rankings have few direct implications for schools in a world in which, mercifully, education is one of the few areas where the nation state continues to reign supreme.
Diagnostic or formative assessment
In recent years secondary teachers in England have often been encouraged to give priority to curriculum and lesson planning over marking pupils’ work. I find this misguided. I can understand how not having properly thought through one’s teaching plan for part of the curriculum can leave one having to mark pupils’ work that fails to meet the learning objectives one should have been setting. But both good lesson planning and checking on pupils’ responses are needed. If one does not mark pupils’ work on a regular basis one fails to see both collective misunderstandings – a widespread failure among pupils to grasp the meaning of a new concept, for example – and the extent to which each individual is keeping up with the work and may need individual help.
It is, of course, in part a question of resources. The more hours that teachers have to teach the fewer they will have for preparation and assessment. That is why allocation of scarce resources to classroom teachers – rather than to senior management, buildings and equipment – should always have top priority. IT programmes that give feedback to users on spelling, punctuation, factual accuracy and how to structure one’s thoughts may help, but are no substitute for a word or written comment from a teacher with whom one has a human relationship and who knows the stage one is at in one’s learning.
I was fortunate at school, in the subject – history – in which I went on to specialise, to have excellent teachers who not only aroused my interest in the past but pointed me towards a better understanding of the nature of the subject and how to write essays on historical issues. It was the almost complete absence of this when I went to university that was so striking. I was exposed at close hand to some of the most brilliant historians of the day, which was an education in itself, but as for guidance about my own assignments there was nothing.
It was with great satisfaction therefore, many years later, when I became a humanities tutor in Scotland for the UK’s new Open University, that I came across an educational institution with a clear view of what the tutor-student relationship ought to be like and how diagnostic assessment should be done. Many of the students were able adults who, because of family background or poor schooling, had failed to receive a proper education when younger and were coming back into learning for the first time, alongside full-time jobs and family responsibilities. Not all could come to tutorials, and assignments were submitted by post. The instructions to tutors were to make detailed comments on all parts of their work, including the use of English, explain the criteria by which one had given a particular grade and point out explicitly and precisely what they might have done differently to obtain the highest grade. For a tutor it was time-consuming work, and not well paid, but immensely satisfying to see over time the way in which it could lead to a transformation in the quality of a student’s thinking and writing. It was that experience that finally made me realise that individual diagnostic or formative assessment was far and away the most important kind of assessment, the one most capable of having a lasting effect on the quality of a student’s learning.
When I was head of England’s national curriculum and assessment agencies our main assessment responsibility in relation to primary and lower secondary schools was to develop and administer compulsory national summative assessments at the end of the different phases of schooling, at ages 7, 11 and 14. Although we urged schools to ensure, by following the curriculum requirements, that pupils were prepared for these end-of-phase tests, our very clear message was that teaching with the next set of tests in mind should not be schools’ main concern in the years leading up to them, not least because these tests were able only to sample the subject curricula on which they were based and because the results had no implications for individual pupils.
What was most important, we urged, was ongoing diagnostic assessment – informally through questioning pupils or through short tests and written exercises – across all the subjects of the national curriculum, not just the three that were subject to summative assessment. Our teams of subject experts were thus charged with providing guidance showing how diagnostic assessment might be used across all years and phases. The emphasis in this guidance, as the first and most essential step, was on clarifying the main aims and objectives within a subject which were being promoted through its programme of study. There was no point in undertaking assessments until one was clear what the outcomes of the study of this subject were intended to be at different stages in a child’s schooling and within the various units into which the curriculum was being structured.
Assessment, we were emphasising, must therefore follow curriculum and not the other way round. The results of diagnostic assessment would inform any support needed either by the class as a whole or by individual pupils when facing comparable work in the future. As part of the school’s ongoing evaluation of its curricular arrangements they might also help to shape how a particular unit of study would be taught on subsequent occasions.
Diagnostic tests of cognitive ability
One very distinctive form of diagnostic assessment which has been developed over the last forty years and is now widely used involves the administration of tests of cognitive ability across a whole year group within a school. Their purpose is to inform teachers of pupils’ levels of cognitive development at a particular point in time with a view to identifying under-achievement and providing a baseline against which pupil achievement at a later date can be compared.
One of these tests whose use I encouraged in a school where I was a governor was PIPS (Performance Indicators in Primary Schools) [2]. This involves an individual assessment of each child at the beginning of the first year of schooling – at age 4 to 5 – covering both cognitive skills and (through observation over a number of weeks) personal and social development. The results are compared with those from a further assessment at the end of the first year in order to see how much progress has been made, both individually and over the class or year group as a whole. This kind of assessment is probably what, more informally, the best teachers have always done, but like any large workforce teachers vary in their levels of effectiveness and systems like this, as long as they do not become bureaucratic, can be very helpful in improving a school’s performance.
Another such test is MidYis (Middle Years Information System), a test taken digitally by pupils aged 11 to 14 that assesses underlying ability and aptitude, in particular in four areas: mathematics, vocabulary, non-verbal reasoning, and skills such as speed and accuracy. This is an adaptive test in which the computer automatically adjusts the level of questions based on the pupil’s responses. The information provided enables teachers to identify strengths and weaknesses among their pupils, set targets for future attainment and adapt teaching to the range of needs identified.
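To illustrate the general principle of adaptive testing – and only the general principle, since the levels, questions, scoring rule and function names below are invented for the purpose and bear no relation to the actual MidYis algorithm – here is a minimal sketch in Python of a test that moves to harder questions after a correct answer and to easier ones after a mistake.

```python
# Purely illustrative sketch of an adaptive test: step the difficulty up after
# a correct answer and down after an incorrect one. The question bank, levels
# and scoring rule are invented assumptions, not the MidYis algorithm.

QUESTION_BANK = {
    1: ["2 + 3 = ?", "5 - 1 = ?"],            # easiest level
    2: ["12 x 3 = ?", "48 / 6 = ?"],
    3: ["15% of 240 = ?", "3/4 + 1/6 = ?"],   # hardest level
}

def run_adaptive_test(answer_is_correct, num_questions=6):
    """Ask num_questions questions, adjusting the level after each response;
    harder items earn more marks."""
    level, score = 2, 0                        # start in the middle of the range
    for i in range(num_questions):
        question = QUESTION_BANK[level][i % len(QUESTION_BANK[level])]
        if answer_is_correct(question, level):  # True/False from the pupil
            score += level
            level = min(level + 1, max(QUESTION_BANK))
        else:
            level = max(level - 1, min(QUESTION_BANK))
    return score

# Example: a pupil who copes with everything up to level 2 but not beyond.
print(run_adaptive_test(lambda question, level: level <= 2))
```

The pupil's final score and the levels reached give the teacher a profile of strengths and weaknesses rather than a single pass/fail outcome.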
Having baseline information of these kinds, across all of these systems, also allows the measurement of the value added by the school over a designated period of time, both to individuals and to whole cohorts of pupils. I will say more about this later.
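As a purely illustrative aside, the sketch below shows one very simple way such a value-added figure might be computed: predict each pupil's later score from the cohort's baseline scores with a one-variable linear fit, and treat the gap between actual and predicted score as the value added. The figures and the model are my own assumptions; real value-added measures are considerably more sophisticated.

```python
# Illustrative sketch only: "value added" as the gap between a pupil's actual
# later score and the score predicted from the cohort's baseline results.
# Invented data and a deliberately simple one-variable linear model.

from statistics import mean

def value_added(baseline, later):
    """Fit later = a + b * baseline across the cohort and return each pupil's
    residual (actual minus predicted later score)."""
    mb, ml = mean(baseline), mean(later)
    b = (sum((x - mb) * (y - ml) for x, y in zip(baseline, later))
         / sum((x - mb) ** 2 for x in baseline))
    a = ml - b * mb
    return [round(y - (a + b * x), 1) for x, y in zip(baseline, later)]

# Example cohort: scores on entry and scores a year later.
baseline_scores = [95, 100, 105, 110, 120]
later_scores = [97, 104, 103, 115, 124]
print(value_added(baseline_scores, later_scores))  # positive = above prediction
```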
Summative assessment
The main purpose of a summative assessment is to record some of the most important outcomes of learning at a particular point in time. The ones with which people in my own country, England, are familiar are, for example: at age 11, at the end of primary school, at which point when I was a child everyone was assessed to see whether they should go on to a grammar school or to a technical or ‘modern’ school; at 16, the traditional school leaving age, when the first public examinations are taken across a range of subjects; and at 18, when examinations assess the smaller number of subjects in which students have chosen to specialise.
The main function of the assessments at 11 and 18 has traditionally been to select. Currently at 11 selection – by a wholly separate set of tests – only takes place in the small number of English local authorities which still have grammar schools. The statutory assessment for all pupils at 11 in English, mathematics and science is a summative one but done for evaluative and formative purposes. It has no effect on the future of the individual pupil, but shows the primary school how well it has done with the cohort of pupils leaving the school and gives the secondary schools receiving these pupils a baseline assessment to take into account when providing the levels of support and challenge they will need in the next phase of their schooling.
At age 18, in England as in most other European countries, the public examinations determine whether students go on to university, training or employment. As at the other ages, they also provide evaluative data on how successful a school has been, not just with individual students but also in the different subjects and across the school as a whole. This is particularly the case where, as in England, all these results are made public and can be turned into ‘league tables’ by the press and taken into account by parents choosing where to send their children.
Summative assessment: for and against
Summative assessment, especially of the kind that I have just been describing at 16 and 18, has its supporters and its opponents, with few people wanting to abolish it altogether but with deep disagreements – which have been smouldering for many decades – on how much of it one should have and what form it should take. The main argument for the traditional summative assessment, based on timed, written examinations taken by everyone in a country at the same time on the same day and assessed according to tightly defined criteria, is that it has high reliability; that is, it ensures that each candidate is treated in exactly the same way as any other. Its critics argue that it may have high reliability but that it has low validity, in the sense that a short timed examination can only sample a small part of a syllabus that has been studied over a number of years and that trying to assess the outcomes of such study in an artificial and stressful environment misses out on much of what the student might have learned.
The supporters of traditional summative assessment, of whom I am one, would also argue that a certain amount of stress is not a bad thing, that its negative consequences can be exaggerated, and that it is training for an adult life in which hard work, deadlines and concentrated effort are an essential component. Critics are more concerned about students’ mental health. England’s current Secretary of State for Education, for example, following a survey in which many students reported that they found examinations ‘stressful’, has argued for reducing the time spent on them for the sake of young people’s ‘well-being’ [3].
Those who take this view, as well as reducing the length and demands of examinations, often favour greater weight being given to other forms of assessment such as coursework undertaken in school and assessed either by students’ teachers or by external examiners. This, they argue, makes the assessment more valid. Their opponents, such as myself, retort that this is all well and good, but that the main purpose of the examination is selection, and that selection is made less reliable and more unfair the more it becomes possible for what is being assessed to become someone else’s work (not least as a result of the increasing availability of AI). This is an important point: trust in the fairness of an assessment system is a crucial commodity; it varies from one country and education system to another, and over time; and it is highly dependent on the extent to which assessments are perceived to be both reliable and valid, with a lack of reliability being what users fear most [4].
Support for summative assessment through timed examinations is not just confined to the minority of traditional – and often older – teachers who want to cling on to existing arrangements. When France’s minister for national education, François Fillon, decided in 2004 to introduce an element of assessed coursework into the French Baccalaureate, lycée students went on strike and occupied their schools and teachers marched in the streets. Why did they not want some element of coursework? It was not, as far as I could see, because students did not trust their teachers, who already, as employees of the state, mark their examination papers. It was because the idea was felt to undermine the supposedly egalitarian nature of the French education system. Students from privileged backgrounds, it was feared, would be able to get help from home that was denied to others and would be unfairly advantaged. I was working in Switzerland at the time, but was also a member of France’s Haut Conseil de l’évaluation de l’école, an advisory body to M. Fillon, and so had a particular interest in the subject. Listening to endless debates about the subject over many weeks on my car radio as I drove around the different campuses of my school in Switzerland, I only once heard a student – and then very tentatively – say anything in support of the change.
The proposal was eventually withdrawn and Fillon, like most of his immediate predecessors in the role, soon fell (the worst job in the world for any politician who wants to change anything is to be French minister of education). This was the second education minister in ten years to whom I had been an adviser and who had been forced to resign following teacher protests about tests. I can reassure readers that no causal connection can be established between my advisory role and these unfortunate events.
Critics of summative assessment via timed examinations are also sometimes keen to limit its role by developing assessment and reporting arrangements that embrace the whole range of students’ achievements, including all the competences that students develop during their schooling, not just in their academic work but in the wider life of the school and in their extra-curricular activities. During my time as head of England’s school assessment agencies in the 1990s we gave limited support to a major initiative of this kind promoting the use of what was called the National Record of Achievement. It failed to achieve a large take-up, partly perhaps because of the limitations of our support but mainly because employers often preferred to gather the information themselves via their own recruitment tools [5].
[1] The English secretary of state for education, for whom the author worked, resigned in 1994 following a teacher boycott of new national tests for 14-year-olds which the Government had introduced. France’s minister of national education did so too in 2005, as discussed later in this essay. The head teacher sadly took her own life in 2023 following an inspection report, about to be published, which labelled her school as ‘Inadequate’ as a result of administrative failings rather than the quality of the education it provided. This prompted a national outcry and debate about the suitability of such labels in inspection reports and led to changes in how reporting was conducted. BBC: https://www.bbc.co.uk/news/education-67639942 (retrieved 4 September 2025).
[2] PIPS was developed by the University of Durham Centre for Evaluation and Monitoring (CEM) and is now available from Cambridge Assessment, as is the MidYis test mentioned above. There is a similar CEM adaptive computer-based assessment for 14-16 year olds called YELLIS (Year Eleven Information System).
[3] GCSE exams could be cut back to reduce pupil stress. The Telegraph, 18 March 2025. https://www.telegraph.co.uk/news/2025/03/18/gcse-exams-could-be-cut-back-to-reduce-pupil-stress/ (retrieved 5 September 2025).
[4] Nicholas Tate, Maintaining Trust in Public Assessment Systems: An International Perspective. Cambridge Assessment, 2005.
[5] Employers’ Use of the National Record of Achievement, 1997. https://www.employment-studies.co.uk/report-summaries/report-summary-employers%E2%80%99-use-national-record-achievement (retrieved 5 September 2025).