Delphi Study on Standardized Systems to Monitor Student Learning Outcomes in Flanders: Mechanisms for Building Trust and/or Control?

Maarten Penninckx, Amy Quintelier, Jan Vanhoof, Sven De Maeyer, Peter Van Petegem


Several countries have implemented monitoring systems in which students take standardized tests at regular intervals. These tests may serve either a development-oriented goal that supports public trust in schools, or an accountability-oriented goal that increases control. Currently, the Flemish education system has no standardized testing, and the idea of implementing a monitoring system is highly contentious. By means of a Delphi study with policy makers, education specialists, school governors, principals, teachers, and a student representative (n = 24), we identified the characteristics of a monitoring system that would be accepted by different stakeholders. Based on these characteristics, we proposed eight scenarios for future policy development, and each respondent then assessed the desirability of these scenarios. The results show that, in order to gain broad social support, a focus on strengthening trust is preferred over a focus on control, for example by not making test results publicly available. In addition, other key results for the development and implementation of a system to monitor student learning outcomes are discussed.

Keywords

standardized tests; Delphi study; learning outcomes; learning progress; added value; policy scenarios; monitoring system

Full article:

PDF (English)




Journal of the Department of Educational Sciences, Faculty of Arts, Masaryk University.

Executive editors: Klára Šeďová, Roman Švaříček, Zuzana Šalamounová, Martin Sedláček, Karla Brücknerová, Petr Hlaďo.

Editorial board: Milan Pol (chair of the editorial board), Gunnar Berg, Michael Bottery, Hana Cervinkova, Theo van Dellen, Eve Eisenschmidt, Peter Gavora, Yin Cheong Cheng, Miloš Kučera, Adam Lefstein, Sami Lehesvuori, Jan Mareš, Jiří Mareš, Jiří Němec, Angelika Paseka, Jana Poláchová Vašťatková, Milada Rabušicová, Alina Reznitskaya, Michael Schratz, Martin Strouhal, Petr Svojanovský, António Teodoro, Tony Townsend, Anita Trnavčevič, Jan Vanhoof, Arnošt Veselý, Kateřina Vlčková, Eliška Walterová.

The journal publishes four issues per year.

ISSN 1803-7437 (print), ISSN 2336-4521 (online)