Proposal Type: Individual Paper 
Domain: Assessment and Evaluation 
SIG: Assessment and Evaluation 
Type: Submitted Paper
Equipment: PC and projector
Paper Details
Title: Competency profiles from standard assessments
Abstract

Germany recently decided to establish educational standards for several subject areas and several levels of graduation, with the goal of improving student performance. These standards define educational goals in terms of competencies for each subject area and for three different levels of graduation. This contribution analyses a newly developed standards test for Mathematics in grade ten, in which the standards are organized into “big ideas” and “content related competencies”.



The present contribution asks whether results can be reported simultaneously on the five big ideas and the six content related competencies of the test. The data were collected on a second day of testing following the PISA 2006 assessment in Germany. Using multidimensional IRT modelling for large scale assessment data, alternative student competency profiles of differing complexity are modelled. The paper addresses three questions: what differential information complex competency profiles provide, how reliably results can be reported, and how complex profiles can be while still being reported with sufficient reliability.

Summary

As a consequence of the mediocre German results in international large scale assessments such as TIMSS, PISA and PIRLS, Germany decided to establish educational standards for several subject areas and several levels of graduation, with the goal of improving student performance. These standards define educational goals in terms of competencies for each subject area and level of graduation. For the “medium” graduation level in Mathematics (mittlerer Schulabschluss, usually obtained after grade ten), the standards are organized into “big ideas” and “content related competencies”.


With the PISA 2006 assessment in Germany, a second day of testing was introduced to assess standards related competencies of grade 9 students in Mathematics (N = 12000). The test was developed according to the standards and comprises 313 items in total. Each item was constructed to assess one of five big ideas and one or more of six content related competencies. In addition, the items were constructed at three different levels of difficulty. Each student had two hours of testing time and worked on about 70 items on average. The items were distributed over 29 booklets in order to balance the items presented to each student by big idea, competency and difficulty, and to control for item position effects in the booklets.
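The design figures above imply how much data is available per item. The following is only a rough back-of-the-envelope sketch in Python; it assumes that all 29 booklets were administered equally often and that every booklet holds about 70 items, neither of which is stated in this summary.

```python
# Figures reported in the text.
N_STUDENTS = 12000
N_ITEMS = 313
N_BOOKLETS = 29
ITEMS_PER_BOOKLET = 70  # reported only as an average per student

# Assumption (not from the source): booklets are administered equally
# often and all have the same length.
students_per_booklet = N_STUDENTS / N_BOOKLETS
booklets_per_item = N_BOOKLETS * ITEMS_PER_BOOKLET / N_ITEMS
responses_per_item = N_STUDENTS * ITEMS_PER_BOOKLET / N_ITEMS
share_of_pool_seen = ITEMS_PER_BOOKLET / N_ITEMS

print(f"students per booklet:  {students_per_booklet:.0f}")
print(f"booklets per item:     {booklets_per_item:.1f}")
print(f"responses per item:    {responses_per_item:.0f}")
print(f"share of pool seen:    {share_of_pool_seen:.1%}")
```

Under these assumptions each student sees a bit more than a fifth of the item pool, which is why the analysis has to rely on a model that links responses across booklets.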


The question addressed in this paper is whether and how results can be reported on the big ideas as well as on the content related competencies, for each student or for relevant groups of students. More specifically, it asks how competency profiles can be modelled using item response theory models suitable for large scale assessment studies. What differential information do complex competency profiles provide, how reliable are the reported results, and how complex can profiles be while still being reported with sufficient reliability?


The profiles are modelled with multidimensional IRT models for large scale assessment studies. These models combine the item responses with a population model for the joint distribution of all competencies and other student characteristics such as gender, socio-economic status and parental support.
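One common way to write such a model is sketched below. The exact parameterization used in the paper is not given in this summary, so the Rasch-type response function, the latent regression form of the population model and all symbols are assumptions made for illustration: $X_{ij}$ is the response of student $i$ to item $j$, $\boldsymbol{\theta}_i$ the vector of $D$ competencies, $\mathbf{q}_j$ a 0/1 vector indicating which dimensions item $j$ assesses, $b_j$ the item difficulty, and $\mathbf{y}_i$ the student characteristics.

```latex
% Item response model (assumed Rasch-type form)
P(X_{ij} = 1 \mid \boldsymbol{\theta}_i)
  = \frac{\exp\!\left(\mathbf{q}_j^{\top}\boldsymbol{\theta}_i - b_j\right)}
         {1 + \exp\!\left(\mathbf{q}_j^{\top}\boldsymbol{\theta}_i - b_j\right)}

% Population model (assumed latent regression on student characteristics)
\boldsymbol{\theta}_i \mid \mathbf{y}_i
  \sim \mathcal{N}_D\!\left(\boldsymbol{\Gamma}^{\top}\mathbf{y}_i,\; \boldsymbol{\Sigma}\right)
```

In this notation, the alternative profile models differ only in the number of dimensions $D$ and in the pattern of ones in the vectors $\mathbf{q}_j$.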


Differentiating the five big ideas requires a five dimensional response model in which each item is assigned to exactly one dimension. A model for the six content related competencies assigns some items to several competencies at the same time. A complete model for both aspects, big ideas and content related competencies, would have to define five by six, that is 30, dimensions, one for each combination of big idea and competency. However, with about 70 items per student, values on 30 dimensions cannot be very reliable. Alternative models with reduced complexity are therefore defined and their fit to the data is compared. For example, assuming that the combinations of big ideas and competencies carry no differential information, a model with eleven dimensions, one for each big idea and one for each content related competency, may be specified. Other alternatives assume that some of these combinations do in fact show differences between students whereas other combinations do not. For example, within the big idea “stochastical data” a further differentiation may not be supported by the data. Likewise, the competencies that show differences between students within the big idea “Algebra” need not be the same as those within the big idea “space and shape”. The paper presents results on the empirical fit of the alternative models, on the differences these models uncover between students and groups of students, and on the reliabilities of these differences.
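The dimensional structures described above can be illustrated by building, for each model, the row of an item-by-dimension indicator matrix. This is a minimal Python sketch, not the authors' implementation: the labels `B1`..`B5` and `C1`..`C6`, the model names and the helper functions are hypothetical placeholders introduced here.

```python
from itertools import product

# Hypothetical placeholder labels for the five big ideas and the six
# content related competencies (the summary names only some of them).
BIG_IDEAS = [f"B{i}" for i in range(1, 6)]
COMPETENCIES = [f"C{k}" for k in range(1, 7)]


def dimensions(model):
    """Return the latent dimensions of one model variant."""
    if model == "big_ideas":       # 5 dimensions
        return list(BIG_IDEAS)
    if model == "competencies":    # 6 dimensions
        return list(COMPETENCIES)
    if model == "additive":        # 5 + 6 = 11 dimensions
        return BIG_IDEAS + COMPETENCIES
    if model == "crossed":         # 5 x 6 = 30 dimensions
        return [f"{b}x{c}" for b, c in product(BIG_IDEAS, COMPETENCIES)]
    raise ValueError(f"unknown model: {model}")


def loading_row(model, big_idea, competencies):
    """0/1 indicator row for one item with one big idea and 1+ competencies."""
    if model == "big_ideas":
        active = {big_idea}
    elif model == "competencies":
        active = set(competencies)
    elif model == "additive":
        active = {big_idea, *competencies}
    elif model == "crossed":
        active = {f"{big_idea}x{c}" for c in competencies}
    else:
        raise ValueError(f"unknown model: {model}")
    return [1 if d in active else 0 for d in dimensions(model)]


ITEMS_PER_STUDENT = 70  # approximate average reported in the text

for model in ("big_ideas", "competencies", "additive", "crossed"):
    n_dim = len(dimensions(model))
    # Crude ratio only: it ignores that one item may load on several dimensions.
    print(f"{model:13s} {n_dim:2d} dimensions, "
          f"about {ITEMS_PER_STUDENT / n_dim:.1f} items per dimension")
```

For instance, `loading_row("additive", "B2", ["C1", "C4"])` marks three of the eleven dimensions, while the same item in the `"crossed"` model marks two of thirty. The intermediate models mentioned in the text would correspond to keeping only selected crossed dimensions and collapsing the rest.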



The implications for policy are twofold. From a more general perspective, the level of detail at which standards test results can be reported is crucial for their impact on the process of quality improvement in the educational system. From a more assessment focused perspective, the approach presented here evaluates the standards test developed for mathematics by analysing the discriminative power of the instrument. Both perspectives apply equally to testing standards in other content areas such as science, reading or a foreign language.


Keywords: Assessment of competence; Item response theory (IRT); Large-scale national assessment projects
Authors
Claus H. Carstensen, IPN Kiel, Germany, carstensen@ipn.uni-kiel.de (presenting)
Andreas Frey, IPN Kiel, Germany, frey@ipn.uni-kiel.de
© European Association for Research on Learning and Instruction, 2012 All rights reserved.