Building an assessment that authentically measures the depth and breadth of student learning, and that focuses on important competencies rather than isolated, decontextualized academic content knowledge, is challenging. It is especially difficult in an environment where tests must be administered in an hour and scored immediately.
Summative end-of-year assessments and traditionally administered interim assessments are unlikely to be appropriate for this purpose. More and more districts are adopting small “a” accountability systems that promote deeper instruction and assessment among their educators. Districts hold themselves and their educators accountable for students’ performance on these assessments, which are not typically part of the federal accountability system. Students may be assessed on these competencies using portfolios, defenses of learning, projects, performance tasks, or other methods. Ideally, the assessment is a culminating artifact from learning that is important to the student, rather than an isolated event. These assessment programs have considerable potential for making learning relevant for students, parents, and local communities.
Lessons Learned
HumRRO conducted a comprehensive evaluation of an early iteration of competency-based learning and continues to evaluate locally created assessment systems. Here are four lessons we’ve learned:
Lesson #1: Competencies are challenging to define and measure.
The first, and one of the most difficult, questions people ask upon hearing about competency-based education is, “What is a competency?” Like many concepts in education, there is considerable variability in the definition of “competency.” At their core, competencies represent the important things students should be able to do that have relevance across academic content areas. Two competencies common to several systems are “effective communication” and “critical thinking.” These are intentionally broad terms, and they allow students to demonstrate mastery in a wide variety of ways. That very breadth and accessibility, however, make the competencies difficult to standardize sufficiently for effective measurement. But standardization is not the goal.
Competencies must be demonstrated within a context. If the context is narrowly limited to allow for a standardized assessment, we are likely to undermine the very competencies we intend to measure. For example, if we want to measure “critical thinking” in mathematics, we might create a series of multiple-choice items that require students to make a series of decisions before arriving at a final answer to a problem. In doing so, we may supply the “critical thinking” ourselves through the way we construct and sequence the items.
Critical thinking in mathematics often involves breaking a complex problem down into simpler, solvable steps; organizing those steps so they lead logically to a conclusion; resolving errors or missing information discovered while carrying out those steps; drawing a conclusion; and checking the reasonableness of that conclusion. By pre-structuring the problem as a series of test items, we have already done the decomposing and organizing for the student, effectively removing several of these “critical thinking” components.
When competencies work best, they are individualized for students, represent work that is important to students, and give students agency over their own work. That means groups of students are unlikely to complete standardized tasks, leaving us to rely on educators to fairly and reliably judge student work across contexts. When those judgments lead to inferences about students (e.g., whether they should be promoted, graduate, or receive a high or low grade), those inferences must be thoroughly validated. Reliability and validity must be approached differently for these assessment systems than for summative end-of-year assessments, but they are no less important.
Lesson #2: Assessment literacy for educators is vital to the success of local assessment.
Consider the competency “effective communication.” Now imagine two students, equally effective at communicating, one of whom creates a complex data visualization to communicate a challenging mathematical concept, while the other composes a piece of music that effectively communicates a mood or emotion. Both demonstrate the competency, but how would one go about rating or measuring their communication proficiency on a common metric? Each student effectively demonstrated the competency in their own way and should be recognized for having done so.
The challenge for educators is finding a way to rate a complex mathematical data visualization and a musical composition in ways that are accurate and fair to both students. Allowing this level of flexibility for students requires that educators be able to judge competencies across a wide range of contexts, and that they recognize when assistance from other educators, community members, or external experts becomes necessary. Imagine a student who is an effective communicator in a language the teacher does not understand. Reliance on teachers’ expertise and judgment, as well as their willingness to seek outside help, is vital to the success of these highly individualized assessment systems.
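To make the idea of a common metric concrete, here is a minimal sketch in Python. It assumes a hypothetical, context-neutral rubric; the criteria, level names, and numeric rollup are invented for illustration and are not drawn from any actual system.

```python
# Hypothetical sketch: a context-neutral rubric for the "effective
# communication" competency. All criteria and levels are invented.
LEVELS = ["emerging", "developing", "proficient", "exemplary"]

RUBRIC_CRITERIA = [
    "clarity of the central idea",
    "organization for the intended audience",
    "command of the chosen medium",
]

def score_artifact(ratings):
    """Average per-criterion levels onto a common 1-4 scale (one possible rollup)."""
    level_points = {name: i + 1 for i, name in enumerate(LEVELS)}
    return sum(level_points[ratings[c]] for c in RUBRIC_CRITERIA) / len(RUBRIC_CRITERIA)

# The same criteria apply whether the artifact is a data visualization
# explaining a mathematical concept or a composition conveying a mood.
data_visualization = {
    "clarity of the central idea": "exemplary",
    "organization for the intended audience": "proficient",
    "command of the chosen medium": "exemplary",
}
musical_composition = {
    "clarity of the central idea": "proficient",
    "organization for the intended audience": "exemplary",
    "command of the chosen medium": "exemplary",
}
print(f"{score_artifact(data_visualization):.2f}")   # 3.67
print(f"{score_artifact(musical_composition):.2f}")  # 3.67
```

The point of the sketch is that the metric describes the communication, not the medium: two very different artifacts can land at the same place on the common scale.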
Teachers must become adept at making reliable and accurate judgments about student work across contexts. This may require cross-validation of scores (where multiple educators score the same student artifacts and compare scores). It may require considerable professional development, both to effectively assess student work and to facilitate the creation of highly personalized and complex student artifacts. Evaluations of these kinds of systems must address educators’ ability to use varied student artifacts to create useful student assessment data.
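As one illustration of what cross-validating scores can look like, here is a minimal sketch, in Python, of a chance-corrected agreement check (Cohen’s kappa) between two educators who independently scored the same artifacts. The ratings and the four-level scale are invented for illustration.

```python
# Minimal sketch (invented data): two educators independently score the same
# ten artifacts on a hypothetical 1-4 rubric; Cohen's kappa estimates their
# agreement after correcting for chance.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same artifacts."""
    n = len(rater_a)
    # Proportion of artifacts on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater scored independently at their base rates.
    expected = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

teacher_1 = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]  # invented ratings
teacher_2 = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]  # invented ratings
print(f"kappa = {cohens_kappa(teacher_1, teacher_2):.2f}")  # kappa = 0.71
```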
Lesson #3: Assessment for learning represents a new lens for assessment evaluation.
Typical psychometric evaluations are inappropriate for these types of competency-based and/or locally developed systems. The work of students and educators during the creation of the artifacts that provide evidence of student competencies is also the curriculum. The assessment component of these systems is so interwoven with instruction and curriculum that attempts to isolate assessment as a separate construct will likely yield spurious conclusions.
For example, a high school might require a “defense of learning” as one condition for graduation. The defenses are rated pass/fail, and nearly all students pass each year. From a psychometric standpoint, the measure would be considered largely inconsequential: essentially everyone passes, and there is no information beyond that to be gleaned from the ratings. However, because the defense is an assessment for learning, it would be more important for an evaluator to examine the content of the defenses, the competencies they address, the time and effort students dedicate to them, the process by which a student moves from planning the defense topic, to researching and collecting information, to organizing it into a presentation and revising that presentation ahead of the formal defense, and what happens to the artifact (the defense presentation) afterwards. These are much more difficult areas to evaluate than tests that yield scale scores and performance classifications, but they are better aligned with the performance expectations students are likely to encounter after high school.
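A small illustration of the psychometric point, with invented pass rates: when nearly everyone passes, a pass/fail rating has almost no variance and carries almost no information for distinguishing students, which is why an evaluator must look at the process and artifacts instead.

```python
# Illustrative arithmetic (invented pass rates): a pass/fail measure that
# nearly everyone passes has almost no variance or information.
import math

def binary_entropy(p):
    """Shannon entropy (bits) of a pass/fail outcome with pass rate p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for pass_rate in (0.50, 0.90, 0.98):
    variance = pass_rate * (1 - pass_rate)  # variance of a Bernoulli outcome
    print(f"pass rate {pass_rate:.0%}: variance = {variance:.3f}, "
          f"entropy = {binary_entropy(pass_rate):.2f} bits")
# pass rate 50%: variance = 0.250, entropy = 1.00 bits
# pass rate 90%: variance = 0.090, entropy = 0.47 bits
# pass rate 98%: variance = 0.020, entropy = 0.14 bits
```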
Lesson #4: Competency-based assessments require a fundamental shift in educational practice.
In Lesson #2, we addressed educators’ assessment literacy, but appropriate use of these kinds of assessments also requires a very different approach to teaching and learning than most educators have experienced. Personalized learning expands each educator’s per-student workload, so very large classes may be difficult to manage. Students may also pursue tangents beyond the teacher’s personal expertise.
Providing tailored guidance for each student during their learning process, engaging outside expertise when necessary, and helping students find their own answers are all labor-intensive for educators. But if competency-based local assessment systems are to be effective, the instruction and curriculum supporting them must be implemented with fidelity. Implementing such a system requires extensive buy-in, professional development, and a willingness to do things differently than before. And because it is difficult, these changes are unlikely to happen evenly among teachers within a school or district, and the benefits of such a system may not be immediately visible on traditional measures. Students’ standardized test scores may not substantially improve, even after a few years of attempting such a shift in practice. This level of effort is difficult to sustain, so fidelity of implementation and participants’ acceptance of, and commitment to, the system must be addressed in evaluations.
This is the third in a three-part blog series highlighting HumRRO’s experience evaluating state K-12 assessment systems and exploring some of our early lessons learned. The first installment focused on interim assessments and the second on diagnostic/growth assessments.