HumRRO scientists David Dorsey, Ph.D., Hillary Michaels, Ph.D., and Steve Ferrara, Ph.D., recently published a chapter that tackles an increasingly important issue: establishing a “Validity Argument Roadmap for Automated Scoring.” This chapter is one among many excellent contributions to the new volume, The Routledge International Handbook of Automated Essay Evaluation, edited by Mark D. Shermis and Joshua Wilson.
Given the rise of automated scoring solutions, many if not most of which now use some form of Artificial Intelligence (AI), the chapter offers a practical, overarching perspective on constructing validity arguments that establish whether a given AI-enabled solution does what it is intended to do. While the chapter focuses on automated scoring of tests and assessments, such as essays, the authors see the roadmap as well suited to informing validity research on other AI-enabled tools, such as measures of oral communication and reasoning, across a variety of contexts, including education and the workforce.
The Routledge International Handbook of Automated Essay Evaluation (AEE) is a definitive guide for those working at the intersection of automation, AI, and education. The volume captures the ongoing advancement of AEE, reflecting its application in both large-scale and classroom-based assessments to support teaching and learning.
In the chapter, the authors cover a range of critical issues, including:
- Professional standards and other governing principles and guidelines applicable to validity arguments.
- Current principles and practices in AI, as they relate to validity arguments. These include how AI scoring engines are optimized for specific scoring tasks, the meaning of “fairness” and its place in validity arguments, and important changes in the AI landscape such as explainable and responsible (or principled) AI.
- The call for more principled approaches to constructing validity technical reports.
- Ongoing challenges and areas for future research, including the role of audits and external evaluations, the pace of change and increased complexity in AI, and challenges around fairness and diversity.
Peer reviewers of the chapter offered comments such as the following:
“I believe that it [the chapter] will make an excellent contribution for providing validity information for users of automated scoring. The literature review alone makes an excellent contribution to the field.”
“I am eager for it to be published as I believe it will strengthen the field of educational measurement by providing an excellent example of what is needed to provide validity information about automated scoring.”
“I loved the chapter and your suggestions since they will move the field in a positive direction.”