

This page highlights some of my recent work. Please see my CV for a complete list of peer-reviewed publications, professional reports, conference presentations, and, in the near future, preprints.

Student, S. R. (2022). Vertical scales, deceleration, and empirical benchmarks for growth. Educational Researcher. Advance online publication.

Empirical growth benchmarks, as introduced by Hill, Bloom, Black, and Lipsey (2008), are a well-known way to contextualize effect sizes in education research. Past work on these benchmarks, both positive and negative, has largely avoided confronting the role of vertical scales, yet technical issues with vertical scales trouble the use of such benchmarks. This article introduces vertical scales and outlines their role in the creation of empirical benchmarks for growth. I then outline three strands of recent vertical scale research that call into question the grounds for relying on these benchmarks. I conclude with recommendations for researchers looking to contextualize observed effects of educational interventions without confounding their effects with vertical scaling artifacts.

Student, S. R., & Gong, B. (2022). Supporting the interpretive validity of student-level claims in science assessment with tiered claim structures. Educational Measurement: Issues and Practice. Advance online publication.

We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from summative assessments can support. As a solution, we propose tiered claims, which explicitly distinguish between claims about what students have done or can do on test items—which are typically easier to support under current test designs—and claims about what students could do in the broader domain of performances described by the standards, for which novel evidence is likely required. We discuss the positive implications of tiered claims for test construction, validation, and reporting of results.

Student, S. R. (2022). Appraising traditional and purpose-built person fit statistics’ power to detect cheating. Chinese-English Journal of Educational Measurement and Evaluation, 3(1).

Person-fit statistics (PFSs) have been suggested as a tool to detect cheating in large-scale testing, and this study investigates their potential for this application. Most PFSs are equally sensitive to scores that appear spuriously high or spuriously low. Xia and Zheng introduced four PFSs that are meant to be more sensitive to spuriously high scores and therefore may be more appropriate for detecting cheating. Comparing the power of these weighted PFSs against the power of traditional PFSs to detect cheating shows that there is no single best statistic in all or most scenarios, and in most scenarios, most examinees flagged as cheating by person-fit analysis did not cheat. Implications for operational use of PFSs to detect cheating are discussed.
