Battauz, M. (2015). equateIRT: An R package for IRT test equating. Journal of Statistical Software, 68(7), 1–22. https://doi.org/10.18637/jss.v068.i07
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison‐Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51. https://doi.org/10.1007/BF02291411
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431–444. https://doi.org/10.1177/014662168200600405
Bock, R. D., Muraki, E., & Pfeiffenberger, W. (1988). Item pool maintenance in the presence of item parameter drift. Journal of Educational Measurement, 25(4), 275–285. https://doi.org/10.1111/j.1745-3984.1988.tb00308.x
Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal of Educational Measurement, 34(3), 197–211. https://doi.org/10.1111/j.1745-3984.1997.tb00515.x
Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Belmont, CA: Brooks/Cole, Cengage Learning.
Cizek, G. J. (2012). Setting performance standards: Foundations, methods, and innovations. New York, NY: Routledge.
Cook, L. L., & Petersen, N. S. (1987). Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances. Applied Psychological Measurement, 11(3), 225–244. https://doi.org/10.1177/014662168701100302
de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: Guilford Press.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Every Student Succeeds Act of 2015, Public Law No. 114–95 (2015).
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22(3), 144–149. https://doi.org/10.4992/psycholres1954.22.144
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. New York, NY: Springer.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Han, K. T., Wells, C. S., & Sireci, S. G. (2012). The impact of multidirectional item parameter drift on IRT scaling coefficients and proficiency estimates. Applied Measurement in Education, 25(2), 97–117. https://doi.org/10.1080/08957347.2012.660000
Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common‐item equating design. Applied Psychological Measurement, 26(1), 3–24. https://doi.org/10.1177/0146621602026001001
Hogg, R. V., McKean, J. W., & Craig, A. T. (2019). Introduction to mathematical statistics (8th ed.). Boston, MA: Pearson.
Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Washington, DC: American Council on Education, Praeger.
Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Mahwah, NJ: Lawrence Erlbaum Associates.
Huynh, H., & Meyer, P. (2010). Use of robust z in detecting unstable items in item response theory models. Practical Assessment, Research & Evaluation, 15, Article 2. https://doi.org/10.7275/ycx6-e864
Kim, S.‐H., & Cohen, A. S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29(1), 51–66. https://doi.org/10.1111/j.1745-3984.1992.tb00367.x
Kim, S., & Kolen, M. J. (2004). STUIRT: A computer program for scale transformation under unidimensional item response theory models (Version 1.0) [Computer software]. Retrieved from https://education.uiowa.edu/centers/center-advanced-studies-measurement-and-assessment/computer-programs
Kolen, M. J. (2006). Scaling and norming. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 155–186). Westport, CT: American Council on Education, Praeger.
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking (3rd ed.). New York, NY: Springer.
Kolen, M. J., Tong, Y., & Brennan, R. L. (2011). Scoring and scaling educational tests. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 43–58). New York, NY: Springer.
Lee, W.‐C., & Ban, J.‐C. (2009). A comparison of IRT linking procedures. Applied Measurement in Education, 23(1), 23–48. https://doi.org/10.1080/08957340903423537
Linacre, J. M. (2020). A user's guide to WINSTEPS® MINISTEP Rasch‐model computer programs: Program manual 4.5.4. Retrieved from https://www.winsteps.com/a/Winsteps-Manual.pdf
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Lawrence Erlbaum Associates.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison‐Wesley.
Loyd, B. H., & Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17(3), 179–193. https://doi.org/10.1111/j.1745-3984.1980.tb00825.x
Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14(2), 139–160. https://doi.org/10.1111/j.1745-3984.1977.tb00033.x
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
Meyers, J. L., Miller, G. E., & Way, W. D. (2008). Item position and item difficulty change in an IRT‐based common item equating design. Applied Measurement in Education, 22(1), 38–60. https://doi.org/10.1080/08957340802558342
Meyers, J. L., Murphy, S., Goodman, J., & Turhan, A. (2012, April). The impact of item position change on item parameters and common equating results under the 3PL model [Paper presentation]. Annual Meeting of the National Council on Measurement in Education, Vancouver, BC. Retrieved from https://images.pearsonassessments.com/images/tmrs/ImpactofItemPositionChange_NCME.pdf
Miller, G. E., Rotou, O., & Twing, J. S. (2004). Evaluation of the 0.3 logit screening criterion in common item equating. Journal of Applied Measurement, 5(2), 172–177.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. https://doi.org/10.1177/014662169201600206
No Child Left Behind Act of 2001, Public Law No. 107–110 (2002).
Pommerich, M., Nicewander, W. A., & Hanson, B. A. (1999). Estimating average domain scores. Journal of Educational Measurement, 36(3), 199–216. https://doi.org/10.1111/j.1745-3984.1999.tb00554.x
R Core Team. (2020). R: A language and environment for statistical computing (Version 4.0.2) [Computer software]. R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut.
Rupp, A. A., & Zumbo, B. D. (2004). A note on how to quantify and report whether IRT parameter invariance holds: When Pearson correlations are not enough. Educational and Psychological Measurement, 64(4), 588–599. https://doi.org/10.1177/0013164403261051
Rupp, A. A., & Zumbo, B. D. (2006). Understanding parameter invariance in unidimensional IRT models. Educational and Psychological Measurement, 66(1), 63–84. https://doi.org/10.1177/0013164404273942
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometric Monograph No. 17). Psychometric Society. Retrieved from http://www.psychometrika.org/journal/online/MN17.pdf
Sinharay, S., & Holland, P. W. (2007). Is it necessary to make anchor tests mini‐versions of the tests being equated or can some restrictions be relaxed? Journal of Educational Measurement, 44(3), 249–275. https://doi.org/10.1111/j.1745-3984.2007.00037.x
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210. https://doi.org/10.1177/014662168300700208
Sykes, R. C., & Fitzpatrick, A. R. (1992). The stability of IRT b values. Journal of Educational Measurement, 29(3), 201–211. https://doi.org/10.1111/j.1745-3984.1992.tb00373.x
Sykes, R. C., & Ito, K. (1993, April). Item parameter drift in IRT‐based licensure examinations [Paper presentation]. Annual Meeting of the National Council on Measurement in Education, Atlanta, GA. Retrieved from https://files.eric.ed.gov/fulltext/ED359239.pdf
Taherbhai, H., & Seo, D. (2013). The philosophical aspects of IRT equating: Modeling drift to evaluate cohort growth in large‐scale assessments. Educational Measurement: Issues and Practice, 32(1), 2–14. https://doi.org/10.1111/emip.12000
Tatsuno, C. (2006). Kyouiku hyouka no igi, rekishi [Significance and history of educational evaluation]. In C. Tatsuno, T. Ishida, & T. Kitano (Eds.), Kyouiku hyouka jiten (pp. 18–19). Tokyo, Japan: Tosho Bunka Sha.
von Davier, A. A. (Ed.). (2011). Statistical models for test equating, scaling, and linking. New York, NY: Springer.
Weeks, J. P. (2010). plink: An R package for linking mixed‐format tests using IRT‐based methods. Journal of Statistical Software, 35(12), 1–33. https://doi.org/10.18637/jss.v035.i12
Wells, C. S., Hambleton, R. K., Kirkpatrick, R., & Meng, Y. (2014). An examination of two procedures for identifying consequential item parameter drift. Applied Measurement in Education, 27(3), 214–231. https://doi.org/10.1080/08957347.2014.905786
Wells, C. S., Subkoviak, M. J., & Serlin, R. C. (2002). The effect of item parameter drift on examinee ability estimates. Applied Psychological Measurement, 26(1), 77–87. https://doi.org/10.1177/0146621602261005
Wright, B. D. (1968). Sample‐free test calibration and person measurement (ED017810). ERIC. Retrieved from https://files.eric.ed.gov/fulltext/ED017810.pdf