Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1–23. https://doi.org/10.1177/0146621697211001
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Andersen, E. B. (1972). The numerical solution of a set of conditional estimation equations. Journal of the Royal Statistical Society: Series B (Methodological), 34(1), 42–54. https://doi.org/10.1111/j.2517-6161.1972.tb00887.x
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561–573. https://doi.org/10.1007/BF02293814
Baker, F. B., & Kim, S.‐H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York, NY: Marcel Dekker.
Barton, M. A., & Lord, F. M. (1981). An upper asymptote for the three‐parameter logistic item‐response model (ETS Research Report Series RR‐81‐20) (pp. 1–8). Princeton, NJ: Educational Testing Service. https://doi.org/10.1002/j.2333-8504.1981.tb01255.x
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed‐effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison‐Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51. https://doi.org/10.1007/BF02291411
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35(2), 179–197. https://doi.org/10.1007/BF02291262
Bock, R. D., & Zimowski, M. F. (1997). Multiple group IRT. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory. Berlin: Springer.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2001). A mixture item response model for multiple‐choice data. Journal of Educational and Behavioral Statistics, 26(4), 381–409. https://doi.org/10.3102/10769986026004381
Borsboom, D., & Markus, K. A. (2013). Truth and evidence in validity theory. Journal of Educational Measurement, 50(1), 110–114. https://doi.org/10.1111/jedm.12006
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061
Cai, L. (2008). A Metropolis‐Hastings Robbins‐Monro algorithm for maximum likelihood nonlinear latent structure analysis with a comprehensive measurement model (Doctoral dissertation). University of North Carolina at Chapel Hill. Carolina Digital Repository. https://doi.org/10.17615/jq8h-4q24
Cai, L. (2010a). A two‐tier full‐information item factor analysis model with applications. Psychometrika, 75(4), 581–612. https://doi.org/10.1007/s11336-010-9178-0
Cai, L. (2010b). High‐dimensional exploratory item factor analysis by a Metropolis‐Hastings Robbins‐Monro algorithm. Psychometrika, 75(1), 33–57. https://doi.org/10.1007/s11336-009-9136-x
Cai, L. (2010c). Metropolis‐Hastings Robbins‐Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335. https://doi.org/10.3102/1076998609353115
Cai, L., Choi, K., Hansen, M., & Harrell, L. (2016). Item response theory. Annual Review of Statistics and Its Application, 3(1), 297–321. https://doi.org/10.1146/annurev-statistics-041715-033702
Cai, L., & Thissen, D. (2014). Modern approaches to parameter estimation in item response theory. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment. New York, NY: Routledge.
Celeux, G., & Diebolt, J. (1992). A stochastic approximation type EM algorithm for the mixture problem. Stochastics and Stochastic Reports, 41(1–2), 119–134. https://doi.org/10.1080/17442509208833797
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Chalmers, R. P. (2015). Extended mixed‐effects item response models with the MH‐RM algorithm. Journal of Educational Measurement, 52(2), 200–222. https://doi.org/10.1111/jedm.12072
Chalmers, R. P. (2016). Generating adaptive and non‐adaptive test interfaces for multidimensional item response theory applications. Journal of Statistical Software, 71(5), 1–38. https://doi.org/10.18637/jss.v071.i05
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555
de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: Guilford Press.
De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York, NY: Springer.
de la Torre, J., Stark, S., & Chernyshenko, O. S. (2006). Markov chain Monte Carlo estimation of item parameters for the generalized graded unfolding model. Applied Psychological Measurement, 30(3), 216–232. https://doi.org/10.1177/0146621605282772
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–22. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Desjardins, C. D., & Bulut, O. (2018). Handbook of educational measurement and psychometrics using R. Boca Raton, FL: CRC Press.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37(6), 359–374. https://doi.org/10.1016/0001-6918(73)90003-6
Fisher, R. A. (1925). Theory of statistical estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 22(5), 700–725. https://doi.org/10.1017/S0305004100009580
Fox, J.‐P. (2005). Multilevel IRT using dichotomous and polytomous response data. British Journal of Mathematical and Statistical Psychology, 58(1), 145–172. https://doi.org/10.1348/000711005X38951
Gibbons, R. D., Bock, R. D., Hedeker, D., Weiss, D. J., Segawa, E., Bhaumik, D. K., … Stover, A. (2007). Full‐information item bifactor analysis of graded response data. Applied Psychological Measurement, 31(1), 4–19. https://doi.org/10.1177/0146621606289485
Gibbons, R. D., & Hedeker, D. R. (1992). Full‐information item bi‐factor analysis. Psychometrika, 57(3), 423–436. https://doi.org/10.1007/BF02295430
Haebara, T. (2011). Ryouteki kenkyuu hou [Quantitative research methods]. Tokyo, Japan: University of Tokyo Press.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. New York, NY: Springer.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hambleton, R. K., van der Linden, W. J., & Wells, C. S. (2010). IRT models for the analysis of polytomously scored data. In M. L. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models. New York, NY: Routledge.
Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality (Unpublished doctoral dissertation). University of Illinois at Urbana‐Champaign. http://hdl.handle.net/2142/87393
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109. https://doi.org/10.2307/2334940
Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
Holzinger, K. J., & Swineford, F. (1937). The bi‐factor method. Psychometrika, 2(1), 41–54. https://doi.org/10.1007/BF02287965
Huang, H.‐Y. (2016). Mixture random‐effect IRT models for controlling extreme response style on rating scales. Frontiers in Psychology, 7, 1706. https://doi.org/10.3389/fpsyg.2016.01706
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2), 183–202. https://doi.org/10.1007/BF02289343
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272. https://doi.org/10.1177/01466210122032064
Kamata, A., Bauer, D. J., & Miyazaki, Y. (2008). Multilevel measurement modeling. In A. A. O'Connell & D. B. McCoach (Eds.), Multilevel modeling of educational data (pp. 345–388). Charlotte, NC: Information Age Publishing.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Washington, DC: American Council on Education, Praeger Publishers.
Kato, K., Yamada, T., & Kawahashi, I. (2014). R ni yoru koumoku han‐nou riron [Item response theory with R]. Tokyo, Japan: Ohmsha.
Kelderman, H. (1996). Multidimensional Rasch models for partial‐credit scoring. Applied Psychological Measurement, 20(2), 155–168. https://doi.org/10.1177/014662169602000205
Kline, R. B. (2016). Principles and practice of structural equation modeling (4th ed.). New York, NY: The Guilford Press.
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking (3rd ed.). New York, NY: Springer.
Lord, F. M. (1952). A theory of test scores (Psychometric monograph no. 7). Psychometric Corporation. Retrieved from http://www.psychometrika.org/journal/online/MN07.pdf
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. New York, NY: Springer.
Mair, P. (2018). Modern psychometrics with R. New York, NY: Springer.
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212. https://doi.org/10.1007/BF02294535
Martin, A. D., Quinn, K. M., & Park, J. H. (2011). MCMCpack: Markov Chain Monte Carlo in R. Journal of Statistical Software, 42(9), 1–21. https://doi.org/10.18637/jss.v042.i09
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
McKinley, R. L., & Reckase, M. D. (1982). The use of the general Rasch model with multidimensional item response data (Research Report ONR 82‐1). American College Testing Program. Retrieved from https://files.eric.ed.gov/fulltext/ED227162.pdf
McKinley, R. L., & Reckase, M. D. (1983). An extension of the two‐parameter logistic model to the multidimensional latent space (Research Report ONR 83‐2). American College Testing Program. Retrieved from https://files.eric.ed.gov/fulltext/ED241581.pdf
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Washington, DC: American Council on Education, Macmillan Publishing Company.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087–1092. https://doi.org/10.1063/1.1699114
Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York, NY: Routledge.
Mislevy, R. J., & Verhelst, N. (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55(2), 195–215. https://doi.org/10.1007/BF02295283
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. https://doi.org/10.1177/014662169201600206
Muraki, E., & Carlson, J. E. (1995). Full‐information factor analysis for polytomous item responses. Applied Psychological Measurement, 19(1), 73–90. https://doi.org/10.1177/014662169501900109
Nering, M. L., & Ostini, R. (Eds.). (2010). Handbook of polytomous item response theory models. New York, NY: Routledge.
Newton, P. E., & Shaw, S. D. (2014). Validity in educational and psychological assessment. London, UK: Sage.
Paek, I., & Cole, K. (2020). Using R for item response theory model applications. New York, NY: Routledge.
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., & R Core Team. (2020). nlme: Linear and nonlinear mixed effects models (Version 3.1‐148) [R package]. Retrieved from https://cran.r-project.org/package=nlme
R Core Team. (2020). R: A language and environment for statistical computing (Computer software) (Version 4.0.2). R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Nielsen & Lydiche.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In J. Neyman (Ed.), Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Contributions to biology and problems of medicine (Vol. 4, pp. 321–333). Berkeley, CA: University of California Press. Retrieved from https://projecteuclid.org/euclid.bsmsp/1200512895
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9(4), 401–412. https://doi.org/10.1177/014662168500900409
Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.
Revelle, W. (2020). psych: Procedures for psychological, psychometric, and personality research (Version 2.0.7) [R package]. Retrieved from https://CRAN.R-project.org/package=psych
Rijmen, F. (2009). Efficient full information maximum likelihood estimation for multidimensional IRT models (ETS research report RR‐09‐03). Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2009.tb02160.x
Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407.
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24(1), 3–32. https://doi.org/10.1177/01466216000241001
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02
Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14(3), 271–282. https://doi.org/10.1177/014662169001400305
Rost, J. (1991). A logistic mixture distribution model for polychotomous item responses. British Journal of Mathematical and Statistical Psychology, 44(1), 75–92. https://doi.org/10.1111/j.2044-8317.1991.tb00951.x
Rost, J., & von Davier, M. (1995). Mixture distribution Rasch models. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 257–268). New York, NY: Springer.
Rupp, A. A., & Templin, J. L. (2008). Unique characteristics of diagnostic classification models: A comprehensive review of the current state‐of‐the‐art. Measurement: Interdisciplinary Research & Perspectives, 6(4), 219–262. https://doi.org/10.1080/15366360802490866
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York, NY: The Guilford Press.
Rupp, A. A., & Zumbo, B. D. (2004). A note on how to quantify and report whether IRT parameter invariance holds: When Pearson correlations are not enough. Educational and Psychological Measurement, 64(4), 588–599. https://doi.org/10.1177/0013164403261051
Rupp, A. A., & Zumbo, B. D. (2006). Understanding parameter invariance in unidimensional IRT models. Educational and Psychological Measurement, 66(1), 63–84. https://doi.org/10.1177/0013164404273942
Rusch, T., Mair, P., & Hatzinger, R. (2018). IRT packages in R. In W. J. van der Linden (Ed.), Handbook of item response theory. Volume three, Applications. Boca Raton, FL: CRC Press.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometric monograph no. 17). Psychometric Society. Retrieved from https://www.psychometricsociety.org/sites/main/files/file-attachments/mn17.pdf
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680. https://doi.org/10.1126/science.103.2684.677
Sympson, J. B. (1978). A model for testing with multidimensional items. In D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference (pp. 82–98). Minneapolis: University of Minnesota.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52(3), 393–408. https://doi.org/10.1007/BF02294363
Tatsuno, C. (2006). Kyouiku hyouka no igi, rekishi [Significance and history of educational evaluation]. In C. Tatsuno, T. Ishida, & T. Kitano (Eds.), Kyouiku hyouka jiten (pp. 18–19). Tokyo, Japan: Tosho Bunka Sha.
Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287–305. https://doi.org/10.1037/1082-989X.11.3.287
Thissen, D., & Cai, L. (2016). Nominal categories models. In W. J. van der Linden (Ed.), Handbook of item response theory. Volume one, Models. Boca Raton, FL: CRC Press.
Thissen, D., Cai, L., & Bock, R. D. (2013). The nominal categories item response model. In M. L. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models. New York, NY: Routledge.
Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51(4), 567–577. https://doi.org/10.1007/BF02295596
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Lawrence Erlbaum Associates.
Tijmstra, J., Bolsinova, M., & Jeon, M. (2018). General mixture item response models with different item response structures: Exposition with an application to Likert scales. Behavior Research Methods, 50(6), 2325–2344. https://doi.org/10.3758/s13428-017-0997-0
Tutz, G. (1990). Sequential item response models with an ordered response. British Journal of Mathematical and Statistical Psychology, 43(1), 39–55. https://doi.org/10.1111/j.2044-8317.1990.tb00925.x
Tutz, G. (2016). Sequential models for ordered responses. In W. J. van der Linden (Ed.), Handbook of item response theory. Volume one, Models (pp. 139–151). Boca Raton, FL: CRC Press.
van der Linden, W. J. (2005). Linear models of optimal test design. New York, NY: Springer.
van der Linden, W. J. (Ed.). (2016a). Handbook of item response theory. Volume one, Models. Boca Raton, FL: CRC Press.
van der Linden, W. J. (Ed.). (2016b). Handbook of item response theory. Volume two, Statistical tools. Boca Raton, FL: CRC Press.
van der Linden, W. J. (Ed.). (2018). Handbook of item response theory. Volume three, Applications. Boca Raton, FL: CRC Press.
van der Linden, W. J., & Glas, C. A. W. (Eds.). (2010). Elements of adaptive testing. New York, NY: Springer.
van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York, NY: Springer.
Verhelst, N. D., Glas, C. A. W., & de Vries, H. H. (1997). A steps model to analyze partial credit. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 123–138). New York, NY: Springer.
von Davier, A. A. (Ed.). (2011). Statistical models for test equating, scaling, and linking. New York, NY: Springer.
von Davier, M., Gonzalez, E., & Mislevy, R. J. (2009). What are plausible values and why are they useful? IERI Monograph Series: Issues and Methodologies in Large‐Scale Assessments, 2, 9–36.
Wainer, H., & Dorans, N. J. (2000). Computerized adaptive testing: A primer (2nd ed.). New York, NY: Routledge.
Wei, G. C. G., & Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85(411), 699–704. https://doi.org/10.1080/01621459.1990.10474930
Whitely, S. E. (1980). Multicomponent latent trait models for ability tests. Psychometrika, 45(4), 479–494. https://doi.org/10.1007/BF02293610
Yan, D., von Davier, A. A., & Lewis, C. (2014). Computerized multistage testing: Theory and applications. Boca Raton, FL: CRC Press.
Yao, L., & Schwarz, R. D. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed‐format tests. Applied Psychological Measurement, 30(6), 469–492. https://doi.org/10.1177/0146621605284537
Yoshida, T., Ishii, H., & Haebara, T. (2012). Syakudo no sakusei, shiyou to datousei no kentou [Construction, use, and validation of psychological scales]. The Annual Report of Educational Psychology in Japan, 51, 213–217. https://doi.org/10.5926/arepj.51.213
Zumbo, B. D., & Chan, E. K. H. (Eds.). (2014). Validity and validation in social, behavioral, and health sciences. New York, NY: Springer.
Zwinderman, A. H. (1991). A generalized Rasch model for manifest predictors. Psychometrika, 56(4), 589–600. https://doi.org/10.1007/BF02294492