Center for Psychometrics and Measurement in Education

 

Psychometric research comprises methods for measuring and evaluating various characteristics of people, including psychological traits, knowledge, competencies, and skills, using statistical techniques.

 

Track №1. Computational Psychometrics: Introduction & Applications

Instructors:

Alina A. von Davier, SVP, ACTNext

Mikhail Yudelson, Senior Research Scientist, ACTNext

Abstract
In this session, we will use lecture, discussions, and software demo to introduce a new area, Computational Psychometrics (CP; von Davier, 2015; 2017), and the best practices in data logging, data mining (DM), visualization, and machine learning (ML) techniques, as well as methods for evaluating results from the analyses of Big Data. The session is designed for researchers with a background in measurement but less experience with data mining or machine learning.

In this workshop we will introduce the basic concepts of computational psychometrics (CP; von Davier, 2015; 2017), focusing on data mining, machine learning, and data visualization with applications in assessment. CP merges data-driven approaches with theoretical (cognitive) models to provide a rigorous framework for the measurement of skills in the presence of Big Data. We will discuss five types of big data in educational assessment: a) ancillary information about the test takers; b) process data from simulations, games, and learning experiences; c) data from collaborative interactions; d) data from multimodal sensors; and e) large data sets from tests with continuous administrations over time.

Learning technology offers rich data for developing novel educational assessments and learning systems. It is also increasingly recognized that technology designed for learning and assessment can be enhanced by reliable measurement of target constructs. Extensions and evaluations of this work that incorporate learning features and complex data need to be considered (Arieli-Attali et al., 2019). Moreover, the data obtained from learning and assessment technologies are typically “complex,” in the sense that they involve sources of statistical dependence that violate the assumptions of conventional psychometric models. These data are also referred to as process data. Complex assessment and learning data have also been hypothesized to involve a variety of “non-cognitive” factors such as motivation and social skills, thereby increasing the dimensionality of the measurement problem and the potential for measurement bias. The question is: how can one measure, predict, and classify test takers’ skills in a simulation or game-based assessment, or in an intelligent tutoring system with feedback loops? How can one build diagnostic and recommendation systems that provide insights at a sufficiently granular level to be useful, yet scale to hundreds of thousands of students in real time? DM and ML tools merged with theoretical (cognitive) models may help answer these questions (Breiman, 2001; O’Connor, 2008).

As mentioned before, there are other types of assessment data that may benefit from DM techniques: for example, test scores and background questionnaire data accumulated over many administrations of a test with a nearly continuous administration mode. In this situation, one research question may be about patterns in the data; another may be whether test scores for specific subpopulations can be predicted as part of quality control efforts.
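
As a toy illustration of the quality-control angle (cf. Lee & von Davier, 2013), the sketch below flags administrations whose mean scale score drifts outside simple Shewhart-style 3-sigma control limits. The data and the choice of limits are illustrative assumptions, not a recommended operational procedure.

```python
import numpy as np

# Toy mean scale scores from consecutive administrations (illustrative data only)
admin_means = np.array([500.2, 501.1, 499.8, 500.5, 498.9, 500.0, 507.4, 500.3])

# Shewhart-style control limits built from the first six "baseline" administrations
baseline = admin_means[:6]
center = baseline.mean()
sigma = baseline.std(ddof=1)
upper, lower = center + 3 * sigma, center - 3 * sigma

# Flag any administration whose mean drifts outside the limits
for i, m in enumerate(admin_means, start=1):
    status = "FLAG" if not (lower <= m <= upper) else "ok"
    print(f"administration {i}: mean = {m:.1f} ({status})")
```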

We will discuss similarities and differences between machine learning and statistical inference (Lee & von Davier, 2013), as well as the components needed to support real-world computational psychometrics, e.g., international data standards (IMS Caliper, CASE) and diagnostic algorithms (Elo, BKT, AFM, PFA, etc.). We will also present examples of research projects and applications in learning and assessment from ACTNext. We will showcase our Recommender & Diagnostic (RAD) engine and a collaborative problem-solving educational game (von Davier et al., 2019), and will discuss the data governance needed for big data (von Davier et al., 2019b), e.g., data lakes and data cubes.
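
To give a concrete flavor of the diagnostic algorithms listed above, here is a minimal sketch of an Elo-style update of a learner's ability and an item's difficulty after a single scored response. It is an illustrative textbook variant, not the algorithm used in the RAD engine; the logistic link and the step size k are assumptions.

```python
import math

def elo_update(theta, beta, correct, k=0.4):
    """One Elo-style update after a scored response.

    theta   -- current learner ability estimate
    beta    -- current item difficulty estimate
    correct -- 1 if the response was correct, 0 otherwise
    k       -- step size (assumed value; tuned in practice)
    """
    # Expected probability of a correct response (logistic link)
    p_expected = 1.0 / (1.0 + math.exp(-(theta - beta)))
    # Move ability up and difficulty down when the learner outperforms expectation
    theta_new = theta + k * (correct - p_expected)
    beta_new = beta - k * (correct - p_expected)
    return theta_new, beta_new

# Example: a learner with ability 0.0 answers a slightly hard item (difficulty 0.5) correctly
theta, beta = elo_update(theta=0.0, beta=0.5, correct=1)
print(round(theta, 3), round(beta, 3))  # ability increases, difficulty decreases
```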

Learning Objectives
1. Discuss different types of data for which DM and ML techniques are useful (process data, learning data, and other big data).
2. Provide participants with recommended practices for logging process data to support analyses.
3. Provide participants with an overview of methods that can be used to determine the reliability and validity of such analyses.
4. Discuss the potential of several (cognitive) models for CP analyses.

Learning Outcomes
1. Be aware of the basic concepts of computational psychometrics and five types of big data in educational assessment.
2. Know the similarities and differences between machine learning and statistical inference.
3. Know methods that can be used to determine the reliability and validity of such analyses.


Prerequisites
1. Arieli-Attali, M., Ward, S., Thomas, J., Deonovic, B., & von Davier, A. A. (2019). The Expanded Evidence-Centered Design (e-ECD) for Learning and Assessment Systems: A Framework for Incorporating Learning Goals and Process within Assessment Design. Frontiers in Psychology. https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00853/full
2. Bergner, Y., & von Davier, A. A. (2018). Process Data in NAEP: Past, Present, and Future. Journal of Educational and Behavioral Statistics, 27. doi:10.3102/1076998618784700
3. Chopade P., Stoeffler K., Khan S., Rosen Y., Swartz S., von Davier A. (2018). Human-Agent Assessment: Interaction and Sub-skills Scoring for Collaborative Problem Solving. In: Penstein Rosé C. et al. (eds) Artificial Intelligence in Education. AIED 2018. Lecture Notes in Computer Science, vol 10948, pp 52-57. Springer, Cham. https://doi.org/10.1007/978-3-319-93846-2_10
4. Breiman, L. (2001). Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199-231. doi:10.1214/ss/1009213726. http://projecteuclid.org/euclid.ss/1009213726
5. O’Connor, B. (2008). Statistics vs. Machine Learning, fight! http://brenocon.com/blog/2008/12/statistics-vs-machine-learning-fight/. Originally published in December 2008; retrieved July 2015.
6. O’Connor, B. (2013). Wasserman on Stats vs ML, and previous comparisons. http://brenocon.com/blog/2013/02/wasserman-on-stats-vs-ml-and-previous-comparisons/. Originally published in February 2013; retrieved July 2015.
7. Lee, Y.-H., & von Davier, A. A. (2013). Monitoring scale scores over time via quality control charts, model-based approaches, and time series techniques. Psychometrika, 78, 557-575.
8. Polyak, S. T., von Davier, A. A., & Peterschmidt, K. (2017). Computational psychometrics for the measurement of collaborative problem solving skills. Frontiers in Psychology, 8, 2029.
9. Polyak, S., Edwards, D., Agrawal, A., Severson, J., Stoeffler, K., Best, A., Cosman, J., MacMillan, I., Dingler, C., Gambrell, J., von Davier, A.A. (December, 2016). Machine Learning Clustering Techniques: Interpreting Game Log Evidence of Collaborative Problem Solving Skills for Middle School Students. Presentation at the Neural Information Processing Systems conference (NIPS 2016), Machine Learning for Education Workshop. Barcelona, Spain.
10. Stoeffler, K., Rosen, Y., Bolsinova, M., & von Davier, A. A. (2018). Gamified Assessment of Collaborative Skills with Chatbots. In C. Penstein Rosé et al. (Eds.), Artificial Intelligence in Education. AIED 2018. Lecture Notes in Computer Science, vol 10948. Springer. https://doi.org/10.1007/978-3-319-93846-2_64
11. Ho, T. K. (2008). In M. Graham, M. Fitzpatrick, & T. McGlynn (Eds.), The National Virtual Observatory: Tools and Techniques for Astronomical Research (ASP Conference Series, Vol. CS-382, pp. 29-36). Astronomical Society of the Pacific.
12. von Davier, A. A. (2015, July). Virtual & collaborative assessments: Examples, implications, and challenges for educational measurement. Invited talk at the Workshop on Machine Learning for Education, International Conference on Machine Learning, Lille, France. http://dsp.rice.edu/ML4Ed_ICML2015
13. von Davier, A. A. (2017). Computational psychometrics in support of collaborative assessments. In A.A. von Davier (Ed.). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement.
14. von Davier, A.A., Chung Wong, P., Yudelson, M., Polyak, S. (2019). The argument for a “data cube” for large-scale psychometric data. Frontiers in Education. https://www.frontiersin.org/articles/10.3389/feduc.2019.00071/full
15. von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S., & Woo, A. (2019). Computational Psychometrics Approach for Holistic Learning and Assessment Systems. Frontiers in Education. https://www.frontiersin.org/articles/10.3389/feduc.2019.00069/full

MOOC:

16. Data Mining with Weka is a 5-week MOOC, first offered in late 2013. See the MOOC site for videos: https://weka.waikato.ac.nz/explorer and http://www.cs.waikato.ac.nz/ml/weka/book.html


Schedule of Track №1

Day 1. Design.

(4 hours)

    1. Assessment and learning systems design.
    2. Frameworks, including ECD and agent-based.
    3. Educational Data Analytics and the questions we want to answer with it.

 ·        Big questions that were answered.
 ·        Questions that are actively researched.
 ·        Stakeholders and why they could care about Educational Data Analytics.
 

Day 2. Data in learning and assessment systems

(4 hours theory)

    1. Types of educational data and their granularity.

 ·        Assessment scores and sub-scores, course grades, badging and competencies in mini-courses, transaction-level process data.
 ·        Skill-level metadata: knowledge/proficiencies/skills/concepts and their granularity.
 ·        Item-level metadata: difficulty, efficacy.
 ·        Behavioral: gaming, affect.

  

    2. Dependencies in the data.

 ·        Across time (learning).
 ·        Across items/tasks (performance assessment).
 ·        Across people (collaborative problem-solving).

   

    3. Typical data.
 ·        Assessment data
 ·        Process data

      
    4. Data collection and standards.

 ·        Known standards: IMS Global’s Caliper and CASE.
 ·        Organizing data collection defensively (see the event-logging sketch below).
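
As an illustration of “defensive” transaction-level logging, the sketch below stores each interaction as a self-describing event with actor, action, object, and timestamp, loosely in the spirit of IMS Caliper. The field names and the validation rule are assumptions made for the example, not the official Caliper schema.

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = ("actor", "action", "object", "event_time")  # assumed minimal schema

def make_event(actor, action, obj, **extensions):
    """Build one transaction-level event record (Caliper-inspired, not the official schema)."""
    return {
        "actor": actor,                      # e.g. a hashed learner id
        "action": action,                    # e.g. "Submitted", "Viewed"
        "object": obj,                       # e.g. an item or task id
        "event_time": datetime.now(timezone.utc).isoformat(),
        "extensions": extensions,            # anything extra, kept separate from core fields
    }

def validate(event):
    """Defensive check before the event enters the data lake."""
    missing = [f for f in REQUIRED_FIELDS if not event.get(f)]
    if missing:
        raise ValueError(f"event is missing required fields: {missing}")
    return event

log_line = json.dumps(validate(make_event("learner:4711", "Submitted", "item:alg-023",
                                          response="B", correct=False)))
print(log_line)
```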

  

Day 3 & 4. Modeling educational data

(4 hours theory, 4 hours hands-on).

 

    1. Types of models and questions they answer. Assessment (formative and summative). Learning. Stratification (equity in assessment and learning). Product improvement.
    2. Assessment models. IRT models (1PL, 2PL, 3PL, MIRT). LLTM.
    3. Models of learning. BKT, AFM/PFA (see the 2PL and BKT sketch after this list).
    4. Other. Elo, Urnings, Mastery model.
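
To make items 2 and 3 concrete, the sketch below codes the 2PL item response function and one Bayesian Knowledge Tracing (BKT) update. All parameter values are made up for illustration; in practice they are estimated from data.

```python
import math

def irt_2pl(theta, a, b):
    """2PL item response function: P(correct | theta) with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One BKT step: posterior P(known) given the response, then the learning transition.

    The slip/guess/learn probabilities are illustrative values, normally fit per skill.
    """
    if correct:
        # P(known | correct) via Bayes' rule
        num = p_know * (1 - p_slip)
        den = num + (1 - p_know) * p_guess
    else:
        num = p_know * p_slip
        den = num + (1 - p_know) * (1 - p_guess)
    posterior = num / den
    # Chance to learn the skill after this practice opportunity
    return posterior + (1 - posterior) * p_learn

print(round(irt_2pl(theta=0.5, a=1.2, b=0.0), 3))      # ~0.65: moderately able learner, average item
print(round(bkt_update(p_know=0.3, correct=True), 3))  # mastery estimate rises after a correct answer
```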

 

Day 5. Model validation.

(2 hours theory, 2 hours hands-on).

 

    1. Modeling metrics. Accuracy, RMSE, AUC, etc.
    2. Content/Item design and quality. Equating.
    3. Statistical validation of model quality. Cross-validating models.

 ·        Cross-validation as a method.
 ·        Stratification
 ·        Cross-validation as a ranking approach (5 random runs of 2-fold cross-validation; see the sketch after this list).
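
A minimal sketch of the “5 random runs of 2-fold cross-validation” ranking idea, using scikit-learn (an assumed tooling choice) to compare two candidate predictors of item correctness by AUC. The simulated responses and the logistic-regression stand-ins for psychometric models are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy response data: ability and item difficulty drive correctness (illustrative only)
n_persons, n_items = 200, 20
theta = rng.normal(size=n_persons)
b = rng.normal(size=n_items)
X = np.column_stack([np.repeat(theta, n_items), np.tile(b, n_persons)])
y = (rng.random(n_persons * n_items) < 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))).astype(int)

candidates = {
    "ability_only": X[:, [0]],          # stand-in for a simpler model
    "ability_plus_difficulty": X,       # stand-in for a richer model
}

# 5 random runs of 2-fold cross-validation; rank models by mean AUC
scores = {name: [] for name in candidates}
for run in range(5):
    cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=run)
    for train, test in cv.split(X, y):
        for name, feats in candidates.items():
            model = LogisticRegression().fit(feats[train], y[train])
            auc = roc_auc_score(y[test], model.predict_proba(feats[test])[:, 1])
            scores[name].append(auc)

for name, aucs in sorted(scores.items(), key=lambda kv: -np.mean(kv[1])):
    print(f"{name}: mean AUC = {np.mean(aucs):.3f}")
```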

 
    4. Qualitative evaluation. Learning curves, parameter loadings (see the learning-curve sketch below).
    5. Designing and selecting the best model.
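
For the qualitative evaluation in item 4, a learning curve plots the error rate against the opportunity count for a skill; a curve that falls with practice is evidence of learning. The sketch below computes such a curve from a toy transaction log; the column names and data are assumptions.

```python
import pandas as pd

# Toy transaction log: one row per scored opportunity on a skill (illustrative data)
log = pd.DataFrame({
    "student": ["s1"] * 5 + ["s2"] * 5,
    "skill":   ["fractions"] * 10,
    "correct": [0, 0, 1, 1, 1, 0, 1, 1, 0, 1],
})

# Opportunity count: how many times the student has practiced the skill so far
log["opportunity"] = log.groupby(["student", "skill"]).cumcount() + 1

# Learning curve: error rate at each opportunity, averaged over students
curve = 1 - log.groupby(["skill", "opportunity"])["correct"].mean()
print(curve.reset_index(name="error_rate"))
```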