October 31, 2023
Educational assessment data is fundamental for improving education quality, a goal emphasized by global initiatives like the Sustainable Development Goals (SDGs)1 and the Education 2030 agenda2. With the roll-out of SDG 4 and Education 2030 objectives, many jurisdictions have been challenged to create accurate indicators for tracking progress. This is where data management in educational assessment comes into play: it allows evidence to be used effectively to shape educational policy.
In this context, monitoring student progress through assessments at pivotal educational stages is especially important because the resulting data contain valuable and sensitive information about educational systems. Reporting and decisions based on these data, which affect stakeholders from students to teachers, invite scrutiny if the data are not managed according to best practices and established protocols.
The need for rigorous data management becomes even more pertinent as technological advancements continuously refine data management practices in the field. Even in global crises, access to data and the ability to analyze it help us understand social problems better and provide evidence-based solutions in social fields such as education3. Therefore, keeping pace with these technological shifts is essential for assessment organizations and governing bodies. Adopting modern tools for data management is crucial for maintaining credibility and effectiveness.
This article aims to describe the current landscape of educational assessment, discuss trends in data management, and elaborate on modern approaches, showcasing best practices in the field.
Historically, assessments were delivered using paper-and-pencil methods, whether in large-scale assessments or classroom settings. However, the need to keep pace with a changing environment, accelerated by the 2020-21 pandemic, has led to a dramatic shift towards delivering assessments digitally. Many leading international test delivery organizations have already moved their assessments and qualifications online, to an extent that varies by assessment and by the jurisdictions in which they operate.
As this digital transformation proceeds, data analysis becomes even more pivotal, serving as the cornerstone of educational assessment. The advanced metrics and analytics enabled by digital platforms provide not only easier access to evaluations of the reliability and validity of assessments but also unprecedented insights into learning outcomes. Against this background, the 2022 guidelines from the International Test Commission and the Association of Test Publishers4 emerged as an initial framework for best practices in technology-based assessments, yet they mark only the beginning.
In the next section, we delve deeper into why data analysis is so critical and how it shapes the landscape of educational assessment today.
Data and evaluation teams face a critical decision when selecting the right tools for data collection and analysis. This decision entails the thoughtful consideration of various factors, including resources, security, accuracy, and the potential operational value within the context of the jurisdiction. Therefore, before delving into the complexities of data analysis, understanding the foundational importance of the methods and platforms used for data collection is vital. The following elements set the stage for accurate and meaningful analysis:
Data Collection Methods: In the digital age, both online and offline methods are valuable for collecting data in educational assessments. While properly designed online platforms offer the advantage of speed and accuracy, offline methods remain critical in settings where some students, in some subjects, have limited connectivity.
Compatibility, Usability, and Integrity: Multi-platform compatibility, meaning support for a range of browsers and operating systems, is essential so that administrators, markers, and candidates can all use the platform with ease. Such compatibility reduces the chance of errors and enhances data integrity, which is crucial in any assessment. Furthermore, platforms with built-in features such as data encryption and multi-factor authentication help minimize data-entry errors and streamline markers' assignments and scheduling, making them more accurate and efficient.
Marking Platforms and Design: Comprehensive marking platforms often come equipped with advanced tools for managing human marking. On one hand, features like ongoing item calibration help maintain consistent marking standards, often aided by seeded exemplar assessments. On the other hand, a poorly designed platform can result in numerous issues, from data-entry errors to complex scheduling procedures. Thus, the user interface should be intuitive and designed with the end-users in mind.
In the operational sense, data analysis conducted by psychometricians and statisticians demands algorithmic thinking to iteratively analyze the available and manageable data, ultimately aligning data analysis with meaningful interpretation. This emphasizes the increasing importance of understanding the intricacies of data storage and management. The following considerations serve as the foundation for effective analysis:
Data Management: Online assessments pose numerous challenges for educational organizations and assessment bodies, including issues related to data storage, sharing, and result transfer. Because of their sheer volume, these datasets often require extensive manual cleaning and preparation, and they can surpass the capabilities of commonly used data management software like Excel or SPSS.
Score Comparability and Disruptions: Thorough documentation, encompassing data sources, samples, methods, and analyses, is key for ensuring score comparability or equivalence, as it directly supports the prevention of data loss. Such documentation should therefore provide detailed information on data collection procedures, sample descriptions, the methods and analyses employed, and any limitations or cautionary notes for interpreting the results. Additionally, data collection systems should rigorously document any technological disruptions, including detailed records of assessment interruptions for each student.
Innovations in Equating and Item Calibration: In educational assessment, supervised machine learning prediction models are increasingly employed to overcome the limitations of conventional item calibration methods. While traditional methods often necessitate large sample sizes for item calibration, machine learning and AI-based natural language processing (NLP) can reduce the need for extensive pre-testing and equating5. Furthermore, these innovations promote score comparability in computer-adaptive assessment systems and enable innovative approaches to assessment design and automated item generation.
Standard Setting Procedure: The standard-setting process is vital for categorizing student performance into defined levels like "below basic" or "proficient." To establish these levels and ensure a quality control process, panels of experts and educators employ various methods, such as Angoff or Bookmark, and conduct periodic reviews, especially when the assessment undergoes changes. In this context, emerging psychometric and machine learning techniques play an essential role in updating and refining standard-setting procedures, ultimately enhancing the accuracy and relevance of performance criteria in education.
Piloting Reports with Stakeholders: Finally, before formal deployment, piloting reports with stakeholders using various data collection methods is necessary to evaluate the reports' accessibility, usability, and overall comprehension. Any necessary revisions should be informed by the feedback received during this stage.
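To make the standard-setting step above more concrete, here is a minimal sketch of how a cut score is commonly derived in the Angoff method: each panelist estimates the probability that a minimally competent student answers each item correctly, the ratings are averaged per item, and the item averages are summed. The function name and the ratings are hypothetical and serve only as an illustration; operational panels typically add several rating rounds and discussion.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return an Angoff-style cut score.

    ratings[j][i] is panelist j's estimated probability that a minimally
    competent student answers item i correctly (hypothetical input format).
    """
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every panelist must rate every item")
    # Average the panelists' ratings item by item, then sum across items.
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]
    return sum(item_means)


# Hypothetical ratings: 3 panelists x 4 items.
ratings = [
    [0.6, 0.8, 0.4, 0.7],
    [0.5, 0.9, 0.5, 0.6],
    [0.7, 0.7, 0.3, 0.8],
]
cut = angoff_cut_score(ratings)
print(f"Cut score: {cut:.2f} out of {len(ratings[0])} points")
```

With these illustrative ratings the cut score comes to about 2.5 raw-score points out of 4.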
The first evidence of assessment validity is gathered during the design phase, which establishes hypotheses about the evidence that student responses will provide. Assessment bodies accumulate evidence at each phase of the cycle and use various qualitative and quantitative analyses depending on the data collected. In psychometric analysis, different software can perform various analyses, but not all support frameworks like Classical Test Theory (CTT), Item Response Theory (IRT), or Generalizability Theory, and some lack support for tasks like calibration and equating. To navigate these complexities, we have compiled the following list of common analytical tools with short descriptions to aid your understanding of their roles in educational assessment:
SPSS Statistics for Education: This software is a go-to for statistical analysis, offering modules specialized for educational data, such as assessment marking and grade analytics.
Tableau for Education: This tool enables educators and institutions to visualize data in an understandable format, covering everything from student performance metrics to resource allocation.
BILOG: Specialized software for estimating item parameters in educational assessment using IRT models.
MULTILOG: Advanced software for IRT modeling of items with multiple response categories, such as graded and nominal response models.
PARSCALE: Versatile tool for item response modeling and test scaling, offering various IRT models for educational and other assessments.
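Winsteps or Facets: These software tools are typically used for implementing Rasch measurement models, a family of IRT models, to analyze assessment data; Facets extends this to many-facet designs such as rated performances.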
Python: An open-source programming language with a versatile range of applications in educational assessment, including item calibration using IRT, and other data analysis tasks.
SAS: A comprehensive software suite applied in educational assessment for item calibration, test marking using IRT, and various data analysis tasks.
R for Educational Data Analysis: An open-source programming language that offers flexibility in advanced statistical modeling and data visualization, widely utilized for complex educational assessment analytics.
In conclusion, while traditional psychometric software often lacks robust data management features, which can hinder comprehensive and efficient data analysis, the tools presented above offer a wide array of capabilities for educational assessment analysis, encompassing statistical analysis, data visualization, IRT modeling, multidimensional assessments, item response modeling, and comprehensive data analytics.
Research design defines the research methods to be employed for addressing questions of interest, whether they pertain to practical, day-to-day decisions within the assessment cycle or broader strategic policy choices within the educational system. In any given situation, data analysis provides the essential technical quality and psychometric rigor to enhance the reliability and validity of assessments across all stages of the process, from design to interpretation. Precise marking ensures consistent and replicable outcomes, laying the groundwork for data-driven decision-making, while robust data analysis validates assessments by equating scores for comparability across different versions or delivery methods. In the context of dealing with the complexities of educational assessment data, various analytical approaches, ranging from conventional statistics to cutting-edge machine learning methods, are shaping the field today. Here is a list of these approaches:
Descriptive Statistics: Descriptive statistics are the foundation of any educational assessment analysis. These statistics summarize and organize the characteristics of a dataset, like the mean, median, and standard deviation, offering a quick snapshot of the dataset's main features.
Inferential Statistics: Inferential statistics go beyond simple description to make predictions or inferences about a larger population based on a sample. This is commonly done through tools like t-tests, ANOVA, and regression analysis.
Classical Test Theory (CTT): CTT is the traditional approach to test theory, in which the primary statistics are item difficulty and item discrimination, computed separately for each test item.
Item Response Theory (IRT): IRT is a modern approach that, in its widely used three-parameter logistic (3PL) form, describes each item with three parameters: item difficulty, item discrimination, and a guessing parameter. Strategies for optimizing IRT models using tools such as IRTPRO, jMetrik, and the R packages mirt (Multidimensional Item Response Theory) and TAM may include parameter estimation, linking and equating, as well as model fit analysis.
Machine Learning and AI Techniques: These techniques can automate the analysis of complex, multi-dimensional data and deliver insights that are not easily obtainable through conventional statistical techniques. The breakdown below is intended to familiarize practitioners with their potential applications:
Machine Learning Techniques: Regression Analysis, Ensemble Methods, Clustering Algorithms (e.g., K-means), Support Vector Machines, Random Forests, Decision Trees.
AI Techniques: Neural Networks, Natural Language Processing (NLP).
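As an illustration of the CTT statistics named in the list above, the following sketch computes item difficulty (the proportion of correct responses) and item discrimination (here the corrected item-total correlation, one common choice) from a small scored response matrix. The data and function names are hypothetical, and only the Python standard library is used.

```python
from statistics import mean, pstdev


def item_difficulty(matrix, i):
    """Proportion of students who answered item i correctly (the CTT p-value)."""
    return mean(row[i] for row in matrix)


def item_discrimination(matrix, i):
    """Corrected item-total correlation: item score vs. score on the other items."""
    item = [row[i] for row in matrix]
    rest = [sum(row) - row[i] for row in matrix]
    sd_item, sd_rest = pstdev(item), pstdev(rest)
    if sd_item == 0 or sd_rest == 0:
        return float("nan")  # undefined when either score has no variance
    m_item, m_rest = mean(item), mean(rest)
    cov = mean((x - m_item) * (y - m_rest) for x, y in zip(item, rest))
    return cov / (sd_item * sd_rest)


# Hypothetical scored responses: rows are students, columns are items (1 = correct).
matrix = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
for i in range(len(matrix[0])):
    print(i, round(item_difficulty(matrix, i), 2), round(item_discrimination(matrix, i), 2))
```

Correlating each item with the rest score rather than the total score avoids inflating the discrimination index by counting the item against itself.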
In summary, these data analysis approaches enable the measurement of changes and improvement in student performance over time with varying degrees of precision, supporting informed decision-making practices by various educational stakeholders, depending on the specific context of application.
The reporting of student results is a key stage in the assessment cycle, crucial for communicating the assessment's value to stakeholders. In data reporting, rigorous data quality procedures and clearly defined policies ensure accurate, consistent scores, tailored to the assessment's purpose, target audience, and accessibility needs. Data reporting encompasses a range of important aspects, namely usability, quality control, security, confidentiality, and transparency in external analysis, which are detailed below:
Usability: Online reporting tools should be designed to meet the functional needs and interface expectations of users. Extensive user testing ensures tool reliability, and accessible resources help with result interpretation.
Quality Control: A structured quality control checklist is essential for accurate score reporting and should be implemented and documented. Automated systems, subject to expert review, streamline the process, and robust security measures, including audit trails, guard against unauthorized changes.
Confidentiality and Security: Access to scores is restricted to authorized individuals, who must agree to confidentiality terms. Secure login portals provide role-specific access, while data privacy mechanisms and ISO 27001:2013 standards secure individual and group-level reports. When anonymized data are shared for external analysis, the included and excluded variables must be clearly specified.
Visualization and Storytelling: Utilizing graphs, charts, and interactive dashboards enhances data clarity and accessibility, aiding stakeholders in quickly grasping key insights and making informed decisions. Conveying insights through narratives and visuals inspires action and ensures that the data have an impact; communicating effectively in this way requires a blend of hard and soft skills.
In educational assessment, the integration of AI-supported data management enhances the way we collect, handle, and utilize student data. From streamlining data collection processes to fortifying data security, applications of AI offer several benefits that not only enhance the accuracy and efficiency of educational assessments but also ensure the utmost privacy and compliance. Let's delve into the manifold benefits of AI in educational assessment data management, exploring each category:
Automatic Test Data Collection: AI streamlines data collection by automatically collecting digital test submissions from various sources and storing them securely in the cloud for easy access and data protection.
Quality Assurance Tools: AI-powered tools identify and correct errors or inconsistencies in student responses, ensuring the accuracy and reliability of assessment data. Moreover, through metadata analysis, the item bank can be checked for identical items, so that duplicated or overly similar content can be removed.
AI-supported Trend Analysis: Typically, conducting trend analysis to identify patterns in datasets requires the use of statistical methods and data analysis techniques. However, AI can significantly enhance the efficiency and accuracy of revealing trends and patterns in student performance, particularly when dealing with large and complex datasets. It can also assist in generating dynamic reports with visualizations, which aids in interpreting results.
Intrusion Detection and Data Anonymization: AI-driven cybersecurity employs intrusion detection systems to continuously monitor data access, thereby detecting unauthorized intrusions and safeguarding sensitive student records. Additionally, it utilizes data anonymization techniques to preserve data privacy, especially with regard to personally identifiable information.
Machine Learning in Adaptive Testing: Machine learning algorithms analyze previous answers to serve up items tailored to each student's ability level. For instance, if a student excels in Maths, the algorithm could present more challenging items, making the assessment more efficient.
Deep Learning for Automated Essay Marking: Deep learning models streamline educational assessment by offering instant essay grading and plagiarism detection. They efficiently evaluate higher-order thinking skills while ensuring assessment integrity.
Expert Systems in Psychometric Analysis: Expert systems in educational assessment can both enhance test validity and reliability through robust psychometric analysis and provide real-time, personalized feedback to identify students' academic strengths and weaknesses.
Natural Language Processing (NLP) for Open-Ended Responses: NLP can process qualitative data from open-response items to generate insights into curriculum effectiveness or identify potential areas for assessment improvement.
Robotics in Hands-On Assessment: Robots can be used to conduct hands-on assessments in fields like medicine or engineering, accurately measuring a student's ability to perform specific tasks.
Fuzzy Logic for Marking Ambiguity: In assessments with subjective elements, such as artistic performance or ethical reasoning tasks, fuzzy logic algorithms can offer nuanced marking that more accurately reflects the complexity of the subject matter.
Data Unification and Integration: This method involves AI-driven systems that enforce consistent data management policies by meticulously tracking data lineage and seamlessly integrating data from various sources. The outcome is the maintenance of data quality and a comprehensive view of student performance.
Criterion-Based Archiving and Secure Access Control: With AI-empowered archiving tailored to criteria like student performance or assessment type, organizations can efficiently store and access historical records and test results. Utilizing AI for encryption and adaptive access controls ensures the secure sharing of student data and assessment materials, facilitating collaboration while stringently maintaining student confidentiality.
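The adaptive-testing idea in the list above, serving items matched to each student's ability, can be sketched with a conventional IRT-based selection rule: choose the unused item with the greatest Fisher information at the current ability estimate. This is a simplified stand-in rather than a description of any particular vendor's algorithm; the two-parameter logistic (2PL) model, the item bank, and all names below are assumptions made for illustration.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Pick the not-yet-administered item that is most informative at theta."""
    candidates = [key for key in bank if key not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda key: information(theta, *bank[key]))


# Hypothetical item bank: item id -> (discrimination a, difficulty b).
bank = {"easy": (1.0, -1.5), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}

print(next_item(1.2, bank, administered=set()))        # strong student
print(next_item(-0.2, bank, administered={"medium"}))  # weaker student
```

In a full adaptive test the ability estimate would be updated after every response, and practical systems usually add content-balancing and item-exposure constraints on top of this rule.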
In summary, AI-driven data management in educational assessment offers practical benefits, including streamlined data collection, enhanced quality assurance, data-driven insights, robust security, governance, and efficient archiving. This transforms assessment, benefiting educators and students while ensuring effective data management for the future.
Assessment bodies are actively modernizing their assessments to enhance precision, accessibility, and seamless integration into the digital landscape. A high-stakes assessment body may choose among different scenarios for modernizing the assessment life cycle along with the data management process. In this context of decision-making, achieving key goals like reducing time-to-report, minimizing errors, and managing costs requires the implementation of a comprehensive data pipeline solution6 based on the following three core design principles:
Sequential Workflow: In this principle, the data pipeline resembles a well-organized assembly line, complete with validation checkpoints at every step, from data extraction through analysis to reporting. These checkpoints ensure real-time tracking of the process, with key stages including import validation, transform validation, and final verification before reporting.
Dynamic and Versatile Workflow: Similar to a chameleon's adaptability, this approach empowers the pipeline to efficiently manage data from any assessment without requiring code replication. It operates as a dynamic and automated system, ensuring seamless data processing and analysis.
Automated Workflow: This aspect of the solution resembles a self-driving car. It operates autonomously while still giving users the flexibility to make adjustments through preset scripts. This balance between automation and customization ensures a streamlined and adaptable process.
A modernized workflow solution designed with the above three key principles proceeds step by step in the following order:
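The three principles above can be illustrated with a small sketch in which each stage of the pipeline ends in a validation checkpoint (sequential), the same stages run unchanged on data from any assessment (dynamic), and a single call executes the whole chain (automated). The stage names, field names, and sample rows are hypothetical and do not describe the actual solution referenced in the article.

```python
def import_responses(raw_rows):
    """Import stage: checkpoint rejects rows that lack a required field."""
    required = {"student_id", "item_id", "score"}
    for row in raw_rows:
        missing = required - row.keys()
        if missing:
            raise ValueError(f"import validation failed, missing {sorted(missing)}")
    return list(raw_rows)


def to_item_matrix(rows):
    """Transform stage: long-format rows -> {student_id: {item_id: score}}."""
    matrix = {}
    for row in rows:
        matrix.setdefault(row["student_id"], {})[row["item_id"]] = row["score"]
    items = {row["item_id"] for row in rows}
    for student, scores in matrix.items():
        if set(scores) != items:  # checkpoint: every student has every item
            raise ValueError(f"transform validation failed for {student}")
    return matrix


def verify_scores(matrix, max_score=1):
    """Final verification before reporting: scores must lie in [0, max_score]."""
    for student, scores in matrix.items():
        for item, score in scores.items():
            if not 0 <= score <= max_score:
                raise ValueError(f"verification failed: {student}/{item}={score}")
    return matrix


def run_pipeline(raw_rows, stages):
    """Run the stages in order and keep a log of the checkpoints passed."""
    data, log = raw_rows, []
    for stage in stages:
        data = stage(data)
        log.append(stage.__name__)
    return data, log


STAGES = [import_responses, to_item_matrix, verify_scores]
rows = [
    {"student_id": "s1", "item_id": "q1", "score": 1},
    {"student_id": "s1", "item_id": "q2", "score": 0},
    {"student_id": "s2", "item_id": "q1", "score": 1},
    {"student_id": "s2", "item_id": "q2", "score": 1},
]
matrix, log = run_pipeline(rows, STAGES)
print(log)
print(matrix)
```

Because a failing checkpoint raises an error immediately, a problem is caught at the stage where it occurs instead of surfacing later in a report.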
Efficiency and Improvement: A solution built upon the three core principles enhances efficiency, reduces errors, and manages costs in educational assessment. Its sequential, dynamic, and automated features form a strong framework for psychometric analysis, technical reporting, and quality assurance, leading to improvements in large-scale assessments.
The Assessment Journey Begins: The assessment process starts as assessment instruments are meticulously designed, rigorously validated, and thoughtfully administered. Scoring is carried out either manually or through automated algorithms.
Data in the Cloud: Scored data finds its home on a scalable cloud infrastructure, ensuring accessibility and security.
Data Transfer and Transformation: Data embarks on a journey to a dedicated data server, where a script is utilized to extract predefined assessment data and load it into memory. This data then undergoes transformations, shaping it into an item matrix that aligns with operational needs and policies.
Analyzing the Data: The journey continues with the analysis phase. Classical Test Theory (CTT) analysis is conducted, producing item-level data, distractor reports, and response frequency analysis.
IRT Analysis: The assessment data then undergoes IRT analysis, encompassing initial item calibration, equating procedures, and marking runs.
Rigorous Verification: Before reporting is enabled, a meticulous verification process is carried out to ensure data reliability and validity.
Summarizing Results: This process culminates in the generation of an HTML report that succinctly summarizes assessment results. This report is seamlessly integrated back into the original system.
User-Friendly Reporting: The final step involves creating user-friendly reports, including Excel spreadsheets and graphical representations such as Item Characteristic Curves (ICC), Item Information Functions (IIF), and Test Characteristic Curves (TCC).
Stakeholder Engagement: These reports are easily accessible to stakeholders who can download and review them. This marks the conclusion of the comprehensive assessment process, including the analysis of performance by various student groups over time.
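The graphical outputs mentioned in the reporting step (ICC, IIF, and TCC) are all derived from calibrated item parameters. The sketch below assumes the three-parameter logistic (3PL) model without a scaling constant and uses hypothetical item parameters; it computes the values that such plots would display and leaves the plotting itself aside.

```python
import math


def icc(theta, a, b, c=0.0):
    """Item Characteristic Curve: 3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def iif(theta, a, b, c=0.0):
    """Item Information Function for a 3PL item."""
    p = icc(theta, a, b, c)
    return a ** 2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p


def tcc(theta, items):
    """Test Characteristic Curve: expected raw score at ability theta."""
    return sum(icc(theta, *item) for item in items)


# Hypothetical calibrated items: (discrimination a, difficulty b, guessing c).
items = [(1.2, -1.0, 0.20), (0.8, 0.0, 0.25), (1.5, 1.0, 0.15)]

for theta in (-2.0, 0.0, 2.0):
    expected = tcc(theta, items)
    info = sum(iif(theta, *item) for item in items)
    print(f"theta={theta:+.1f}  expected score={expected:.2f}  test information={info:.2f}")
```

Evaluating these functions over a grid of ability values gives the points needed to draw each curve in a charting tool of choice.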
In summary, the adoption of a comprehensive data pipeline solution guided by the three principles brings efficiency, accuracy, and cost savings to the modernization of high-stakes assessments. This approach ensures data reliability and accessibility, resulting in user-friendly reports that empower stakeholders to make informed decisions and enhance the assessment process. Embracing these core principles advances the precision and integration of educational assessment into the digital age.
The field of educational assessment is at a turning point, transformed by technology and global initiatives. Data management in educational assessment is more pivotal than ever, and AI is accelerating this shift. AI and machine learning are not just enhancements; they are becoming indispensable in shaping educational policies, practices, and research. These technologies also strengthen the psychometric reliability of assessments, pushing the boundaries of classic methods. However, technological advancement brings ethical and practical challenges, such as ensuring security, fairness, and data integrity. Therefore, rigorous scrutiny of automated systems is imperative. Educational bodies, policymakers, and stakeholders must collaborate and constantly update best practices to realize the full potential of modern assessments. Such collective action aligns with global objectives like the Sustainable Development Goals and the Education 2030 agenda. Moving forward, the blend of technological innovation and ethical responsibility will guide us towards educational excellence in the 21st century.
Vali Huseyn is an educational assessment specialist who provides expert guidance on the modernization of key stages in the assessment life cycle, including assessment authoring (item banking), registration, administration, marking, data analysis, and reporting. He supports and serves the assessment community through strategic partnerships with various assessment technology organizations, certification bodies, and research centres. He was also instrumental in supporting the modernization of large-scale assessments at The State Examination Centre of Azerbaijan as their Head of Strategic Partnerships and Project Management Unit Lead.
Feel free to connect with Vali on LinkedIn (https://www.linkedin.com/in/valihuseyn/) to learn more about the best practices in transitioning to an online assessment environment.