Reyhane Rahimi; Aso Mojtahedi
Abstract
This research examines Likert-scale items using two distinct approaches: Classical Test Theory and Item Response Theory. By comparing their results, the study addresses the question: "Do the outcomes from these two methodologies align, or do they contradict each other?" The research followed a descriptive design and used secondary analysis. The study population consisted of 977 junior high school students. After data screening, the final sample sizes were 783 students for the extraversion items, 763 for the openness items, and 784 for the conscientiousness items. The instruments were the extraversion, openness, and conscientiousness subscales of the NEO Personality Test. The analysis indicated that strong internal consistency among items enhanced the accuracy and validity of results derived from the graded response model. When items show low internal consistency, however, caution is needed, because the model may yield erroneous thresholds or discrimination coefficients (i.e., false negatives or false positives). Overall, combining multiple methods of statistical analysis can contribute to more effective analysis and more accurate results.
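The thresholds and discrimination coefficient mentioned above enter the graded response model as follows. This is a minimal sketch of the standard logistic form of the model, not the study's own estimation code; the function name and the parameter values are illustrative assumptions.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one Likert item under the graded response model.

    theta: person trait level; a: item discrimination;
    thresholds: ordered boundary locations b_1 < b_2 < ... (hypothetical values below).
    """
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1.0 (lowest boundary) and 0.0 (beyond the highest category).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical four-category item evaluated at an average trait level.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

A low discrimination value flattens the cumulative curves, which is one way poorly fitting items can produce unstable threshold estimates.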
Abstract
Tests are among the most common forms of assessment in the education system. Test results should be valid, reliable, and practical to administer, and each of these properties covers a different aspect of test quality. A poorly made instrument is not only unhelpful but can also be harmful, so tests must be carefully constructed, administered, and scored. To ensure fairness, scores from different test forms are adjusted with methods commonly referred to as equating. Equating is a statistical method for adjusting test scores to account for unintended differences between test forms so that the scores are comparable. The purpose of this study is to apply the anchor-test design with matched groups and with disparate groups, using the linear, mean, and equipercentile equating methods of classical test theory, and to compare the results with equating under the new measurement theory, so that the place of equating in the educational measurement system, including the National Education Assessment, can be properly presented and explained.
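Two of the classical methods named above, mean equating and linear equating, can be sketched in a few lines. The sketch assumes the simplest case of two forms taken by equivalent groups, not the anchor-test designs studied in the article, and the score lists are made-up illustrative data.

```python
from statistics import mean, pstdev

def mean_equate(x, scores_x, scores_y):
    """Shift a form-X score by the difference between the two form means."""
    return x + (mean(scores_y) - mean(scores_x))

def linear_equate(x, scores_x, scores_y):
    """Map a form-X score to the form-Y scale by matching means and standard deviations."""
    mx, my = mean(scores_x), mean(scores_y)
    sx, sy = pstdev(scores_x), pstdev(scores_y)
    return my + (sy / sx) * (x - mx)

# Hypothetical raw scores on two forms of the same test.
form_x = [10, 12, 14, 16, 18]
form_y = [13, 16, 19, 22, 25]

print(mean_equate(16, form_x, form_y))
print(round(linear_equate(16, form_x, form_y), 2))
```

Mean equating assumes the forms differ only by a constant, while linear equating also allows the score spread to differ between forms.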
Noor-Ali Farroukhi; Laila Bahrami
Abstract
Background: Generalizability theory recognizes multiple sources of measurement error and estimates each source separately, distinguishes between relative and absolute decisions, distinguishes between fixed and random facets, and can handle different D-study designs. These strengths have no counterparts in classical test theory. Generalizability theory is largely unknown among our researchers, and studies in this area are rare. Objective: The purpose of this article was to introduce generalizability theory and to present its practical applicability in assessing the reliability of measurements. Results: In addition to a comparison between classical test theory and generalizability theory, the conceptual framework of generalizability theory is explained in accessible terms. The article also explains in detail, in 15 steps, the design, analysis, and interpretation of a measurement study through an example with the relevant calculations and equations, to guide researchers and test developers who aim to assess reliability. Conclusion: This article shows that generalizability theory is more useful than classical test theory for reliability estimation, especially in complicated measurement situations. Generalizability theory enables researchers to reduce error in the measurement design through optimization procedures, which increases the accuracy of generalizing results.
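The distinction between relative and absolute decisions can be illustrated with a single-facet persons-by-items design. The following is a minimal sketch under that assumption, not the 15-step procedure of the article; the function names and the score matrix are hypothetical.

```python
def variance_components(scores):
    """Estimate person, item, and residual variance components from a
    fully crossed persons x items score matrix (rows are persons)."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]
    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = (ss_total - ss_p - ss_i) / ((n_p - 1) * (n_i - 1))
    # Expected-mean-square solutions; negative estimates are set to zero.
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)
    return var_p, var_i, ms_res

def d_study(var_p, var_i, var_res, n_items):
    """Generalizability (relative) and dependability (absolute) coefficients
    for a D study that uses n_items items."""
    g = var_p / (var_p + var_res / n_items)
    phi = var_p / (var_p + (var_i + var_res) / n_items)
    return g, phi

# Hypothetical scores of 4 persons on 3 items.
scores = [[2, 3, 4], [4, 5, 5], [6, 6, 8], [7, 9, 9]]
components = variance_components(scores)
g3, phi3 = d_study(*components, n_items=3)
g6, phi6 = d_study(*components, n_items=6)
print(round(g3, 3), round(phi3, 3), round(g6, 3), round(phi6, 3))
```

The absolute coefficient is never larger than the relative one because it also counts item difficulty differences as error, and both rise as the D study adds items.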
Mohammadreza Falsafinejad; Noorali Farroukhi; Laila Bahrami
Abstract
Background: High school final exams are among the most decisive tools for the academic assessment of students. Given the importance of these examinations, systematic research on the quality of their questions and on how well they discriminate among candidates is necessary. Aim: The aim of this study is to determine the psychometric properties of the final exam questions in experimental biology and Persian literature, and their capability in the selection of candidates for admission to undergraduate courses. Methodology: The population comprised all questions of the third-year high school final exams in this field in June 2011. To determine the psychometric properties of these questions, the performance of 600 randomly selected students from the school districts of the city of Khorramabad in these subjects was used. Findings: The reliability coefficients estimated by Cronbach's alpha for biology and Persian literature were 0.97 and 0.96, respectively. According to CTT, the average difficulty and discrimination coefficients were (0.65, 0.57) for Persian literature and (0.50, 0.65) for biology. In the IRT analysis, the two-parameter model fitted the data better. Under the IRT models, the average difficulty and discrimination of the questions were (-0.69, 1.03) for Persian literature and (-0.09, 0.96) for biology. The Persian literature and biology tests provided the most information at ability levels of -0.7 and 0.1, respectively. The agreement between CTT and IRT on the discrimination parameter for the two courses was 98.36% and 93.59%, respectively. Conclusion: Given the important decisions based on high-stakes tests, the implications of final examinations for the selection of candidates are discussed.
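The CTT quantities reported above (item difficulty, item discrimination, and Cronbach's alpha) can be computed as in the following sketch. It uses the proportion correct as difficulty and the corrected item-total correlation as discrimination, which is one common choice and is assumed here rather than taken from the article; the response matrix is invented.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

def item_difficulty(responses, j):
    """Proportion of examinees answering item j correctly."""
    return mean(row[j] for row in responses)

def item_discrimination(responses, j):
    """Correlation of item j with the total score on the remaining items."""
    item = [row[j] for row in responses]
    rest = [sum(row) - row[j] for row in responses]
    return pearson(item, rest)

def cronbach_alpha(responses):
    """Internal consistency from item variances and total-score variance."""
    k = len(responses[0])
    item_var = sum(pvariance([row[j] for row in responses]) for j in range(k))
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 0/1 responses of 5 examinees to 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulties = [item_difficulty(responses, j) for j in range(4)]
discriminations = [item_discrimination(responses, j) for j in range(4)]
alpha = cronbach_alpha(responses)
print(difficulties, round(alpha, 2))
```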
Esmail Mam Sharifi; Ali Delavaran; Azadeh Boluki; Somayeh Shabani
Volume 3, Issue 7, April 2012, Pages 1-34
Abstract
Background: This study investigated the psychometric properties of the theoretical part of the driver's license test. The sample comprised the responses of 350 subjects, selected through multi-cluster sampling, to 30 randomly chosen theoretical questions of the driver's license test. To investigate the psychometric properties of the questions, the results obtained from classical test theory and item response theory were compared and evaluated. The study adopted a descriptive methodology, and the adequacy of the sample was verified at the outset. Factor analysis and Cronbach's method were used to determine the unidimensionality of the test. The questions were then analyzed under both classical test theory and item response theory, and the item parameters (difficulty, discrimination, and guessing) and the ability estimates were obtained using the simultaneous estimation method. Results: The results confirmed the unidimensionality and independence of the test. After the main assumptions of IRT were checked, model-data fit was evaluated, and the two-parameter model showed the better fit. Next, the item parameters and the ability estimates were compared with the t-test. The results showed no significant difference between classical theory and item response theory in the accuracy of estimating the difficulty, slope, and ability parameters. To check the reliability and stability of the results from the first administration, a test-retest was administered to a sample of 30 subjects. Since the test is criterion-referenced, the kappa coefficient of agreement was used to estimate reliability.
The results showed a significant relationship between the first and second administrations; moreover, the test has sufficient reliability and validity to be administered in different settings. Conclusion: The analysis of the item and person parameters indicated that the test is easy and has a high capacity to distinguish among subjects' abilities. It can therefore be concluded that the test questions are more accurate for subjects with lower ability. Compared with classical theory, the ability estimated under item response theory is closer to the true value. Based on the estimated abilities, questions can be selected to match subjects' abilities, which can ultimately lead to the creation of a question bank.
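For a criterion-referenced test, the kappa coefficient mentioned above measures how consistently two administrations classify the same examinees, corrected for chance agreement. The sketch below is a plain implementation of Cohen's kappa on made-up pass/fail decisions; it is not the study's data or code.

```python
def cohen_kappa(first, second):
    """Chance-corrected agreement between two sets of classifications
    of the same examinees (for example, pass/fail on test and retest)."""
    n = len(first)
    categories = set(first) | set(second)
    # Observed proportion of examinees classified the same way both times.
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal proportions.
    p_chance = sum((first.count(c) / n) * (second.count(c) / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical decisions for 10 examinees on two administrations.
run_1 = ["pass"] * 6 + ["fail"] * 4
run_2 = ["pass"] * 5 + ["fail"] + ["fail"] * 3 + ["pass"]
kappa = cohen_kappa(run_1, run_2)
print(round(kappa, 3))
```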
Behnam Karimi; M. Falsafinejad; Fariborz Dortaj
Volume 2, Issue 6, January 2012, Pages 1-23
Abstract
Background: Ease of scoring, administration, and identity has made multiple-choice tests essential instruments in large-scale assessments. Multiple-choice tests have also drawn intense criticism: for example, that they do not serve all educational goals (they assess low cognitive levels) and that examinees can answer by guessing. To address these problems, some have suggested increasing the number of choices per question.
Objectives: This research studied the effect of the number of item choices on the psychometric characteristics of tests and items, and on the estimated ability of subjects, under classical test theory and item response theory (IRT).
Methods: The statistical population was all high school students in Shiraz, of whom 608 were randomly selected as the sample. To answer the research questions, an experimental method was used, and data were collected with two tests, language and arithmetic, constructed for this purpose.
Results: Data analysis indicated no significant effect of the number of item choices on item parameters, and the effect of the number of choices on subjects' estimated psychometric characteristics was the same across the tests. Furthermore, the parameters estimated under classical test theory and item response theory (IRT) differed.
Conclusion: After the assumptions of item response theory (IRT) were checked, the data were found to fit the two-parameter model better, and the number of item choices made no difference to model fit. In addition, estimated ability differed with the number of item choices.
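The link between the number of choices and guessing can be made concrete with the logistic item response function. In this sketch the lower asymptote is set to the blind-guessing rate 1/m for an item with m choices, which is an illustrative assumption and not a result of the study; setting it to zero gives the two-parameter model that fitted the data better.

```python
import math

def item_response(theta, a, b, c=0.0):
    """Probability of a correct answer under the three-parameter logistic model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing).
    With c = 0.0 this reduces to the two-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item answered by a low-ability examinee, with 3, 4, and 5 choices.
for m in (3, 4, 5):
    p = item_response(theta=-3.0, a=1.0, b=0.0, c=1.0 / m)
    print(m, round(p, 3))
```

Adding choices lowers the asymptote, so its effect is concentrated at low ability levels, where guessing accounts for most correct answers.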