Negar Sharifi; Mohammad Falsafi; Noorali Farokhi; Ehsan Jamali
Abstract
Background: Test fairness is one of the main challenges in the transition from paper-and-pencil testing to computerized adaptive testing (CAT). Aim: This study investigated differential item functioning (DIF), assessed factors that affect DIF detection, and sought the optimal DIF-detection method for CAT. Method: Given the nature of the research questions, an empirical approach was used, with data generation and manipulation of variables carried out by simulation. Responses of 1000 examinees (reference and focal groups of 500 each) to an item bank of 55 dichotomous items were simulated under the three-parameter logistic (3PL) model with 20 replications. Fifteen items were manipulated in terms of DIF type and magnitude, and test impact was operationalized as a mean ability difference between the comparison groups. A 30-item computerized adaptive test was administered via the Firestar software package. Analyses were carried out with logistic regression (LR) and the item response theory likelihood ratio test (IRT-LRT), and the two methods were compared on power and Type I error rate. Results: The Type I error rate of IRT-LRT was lower than that of LR. The power of both methods was influenced by DIF type, DIF magnitude, and test impact. Compared with LR, IRT-LRT showed greater power in detecting uniform DIF under both impact and no-impact conditions, and its power increased with DIF magnitude. The two methods did not differ in detecting non-uniform DIF, and both performed poorly. Conclusion: Given its power and Type I error rate, IRT-LRT is the preferable approach for detecting uniform DIF; detection of non-uniform DIF requires further investigation.
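The simulation design described in this abstract — dichotomous responses under the 3PL model for two groups of 500 examinees, with uniform DIF introduced by shifting an item's difficulty for the focal group — can be sketched as below. This is not the authors' code; the function name `p_3pl` and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1 - c) / (1 + np.exp(-a * (theta[:, None] - b)))

n_per_group, n_items = 500, 55
a = rng.uniform(0.8, 2.0, n_items)     # discrimination parameters
b = rng.normal(0.0, 1.0, n_items)      # difficulty parameters
c = rng.uniform(0.10, 0.25, n_items)   # pseudo-guessing parameters
a[0], b[0], c[0] = 1.5, 0.0, 0.15      # fix the studied item's parameters

theta_ref = rng.normal(0.0, 1.0, n_per_group)  # reference-group abilities
theta_foc = rng.normal(0.0, 1.0, n_per_group)  # focal group (no impact: same mean)

# Uniform DIF: raise the difficulty of item 0 by 1.0 for the focal group only
b_foc = b.copy()
b_foc[0] += 1.0

resp_ref = (rng.random((n_per_group, n_items)) < p_3pl(theta_ref, a, b, c)).astype(int)
resp_foc = (rng.random((n_per_group, n_items)) < p_3pl(theta_foc, a, b_foc, c)).astype(int)

# The DIF item should be noticeably harder for the focal group
print(resp_ref[:, 0].mean(), resp_foc[:, 0].mean())
```

A "test impact" condition would simply shift the focal group's ability mean (e.g. `theta_foc = rng.normal(-0.5, 1.0, n_per_group)`); non-uniform DIF would alter the item's discrimination rather than its difficulty.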
Masoomeh Estaji; Negar Babanezhad Kafshgar
Abstract
The current study explored Differential Item Functioning (DIF) in the Iranian TEFL MA Entrance Exam using two widely applied statistical methods: Logistic Regression (LR) and Mantel-Haenszel (MH). In addition, the flagged DIF items underwent content analysis to identify potential linguistic sources of such bias. To this end, the answer sheets of 2217 female and 735 male examinees from 2015 were analyzed for items containing DIF. The LR technique flagged eight DIF items; half of them favoured men and the other half favoured women. The MH procedure flagged eleven DIF items; six favoured male test takers and five favoured female test takers. The content analysis of the DIF items revealed no particular linguistic source for this deviant item behaviour.
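The Mantel-Haenszel procedure used here matches examinees on total score and pools one 2x2 (group by right/wrong) table per score band into a common odds ratio. A minimal sketch follows; the function name `mantel_haenszel_dif`, the strata boundaries, and the synthetic data are my own illustrative assumptions, not the study's materials.

```python
import numpy as np

def mantel_haenszel_dif(resp_ref, resp_foc, item, strata):
    """Mantel-Haenszel common odds ratio for one item.

    Examinees are matched on total score; each (low, high) band in `strata`
    contributes one 2x2 (group x right/wrong) table to the pooled estimate.
    """
    total_ref = resp_ref.sum(axis=1)
    total_foc = resp_foc.sum(axis=1)
    num = den = 0.0
    for low, high in strata:
        r = resp_ref[(total_ref >= low) & (total_ref < high), item]
        f = resp_foc[(total_foc >= low) & (total_foc < high), item]
        A, B = r.sum(), r.size - r.sum()   # reference group: right, wrong
        C, D = f.sum(), f.size - f.sum()   # focal group: right, wrong
        T = r.size + f.size
        if T == 0:
            continue
        num += A * D / T
        den += B * C / T
    return num / den  # > 1: the item favours the reference group

# Invented data: item 0 is much easier for the reference group
rng = np.random.default_rng(0)
ref = (rng.random((200, 10)) < 0.5).astype(int)
foc = (rng.random((200, 10)) < 0.5).astype(int)
ref[:, 0] = (rng.random(200) < 0.8).astype(int)
foc[:, 0] = (rng.random(200) < 0.4).astype(int)

odds = mantel_haenszel_dif(ref, foc, item=0, strata=[(0, 4), (4, 7), (7, 11)])
print(round(odds, 2))
```

In operational DIF work the odds ratio is usually rescaled to the ETS delta metric and tested with the MH chi-square; this sketch stops at the pooled odds ratio itself.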
Abstract
According to many experts, the entrance examination is the most important test in Iran, and demographic characteristics such as gender, region (socio-economic status), and province (language) can affect examinees' performance on the test items. If, after controlling for examinee ability, responses to the test items are a function of demographic characteristics, the items exhibit Differential Item Functioning (DIF) with respect to those characteristics. The purpose of this investigation was to study DIF in the items of the Entrance Examinations of the National Organization of Educational Testing across these demographic characteristics. The research sample consisted of all examinees who received a test booklet containing some of the specialized exams of the experimental groups of the Konkur from 2008 to 2011. Binary logistic regression was used to analyze DIF. After unidimensionality was confirmed with the NOHARM approach, the DIF analysis indicated that the largest number of flagged items was associated with gender, followed by region (socio-economic status) and province (language); however, the effect sizes were very small and negligible. Nevertheless, based on these results, it is recommended that subject-matter committees be formed to review flagged items, with the committees' final decisions taken into account in designing future test items.
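The binary logistic-regression DIF analysis mentioned above follows the usual nested-model logic: a baseline model with the matching variable only, a second model adding group membership (uniform DIF), and a third adding the ability-by-group interaction (non-uniform DIF), each step tested with a 1-df likelihood-ratio chi-square. The sketch below is my own minimal numpy implementation, not the study's code; it uses simulated ability as the matching variable (in practice the observed total score is used), and `fit_logistic`, `lr_dif`, and all data values are illustrative.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Fit a logistic regression by Newton-Raphson and return its log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])  # small ridge for stability
        beta += np.linalg.solve(H, X.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def lr_dif(ability, group, y):
    """Nested-model LR DIF: 1-df chi-squares for uniform and non-uniform DIF."""
    ones = np.ones_like(y, dtype=float)
    ll1 = fit_logistic(np.column_stack([ones, ability]), y)                           # ability only
    ll2 = fit_logistic(np.column_stack([ones, ability, group]), y)                    # + group
    ll3 = fit_logistic(np.column_stack([ones, ability, group, ability * group]), y)   # + interaction
    return 2 * (ll2 - ll1), 2 * (ll3 - ll2)

# Illustrative data: one item with uniform DIF (focal group down 0.8 logits)
rng = np.random.default_rng(1)
n = 1000
group = np.repeat([0.0, 1.0], n // 2)
ability = rng.normal(0.0, 1.0, n)
p_true = 1.0 / (1.0 + np.exp(-(ability - 0.8 * group)))
y = (rng.random(n) < p_true).astype(float)

chi_uniform, chi_nonuniform = lr_dif(ability, group, y)
print(chi_uniform, chi_nonuniform)
```

The uniform-DIF chi-square should far exceed the 3.84 critical value (1 df, alpha = .05), while the non-uniform statistic should stay near zero, since no interaction was simulated; effect-size measures such as the change in Nagelkerke R-squared are typically reported alongside these tests, which matches the abstract's point that statistical flags can coincide with negligible effect sizes.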