Hassan Moshtaghian Abarghouei; Mohammad Reza Falsafi Nejad; Noor Ali Farrokhi
Abstract
Identifying distractors as sources of Differential Item Functioning (DIF) in polytomous items is of great importance to test designers and analysts. Although DIF analysis is one of the common methods for examining measurement invariance, it is accompanied by challenges and limitations, especially in multiple-choice items. The purpose of this study was to assess the performance of the Nested Logit Model (NLM) for detecting Differential Distractor Functioning (DDF) using experimental (simulated data) and descriptive-analytical (real data) methods. Six items were simulated under different conditions of difficulty and slope, ability distribution, presence or absence of DIF/DDF, and DIF/DDF magnitude, with a sample size of 2000 and 50 replications. The real data consisted of the Math Entrance Exam (D-form, 2018) with a random sample of 2000 men and women. Based on the simulation results, the NLM detected 88% of DIF and 97% of DDF on average. Type I error rates were very close to the theoretically expected values, although some inflation appeared under unequal ability distributions. According to the findings, the detection rate was influenced by the item parameters (difficulty and slope) and by the DIF or DDF levels. In the real-data analysis, two items showed both DIF (large and medium) and DDF (partial to moderate) simultaneously, whereas the NRM approach flagged 11 items as DIF/DDF; as expected, approaches based on the "divided by distractor" strategy flagged fewer items. By separating the DDF test from the DIF test, the NLM allows a clear evaluation of whether a distractor may be responsible for DIF. Since high-stakes tests play a special role in selection, and DIF and DDF analyses are central to establishing the validity and measurement invariance of their items, comprehensive NLM-based DIF/DDF analyses are recommended for screening biased items.
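A minimal sketch of one common parameterization of the nested logit model for multiple-choice items may help fix ideas: the correct response is governed by a 2PL model, and the choice among distractors, conditional on answering incorrectly, by a nominal model. DDF then appears as group differences in the distractor parameters even when the 2PL parameters show no DIF. All parameter values below are invented for illustration and are not taken from the study.

```python
import math

def nlm_probs(theta, a, b, zetas, lams):
    """Category probabilities for one multiple-choice item under a 2PL
    nested logit model: the correct response follows a 2PL, and the
    choice among distractors follows a nominal (softmax) model."""
    p_correct = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    weights = [math.exp(z + l * theta) for z, l in zip(zetas, lams)]
    total = sum(weights)
    # Each distractor gets a share of the incorrect-response probability.
    p_distractors = [(1.0 - p_correct) * w / total for w in weights]
    return p_correct, p_distractors

# Illustrative examinee and item: ability 0.5, slope 1.2, difficulty 0.0,
# three distractors with hypothetical intercept and slope parameters.
p_c, p_d = nlm_probs(theta=0.5, a=1.2, b=0.0,
                     zetas=[0.3, 0.0, -0.4], lams=[-0.8, 0.1, 0.6])
```

In a DDF comparison, the two groups would share a and b while the distractor parameters (zetas, lams) are allowed to differ, which is what lets the NLM separate the DDF test from the DIF test.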
Masoomeh Estaji; Negar Babanezhad Kafshgar
Abstract
The current study aimed to explore Differential Item Functioning (DIF) in the Iranian TEFL MA Entrance Exam by employing two widely used statistical methods: Logistic Regression (LR) and Mantel-Haenszel (MH). In addition, the flagged DIF items underwent a content analysis to explore the potential linguistic sources of such biases. To this end, the answer sheets of 2217 female and 735 male examinees in 2015 were analyzed to find items containing DIF. The LR technique flagged eight items as containing DIF: half were advantageous to men and the other half favoured women. The MH procedure flagged eleven items: six favoured male test takers and five favoured female test takers. The content analysis of the DIF items proposed no particular linguistic source for this deviant item behaviour.
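The Mantel-Haenszel procedure used here reduces, for each item, to a common odds ratio computed over ability-matched strata. A minimal sketch with invented counts (not the study's data):

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta for one item.

    strata: (A, B, C, D) counts per matched ability stratum, where
      A/B = reference-group correct/incorrect,
      C/D = focal-group correct/incorrect.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den                  # common odds ratio; 1.0 means no DIF
    delta = -2.35 * math.log(alpha)    # ETS delta scale
    return alpha, delta

# Hypothetical item: the reference group outperforms the matched
# focal group in both ability strata, so the item is flagged.
alpha, delta = mantel_haenszel_dif([(40, 10, 30, 20), (30, 20, 20, 30)])
```

On the ETS delta scale, |delta| < 1 is usually classified as negligible (A), 1 to 1.5 as moderate (B), and above 1.5 as large (C) DIF; the negative sign here indicates an item favouring the reference group.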
Mohammad Ahmadi Deh Qutbuddini; Ebrahim Khodai; Valiollah Farzad; Ali Moghadam Zadeh; Masoud Kabiri
Abstract
The present study investigated the dimensionality and differential item functioning of the testlet-based test of Iran's PIRLS 2011. To analyze dimensionality, graded response and bifactor item response theory models were fitted with the full-information maximum likelihood estimation method; to analyze differential item functioning, the multiple-group bifactor model of Cai et al. (2011) was applied. The dimensionality results showed that the bifactor model fit the data better than the graded response model, both in Iran's total sample and in the boy and girl groups. The testlet effect variances showed that the effects of secondary factors on Iranian students' performance in two testlets related to literal comprehension produced multidimensionality in Iran's PIRLS testlets. There was no significant difference between boys' and girls' average performance on the general latent trait of reading comprehension, but the differences between boys' and girls' average reading proficiency on three literal and three informational testlets were significant in favor of girls. The differential item functioning results based on the bifactor model showed that many items had uniform and non-uniform differential item functioning, with boys performing better on multiple-choice items and girls on constructed-response items. In general, the results showed that in Iran's PIRLS 2011 testlets, the traits related to the two literal comprehension testlets were perceived differently by boy and girl students, and these two testlets showed more local item dependence among girls than among boys. The results also indicated a difference between the performance of Iranian boy and girl students on the mixed-format items of PIRLS.
Abstract
According to many experts, the entrance examination is the most important test in Iran, and demographic characteristics such as gender, region (socio-economic status), and province (language) can affect participants' responses to the test items. Controlling for examinees' ability, if the answers to the test items are a function of demographic characteristics, the items exhibit Differential Item Functioning (DIF) with respect to those characteristics. The purpose of this investigation was to study DIF in the test items of the entrance examinations of the National Organization of Educational Testing across these demographic characteristics. The research sample consisted of all examinees of a test booklet that included selected exams of the experimental groups of the Konkur from 2008 to 2011. Binary logistic regression was used to analyze DIF. After unidimensionality was verified with the NOHARM approach, the DIF analysis indicated that the largest numbers of detected DIF items were related to the gender, region (socio-economic status), and province (language) variables, respectively, but the effect sizes were very small and negligible. Nevertheless, based on the present results, it is recommended that subject-matter committees be formed to detect biased items and that their final decisions be considered in designing future test items.
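The binary logistic regression DIF procedure compares a model that predicts an item response from the matching ability score against a model that also includes group membership; a significant likelihood-ratio chi-square flags the item. A self-contained sketch using plain gradient-ascent estimation and fabricated illustrative data (not the Konkur data):

```python
import math

def fit_logistic(X, y, lr=0.3, iters=4000):
    """Logistic regression fitted by batch gradient ascent on the
    log-likelihood; returns (weights, maximized log-likelihood)."""
    n, w = len(y), [0.0] * len(X[0])
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return w, ll

# Fabricated data: at every centered ability score, the reference group
# (g=0) answers correctly more often than the focal group (g=1),
# i.e. built-in uniform DIF. Feature vector: [intercept, score, group].
rows = []
for s, (ref_ok, foc_ok) in zip([-1.5, -0.5, 0.5, 1.5],
                               [(6, 2), (10, 6), (14, 10), (18, 14)]):
    rows += [([1.0, s, 0.0], 1)] * ref_ok + [([1.0, s, 0.0], 0)] * (20 - ref_ok)
    rows += [([1.0, s, 1.0], 1)] * foc_ok + [([1.0, s, 1.0], 0)] * (20 - foc_ok)
X, y = [r[0] for r in rows], [r[1] for r in rows]

_, ll_base = fit_logistic([xi[:2] for xi in X], y)  # ability only
_, ll_dif = fit_logistic(X, y)                      # ability + group
g2 = 2 * (ll_dif - ll_base)   # likelihood-ratio chi-square, df = 1
print(f"G2 = {g2:.2f}, DIF flagged: {g2 > 3.84}")
```

In practice a third model with a score-by-group interaction is added to distinguish uniform from non-uniform DIF, and an effect-size measure (such as the change in Nagelkerke R-squared) is examined alongside the chi-square, which is how the study could report significant but negligible effects.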
Asghar Minaei; Zahra Ghafari
Abstract
The greatest concern in discussions of test fairness is the possibility of bias, or differential functioning, because bias casts doubt on the validity of a test. Objective: In this research, differential item functioning between Iranian girls and boys was studied on all 14 blocks of the grade-8 TIMSS mathematics tests using an IRT approach. Method: First, the data were recoded in SPSS, and the unidimensionality assumption was examined for all blocks with the NOHARM software. Next, the best-fitting model, designated the "base model", was fitted to the data with BILOG-MG. From that base model, IRTLRDIF (Thissen, 2001) was used to identify anchor items and items with differential functioning, and finally MULTILOG was used for the final estimation of item and ability parameters. The findings show that of the 219 items studied in the grade-8 TIMSS 2011 mathematics test, 144 were anchor items and 75 showed differential functioning, to the disadvantage of girls, the focal group.
Asghar Minaei
Volume 3, Issue 11, April 2013, Pages 113-151
Abstract
International tests such as TIMSS and PIRLS must have structural equivalence, also known as structural comparability; in other words, test items should function identically across all participating countries and groups. The present research examined the structural comparability of the TIMSS 2007 8th-grade science test and the differential functioning of its items between Iranian and American students, as well as the effect of items with differential functioning on the performance of Iranian students. A combination of confirmatory factor analysis and item response theory was used to analyze the data and answer the research questions. The factor analysis results indicated that the science test had structural comparability across the two groups, suggesting that Iranian and American students use an identical conceptual framework to answer the test items. However, the differential item functioning analysis indicated that 62% of the TIMSS 2007 8th-grade science test items functioned differentially against Iranian students. Even so, the poor performance of Iranian students on the TIMSS 2007 science test cannot be attributed to differential item functioning, and its causes should be sought elsewhere. Ministry of Education officials and administrators should try to teach the key concepts of each field in an interrelated manner and to cultivate divergent and multidimensional thinking skills in students by preparing appropriate educational content. Moreover, attempts must be made to move the system away from the traditional method of teaching, which mainly consists of theoretical lectures, and to involve students in practical and laboratory activities.
Masoud Gerami Pour; Mohamad Reza Falsafi Nejhad; Ali Delavar; Nour Ali Farrokhi
Volume 3, Issue 9, October 2012, Pages 105-122
Abstract
Although numerous methods have been proposed for detecting biased items, few studies have empirically investigated their power and efficiency. The main goal of this research was to apply the IRT-based likelihood ratio test and confirmatory factor analysis to detecting differential item functioning (DIF) in high-stakes tests. Monte Carlo simulation methods were used to answer the research questions. The required data were simulated with WINGEN2 in the form of 100 tests of 30 items each, fitted to the 2PL model. The distributions of item difficulties and discrimination powers were normal for all tests, and responses of 1000 examinees with a normal ability distribution were simulated for each test. Marginal maximum likelihood and weighted least squares estimation methods were used to detect the type and magnitude of DIF. Data analysis over consecutive replications showed that the IRT-based methods were superior to the CFA methods in detecting DIF, and this superiority was observed under all DIF conditions (low, moderate, and high). Nevertheless, the differences between the two methods were small at a sample size of 1000, and no differences were observed between them in detecting different types of DIF. These results confirm the findings of Meade and Lautenschlager (2004; 2006) but contrast with those of Flowers et al. (2002). Finally, the likelihood ratio test is recommended if there are limitations in applying other methods of detecting DIF.
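The data-generation step described above can be sketched in a few lines: under the 2PL model, the probability of a correct response is a logistic function of ability, and each response is a Bernoulli draw. This sketch uses plain Python rather than WINGEN2, with assumed (normal ability, lognormal discrimination) parameter distributions chosen only for illustration:

```python
import math
import random

def p_2pl(theta, a, b):
    """Two-parameter logistic IRT model: P(correct | ability theta),
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate_responses(thetas, items, rng):
    """0/1 response matrix: one row per examinee, one column per
    (a, b) item; each cell is a Bernoulli draw at the 2PL probability."""
    return [[int(rng.random() < p_2pl(t, a, b)) for a, b in items]
            for t in thetas]

rng = random.Random(42)
thetas = [rng.gauss(0, 1) for _ in range(1000)]          # N(0, 1) abilities
items = [(rng.lognormvariate(0, 0.3), rng.gauss(0, 1))   # a > 0, b ~ N(0, 1)
         for _ in range(30)]
data = simulate_responses(thetas, items, rng)            # 1000 x 30 matrix
```

DIF of a chosen magnitude would then be induced by shifting b (uniform DIF) or a (non-uniform DIF) for the focal group on the studied items before generating that group's responses.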
M. Habibi; Fatemeh Moradi; Balal Izanlo
Volume 2, Issue 6, January 2012, Pages 1-27
Abstract
Background: The invariance of items and tests is an important issue in assessment.
Objectives: The present study was conducted to compare parameter invariance under item response theory and confirmatory factor analysis.
Methods: After reviewing the relevant foundations of each approach, the researchers compared the invariance of the parameters in each approach using empirical data from the Progress in International Reading Literacy Study (PIRLS). The sample consisted of 5000 Iranian students (half female and half male) from the 2006 administration who responded to six questions on the attitude-toward-reading scale.
Results: Data analysis showed that Question 6 was flagged as biased by both item response theory and confirmatory factor analysis. The results, however, differed for Questions 1, 3, and 4: Question 1 was flagged as biased by item response theory only, whereas Questions 3 and 4 were flagged only by confirmatory factor analysis.
Conclusion: It is suggested that both approaches be employed when deciding on parameter invariance, since decisions based on only one may be misleading. It is also suggested that intercepts and differences in the groups' ability distributions, together with their effects on invariance, be given primary consideration.