Effect of different scoring approaches upon credit assignment when using Multiple True-False items in dental undergraduate examinations
ISSN
1396-5883
Date Issued
2018-06-22
Author(s)
Kanzow, Philipp
Witt, Daniela
DOI
10.1111/eje.12372
Abstract
Introduction: Various scoring approaches for Multiple True-False (MTF) items exist. This study aimed to compare the scoring results obtained with different scoring approaches and to assess the effect of item cues on each scoring approach's result.
Materials and methods: Different scoring approaches (MTF, Count-2, Count-3, "Vorkauf-Method," PS50, Dichotomized MTF, "Blasberg-Method," Multiple response (MR), Correction for Guessing, "Ripkey-Method," Morgan-Method, Balanced Scoring Method) were retrospectively applied to all MTF items used within electronic examinations of undergraduate dental students at the University Medical Center Göttingen in the winter term 2016/2017 (1297 marking events). Item quality was evaluated with regard to formal parameters such as the presence of cues and the correctness of content. Differences between the scoring results of all scoring approaches, and differences between each method's scoring results for items with and without cues, were assessed using Wilcoxon rank sum tests (P < .05).
Results: Average scoring results per item differed considerably between the scoring approaches and ranged from 0.46 (MR) to 0.92 (Dichotomized MTF). The presence of cues led to significantly higher scores for all scoring approaches (P < .001; +0.14 on average). However, the effect of cues differed among scoring approaches and ranged from +0.04 (Dichotomized MTF) to +0.20 (MR).
Conclusion: Scoring of MTF items is complex. The data presented in this manuscript may help educators make informed choices about scoring algorithms.
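Illustration: The abstract does not define the individual scoring approaches, so the following is only a minimal sketch of why the choice of scoring rule changes the credit awarded for one and the same response. The three rules below (per-statement partial credit, all-or-nothing, and a "at least half correct" threshold) are generic, assumed examples. Their function names are hypothetical and they are not claimed to match any of the named methods in the study.

```python
from fractions import Fraction


def count_correct(marks, key):
    """Number of statements whose true/false mark agrees with the answer key."""
    return sum(m == k for m, k in zip(marks, key))


def partial_credit(marks, key):
    """Assumed rule: each correctly marked statement earns 1/n of the item credit."""
    return Fraction(count_correct(marks, key), len(key))


def all_or_nothing(marks, key):
    """Assumed rule: full credit only if every statement is marked correctly."""
    return Fraction(1) if count_correct(marks, key) == len(key) else Fraction(0)


def threshold_half(marks, key):
    """Assumed rule: full credit if at least half of the statements are correct."""
    return Fraction(1) if 2 * count_correct(marks, key) >= len(key) else Fraction(0)


# Hypothetical item with four statements; the examinee marks three correctly.
key = [True, False, True, True]
marks = [True, False, False, True]

scores = {
    "partial_credit": partial_credit(marks, key),
    "all_or_nothing": all_or_nothing(marks, key),
    "threshold_half": threshold_half(marks, key),
}

for rule, score in scores.items():
    print(f"{rule}: {float(score):.2f}")
```

For this single hypothetical response the three rules award different credit, which is the kind of spread between scoring approaches that the study quantifies on real examination data.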