The scoring system is designed to measure the concordance between examinees’ scripts and scripts from a panel of experts.
To create a scoring key, the SCT is submitted to a reference panel. The composition of the panel depends on the assessment situation. Certified practicing specialists would be used for a certification examination, while family physicians working in the field covered by the test would form the reference panel for a continuing medical education activity. Most SCTs built so far for research purposes with residents and medical students have used specialists from a chosen field to make up the reference panel. While completing the test, physicians are asked to identify the questions they find confusing or not relevant. These questions are then either discarded or rewritten (24). The number of physicians used to develop the scoring system must be sufficient to express the variability in answers that a reference panel may show for each item. A recently published study suggests that 10 to 15 physicians are needed to maximize the reliability and validity of the aggregate scoring key.
The scoring process of the SCT is based on the principle that any answer given by a panel member reflects an acceptable opinion, and that answers on which the panel does not fully agree should not be discarded. In fact, questions with straightforward answers on which all panel members agree do not make good SCT cases, because they do not probe areas of uncertainty. For SCT questions, any answer given by a physician who is part of the panel has an intrinsic value, even if others might not agree with it. This scoring method is termed aggregate scoring; it was initially proposed by Norman and then by Norcini.
When scoring each question, an examinee’s answer receives a credit based on the number of panel members who selected that rating on the scale. The maximum score for each question is one (1), awarded for the modal answer. Other ratings chosen by panel members receive partial credit. Ratings not chosen by any panel member receive zero. To obtain this proportional transformation, the number of panel members who chose a given rating on the Likert scale is divided by the number who chose the modal rating for the item. For example, suppose the 15 members of the reference panel answered a question on a given SCT in the following way: none chose the “-2” or “-1” ratings, two chose the “0” rating, nine chose the “+1” rating, and four chose the “+2” rating. The modal answer in this example is the “+1” rating, so choosing it gives the examinee 1.0 point. The “0” rating gives 0.22 point (2/9) and the “+2” rating 0.44 point (4/9).
Anchors on SCT question | -2 | -1 | 0 | +1 | +2 |
Number of times anchor was chosen by a panel member | 0 | 0 | 2 | 9 | 4 |
Calculation based on modal answer | 0/9 | 0/9 | 2/9 | 9/9 | 4/9 |
Points attributed to examinee | 0 | 0 | 0.22 | 1.0 | 0.44 |
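The per-question calculation in the table above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `item_credits`, the integer coding of the anchors (-2 to +2), and the list-of-ratings input format are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical helper: the name, the anchor coding, and the input format
# are illustrative assumptions, not part of any published SCT tooling.
def item_credits(panel_ratings, anchors=(-2, -1, 0, 1, 2)):
    """Return the credit for each anchor: panel count / modal count."""
    if not panel_ratings:
        raise ValueError("the reference panel gave no ratings for this item")
    counts = Counter(panel_ratings)
    modal_count = max(counts.values())  # count for the most-chosen anchor
    # Anchors nobody chose have a count of 0 and therefore a credit of 0.
    return {anchor: counts.get(anchor, 0) / modal_count for anchor in anchors}

# The 15-member panel from the worked example: two "0", nine "+1", four "+2".
panel = [0] * 2 + [1] * 9 + [2] * 4
key = item_credits(panel)
print({anchor: round(credit, 2) for anchor, credit in key.items()})
# {-2: 0.0, -1: 0.0, 0: 0.22, 1: 1.0, 2: 0.44}
```

In this sketch, if two anchors were tied for the highest count, both would receive the full 1.0 credit; that is a consequence of dividing by the modal count rather than a rule stated in the text.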
With this modal scoring method, every question is worth a maximum of 1.0 point. The total score for an SCT is the sum of the credits obtained on each question, which is then rescaled to a maximum of 100 so that it is more easily understood by examinees. For example, if a given SCT has 65 questions, the maximum raw score obtainable by an examinee is 65. An examinee who obtained 54.5 out of 65 on that SCT would receive a final score of 83.8 (54.5/65 × 100). Scoring is thus weighted by the degree of agreement among the reference panel, and this weighting reflects the way physicians as a group answer the question. A score of 100 indicates that, on every question, the examinee gave the answer most frequently chosen by the experts (the modal answer); the lower the score, the farther the examinee’s answers are from the panel’s script for each situation.
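The conversion from a raw total to the 100-point scale is simple arithmetic, shown here as a short Python sketch. The function name `scale_to_100` is a hypothetical choice for illustration, and the sketch assumes every question is worth at most 1.0 point, as described above.

```python
# Hypothetical helper; assumes each question carries a maximum of 1.0 point,
# so the maximum raw score equals the number of questions.
def scale_to_100(raw_score, n_questions):
    """Rescale a raw SCT total (0..n_questions) to a 0..100 score."""
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    if not 0 <= raw_score <= n_questions:
        raise ValueError("raw_score must lie between 0 and n_questions")
    return 100.0 * raw_score / n_questions

# The example from the text: 54.5 points on a 65-question SCT.
final_score = scale_to_100(54.5, 65)
print(round(final_score, 1))  # 83.8
```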