No matter how accurately examination forms are assembled to reflect a specific degree of difficulty, there is always some variability in difficulty from one form to the next. The goal is to make each subsequent form as difficult as the original form, but this is not always possible. For that reason, results are reported on a standard scale.
Suppose one candidate takes a relatively easy examination and receives a raw score of 82. During another administration, a second candidate takes a relatively difficult version of the same examination and receives a raw score of 77. Is the first candidate more competent than the second? If only raw scores were used to answer this question, the answer would be yes. However, because the two examination forms differed in difficulty, that answer might be incorrect. To control for fluctuations in the difficulty of examination forms, raw scores need to be converted to a standard scale so that accurate statements can be made about these two candidates’ competency. Once both raw scores are converted to scaled scores, it might turn out that the second candidate actually performed better.
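The conversion described above can be sketched in Python. The article does not say which transformation a given program uses, so every number here is an illustrative assumption: a 100-point raw maximum on both forms, hypothetical raw cut scores of 80 (easy form) and 70 (hard form), and a reporting scale on which 75 is passing and 100 is the top. The hypothetical `to_scaled` function applies a two-point linear transformation, one common approach among several, that sends each form's raw cut score to the same scaled passing score.

```python
from dataclasses import dataclass

# Assumed reporting scale (illustrative values, not from the article).
SCALED_PASS = 75.0   # scaled score that always means "passing"
SCALED_MAX = 100.0   # top of the reporting scale


@dataclass(frozen=True)
class ExamForm:
    name: str
    raw_max: int   # maximum possible raw score on this form (assumed)
    raw_cut: int   # raw score equivalent to the passing standard (assumed)


def to_scaled(raw: float, form: ExamForm) -> float:
    """Map a raw score onto the reporting scale with a straight line that
    sends the form's raw cut to SCALED_PASS and its raw maximum to SCALED_MAX."""
    slope = (SCALED_MAX - SCALED_PASS) / (form.raw_max - form.raw_cut)
    return SCALED_PASS + slope * (raw - form.raw_cut)


# Hypothetical forms: the harder form needs fewer raw points to pass.
easy = ExamForm("easy form", raw_max=100, raw_cut=80)
hard = ExamForm("hard form", raw_max=100, raw_cut=70)

first = to_scaled(82, easy)   # 75 + 2 * (25 / 20)
second = to_scaled(77, hard)  # 75 + 7 * (25 / 30)

for label, score in [("first candidate", first), ("second candidate", second)]:
    status = "pass" if score >= SCALED_PASS else "fail"
    print(f"{label}: scaled {score:.2f} ({status})")
```

Under these assumptions the first candidate's 82 scales to 77.50 and the second candidate's 77 scales to about 80.83, so the second candidate performed better despite the lower raw score. In practice the raw cut score for each form would come from standard setting and equating, and scores below the cut are often handled with a separate segment or a floor rather than by extending the same line.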
The use of scales is common in everyday life. For example, if you were told the temperature was 32 degrees, would that be warm or cold? If you knew the scale being used, as well as its range from freezing to boiling, you would be better able to interpret a temperature of 32 degrees. On the Fahrenheit scale, it would be quite cold (32 degrees is the freezing point of water); on the Celsius scale, it would be quite warm. By knowing which temperature scale is being used, we can make a more accurate assessment. The same is true of examinations. By converting raw scores to scaled scores that have a specified mean and range, we can make a more accurate assessment of a candidate’s knowledge.
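The temperature analogy can be made concrete with a short Python sketch that expresses a reading as its position between the freezing point (0.0) and boiling point (1.0) of water on each scale. The function name `position_between_anchors` is illustrative, not a standard API; the anchor values are the standard freezing and boiling points of water at sea-level pressure.

```python
# Freezing and boiling points of water at standard pressure on each scale.
SCALES = {
    "Fahrenheit": (32.0, 212.0),
    "Celsius": (0.0, 100.0),
}


def position_between_anchors(reading: float, scale: str) -> float:
    """Return where a reading falls between freezing (0.0) and boiling (1.0)."""
    freezing, boiling = SCALES[scale]
    return (reading - freezing) / (boiling - freezing)


for scale in SCALES:
    pos = position_between_anchors(32, scale)
    print(f"32 degrees {scale}: {pos:.2f} of the way from freezing to boiling")
```

The same reading of 32 sits at 0.00 on the Fahrenheit scale (freezing) and at 0.32 on the Celsius scale, which is the sense in which a number means little until its scale is known.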
Scaled scores are valuable for several reasons. First, they adjust for differences in difficulty across forms so that a given scaled score has the same meaning over time. Second, they eliminate the potential unfairness to candidates who receive a more difficult form than other candidates receive. Third, they give decision makers a consistent standard over time: even when raw scores fluctuate, decision makers know that candidates must achieve a specified scaled score to be deemed competent to practice. Finally, they help candidates who fail an examination by providing comparable scores across administrations, allowing those candidates to determine how far below the passing score they fell on each attempt.
Scaled scores are merely a transformation of raw scores to a standard scale so that comparable results can be reported even when examination forms vary in difficulty.
ERAC's Question and Answer Series is prepared by the CLEAR Examination Resources and Advisory Committee (ERAC).
See also: What is equating and why is it used?
©2007 The Council on Licensure, Enforcement and Regulation (CLEAR)