What is equating and why is it used?



Even when examination forms are carefully designed to be equivalent, some variation in difficulty between forms is likely. Such variation becomes especially problematic when the purpose of the examination is to determine whether someone is sufficiently competent to practice a profession or obtain a credential. It is inherently unfair for some candidates to be given harder examinations while others are given easier ones unless these differences are taken into account. Equating addresses these differences by determining, for each examination form, the passing score that represents the same level of performance. Equating is often used in conjunction with scaling, but they are two different processes. When they are used together, equating is performed first, followed by scaling.


Suppose a new certification examination is developed by an organization. The organization adheres to all of the "best practices" in psychometrics, and the new examination is valid and reliable. The organization administers the examination and establishes a minimum passing score of 88 items correct out of 110. Using this standard, three-fourths of the candidates pass and are awarded the certification. The organization then develops a second form of the examination, but when it applies the passing standard of 88 items correct, it is concerned because only half of the candidates would pass.


There are two possible explanations for the change in pass rate: differing candidate abilities and differing exam form difficulty. The second group of candidates might have been less well prepared for the examination, or perhaps they really are less able. Alternatively, the second form of the examination might be more difficult than the first form. Equating allows us to investigate both of these possibilities and to address them so that a fair standard can be applied to the new examination form.


While there are several methods for conducting equating, one of the most frequently used is based on having a subset of questions that is common to both examinations. Suppose that 40 of the items on the 110-item certification examination appeared on both forms. By comparing the performance of the first and second groups on the 40-item subtest, the organization can evaluate whether the two groups of candidates are performing at the same or different levels of ability. Assuming they are performing at equal levels on the common items, the next step is to compare their performance on the 70 items that were unique to each examination form.
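
The comparison of the two groups on the common items can be sketched in a few lines of Python. This is only an illustration: the function name anchor_comparison and the candidate scores are invented for the example, and a standardized mean difference is just one reasonable way to summarize how far apart the groups are.

```python
from statistics import mean, stdev


def anchor_comparison(group1_anchor, group2_anchor):
    """Compare two candidate groups on the common (anchor) items.

    Each argument is a list of per-candidate scores on the common items.
    Returns both group means and the standardized mean difference
    (group 2 minus group 1, divided by the pooled standard deviation).
    """
    m1, m2 = mean(group1_anchor), mean(group2_anchor)
    n1, n2 = len(group1_anchor), len(group2_anchor)
    s1, s2 = stdev(group1_anchor), stdev(group2_anchor)
    # Pooled standard deviation across the two groups
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return m1, m2, (m2 - m1) / pooled


# Hypothetical scores on the 40 common items (invented for illustration)
group1 = [30, 34, 32, 36, 28, 33, 31, 32]
group2 = [31, 33, 32, 35, 29, 32, 30, 34]

m1, m2, d = anchor_comparison(group1, group2)
print(f"Group 1 mean: {m1}, Group 2 mean: {m2}, standardized difference: {d:.2f}")
```

A standardized difference near zero suggests the two groups are of similar ability, so any difference on the unique items can be attributed to the forms rather than to the candidates.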


The organization might find that the first group of candidates answered an average of 60 out of 70 unique items correctly on the first examination form, but the second group answered an average of only 50 out of 70 unique items correctly on the second form. Because the two groups performed equally on the common items, this finding indicates that the second examination form is harder than the first. As a result, the passing score for the second examination form should be lower than that used on the first examination form. After performing all of the calculations, the organization determines that the passing score on the second examination should be 81 items correct out of 110. When this standard is applied, the pass rate for the second examination form becomes similar to the rate on the first (three-fourths of the candidates pass).
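
To give a feel for the kind of calculation involved, here is a minimal sketch of linear equating, one common textbook approach, assuming the two groups are of equal ability. It is not the organization's actual procedure, which is not described here. The function name linear_equate, the common-item mean of 32, and the standard deviations are all hypothetical; only the unique-item means (60 and 50) and the passing score of 88 come from the example.

```python
def linear_equate(score_x, mean_x, sd_x, mean_y, sd_y):
    """Map a score on form X to the form Y score with the same z-score."""
    z = (score_x - mean_x) / sd_x
    return mean_y + sd_y * z


# Total-score means: unique-item mean plus a hypothetical common-item mean of 32
mean_form1 = 60 + 32  # 92
mean_form2 = 50 + 32  # 82

# Hypothetical standard deviations of total scores (assumed, not from the example)
sd_form1 = 6.0
sd_form2 = 6.0

passing_form1 = 88
passing_form2 = linear_equate(passing_form1, mean_form1, sd_form1,
                              mean_form2, sd_form2)
print(round(passing_form2, 1))
```

With equal standard deviations, this reduces to shifting the passing score by the full 10-point difference in means, which gives about 78 rather than 81. The result depends on the equating method and on the actual score distributions, so the sketch does not attempt to reproduce the figure of 81.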


The organization must now report scores to candidates. This can be confusing because candidates who sat for the first examination had to answer 88 items correctly, but those who took the second examination needed to answer only 81 items correctly. A candidate who failed the first examination with a score of 86 might be upset to learn that a friend who took the second examination passed with a score of 82. Over time, it can become difficult to keep track of the varying passing scores required for multiple examination forms. To avoid this problem, examinations that have been equated usually report scaled scores.

