Frequently Asked Questions About Licensing Exams

Characteristics of a performance assessment

CLEAR Exam Review (Winter 1996)
Norman R. Hertz

Question: What are the key features I should look for in evaluating the quality of a criterion-referenced (e.g., Angoff) passing score study?

Answer: The credibility of the passing score can be questioned if any of the procedures is not properly carried out. Even under the best conditions, the results from passing score workshops are likely to be challenged, so it is important to ensure that all steps are completed. The major steps are described below.

Selection of Workshop Participants
A diverse group of participants plays an important part in establishing a passing score that accurately reflects the competence required for practice. The participants should be representative in terms of specialty areas of practice, years of licensed practice, gender, and ethnicity. One of the most important population characteristics is years of licensed practice. Since the passing score is established at the level that represents minimum acceptable competence, it is important that at least one-half of the participants have recently been licensed. Participants who have recently passed the licensing examination are likely to be more cognizant of the skills required for entry level than participants who have been licensed for many years. In selecting senior practitioners to serve as participants, be careful not to select practitioners who were granted licensure without taking the licensing examination.

Development of the Criterion
The term criterion-referenced passing score implies that the passing score is based upon a standard - a criterion. For licensing examinations, the criterion represents the level of candidate competence that demonstrates sufficient knowledge or skill to be able to practice safely. The question that is usually asked of the workshop participants is, "What percent of the minimally competent candidates would answer this question correctly?" The question cannot be answered reliably unless the parameters for the minimally competent candidate have been established.

The criterion of minimal competence should be established by having the workshop participants develop behaviorally based examples of performance expected from a minimally competent candidate. Do not assume that the participants have a common understanding of the performance expected of a minimally competent candidate.

Calibration of Participants
The participants must exercise their individual judgment during the passing score workshop. However, the participants should evaluate test questions consistently in terms of the percent of minimally competent candidates they believe would answer the question correctly. One should not expect the participants to evaluate the items identically, but the participants should apply the same standards in making their judgments.

Discussion of the Ratings
To maintain the integrity of the passing score workshop, independent ratings are a must. Discussing the ratings is nonetheless important, as becomes evident when a participant brings up an issue that the other participants had not noticed. It is not uncommon for such new information to greatly influence the ratings assigned by the other participants. Discussion of the ratings is also important when the workshop leader provides statistical information about the performance of the item. Participants may discuss and change their ratings when the statistical data about the difficulty of the item differ from their initial perception of its difficulty.
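
The comparison between ratings and item statistics can be sketched in a few lines of Python. This is only an illustration, not a procedure taken from the column: the data, the function name flag_items_for_discussion, and the 0.20 threshold are hypothetical. Observed proportion-correct values usually describe all candidates rather than only minimally competent ones, so a large gap is a prompt for discussion, not proof that a rating is wrong.

```python
from statistics import mean


def flag_items_for_discussion(ratings, observed_p, threshold=0.20):
    """Return indices of items whose mean rating differs from the
    observed proportion correct by more than `threshold`.

    ratings:    ratings[j][i] is participant j's estimate (0 to 1) of the
                share of minimally competent candidates answering item i
                correctly.
    observed_p: observed proportion correct for each item.
    threshold:  hypothetical cutoff, chosen only for illustration.
    """
    flagged = []
    for i, p in enumerate(observed_p):
        item_mean = mean(r[i] for r in ratings)
        if abs(item_mean - p) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical data: 3 participants rating 4 items.
ratings = [
    [0.70, 0.60, 0.90, 0.50],
    [0.75, 0.55, 0.85, 0.40],
    [0.65, 0.65, 0.95, 0.45],
]
observed_p = [0.72, 0.30, 0.88, 0.50]

flagged = flag_items_for_discussion(ratings, observed_p)
print("Items to discuss:", flagged)
```

In this made-up data only the second item (index 1) is flagged: the participants rated it much easier than the candidate statistics suggest, which is the kind of discrepancy the workshop leader might raise for discussion.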

Analysis of Passing Score Data
Data collected from the passing score workshops should be analyzed to estimate the reliability of the recommended passing score and to establish confidence intervals around the passing score. The mean (average) of the participants' ratings is the most representative score and should normally be used as the passing score. However, any score within the confidence interval may be used if sufficient rationale can be provided. Rationale for selecting a passing score other than the mean may include an extremely high or low mean, or consistently extreme ratings by some of the participants. Items vary in difficulty, so you should expect the participants' ratings to reflect that variability.
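
As a rough illustration of this analysis, the Python sketch below computes a recommended passing score and a confidence interval from a small table of ratings. The column does not say how the interval should be calculated, so the approach here is an assumption: each participant's ratings are summed to give that participant's recommended raw score, the passing score is the mean of those sums, and the interval uses a normal approximation (z = 1.96). With only a handful of participants a t-based interval would be wider. The data and the function name angoff_passing_score are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def angoff_passing_score(ratings, z=1.96):
    """Return (passing_score, lower, upper) in raw-score points.

    ratings[j][i] is participant j's estimate (0 to 1) of the share of
    minimally competent candidates who would answer item i correctly.
    z is the normal-approximation multiplier (an assumption; 1.96
    corresponds to a 95% interval).
    """
    # Each participant's recommended raw score is the sum of their ratings.
    participant_scores = [sum(r) for r in ratings]
    passing_score = mean(participant_scores)
    # Standard error of the mean across participants.
    std_error = stdev(participant_scores) / sqrt(len(participant_scores))
    return passing_score, passing_score - z * std_error, passing_score + z * std_error


# Hypothetical data: 5 participants rating a 4-item test.
ratings = [
    [0.70, 0.60, 0.90, 0.50],
    [0.75, 0.55, 0.85, 0.40],
    [0.65, 0.65, 0.95, 0.45],
    [0.80, 0.70, 0.90, 0.60],
    [0.60, 0.50, 0.80, 0.40],
]

passing_score, lower, upper = angoff_passing_score(ratings)
print(f"Recommended passing score: {passing_score:.2f} of {len(ratings[0])} items")
print(f"Approximate 95% interval: {lower:.2f} to {upper:.2f}")
```

Printing each participant's summed score alongside the mean is also a quick way to spot the consistently extreme raters mentioned above.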


CLEAR Exam Review (Winter 1996)
Norman R. Hertz

Question: Examination programs oftentimes are composed of multiple-choice questions and also contain a performance assessment (e.g., oral, design, practical, clinical, etc.). What are the issues in designing these tests?

Answer: The most important concept to be addressed in responding to your question is whether the performance examination measures the skill or knowledge better than some other form of assessment. According to Guion (1995), the meaning of performance assessment is more obscure and subject to irrelevant sources of variance.

Three themes are important to consider in designing performance assessments according to Guion (1995). First, scores should have clear, unambiguous meaning; the constructs that the scores represent must be clearly defined. The constructs that are being measured are more likely to be clearly defined if they are observable. Second, performance scores should permit fair, meaningful comparisons. For occupational licensing, the scores are compared against a standard. For the comparisons to be fair, the scores should be reliable and the assessment should be measuring the same construct. Third, validity-reducing errors should be minimal. One of the shortcomings of performance examinations is that they are usually designed so that the choice of tasks is potentially a source of measurement error. If a few tasks that are similar are selected for measurement, the results of the assessment do not provide a clear indication of the candidate's capacity to perform all the activities in the profession. On the other hand, if the tasks being assessed are more varied, the reliability of the assessment may be decreased. The selection of the tasks becomes a problem because performance examinations require more time to administer than, for example, multiple-choice examinations.

The scoring procedure is a critical element in performance assessment. The best procedure would be to establish one passing score based upon the sum of the scores for each type of examination. One advantage of adding the scores is that the reliability of the examination results is maximized. A second is that it is very difficult to support the concept that each of the tests measures such different parts of practice that each examination must be passed separately.
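
The difference between a single passing score on the summed scores and separate passing scores for each examination can be shown with a small Python sketch. All names and numbers are hypothetical, and the sketch only illustrates the two decision rules; it does not demonstrate the reliability advantage described above.

```python
def passes_summed(scores, overall_passing_score):
    """One passing score applied to the sum of all examination scores."""
    return sum(scores.values()) >= overall_passing_score


def passes_separately(scores, passing_scores):
    """Each examination must be passed on its own."""
    return all(scores[name] >= passing_scores[name] for name in passing_scores)


# Hypothetical candidate: strong multiple-choice result, slightly weak oral result.
candidate = {"multiple_choice": 78, "oral": 16}

# Hypothetical standards: 90 points overall, or 70 and 18 separately.
summed_result = passes_summed(candidate, overall_passing_score=90)
separate_result = passes_separately(candidate, {"multiple_choice": 70, "oral": 18})

print("Passes on summed score:", summed_result)
print("Passes each examination separately:", separate_result)
```

With these made-up numbers the candidate passes under the summed rule (94 points against a standard of 90) but fails under the separate rule because of a two-point shortfall on the oral examination.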

Furthermore, performance examinations are not inherently better than multiple-choice examinations. Well-developed multiple-choice examinations are still a viable means of assessing competence. The best approach to test development is to measure all the elements of practice in a multiple-choice examination, and use alternative forms of testing, such as an oral or practical examination, only if there are elements that cannot be measured in the multiple-choice examination. Multiple-choice tests are less costly to develop and score than performance examinations, and they provide a suitable methodology for assessing even the most complex competencies. Most performance examinations require the use of judges to evaluate and to score the examinations. When judges are used to score the examinations, issues of objectivity must be addressed. In summary, until you are convinced that the content can only be tested with a performance examination, stay with a multiple-choice examination.


© 2002 Council on Licensure, Enforcement and Regulation