Frequently Asked Questions About Licensing Exams
CLEAR Exam Review
(Winter 1996)
Norman R. Hertz
Question: What are the key features I should look for in evaluating the quality of a criterion-referenced (e.g., Angoff) passing score study?
Answer: The credibility of the passing score can be questioned if any of the procedures is not properly carried out. Even under the best conditions, the results from passing score workshops are likely to be challenged, so it is important to ensure that all steps are completed. The major steps are described below.
Selection of Workshop Participants
A diverse group of participants plays an
important part in establishing a passing score that accurately
reflects competence required for practice. The participants
should have representation by specialty areas of practice, years
of licensed practice, gender, and ethnicity. One of the most
important population characteristics is years of licensed
practice. Since the passing score is established at the level
that represents minimum acceptable competence, it is important
that at least one-half of the participants have recently been
licensed. Participants who have recently passed the licensing
examination are likely to be more cognizant of the skills
required for entry level than participants who have been licensed
for many years. In selecting senior practitioners to serve as
participants, be careful not to select practitioners who were
granted licensure without taking the licensing examination.
Development of the Criterion
The term criterion-referenced passing score
implies that the passing score is based upon a standard - a
criterion. For licensing examinations, the criterion represents
the level of candidate competence that demonstrates sufficient
knowledge or skill to be able to practice safely. The question
that is usually asked of the workshop participants is, "What
percent of the minimally competent candidates would answer this
question correctly?" The question cannot be answered
reliably unless the parameters for the minimally competent
candidate have been established.
The criterion of minimal competence should be established by having the workshop participants develop behaviorally based examples of performance expected from a minimally competent candidate. Do not assume that the participants have a common understanding of the performance expected of a minimally competent candidate.
Calibration of Participants
The participants must exercise their
individual judgment during the passing score workshop. However,
the participants should evaluate test questions consistently in
terms of the percent of minimally competent candidates they
believe would answer the question correctly. One should not
expect the participants to evaluate the items identically, but
the participants should apply the same standards in making their judgments.
Discussion of the Ratings
To maintain the integrity of the passing
score workshop, participants must first assign their ratings
independently. The importance of then discussing the ratings
becomes evident when a participant brings up an issue that went
unnoticed by the other participants. It is not uncommon for such
new information to greatly influence the ratings assigned by the
other participants. Discussion of the ratings is also important
when statistical information about the performance of the item is
provided by the workshop leader. Participants may discuss and
change their ratings when the statistical data about the
difficulty of the item differ from their initial perception of
the item's level of difficulty.
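One way a workshop leader might decide which items to bring up for discussion is sketched below in Python. This is only an illustration: the function name, the sample numbers, and the 20-point gap are hypothetical and do not come from the article. Note also that observed percent correct describes all candidates, not only minimally competent ones, so a large gap is a prompt for discussion rather than proof that a rating is wrong.

```python
from statistics import mean

def items_to_discuss(ratings, observed_percent_correct, gap=20):
    """Return indexes of items whose mean rating is far from observed difficulty.

    ratings[j][i] is participant j's estimate (0-100) of the percent of
    minimally competent candidates who would answer item i correctly.
    observed_percent_correct[i] is the percent of actual candidates who
    answered item i correctly. The gap threshold is an arbitrary assumption.
    """
    flagged = []
    for i, observed in enumerate(observed_percent_correct):
        item_mean = mean(judge[i] for judge in ratings)
        if abs(item_mean - observed) > gap:
            flagged.append(i)
    return flagged

# Hypothetical data: two participants, two items.
ratings = [[60, 90], [70, 80]]
observed = [62, 50]
flagged = items_to_discuss(ratings, observed)  # only the second item is flagged
```

Participants would still exercise their own judgment on the flagged items; the sketch only narrows down where discussion is most likely to be useful.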
Analysis of Passing Score Data
Data collected from the passing score
workshops should be analyzed to estimate the reliability of the
recommended passing score and to establish confidence intervals
around the passing score. The mean (average) of the participants'
ratings is the most representative score and should normally be
used as the passing score. However, any score within the
confidence interval may be used if sufficient rationale can be
provided. Rationale for selecting a passing score other than the
mean may include an extremely high or low mean, or consistently
extreme ratings by some of the participants. Items vary in
difficulty, so you should expect the participants' ratings to
reflect that variability.
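The analysis described above can be sketched in a few lines of Python. The article does not say how the confidence interval should be computed, so the approach here is an assumption: each participant's ratings are summed into an expected raw score, the passing score is the mean of those sums, and the interval is the mean plus or minus 1.96 standard errors (a normal approximation). The function name and the sample ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def passing_score_summary(ratings, z=1.96):
    """Return (passing_score, (lower, upper)) in raw-score points.

    ratings[j][i] is participant j's estimate (0-100) of the percent of
    minimally competent candidates who would answer item i correctly.
    """
    # Expected number of items a minimally competent candidate answers
    # correctly, according to each participant.
    participant_scores = [sum(r) / 100 for r in ratings]
    passing_score = mean(participant_scores)
    # Standard error of the mean across participants (assumed method).
    std_error = stdev(participant_scores) / sqrt(len(participant_scores))
    return passing_score, (passing_score - z * std_error,
                           passing_score + z * std_error)

# Hypothetical workshop: three participants rating a three-item test.
ratings = [[60, 70, 80], [50, 70, 90], [70, 80, 90]]
score, (low, high) = passing_score_summary(ratings)
print(round(score, 3), round(low, 3), round(high, 3))
```

With so few participants the interval is wide relative to the test length, which is one reason a real workshop needs a larger and more diverse panel.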
CLEAR Exam Review
(Winter 1996)
Norman R. Hertz
Question: Examination programs oftentimes are composed of multiple-choice questions and also contain a performance assessment (e.g., oral, design, practical, clinical, etc.). What are the issues in designing these tests?
Answer: The most important concept to be addressed in responding to your question is whether the performance examination measures the skill or knowledge better than some other form of assessment. According to Guion (1995), the meaning of performance assessment is more obscure and subject to irrelevant sources of variance.
Three themes are important to consider in designing performance assessments, according to Guion (1995). First, scores should have clear, unambiguous meaning; the constructs that the scores represent must be clearly defined. The constructs that are being measured are more likely to be clearly defined if they are observable. Second, performance scores should permit fair, meaningful comparisons. For occupational licensing, the scores are compared against a standard. For the comparisons to be fair, the scores should be reliable and the assessment should be measuring the same construct. Third, validity-reducing errors should be minimal. One of the shortcomings of performance examinations is that they are usually designed so that the choice of tasks is potentially a source of measurement error. If only a few similar tasks are selected for measurement, the results of the assessment do not provide a clear indication of the candidate's capacity to perform all the activities in the profession. On the other hand, if the tasks being assessed are more varied, the reliability of the assessment may be decreased. The selection of the tasks becomes a problem because performance examinations require more time to administer than, for example, multiple-choice examinations.
The scoring procedure is a critical element in performance assessment. The best procedure would be to establish one passing score based upon the sum of the scores for each type of examination. One advantage of adding the scores is that the reliability of the examination results is maximized. A second is that it is very difficult to support the concept that each of the tests measures such different parts of practice that each examination must be passed separately.
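The difference between one passing score on the summed scores and a separate passing score for each examination can be illustrated with a short Python sketch. The function names, the candidate's scores, and the passing scores are all hypothetical and serve only to show how the two decision rules can disagree.

```python
def passes_combined(scores, combined_passing_score):
    """One decision based on the sum of the scores from all examinations."""
    return sum(scores) >= combined_passing_score

def passes_separately(scores, passing_scores):
    """Each examination must be passed on its own."""
    return all(s >= p for s, p in zip(scores, passing_scores))

# Hypothetical candidate: multiple-choice score, then performance score.
candidate = [82, 58]
combined_result = passes_combined(candidate, 130)        # 140 >= 130
separate_result = passes_separately(candidate, [70, 60])  # 58 < 60 on the second examination
```

Under the combined rule a strong multiple-choice result offsets a slightly weak performance result; under the separate rule the same candidate fails.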
Furthermore, performance examinations are not inherently better than multiple-choice examinations. Well-developed multiple-choice examinations are still a viable means of assessing competence. The best approach to test development is to measure all the elements of practice in a multiple-choice examination, and use alternative forms of testing, such as an oral or practical examination, only if there are elements that cannot be measured in the multiple-choice examination. Multiple-choice tests are less costly to develop and score than performance examinations, and they provide a suitable methodology for assessing even the most complex competencies. Most performance examinations require the use of judges to evaluate and to score the examinations. When judges are used to score the examinations, issues of objectivity must be addressed. In summary, until you are convinced that the content can only be tested with a performance examination, stay with a multiple-choice examination.
© 2002 Council on Licensure, Enforcement and Regulation