Abstract
Retrospective rating scales are widely used for the formal assessment of typical performance. To maximize the quality of ratings, the raters most familiar and interactive with ratees are routinely recommended. This recommendation, however, fails to distinguish among sampling parameters of the observations on which ratings are based, parameters that may matter differently when assessing different classes of behavior. We hypothesized that systematic observational schedules would be more important than familiarity/interaction per se for ratings of public events, whereas the recommendation would hold for ratings of private events. We examined these hypotheses with the Psychotic Inpatient Profile (PIP), which provides separate factor scores for ratings of public and private events, in a quasi-experimental study of adult inpatients of mental hospitals. A large multi-institutional data set provided retrospective PIP ratings from two types of raters. For each client, the most familiar/interactive local clinical staff completed the PIP after observing on an ad lib schedule in the course of ongoing job duties. Unfamiliar, noninteractive raters completed the PIP for each client after observing on a systematic time-sampling schedule for the purpose of coding an entirely different instrument. Data were selected so that each of 189 clients received PIP scores from four raters reflecting functioning during the same time period, based on day-shift observations by one rater of each type and evening-shift observations by one rater of each type. Analyses of variance, consistency/discriminability of ratings, and prediction of social-action outcomes all supported the hypotheses. We discuss alternative strategies that are better suited to assessing typical performance in most circumstances, and we provide recommendations for improving the adequacy of observations in those circumstances in which a standardized retrospective rating scale could be a cost-effective assessment strategy.
Additional information
This study was the basis of a master's thesis at the University of Houston by the senior author under the direction of the junior authors. Richard M. Rozelle served on the examination committee. This study was partially supported by grants to Gordon L. Paul from the National Institute of Mental Health, Public Health Service (MH-15353; MH-25464); the Illinois Department of Mental Health and Developmental Disabilities; the Joyce Foundation; the MacArthur Foundation; the Owsley Foundation; the Cullen Foundation; and the Center for Public Policy of the University of Houston.
Cite this article
Braun, G.B., Paul, G.L. & Mariotto, M.J. Familiar/interactive raters are not always best: The influence of sampling schedules and class of behavior. J Psychopathol Behav Assess 15, 153–176 (1993). https://doi.org/10.1007/BF00960615