
Familiar/interactive raters are not always best: The influence of sampling schedules and class of behavior

Journal of Psychopathology and Behavioral Assessment

Abstract

Retrospective rating scales are widely used for formal assessment of typical performance. Raters who are the most familiar/interactive with ratees are routinely recommended to maximize the quality of ratings. This recommendation to use the most familiar/interactive raters overlooks the sampling parameters of the observations on which ratings are based, and those parameters may matter differently for assessing different classes of behavior. We hypothesized that, for ratings of public events, systematic observational schedules would matter more than familiarity/interaction per se, whereas the recommendation would hold for ratings of private events. We used the Psychotic Inpatient Profile (PIP), which provides separate factor scores for ratings of public and private events, to examine these hypotheses in a quasi-experimental study with adult inpatients of mental hospitals. A large multi-institutional data set provided retrospective PIP ratings by two types of raters. The most familiar/interactive local clinical staff for each client completed the PIP after observing on an ad lib schedule in the course of their ongoing job duties. Unfamiliar, noninteractive raters completed the PIP for each client after observing on a systematic time-sampling schedule for the purpose of coding an entirely different instrument. Data were selected so that each of 189 clients received PIP scores from four raters covering the same time period: one rater of each type observing on the day shift and one rater of each type observing on the evening shift. Analyses of variance, consistency/discriminability of ratings, and prediction of social-action outcomes all supported the hypotheses. We discuss alternative strategies that are better for assessing typical performance in most circumstances. We also provide recommendations for improving the adequacy of observations in those circumstances in which a standardized retrospective rating scale could be a cost-effective assessment strategy.
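The abstract does not specify how consistency of ratings was computed. As a minimal sketch only, assume consistency is indexed by a two-way, single-rating consistency intraclass correlation (commonly labeled ICC(3,1)), computed within each rater type across its day-shift and evening-shift ratings. Under that assumption, the four-rater design could be analyzed as below. The record layout, the labels `staff`, `observer`, `day`, and `evening`, and the function names are hypothetical, and this is not presented as the authors' procedure.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical record layout: one dict per client, keyed by (rater_type, shift).
Record = Mapping[Tuple[str, str], float]


def icc_consistency(scores: Sequence[Sequence[float]]) -> float:
    """Two-way, single-rating consistency ICC, i.e. ICC(3,1).

    `scores` is an n x k table: one row per client, one column per rating
    (here k = 2: day shift and evening shift).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 clients and 2 ratings per client")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into clients, occasions, and residual.
    ss_clients = k * sum((m - grand) ** 2 for m in row_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_clients - ss_occasions

    bms = ss_clients / (n - 1)               # between-clients mean square
    ems = ss_residual / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


def cross_shift_consistency(records: Sequence[Record], rater_type: str) -> float:
    """ICC(3,1) between the day- and evening-shift ratings of one rater type."""
    table = [[r[(rater_type, "day")], r[(rater_type, "evening")]] for r in records]
    return icc_consistency(table)


if __name__ == "__main__":
    # Toy factor scores for four hypothetical clients; NOT data from the study.
    toy = [
        {("staff", "day"): 12, ("staff", "evening"): 15,
         ("observer", "day"): 11, ("observer", "evening"): 12},
        {("staff", "day"): 20, ("staff", "evening"): 14,
         ("observer", "day"): 19, ("observer", "evening"): 20},
        {("staff", "day"): 8, ("staff", "evening"): 10,
         ("observer", "day"): 7, ("observer", "evening"): 9},
        {("staff", "day"): 15, ("staff", "evening"): 18,
         ("observer", "day"): 14, ("observer", "evening"): 14},
    ]
    for rater_type in ("staff", "observer"):
        print(rater_type, round(cross_shift_consistency(toy, rater_type), 3))
```

Because ICC(3,1) removes the mean difference between the two shifts before estimating agreement, a rater who is uniformly more lenient on one shift does not lower the coefficient; an absolute-agreement index would be the alternative if such differences in level should count as disagreement.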




Additional information

This study was the basis of a master's thesis at the University of Houston by the senior author under the direction of the junior authors. Richard M. Rozelle served on the examination committee. This study was partially supported by grants to Gordon L. Paul from the National Institute of Mental Health, Public Health Service (MH-15353; MH-25464); the Illinois Department of Mental Health and Developmental Disabilities; the Joyce Foundation; the MacArthur Foundation; the Owsley Foundation; the Cullen Foundation; and the Center for Public Policy of the University of Houston.


About this article

Cite this article

Braun, G.B., Paul, G.L. & Mariotto, M.J. Familiar/interactive raters are not always best: The influence of sampling schedules and class of behavior. J Psychopathol Behav Assess 15, 153–176 (1993). https://doi.org/10.1007/BF00960615


  • DOI: https://doi.org/10.1007/BF00960615
