Abstract
When sequences of discrete events, or other units, are independently coded by two coders using a set of mutually exclusive and exhaustive codes, but the onset times for the codes are not preserved, it is often unclear how pairs of protocols should be aligned. Yet such alignment is required before Cohen’s kappa, a common agreement statistic, can be computed. Here we describe a method—based on the Needleman and Wunsch (1970) algorithm originally devised for aligning nucleotide sequences—for optimally aligning such sequences; we also offer the results of a simulation study of the behavior of alignment kappa with a number of variables, including number of codes, varying degrees of observer accuracy, sequence length, code variability, and parameters governing the alignment algorithm. We conclude that (1) under most reasonable circumstances, observer accuracies of 90% or better result in alignment kappas of .60 or better; (2) generally, alignment kappas are not strongly affected by sequence length, the number of codes, or the variability in the codes’ probability; (3) alignment kappas are adversely affected when missed events and false alarms are possible; and (4) cost matrices and priority orders used in the algorithm should favor substitutions (i.e., disagreements) over insertions and deletions (i.e., missed events and false alarms). Two computer programs were developed: Global Sequence Alignment, or GSA, for carrying out the simulation study, and Event Alignment, or ELign, a user-oriented program that computes alignment kappa and provides the optimal alignment given a pair of event sequences.
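The alignment step and the resulting agreement statistic can be illustrated with a minimal sketch. It is not the GSA or ELign implementation: the function names `align` and `kappa`, the gap symbol, and the cost values (substitution 1, insertion or deletion 2) are assumptions chosen for illustration. The sketch uses a minimum-cost formulation of Needleman-Wunsch-style dynamic programming and, on ties, prefers substitutions over insertions and deletions during traceback. Kappa is computed here as ordinary Cohen's kappa with the gap treated as one extra category, which may differ from the exact computation used by the authors.

```python
from collections import Counter

GAP = "-"  # hypothetical gap symbol; must not be one of the real codes


def align(seq_a, seq_b, sub_cost=1, indel_cost=2):
    """Globally align two code sequences by minimizing total edit cost.

    The cost values are illustrative assumptions; indel_cost > sub_cost
    makes the alignment favor disagreements over missed events.
    """
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = minimal cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * indel_cost
    for j in range(1, m + 1):
        cost[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost
            cost[i][j] = min(
                cost[i - 1][j - 1] + step,     # agreement or substitution
                cost[i - 1][j] + indel_cost,   # event coded only by coder A
                cost[i][j - 1] + indel_cost,   # event coded only by coder B
            )

    # Traceback from the bottom-right cell; the diagonal move is tried first,
    # so substitutions take priority over insertions and deletions on ties.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = 0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost
            if cost[i][j] == cost[i - 1][j - 1] + step:
                pairs.append((seq_a[i - 1], seq_b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + indel_cost:
            pairs.append((seq_a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, seq_b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def kappa(pairs):
    """Cohen's kappa over aligned pairs, with GAP as an extra category.

    Assumes expected agreement is below 1 (more than one category occurs).
    """
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    expected = sum(row[c] * col[c] for c in row) / n ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    aligned = align(list("ABCAB"), list("ABAB"))
    print(aligned)
    print(round(kappa(aligned), 3))
```

Because a gap is never aligned with a gap, the gap category contributes only to disagreement in this sketch; a fuller treatment would need to decide how that empty cell enters the chance-agreement term.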
References
Abbott, A., & Barman, E. (1997). Sequence comparison via alignment and Gibbs sampling: A formal analysis of the emergence of the modern sociological article. Sociological Methodology, 27, 47–87.
Altschul, S. F., & Erickson, B. W. (1985). Significance of nucleotide sequence alignments: A method for random sequence permutation that preserves dinucleotide and codon usage. Molecular Biology & Evolution, 2, 526–538.
Baeza-Yates, R. A., Gavaldà, R., Navarro, G., & Scheihing, R. (1999). Bounding the expected length of longest common subsequences and forests. Theory of Computing Systems, 32, 435–452.
Bakeman, R., & Gottman, J. M. (1997). Observing interaction: An introduction to sequential analysis (2nd ed.). New York: Cambridge University Press.
Bakeman, R., McArthur, D., Quera, V., & Robinson, B. F. (1997). Detecting sequential patterns and determining their reliability with fallible observers. Psychological Methods, 2, 357–370.
Bakeman, R., & Quera, V. (1995). Analyzing interaction: Sequential analysis with SDIS and GSEQ. New York: Cambridge University Press.
Bakeman, R., & Robinson, B. F. (1994). Understanding log-linear analysis with ILOG: An interactive approach. Hillsdale, NJ: Erlbaum.
Booth, H. S., Maindonald, J. H., Wilson, S. R., & Gready, J. E. (2004). An efficient Z-score algorithm for assessing sequence alignments. Journal of Computational Biology, 11, 616–625.
Boutet de Monvel, J. (1999). Extensive simulations for longest common subsequences: Finite size scaling, a cavity solution, and configuration space properties. European Physical Journal B, 7, 293–308.
Chvátal, V., & Sankoff, D. (1999). An upper-bound technique for lengths of common subsequences. In D. Sankoff & J. B. Kruskal (Eds.), Time warps, string edits, and macromolecules: The theory and practice of sequence comparison (2nd ed., pp. 353–357). Stanford, CA: CSLI Publications.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational & Psychological Measurement, 20, 37–46.
Dančík, V. (1994). Upper bounds for the expected length of longest common subsequences. Bulletin of the European Association for Theoretical Computer Science, 54, 248.
Deken, J. (1979). Some limit results for longest common subsequences. Discrete Mathematics, 26, 17–31.
Deken, J. (1999). Probabilistic behavior of longest-common-subsequence length. In D. Sankoff & J. B. Kruskal (Eds.), Time warps, string edits, and macromolecules: The theory and practice of sequence comparison (2nd ed., pp. 359–362). Stanford, CA: CSLI Publications.
Dijkstra, W. (2007). Sequence Viewer (Version 4.2a). [Computer software]. Retrieved from home.fsw.vu.nl/w.dijkstra/sequenceviewer.html.
Dijkstra, W., & Taris, T. (1995). Measuring the agreement between sequences. Sociological Methods & Research, 24, 214–231.
Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press.
Ewens, W. J., & Grant, G. R. (2001). Statistical methods in bioinformatics: An introduction. New York: Springer.
Fichman, M. (1999). Finding patterns in sequences: Applying sequence comparison techniques to study behavior processes. Unpublished manuscript, Carnegie Mellon University.
Fu, W.-T. (2001). ACT-PRO action protocol analyzer: A tool for analyzing discrete action protocols. Behavior Research Methods, Instruments, & Computers, 33, 149–158.
Galisson, F. (2000, August). Introduction to computational sequence analysis. Tutorial presented at the 8th International Conference on Intelligent Systems for Molecular Biology, San Diego.
Gardner, W. (1995). On the reliability of sequential data: Measurement, meaning, and correction. In J. M. Gottman (Ed.), The analysis of change (pp. 339–359). Mahwah, NJ: Erlbaum.
Giegerich, R., & Wheeler, D. (1996). Pairwise sequence alignment. VSNS BioComputing Division, Technische Fakultät, Universität Bielefeld. Available at www.techfak.uni-bielefeld.de/bcd/Curric/PrwAli/prwali.html.
Gusfield, D. (1997). Algorithms on strings, trees, and sequences: Computer science and computational biology. New York: Cambridge University Press.
Hardy, P., & Waterman, M. S. (1997). The sequence alignment software library at USC. Unpublished manuscript, University of Southern California.
Hirschberg, D. S. (1997). Serial computations of Levenshtein distances. In A. Apostolico & Z. Galil (Eds.), Pattern matching algorithms (pp. 123–141). New York: Oxford University Press.
Kaye, K. (1980). Estimating false alarms and missed events from interobserver agreement: A rationale. Psychological Bulletin, 88, 458–468.
Kruskal, J. B. (1999). An overview of sequence comparison. In D. Sankoff & J. B. Kruskal (Eds.), Time warps, string edits, and macromolecules: The theory and practice of sequence comparison (2nd ed., pp. 1–44). Stanford, CA: CSLI Publications.
Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F., & Wootton, J. C. (1993). Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science, 262, 208–214.
Levenshtein, V. I. (1965). Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR, 163, 845–848.
Mannila, H., & Ronkainen, P. (1997). Similarity of event sequences. In Proceedings of the Fourth International Workshop on Temporal Representation and Reasoning: TIME’97 (pp. 136–139). Daytona Beach, FL.
McVicar, D., & Anyadike-Danes, M. (2000). Predicting successful and unsuccessful transitions from school to work using sequence methods. Belfast, U.K.: Economic Research Institute of Northern Ireland.
Needleman, S. B., & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48, 443–453.
Paterson, M., & Dančík, V. (1994). Longest common subsequences. In I. Prívara, B. Rovan, & P. Ruzicka (Eds.), Proceedings of 19th International Symposium on Mathematical Foundations of Computer Science (pp. 127–142). Berlin: Springer.
Sankoff, D., & Kruskal, J. B. (Eds.) (1999). Time warps, string edits, and macromolecules: The theory and practice of sequence comparison (2nd ed.). Stanford, CA: CSLI Publications.
Sankoff, D., & Mainville, S. (1999). Common subsequences and monotone subsequences. In D. Sankoff & J. B. Kruskal (Eds.), Time warps, string edits, and macromolecules: The theory and practice of sequence comparison (2nd ed., pp. 363–365). Stanford, CA: CSLI Publications.
Scherer, S. (2001). Early career patterns: A comparison of Great Britain and West Germany. European Sociological Review, 17, 119–144.
Waterman, M. S. (1995). Introduction to computational biology: Maps, sequences and genomes. London: Chapman & Hall.
Cite this article
Quera, V., Bakeman, R. & Gnisci, A. Observer agreement for event sequences: Methods and software for sequence alignment and reliability estimates. Behav Res 39, 39–49 (2007). https://doi.org/10.3758/BF03192842
DOI: https://doi.org/10.3758/BF03192842