An R Package for Probabilistic Latent Feature Analysis of Two-Way Two-Mode Frequencies

A common strategy for the analysis of object-attribute associations is to derive a lowdimensional spatial representation of objects and attributes which involves a compensatory model (e.g., principal components analysis) to explain the strength of object-attribute associations. As an alternative, probabilistic latent feature models assume that objects and attributes can be represented as a set of binary latent features and that the strength of object-attribute associations can be explained as a non-compensatory (e.g., disjunctive or conjunctive) mapping of latent features. In this paper, we describe the R package plfm which comprises functions for conducting both classical and Bayesian probabilistic latent feature analysis with disjunctive or a conjunctive mapping rules. Print and summary functions are included to summarize results on parameter estimation, model selection and the goodness of fit of the models. As an example the functions of plfm are used to analyze product-attribute data on the perception of car models, and situation-behavior associations on the situational determinants of anger-related behavior.


Introduction
The analysis of a two-way frequency table is a basic task in data analysis which is of interest to researchers in various domains of applied research (Agresti 2002).Depending on the type of distribution which is appropriate for modelling the frequencies, one may distinguish between several types of frequency data.In particular, a multinomial distribution is appropriate for modelling the frequencies in a two-way contingency table, and a Poisson distribution is appropriate for modelling frequencies which represent (unbounded) counts.Another type of two-way frequency data, which is the focus of the present paper, arises when the frequencies are derived by aggregating three-way three-mode or two-way three-mode binary observations across the entities of one mode.In that case, a binomial distribution for modelling the frequencies may be considered appropriate.
Both three-way three-mode and two-way three-mode binary data are of interest in several substantive domains.First, three-way three-mode binary data occur when multiple raters judge for each of a set of objects and for each of a set of attributes whether or not a certain object has a certain attribute.For instance, when investigating product perception in a marketing context, one may ask consumers to judge whether products have a certain attribute.
In personality psychology, one may ask respondents to judge whether they would display a specific behavior in a specific situation.In psychiatric diagnosis, one may ask several clinicians to judge whether or not a certain patient has a certain symptom.
Second, two-way three-mode data occur when respondents nested in groups respond to a set of binary items.For instance, in the context of cross cultural research, respondents of different countries may respond to binary statements in a survey.In educational research, pupils nested in schools may complete an intelligence test that consists of binary items.
As fully modelling three-way three-mode or two-way three-mode binary data may be complex, researchers may start by analyzing the marginal two-way frequency table that is obtained by aggregating three-way three-mode data or two-way three-mode data across entities of one mode (i.e., raters or respondents).
More specifically, one may use a classical multivariate technique such as principal components analysis (PCA) or correspondence analysis (CA) to derive a low-dimensional spatial representation of the row-and column elements, or exploratory factor analysis (EFA) to reveal the latent factor structure underlying the observed frequencies.Applying such a dimension reduction or latent variable technique can be interesting in several applications: For instance, in a marketing context, a correspondence analysis of product-attribute frequencies can be used to derive a so-called perceptual map of products and attributes in a geometric space, the dimensions of which reflect the most important cognitive dimensions that drive product perception (Hoffman and Franke 1986;Torres and Bijmolt 2009).Furthermore, correspondence analysis and its extensions are a popular technique for ordination of species in vegetation science (Oksanen et al. 2012).Finally, in the context of document-retrieval conducting a PCA (or a two-mode factor analysis) on a term-by-document frequency matrix can help to reveal the semantic structure of a text (Deerwester, Dumais, Furnas, Landauer, and Harshman 1990;Landauer, Foltz, and Laham 1998).
PCA can be obtained by applying a singular value decomposition to the raw frequency table, or to the matrix of correlations or covariances between the columns.R functions for applying PCA to the matrix of correlations or covariances are princomp() (R Core Team 2013), dudi.pca() in the package ade4 (Dray and Dufour 2007;Chessel, Dufour, and Thioulouse 2004) and PCA() in the package FactoMineR (Lê, Josse, and Husson 2008;Husson, Josse, Lê, and Mazet 2012).R functions that apply singular value decomposition to the frequency table are prcomp() (R Core Team 2013) and lsa() in the package lsa for latent semantic analysis of a term by document matrix of frequencies (Wild 2011).
EFA uses a latent variable approach to model the correlations between the column elements by assuming that observed variables are a linear combination of a number of common latent factors and a specific error term.EFA differs from applying PCA to the correlation matrix in that it only decomposes the common variance in terms of latent factors.To conduct EFA with R one may use factanal() (R Core Team 2013).
Correspondence analysis involves using a singular value decomposition to decompose the Chisquare statistic associated to the frequency table.R functions for applying correspondence analysis are corresp() in the package MASS (Venables and Ripley 2002), ca() in the package ca (Nenadić and Greenacre 2007), dudi.coa() in the package ade4 (Dray and Dufour 2007;Chessel et al. 2004), CA() in the package FactoMineR (Lê et al. 2008;Husson et al. 2012) and cca() in vegan (Oksanen et al. 2012).As the metric assumptions underlying spatial approaches may be doubtful (Tversky 1977), non-spatial categorization-based approaches may be a useful alternative to model a two-way frequency matrix which results from aggregating replicated binary associations.Adopting a feature-based approach, one assumes that row-and colum elements can be represented as a set of binary latent features, and that the strength of the association between row-and column elements is a function of the pattern of latent features of these elements.Note that by representing each element as a set of binary latent features, one obtains an overlapping clustering of both the row-and column elements.More specifically, to model a two-way frequency table on the basis of latent features one may use a compensatory model (Meeds, Ghahramani, Neal, and Roweis 2007;Miller, Griffiths, and Jordan 2009), a deterministic non-compensatory model (Schepers, Van Mechelen, and Ceulemans 2011), or a probabilistic non-compensatory model (Candel and Maris 1997;Maris, De Boeck, and Van Mechelen 1996;Meulders, De Boeck, and Van Mechelen 2001a;Meulders, De Boeck, Van Mechelen, and Gelman 2005;Meulders, De Boeck, Van Mechelen, Gelman, and Maris 2001b).As an alternative, one may use a two-way partitioning method to simultaneously cluster both the row-and column elements (Schepers and Hofmans 2009).
The aim of this paper is to present the R (R Core Team 2013) package plfm (Meulders 2013) for analyzing two-way frequency data with the non-compensatory probabilistic latent feature models (PLFMs) which were originally introduced by Maris et al. (1996).The PLFM is related to the above described dimension-reduction techniques (PCA, CA) and to EFA in that it aims to explain the observed frequencies by representing row-and column elements in terms of a small set of latent variables.However, it differs from the dimension-based approaches in that it represents row-and column elements in terms of binary latent features (instead of continuous dimensions), and in that it explains observed associations as a non-compensatory (i.e., disjunctive or conjunctive) function of feature patterns (rather than as a compensatory function).In Section 2 we will first describe probabilistic latent feature models.Next, in Section 3 we describe the functions which are included in the R package plfm.In Section 4, we discuss classical probabilistic latent feature analysis: Section 4.1 describes the plfm() function which includes an improved algorithm for locating the posterior mode(s) of PLFMs.Section 4.2 describes the stepplfm() function which can be used to fit a series of PLFMs with with different numbers of latent features.In Section 4.3 the functions for conducting classical probabilistic latent feature analysis are illustrated with an application on the perception of car models.In Section 5 we describe Bayesian probabilistic latent feature analysis which involves computing a sample of the observed posterior distribution (see Section 5.1) with the function bayesplfm() (see Section 5.2).The Bayesian approach is illustrated in Section 5.3 with an application on the situational determinants of anger-related behavior.

Probabilistic latent feature models
To describe PLFMs, we further consider the situation of I raters who make binary judgments on the associations of J objects and K attributes.Let D ijk denote the observed association which equals 1 if object j (j = 1, . . ., J) has attribute k (k = 1, . . ., K) according to rater i (i = 1, . . ., I) and 0 otherwise.The number of raters who indicate an association between object j and attribute k is denoted as f 1 jk = i d ijk , and the number of raters who judge pair (j, k) not to be associated is denoted as f 0 jk = i (1 − d ijk ).In order to explain the observed binary observations, PLFMs assume a two-fold process: 1.When rater i judges whether object j has attribute k, it is assumed that both the object and the attribute are described in terms of F binary latent features.In particular, the binary latent variable Z obj ijkf indicates whether feature f (f = 1, . . ., F ) is regarded as a property of object j when rater i judges pair (j, k).Also, the binary latent variable Z att ijkf indicates whether feature f (f = 1, . . ., F ) is linked to attribute k when rater i judges pair (j, k).Furthermore, it is assumed that the categorization of objects and attributes in terms of the latent features is a stochastic process, that is Z obj ijkf ∼ Bernoulli(θ obj jf ) and Z att ijkf ∼ Bernoulli(θ att kf ).
2. It is assumed that the observed judgment D ijk is obtained as a deterministic mapping C(•) of the latent categorization of objects and attributes, namely, Maris et al. (1996) propose two mapping rules.First, with a disjunctive communality (DC) mapping rule it is assumed that In other words, the object has the attribute if at least one of the latent features which is linked to the attribute is also assigned to the object.
Second, with a conjunctive dominance (CD) rule it is assumed that Stated otherwise, the object has the attribute if all the latent features which are linked to the attribute are also assigned to the object.From the distribution of the latent variables and the mapping rule, one can derive the probability that the object is associated to the attribute: with p(z obj ijk |θ obj j ) and p(z att ijk |θ att k ) products of Bernoulli distributed variables and with P (D ijk = 1|z obj ijk , z att ijk ) fixed 0/1 probabilities that follow from the mapping rule.In particular, for the disjunctive communality rule, one may derive that: In the same way, it can be derived that with a conjunctive dominance rule it holds: Statistical inference for PLFMs is based on the observed posterior distribution p(θ|d) ∝ p(θ)p(d|θ) with p(θ) being the prior distribution of the parameters and p(d|θ) being the likelihood of the observed data, and with θ = (θ obj , θ att ) being a vector that comprises all the model parameters.From the statistical independence of the latent variables Z obj ijkf and Z att ijkf it follows that the observed variables D ijk are independent and that D ijk ∼ Bernoulli(π jk ).As a result, the likelihood of the model reads: Furthermore, it is convenient to specify a mild concave Beta(θ|2, 2) prior distribution p(θ) ∝ θ(1 − θ) for each parameter θ as this will guarantee the existence of a posterior mode in the interior of the parameter space.

Components of the package
The R package plfm comprises the following components: The function plfm() can be used for classical probabilistic latent feature analysis of disjunctive and conjunctive PLFMs with a particular number of features.The function uses an accelerated EM algorithm to compute the posterior mode(s) of PLFMs.In addition, it computes standard errors of the estimated parameters, as well as criteria for model selection and goodness of fit.
The function stepplfm() can be used to fit a series of disjunctive and/or conjunctive probabilistic latent feature models with different numbers of latent features.
The function bayesplfm() can be used for Bayesian probabilistic latent feature analysis.It uses a data-augmented Gibbs sampling algorithm to compute a sample of the posterior distribution in the neighbourhood of a specific posterior mode.The function also computes the posterior mean, the posterior median, 95% posterior intervals, and a convergence diagnostic for each model parameter.
Print and summary methods are provided for each of the above functions.In addition, for the stepplfm() function, a plot method is included to visualize the fit of a series of models with different numbers of features with respect to a certain fit criterion (e.g., AIC, BIC, variance accounted for by the model, etc.).
For illustrative purposes two data sets are included: The data set car contains data on the perception of car models, and the data set anger (Meulders, De Boeck, Kuppens, and Van Mechelen 2002;Kuppens, Van Mechelen, and Meulders 2004;Vermunt 2007) contains data on the situational determinants of anger-related behavior.

Classical probabilistic latent feature analysis
4.1.The function plfm() The function plfm() can be used to compute the posterior mode(s) of disjunctive or conjunctive PLFMs with a specific number of latent features.In addition, it yields asymptotic standard errors for the estimated parameters as well as criteria for model selection and goodness of fit.When calling plfm() one should specify the following arguments: datatype: The plfm() function can be applied to frequency data (datatype = "freq") or to a data frame (datatype = "dataframe").When using frequency data as input one should further specify the parameters freq1 and freqtot, and when using a data frame as input one should further specify the parameters data, object, attribute and rating.
freq1: An object-by-attribute matrix with in each cell the number of raters who indicate an association between an object-attribute pair.
freqtot: An object-by-attribute matrix with in each cell the total number of raters who judged the object-attribute pair.If each object-attribute pair has been judged by the same number of raters, one may specify freqtot as a single number.
data: A data frame that consists of three components: the variables object, attribute and rating.Each row of the data frame describes the outcome of a binary rater judgement about the association between a certain object and a certain attribute.
object: The name of the object component in the data frame data.The values of the vector data$object should be (non-missing) numeric or character values (i.e., object labels).
attribute: The name of the attribute component in the data frame data.The values of the vector data$attribute should be (non-missing) numeric or character values (i.e., attribute labels).
rating: The name of the rating component in the data frame data.The elements of the vector data$rating should be the numeric values 0 (no association) or 1 (association), or should be specified as missing (NA).
F: The number of latent features included in the model.

M:
The number of runs that should be conducted using random starting points.printrun: printrun = TRUE prints the analysis type (disjunctive or conjunctive), the number of features (F) and the number of the analysis (out of M runs) to the output screen, whereas printrun = FALSE suppresses the printing.
Accelerated EM algorithm for computation of the posterior mode(s) of PLFMs Let z = (z obj , z att ) be a vector that comprises all the latent observations.As the augmented posterior of PLFMs has a simple structure, namely p(θ|d, z) ∝ p(z|θ)p(θ), one can use algorithms for parameter estimation that especially gain from this fact.In particular, Maris et al. (1996) described and EM algorithm to locate the posterior mode of PLFMs and Meulders et al. (2001b) implemented a data-augmented Gibbs sampling algorithm for computing a sample of the observed posterior distribution of PLFMs.The plfm() function adds two improvements to the EM algorithm presented by Maris et al. (1996).First, as convergence of the EM algorithm may be slow in the neighbourhood of the posterior mode, we accelerated the convergence of the algorithm by implementing the method of Louis (1982), which extends the EM algorithm with a Newton Raphson-step (NR).Second, as implementing the NR-step involves computing the matrix of second derivatives, asymptotic standard errors of the model parameters can be easily computed.
To compute the posterior mode(s) of the posterior distribution for PLFMs we will use an accelerated EM algorithm.Given initial values θ (0) , in iteration (m + 1), the accelerated algorithm for locating the mode(s) of the observed posterior distribution p(θ|d) consists of the following steps (Louis 1982;Tanner 1996): 1. Expectation-step: Compute the expected value of the logarithm of the augmented posterior distribution p(θ|d, z) with respect to the distribution of the latent data, conditional on the observed data and the current guess to the posterior mode, that is, compute 2. Maximization-step: Maximize Q(θ, θ (m) ) with respect to θ in order to obtain θ (m) EM .
3. Newton-Raphson-step: Compute θ (m+1) as It can be shown that the EM algorithm (i.e., iterating between expectation and maximization steps) increases the value of the observed posterior density at each iteration and that it always converges to a stationary point (Tanner 1996).In contrast, for the accelerated algorithm, which includes a NR-step, convergence is not always guaranteed.However, as convergence of the accelerated algorithm becomes more likely in the neighbourhood of the posterior mode, a good strategy is to start with a number of EM-steps and to switch to the accelerated algorithm when the obtained solution is close to a posterior mode.More specifically the function plfm() switches from EM to the accelerated algorithm when the difference between subsequent values of log p(θ|d) becomes smaller than a (user-specified) convergence criterion (i.e., emcrit1), and it stops when the final convergence criterion (emcrit2) has been reached.Appendices A.1 and A.2 provide further details on the computation of the EM-step and the NR-step, respectively.

Computation of asymptotic standard errors
The asymptotic standard error of a parameter θ can be computed as follows: Analytical expressions for the first and second derivatives of the log observed posterior with respect to object-and attribute parameters are listed in Appendix A.3.

Model selection and assessment of goodness of fit
When using PLFMs to explain associations between objects and attributes on the basis of latent features, one has to choose among models with different numbers of features, or different mapping rules.In addition, it is important to investigate how well the model fits the observed data.For model selection, the function plfm() computes the Akaike information criterion (AIC, Akaike 1973Akaike , 1974) ) and the Bayesian information criterion (BIC, Schwarz 1978).Both AIC and BIC take the form of a sum of a badness-of-fit term (minus twice the log likelihood of the fitted model) and a penalty term, which is a measure of the complexity of the model.The model having the lowest value for AIC or BIC is selected.For AIC and BIC the penalty terms equal 2k and log(N )k, respectively, with k being the number of free parameters in the model and with N being the total 'sample size'.For PLFMs the sample size equals the number of raters who judged object-attribute associations.
To assess the goodness of fit of PLFMs, one may investigate to what extent the observed association frequencies f 1 jk are fitted by the model.More specifically, the plfm() function computes a Pearson chi-square test on the J × K table of observed frequencies.Under the null hypothesis that the model generated the data, the Pearson chi-square statistic is (asymptotically) chi-square distributed with degrees of freedom equal to the number of observations minus the number of model parameters (i.e., df = JK − (J + K)F ).As with increasing sample size the Pearson chi-square statistic tends to become very sensitive to small deviations between observed and expected frequencies, models selected on the basis of information criteria will be often rejected on the basis of the Pearson chi-square test.Therefore, it is also interesting to look at a measure of descriptive model fit.In particular, the plfm() function includes two descriptive goodness-of-fit measures, namely, (1) the correlation between observed and expected frequencies, and (2) the proportion of the variance in the observed frequencies accounted for by the model (i.e., the squared correlation between observed and expected frequencies).

The function stepplfm()
The function stepplfm() can be used to fit a series of disjunctive and/or conjunctive PLFMs that assume minF up to maxF latent features.As stepplfm() repeatedly calls the plfm() function it takes the same arguments except for the mapping rule and the number of features, which are specified with the following arguments: minF: The minimum number of latent features.maxF: The maximum number of latent features.maprule: Fit disjunctive (maprule = "disj"), conjunctive (maprule = "conj") or both disjunctive and conjunctive (maprule = "disj/conj") models.

Classical probabilistic feature analysis of the perception of car models
The list car contains data on the perception of car models.The elements of the car-byattribute matrix car$freq1 describe how many of 78 respondents indicate an association between each of 14 car models and each of 27 car attributes.After loading the data, we use the stepplfm() function to estimate disjunctive PLFMs with 1 up to 7 latent features.Note that, as the posterior distribution of probabilistic feature models may be multimodal, (M = 20) runs using random starting points are conducted for each model to investigate the existence of different local maxima.
R> data("car") R> set.seed(5698)R> car.lst <-stepplfm(freq1 = car$freq1, freqtot = 78, maprule = "disj", + minF = 1, maxF = 7, M = 20) In order to choose the number of features, one may use the plot method of the stepplfm() function to plot the BIC values of models with 1 up to 7 features: R> plot(car.lst,which = "BIC") As can be seen in Figure 1, a model with 6 features has the lowest BIC value, and hence it achieves the best balance between complexity and fit.
When using stepplfm() to compute a series of disjunctive (maprule = "disj") or conjunctive (maprule = "conj") models with minF up to maxF features, the results of subsequent plfm analyses are stored in a list with maxF-minF+1 components, each of which is a list of class "plfm".Using names(), a list of all attached entries can be obtained.For instance, for the 6-feature model:
objpar: A J × F matrix of estimated object parameters for the best model (i.e., the model with the highest posterior density among M runs).q q q q q q q 1 2 3 4 5 fitmeasures: Fit measures for the best model including loglikelihood, deviance, logposterior, information criteria (AIC, BIC), a Pearson chi-square goodness-of-fit test on the object-by-attribute table of observed frequencies, and two measures of descriptive fit (i.e., correlation between observed and expected frequencies, variance accounted for by the model).

logpost.runs:
The logposterior values of the models obtained with M runs.
objpar.runs:A M × J × F array of estimated object parameters obtained for each of M runs.
attpar.runs:A M × K × F array of attribute parameters obtained for each of M runs.

bestsolution:
The rank number of the best model (i.e., with the highest logposterior value) among M runs.
gradient.objpar:A J × F matrix with gradients of the object parameters for the best model.
gradient.attpar:A K × F matrix with gradients of the attribute parameters for the best model.

SE.objpar:
A J × F matrix with asymptotic standard errors for the object parameters of the best model.

SE.attpar:
A K × F matrix with asymptotic standard errors for the attribute parameters of the best model.q q q q q q q q q q q q q q q q q q q q 5 10 15 20 prob1: A J × K matrix of predicted object-attribute association probabilities for the best model.

Using
R> plot(car.lst[[6]]$logpost.runs,xlab = "run", ylab = "logposterior value") one may see in Figure 2 that 2 different solutions were identified with M = 20 runs, and that the best solution was obtained in 18 out of 20 runs.
To further inspect the fit and the estimated parameters for the best 6-feature model one may print the model output.
A more detailed summary of the model output including a Pearson chi-square goodness-offit test of the model on the car-by-attributes table, and asymptotic standard errors of the estimated object-and attribute parameters can be obtained using the summary function.In particular using

R> summary(car.lst[[6]])
we may see that that the model fails to fit the car-by-attribute frequencies in an absolute sense (χ 2 = 571, df = 132, p < 0.01), and that the asymptotic standard errors of the estimated parameters are acceptably low (i.e., always lower than .054for object parameters and always lower than .074for attribute parameters): ... bayesplfm() on the final model is often interesting because it provides a more accurate view on parameter uncertainty (e.g., posterior intervals which are valid in small samples), because the sample of the posterior can be used summarize the distribution of any function of the parameters of interest, and because one may use the sample for further model checking (Gelman, Van Mechelen, Verbeke, Heitjan, and Meulders 2005;Meulders et al. 2001bMeulders et al. , 2005)).

PEARSON CHI-SQUARE TEST OBJECT X ATTRIBUTE
Finally, note that using the best posterior mode as a starting point in the Bayesian analysis is fundamentally different from actually including information about the starting point in the prior distribution (e.g., by using a prior distribution which is centered at the best posterior mode): When using bayesplfm() with the best posterior mode as a starting point, the prior distribution involved in this Bayesian analysis is the same as the prior used by plfm() (namely, a Beta(2,2) prior for each model parameter).In other words, the main goal of the Bayesian analysis is to compute a sample of the posterior distribution in the neighbourhood of a specific mode.As a (less efficient) alternative, one could also use random starting points and select the chains that converge to the mode of interest.On the other hand, when including information about the best posterior mode in the prior distribution (e.g., by using a strong prior distribution centered at the best posterior mode), one changes the prior distribution, and consequently also the posterior distribution.If the involved prior distribution is less vague than the Beta(2,2) prior, a Bayesian analysis using this adapted posterior will yield smaller posterior intervals than an analysis with a Beta(2,2) prior.

Bayesian probabilistic feature analysis of anger-related behavior
The list anger contains data on the situational determinants of anger-related behaviors (Meulders et al. 2002;Kuppens et al. 2004;Vermunt 2007).The raw data anger$data consist of a situation × behavior × person array of binary judgments of 101 first year psychology students who indicated whether or not they would display each of 8 anger-related behaviors when being angry at someone in each of 6 situations.The 8 behaviors consist of 4 pairs of reactions that reflect a particular strategy to deal with situations in which one is angry at someone, namely, (1) fighting (fly off the handle, quarrel), ( 2) fleeing (leave, avoid), (3) emotional sharing (pour out one's heart, tell one's story), and (4) making up (make up, clear up the matter).The six situations are constructed from two factors with three levels: (1) the extent to which one likes the instigator of anger (like, dislike, unfamiliar), and (2) the status of the instigator of anger (higher, lower, equal).Each situation is presented as one level of a factor, without specifying a level for the other factor.The elements of the matrix anger$freq1 contain the number of persons who indicated that they would display a certain behavior in a certain situation, and the elements of the matrix anger$freqtot contain the total number of persons who made a judgment for each situation-behavior pair.
After loading the data, we first use the plfm() function to estimate disjunctive and conjunctive models with 1 up to 3 features.Note that models with more than 3 features are not considered as they do not have a positive number of degrees of freedom.
R> plot(anger.lst, which = "BIC") As can be seen in Figure 3, the disjunctive 2-feature model offers the best balance between complexity and goodness of fit as it has the lowest BIC value.Further inspection of the output shows that this model deviates significantly from a perfectly fitting model (χ 2 = 78.3,df = 20, p < 0.01), but that it has a good descriptive fit in that it explains 92% of the variance in the observed situation-behavior frequencies.To further study the disjunctive 2-feature model, we use the bayesplfm() function to compute a sample of the observed posterior distribution.
In doing so, the (best) posterior mode which was identified with the plfm() function is used as the starting point: The algorithm stopped after 2000 iterations as for each parameter, the convergence diagnostic R was smaller than the specified convergence criterion.The output generated by the bayesplfm() function is stored in a list of class "bayesplfm".Using names(), a list of all attached entries can be obtained.For instance, for the disjunctive 2-feature model:
pmean.attpar:A K × F matrix with the posterior means of the attribute parameters.
Rhat.objpar:A J × F matrix of R convergence values for the object parameters.
Rhat.attpar:A K × F matrix of R convergence values for the attribute parameters fitmeasures: A list with two measures of descriptive fit on the J × K table : (1) the correlation between observed and expected frequencies, and (2) the proportion of the variance in the observed frequencies accounted for by the model.The association probabilities and corresponding expected frequencies are computed using the posterior mean of the parameters.

convstat:
The number of object-and attribute parameters that do not meet the convergence criterion.
To inspect the output of the model, one may print the object:  Inspection of the estimated parameters shows that the extracted features have a meaningful interpretation.More specifically, the two features can be interpreted as situation-behavior components which are combined in a disjunctive manner.
The first component (F1) indicates that when being angry at a person one likes (0.90), or when being angry at a person of lower status (0.79), one is more likely to make up (make up (0.91), clear up the matter (0.83)) or to fight (fly off the handle (0.55), quarrel (0.58)).The second component F2 indicates that when being angry at a person of higher status (0.85), or at a person one dislikes (0.76) or with whom one is unfamiliar (0.61), one is more likely to react with emotional sharing (pour out one's hart (0.83), tell one's story (0.92)) or flighting (leave (0.53), avoid (0.64)) than with making up (make up (0.10), clear up the matter (0.11)) or fighting (fly off the handle (0.27), quarrel (0.19)).Finally, when being angry at a person of equal status both components are likely to play a role (i.e., for equal status the estimated feature probabilities for F1 and F2 equal 0.50 and 0.69, respectively), and they will be combined in a disjunctive way.
In addition to the print function one may use the summary.bayesplfm()function to print a more detailed model output of the Bayesian analysis, which also shows 95% posterior intervals and R convergence values for each of the parameters.In particular R> summary(bayesangerdisj2 .81.0 q q q q q q q q q q q q like dislike unfamiliar higher status lower status equal status In order to illustrate using the sample of the posterior, we will simulate 95% posterior intervals of the probabilities for displaying certain behavior types (i.e., fight, flight, share emotions, make up) in a certain situation.As the parameter estimates for different behaviors of a certain type (e.g., leave and avoid as examples of flighting) are always very similar, we compute the average probability of different behaviors of a certain type.In order to simulate 95% posterior intervals we compute situation-behavior probabilities under the model using 2000 draws of the posterior sample.Note that, to compute the situation-behavior probabilities, we can use the gendat() function which is included in the plfm package.apply(repdat$prob1[, c(3, 4)], 1, mean) + prob[, 3, i] <-apply (repdat$prob1[, c(5, 6)], 1, mean) + prob[, 4, i] <-apply (repdat$prob1[, c(7, 8)], 1, mean) + } R> post95 <-function(x) { post95 <-quantile(x, c(0.025, 0.5, 0.975)) } R> p95prob <-apply(prob, c(1, 2), post95) As can be seen in Figure 4, the visualization of the 95% posterior intervals is very useful for evaluating which type of behavior is most likely in a certain situation, and to evaluate whether different behaviors have significantly different probabilities (i.e., non-overlapping 95% posterior intervals) to be displayed in a certain situation.For instance, Figure 4 shows that when being angry at a person of equal status, it is significantly more likely to 'share emotions with someone' than to 'fight', 'flight' or to 'make up'.On the other hand, the probabilities for 'fighting', 'flighting' or 'making up' in this situation do no significantly differ as their 95% posterior intervals overlap.
Note that, to draw latent data vectors z obj ijk or z att ijk we use the function digitsBase() from the R package sfsmisc (Maechler et al. 2013) in order to compute a binary matrix which contains all latent data patterns.

emcrit1:
Convergence criterion which indicates when the estimation algorithm should switch from expectation-maximization (EM) steps to EM+Newton-Rhapson steps.emcrit2: Convergence criterion which indicates final convergence to a local maximum.

Figure 1 :
Figure 1: BIC values for disjunctive models with 1 up to 7 features.

Figure 2 :
Figure 2: Logarithm of the posterior density for disjunctive 6-feature models computed in M = 20 runs.

Figure 3 :
Figure 3: BIC values for disjunctive and conjunctive models with 1 up to 3 features.

withcall:
The parameters used to call the function.sample.objpar:A J × F × N iter× Nchains array with parameter values for the object parameters.The matrix sample.objpar[,, i, c] contains the draw of object parameters in iteration i of chain c.Note: when Nchains = 1 the chain length (N iter) equals maxNiter, and when Nchains > 1 the chain length equals the number of iterations required to obtain convergence.sample.attpar:A K × F × N iter× Nchains array with parameter values for the attribute parameters.

Figure 4 :
Figure 4: Posterior median and 95% posterior interval for the average probability to fight (empty circle), flight (empty triangle), share emotions (filled square), or make up (filled circle) per situation in the disjunctive 2-feature model.