Generating Factor Variables for Asymmetry, Non-Independence and Skew-Symmetry Models in Square Contingency Tables Using SAS

In this paper, a SAS program (macro) is written to generate the factor and regression variables required for implementing the asymmetry, non-independence and non-symmetry + independence models, as well as the skew-symmetry models, in square a × a contingency tables with nominal or ordinal categories. Several authors have developed similar factor variables for use with GLIM, and we extend this work to the non-independence and the non-symmetry + independence models. The former include both the fixed and variable distance models as well as the quasi-ordinal symmetry model. Further, our implementation of the asymmetry model, in terms of the required factor variable, differs from the one defined for its implementation in GLIM. Most of the models described in this paper, however, assume ordinal categories for the contingency table. The SAS macro can be applied to any square table of dimension a. We apply the models discussed in this paper to the 5 × 5 Danish social mobility data, which have been widely analyzed in the literature.

1 Introduction
Lawal (2002) used the non-standard log-linear model approach to model asymmetry, non-independence, skew-symmetry and non-symmetry + independence in square two-way contingency tables. To implement this approach, we need to generate the factor or regression variables that each model requires. Kateri (1993) and Kutylowski (1989) discussed how to generate, in GLIM, the factor variables required for some of the models considered in this paper. For instance, our implementation of the symmetry model is consistent with the procedure proposed in Friedl (1995), except that our factor variable for the null symmetry model is defined differently. Both our factor variable and Friedl's have a(a + 1)/2 levels. The implementation of the null asymmetry model therefore involves only this single factor variable, whereas the approach of Kateri and Kutylowski involves two such factor variables, designated sc and ss in both of their papers. Further, their programs are written for GLIM.
The classes of models considered and implemented in this paper are those described in Goodman (1985), namely the asymmetry, the non-independence and the non-symmetry + independence models. The skew-symmetry models are discussed in Yamagushi (1990). Appendix B presents the SAS macro %Factors(A), which generates the factor and regression variables needed to implement the models discussed in this paper. We demonstrate the use of this macro on the 5 × 5 Danish social mobility data, which have been analyzed by various authors. The macro can be used to fit the models discussed here to a square table of any dimension. All cell entries of the factor and regression variables generated for the models in the following sections are presented in Appendix C for the 5 × 5 Danish table. Appendix B also presents the SAS macro %Analysis(A,count,combined), which uses %Factors(A) to implement all the models discussed in the present paper for a square table of dimension A, the macro's argument.

Asymmetry Models
Asymmetry models (Goodman, 1985) are those models having the symmetry model as their baseline. That is, these models measure deviations from the complete symmetry model S. Models belonging to this group are described in (1a) to (1f) below.
(1a) The complete symmetry model represents the baseline or null asymmetry model (O) for this group.
Goodman (1985) describes this model as the null asymmetry model. For instance, to implement the null symmetry model (S), Lawal (2001) suggested generating the required factor variable from a recurrence relation for an a × a table, with k = |i − j|. In a 5 × 5 table, for example, k = |i − j| = 0, 1, 2, 3, 4. The main diagonal elements have k = 0 and h = 2, 3, 4, 5.
In SAS, this recurrence relation, and hence the entries of the factor variable S, are generated for all (i, j) by the expression in (2), where k and a are as defined above. When i = j, k = 0 in (2). The factor variable S has a(a + 1)/2 = 15 levels for a = 5.
For a 5 × 5 table, the above expression generates the following factor variable (treated as a factor variable in SAS), arranged by row i and column j. This factor variable is needed to implement the complete symmetry or null asymmetry model:
$$S = \begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ 2 & 6 & 7 & 8 & 9\\ 3 & 7 & 10 & 11 & 12\\ 4 & 8 & 11 & 13 & 14\\ 5 & 9 & 12 & 14 & 15 \end{pmatrix}$$
The entries of this factor variable do not exactly match those generated in Friedl (1995), whose S′ is generated from a different expression. As expected, however, both have 15 levels. The null asymmetry model (S) is based on a(a − 1)/2 degrees of freedom.
(1b) The triangle asymmetry (T) model is described in Goodman (1985). It is the composite model S+T, where T is the regression variable defined in (3). This model is equivalent to the conditional symmetry model described in McCullagh (1978) and is based on (a + 1)(a − 2)/2 degrees of freedom.
(1c) The diagonals-parameter symmetry (DPS) model is described in Agresti (1983). It is the composite model S+D, where D is the factor variable defined in (4). The model is based on (a − 1)(a − 2)/2 degrees of freedom and is described in Goodman (1985) as the diagonals asymmetry model (D).
(1d) The linear diagonals-parameter symmetry (LDPS) model is described in Agresti (1983) and Tomizawa (1992). It is the composite model S+F, where F is the regression variable defined in (5). F pertains to the fixed distance model (Haberman, 1978; Lawal, 1996). The model is equivalent to the ordinal quasi-symmetry (OQS) model described in Agresti (1996) and is based on (a + 1)(a − 2)/2 degrees of freedom. We refer to it here as the asymmetry fixed distance (F) model.

(1e) The odds-symmetry models of types I and II (OS1 and OS2) are fully described in Tomizawa (1985). They are the composite models S+OS1 and S+OS2, respectively, where OS1 and OS2 are the corresponding factor variables. Both models are based on (a − 1)(a − 2) degrees of freedom. We designate them the asymmetry odds I (OS1) and asymmetry odds II (OS2) models.
(1f) Lawal (2002) described the 2-ratio parameter symmetry (2RPS) model, introduced in Tomizawa (1987), as the 2-ratio parameter asymmetry (2RPA) model, which has the multiplicative formulation given in (8). This model is the composite model S+T+F and is based on (a² − a − 4)/2 degrees of freedom. It reduces to the asymmetry triangles model when δ = 1 in (8).
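The degrees of freedom quoted for the asymmetry models can be cross-checked in a few lines. The Python sketch below is an illustration, not the SAS macro of Appendix B. It builds S from a closed form that reproduces the 5 × 5 listing given under (1a). It also assumes hypothetical codings of T (T_ij = 1 for i < j, 0 otherwise) and D (one level per signed diagonal j − i), because the exact definitions in (3) and (4) are not reproduced in this section. F, OS1 and OS2 are omitted for the same reason. Residual degrees of freedom are computed as a² minus the rank of the design matrix, so any aliasing between factor levels is handled automatically.

```python
from fractions import Fraction


def s_level(i, j, a):
    """Level of the symmetry factor S for cell (i, j), with 1-indexed i, j.

    Assumed closed form (not the recurrence behind (2)): levels are numbered
    row by row along the upper triangle i <= j and mirrored to the lower
    triangle, so cells (i, j) and (j, i) always share a level.
    """
    i, j = min(i, j), max(i, j)
    # Upper-triangle cells in rows 1..i-1, plus the offset within row i.
    return (i - 1) * a - (i - 1) * (i - 2) // 2 + (j - i + 1)


def t_value(i, j):
    """Hypothetical coding of the triangle regressor T: 1 above the diagonal."""
    return 1 if i < j else 0


def d_level(i, j):
    """Hypothetical coding of the diagonals factor D: one level per j - i."""
    return j - i


def rank(rows):
    """Exact matrix rank via Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                f = m[k][c] / m[r][c]
                m[k] = [x - f * y for x, y in zip(m[k], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def model_df(a, factors=(), regressors=()):
    """Residual df: a*a cells minus the rank of the model's design matrix."""
    cells = [(i, j) for i in range(1, a + 1) for j in range(1, a + 1)]
    cols = [[1] * len(cells)]  # intercept
    for f in factors:
        values = [f(i, j) for i, j in cells]
        # Full dummy set per factor; aliased columns simply do not add rank.
        for level in sorted(set(values)):
            cols.append([1 if v == level else 0 for v in values])
    for g in regressors:
        cols.append([g(i, j) for i, j in cells])
    design = [list(row) for row in zip(*cols)]  # one row per cell
    return a * a - rank(design)


if __name__ == "__main__":
    a = 5

    def s(i, j):
        return s_level(i, j, a)

    for i in range(1, a + 1):
        print(" ".join(f"{s(i, j):2d}" for j in range(1, a + 1)))
    print("S   df:", model_df(a, [s]))               # expect a(a-1)/2
    print("S+T df:", model_df(a, [s], [t_value]))    # expect (a+1)(a-2)/2
    print("S+D df:", model_df(a, [s, d_level]))      # expect (a-1)(a-2)/2
```

The rank-based count is used instead of counting parameters by hand. It correctly absorbs the redundancy between the symmetric part of D and the levels of S, which is why S+D loses only a − 1 degrees of freedom relative to S.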

Skew-Symmetry Models
This class of models has the quasi-symmetry (QS) model as its baseline. The models were proposed in Yamagushi (1990), and again they measure deviations from the baseline, here the QS model. The models belonging to this group are the following.
(2a) The quasi-symmetry (QS) model, which is the null (O) model for this class of asymmetry models.
Goodman (1985) describes this model as the asymmetry (RC) model, and Yamagushi describes it as the null skew-asymmetry model. It is the composite model R+C+S, where R and C are the row and column factor variables, respectively, defined for all (i, j) as
$$R_{ij} = i \qquad (9)$$
and
$$C_{ij} = j. \qquad (10)$$
The model is based on (a − 1)(a − 2)/2 degrees of freedom.
(2b) The quasi-conditional symmetry (QCS) model (Tomizawa, 1992) is equivalent to the uniform skew-symmetry level model in Yamagushi (1990). It is the composite model QS+T, where T is the regression variable defined in (3). The model is equivalent to the triangles-parameter skew-symmetry (SP SK) model in Yamagushi (1990).
(2d) The quasi-odds symmetry (QOS) model (Tomizawa, 1985) is described in Yamagushi as the middle-value-effect skew-symmetry model and is designated the (M SK) model. Bishop et al. (1975) describe it as the adjusted quasi-symmetry (AOS) model. It is the composite model QS+OS1 or QS+OS2 and is based on (a − 2)(a − 3)/2 degrees of freedom.
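A rank-based degree-of-freedom check can be applied to the skew-symmetry baseline and to QCS. The Python sketch below is illustrative only. It uses R_ij = i and C_ij = j for the row and column factors, and it codes S as one level per unordered pair {i, j}. As in the asymmetry section, it adopts the hypothetical coding T_ij = 1 for i < j, 0 otherwise, since (3) is not reproduced here. QOS is omitted because OS1 and OS2 are not defined in this section.

```python
from fractions import Fraction


def rank(rows):
    """Exact matrix rank via Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                f = m[k][c] / m[r][c]
                m[k] = [x - f * y for x, y in zip(m[k], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def model_df(a, factors=(), regressors=()):
    """Residual df: a*a cells minus the rank of the model's design matrix."""
    cells = [(i, j) for i in range(1, a + 1) for j in range(1, a + 1)]
    cols = [[1] * len(cells)]  # intercept
    for f in factors:
        values = [f(i, j) for i, j in cells]
        for level in sorted(set(values)):  # aliased dummies add no rank
            cols.append([1 if v == level else 0 for v in values])
    for g in regressors:
        cols.append([g(i, j) for i, j in cells])
    return a * a - rank([list(row) for row in zip(*cols)])


def R(i, j):
    return i  # row factor, R_ij = i


def C(i, j):
    return j  # column factor, C_ij = j


def S(i, j):
    return (min(i, j), max(i, j))  # one level per unordered pair {i, j}


def T(i, j):
    return 1 if i < j else 0  # hypothetical coding of the triangle regressor


if __name__ == "__main__":
    for a in (4, 5, 6):
        qs = model_df(a, [R, C, S])
        qcs = model_df(a, [R, C, S], [T])
        print(f"a={a}: QS df={qs} (stated {(a - 1) * (a - 2) // 2}), QCS df={qcs}")
```

Under these codings, QCS comes out exactly one degree of freedom below QS, that is, a(a − 3)/2. The single T parameter is not absorbed by R+C+S because the antisymmetric part of that model has the form γ_i − γ_j, which cannot be constant across all pairs i < j when a ≥ 3.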

Non-Independence Models
This class of models has the independence model as its baseline or null. Consequently, all the models described in this section model the interaction structure, that is, the deviations from independence, in the table. The models in this category are the following.
(3a) The null or independence model (O) has
$$\log m_{ij} = \mu + \alpha_i + \beta_j, \qquad (11)$$
where α_i and β_j relate to the row and column marginals, respectively. Thus, the null (O) model is the composite model R+C, where R and C are the row and column factor variables defined in (9) and (10), respectively.
(3b) The fixed and variable distance models (F and V). Both models are described in Haberman (1978) and have been implemented in Lawal and Upton (1990) and Lawal (1992). They are often designated (F) and (V), respectively (Lawal and Upton, 1990). They are the composite models O+F and O+V_1+⋯+V_{a−1}. Here F is the regression variable defined in (5), and V_h, for h = 1, 2, …, (a − 1), are factor or regression variables. Both (F) and (V) belong to the principal diagonal class of models (Upton, 1985), whose defining condition is stated in terms of Φ_ij, the log-odds ratio under the model.

(3c) The quasi-independence model (Q), the non-independence triangles model (T) and the loyalty model (L) belong to the diagonal band models discussed in Upton (1985). This class of models is defined by a condition on the log-odds ratios. The models are the composite models O+Q, O+T and O+L, respectively, where Q is a factor variable and L is a regression variable. T is treated as a factor variable in this case. Model (T) is the triangles model (Goodman, 1972), while (L) is the uniform loyalty model discussed in Upton and Sarlvik (1981) in the context of political election studies. The non-independence models (L), (T) and (Q) have a(a − 2), (a² − 2a − 1) and (a² − 3a + 1) degrees of freedom, respectively.
(3d) Goodman (1972, 1985) describes the non-independence diagonals-parameter model (D) and the non-independence absolute diagonals-parameter model (DA) as the asymmetric minor diagonal and symmetric minor diagonal models, respectively. They are the composite models O+D and O+DA, respectively, where D is the factor variable defined in (4) and DA is also a factor variable. Models (D) and (DA) have (a − 2)² and (a − 1)(a − 2) degrees of freedom, respectively. The diagonals-absolute triangle (DAT) non-independence model is the composite model O+DA+T and is based on (a² − 3a + 1) degrees of freedom.
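The degree-of-freedom statements for the non-independence models can likewise be checked by computing the rank of each design matrix. In the Python sketch below, R and C are the row and column factors. The other codings follow common textbook conventions and are our assumptions, not the definitions used in the paper's macro:
- L is a main-diagonal indicator.
- T is a three-level factor (diagonal, upper, lower).
- Q gives each diagonal cell its own level.
- DA indexes |i − j|.
- D indexes j − i.

The fixed and variable distance models are omitted because F and V_h are not defined in this section.

```python
from fractions import Fraction


def rank(rows):
    """Exact matrix rank via Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                f = m[k][c] / m[r][c]
                m[k] = [x - f * y for x, y in zip(m[k], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def model_df(a, factors=(), regressors=()):
    """Residual df: a*a cells minus the rank of the model's design matrix."""
    cells = [(i, j) for i in range(1, a + 1) for j in range(1, a + 1)]
    cols = [[1] * len(cells)]  # intercept
    for f in factors:
        values = [f(i, j) for i, j in cells]
        for level in sorted(set(values)):  # aliased dummies add no rank
            cols.append([1 if v == level else 0 for v in values])
    for g in regressors:
        cols.append([g(i, j) for i, j in cells])
    return a * a - rank([list(row) for row in zip(*cols)])


def R(i, j):
    return i


def C(i, j):
    return j


# Hypothetical codings (assumptions, see the lead-in).
def L_reg(i, j):
    return 1 if i == j else 0


def T_fac(i, j):
    return 0 if i == j else (1 if i < j else 2)


def Q_fac(i, j):
    return i if i == j else 0


def DA_fac(i, j):
    return abs(i - j)


def D_fac(i, j):
    return j - i


# name -> (factors, regressors, degrees of freedom stated in the text)
MODELS = {
    "O": ([R, C], [], lambda a: (a - 1) ** 2),
    "L": ([R, C], [L_reg], lambda a: a * (a - 2)),
    "T": ([R, C, T_fac], [], lambda a: a * a - 2 * a - 1),
    "Q": ([R, C, Q_fac], [], lambda a: a * a - 3 * a + 1),
    "DA": ([R, C, DA_fac], [], lambda a: (a - 1) * (a - 2)),
    "DAT": ([R, C, DA_fac, T_fac], [], lambda a: a * a - 3 * a + 1),
    "D": ([R, C, D_fac], [], lambda a: (a - 2) ** 2),
}

if __name__ == "__main__":
    a = 5
    for name, (facs, regs, stated) in MODELS.items():
        print(f"{name:4s} rank-based df = {model_df(a, facs, regs):2d}, stated = {stated(a)}")
```

Under these assumed codings, every rank-based count agrees with the degrees of freedom stated above. Note that in DAT the T factor adds only one parameter, because its upper and lower levels together are already spanned by the off-diagonal levels of DA.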

Non-symmetry + Independence Models
This class of models has as its baseline the null non-symmetry + independence model, defined in Goodman (1985) as
$$\log m_{ij} = \mu + \alpha_i + \alpha_j, \qquad (16)$$
that is, the model in (11) with β_j = α_j. The model defined in (16) is the familiar halfway (H) model described in Hope (1982). Goodman (1985) generalizes it to the triangle non-symmetry + independence model (T), which is the composite model H+T. Here H comprises (a − 1) regression variables H_i, defined as in Hope (1982).
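One way to see how a set of (a − 1) regression variables can impose the constraint β_j = α_j is sketched below in Python. The coding H_k(i, j) = I(i = k) + I(j = k), for k = 1, …, a − 1, is our assumption, chosen because it reproduces μ + α_i + α_j with α_a = 0. It is not claimed to be Hope's (1982) exact definition. The function names are hypothetical.

```python
def h_columns(a):
    """Hypothetical regressors H_1..H_{a-1}: H_k(i, j) = [i == k] + [j == k].

    Each column lists the values for cells (1,1), (1,2), ..., (a,a) in row-major order.
    """
    cells = [(i, j) for i in range(1, a + 1) for j in range(1, a + 1)]
    return [[(i == k) + (j == k) for i, j in cells] for k in range(1, a)]


def linear_predictor(a, mu, theta):
    """mu + sum_k theta_k * H_k(i, j) for every cell, returned as an a x a grid."""
    cols = h_columns(a)
    flat = [mu + sum(t * col[c] for t, col in zip(theta, cols)) for c in range(a * a)]
    return [flat[r * a:(r + 1) * a] for r in range(a)]


if __name__ == "__main__":
    # Illustrative parameter values; theta_k plays the role of alpha_k, alpha_a = 0.
    for row in linear_predictor(4, 1.0, [0.5, -1.0, 2.0]):
        print(row)
```

The resulting grid is symmetric in (i, j), which is exactly the β_j = α_j restriction of (16). Adding the triangle term, as in H+T, is what allows the fitted table to depart from symmetry.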

Applications
The data below are the well-analyzed 5 × 5 Danish social mobility data, which give the cross-classification of fathers' and sons' occupational status categories in Denmark (Bishop et al., 1975).

TABLE 1 about here
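Table 2 presents the results of applying all the models discussed in the previous sections to the Danish social mobility data in Table 1. In Table 2, G² and X² are the likelihood ratio and Pearson goodness-of-fit (GOF) test statistics, defined respectively as
$$G^2 = 2\sum_{i=1}^{a}\sum_{j=1}^{a} n_{ij}\log\left(\frac{n_{ij}}{\hat m_{ij}}\right), \qquad X^2 = \sum_{i=1}^{a}\sum_{j=1}^{a}\frac{(n_{ij}-\hat m_{ij})^2}{\hat m_{ij}},$$
where n_ij and m̂_ij denote the observed and fitted frequencies in cell (i, j).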
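As a small illustration of the two goodness-of-fit statistics, the Python sketch below evaluates G² and X² from observed and fitted cell frequencies supplied in the same order. The function names and the 2 × 2 numbers are made up for illustration and are not results from Table 2. Following the usual convention, cells with a zero observed count contribute nothing to G².

```python
import math


def g_squared(observed, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum n * log(n / m).

    Cells with n = 0 contribute 0, the limit of n * log(n / m) as n -> 0.
    """
    return 2.0 * sum(n * math.log(n / m) for n, m in zip(observed, fitted) if n > 0)


def pearson_x2(observed, fitted):
    """Pearson statistic X^2 = sum (n - m)^2 / m over all cells."""
    return sum((n - m) ** 2 / m for n, m in zip(observed, fitted))


if __name__ == "__main__":
    # Made-up 2 x 2 counts and fitted values, flattened row by row.
    observed = [10, 20, 30, 40]
    fitted = [15, 15, 35, 35]
    print(round(g_squared(observed, fitted), 4), round(pearson_x2(observed, fitted), 4))
```

In practice, the fitted values would come from the Poisson log-linear fit of each model. Each statistic is then referred to a chi-squared distribution with the model's residual degrees of freedom.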

Conclusions
Applying this macro to the 5 × 5 Danish social mobility data gives results that agree with those previously published in the literature. Composite models can easily be implemented with the macro. Further, models with the main diagonal deleted, as in Goodman (1985), can be implemented just as easily with the SAS program in Appendix B. A detailed program description is provided in Appendix A.

Acknowledgments
The author would like to thank Dr. Sandra Keith of the Department of Mathematics for generating the expression in (2) in MAPLE.
Step 1. Lib: this is a keyword parameter, so its default value is WORK. The factor variables data set will therefore be stored in the Work library unless you provide a different value after the equals sign in the macro call. Note: if you accept the default values, you only need to specify the dimension parameter A. FactorDSN = SAS data set name you wish to give the factor variables data set. The default value is GENERATE.
Example 1: %Factors(5) or %Factors(5,Lib=work,FactorDSN=generate) specifies that a 5 × 5 contingency table will be analyzed. The default values WORK and GENERATE are used for the library and the data set name of the factor variables.
Example 2: %Factors(5,Lib=C,FactorDSN=Table) again specifies dimension 5. The factor variables data set will be stored in a library named C and given the name TABLE. This assumes that a library C has been assigned in your SAS session; we usually do this in Step 3 below.
Step 2. Compile macro program ANALYSIS locally or store the compiled macro in a permanent SAS catalog.
Example 1. %Analysis(5,Count,Combined) indicates that the dependent variable in the contingency table data is named COUNT. The SAS data set that merges the factors with the count data will be named COMBINED and stored in the default Work library. The factor variables data set was named GENERATE by default in Step 1.
Example 2. %Analysis(5,Count,Combined,Lib=C,FactorDSN=Table): the first three parameters are the same as in Example 1, and the last two are the same as in Example 2 of Step 1.
Step 3. Read in the count data and name the SAS data set with the macro variable DataDSN. You can use any valid SAS data set name; we use the name Data1 in the program below.
Store the data in a SAS library of your choice. The library name is to be specified by the macro variable DataLib. We use the Work library in the program below.
We make both of these macro variables global so their values can be referred to in the macro program Analysis in Step 2 above.
Two Libname statements are optionally given. The first Libname statement sets up the library where the count data set (DataDSN) will be stored. If you use the temporary Work library then you can leave this statement commented out. If you want to store the data permanently you will need to provide the correct path to the library in the libname statement.
The second Libname statement sets up the library where the Factors data set and the Combined data set will be stored. Again, you can leave it commented out if you use the temporary Work library. To store the data permanently, you will need to provide the correct path to the library and the library name that you intend to use when executing the FACTORS and ANALYSIS macros. In the statement below, AAAAA is just a placeholder for that library name. Our example data set has dimension 5 (a 5 × 5 contingency table). The data are entered in a single column in the following order: the first-row elements, then the second-row elements, and so on. This gives a data set with one column and 25 rows.
Finally we execute the two macros to get our analyses.
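The one-column, row-by-row data layout described in Step 3 determines how each count is matched with its factor levels. The Python sketch below shows this mapping in the spirit of the COMBINED data set. It is not the SAS code of Appendix B: the function `combine`, its arguments and the placeholder counts are hypothetical, and the S coding (one level per unordered pair {i, j}) is an assumption.

```python
def combine(counts, a, factors):
    """Attach (i, j) and factor levels to counts entered row by row in one column.

    counts:  a*a values ordered (1,1), (1,2), ..., (1,a), (2,1), ..., (a,a).
    factors: mapping from a variable name to a function of (i, j).
    """
    if len(counts) != a * a:
        raise ValueError(f"expected {a * a} counts, got {len(counts)}")
    records = []
    for idx, n in enumerate(counts):
        i, j = divmod(idx, a)  # 0-based row and column of this entry
        rec = {"i": i + 1, "j": j + 1, "count": n}
        for name, f in factors.items():
            rec[name] = f(i + 1, j + 1)
        records.append(rec)
    return records


if __name__ == "__main__":
    placeholder_counts = list(range(1, 26))  # placeholders, not the Danish data
    factors = {
        "R": lambda i, j: i,
        "C": lambda i, j: j,
        "S": lambda i, j: (min(i, j), max(i, j)),
    }
    for rec in combine(placeholder_counts, 5, factors)[:7]:
        print(rec)
```

Checking the length against a*a up front catches the most common input error with this layout: a missing or extra value silently shifts every later count into the wrong cell.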