Discussion of Tanaka's paper.

The theory of quantification is a method of statistical analysis of categorical data. In other words, it is a kind of data theory and is closely related to the optimal scaling method. It has been developed mainly by Guttman in Israel and Hayashi in Japan. The multidimensional scaling method, which has been developed recently, is considered to be a continuation of the theory of quantification. Tanaka discussed some of Hayashi's methods of quantification mathematically. The present paper gives an overview of the methods developed by Hayashi and of other closely related methods, and orients the methods introduced by Tanaka among them. Then, as an illustration of exploratory categorical data analysis, the experimental data of Grizzle are analyzed by the second method of quantification. The data structure is shown heuristically as a spatial configuration of factors in two-dimensional Euclidean space.

Tanaka discussed my early methods of quantification from the standpoint of mathematical statistics and added his newly developed method for the case of ordered categories, with some asymptotic theory (1).
The terminology and notation in his paper are somewhat different from those in my papers. I remark here only that he uses "external criterion" or "criterion variable" where my original papers use "outside criterion" or "outside variable." As for the notation, readers must follow it carefully. The method of quantification is considered to be a kind of scaling method for categorical data. The most important problem of quantification, both in fundamental idea and in methodology, is to assign numerical vectors to categorical data so that they are optimal for our purpose under a minimum of assumptions. The idea is briefly described in a previous paper (2). From this idea many methods, including the four methods detailed in Tanaka's paper (1), have been developed; these are shown in Table 1.
This list contains the methods of quantification published by Hayashi, together with some closely related important methods that help to orient them. Of course, it is not exhaustive; besides these, interesting methods have been developed by Hayashi's colleagues inside and outside Japan. The methods of quantification are frequently used in data analysis because computer programs are now available for some of them, as Tanaka mentioned. The leading ideas in the methods of quantification play a productive role in data analysis and in developing the new statistical methods necessary for detective analysis of data. For example, the fourth method gives a similar realization of the aim of nonmetric multidimensional scaling (MDS) (17-23) and naturally proceeds to MDS.
Some comments are added here. Usually the responses are given in the form $\delta_i(k_j) = 1$ if the $i$-th element responds in category $k_j$ of the $j$-th item, and $\delta_i(k_j) = 0$ otherwise. However, we often meet the situation in which the responses are not determinate but may be expressed as probabilistic events. In this case $p_h(k_j)$ denotes the probability that element $i$ responds in category $k_j$ of the $j$-th item when $i$ belongs to class $h$. Even in such cases, the calculation in the methods of quantification is done in quite the same way as for the dichotomous 1, 0 responses. The information on $i \in h$ and the $p$'s must be given in the data. For example, $H = 3$ and $h = +, \pm, -$ in item 1, which of course has three categories $+, \pm, -$. It is supposed [...] This model must be verified; also, the values of the probabilities must be estimated by some fundamental research before the present analysis.
This idea may be crucial for some medical data. In our experience, the fluctuation of measurement data due to bioactivity and measurement error is usually not negligible and is fairly large, even if the conditions of measurement are strictly regulated.
Note: $s_{kl}$ means the numerical vector given to category $l$ in the $k$-th item.
Environmental Health Perspectives
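The two ways of coding responses described above, 0/1 indicators and class-conditional probabilities, can be sketched as follows. This is a minimal illustration, not code from the original papers: the function names, the use of NumPy, and in particular the probability values are hypothetical and chosen only to show that the same matrix layout serves both cases.

```python
import numpy as np

def indicator_matrix(responses, n_categories):
    """Dummy-code determinate responses.

    responses[i, j] is the (0-based) category chosen by element i on item j;
    n_categories[j] is the number of categories of item j.
    Returns one 0/1 block per item, stacked side by side.
    """
    responses = np.asarray(responses)
    n, n_items = responses.shape
    blocks = []
    for j in range(n_items):
        block = np.zeros((n, n_categories[j]))
        block[np.arange(n), responses[:, j]] = 1.0  # delta = 1 in the chosen category
        blocks.append(block)
    return np.hstack(blocks)

def probabilistic_matrix(classes, class_probs):
    """Code probabilistic responses.

    classes[i, j] is the (0-based) class h of element i on item j;
    class_probs[j][h][k] is the probability of category k of item j given class h.
    Each 0/1 row of a block is replaced by the probability row of the class.
    """
    classes = np.asarray(classes)
    blocks = [np.asarray(class_probs[j])[classes[:, j]]
              for j in range(classes.shape[1])]
    return np.hstack(blocks)

# Hypothetical example: one item with categories (+, ±, -) and H = 3 classes.
# The probabilities below are invented for illustration only.
probs_item1 = [[0.8, 0.2, 0.0],   # class +
               [0.1, 0.8, 0.1],   # class ±
               [0.0, 0.2, 0.8]]   # class -
print(indicator_matrix([[0], [1], [2]], [3]))
print(probabilistic_matrix([[0], [1], [2]], [probs_item1]))
```

Either matrix can then be passed to the same subsequent calculation, which is the sense in which the probabilistic case is handled "in quite the same way" as the 1, 0 case.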
As an illustration of the second method of quantification, the data of Table 3 in Grizzle's paper (24) on cases of coronary heart disease classified by type of lesion, age, location and race are used. These data are reproduced in Table 2. This application may not be entirely satisfactory because of the properties of the data; the analysis is carried out to aid understanding of the second method.
According to Tanaka's notation, we have the factors listed in Table 3.
Subject $j$ belongs to one of the groups $\pi$ and has a response in one category of item 1, i.e., location and race ($k = 1$): New Orleans white, Oslo, or New Orleans Negro; and a response in one category of item 2, i.e., age ($k = 2$): 35-44, 45-54, 55-64, or 65-69. Grizzle's data are rewritten in Table 4 in a form convenient for understanding the second method.
This analysis gives information about an individual element (subject) of a group, whereas Grizzle's result gives information about in-group relations. Note that the meaning is rather different. Apart from this point, the calculation is as shown below. The numerical vectors given to the item-categories and the calculated mean values of the external criteria are shown in Tables 5 and 6, with the total variance taken to be equal to 1.
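For orientation, the quantities involved in this calculation can be written in one dimension as follows. This is a sketch in notation assumed here for illustration: the indicator $\delta_j(kl)$, the group sizes $n_t$, the group label $\pi_t$ and the total number of subjects $n$ are introduced for this purpose and are not taken from Tanaka's paper. Only $s_{kl}$, the score given to category $l$ of item $k$, follows the text.

```latex
% Score of subject j: sum of the scores of the categories it responds in
y_j = \sum_{k} \sum_{l} \delta_j(kl)\, s_{kl},
\qquad
\delta_j(kl) =
\begin{cases}
1, & \text{subject } j \text{ responds in category } l \text{ of item } k,\\
0, & \text{otherwise.}
\end{cases}

% Mean score of group \pi_t and overall mean
\bar{y}_t = \frac{1}{n_t} \sum_{j \in \pi_t} y_j,
\qquad
\bar{y} = \frac{1}{n} \sum_{j} y_j .

% Correlation ratio: between-group variance over total variance
\eta^2 = \frac{\sum_t n_t\, (\bar{y}_t - \bar{y})^2}{\sum_j (y_j - \bar{y})^2},
\qquad
\text{with the normalization } \frac{1}{n} \sum_j (y_j - \bar{y})^2 = 1 .
```

The scores $s_{kl}$ are chosen to make $\eta^2$ as large as possible, so under the normalization above $\eta^2$ is simply the between-group variance of the subject scores.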
The square roots of the correlation ratios, which are obtained as latent roots of the latent equation, are 0.23, 0.20, and 0.09, respectively. The first and second dimensions are adopted, corresponding to the largest and second largest latent roots. To make the features clear, the category scores $s_{kl}$ and the mean values of the external criteria are shown in Figures 1 and 2, respectively. Although the discrimination power may be weak, as expected from the properties of the data, the configuration is interesting and the data structure is well shown.
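How latent roots of this kind arise can be sketched numerically. The following is a minimal illustration under stated assumptions, not the program used for the analysis: it treats the second method as maximizing the ratio of between-group to total variance of the subject scores, solves the resulting symmetric eigenproblem with NumPy, and scales the scores so that the total variance is 1. The function name `quantification_ii` and the small data set are hypothetical; the data are invented and are not Grizzle's.

```python
import numpy as np

def quantification_ii(G, groups, tol=1e-10):
    """Category scores maximizing the correlation ratio (illustrative sketch).

    G      : (n, p) indicator (or probability) matrix, one block per item.
    groups : length-n labels of the external criterion.
    Returns (eta2, scores): squared correlation ratios in decreasing order and
    the matching score vectors as columns, scaled to total variance 1.
    """
    G = np.asarray(G, dtype=float)
    groups = np.asarray(groups)
    n = G.shape[0]
    Z = G - G.mean(axis=0)                 # center each category column
    T = Z.T @ Z / n                        # total covariance
    B = np.zeros_like(T)                   # between-group covariance
    for t in np.unique(groups):
        in_t = groups == t
        m = Z[in_t].mean(axis=0)           # group mean minus overall mean
        B += in_t.sum() * np.outer(m, m) / n
    # T is singular because each item's indicator columns sum to one,
    # so whiten only within the subspace where T has positive variance.
    d, U = np.linalg.eigh(T)
    keep = d > tol * d.max()
    W = U[:, keep] / np.sqrt(d[keep])      # W.T @ T @ W is the identity
    eta2, V = np.linalg.eigh(W.T @ B @ W)  # latent roots of the reduced problem
    order = np.argsort(eta2)[::-1]
    eta2 = np.clip(eta2[order], 0.0, 1.0)  # remove tiny rounding excursions
    return eta2, W @ V[:, order]

# Invented toy data: item 1 has 3 categories, item 2 has 4, and 3 groups.
item1 = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0, 1, 2])
item2 = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 1, 2])
G = np.hstack([np.eye(3)[item1], np.eye(4)[item2]])

eta2, scores = quantification_ii(G, groups)
print("square roots of the correlation ratios:", np.sqrt(eta2))
print("category scores, first two dimensions:")
print(scores[:, :2])
```

With the scores in hand, the group means of the subject scores `(G - G.mean(axis=0)) @ scores` give one point per group, which is the kind of configuration plotted for the external criteria. The returned roots include near-zero values beyond the number of groups minus one; only the leading ones are meaningful.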
In Figure 1, age has a linear structure, and New Orleans white and New Orleans Negro have similar features, quite different from Oslo. It is observed in Figure 2 that the values in the first dimension discriminate between the existence and nonexistence of myocardial scar, whereas the values in the second [...] The correspondence of Figure 1 to Figure 2 reveals the meaning of the items.