Article

Pragmatic Hypotheses in the Evolution of Science

by Luis Gustavo Esteves 1, Rafael Izbicki 2, Julio Michael Stern 1,* and Rafael Bassi Stern 2

1 Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo 05508-090, Brazil
2 Departamento de Estatística, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
* Author to whom correspondence should be addressed.
Entropy 2019, 21(9), 883; https://doi.org/10.3390/e21090883
Submission received: 1 August 2019 / Revised: 8 September 2019 / Accepted: 10 September 2019 / Published: 11 September 2019

Abstract

This paper introduces pragmatic hypotheses and relates this concept to the spiral of scientific evolution. Previous works determined a characterization of logically consistent statistical hypothesis tests and showed that the modal operators obtained from such tests can be represented in the hexagon of oppositions. However, despite the importance of precise hypotheses in science, they cannot be accepted by logically consistent tests. Here, we show that this dilemma can be overcome by the use of pragmatic versions of precise hypotheses. These pragmatic versions allow a level of imprecision in the hypothesis that is small relative to other experimental conditions. The introduction of pragmatic hypotheses allows the evolution of scientific theories based on statistical hypothesis testing to be interpreted using the narratological structure of hexagonal spirals, as defined by Pierre Gallais.

1. Introduction

Standard hypothesis tests can lead to immediate logical incoherence, which makes their conclusions hard to interpret. This incoherence occurs because such tests have only two possible outcomes. Indeed, Izbicki and Esteves [1] show that there exists no two-valued test that satisfies desirable statistical properties and is also logically coherent.
In order to overcome such an impossibility result, Esteves et al. [2] propose agnostic hypothesis tests, which have three possible outputs: (A) accept the hypothesis, say H, (E) reject H, or (Y) remain agnostic about H. These tests can be made logically coherent while preserving desirable statistical properties. For instance, both conditions are satisfied by the Generalized Full Bayesian Significance Test (GFBST). Furthermore, Stern et al. [3] show that the GFBST’s modal operators and their respective negations can be represented by vertices of the hexagon of oppositions [4,5,6,7,8,9], which is depicted in Figure 1.
This paper complements the above static representation with an analysis of the GFBST in the dynamic evolution of scientific theories. The analysis is based on the metaphor of evolutive hexagonal spirals [10,11], in which the logical modalities associated to scientific theories change over time, as in Figure 2. Our key point in this paradigm is reconciling two apparently contradictory facts. On the one hand, precise or sharp hypotheses, that is, hypotheses that have a priori zero probability, are central in scientific theories [12,13]. On the other hand, the GFBST never accepts (A) precise hypotheses. These observations lead to the apparent paradox that, if the GFBST were used to test scientific theories, then the acceptance step in the spiral of scientific theories would be forfeited.
We overcome this paradox by proposing the concept of a “pragmatic hypothesis” associated to a precise hypothesis. Although precise hypotheses are commonly obtained from mathematical theories used in areas of science and technology [12,13], the associated pragmatic hypothesis is an imprecise hypothesis that is sufficiently good for the practical purposes of an end-user of the theories. For instance, Newtonian theory assumes a gravitational force of magnitude given by the equation $F = G\,m_1 m_2/d^2$, where the gravitational constant G has a precise value. However, the current Committee on Data for Science and Technology (CODATA) value for the gravitational constant is $G = 6.67408(31) \times 10^{-11}\,\mathrm{m^3\,kg^{-1}\,s^{-2}}$, which includes a standard deviation for the last significant digits, $408 \pm 31$. Thus, it may be reasonable for a given end-user to assume that the theoretical form of the last equation is exact, but that, pragmatically, the constant G can only be known up to a chosen precision. As a result, one might wish to test an imprecise hypothesis associated to the scientific hypothesis of interest [14,15].
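As a minimal numeric illustration of this pragmatic reading (not taken from the paper), the Python sketch below builds a tolerance band around the CODATA value of G; the tolerance factor k is a hypothetical end-user choice.

```python
# Minimal sketch (illustrative only): a pragmatic band around the CODATA value
# G = 6.67408(31) x 10^-11 m^3 kg^-1 s^-2, assuming the end-user tolerates
# deviations of up to k reported standard deviations.
G = 6.67408e-11   # central value
u = 0.00031e-11   # reported standard deviation of the last digits
k = 2.0           # hypothetical end-user tolerance, in units of u

lower, upper = G - k * u, G + k * u
print(f"pragmatic interval for G: [{lower:.5e}, {upper:.5e}]")
```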
This article advocates for the conceptual distinction between a precise scientific theory and an associated pragmatic hypothesis. The alternate use of precise and pragmatic versions of corresponding statistical hypotheses enables the GFBST to (pragmatically) accept scientific hypotheses. Moreover, this alternate use allows the GFBST to track the evolution of scientific theories, as interpreted in the context of Gallais’ hexagonal spirals.
Our main goal in this paper is to formalize testing procedures for a theory taking into consideration the level of precision that is appropriate for a given end-user. To handle this problem, we consider the end-user’s predictions about an experiment of interest to him. The variation in these predictions can be explained by a combination of the level of imprecision in the theory and the properties of the end-user’s experiment. For instance, the latter source of variation is influenced by properties of the equipment, including the precision, accuracy, and resolution of measuring devices [16,17], and also error bounds for fundamental constants and calibration factors [18,19,20,21,22,23,24,25,26]. We propose to choose a pragmatic hypothesis in such a way that the imprecision in the end-user’s predictions is mostly due to his experimental conditions and not due to the level of imprecision in the theory that he uses.
In order to develop this argument, Section 2 first adapts Gallais’ metaphor of hexagonal spirals to the evolution of science. Next, Section 3 proposes three methods of decomposing the variability in an end-user’s predictions into the level of precision of the theory he uses and his experimental conditions. Section 3.1 and Section 3.2 use these decompositions in order to build pragmatic hypotheses. They build pragmatic hypotheses for simple hypotheses and then prove that there exists a single way of extending this construction to composite hypotheses while preserving logical coherence in simultaneous hypothesis testing. This methodology is illustrated in Section 4. All proofs are found in Appendix A.

2. Gallais’ Hexagonal Spirals and the Evolution of Science

Following a well-established tradition in structural semantics and narratology [27,28], Gallais and Pollina [10] propose that many classical medieval tales follow the same organizational pattern. More precisely, these narratives exhibit an underlying “intellectual structure” and are organized according to an underlying archetypal format or prototypical pattern. This pattern includes both static and dynamical aspects. From a static perspective, the logical structure of the narrative is such that each arc is represented by a vertex of the “hexagon of oppositions” [4]. The static hexagon of oppositions is depicted in Figure 1 and represents in each vertex a modal operator among necessity (□), possibility (⋄), contingency (Δ), and their negations (¬). These modal operators are structured according to three axes of opposition, a triangle of contrariety, another triangle of sub-contrariety, and several edges of subalternation. From a dynamical perspective, the temporal evolution of the narrative follows a spiral (Figure 2) that unwinds (se déroule) around concentric and expanding hexagons of opposition [10,11].
Because the evolution of science can also be conceived as following a spiral pattern [29], its analysis can benefit from the structure in the works by the authors of [10,11]. From a static perspective, the logical modalities induced by agnostic hypothesis tests [3] can be represented in the hexagon of oppositions. From a dynamic perspective, scientific theories evolve as a spiral which unwinds around the following states:
  • A 1 - Extant thesis: This vertex represents a standing paradigm, an accepted theory using well-known formalisms and familiar concepts, relying on accredited experimental means and methods, etc. In fact, the concepts of a current paradigm may become so familiar and look so natural that they become part of a reified ontology. That is, there is a perceived correspondence between concepts of the theory and “Dinge an sich” (things-in-themselves) as seen in nature [29,30].
  • U 1 - Analysis: This vertex represents the moment when some hypotheses of the standing theory are put in question. At this moment, possible alternatives to the standing hypotheses may still be only vaguely defined.
  • E 1 - Antithesis: This vertex represents the moment when some laws of the standing theory have to be rejected. Such a rejection of old laws may put in question the entire world-view of the current paradigm, opening the way for revolutionary ideas, as described in the next vertex.
  • O 2 - Apothesis/ Prosthesis: This vertex is the locus of revolutionary freedom. Alternative models are considered, and specific (precise) forms investigated. There is intellectual freedom to set aside and dispose of (apothesis) old preconceptions, prejudices and stereotypes, and also to explore and investigate new paths, to put together (prosthesis) and try out new concepts and ideas.
  • Y 2 - Synthesis: It is at this vertex that new laws are formulated; this is the point of Eureka moment(s). A selection of old and new concepts seems to click into place, fitting together in the form of new laws, laws that are able to explain new phenomena and incorporate objects of an expanded reality.
  • I 2 - Enthesis: At this vertex, new laws, concepts, and methods must enter and be integrated into a consistent and coherent system. At this stage, many tasks are performed in order to combine novel and traditional pieces or to accommodate original and conventional components into a well-integrated framework. Finally, new experimental means and methods are developed and perfected, allowing the new laws to be corroborated.
  • A 2 - New Thesis: At this vertex, the new theory is accepted as the standard paradigm that succeeds the preceding one ( A 1 ). Acceptance occurs after careful determination of fundamental constants and calibration factors (including their known precision), metrological and instrumentational error bounds, etc. At later stages of maturity, equivalent theoretical frameworks may be developed using alternative formalisms and ontologies. For example, analytical mechanics offers variational alternatives that are (almost) equivalent to the classical formulation of Newtonian mechanics [31]. Usually, these alternative worldviews reinforce the trust and confidence in the underlying laws. Nevertheless, the existence of such alternative perspectives may also foster exploratory efforts and investigative works in the next cycle of evolution.
Table 1 applies this spiral structure to the evolution of the theories of orbital astronomy and chemical affinity. The evolution of orbital astronomy has been widely studied [32]. The evolution of chemical affinity is presented in greater detail in Stern [29] and in Stern and Nakano [33].
The above spiral structure highlights that a statistical methodology should be able to obtain each of the six modalities in the hexagon of oppositions. Before an acceptance vertex (A) in the hexagon is reached by the spiral of scientific evolution, theoretically precise or sharp hypotheses must be formulated. However, a logically coherent hypothesis test, such as the GFBST, can choose solely between rejecting or remaining agnostic about (i.e., corroborating) such sharp hypotheses. Once the evolving theory becomes (part of) a well-established paradigm, the GFBST can be used with the goal of accepting non-sharp hypotheses in the context of the same paradigm, a context that includes fundamental constants and calibration factors (and their respective uncertainties), metrological error bounds, specified accuracies of scientific instrumentation, etc. The non-sharp versions of sharp hypotheses used in such tests are called pragmatic, and their formulation is developed in the following sections.

3. Pragmatic Hypotheses

In order to derive pragmatic hypotheses from precise ones, it is necessary to define an idealized future experiment. Let $\theta$ be an unknown parameter of interest, which is used to express scientific hypotheses and that takes values in the parameter space, $\Theta$. A scientific hypothesis takes the form $H_0: \theta \in \Theta_0$, where $\Theta_0 \subseteq \Theta$. Whenever there is no ambiguity, $H_0$ and $\Theta_0$ are used interchangeably. Also, the determination of $\theta$ is useful for predicting an idealized future experiment, $Z$, which takes values in $\mathcal{Z}$. The uncertainty about $Z$ depends on $\theta$ by means of $P_{\theta^*}$, the probability measure over $\mathcal{Z}$ when it is known that $\theta = \theta^*$, $\theta^* \in \Theta$.
Often, it is sufficient for an end-user to determine a pragmatic hypothesis, that is, that the parameter lies in a set of plausible values, which is larger than the null hypothesis. This set can be chosen in such a way that the variation over predictions about a future experiment is mostly due to experimental conditions rather than to the imprecision in the value of the parameter. This section formally develops a methodology for determining these pragmatic hypotheses.
In order to compare two parameter values, we use a “predictive dissimilarity”, $d_Z$, which is a function $d_Z: \Theta \times \Theta \to \mathbb{R}^+$ such that $d_Z(\theta_0, \theta^*)$ measures how much the predictions made for $Z$ based on $\theta^*$ diverge from the ones made based on $\theta_0$. We define and compare three possible choices for such a dissimilarity.
Definition 1.
The Kullback–Leibler predictive dissimilarity, $KL_Z$, is
$$KL_Z(\theta_0, \theta^*) = KL(P_{\theta^*}, P_{\theta_0}) = \int_{\mathcal{Z}} \log\left(\frac{dP_{\theta^*}}{dP_{\theta_0}}\right) dP_{\theta^*},$$
that is, $KL_Z(\theta_0, \theta^*)$ is the relative entropy between $P_{\theta^*}$ and $P_{\theta_0}$.
Example 1
(Gaussian with known variance). Let $Z = (Z_1, \ldots, Z_d) \sim N(\theta, \Sigma_0)$ be a random vector with a multivariate Gaussian distribution:
$$\frac{dP_\theta(z)}{dz} = |2\pi\Sigma_0|^{-0.5} \exp\left(-0.5\,(z-\theta)^t \Sigma_0^{-1} (z-\theta)\right),$$
$$KL_Z(\theta_0, \theta^*) = \int_{\mathbb{R}^d} \log\left(\frac{dP_{\theta^*}(z)}{dP_{\theta_0}(z)}\right) dP_{\theta^*}(z) = 0.5\,(\theta_0 - \theta^*)^t \Sigma_0^{-1} (\theta_0 - \theta^*).$$
When $d = 1$ and $\Sigma_0 = \sigma_0^2$,
$$KL_Z(\theta_0, \theta^*) = \frac{(\theta_0 - \theta^*)^2}{2\sigma_0^2}. \qquad (1)$$
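The closed form above is easy to compute; the following short Python sketch (the function name is illustrative, not from the paper) evaluates it and checks the univariate special case.

```python
import numpy as np

def kl_gaussian_known_cov(theta0, theta_star, Sigma0):
    """KL predictive dissimilarity of Example 1 for Z ~ N(theta, Sigma0) with
    known covariance: 0.5 * (theta0 - theta*)^t Sigma0^{-1} (theta0 - theta*)."""
    diff = np.asarray(theta0, float) - np.asarray(theta_star, float)
    return 0.5 * diff @ np.linalg.solve(np.atleast_2d(Sigma0), diff)

# Univariate check against (theta0 - theta*)^2 / (2 sigma0^2)
print(kl_gaussian_known_cov([0.0], [1.0], [[4.0]]))  # 1 / (2 * 4) = 0.125
```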
The KL dissimilarity evaluates the distance between the predictive probability distributions for the future experiment under two parameter values, $\theta_0$ and $\theta^*$. Although the KL dissimilarity is general, it can be challenging to interpret. In particular, it can be hard to establish the quality of the predictions for $Z$ based on $\theta^*$ when $Z$ is actually generated from $\theta_0$ and $KL_Z(\theta_0, \theta^*) \leq \epsilon$. A more interpretable dissimilarity is obtained by taking $d_Z(\theta_0, \theta^*)$ to measure how far apart the best predictions for $Z$ based on $\theta^*$ and on $\theta_0$ are. In this case, if one makes a prediction $z^*$ for $Z$ based on $\theta^*$, and $Z$ was actually generated using $\theta_0$, then $d_Z(\theta_0, \theta^*) \leq \epsilon$ guarantees that $z^*$ will be at most $\epsilon$ apart from the best possible prediction. Such a dissimilarity is discussed in the following definition.
Definition 2
(Best prediction dissimilarity—BP). Let $\hat{Z}: \Theta \to \mathcal{Z}$ be such that $\hat{Z}(\theta_0)$ is the best prediction for $Z$ given that $\theta = \theta_0$. For example, one can take
$$\hat{Z}(\theta_0) = \arg\min_{z \in \mathcal{Z}} \delta_{Z,\theta_0}(z),$$
where $\delta_{Z,\theta_0}: \mathcal{Z} \to \mathbb{R}$ is such that $\delta_{Z,\theta_0}(z)$ measures how badly $z$ predicts $Z$ when $\theta = \theta_0$. The “best prediction dissimilarity”, $BP_Z(\theta_0, \theta^*)$, measures how badly $\hat{Z}(\theta^*)$ predicts $Z$ relative to $\hat{Z}(\theta_0)$ when $\theta = \theta_0$. Formally,
$$BP_Z(\theta_0, \theta^*) = g\left(\frac{\delta_{Z,\theta_0}(\hat{Z}(\theta^*)) - \delta_{Z,\theta_0}(\hat{Z}(\theta_0))}{\delta_{Z,\theta_0}(\hat{Z}(\theta_0))}\right),$$
where $g: \mathbb{R} \to \mathbb{R}$ is a monotonic function. The choice of g in a particular setting aims at improving the interpretation of the best prediction dissimilarity criterion.
Example 2
(BP under quadratic form). Let $\mathcal{Z} = \mathbb{R}^d$, $\mu_{Z,\theta} = E[Z|\theta]$, $\Sigma_{Z,\theta} = V[Z|\theta]$, and let S be a positive definite matrix. Define the quadratic form induced by S to be $\|z\|_S^2 = z^T S z$ and
$$\delta_{Z,\theta_0}(z) = E\left[\|Z - z\|_S^2 \,\middle|\, \theta = \theta_0\right].$$
The optimal prediction under $\theta^*$ is $\hat{Z}(\theta^*) = \mu_{Z,\theta^*}$. It follows that
$$\delta_{Z,\theta_0}(\hat{Z}(\theta^*)) = E\left[\|Z - \mu_{Z,\theta^*}\|_S^2 \,\middle|\, \theta = \theta_0\right] = \|\mu_{Z,\theta_0} - \mu_{Z,\theta^*}\|_S^2 + E\left[\|Z - \mu_{Z,\theta_0}\|_S^2 \,\middle|\, \theta = \theta_0\right].$$
In particular, $\delta_{Z,\theta_0}(\hat{Z}(\theta_0)) = E\left[\|Z - \mu_{Z,\theta_0}\|_S^2 \,\middle|\, \theta = \theta_0\right]$. Therefore,
$$BP_Z(\theta_0, \theta^*) = g\left(\frac{\|\mu_{Z,\theta_0} - \mu_{Z,\theta^*}\|_S^2}{E\left[\|Z - \mu_{Z,\theta_0}\|_S^2 \,\middle|\, \theta = \theta_0\right]}\right). \qquad (2)$$
In this example, $BP_Z$ can be put on the same scale as $Z$ by taking $g(x) = \sqrt{x}$. Also, two choices of S are of particular interest. When $S = V[Z|\theta = \theta_0]^{-1}$, Equation (2) simplifies to
$$BP_Z(\theta_0, \theta^*) = g\left(d^{-1}\,\|\mu_{Z,\theta_0} - \mu_{Z,\theta^*}\|_{\Sigma_{Z,\theta_0}^{-1}}^2\right). \qquad (3)$$
Similarly, when S is the identity matrix, Equation (2) simplifies to
$$BP_Z(\theta_0, \theta^*) = g\left(\frac{\|E[Z|\theta = \theta_0] - E[Z|\theta = \theta^*]\|_2^2}{tr(V[Z|\theta = \theta_0])}\right). \qquad (4)$$
Equation (4) admits an intuitive interpretation. The larger the value of $tr(V[Z|\theta = \theta_0])$, the more dispersed $Z$ is and the harder it is to predict its value. Also, $\|E[Z|\theta = \theta_0] - E[Z|\theta = \theta^*]\|_2^2$ measures how far apart the best predictions for $Z$ under $\theta = \theta_0$ and under $\theta = \theta^*$ are. That is, $BP_Z(\theta_0, \theta^*)$ captures the fact that, if one predicts $Z$ assuming that $\theta = \theta^*$ when actually $\theta = \theta_0$, then the error with respect to the best prediction increases as a function of the distance between the predictions relative to the dispersion of $Z$.
Example 3
(Gaussian with known variance). Consider Example 1 and let $\delta_{Z,\theta_0}(z)$ be as in Example 2. It follows from Equation (4) that, when S is the identity matrix,
$$BP_Z(\theta_0, \theta^*) = g\left(\frac{\|\theta_0 - \theta^*\|_2^2}{tr(\Sigma_0)}\right). \qquad (5)$$
Similarly, it follows from Equation (3) that, when $S = \Sigma_0^{-1}$,
$$BP_Z(\theta_0, \theta^*) = g\left(d^{-1}\,(\theta_0 - \theta^*)^t \Sigma_0^{-1} (\theta_0 - \theta^*)\right). \qquad (6)$$
Conclude from Equation (6) that, if $S = \Sigma_0^{-1}$ and $g(x) = x$, then $BP_Z(\theta_0, \theta^*) = 2 d^{-1}\, KL_Z(\theta_0, \theta^*)$. Also, when $d = 1$, $\Sigma_0 = \sigma_0^2$, and $g(x) = \sqrt{x}$, both Equations (5) and (6) simplify to
$$BP_Z(\theta_0, \theta^*) = \sigma_0^{-1}\,|\theta_0 - \theta^*|. \qquad (7)$$
In some situations, $Z$ is the average of m independent observations distributed as $N(\theta, \Sigma_0)$. In this case, $Z \sim N(\theta, m^{-1}\Sigma_0)$. It follows from Equation (5) that $BP_Z(\theta_0, \theta^*) = g\left(m\,\|\theta_0 - \theta^*\|_2^2 / tr(\Sigma_0)\right)$ when S is the identity, and $BP_Z(\theta_0, \theta^*) = g\left(m\, d^{-1}\,(\theta_0 - \theta^*)^t \Sigma_0^{-1} (\theta_0 - \theta^*)\right)$ when $S = \Sigma_0^{-1}$.
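As a small numerical companion (names are illustrative), the sketch below evaluates the identity-matrix version of BP from Equation (5), including the averaged-observations case.

```python
import numpy as np

def bp_gaussian_identity(theta0, theta_star, Sigma0, m=1, g=np.sqrt):
    """BP of Example 3 with S = identity: g(m * ||theta0 - theta*||^2 / tr(Sigma0)),
    where Z is the average of m i.i.d. N(theta, Sigma0) observations."""
    diff = np.asarray(theta0, float) - np.asarray(theta_star, float)
    return g(m * diff @ diff / np.trace(np.atleast_2d(Sigma0)))

# Univariate case with m = 1: reduces to |theta0 - theta*| / sigma0, Equation (7)
print(bp_gaussian_identity([0.0], [1.0], [[4.0]]))        # 0.5
print(bp_gaussian_identity([0.0], [1.0], [[4.0]], m=25))  # 2.5
```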
Although $BP_Z$ is more interpretable than $KL_Z$, it also relies on more tuning choices, such as $\delta$, $\hat{Z}$, and g. A balance between these features is obtained by a third predictive dissimilarity, which evaluates how easily one can recover, based on $Z$, whether the value of $\theta$ is $\theta_0$ or $\theta^*$.
Definition 3
(Classification distance—CD). Let $\hat{\theta}_{\theta_0,\theta^*}: \mathcal{Z} \to \Theta$ be such that
$$\hat{\theta}_{\theta_0,\theta^*}(z) = \arg\max_{\theta \in \{\theta_0, \theta^*\}} f_Z(z|\theta).$$
That is, $\hat{\theta}_{\theta_0,\theta^*}$ assigns to each possible outcome z of the future experiment whichever of the two values, $\theta_0$ or $\theta^*$, makes the experimental result more likely. The classification distance between $\theta_0$ and $\theta^*$, $CD(\theta_0, \theta^*)$, is defined as
$$CD(\theta_0, \theta^*) = 0.5\, P\left(\hat{\theta}_{\theta_0,\theta^*}(Z) = \theta_0 \,\middle|\, \theta_0\right) + 0.5\, P\left(\hat{\theta}_{\theta_0,\theta^*}(Z) = \theta^* \,\middle|\, \theta^*\right) - 0.5.$$
$CD(\theta_0, \theta^*) + 0.5$ is the best Bayes utility in a hypothesis test of $\theta_0$ against $\theta^*$ using a uniform prior for $\theta$ and the 0/1 utility [15]. By subtracting 0.5 from this quantity, $CD(\theta_0, \theta^*)$ varies between 0 and 0.5 and is a distance. Also,
$$CD(\theta_0, \theta^*) = 0.5\, TV(P_{\theta_0}, P_{\theta^*}) = 0.25\, \|P_{\theta_0} - P_{\theta^*}\|_1,$$
where $TV(P_{\theta_0}, P_{\theta^*}) = \sup_A |P_{\theta_0}(A) - P_{\theta^*}(A)|$ and $\|P_{\theta_0} - P_{\theta^*}\|_1 = \int_{\mathcal{Z}} |p_{\theta_0}(z) - p_{\theta^*}(z)|\, dz$ is the $L_1$-distance between the probability measures.
Example 4
(Gaussian with known variance). Consider Examples 1 and 3. When $d = 1$ and $\Sigma_0 = \sigma_0^2$, one obtains
$$CD_Z(\theta_0, \theta^*) = \Phi\left(\frac{|\theta_0 - \theta^*|}{2\sigma_0}\right) - \frac{1}{2}. \qquad (8)$$
Note that, in this case, CD would be the same as BP if, instead of taking $g(x) = \sqrt{x}$, one chose $g(x) = \Phi(0.5\sqrt{x}) - 0.5$.
Although analytical expressions for CD are generally not available, it is possible to approximate it via numerical integration methods.
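For instance, in one dimension the $L_1$ integral in Definition 3 can be approximated by quadrature. The sketch below (illustrative, not the authors' code) does so with SciPy and checks the result against the closed form of Example 4.

```python
import numpy as np
from scipy import stats, integrate

def cd_univariate(pdf0, pdf1, lo, hi):
    """Classification distance CD = 0.25 * integral of |p0(z) - p1(z)| dz,
    approximated by numerical integration (Definition 3)."""
    l1, _ = integrate.quad(lambda z: abs(pdf0(z) - pdf1(z)), lo, hi)
    return 0.25 * l1

# Gaussian check against the closed form Phi(|theta0 - theta*| / (2 sigma0)) - 0.5
theta0, theta_star, sigma0 = 0.0, 1.0, 2.0
approx = cd_univariate(stats.norm(theta0, sigma0).pdf,
                       stats.norm(theta_star, sigma0).pdf, lo=-40.0, hi=40.0)
exact = stats.norm.cdf(abs(theta0 - theta_star) / (2 * sigma0)) - 0.5
print(approx, exact)  # both approximately 0.0987
```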
The choice among predictive dissimilarity functions depends on the type of guarantee the end-user wishes to obtain. In particular, BP, KL, and CD do not form an exhaustive list of dissimilarities. However, some of their properties can be useful in guiding the choice. For instance, although BP yields a metric on the parameter space, KL and CD are dissimilarities between probability distributions. That is, although BP generates pragmatic hypotheses whose parameter values are numerically close to a given $\theta_0$, KL and CD yield pragmatic hypotheses whose parameter values lead to similar predictions about $Z$. Also, although KL evaluates the similarity between predictions from an information-theoretic perspective, CD evaluates it from the perspective of hypothesis tests.

3.1. Singleton Hypotheses

We start by defining the pragmatic hypothesis associated to a singleton hypothesis. A singleton hypothesis is one in which the parameter assumes a single value, such as $H_0: \theta = \theta_0$. In this case, the pragmatic hypothesis associated to $H_0$ is the set of points whose dissimilarity to $\theta_0$ is at most $\epsilon$, as formalized below.
Definition 4
(Pragmatic hypothesis for a singleton). Let $H_0: \theta = \theta_0$, let $d_Z$ be a predictive dissimilarity function, and let $\epsilon > 0$. The pragmatic hypothesis for $H_0$, $Pg(\{\theta_0\}, d_Z, \epsilon)$, is
$$Pg(\{\theta_0\}, d_Z, \epsilon) = \{\theta^* \in \Theta : d_Z(\theta_0, \theta^*) \leq \epsilon\}.$$
Note that, for $\epsilon_1 < \epsilon_2$, $Pg(\{\theta_0\}, d_Z, \epsilon_1) \subseteq Pg(\{\theta_0\}, d_Z, \epsilon_2)$.
Example 5
(Gaussian with known variance). Consider Examples 1 and 3 when $d = 1$, $\Sigma_0 = \sigma_0^2$, and $g(x) = \sqrt{x}$. It follows from Equations (1), (7) and (8) that
$$Pg(\{\theta_0\}, BP_Z, \epsilon) = \left[\theta_0 - \epsilon\sigma_0,\; \theta_0 + \epsilon\sigma_0\right],$$
$$Pg(\{\theta_0\}, KL_Z, \epsilon) = \left[\theta_0 - \sqrt{2\epsilon}\,\sigma_0,\; \theta_0 + \sqrt{2\epsilon}\,\sigma_0\right],$$
$$Pg(\{\theta_0\}, CD_Z, \epsilon) = \left[\theta_0 - 2\Phi^{-1}(0.5 + \epsilon)\,\sigma_0,\; \theta_0 + 2\Phi^{-1}(0.5 + \epsilon)\,\sigma_0\right].$$
Note that the size of each of these pragmatic hypotheses is proportional to $\sigma_0$. This occurs because each predictive dissimilarity function makes the prediction error due to the unknown parameter value small with respect to the error due to the data variability, $\sigma_0^2$.
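These three intervals are straightforward to compute; a minimal Python sketch (with illustrative names) is given below.

```python
from scipy.stats import norm

def singleton_pragmatic_intervals(theta0, sigma0, eps):
    """The three pragmatic intervals of Example 5 (univariate Gaussian, known sigma0)."""
    return {
        "BP": (theta0 - eps * sigma0, theta0 + eps * sigma0),
        "KL": (theta0 - (2 * eps) ** 0.5 * sigma0, theta0 + (2 * eps) ** 0.5 * sigma0),
        "CD": (theta0 - 2 * norm.ppf(0.5 + eps) * sigma0,
               theta0 + 2 * norm.ppf(0.5 + eps) * sigma0),
    }

print(singleton_pragmatic_intervals(theta0=0.0, sigma0=1.0, eps=0.1))
```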

3.2. Composite Hypotheses

Next, we consider pragmatic hypotheses for general hypotheses $H_0: \theta \in \Theta_0$, where $\Theta_0 \subseteq \Theta$.
Definition 5.
For each hypothesis $\Theta_0 \subseteq \Theta$, predictive dissimilarity $d_Z$, and $\epsilon > 0$, $Pg(\Theta_0, d_Z, \epsilon)$ is the pragmatic hypothesis associated to $\Theta_0$ induced by $d_Z$ and $\epsilon$. Whenever $d_Z$ and $\epsilon$ are clear from the context or not relevant to the result, we write $Pg(\Theta_0)$ instead of $Pg(\Theta_0, d_Z, \epsilon)$.
In order to construct these pragmatic hypotheses, we use logically coherent agnostic hypothesis tests. For each hypothesis, an agnostic hypothesis test can either reject it (1), accept it (0), or remain agnostic about it (1/2) [34]. Esteves et al. [2] show that an agnostic hypothesis test is logically coherent if and only if it is based on a region estimator. Such tests are presented in Definition 7 and illustrated in Figure 3.
Definition 6.
Let $\mathcal{X}$ denote the sample space of the data used to test a hypothesis. A region estimator is a function $R: \mathcal{X} \to \mathcal{P}(\Theta)$, where $\mathcal{P}(\Theta)$ is the power set of $\Theta$.
Definition 7
(Agnostic test based on a region estimator). The agnostic test based on the region estimator R for testing $H_0$ is the function $\phi_{H_0}^R$ such that
$$\phi_{H_0}^R(x) = \begin{cases} 0, & \text{if } R(x) \subseteq H_0, \\ 1, & \text{if } R(x) \subseteq H_0^c, \\ \tfrac{1}{2}, & \text{otherwise.} \end{cases}$$
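A minimal sketch of this test, using finite sets as a discrete stand-in for the region estimator and the hypothesis (the function name is illustrative):

```python
def agnostic_test(region, hypothesis):
    """Agnostic test based on a region estimator (Definition 7), with `region`
    and `hypothesis` given as finite sets of parameter values.
    Returns 0 (accept), 1 (reject), or 0.5 (remain agnostic)."""
    if region <= hypothesis:           # R(x) contained in H0
        return 0
    if region.isdisjoint(hypothesis):  # R(x) contained in the complement of H0
        return 1
    return 0.5

# Toy usage on a discretized parameter space
H0 = {0.0, 0.1, 0.2}
print(agnostic_test({0.1, 0.2}, H0))  # 0: accept
print(agnostic_test({0.9, 1.0}, H0))  # 1: reject
print(agnostic_test({0.1, 0.9}, H0))  # 0.5: agnostic
```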
Besides the logical conditions on the hypothesis test, one might also impose logical restraints on how pragmatic hypotheses are constructed. For instance, let A and B be two hypotheses such that B logically entails A, that is, $B \subseteq A$. If a logically coherent test accepts B, then it also accepts A. This property is called monotonicity [1,35,36,37]. One might also impose that Pg be such that, if a logically coherent hypothesis test accepts $Pg(B)$, then it also accepts $Pg(A)$. Similarly, let $(A_i)_{i \in I}$ be a collection of hypotheses which cover A, that is, $A \subseteq \bigcup_{i \in I} A_i$. If a logically coherent hypothesis test rejects every $A_i$, then it rejects A. This property is called union consonance. One might also impose that Pg be such that, if a logically coherent hypothesis test rejects $Pg(A_i)$ for every i, then it also rejects $Pg(A)$. The above conditions define the logical coherence of a procedure for constructing pragmatic hypotheses.
Definition 8.
A procedure for constructing pragmatic hypotheses, Pg, is logically coherent if, for every logically coherent hypothesis test $\phi$ and sample point x:
  • If $\phi_{Pg(B)}(x) = 0$ for some $B \subseteq A$, then $\phi_{Pg(A)}(x) = 0$.
  • If $\phi_{Pg(A_i)}(x) = 1$ for every $i \in I$ and $A \subseteq \bigcup_{i \in I} A_i$, then $\phi_{Pg(A)}(x) = 1$.
In order to motivate the above definition, consider that the frequencies of AA, AB, and BB in a given population are $\theta_1$, $\theta_2$, and $\theta_3$, respectively. Note that $B := \{(0.25, 0.5, 0.25)\}$ is a subset of $A = \{(p^2, 2p(1-p), (1-p)^2) : p \in [0,1]\}$, which denotes the Hardy–Weinberg equilibrium. That is, if the frequencies of AA, AB, and BB are, respectively, 0.25, 0.5, and 0.25, then the population follows the Hardy–Weinberg equilibrium. As a result, if one pragmatically accepts that the population satisfies the specified proportions, then one might also wish to pragmatically accept that the population follows the Hardy–Weinberg equilibrium. Similarly, if one pragmatically rejects, for every $p \in [0,1]$, that the frequencies of AA, AB, and BB are, respectively, $p^2$, $2p(1-p)$, and $(1-p)^2$, then one might also wish to pragmatically reject that the population follows the Hardy–Weinberg equilibrium. These conditions are assured by Definition 8.
In a logically coherent procedure for constructing pragmatic hypotheses, the pragmatic hypothesis associated to a composite hypothesis is completely determined by the pragmatic hypotheses associated to simple hypotheses. This result is presented in Theorem 1.
Theorem 1.
A procedure for constructing pragmatic hypotheses, Pg, is logically coherent if and only if, for every hypothesis $\Theta_0$, $Pg(\Theta_0) = \bigcup_{\theta \in \Theta_0} Pg(\{\theta\})$.
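As a concrete illustration (a sketch on a discretized parameter space, not the authors' code), Theorem 1 gives a direct recipe: compute the singleton pragmatic hypothesis of each point of $\Theta_0$ and take the union.

```python
import numpy as np

def pragmatic_union(theta0_set, dissimilarity, eps, theta_grid):
    """Theorem 1 on a grid: Pg(Theta0) is the union over theta0 in Theta0 of the
    singleton pragmatic hypotheses {theta: dissimilarity(theta0, theta) <= eps}."""
    return {th for th in theta_grid
            if any(dissimilarity(th0, th) <= eps for th0 in theta0_set)}

# Toy usage: univariate Gaussian with sigma0 = 1 and the BP dissimilarity (Example 3)
bp = lambda t0, t: abs(t0 - t)
grid = np.round(np.arange(-1.0, 1.01, 0.1), 2)
Theta0 = [0.0, 0.5]          # a composite hypothesis with two points
print(sorted(float(t) for t in pragmatic_union(Theta0, bp, eps=0.2, theta_grid=grid)))
```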
Using Theorem 1, it is possible to determine a logically coherent procedure for constructing pragmatic hypotheses by determining only the pragmatic hypotheses associated to simple hypotheses, as in Section 3.1. Theorem 1 is illustrated in Section 4. One can also obtain the following general relation between predictive dissimilarities.
Lemma 1.
$Pg(\Theta_0, KL_Z, 8\epsilon^2) \subseteq Pg(\Theta_0, CD_Z, \epsilon)$.
Besides being logically coherent, it is often desirable in statistics [38,39] and in science [12,13] for a procedure to be invariant to reparametrization, so as to ensure that the procedure reaches the same conclusions whatever coordinate system is used to specify the sample and the parameter spaces. For instance, the pragmatic hypothesis obtained using the International System of Units should be compatible with the one obtained using the English system of units. Invariance to reparametrization is formally presented in Definition 10.
Definition 9.
$\{P^*_{\theta^*}\}_{\theta^* \in \Theta^*}$ is a reparameterization of $\{P_\theta\}_{\theta \in \Theta}$ if there exists a bijective function $f: \Theta \to \Theta^*$ such that, for every $\theta \in \Theta$, $P_\theta = P^*_{f(\theta)}$.
Definition 10.
Let $\{P^*_{\theta^*}\}_{\theta^* \in \Theta^*}$ be a reparametrization of $\{P_\theta\}_{\theta \in \Theta}$ by a bijective function $f: \Theta \to \Theta^*$. Also, let $d_Z$ and $d^*_Z$ be predictive dissimilarity functions. The functions $d_Z$ and $d^*_Z$ are invariant to the reparametrization if, for every logically coherent procedure for constructing pragmatic hypotheses, Pg,
$$f[Pg(\Theta_0, d_Z, \epsilon)] = Pg(f[\Theta_0], d^*_Z, \epsilon).$$
Definition 10 states that, if $\Theta_0$ is a hypothesis and invariance to reparametrization holds, then the pragmatic hypothesis obtained in a reparametrization of $\Theta_0$, say $Pg(f[\Theta_0])$, is the same as the transformed pragmatic hypothesis associated to $\Theta_0$, $f[Pg(\Theta_0)]$. Theorem 2 presents a sufficient condition for obtaining invariance to reparametrization.
Theorem 2.
Let $\{P^*_{\theta^*}\}_{\theta^* \in \Theta^*}$ be a reparameterization of $\{P_\theta\}_{\theta \in \Theta}$ given by a bijective function f. If $d_Z$ and $d^*_Z$ satisfy $d_Z(\theta_0, \theta) = d^*_Z(f(\theta_0), f(\theta))$, then $d_Z$ and $d^*_Z$ are invariant to this reparametrization.
Corollary 1.
If $d_Z$ and $d^*_Z$ are the same choice among KL, BP, and CD, then $d_Z$ and $d^*_Z$ are invariant to every reparametrization.
The procedures for constructing pragmatic hypotheses induced by K L and C D also satisfy an additional property given by Theorem 3.
Theorem 3.
Let $Z_m = (Z_1, \ldots, Z_m)$, where the $Z_i$’s are i.i.d. $F_\theta$ and $\{F_\theta\}_{\theta \in \Theta}$ is identifiable [40,41]. Also, let $KL_m$ and $CD_m$ be the dissimilarities calculated using $Z_m$. If Pg is logically coherent, then, for every $\Theta_0 \subseteq \Theta$ and $\epsilon > 0$,
(i) $\left(Pg(\Theta_0, KL_m, \epsilon)\right)_{m \geq 1}$ and $\left(Pg(\Theta_0, CD_m, \epsilon)\right)_{m \geq 1}$ are non-increasing sequences of sets;
(ii) $Pg(\Theta_0, KL_m, \epsilon) \xrightarrow{m \to \infty} \Theta_0$ and $Pg(\Theta_0, CD_m, \epsilon) \xrightarrow{m \to \infty} \Theta_0$.
Theorem 3 states that the sequence of pragmatic hypotheses for $\Theta_0$ induced by $d_{Z_m}$ is non-increasing if the dissimilarity is evaluated by either KL or CD. The greater the number of observable quantities in $Z_m$, the easier it is to distinguish two parameter values $\theta_0$ and $\theta^*$, and therefore the smaller the set of parameter values that are taken as close to $\theta_0$. Also, as the sample size goes to infinity, the pragmatic hypothesis associated to $\Theta_0$ converges to $\Theta_0$. In other words, for each $\theta_0 \in \Theta_0$, no other parameter value can predict infinitely many observable quantities with a precision sufficiently close to that of $\theta_0$.
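In the Gaussian setting of Example 1 this shrinkage can be made explicit: since $KL_m(\theta_0, \theta^*) = m\,KL_{Z_1}(\theta_0, \theta^*)$, the singleton pragmatic interval has half-width $\sigma_0\sqrt{2\epsilon/m}$. A small sketch (illustrative only):

```python
import numpy as np

# Shrinkage of the KL-based pragmatic interval with the sample size m
# (Theorem 3 in the setting of Examples 1 and 5, d = 1, sigma0 known).
sigma0, eps = 1.0, 0.1
for m in (1, 10, 100, 1000):
    half_width = sigma0 * np.sqrt(2 * eps / m)
    print(f"m = {m:5d}: Pg = theta0 +/- {half_width:.4f}")
```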

4. Applications

In the following, pragmatic hypotheses for standard statistical problems are derived. Coscrato et al. [42] provide additional examples and methods for obtaining pragmatic hypotheses.
Example 6
(Gaussian with unknown variance). Consider the setting from Example 5, but with $\sigma^2$ unknown and $0 < \sigma^2 \leq M^2$. In this case, the parameter is $\theta = (\mu, \sigma^2)$. Consider the composite hypothesis $H_0: \{\mu_0\} \times (0, M^2]$, which is often written as $H_0: \mu = \mu_0$. Let $\theta_0 = (\mu_0, \sigma_0^2)$ and $\Theta_0 = \{\mu_0\} \times (0, M^2]$. Proceeding as in Example 5, it follows that
$$Pg(\{\theta_0\}, BP_Z, \epsilon) = [\mu_0 - \epsilon\sigma_0,\; \mu_0 + \epsilon\sigma_0] \times (0, M^2],$$
$$Pg(\Theta_0, BP_Z, \epsilon) = [\mu_0 - \epsilon M,\; \mu_0 + \epsilon M] \times (0, M^2] \quad \text{(by Theorem 1)}.$$
The rectangular shape of these pragmatic hypotheses seems unreasonable since, for instance, whether a point $(\mu, \sigma^2)$ belongs to them does not depend on its variance component, $\sigma^2$. This is a consequence of the choice of $\delta$ in Example 5.
Figure 4 presents the pragmatic hypotheses for $H_0: \mu = 0, \sigma^2 = 1$ and for $H_0: \mu = 0$ when $\epsilon = 0.1$ and $M^2 = 2$, using the KL and CD dissimilarities. Contrary to BP, the hypotheses obtained from these dissimilarities do not have a rectangular shape. In particular, the triangular shape of the pragmatic hypotheses for $H_0: \mu = 0$ is such that the closer $\sigma^2$ is to 0, the smaller the range of values for $\mu$ that are included in the pragmatic hypothesis. This behavior might be desirable: when $\sigma^2$ is small, there is little uncertainty about the value of $Z$, and consequently only a narrow interval of values of $\mu$ can predict $Z$ with precision $\epsilon$.
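A numerical sketch of how such non-rectangular regions can be traced (illustrative only; it assumes the standard KL formula between two univariate Gaussians and uses arbitrary grid sizes):

```python
import numpy as np

def kl_gaussian_1d(mu0, var0, mu_star, var_star):
    """KL_Z(theta0, theta*) = KL(N(mu*, var*) || N(mu0, var0)) for Z ~ N(mu, var)."""
    return 0.5 * (np.log(var0 / var_star) + (var_star + (mu_star - mu0) ** 2) / var0 - 1.0)

# Pragmatic hypothesis (via Theorem 1) for H0: mu = 0, 0 < sigma^2 <= M^2, on a grid
eps, M2 = 0.1, 2.0
mus = np.linspace(-1.5, 1.5, 61)
variances = np.linspace(0.05, M2, 40)
sigma0_sq_grid = np.linspace(0.05, M2, 20)
region = [(m, v) for m in mus for v in variances
          if any(kl_gaussian_1d(0.0, v0, m, v) <= eps for v0 in sigma0_sq_grid)]
print(len(region), "grid points belong to the pragmatic hypothesis")
```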
Example 7
(Hardy–Weinberg equilibrium). Let $Z \sim \text{Multinomial}(m, \theta)$, where $\theta = (\theta_1, \theta_2, \theta_3)$, $\theta_i \geq 0$, and $\sum_{i=1}^3 \theta_i = 1$. The Hardy–Weinberg (HW) hypothesis [43], $H_0$, depicted as the red curve in Figure 5, satisfies
$$H_0: \theta \in \Theta_0, \qquad \Theta_0 = \left\{\left(p^2,\, 2p(1-p),\, (1-p)^2\right) : 0 \leq p \leq 1\right\}.$$
If $\theta_0^p = (p^2, 2p(1-p), (1-p)^2)$, $\delta_Z(z) = E[\|Z - z\|_2^2 \,|\, \theta = \theta_0^p]$, and $g(x) = \sqrt{x}$, then it follows from Example 2 that
$$BP_Z(\theta_0^p, \theta^*) = \left(\frac{m \times \left[(\theta_1 - p^2)^2 + (\theta_2 - 2p(1-p))^2 + (\theta_3 - (1-p)^2)^2\right]}{p^2(1-p^2) + 2p(1-p)(1-2p(1-p)) + (1-p)^2(1-(1-p)^2)}\right)^{0.5}.$$
The pragmatic hypotheses obtained using KL, BP, and CD for the HW hypothesis are depicted in Figure 5. The choice between BP, on the one hand, and KL and CD, on the other, has a large impact on the shape of the pragmatic hypotheses. Although, for BP, the width of the pragmatic hypothesis is approximately uniform along the HW curve, the width of the pragmatic hypotheses obtained using KL and CD is smaller towards the edges of the HW curve. This behavior could be expected since, towards the edges of the HW curve, $Z$ has the smallest variability. The figure also depicts the challenge in calibrating KL: although the pragmatic hypotheses for BP and CD have similar sizes when using $\epsilon = 0.1$, a comparable size for KL was obtained only with $\epsilon = 0.01$.
The pragmatic hypotheses in Figure 5 are further tested using data from Brentani et al. [44], which are presented in Table 2. This study had the goal of verifying the association between the APOE-$\epsilon$4 gene and Alzheimer’s disease. The lower panels of Figure 5 present the 80% HPD regions for the distribution of this gene in each of the eight groups observed in the study. Additionally, they present two simulated datasets, groups 9 and 10, which were generated from populations that were, respectively, not under and under the HW equilibrium. Groups 9 and 10 fall, respectively, outside and inside the pragmatic hypothesis.
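A small sketch (illustrative only) that evaluates this BP expression and checks membership in the pragmatic HW hypothesis by scanning a grid of values of p, following Theorem 1:

```python
import numpy as np

def bp_hw(p, theta_star, m=20):
    """BP of Example 7 between the HW point (p^2, 2p(1-p), (1-p)^2) and a
    genotype-frequency vector theta_star, with Z ~ Multinomial(m, theta)."""
    theta0 = np.array([p ** 2, 2 * p * (1 - p), (1 - p) ** 2])
    num = m * np.sum((np.asarray(theta_star, float) - theta0) ** 2)
    den = np.sum(theta0 * (1 - theta0))   # equals tr(V[Z | theta0]) / m
    return np.sqrt(num / den)

def in_pragmatic_hw(theta_star, eps=0.1, m=20):
    """Membership in Pg(HW, BP, eps), built as a union over p (Theorem 1)."""
    grid = np.linspace(0.001, 0.999, 999)
    return any(bp_hw(p, theta_star, m) <= eps for p in grid)

print(in_pragmatic_hw([0.25, 0.50, 0.25]))  # True: exactly HW with p = 0.5
print(in_pragmatic_hw([0.60, 0.10, 0.30]))  # False: far from the HW curve
```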
Example 8
(Bioequivalence). Assume that $Z = (X, Y) \sim N((\mu_1, \mu_2), \sigma^2 I_2)$, with $\sigma$ known. We derive the pragmatic hypothesis for $H_0: \mu_1 = \mu_2$, that is, for $\{(\mu_1, \mu_2) \in \mathbb{R}^2 : \mu_1 = \mu_2\}$. Such a test might be used in a bioequivalence study, where X and Y are the concentrations of an active ingredient in a generic (test) drug medication and in the brand name (reference) medication [45], respectively. As $H_0$ is composite, it helps to first derive the pragmatic hypotheses of its constituents.
In order to do so, let $\theta_0 = (\mu_0, \mu_0)$, $\mu_0 \in \mathbb{R}$, $\theta^* = (\mu_1^*, \mu_2^*)$, and $H_{\theta_0}: \theta = \theta_0$. If $\delta_{Z,\theta}(z) = E\left[(X - z_1)^2 + (Y - z_2)^2 \,\middle|\, \theta\right]$ and $g(x) = \sqrt{x}$, then
$$BP_Z(\theta_0, \theta^*) = \sqrt{\frac{(\mu_1^* - \mu_0)^2 + (\mu_2^* - \mu_0)^2}{2\sigma^2}}.$$
Hence, $Pg(\{\theta_0\}, BP_Z, \epsilon) = \left\{(\mu_1^*, \mu_2^*) : (\mu_1^* - \mu_0)^2 + (\mu_2^* - \mu_0)^2 \leq 2\epsilon^2\sigma^2\right\}$, which is a disc with center $(\mu_0, \mu_0)$ and radius $\sqrt{2}\,\epsilon\sigma$, as depicted in the left panel of Figure 6. In this case, the pragmatic hypothesis is the Tier 1 Equivalence Test hypothesis suggested by the US Food and Drug Administration [45]. The pragmatic hypothesis for $H_0: \mu_1 = \mu_2$ is obtained by taking the union of the pragmatic hypotheses associated to its constituents, as illustrated in the right panel of Figure 6. Specifically,
$$Pg(H_0, BP_Z, \epsilon) = \left\{(\mu_1^*, \mu_2^*) : |\mu_2^* - \mu_1^*| \leq \epsilon\sigma\right\}.$$
The pragmatic hypothesis for $H_0$ using KL is obtained similarly. Note that
$$KL_Z(\theta_0, \theta^*) = 0.5\, BP_Z^2(\theta_0, \theta^*).$$
Therefore, $Pg(\{\theta_0\}, KL_Z, \epsilon) = \left\{(\mu_1^*, \mu_2^*) : (\mu_1^* - \mu_0)^2 + (\mu_2^* - \mu_0)^2 \leq 2\epsilon\sigma^2\right\}$ and
$$Pg(H_0, KL_Z, 0.5\epsilon^2) = Pg(H_0, BP_Z, \epsilon).$$
The pragmatic hypothesis for $H_0$ that is obtained using CD has no analytic expression. However, by observing that $N(\mu, \sigma^2) = \mu + \sigma N(0, 1)$, it is possible to show that there exists a monotonically increasing function $h: \mathbb{R} \to \mathbb{R}$ such that
$$Pg(H_0, CD_Z, 0.5\epsilon^2) = \left\{(\mu_1^*, \mu_2^*) : |\mu_2^* - \mu_1^*| \leq h(\epsilon)\,\sigma\right\}.$$
That is, the pragmatic hypotheses associated to $H_0$ have the same shape as in the right panel of Figure 6; they differ solely in how many standard deviations correspond to the width of the pragmatic hypothesis.
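A numeric sketch of this construction (illustrative only; it evaluates the displayed BP expression and builds $Pg(H_0)$ as a union over a grid of values of $\mu_0$, per Theorem 1, rather than hard-coding a band width):

```python
import numpy as np

def bp_bioeq(mu0, mu_star, sigma):
    """BP of Example 8 between theta0 = (mu0, mu0) and theta* = (mu1*, mu2*)."""
    mu_star = np.asarray(mu_star, float)
    return np.sqrt(np.sum((mu_star - mu0) ** 2) / (2 * sigma ** 2))

def in_pragmatic_h0(mu_star, sigma, eps):
    """Membership in Pg(H0: mu1 = mu2, BP, eps) as a union of singleton
    pragmatic hypotheses (Theorem 1), evaluated on a grid of mu0 values."""
    mu0_grid = np.linspace(min(mu_star) - 1.0, max(mu_star) + 1.0, 2001)
    return any(bp_bioeq(mu0, mu_star, sigma) <= eps for mu0 in mu0_grid)

print(in_pragmatic_h0([1.02, 0.98], sigma=1.0, eps=0.1))  # True: nearly equal means
print(in_pragmatic_h0([1.50, 0.90], sigma=1.0, eps=0.1))  # False: means far apart
```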

5. Final Remarks

The spiral structure studied in the work by the authors of [10] can be used to describe scientific evolution. However, in order for the analogy to be complete, it is necessary to indicate what types of scientific theories or hypotheses are effectively tested in the acceptance vertex of the hexagon of oppositions. We defend that these are pragmatic hypotheses, which are sufficiently precise for the end-user of the theory.
In order to make this statement formal, we introduce three methods for constructing a pragmatic hypothesis associated to a precise hypothesis. These methods are based on three predictive dissimilarity functions: KL, BP, and CD. Each of these methods has different advantages. For instance, the scales of BP and CD are more interpretable than that of KL, making it easier to determine whether the former are large or small. On the other hand, BP relies on the definition of more functions than KL and CD, such as $\delta_{Z,\theta_0}(z)$ in Definition 2. If these functions are chosen inadequately, then the shape of the resulting pragmatic hypothesis might be counterintuitive or meaningless. Finally, CD often does not have an analytic expression. It relies on numerical integration over the sample space, which can be taxing in high dimensions.
The applications in Section 4 present adequate choices of metrics and bounds (defining pragmatic hypotheses) for some given theoretical and experimental setups. Nevertheless, the authors did not propose a general or automated recipe for making these choices, nor do they think this to be a feasible goal. In future research, the authors intend to explore a variety of application cases, some using historical data of important experiments, and to discuss possible choices of metrics and bounds for each case. The authors hope that, in time, the accumulation of such examples will provide useful guidelines for the good use of the methods developed in this paper, in the same way that the statistical literature provides useful guidelines for choosing good statistical models for practical applications.

Author Contributions

All authors contributed equally to this paper.

Funding

This work was partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) grants PQ 306943-2017-4, 301206-2011-2, and 301892-2015-6; and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) grants 2019/11321-9, 2017/03363-8, 2014/25302-2, CEPID-2013/07375-0, and CEPID-2014/50279-4. This study was also financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001.

Acknowledgments

The authors are grateful for the support of the Institute of Mathematics and Statistics of the University of São Paulo (IME-USP), and the Department of Statistics of UFSCar—The Federal University of São Carlos. Finally, the authors are grateful for advice and comments received from anonymous referees; from participants of the 6th World Congress on the Square of Opposition, held on November 1 to 5, 2018, at Chania, Crete, having as main organizers Jean-Yves Béziau and Ioannis Vandoulakis; and from participants of the 39th International Workshop on Bayesian Inference and Maximum Entropy Methods, held on June 30 to July 5, 2019, at Garching bei München, Bavaria, having as main organizers Udo von Toussaint and Roland Preuss.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs

Proof of Lemma 1.
Let $\theta \in Pg(\Theta_0, KL_Z, 8\epsilon^2)$. It follows from Theorem 1 that there exists $\theta_0 \in \Theta_0$ such that $KL_Z(\theta_0, \theta) \leq 8\epsilon^2$. Conclude from Pinsker’s inequality that $CD_Z(\theta_0, \theta) \leq \sqrt{8^{-1}\, KL_Z(\theta_0, \theta)} \leq \epsilon$, that is, $\theta \in Pg(\Theta_0, CD_Z, \epsilon)$. □
Proof of Theorem 1.
Let Pg be logically coherent. Pick an arbitrary $\theta_0 \in \Theta_0$ and note that, if $R(x) \subseteq Pg(\{\theta_0\})$, then $\phi_{Pg(\{\theta_0\})}^R(x) = 0$. Since Pg is logically coherent, conclude that $\phi_{Pg(\Theta_0)}^R(x) = 0$, that is, $R(x) \subseteq Pg(\Theta_0)$; in particular, taking $R(x) = Pg(\{\theta_0\})$ yields $Pg(\{\theta_0\}) \subseteq Pg(\Theta_0)$. Since $\theta_0 \in \Theta_0$ was arbitrary, conclude that
$$\bigcup_{\theta_0 \in \Theta_0} Pg(\{\theta_0\}) \subseteq Pg(\Theta_0). \qquad (A1)$$
Next, take $R(x) = \left(\bigcup_{\theta_0 \in \Theta_0} Pg(\{\theta_0\})\right)^c$. For every $\theta_0 \in \Theta_0$, $\phi_{Pg(\{\theta_0\})}^R(x) = 1$. Since Pg is logically coherent, $\phi_{Pg(\Theta_0)}^R(x) = 1$, that is, $Pg(\Theta_0) \subseteq R(x)^c = \bigcup_{\theta_0 \in \Theta_0} Pg(\{\theta_0\})$. Conclude that
$$Pg(\Theta_0) \subseteq \bigcup_{\theta_0 \in \Theta_0} Pg(\{\theta_0\}). \qquad (A2)$$
It follows from Equations (A1) and (A2) that $Pg(\Theta_0) = \bigcup_{\theta_0 \in \Theta_0} Pg(\{\theta_0\})$. It also follows from direct calculation that, if $Pg(\Theta_0) = \bigcup_{\theta_0 \in \Theta_0} Pg(\{\theta_0\})$, then Pg is logically coherent. □
Proof of Theorem 2.
Let $\Theta_0 \subseteq \Theta$. Then
$$\begin{aligned}
Pg(f[\Theta_0], d^*_Z, \epsilon) &= \{\theta^* \in \Theta^* : \exists\, \theta_0^* \in f[\Theta_0] \text{ s.t. } d^*_Z(\theta_0^*, \theta^*) \leq \epsilon\} \\
&= \{\theta^* \in \Theta^* : \exists\, \theta_0^* \in f[\Theta_0] \text{ s.t. } d_Z(f^{-1}(\theta_0^*), f^{-1}(\theta^*)) \leq \epsilon\} \\
&= f\left[\{\theta \in \Theta : \exists\, \theta_0 \in \Theta_0 \text{ s.t. } d_Z(\theta_0, \theta) \leq \epsilon\}\right] \\
&= f\left[Pg(\Theta_0, d_Z, \epsilon)\right].
\end{aligned}$$
 □
Proof of Theorem 3.
Since the $Z_i$’s are i.i.d., $KL_m(\theta_0, \theta^*) = m\, KL_{Z_1}(\theta_0, \theta^*)$. It follows that
$$Pg(\Theta_0, KL_m, \epsilon) = \bigcup_{\theta_0 \in \Theta_0} Pg(\{\theta_0\}, KL_m, \epsilon) = \bigcup_{\theta_0 \in \Theta_0} Pg(\{\theta_0\}, m\,KL_{Z_1}, \epsilon) = \bigcup_{\theta_0 \in \Theta_0} \left\{\theta^* \in \Theta : KL_{Z_1}(\theta_0, \theta^*) \leq m^{-1}\epsilon\right\}.$$
Thus, $\left(Pg(\Theta_0, KL_m, \epsilon)\right)_{m \geq 1}$ is a non-increasing sequence of sets. It follows that
$$\begin{aligned}
\lim_{m \to \infty} Pg(\Theta_0, KL_m, \epsilon) &= \bigcap_{m \geq 1} \bigcup_{\theta_0 \in \Theta_0} \left\{\theta^* \in \Theta : KL_{Z_1}(\theta_0, \theta^*) \leq m^{-1}\epsilon\right\} = \bigcup_{\theta_0 \in \Theta_0} \bigcap_{m \geq 1} \left\{\theta^* \in \Theta : KL_{Z_1}(\theta_0, \theta^*) \leq m^{-1}\epsilon\right\} \\
&= \bigcup_{\theta_0 \in \Theta_0} \left\{\theta^* \in \Theta : KL_{Z_1}(\theta_0, \theta^*) = 0\right\} = \bigcup_{\theta_0 \in \Theta_0} \{\theta_0\} = \Theta_0,
\end{aligned}$$
where the next-to-last equality follows from the assumption that $\{F_\theta\}_{\theta \in \Theta}$ is identifiable. The proof for the CD dissimilarity follows from the relation between $TV(P_{\theta_0}, P_{\theta^*})$ and $KL(P_{\theta_0}, P_{\theta^*})$. □

References

  1. Izbicki, R.; Esteves, L.G. Logical consistency in simultaneous statistical test procedures. Log. J. IGPL. 2015, 23, 732–758. [Google Scholar] [CrossRef]
  2. Esteves, L.G.; Izbicki, R.; Stern, J.M.; Stern, R.B. The logical consistency of simultaneous agnostic hypothesis tests. Entropy 2016, 18, 256. [Google Scholar] [CrossRef]
  3. Stern, J.; Izbicki, R.; Esteves, L.; Stern, R. Logically-Consistent Hypothesis Testing and the Hexagon of Oppositions. Log. J. IGPL 2017, 25, 741–757. [Google Scholar] [CrossRef]
  4. Blanché, R. Structures Intellectuelles: Essai sur l’Organisation Systématique des Concepts; Vrin: Paris, France, 1966. (In French) [Google Scholar]
  5. Béziau, J.Y. The power of the hexagon. Log. Univers. 2012, 6, 1–43. [Google Scholar] [CrossRef]
  6. Béziau, J.Y. Opposition and order. In New Dimensions of the Square of Opposition; Béziau, J.Y., Gan-Krzywoszynska, K., Eds.; Philosophia Verlag: Munich, Germany, 2015; pp. 1–11. [Google Scholar]
  7. Carnielli, W.; Pizzi, C. Modalities and Multimodalities (Logic, Epistemology, and the Unity of Science); Springer: Berlin, Germany, 2008. [Google Scholar]
  8. Dubois, D.; Prade, H. On several representations of an uncertain body of evidence. In Fuzzy Information and Decision Processes; Gupta, M., Sanchez, E., Eds.; Elsevier: Amsterdam, The Netherlands, 1982; pp. 167–181. [Google Scholar]
  9. Dubois, D.; Prade, H. From Blanché’s Hexagonal Organization of Concepts to Formal Concept Analysis and Possibility Theory. Log. Univers. 2012, 6, 149–169. [Google Scholar] [CrossRef]
  10. Gallais, P.; Pollina, V. Hexagonal and Spiral Structure in Medieval Narrative. Yale Fr. Stud. 1974, 51, 115–132. [Google Scholar] [CrossRef]
  11. Gallais, P. Dialectique Du Récit Mediéval: Chrétien de Troyes et l’Hexagone Logique; Rodopi: Amsterdam, The Netherlands, 1982. (In French) [Google Scholar]
  12. Stern, J.M. Symmetry, Invariance and Ontology in Physics and Statistics. Symmetry 2011, 3, 611–635. [Google Scholar] [CrossRef]
  13. Stern, J.M. Continuous versions of Haack’s Puzzles: Equilibria, Eigen-States and Ontologies. Log. J. IGPL 2017, 25, 604–631. [Google Scholar] [CrossRef]
  14. DeGroot, M.H.; Schervish, M.J. Probability and Statistics; Pearson Education: London, UK, 2012. [Google Scholar]
  15. Berger, J.O. Statistical Decision Theory and Bayesian Analysis; Springer Science & Business Media: Berlin, Germany, 2013. [Google Scholar]
  16. Bucher, J.L. The Metrology Handbook, 2nd ed.; ASQ Quality Press: Milwaukee, WI, USA, 2012. [Google Scholar]
  17. Czichos, H.; Saito, T.; Smith, L. Springer Handbook of Metrology and Testing, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  18. Cohen, R.; Crowe, K.; DuMond, J. The Fundamental Constants of Physics; CODATA/Interscience Publishers: Geneva, Switzerland, 1957. [Google Scholar]
  19. Cohen, R. Mathematical Analysis of the Universal Physical Constants. Il Nuovo Cimento 1957, 6, 187–214. [Google Scholar] [CrossRef]
  20. Lévy-Leblond, J. On the conceptual nature of the physical constants. Il Nuovo Cimento 1977, 7, 187–214. [Google Scholar]
  21. Pakkan, M.; Akman, V. Hypersolver: A graphical tool for commonsense set theory. Inform. Sci. 1995, 85, 43–61. [Google Scholar] [CrossRef]
  22. Akman, V.; Pakkan, M. Nonstandard set theories and information management. J. Intell. Inf. Syst. 1996, 6, 5–31. [Google Scholar] [CrossRef] [Green Version]
  23. Wainwright, M.J. Stochastic Processes on Graphs with Cycles: Geometric and Variational Approaches. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, April 2002. [Google Scholar]
  24. Bishop, C.M. Pattern Recognition and Machine Learning (Information Science and Statistics); Springer: Heidelberg, Germany, 2006. [Google Scholar]
  25. Iordanov, B. HyperGraphDB: A generalized graph database. In International Conference on Web-Age Information Management; Springer: Berlin, Germany, 2010; pp. 25–36. [Google Scholar]
  26. Gelman, A.; Vehtari, A.; Jylänki, P.; Sivula, T.; Tran, D.; Sahai, S.; Blomstedt, P.; Cunningham, J.P.; Schiminovich, D.; Robert, C. Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data. arXiv 2014, arXiv:1412.4869. [Google Scholar]
  27. Greimas, A. Structural Semantics: An Attempt at a Method; University of Nebraska Press: Lincoln, NE, USA, 1983. [Google Scholar]
  28. Propp, V. Morphology of the Folktale; University of Texas Press: Austin, TX, USA, 2000. [Google Scholar]
  29. Stern, J.M. Jacob’s Ladder and Scientific Ontologies. Cybern. Human Knowing 2014, 21, 9–43. [Google Scholar]
  30. Stern, J.M. Constructive Verification, Empirical Induction, and Falibilist Deduction: A Threefold Contrast. Information 2011, 2, 635–650. [Google Scholar] [CrossRef] [Green Version]
  31. Abraham, R.; Marsden, J.E. Foundations of Mechanics; Addison-Wesley: Boston, MA, USA, 2013. [Google Scholar]
  32. Hawking, S. The Illustrated On the Shoulders of Giants: The Great Works of Physics and Astronomy; Running Press: Philadelphia, PA, USA, 2004. [Google Scholar]
  33. Stern, J.M.; Nakano, F. Optimization Models for Reaction Networks: Information Divergence, Quadratic Programming and Kirchhoff’s Laws. Axioms 2014, 3, 109–118. [Google Scholar] [CrossRef]
  34. Coscrato, V.; Izbicki, R.; Stern, R.B. Agnostic tests can control the type I and type II errors simultaneously. Braz. J. Probab. Stat. 2019. Available online: https://www.imstat.org/wp-content/uploads/2019/01/BJPS431.pdf (accessed on 9 September 2019).
  35. Izbicki, R.; Fossaluza, V.; Hounie, A.G.; Nakano, E.Y.; de Braganca Pereira, C.A. Testing allele homogeneity: The problem of nested hypotheses. BMC Genet. 2012, 13, 103. [Google Scholar] [CrossRef] [PubMed]
  36. Da Silva, G.M.; Esteves, L.G.; Fossaluza, V.; Izbicki, R.; Wechsler, S. A bayesian decision-theoretic approach to logically-consistent hypothesis testing. Entropy 2015, 17, 6534–6559. [Google Scholar] [CrossRef]
  37. Fossaluza, V.; Izbicki, R.; da Silva, G.M.; Esteves, L.G. Coherent hypothesis testing. Am. Statist. 2017, 71, 242–248. [Google Scholar] [CrossRef]
  38. Pereira, C.A.B.; Stern, J.M.; Wechsler, S. Can a Significance Test be Genuinely Bayesian? Bayesian Anal. 2008, 3, 79–100. [Google Scholar] [CrossRef]
  39. Stern, J.M.; Pereira, C.A.D.B. Bayesian Epistemic Values: Focus on Surprise, Measure Probability! Log. J. IGPL. 2014, 22, 236–254. [Google Scholar] [CrossRef]
  40. Casella, G.; Berger, R.L. Statistical Inference; Duxbury Press: Pacific Grove, CA, USA, 2002. [Google Scholar]
  41. Wechsler, S.; Izbicki, R.; Esteves, L.G. A Bayesian look at nonidentifiability: A simple example. Am. Statist 2013, 67, 90–93. [Google Scholar] [CrossRef]
  42. Coscrato, V.; Esteves, L.G.; Izbicki, R.; Stern, R.B. Interpretable hypothesis tests. arXiv 2019, arXiv:1904.06605. [Google Scholar]
  43. Hardy, G. Mendelian proportions in a mixed population. 1908. Yale J. Biol. Med. 2003, 76, 79. [Google Scholar] [PubMed]
  44. Brentani, H.; Nakano, E.Y.; Martins, C.B.; Izbicki, R.; Pereira, C.A.d.B. Disequilibrium coefficient: A Bayesian perspective. Stat. Appl. Genet. Mol. 2011, 10. [Google Scholar] [CrossRef]
  45. Chow, S.C.; Song, F.; Bai, H. Analytical similarity assessment in biosimilar studies. AAPS J. 2016, 18, 670–677. [Google Scholar] [CrossRef]
Figure 1. Hexagons of opposition for statistical modalities.
Figure 2. Gallais’ evolutionary spiral.
Figure 3. $\phi(x)$ is an agnostic test based on the region estimator $R(x)$ for testing $H_0$.
Figure 4. Pragmatic hypotheses in Example 6 for $H_0: \mu = 0$ with KL (upper), CD (lower), $\epsilon = 0.1$, and $M^2 = 2$. $H_0$ is represented by a red line in all figures.
Figure 5. Pragmatic hypotheses obtained for the HW equilibrium, depicted in red, using $m = 20$, $\epsilon = 0.1$ for BP and CD and $\epsilon = 0.01$ for KL. The blue regions indicate the pragmatic hypothesis for HW and $p = 1/3$ (top) and for HW (bottom). The left, middle, and right panels were obtained, respectively, with BP, KL, and CD. The green regions in the right panels represent 80% HPD regions for the genotype distribution of each of the eight groups collected by Brentani et al. [44] and two simulated datasets.
Figure 6. Pragmatic hypotheses using BP in Example 8 when σ is known.
Table 1. Evolution of orbital astronomy and chemical affinity.

Vertex | Orbital Astronomy | Chemical Affinity
I 1 - Enthesis / A 1 - Thesis | Ptolemaic/Copernican cycles and epicycles | Geoffroy affinity table and highest rank substitution
U 1 - Analysis | Circular or oval orbits? | Ordinal or numeric affinity?
E 1 - Antithesis | Non-circular orbits | Non-ordinal affinity
O 2 - Apothesis/Prosthesis | Elliptic planetary orbits, focal centering of sun | Integer affinity values, for arithmetic recombination
Y 2 - Synthesis | Kepler laws! | Morveau rules and tables!
I 2 - Enthesis / A 2 - Thesis | Vortex physics theories, Keplerian astronomy | Affinity + stoichiometry substitution reactions
U 2 - Analysis | Tangential or radial forces? | Total or partial reaction?
E 2 - Antithesis | Non-tangential forces | Non-total substitutions
O 3 - Apothesis/Prosthesis | Radial attraction forces, inverse square of distance | Reversible reactions, equilibrium conditions
Y 3 - Synthesis | Newton laws! | Mass-Action kinetics!
I 3 - Enthesis / A 3 - Thesis | Newtonian mechanics & variational equivalents | Thermodynamic theories for reaction networks
Table 2. Genotype counts for the eight groups in Brentani et al. [44], together with the decision of the GFBST agnostic hypothesis test [2] for testing, in each group, the pragmatic Hardy–Weinberg equilibrium hypothesis with m = 20. The decisions are the same for KL, BP, and CD.

Group | AA | AD | DD | Decision
141894Agnostic
265374Accept
357118100Agnostic
4589748Agnostic
5120361194Agnostic
6206309142Accept
711014844Accept
8342212Agnostic
9198282520Reject
1064131445Accept
