Non-commutative Logic for Compositional Distributional Semantics

  • Conference paper
Logic, Language, Information, and Computation (WoLLIC 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10388)


Abstract

Distributional models of natural language use vectors to provide a contextual foundation for meaning representation. These models rely on large quantities of real data, such as corpora of documents, and have found applications in natural language tasks such as word similarity, disambiguation, indexing, and search. Compositional distributional models extend the distributional ones from words to phrases and sentences. Logical operators are usually treated as noise by these models, and no systematic treatment of them has been provided so far. In this paper, we show how skew lattices and their encoding in upper triangular matrices provide a logical foundation for compositional distributional models. In this setting, one can model commutative as well as non-commutative logical operations of conjunction and disjunction. We provide theoretical foundations, a case study, and experimental results for an entailment task on real data.
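
To make the non-commutative operations concrete, the following is a minimal numpy sketch in the style of skew lattices in rings [17], where the matrix product of idempotents acts as conjunction and e + f - ef as disjunction. The particular matrices and operations below are an illustrative assumption, not necessarily the paper's exact construction.

    import numpy as np

    def meet(e, f):
        # Non-commutative conjunction: the matrix product of idempotents.
        return e @ f

    def join(e, f):
        # Non-commutative disjunction: e + f - ef (the "circle" operation).
        return e + f - e @ f

    # Two idempotent upper triangular matrices (E @ E == E, F @ F == F).
    E = np.array([[1.0, 2.0], [0.0, 0.0]])
    F = np.array([[1.0, 5.0], [0.0, 0.0]])

    print(meet(E, F))  # equals F
    print(meet(F, E))  # equals E: conjunction does not commute
    print(join(E, F))  # equals E
    print(join(F, E))  # equals F: disjunction does not commute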

K. Cvetko-Vah acknowledges the financial support from the Slovenian Research Agency (research core funding No. P1-0222). M. Sadrzadeh, D. Kartsaklis and B. Blundell acknowledge financial support from AFOSR International Scientific Collaboration Grant FA9550-14-1-0079.


References

  1. Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science (LiCS 2004). IEEE Computer Science Press (2004). arXiv:quant-ph/0402130

  2. Berendsen, J., Jansen, D.N., Schmaltz, J., Vaandrager, F.W.: The axiomatization of override and update. J. Appl. Log. 8, 141–150 (2010)

  3. Chomsky, N.: Three models for the description of language. IRE Trans. Inf. Theory 2, 113–124 (1956)

  4. Coecke, B., Sadrzadeh, M., Clark, S.: Mathematical foundations for a compositional distributional model of meaning. Linguist. Anal. (Lambek Festschrift) 36, 345–384 (2010)

  5. Cvetko-Vah, K.: Skew lattices of matrices in rings. Algebra Univers. 53, 471–479 (2005)

  6. Cvetko-Vah, K., Salibra, A.: The connection of skew Boolean algebras and discriminator varieties to Church algebras. Algebra Univers. 73, 369–390 (2015)

  7. Cvetko-Vah, K., Leech, J., Spinks, M.: Skew lattices and binary operations on functions. J. Appl. Log. 11, 253–265 (2013)

  8. Firth, J.: A synopsis of linguistic theory 1930–1955. In: Studies in Linguistic Analysis (1957)

  9. Galatos, N., Jipsen, P., Kowalski, T., Ono, H.: Residuated Lattices: An Algebraic Glimpse at Substructural Logics. Studies in Logic and the Foundations of Mathematics, vol. 151. Elsevier, Amsterdam (2007)

  10. Harris, Z.: Distributional structure. Word 10, 146–162 (1954)

  11. Jordan, P.: Über nichtkommutative Verbände. Arch. Math. 2, 56–59 (1949)

  12. Kartsaklis, D., Sadrzadeh, M.: A compositional distributional inclusion hypothesis. In: Amblard, M., de Groote, P., Pogodalla, S., Retoré, C. (eds.) LACL 2016. LNCS, vol. 10054, pp. 116–133. Springer, Heidelberg (2016). doi:10.1007/978-3-662-53826-5_8

  13. Kartsaklis, D., Sadrzadeh, M.: Distributional inclusion hypothesis for tensor-based composition. In: COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, Osaka, Japan, 11–16 December 2016, pp. 2849–2860. ACL (2016)

  14. Kotlerman, L., Dagan, I., Szpektor, I., Zhitomirsky-Geffet, M.: Directional distributional similarity for lexical inference. Nat. Lang. Eng. 16(4), 359–389 (2010)

  15. Lambek, J.: Type grammar revisited. In: Lecomte, A., Lamarche, F., Perrier, G. (eds.) LACL 1997. LNCS, vol. 1582, pp. 1–27. Springer, Heidelberg (1999). doi:10.1007/3-540-48975-4_1

  16. Lambek, J.: The mathematics of sentence structure. Am. Math. Mon. 65, 154–170 (1958)

  17. Leech, J.: Skew lattices in rings. Algebra Univers. 26, 48–72 (1989)

  18. Leech, J.: Skew Boolean algebras. Algebra Univers. 27, 497–506 (1990)

  19. Leech, J.: Normal skew lattices. Semigroup Forum 44, 1–8 (1992)

  20. Leech, J.: Recent developments in the theory of skew lattices. Semigroup Forum 52, 7–24 (1996)

  21. Lin, D.: Automatic retrieval and clustering of similar words. In: Proceedings of the 17th International Conference on Computational Linguistics, vol. 2, pp. 768–774. Association for Computational Linguistics (1998)

  22. Lin, D.: An information-theoretic definition of similarity. In: Proceedings of the International Conference on Machine Learning, pp. 296–304 (1998)

  23. Rubenstein, H., Goodenough, J.: Contextual correlates of synonymy. Commun. ACM 8(10), 627–633 (1965)

  24. Schütze, H.: Automatic word sense discrimination. Comput. Linguist. 24(1), 97–123 (1998)

  25. Weeds, J., Weir, D., McCarthy, D.: Characterising measures of lexical distributional similarity. In: Proceedings of the 20th International Conference on Computational Linguistics, COLING 2004. Association for Computational Linguistics (2004)


Author information

Correspondence to Karin Cvetko-Vah.

A Appendix

A.1 Normalisation Schemes

The raw co-occurrence counts are normalised using two measures; a short code sketch of both follows the list:

  • Probability Ratio

    $$\begin{aligned} \frac{P(w,f)}{P(w)P(f)} \end{aligned}$$

    where P(w, f) is the probability that word w and feature f occur together, and P(w) and P(f) are the probabilities of occurrence of w and f, respectively. This measure tells us how often w and f were observed together compared to how often they would co-occur if they were independent.

  • Positive Pointwise Mutual Information (PPMI)

    $$\begin{aligned} \max \left( \log \frac{P(w,f)}{P(w)P(f)},\ 0\right) \end{aligned}$$

    This is the positive part of the logarithm of the probability ratio: negative logarithmic values are sent to 0.
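
A minimal numpy sketch of both normalisation schemes, assuming a dense matrix of raw co-occurrence counts in which every word and every feature occurs at least once (the probability ratio is the intermediate value ratio below); this is an illustration, not the authors' implementation.

    import numpy as np

    def ppmi(counts):
        """PPMI-normalise a matrix of raw co-occurrence counts.

        counts[w, f] holds the number of times word w occurred with feature f;
        assumes every row and column has at least one non-zero entry.
        """
        total = counts.sum()
        p_wf = counts / total                      # joint probabilities P(w, f)
        p_w = p_wf.sum(axis=1, keepdims=True)      # marginals P(w)
        p_f = p_wf.sum(axis=0, keepdims=True)      # marginals P(f)
        with np.errstate(divide="ignore"):         # log(0) = -inf, clipped below
            ratio = p_wf / (p_w * p_f)             # the probability ratio measure
            return np.maximum(np.log(ratio), 0.0)  # negative log values sent to 0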

A.2 Formulae for Computing Entailment

APinc is the average precision applied to feature inclusion. It measures a ranked version of the inclusion of the features of \(\overrightarrow{u}\) in those of \(\overrightarrow{v}\), with features ranked from highest to lowest value:

$$\begin{aligned} \textit{APinc}(u,v) = \frac{\sum _r \left[ P(r) \cdot \textit{rel}'(f_r)\right] }{|F(\overrightarrow{u})|} \end{aligned}$$
(1)

In the above, \(f_r\) is the feature with rank r in the feature set of \(\overrightarrow{u}\), denoted by \(F(\overrightarrow{u})\); P(r) is the precision at rank r, which measures how many of \(\overrightarrow{u}\)’s features up to rank r are included in the features of \(\overrightarrow{v}\); and \(\textit{rel}'(f_r)\) is a relevance measure reflecting how important \(f_r\) is in \(\overrightarrow{v}\). It is computed as follows:

$$\begin{aligned} \textit{rel}'(f) = \begin{cases} 1-\frac{\textit{rank}(f,F(\overrightarrow{v}))}{|F(\overrightarrow{v})|+1} &amp; f \in F(\overrightarrow{v}) \\ 0 &amp; \text {otherwise} \end{cases} \end{aligned}$$
(2)
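
The two equations translate directly into a short Python sketch, assuming dense numpy vectors whose positive entries are the features and breaking ranking ties arbitrarily; this is a reading of Eqs. (1)–(2), not the code of [14].

    import numpy as np

    def apinc(u, v):
        """APinc(u, v): ranked inclusion of u's features in v's (Eqs. 1-2)."""
        # F(u), F(v): indices of non-zero features, ranked highest value first.
        Fu = [i for i in np.argsort(-u) if u[i] > 0]
        Fv = [i for i in np.argsort(-v) if v[i] > 0]
        rank_in_v = {f: r for r, f in enumerate(Fv, start=1)}

        total, included = 0.0, 0
        for r, f in enumerate(Fu, start=1):
            if f in rank_in_v:
                included += 1                           # f is included in F(v)
                rel = 1 - rank_in_v[f] / (len(Fv) + 1)  # Eq. (2)
            else:
                rel = 0.0
            total += (included / r) * rel               # P(r) * rel'(f_r), Eq. (1)
        return total / len(Fu) if Fu else 0.0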

BAPinc balances APinc with the LIN degree of similarity between the vectors. It was developed in [14] after it was observed that APinc returns poor results when the vectors have radically different numbers of non-zero features; the LIN measure was included to balance out the extra dimensions of the longer vector.

$$\begin{aligned} \textit{BAPinc}(u,v) = \sqrt{\textit{LIN}(u,v) \cdot \textit{APinc}(u,v)} \end{aligned}$$
(3)

LIN is a similarity measure between vectors and was defined in [22]. It can be replaced with any other similarity measure, such as the cosine measure.
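
A sketch of the combination, reusing apinc from the previous block and one common formulation of the LIN measure on weighted feature vectors (the exact weighting in [22] may differ):

    import numpy as np

    def lin(u, v):
        """A common form of LIN: weight of shared features over all weights."""
        shared = (u > 0) & (v > 0)
        denom = u[u > 0].sum() + v[v > 0].sum()
        return (u[shared] + v[shared]).sum() / denom if denom > 0 else 0.0

    def bapinc(u, v):
        """BAPinc(u, v): geometric mean of LIN and APinc (Eq. 3)."""
        return np.sqrt(lin(u, v) * apinc(u, v))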

SAPinc is a measure developed in [12], based on BAPinc, but for dense vectors. Whereas APinc and BAPinc were developed to compute the degree of entailment between word vectors, which are usually sparse since word vectors live in high-dimensional spaces (e.g. 5000 dimensions), SAPinc was developed to deal with phrase and sentence vectors. These are obtained by composing the vectors of words in lower-dimensional spaces (e.g. 300 dimensions), where the compositional operators accumulate information and return dense results.

$$\begin{aligned} \textit{SAPinc}(u,v) = \frac{\sum _r \left[ P(r) \cdot \textit{rel}'(f_r)\right] }{|\overrightarrow{u}|} \end{aligned}$$
(4)

Here, P(r) and \(rel'(f_r)\) are defined differently, as shown below:

$$\begin{aligned} P(r) = \frac{\big |\{\, f_{r'}^{(u)} \mid f_{r'}^{(u)} \le f_{r'}^{(v)},\ 0 &lt; r' \le r \,\}\big |}{r} \end{aligned}$$
(5)
$$\begin{aligned} \textit{rel}'(f_r) = \begin{cases} 1 &amp; f_r^{(u)} \le f_r^{(v)} \\ 0 &amp; \text {otherwise} \end{cases} \end{aligned}$$
(6)

For more details on these measures, see [12, 13].
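
Under the reading of Eqs. (4)–(6) above, in which \(f_r^{(u)}\) and \(f_r^{(v)}\) are the r-th components of the two dense vectors under a common ordering of the coordinates, SAPinc can be sketched as follows (see [12] for the authoritative definition):

    def sapinc(u, v):
        """SAPinc(u, v) for dense vectors of equal dimension (Eqs. 4-6)."""
        n = len(u)
        total, included = 0.0, 0
        for r in range(1, n + 1):
            if u[r - 1] <= v[r - 1]:       # the r-th feature of u is included in v
                included += 1
                rel = 1.0                  # Eq. (6)
            else:
                rel = 0.0
            total += (included / r) * rel  # P(r) * rel'(f_r), Eqs. (4)-(5)
        return total / n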

A.3 Experimental Results for a Second Sample

The results of the experiment of Sect. 6, with the PPMI and probability ratio matrices on the second sample of 1000 entries from the dataset, are presented in Fig. 7.

Fig. 7. Results of the non-commutative conjunction experiment with PPMI and the probability ratio on the second sample of the dataset.

Similar to the results presented in the paper, the non-commutative operation performs better at recognising the non-commutative conjunctive entailments.


Copyright information

© 2017 Springer-Verlag GmbH Germany

About this paper

Cite this paper

Cvetko-Vah, K., Sadrzadeh, M., Kartsaklis, D., Blundell, B. (2017). Non-commutative Logic for Compositional Distributional Semantics. In: Kennedy, J., de Queiroz, R. (eds.) Logic, Language, Information, and Computation. WoLLIC 2017. Lecture Notes in Computer Science, vol. 10388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55386-2_8

  • DOI: https://doi.org/10.1007/978-3-662-55386-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-55385-5

  • Online ISBN: 978-3-662-55386-2
