Toward an ontology of tobacco, nicotine and vaping products

Abstract Background and aims Ontologies are ways of representing information that improve clarity and the ability to connect different data sources. This paper proposes an initial version of an ontology of tobacco, nicotine and vaping products with the aim of reducing ambiguity and confusion in the field. Methods Terms related to tobacco, nicotine and vaping products were identified in the research literature and their usage characterised. Basic Formal Ontology was used as a unifying upper‐level ontology to describe the domain, and classes with definitions and labels were developed linking them to this ontology. Labels, definitions and properties were reviewed and revised in an iterative manner until a coherent set of classes was agreed by the authors. Results Overlapping, but distinct classes were developed: ‘tobacco‐containing product’, ‘nicotine‐containing product’ and ‘vaping device’. Subclasses of tobacco‐containing products are ‘combustible tobacco‐containing product’, ‘heated tobacco product’ and ‘smokeless tobacco‐containing product’. Subclasses of combustible tobacco‐containing product include ‘cigar’, ‘cigarillo’, ‘bidi’ and ‘cigarette’ with further subclasses including ‘manufactured cigarette’. Manufactured cigarettes have properties that include ‘machine‐smoked nicotine yield’ and ‘machine‐smoked tar yield’. Subclasses of smokeless tobacco product include ‘nasal snuff’, ‘chewing tobacco product’, and ‘oral snuff’ with its subclass ‘snus’. Subclasses of nicotine‐containing product include ‘nicotine lozenge’ and ‘nicotine transdermal patch’. Subclasses of vaping device included ‘electronic vaping device’ with a further subclass, ‘e‐cigarette’. E‐cigarettes have evolved with a complex range of properties including atomiser resistance, battery power, properties of consumables including e‐liquid nicotine concentration and flavourings, and the ontology characterises classes of product accordingly. Conclusions Use of an ontology of tobacco, nicotine and vaping products should help reduce ambiguity and confusion in tobacco control research and practice.


INTRODUCTION
Products containing tobacco are widely used worldwide and their use has been estimated to contribute to more than 7 million deaths per year globally [1]. There are many diverse products that differ in popularity according to geographical region, demographic characteristics, legislation and culture. The products also differ in how they are used, how the tobacco is prepared and their harmfulness to health. Cigarettes contribute to the most deaths [1]. Products containing nicotine, but no other tobacco constituents are also used widely, most commonly in an attempt to stop or reduce cigarette consumption [2,3].
The advent of non-tobacco nicotine-containing products, e-cigarettes and other vaping devices has added to the range of products that need to be considered to understand and address tobacco-related harm [2,3]. We have also seen marketing of tobacco products that are heated, but without being fully combusted. Such a wide variety of products with such varied properties makes it essential to have clarity about the type of product to which one is referring [4][5][6]. This paper aims to improve clarity by setting out an initial version of an ontology of products related to tobacco, nicotine and vaping with clearly defined classes and relationships between them.
The rapid evolution of tobacco, nicotine and vaping products has caused problems when it comes to defining and labelling these, and some high profile organisations have struggled to achieve logical consistency in their nomenclature [4,[6][7][8]. For example, the Food and Drug Association (FDA) in the United States (US), classifies ecigarettes as a 'tobacco product' [6,7]. Unfortunately, this creates two inconsistencies. First of all, the only tobacco constituent in ecigarettes is nicotine, and nicotine replacement therapy (NRT) products are not classified as tobacco products because they are regulated as medicinal products by the Center for Drug Evaluation and Research (CDER). Second, many e-cigarettes do not contain nicotine or any other tobacco constituent or even any psychoactive drug. The World Health Organisation (WHO) similarly classifies as tobacco products ones that have no tobacco constituents [9]. Bloomberg Philanthropies, a major funder of tobacco control research, goes further in labelling ecigarettes as 'flavoured tobacco products', which at a broad level to those without specific product knowledge conflates and equates them with other types of flavoured combustible products [10].
How tobacco-related products are labelled influences the conduct and interpretation of scientific research. Lack of clarity around products has led to misunderstanding and disputes over the interpretation of data (e.g. Jarvis et al.) [11]. For example, estimates of the prevalence of 'youth tobacco use' in the United States have been increased by including e-cigarettes as a tobacco product, whereas headline figures for adult smoking prevalence have focused on cigarette smoking and excluded cigarillos, which are very similar to cigarettes in addictiveness and harmfulness [12]. This confusion has been exacerbated by conflation of ever-use, past 30-day use and 'current' use [11,13].
Another example of confusing use of terminology is the term 'ecigarette or vaping product use-associated lung injury (EVALI)' [14].
The term was adopted by the US Centers for Disease Control and others to describe a clinical condition involving acute lung damage from using certain types of vaping device [15]. It is built into the label that a cause of the condition is use of e-cigarettes. However, the condition has been found not to be caused by use of e-cigarettes per se, but almost certainly by vaping tetrahydrocannabinol (THC)-containing e-liquids adulterated with vitamin E acetate [14]. The widespread conflation of vaping THC with e-cigarette use in this case has led to further inaccurate perceptions of the harmfulness of e-cigarettes among members of the public [16]. Although an ontology may not immediately help with such issues, if scientists can be clear on the classification of products, so too may the writing of press releases, and other types of scientific communication to the press and the public. In particular, the ontology may help to highlight the distinction between harmfulness that arises from products themselves and that with their replaceable constituents such as different forms of e-liquid.
Another area of confusion and lack of clarity are the terms used for inclusion criteria and outcomes in the research studied. For example, it is common for studies to conflate tobacco use, tobacco smoking and cigarette smoking. Therefore, randomised trials often refer to 'smoking cessation' as an outcome measure when the presumed meaning is cigarette smoking, tobacco smoking or even tobacco use.
This may not appear to matter in cultures where cigarette smoking is the only form of smoking, but presents difficulties when attempting to integrate evidence internationally as in other locations or cultures cigarillos or other tobacco products may be more dominant.
Ontologies can go a long way to addressing such confusions.
They are formal ways of representing the world using clearly delineated classes and their inter-relationships, taking care to ensure that each entity is unique and unambiguously defined [17,18]. Ontologies are being used in other areas of study such as biomedicine and social science to address problems of inconsistent and confusing terminology, for example, the Gene Ontology in molecular biology [19] and the Behaviour Change Intervention Ontology in behavioural science [20]. They provide a more secure foundation for advancing the science in a given domain, linking it with other domains, and making the output more usable by policy makers and practitioners.
The definitions of classes in ontologies differ from typical dictionary definitions. In ontologies, the definition is primary and the label is a way of referring to that defined entity, whereas in dictionary definitions the label is primary and the definition makes a claim about the correct usage of the label [17]. For example, in a dictionary definition one can claim that the correct meaning of the word 'elephant' is a large pachyderm with an elongated proboscis and large ears. In fields such as addiction, there is no unique 'true' meaning for a given term, because different people and organisations often use the same term to represent different things and use different terms to represent the same thing.
Use of ontological definitions is important because it avoids fruitless discussions about the 'true' meaning of labels and focuses instead on identifying classes of entities that can be unambiguously defined, whatever one chooses to call them. The label that is given to an entity is important, but only insofar as it allows users to make reference to entities without having to keep looking up the definition. The primacy of the definition allows users from different epistemological positions to take different views about what they mean by a label as long as they make it clear, which definition their preferred label relates to. This in turn means that a field can achieve wider convergence around a set of definitions even in areas where the use of specific labels is disputed. In ontologies, the concept of 'namespace' is used to uniquely specify the ontology to which a given label belongs, and within each ontology unique unambiguous identifiers specify entities.
Therefore, if people want to use a term such as 'dependence' in different ways, they just need to specify the namespace and unique identifier for the term they are using to make clear what it is referring to. Each entity can then have different and potentially overlapping labels and synonyms to reflect different communities of usage.
The Society for the Study of Addiction has sponsored the initial development of an Addiction Ontology (namespace: AddictO) that can be applied across the whole addiction domain [21]. As part of this, and with additional funding from Cancer Research United Kingdom (UK) we are developing an E-Cigarette Ontology (E-CigO) to include all entities relating to e-cigarettes and their study, which includes tobacco, nicotine and vaping products [4]. This paper reports the development of a part of E-CigO concerning tobacco and nicotine products and vaping devices. The aim was to arrive at a set of definitions and labels that could be used to unambiguously characterise these products and show how they relate to each other. If this ontology becomes widely used it should improve research and practice in this area.

Data sources for ontology development
The E-CigO project was initiated with a webinar [22]  These terms were separated into those that were generic (of relevance to the broader scientific domain) and those that were specific to the addiction domain. The resulting list of terms was merged with that arising from the workshop to create a master list of entities of relevance for the domain, with a flag to indicate those that were specific to the e-cigarette domain in particular as opposed to those that were relevant for addiction in general, but not directly for the e-cigarette domain (e.g. measures of alcohol dependence).
The top-down step, which proceeded concurrently, involved an iterative process of attempting to identify sub-domains of entities referred to in addiction papers. The results of this process will be reported elsewhere; for the current paper we note that one of the sub-domains was labelled 'product' and this included all entities relating to products referred to in the addiction literature, including addictive products such as tobacco-related products and alcoholic beverages, chemicals such as nicotine and alcohol and devices constructed to ingest chemicals.
The authors of this paper represent the core team who were responsible for curating the entities collected through the above processes and seeking agreement on what should be included or edited ready for inclusion. Each team member focused on a specific area to develop and sought feedback from the team at weekly meetings (held online) over a 2-year period. This process allowed the team to systematically work through the entities by domain and agree on (i) whether the entity should be included, (ii) its position with the ontology, (iii) label and definition. We sought a majority consensus before including entities, if we could not seek agreement or the scope was beyond our group's expertise, we would seek input from an expert in that area or refer to other ontologies-although this only happened on a few occasions. J.H. was responsible for ontology expertise, consistency checks and assigning identifiers through automated processes.

Use of Basic Formal Ontology as the upper level ontology
A hierarchy of classes is a key feature of ontologies. As one goes up the hierarchy to more and more general classes, it is important for interoperability to connect up as far as possible with a single coherent set of upper-most classes shared across domains. Most ontologies in biological sciences link to a single upper level ontology called Basic Formal Ontology (BFO) [23]. This was the upper-level ontology chosen for AddictO and E-CigO. An explanation of this and information about BFO can be found in our recently published paper [24].
In brief, BFO distinguishes two broad classes of entity: continuants (things that continue to exist as individuals over a period of time, such as objects and their attributes) and occurrents (things that take place or happen in time, such as processes). Within continuants, BFO distinguishes between 'independent continuants' such as objects and regions that have a physical presence and 'dependent continuants' such as qualities and dispositions that are attributes that have no separate existence beyond the things they are attributes of. 'Qualities' are attributes such as age and mass that are fully manifested in the things they are attributes of. Dispositions are attributes such as addictiveness and harmfulness that are more complex in that they are only conditionally manifested in certain contexts; they involve the potential for something to occur under certain conditions. A third important type of dependent continuant, that is not in BFO but is closely linked to it, is the 'information content entity'. These are entities that provide information about things (e.g. 'grams of tobacco' provides information, using a unit of measurement, about an amount of tobacco).
Tracing all the classes in the ontology being developed to classes in BFO provides a basis for linking them to other ontologies that use BFO. In addition, AddictO and E-CigO are developed using the technical standards advocated by the Open Biological and Biomedical Ontology (OBO) Foundry [25], which enable direct re-use of ontology content across ontologies, avoiding duplication of effort. For example, classes representing chemicals such as nicotine do not have to be defined anew in AddictO, but can be imported from the Chemical Entities of Biological Interest (ChEBI) ontology, which adopts the same technical standards. The terms identified for inclusion were entered as labels into spreadsheets for each of the subdomains and the project team discussed and revised the labels, definitions and other fields until agreement was reached within the team. If there was a suitable entity that could be used from an existing ontology, this was imported; otherwise an entity was created. This is an ongoing process. Ontologies grow organically as new entities and relationships are added. Computer programs were written using the ROBOT tool to convert the spreadsheet representation into the standard format for ontologies, Web Ontology Language (OWL) [26].

RESULTS
The tobacco and related products and their properties  AddictO distinguishes tobacco-containing products as products that consist of or include tobacco that are distinct from nicotinecontaining products that in turn are distinguished from vaping devices.
This upper-level distinction is essential to ensure consistency and clarity in the use of terms. Therefore, although nicotine transdermal patches may contain nicotine, they do not contain tobacco, although they are used to ingest a chemical, nicotine, which is usually (but not always) found in tobacco. Similarly vaping devices cannot be classified as either nicotine-or tobacco-containing products because in many cases they contain neither nicotine or tobacco, or indeed a psychoactive substance of any kind.
Entities that the US FDA calls 'tobacco products' had to be considered as a separate class because of the inconsistencies in the definition. This class is not included in Table 1 or Fig. 2, but can be found on Qeios (www.qeios.com/read/VMLSA4.4). Although this definition has been developed for purposes of regulation in the United States, its potential for causing confusion means that it is better avoided in scientific discourse unless the purpose is to refer specifically to the regulatory status of a product in the United States.
To define tobacco-containing products, we have to define tobacco and assign it a parent class. An obvious parent class would be 'psychoactive substance', but biomedical ontologies typically do not use psychological or biological effects of chemicals as their primary basis for classification of substances because these are complex and context-specific attributes. Therefore, 'nicotine' in the CHEBI ontology is classed according to its chemical structure. Another reason for not using psychological or biological effects in the primary classification is that any chemical or substance may have many such actions.
Nicotine is not only psychoactive, it has effects on the cardiovascular and many other body systems. Therefore, we used commodity as the parent for tobacco and product for the parent for tobacco product.
The ontology subdivides tobacco-containing product classes into combustible tobacco-containing products, heated tobacco products and smokeless tobacco-containing products. This is to reflect the most common upper-level division of these classes of product in common usage. It also captures some empirical distinctions. Combustion of tobacco creates additional toxins that increase their harmfulness to health. Combustible products are also designed for users to inhale the smoke, which offers both the potential for increased harm because of toxic chemicals coming into contact with the large and delicate tissues F I G U R E 2 Major classes of tobacco and related products included in the Addiction Ontology. Each rounded box in the Figure represents a class in the ontology, whereas arrows represent relationships between classes. Black unlabelled arrows represent taxonomic (hierarchical) relationships; other relationships are labelled with the relationship type: 'has part' and its inverse 'part of' represent parthood relationships, 'contains' represents inclusion, 'has disposition' captures the relationship between a thing and the dispositional properties or attributes it has (properties that are only realized in certain contexts), and 'inheres in' captures the inverse relationship between a property or attribute and the thing it is a property or attribute of.
T A B L E 1 Major classes of tobacco and related products and their properties. Combustible tobacco-containing product A tobacco-containing product in which the tobacco is processed to make it suitable for burning for the user to ingest the smoke.
Tobacco-containing product www.qeios.com/read/ 1CRCDI Heated tobacco product A tobacco-containing product that contains a tobacco stick that it heats to cause minimal or no combustion to produce smoke or a vapour for inhalation by a person.
Tobacco-containing product www.qeios.com/read/ WF7A8D.2 Smokeless tobacco-containing product A tobacco-containing product in which the tobacco is processed for use in a manner that does not involve combustion. Nasal snuff A smokeless tobacco-containing product that is formed of dry finely-ground tobacco and is ingested by a person into the nasal cavity.
Smokeless nicotine-containing product www.qeios.com/read/ T2DP55 Oral snuff An oral tobacco-containing product that is composed mainly or exclusively of moist, ground or powdered tobacco that is processed to make it suitable for use by a person placing it in the mouth between the gum and the cheek.
Oral tobacco-containing product www.qeios.com/read/ 6NCMYS.2 Cigarillo A cigar that is short and thin with wrapping that is tobacco or tobaccobased paper.
Cigar www.qeios.com/read/ SNP7WR Manufactured cigarette A cigarette that is manufactured in a factory.
Cigarette www.qeios.com/read/ XJ63ZR Hand-rolled cigarette A cigarette whose parts have been purchased separately and assembled by hand or using a device for single assembly.

Types of nicotine-containing products
Oral nicotine-containing product A nicotine-containing product that is manufactured to be held in the mouth or chewed so that nicotine is absorbed through the buccal mucosa.
Nicotine-containing product www.qeios.com/read/ 6IVS58 Nasal nicotine-containing product A nicotine-containing product that is manufactured to enable users to ingest nicotine through the nasal mucosa.
Nicotine-containing product www.qeios.com/read/ GSE0NT Iinhaled nicotine-containing product A nicotine-containing product that is manufactured to enable users to draw a nicotine-containing vapour into the respiratory tract so that nicotine can be absorbed through the lining of this tract.
Nicotine-containing product www.qeios.com/read/ XE3EUB Transdermal nicotine-containing product A nicotine-containing product that is manufactured to be attached to the skin so that nicotine can be absorbed through the skin.
Nicotine-containing product www.qeios.com/read/ J10IA5 Types of e-cigarette E-cigarette An electronic vaping device that is hand held and produces for inhalation an aerosol formed by heating an e-liquid using a battery-powered heating coil.
of the lungs and upper airways, and more rapid ingestion of nicotine, which is the primary psychoactive compound in tobacco.
Combustible tobacco-containing products come in many forms.
The ontology, based on common usage, distinguishes between cigars (including cigarillos) whose outer covering is made from tobacco, cigarettes (whose outer covering is made from a thin paper), and bidi (which are like small cigarettes except that their outer covering could be made from paper or a processed plant leaf). It is important to note that beyond this distinction in terms of components, the classes of combustible tobacco-containing products provide little information on attributes of interest, particularly harmfulness and addictiveness.
This severely restricts the value of relying on these classes per se as a basis for clinical or policy decisions. For example, cigarillos of the type widely used in the United States are classified as cigars, but in terms of harmfulness and addictiveness have much more in common with cigarettes than with large panatella cigars because of key attributes that influence the way they are used [28]. In particular, the smoke is readily inhaled into the lungs and they are cheap and convenient to use.
An important distinction between types of cigarettes is between manufactured cigarettes and hand-rolled cigarettes. This distinction is important because the former can be characterised in terms of machine-smoked nicotine yield, machine-smoked tar yield, and machine smoked carbon monoxide yield, whereas the latter cannot.
The latter are typically much less expensive than the former. Note that machine-smoked yields of cigarettes bear very little relation to the amount of nicotine, tar and carbon monoxide that users typically obtain from them because of compensatory puffing and inhalation patterns [29]. Nevertheless, they need to be represented in the ontology because they are often referred to in research and policy.
A separate major class had to be created for heated tobacco products such as iQos. This is because, although it has been claimed that there is no combustion, they produce carbon monoxide, which indicates that the temperature attained does lead to some combustion. Moreover, they cannot be considered combustible tobaccocontaining products because the tobacco stick that is heated is not consumed by the heat. They could potentially be considered as a vaping device, but they do contain tobacco and so it was judged that it would be best to classify them as tobacco-containing products. Tobacco-containing products are typically, although not necessarily, nicotine-containing products, but many nicotine-containing products are not tobacco-containing products. The route of administration provides some information about the rate of nicotine absorption that can be expected with transdermal being the slowest and pharyngealpulmonary being the fastest. This in turn may provide information on their addictiveness [30]. However, individual products may differ considerably in these attributes so class membership cannot be used definitely for this inference. It should also be noted that the route of administration can allow inferences to be drawn about the rate of nicotine absorption. For example, the transdermal route produces slower absorption than the buccal route, which provides slower absorption than the nasal route, which provides slower absorption than the pulmonary route [30]. The resistance of the atomiser, power rating of the battery, nicotine concentration of the e-liquid and choice of flavourings appear to be important features that should be represented. Figure 2 and Table 1 show the proposed classes for use with e-cigarettes. The Addiction Ontology specifies that e-cigarettes have as parts batteries, atomisers and e-liquid containers, which contain e-liquid, and then specifies the properties of these. This provides a systematic way of capturing the complexity of the differing properties of e-cigarettes, which in turn can provide a basis for gathering data on these properties. More classes that capture different variants of e-cigarettes can be found on AddictO Vocab (addictovocab.org).
Given the varied ways in which tobacco products are labelled in research reports and policy documents, the proposed ontology includes synonyms for some of the classes. Synonyms do not have to be unique, but provide a way of searching the ontology for classes based on colloquial terms. These synonyms can be found in AddictO Vocab.

DISCUSSION
Creating an ontology of tobacco and related products involves making important distinctions between classes that, although they may be related in practice, have to be treated as separate. Tobacco-containing products cannot logically include nicotine-containing products, and vaping devices are a separate class. Each of these classes has subclasses that need to be distinguished from each other in research reports and policy documents. The ontology proposed in this paper makes in-roads to achieving this. The definitions are published on Qeios and readers are invited to comment.
Adopting the terms and definitions in the ontology should help to avoid researchers and policymakers confusing vaping with e-cigarette use, cigarette smoking with tobacco smoking and nicotine use with tobacco use. However, this will not happen overnight and there is a role for journals and research funders to foster good practice.
Researchers can immediately begin to use the ontology by using the labels in their research reports and referring to the definitions in Qeios. Journals and research funders can begin to set standards when it comes to use of terminology to help ensure clarity. Researchers and practitioners can also join the ongoing discussion about the terms and definitions and develop expertise in ontologies to improve and extend the ontology.
Work is currently underway to extend the ontology to the wide range of smokeless tobacco-containing products in use around the world. There is also ongoing work to define and label e-cigarette components, and to use the ontology of products to define tobacco and nicotine use behaviours [4].
It is important to note that the ontology definitions presented here are based on the product characteristics and what they are designed to do by the manufacturer. That is, although an e-cigarette may be used by a sub-community for purposes other than vaping nicotine (e.g vaping illicit substances [30]) the products were not designed for this purpose. Indeed, in this case, it is actually clearer for the reader if they understand that another substance has been used within an e-cigarette for purposes that the product was not designed for, rather than trying to shape the definition of a device to accommodate all the possibly diverging ways it is being used in practice. Problems arise when researchers create their own labels for products (e.g. alternative nicotine delivery systems [ANDS]) or are vague as to the product characteristics.
However, for completeness and to make the resource useful and searchable we include product synonyms. Synonyms provide a useful searchable point of reference, but the primary product label should be used wherever possible. This is because synonyms (e.g. vape as a synonym for e-cigarette), although familiar to some researchers, are often based on colloquialisms and do not reflect the object as either it truly is or how the manufacturer designed the product to be used. As mentioned before, within this ontology synonyms are not always exact equivalences, because they may also offer the ontology user the opportunity to see how the product is being broadly referred to even if those terms are ambiguous. An example of a synonym that is not an exact equivalence is e-cigarette and electronic nicotine delivery device (END). A further example of a synonym that is a colloquial term is when referring to e-cigarette use as 'non-cigarette tobacco use' [31]. This expression is factually incorrect as e-cigarette use is not tobacco use, but could nevertheless be included within the ontology as a synonym connected to the class 'e-cigarette use'.
Retroactively, the ontology can be used to synthesise previously published data, to understand how findings in relation to different products align, depending on how the products are described (if at all), fit within these categories and where errors with the published data may exist. For example, in reviews of the literature on product use and health impacts it is important to separate the different types of ecigarette and tobacco product, before it is possible to draw integrated conclusions about their respective health impacts. The ontology, therefore, provides a vital resource for systematic reviews, living reviews, meta-synthesis and literature scoping reviews.
Limitations of the organisation of definitions in an ontological structure need to be acknowledged. Tobacco, nicotine and vaping research is a broad field encompassing many products that are continuously evolving. The taxonomy presented here is not complete and it is possible that because of the limitations of our scoping exercises some domains and entities have been missed. Ontologies are themselves living entities, continuous works in progress, and we refer readers to http://addictovocab.org/addicto.owl for the most up-todate version. New products and more detail will continually be added to the ontology; therefore, lower level sibling relationships and the level of detail may change as new information becomes available, although the main hierarchy and distinctions between these products will most likely not change dramatically. The ontology and its components are subject to open peer review and public involvement that we expect to add greater depth, clarity, granularity and contextualisation to our representation of products.

CONCLUSIONS
An ontology is proposed that makes important distinctions between tobacco-containing products, nicotine-containing products and vaping devices. It identifies subclasses of these and properties of these subclasses. We have reported an initial version of the ontology. The aim is that the research and practitioner community can make use of the opportunities provided to comment on and help to extend the ontology and that journals and funders will encourage and support researchers to use the ontology to improve clarity in their writing.