AutoWIG: Automatic Generation of Python Bindings for C++ Libraries

Most Python and R scientific packages incorporate compiled scientific libraries to speed up the code and reuse legacy libraries. While several semi-automatic solutions exist to wrap these compiled libraries, the process of wrapping a large library is cumbersome and time consuming. In this paper, we introduce AutoWIG, a Python package that automatically wraps compiled libraries into high-level languages using LLVM/Clang technologies and the Mako templating engine. Our approach is automatic, extensible, and applies to complex C++ libraries composed of thousands of classes or incorporating modern meta-programming constructs.


Introduction
Many scientific libraries are written in low-level programming languages such as Fortran, C and C++. Such libraries entail the traditional edit/compile/execute cycle in order to produce high-performance programs. This yields low processing time for the computer at the cost of high coding time for the scientist. In contrast, scripting languages such as Matlab, Octave [1, for numerical work], Sage [2, for symbolic mathematics], R [3, for statistical analyses] or Python [4, for general purposes] provide an interactive framework that allows data scientists to explore their data, test new ideas, combine algorithmic approaches and evaluate their results on the fly. However, code executed in these high-level languages tends to be slower than its compiled counterpart. Due to the growing interest in data science combined with hardware improvements in the last decades, such high-level programming languages have become very popular in various scientific fields. Nevertheless, to overcome performance bottlenecks in these languages, most scientific packages for scripting languages incorporate compiled libraries made available within the scripting language interpreter. For instance, SciPy [5], a library for scientific computing in Python, is mainly based on routines implemented in Fortran, C and C++. To access compiled code from an interpreter, a programmer has to write a collection of special wrapper functions (aka wrappers). The role of these functions is to convert arguments and return values between the data representations of each language. Although writing a few wrappers for a library is affordable, the task becomes tedious if the library contains a large number of functions. Moreover, the task is considerably more complex and time consuming if a library uses more advanced programming features such as pointers, arrays, classes, inheritance, templates, operators and overloaded functions.
Cython [6], Boost.Python [7], SWIG [8], Rcpp [9] and F2PY [10] are considered classical approaches for wrapping C, C++ and Fortran libraries to Python, R or other scripting languages, but they can only be considered semi-automatic. In fact, while these approaches certainly ease the generation of wrappers, the process of writing and maintaining wrappers for large libraries is still cumbersome, time consuming and not really designed for evolving libraries. Every change in the library interface implies a change in the wrapper code. Thus, developers have to synchronize two code bases that do not rely on the same kind of knowledge (i.e., C++ vs wrapper definition). To solve this issue, we provide an automatic approach for wrapping C++ libraries. The critical bottleneck in the construction of an automatic approach for wrapping compiled-language libraries is the need to perform the syntactic analysis of the input code, known as parsing. Once the code has been parsed, its result can be analyzed for code introspection. Code introspection is the ability to examine code components to know what they represent and how they relate to other code components (e.g., list all methods for a given class). Introspection of parsed code can therefore be used to automate the generation of wrappers.
In the past, some solutions, such as Py++ [11] and XDress [12], have been developed to automate the wrapping of large C++ libraries in Python. These tools require complex scripts to be written a priori. These scripts are then interpreted a posteriori to edit the code abstraction and generate wrappers. Such batch processing approaches require a high level of expertise in these tools and limit the ability to supervise or debug the wrapping process. The cost of the wrapping process with such methodologies, although automatic, is thus considered prohibitive by many developers. The goal of AutoWIG is to overcome these shortcomings. AutoWIG proposes an interactive approach for the wrapping process and an extensible interface to easily incorporate bindings for other languages. In particular, the proposed Python interface provides an easy-to-use environment in which the user can benefit from code introspection on large libraries. The end-user can therefore analyze compiled library components, test different wrapping strategies and evaluate their outcomes directly.
This paper is organized as follows. Section 2 provides an overview of the requirements for an automated wrapping of compiled libraries. Section 3 presents the wrapping strategies that can be considered. Section 4 describes the main aspects of AutoWIG's architecture and current implementations. Section 5 presents C++ coding guidelines that must be respected in order to obtain the most automated wrapping workflow. Section 6 presents different results of applying AutoWIG, including in particular examples of the partial wrapping of a library, the wrapping of template libraries and the wrapping of dependent libraries, using an actual set of C++ statistical libraries as a case study. Note that, for the sake of simplicity in the remainder of this paper, it is assumed that the low-level programming language in which compiled libraries are written is C++, that the high-level programming language for interfacing libraries is Python, and that the generated wrappers are written using the Boost.Python C++ library. Section 7 therefore discusses AutoWIG's extensibility and limitations with respect to other programming languages.

Requirements
Consider a scientist who has designed multiple C++ libraries for statistical analysis. He would like to distribute his libraries and decides to make them available in Python in order to reach an audience of statisticians but also less expert scientists such as biologists. Yet, he is not interested in becoming an expert in C++/Python wrapping, even if classical approaches exist that consist in writing wrappers with SWIG or Boost.Python. Moreover, he would have serious difficulties maintaining the wrappers, since this semi-automatic process is time consuming and error prone. Instead, he would like to automate the process of generating wrappers in sync with his evolving C++ libraries. This is what AutoWIG aspires to achieve. Building such a system entails providing some minimal features:
C++ parsing. In order to automatically expose C++ components in Python, the system must parse full legacy code written in the latest C++ standard. It also has to represent C++ constructs in Python, such as namespaces, enumerators, enumerations, variables, functions, classes or aliases.
Pythonic interface. To respect the Python philosophy, C++ language patterns need to be consistently translated into Python. Some syntax or design patterns in C++ code are specific and need to be adapted in order to obtain a functional Python package. Note that this is particularly sensitive for C++ operators (e.g., (), <, []) and the corresponding Python special functions (e.g., __call__, __lt__, __getitem__, __setitem__).
Memory management. C++ libraries expose in their interfaces either raw pointers, shared pointers or references, while Python handles memory allocation and garbage collection automatically. The concepts of pointer and reference are thus not meaningful in Python. These language differences entail several problems in the memory management of C++ components in Python. Special attention is therefore required for dealing with references (&) and pointers (*), which are heavily used in C++.
Error management. C++ exceptions need to be consistently managed in Python. Python does not have the necessary equipment to properly unwind the C++ stack when exceptions are thrown. It is therefore important to ensure that exceptions thrown by C++ libraries do not pass into the Python interpreter core. All C++ exceptions thrown by wrappers must therefore be translated into Python errors. Moreover, this translation must preserve the name and content of the exception in order to raise an informative Python error.
Dependency management between components. The management of multiple dependencies between C++ libraries with Python bindings is required at run-time from Python. C++ libraries tend to have dependencies. For instance, the C++ Standard Template Library containers [13] are used in many C++ libraries (e.g., std::vector, std::set). For such cases, it does not seem relevant that every wrapped C++ library contains wrappers for usual STL containers (e.g., std::vector< double >, std::set< int >). Moreover, loading into the Python interpreter multiple compiled libraries that carry different wrappers for the same C++ components could lead to serious side effects. It is therefore required that dependencies across different library bindings can be handled automatically.
Documentation. The documentation of C++ components has to be associated automatically with their corresponding Python components in order to reduce redundancy and to keep it up to date.
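As a rough illustration of the "Pythonic interface" requirement, the correspondence between C++ operators and Python special methods can be written as a lookup table. The table, the function name python_special_methods and the returns_mutable_reference flag are hypothetical and are not part of AutoWIG; the sketch only shows the kind of decision a wrapper generator has to make.

```python
# Hypothetical lookup table: C++ operator spelling -> Python special method(s).
CPP_OPERATOR_TO_PYTHON = {
    "operator()": ("__call__",),
    "operator<": ("__lt__",),
    "operator[]": ("__getitem__", "__setitem__"),
    "operator+": ("__add__",),
    "operator==": ("__eq__",),
}


def python_special_methods(cpp_name, returns_mutable_reference=False):
    """Return the Python special methods a C++ operator could be exposed as."""
    names = CPP_OPERATOR_TO_PYTHON.get(cpp_name, ())
    if cpp_name == "operator[]" and not returns_mutable_reference:
        # A subscript operator returning a value or a const reference
        # can only support reads, so __setitem__ is dropped.
        return ("__getitem__",)
    return names
```

The subscript operator is the interesting case: a single C++ declaration may correspond to two Python special methods, depending on whether its result can be assigned to.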

Methodology
A major feature of AutoWIG is its interactivity. Interactive processing has some advantages over batch processing. In our context, an interactive framework allows developers to look at the abstraction of their code, to test new wrapping strategies and to evaluate their outcomes directly. In such cases, the user must consider the following three steps:
Parse. In a C++ library, headers contain all declarations of usable C++ components. This step performs a syntactic and a semantic analysis of these headers to obtain a proper abstraction of available C++ components (see Section 4.2 for details). This abstraction is a graph database within which each C++ component (namespaces, enumerators, enumerations, variables, functions, classes and aliases) used in the library is represented by a node. Edges connecting nodes in this graph database represent syntactic or semantic relations between nodes (see Section 4.1 for details). The mandatory inputs of this workflow are the headers and the relevant compilation flags needed to parse the C++ code (see Section 6.1 for an example).
Control. Once the Parse step has been executed, the graph database can be used to interactively introspect the C++ code. This step is particularly useful for controlling the output of the workflow. By default, AutoWIG has a set of rules for determining which C++ components to wrap, selecting the adapted memory management, identifying special classes representing exceptions or smart pointers and adapting C++ philosophy to Python (see Section 4.2 for details). Such rules produce consistent wrapping of C++ libraries following precise guidelines (see Section 5 for details). This step enables the control of parameters to ensure a consistent wrapping of a C++ library, even if it does not fully respect AutoWIG guidelines (see Section 6.2 for an example).
Generate. Once control parameters have been correctly set in the Control step, the next step consists in the generation of wrapper functions for each C++ component. This is also coupled with the generation of a pythonic interface for the Python module containing the wrappers (see Section 4.2 for details). This code generation step is based on graph database traversals and rules using C++ code introspection made possible by the graph database (e.g., parent scope, type of variables, inputs and output of functions, class bases and members). The outputs of the workflow consist of C++ files containing wrappers that need to be compiled and a Python file containing a pythonic interface for the C++ library (see Section 6.1 for an example).
While an interactive workflow is very convenient for a first approach with AutoWIG, batch mode workflows are of great interest once the wrapping strategies have been chosen. Note that the IPython console [14] and its %history magic function make it possible to save an interactive workflow into a Python file that can be executed in batch mode using the python command line.
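The three steps can be mimicked with plain Python functions that pass a graph from one to the next. This is only a toy sketch: the functions parse, control and generate, the dictionary used as a graph and the header contents are all hypothetical stand-ins, and no C++ is actually parsed.

```python
def parse(asg, headers, flags):
    """Stand-in for the Parse step: register one node per declared name.

    The flags argument is accepted to mirror the real inputs but is unused here.
    """
    for header, names in headers.items():
        for name in names:
            asg[name] = {"header": header, "export": True}
    return asg


def control(asg, hidden=()):
    """Stand-in for the Control step: switch off export for selected nodes."""
    for name in hidden:
        asg[name]["export"] = False
    return asg


def generate(asg):
    """Stand-in for the Generate step: one in-memory file per exported node."""
    return {
        f"wrapper_{name}.cpp": f"// wrapper for {name}\n"
        for name, node in sorted(asg.items())
        if node["export"]
    }


# Hypothetical library: one header declaring two components.
headers = {"basic.h": ["Overload", "helper"]}
asg = parse({}, headers, flags=["-x", "c++", "-std=c++11"])
asg = control(asg, hidden=["helper"])
wrappers = generate(asg)
```

Because each step takes and returns the same graph, the intermediate state can be inspected in an interactive session between any two steps, which is the property the interactive workflow relies on.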

Architecture and implementation
In this section, we present the architecture of AutoWIG, describe the technical design underlying the concepts introduced in Section 3, and discuss in detail the implementation choices. This section is technical, and readers willing to focus first on the big picture of AutoWIG can jump to Section 5.

Data model
The central data model used in AutoWIG is an abstract semantic graph (ASG) that represents code abstraction and captures code components and their relationships. In computer science, an ASG is a form of abstract syntax in which an expression of a programming language is represented by a graph whose nodes are its components. This ASG principally contains nodes identified as file-system components (e.g., directories, files) or C++ components (e.g., fundamental types, variables, functions, classes, aliases). Syntactic and semantic relations between nodes are encoded either in edges (e.g., underlying type, inherited classes), edge properties (e.g., type qualifiers, base access) or node properties (e.g., method static or const qualifications, polymorphism of a class).
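To make the three encodings (edges, edge properties and node properties) concrete, here is a minimal sketch of such a graph. The Node and ASG classes, their method names and the property names are hypothetical and do not reproduce AutoWIG's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str
    properties: dict = field(default_factory=dict)


@dataclass
class ASG:
    nodes: dict = field(default_factory=dict)
    # Each edge is stored as (source, target, label, properties).
    edges: list = field(default_factory=list)

    def add_node(self, name, kind, **properties):
        self.nodes[name] = Node(name, kind, properties)
        return self.nodes[name]

    def add_edge(self, source, target, label, **properties):
        self.edges.append((source, target, label, properties))

    def targets(self, source, label):
        """Names reachable from source through edges carrying the given label."""
        return [t for s, t, lab, _ in self.edges if s == source and lab == label]


asg = ASG()
asg.add_node("::Base", "class", is_polymorphic=True)           # node property
asg.add_node("::Derived", "class", is_polymorphic=True)
asg.add_node("::Derived::size", "method", is_const=True, is_static=False)
asg.add_edge("::Derived", "::Base", "base", access="public")   # edge property
asg.add_edge("::Derived", "::Derived::size", "member")
```

With this layout, an introspection query such as "list the bases of a class" is a filter over labelled edges, for instance asg.targets("::Derived", "base").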

Plugin architecture
The software architecture is based on the concept of plugin (i.e., a component with a well-defined interface that can be found dynamically and replaced by another one with the same interface). Implementations can therefore be provided by the system or by a third party. Plugin architectures are attractive solutions for developers seeking to build applications that are modular, adaptive, and easily extensible. A plugin manager (PM) is a component in charge of discovering and loading plugins that adhere to a specific contract. As stated above, the wrapping process is decomposed into three steps. Each step is governed by a specific PM:
• The parser PM is in charge of the Parse step. A parser plugin implements syntactic and semantic analyses of code in order to complete an existing ASG. Its inputs are an ASG (denoted asg), a set of source code files (denoted headers), compilation flags (denoted flags) and optional parameters (denoted kwargs). It returns a modified ASG.
• The controller PM is in charge of the Control step. A controller plugin enables workflow control. It ensures that code generated in the Generate step is flawless (e.g., by ensuring relevant memory management, or by hiding undefined symbols or erroneous methods of class template specializations). Its inputs are an ASG and optional named parameters. It returns a modified ASG.
• The generator PM is in charge of the Generate step. A generator plugin interprets a node subset from the ASG for code generation. Its inputs are an ASG and optional parameters. It returns in-memory files (denoted wrappers) whose content corresponds to the generated code.
Considering these PMs, the workflow simply consists in passing the ASG from step to step. Plugin implementation requires different levels of expertise (see Table 1). However, the registration of a new plugin in AutoWIG is simple thanks to the entry points mechanism provided by the Setuptools Python package. Moreover, the concept of AutoWIG plugin manager enables an easy control of plugin implementation (see Section 6.3 for an example).

Workflow step | Plugin manager | Implementation | Finality
Parse | parser | developer | Performs syntactic and semantic analysis of input code and produces an abstract semantic graph.
Control | controller | end-user | Regroups Python code editing the abstract semantic graph for workflow control.
Generate | generator | developer | Traverses the abstract semantic graph and generates code given code generation rules.

Table 1: Plugin architecture of AutoWIG. Each step of the AutoWIG wrapping workflow is managed by a plugin manager that enables an easy control of the workflow outputs. Considering the finality and underlying complexity of these plugins, implementation responsibilities are shared between AutoWIG developers and end-users. The parser and generator plugins are respectively concerned with the compiled and scripting languages for which bindings are admissible. Since such implementations require a high level of expertise and a variety of tests, they mostly concern AutoWIG developers. On the contrary, controller plugins are library dependent and only require the manipulation of the abstract semantic graph via Python code. Thus, most AutoWIG end-users are concerned with controller implementations.
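The idea of a plugin manager that dispatches to interchangeable implementations sharing one contract can be sketched as follows. This toy version uses an in-memory dictionary instead of Setuptools entry points, and the PluginManager class, its register method and the plugin names are hypothetical rather than AutoWIG's actual interface.

```python
class PluginManager:
    """Toy registry: maps plugin names to callables sharing one contract."""

    def __init__(self, step):
        self.step = step
        self._plugins = {}
        self.plugin = None  # name of the currently selected implementation

    def register(self, name, implementation):
        self._plugins[name] = implementation

    def __call__(self, *args, **kwargs):
        if self.plugin not in self._plugins:
            raise KeyError(f"no {self.step} plugin named {self.plugin!r}")
        return self._plugins[self.plugin](*args, **kwargs)


def keep_all(asg, **kwargs):
    """Controller contract: take an ASG, return a (possibly modified) ASG."""
    return asg


def drop_private(asg, **kwargs):
    """Library-specific controller: hide names starting with an underscore."""
    return {name: node for name, node in asg.items() if not name.startswith("_")}


controller = PluginManager("controller")
controller.register("default", keep_all)
controller.register("drop_private", drop_private)

controller.plugin = "drop_private"
result = controller({"Overload": {}, "_detail": {}})
```

Selecting another implementation only requires changing the plugin attribute; the calling code stays the same, which is what makes library-specific controllers easy to swap in.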
Parsers. Currently, AutoWIG provides one parser for C++ libraries. Parsing C++ is very challenging and is mainly solved by compiler front-ends [15] that generate abstract syntax trees (ASTs). There are many benefits in using a compiler front-end for parsing C++ code. In particular, the parser implementation simply relies on the compiler front-end for performing syntactic and semantic analyses of code rather than performing a custom analysis of an evolving and complex language itself. Therefore, the implementation mainly consists in AST traversals to complete ASGs, which is a far less challenging problem. Since the development of LLVM [16] and Clang [17] technologies, the AST used for the compilation process is directly available in Python via the libclang Python package. Our libclang parser was therefore designed using libclang:

def libclang_parser(asg, headers, flags, bootstrap=True, **kwargs):
    header = pre_processing(asg, headers, flags, **kwargs)
    asg = processing(asg, header, flags, **kwargs)
    asg = post_processing(asg, flags, **kwargs)
    return asg

This implementation consists of the following three steps:
Pre-process. During the pre-processing step, header files (headers) are added to the ASG and marked as self-contained headers (see Section 5 for details). Note that in order to distinguish headers of the current library from headers of external libraries that are included by these headers, the headers of the library are marked as internal dependency headers (as opposed to external dependency headers). This step returns a temporary header (header) that includes all given headers. This approach makes it possible to parse a single header that includes all others, and therefore prevents the multiple and redundant parsing of headers. Note that compilation flags (flags) are also parsed in order to save C++ search paths (given by the -I option).
Process. During the processing step, the actual C++ code is parsed using the libclang Python package. The parsing of the temporary header (header) returns an AST. The ASG is updated from the AST by a process of enrichment and abstraction. The enrichment entails the addition of node properties (e.g., if a class can be instantiated or copied, if a method is overloaded) or edges (e.g., forward-declarations, backpointers to base classes, type of variables). The abstraction entails the removal of details that are relevant only for parsing, not for semantics (e.g., multiple opening and closing of namespaces).
Post-process. During the post-processing step, the C++ code is bootstrapped. Template class specializations are sometimes only declared but not defined (e.g., a template class specialization only used as a return type of a method). In order to have access to all template class specialization definitions, a virtual program is parsed in which the definition of undefined template class specializations is forced (e.g., using sizeof(std::vector< int >); to force the definition of std::vector< int >). Note that this step can induce new undefined template class specializations and must therefore be repeated until no more undefined template class specializations arise. This step is controlled by the bootstrap parameter, which can be set to True, False or an integer corresponding to the maximal number of repetitions of this operation (True is equivalent to bootstrap=float("inf") and False to bootstrap=0).
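The repetition described in the post-processing step is a bounded fixed-point iteration. The sketch below shows only that control flow: the names iteration_limit and bootstrap_specializations, and the reveals map standing in for what parsing the sizeof program would discover, are hypothetical.

```python
def iteration_limit(bootstrap):
    """Map the bootstrap parameter to a maximal number of repetitions."""
    if bootstrap is True:
        return float("inf")
    if bootstrap is False:
        return 0
    return bootstrap


def bootstrap_specializations(defined, reveals, bootstrap=True):
    """Repeat until no undefined specialization remains or the limit is hit.

    defined: specializations whose definition is already available.
    reveals: hypothetical map from a specialization to the specializations
             that become visible once it is defined.
    """
    defined = set(defined)
    limit = iteration_limit(bootstrap)
    rounds = 0
    while rounds < limit:
        undefined = {s for d in defined for s in reveals.get(d, ())} - defined
        if not undefined:
            break
        defined |= undefined  # stands in for parsing the sizeof(...) program
        rounds += 1
    return defined, rounds


reveals = {
    "std::vector< Foo >": ["std::allocator< Foo >"],
    "std::allocator< Foo >": ["std::allocator_traits< std::allocator< Foo > >"],
}
```

With this toy input, bootstrap=True stops after two rounds because the third pass finds nothing new, bootstrap=1 stops after one round with one specialization still undefined, and bootstrap=False skips the loop entirely.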
Controllers. By default, AutoWIG provides a controller for libraries respecting some recommended guidelines (see Section 5 for details):

def default_controller(asg, clean=True, **kwargs):
    asg = refactoring(asg, **kwargs)
    if clean:
        asg = cleaning(asg)
    return asg

This default implementation consists of the following two steps:
Refactoring. The refactoring of the C++ code is simulated in order to obtain a wrapping compliant with Python rules. In C++, some operators (e.g., operator+) can be defined at the class scope or at the global scope. But in Python, special methods corresponding to these operators (e.g., __add__) must be defined at the class scope. Therefore, during refactoring, all operators defined at the global scope that could be defined at the class scope are moved to become methods of the class.
Cleaning. The cleaning operation removes useless nodes and edges in the ASG. A library often depends on external libraries and headers. There are therefore a lot of C++ components, defined by external headers, that are neither instantiated nor used by the C++ code of the actual library. First, in order to remove only these useless nodes, all nodes are marked as removable. Then, nodes defined by the internal library are marked as non-removable. Recursively, all dependencies of nodes marked as non-removable are marked as non-removable. Finally, all nodes still marked as removable are removed from the ASG. Some C++ libraries, such as armadillo [18], provide one self-contained header that only includes all library headers. In such cases, all C++ components will be marked as external dependencies and the clean parameter of the default controller should be set to False. Otherwise, without any instruction, all C++ components would be removed.
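The cleaning operation follows a mark-and-sweep pattern, which can be sketched on a toy dependency map. The function name clean, its arguments and the example node names are hypothetical; only the four marking stages come from the description above.

```python
def clean(dependencies, internal):
    """Return the set of nodes kept after cleaning.

    dependencies: map from a node to the nodes it depends on.
    internal: nodes declared by the headers of the wrapped library.
    """
    all_nodes = set(dependencies)
    for deps in dependencies.values():
        all_nodes.update(deps)

    removable = set(all_nodes)        # 1. every node starts as removable
    stack = list(internal)            # 2. internal nodes are non-removable
    while stack:                      # 3. so are their dependencies, recursively
        node = stack.pop()
        if node in removable:
            removable.discard(node)
            stack.extend(dependencies.get(node, ()))
    return all_nodes - removable      # 4. sweep what is still removable


dependencies = {
    "lib::Model": ["std::vector< double >"],
    "std::vector< double >": ["std::allocator< double >"],
    "std::list< int >": ["std::allocator< int >"],
}
kept = clean(dependencies, internal={"lib::Model"})
```

Calling clean with an empty internal set returns an empty result, which mirrors the situation described for libraries exposing a single umbrella header: when nothing is recognised as internal, everything is swept.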
As soon as a C++ library does not respect the recommended guidelines of AutoWIG, the end-user has to implement a controller. As stated above, this controller ensures that the code generated by the Generate step is flawless. This step mostly consists in the addition of information concerning memory management, undefined symbols, erroneous methods of class template specializations or C++ components that are undesired in Python (see Section 6.2 for an example).
Generators. Currently, AutoWIG provides one generator for wrapping C++ libraries using the Boost.Python library. AutoWIG could generate wrappers in the C interface that extends the Python interpreter, but this low-level approach does not provide the abstraction needed to consider the requirements presented in Section 2. Thus, there are many benefits in using one of the semi-automatic approaches (e.g., Boost.Python, SWIG) within the wrapper code.
In particular, AutoWIG uses the Boost.Python library to propose:
• An automatic Python documentation using C++ documentation, since documentation strings can be injected directly in wrappers.
• A consistent adaptation of C++ patterns to Python thanks to globally registered type coercions, possible manipulation of Python objects in C++, and an efficient overloaded function handling.
• A consistent memory management thanks to the definition of call policies, which can be used to handle references and pointers.
• An automatic translation of C++ exceptions into Python errors using C++ exception handling and conversion into Python errors.
• An automatic management of dependencies thanks to automatic cross-module type conversions.
The boost_python generator was therefore designed to generate Boost.Python wrappers:

def boost_python_generator(asg, nodes, module='./module.cpp', decorator=None, closure=True, prefix='wrapper_'):
    ...
    return wrappers

Boost.Python makes extensive use of C++ class templates. However, class templates may use a huge amount of memory, which can entail compilation problems. To avoid this kind of problem, our implementation mainly consists in dispatching wrapper code for C++ components (nodes) into different files:
Module file. A module file is created in the ASG and named according to the module parameter. This module file is associated with multiple export files (see below). Its content corresponds to the inclusion of wrappers defined in their associated export files within a BOOST_PYTHON_MODULE block. The compilation of this file produces a Python library containing all the C++ wrapped components. This library has the same basename as the module file prefixed by an underscore.
Export files. Export files are created in the ASG within the same directory as the module file. Their content declares Boost.Python wrappers for associated C++ components. The export file of a C++ component is named by the concatenation of the prefix parameter and a unique identifier (created from the global name hash). As a consequence, AutoWIG creates as many files as there are namespaces, enumerators, variables, groups of overloaded functions and classes given in the nodes parameter. Note that the wrappers of enumerators, fields and methods are included in the export file of their parent scope. Moreover, in order to prevent name collisions in Python, C++ components are wrapped in Python modules corresponding to their C++ scope.
Decorator file. A decorator file, named according to the decorator parameter, is created in the ASG (if decorator is not set to None). The Boost.Python library does not provide a way to wrap aliases. Moreover, for serialization purposes, member (i.e., class scoped declarations) classes or enumerations must not be wrapped as class members but as module members. The decorator file therefore contains Python code to define aliases or produce member aliases for member classes or enumerations. Note that, in some cases, programmers want to decorate the C++-like interface into a more common Python interface. For this purpose, the decorator contains lists grouping, for each template class, all its instantiations. This makes it easy to select all these instantiations in order to decorate them in the same way.
The code written in each of these files is generated using the Mako templating engine [19]. Template engines are classically used in Web frameworks to generate dynamic HTML pages. In our case, we use a template language to automatically generate C++ wrapper code from patterns found in the ASG. Changing the code generation would only require changing the template code.
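The principle of template-driven generation can be illustrated without Mako. The sketch below deliberately swaps in string.Template from the Python standard library as a much simpler stand-in; the template text, the render_export function and the keys of the node dictionary are hypothetical, and only boost::python::def is an actual Boost.Python call.

```python
from string import Template

# Hypothetical template for exporting one free function with Boost.Python.
EXPORT_TEMPLATE = Template('''#include <boost/python.hpp>
#include <$header>

void $export_name()
{
    boost::python::def("$python_name", &$cpp_name);
}
''')


def render_export(node, prefix="wrapper_"):
    """Fill the template from a dictionary describing one function node."""
    return EXPORT_TEMPLATE.substitute(
        header=node["header"],
        export_name=prefix + node["identifier"],
        python_name=node["local_name"],
        cpp_name=node["global_name"],
    )


# Hypothetical node description; the identifier is an arbitrary placeholder.
code = render_export({
    "header": "basic/overload.h",
    "identifier": "a1b2c3",
    "local_name": "compute",
    "global_name": "::basic::compute",
})
```

The generated text depends only on the template and on the data extracted from the graph, so changing the shape of the wrappers means editing the template, not the traversal code.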
In order to provide a modular wrapper generation, templates must be encapsulated into classes. Class selection for the previous files is governed by plugin managers (see Table 2). If the parameter closure is set to True, all the dependencies of the input C++ components (nodes) are also wrapped if they are not explicitly marked as non-exportable. To mark a node as non-exportable, its boost_python_export property has to be set to False (see Section 6.2 for an example). Note that the boost_python generator does not respect the contract of generator plugins since it requires asg and nodes as inputs, instead of requiring only asg. In fact, this implementation is used by all other generator implementations, which only need to define abstract semantic graph (asg) traversals to compute the nodes that will be passed as inputs to the boost_python generator:
• The boost_python_internal generator selects all nodes that are declared in headers marked as internal dependency headers.
• The boost_python_pattern generator selects all nodes that match a regular expression denoted by the pattern parameter. This pattern parameter is set by default to ".*", so all nodes are considered.

[Table 2, caption only partially recovered: "... [19] to compute the content of wrappers. The generation of wrappers is therefore customizable."]
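Pattern-based selection amounts to filtering node names with a regular expression. The following sketch assumes that the pattern is matched from the start of each global name with re.match; that choice, the function name select_nodes and the example names are assumptions, not a description of AutoWIG's implementation.

```python
import re


def select_nodes(global_names, pattern=".*"):
    """Keep the names that match the pattern from their first character."""
    regex = re.compile(pattern)
    return [name for name in global_names if regex.match(name)]


names = [
    "::basic::Overload",
    "::basic::detail::Helper",
    "::std::vector< int >",
]

everything = select_nodes(names)
# Keep the components of the basic namespace, except its detail sub-namespace.
public_only = select_nodes(names, r"::basic::(?!detail::).*")
```

With the default pattern every node is selected, while a namespace-anchored pattern restricts wrapping to one library, which is one reason a base namespace per library is convenient.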

C++ coding guidelines
Considering the requirements presented in Section 2, we recommend to use the following guidelines in order to benefit from the most automated wrapping procedure.
Parse self-contained headers. An AutoWIG parser requires self-contained headers. In other words, a header should have header guards, should include all other headers it needs, and should not require any particular symbols to be defined. Any header that is not self-contained should not be given to a parser, but can nevertheless be considered during parsing using relevant search path flags (given by the -I option).
Use smart pointers. Let us consider a C++ template function declaration that returns a pointer,

template<class T> T* ambiguous_function();
There is a priori no way to know whether the pointer should be deleted or not by the caller. The Boost and STL (Standard Template Library) libraries have introduced smart pointers as a design pattern to ensure correct memory management. Smart pointers (i.e., unique_ptr, shared_ptr and weak_ptr) define how to manage the memory of a pointer, take the responsibility of deleting the pointer, and thus remove these C++ ambiguities. In the following example,

template<class T> std::unique_ptr< T > unambiguous_function();

the usage of std::unique_ptr makes explicit the fact that the caller takes ownership of the result, and the C++ runtime ensures that the memory for T* will be reclaimed automatically. By default, AutoWIG considers that any raw pointer should not be deleted by the caller. If this is not the case, Boost.Python call policies can be set to ensure proper memory management.
Use C++ STL containers. In C++, containers can be expressed as C arrays (e.g., double array [10];) or pointers to arrays (double* ptrarray = array;). However, C++ components (e.g., variables, functions) that use C arrays or pointers to arrays are not wrapped by the boost_python generator due to ambiguity. In these cases, we recommend using C++ arrays (e.g., std::array< double, 10 >) or dynamic arrays (e.g., std::vector< double >), which can be effectively wrapped using the boost_python generator.
Derive from std::exception. In C++, exceptions provide a way to react to exceptional circumstances in programs, like runtime errors, by transferring control to special functions called handlers. The C++ standard library provides a base class, std::exception, defined in the <exception> header and especially designed to declare objects to be thrown as exceptions. By default, for a Python interfaced C++ library, Boost.Python translates a C++ exception thrown by wrapped functions or modules into a Python RuntimeError. To produce better error messages, AutoWIG ensures that any exception derived from the std::exception class is correctly translated (i.e., the error raised has the same class name and content).
Pay attention to static and const overloading. Let us consider the header presented in Figure 1. We here assume that the library has been wrapped using AutoWIG in a basic Python package.

>>> from basic import Overload
>>> overload = Overload()

Python is not designed for function overloading, but Boost.Python provides some meta-programming mechanisms in order to perform dispatching and therefore enable function overloading in Python. Yet, considering static and const specifiers, a few problems can arise:
• Overloading a function with static renders all overloaded methods as static methods. Although this entails a strange usage of methods that are actually not static, it remains possible to call all overloaded methods.

>>> overload.staticness(overload)
non-static
>>> Overload.staticness(overload, 0)
static

Yet, if a static overload has as its first parameter an instance, a reference or a pointer to its parent class, and all following parameters correspond to another non-static overload, the non-static method will not be callable in the Python interpreter.
• Overloading a method on its const qualifier only (see Figure 1) leaves a single callable overload in the Python interpreter, namely the one wrapped last.

>>> overload.constness()
const
>>> overload.nonconstness()
non-const

This can have serious side effects on the library usage. We therefore recommend specifying in the controller implementation which overload must not be considered, or designing headers with this rule in mind.

Figure 1:
A basic header used for illustrating overloading problems. The method void staticness(const unsigned int value) (resp. void constness() or void nonconstness() const) can be wrapped but as soon as static void staticness(const Overload& overload, const unsigned int value) (resp. void constness() const or void nonconstness()) is also wrapped, it will not be callable in the Python interpreter.
Use namespaces. Namespaces prevent name conflicts in large projects. Symbols declared inside a namespace block are placed in a named scope that prevents them from being mistaken for identically-named symbols in other scopes. The usage of a base namespace for each C++ library (e.g., std, boost) is highly recommended since it eases code introspection with AutoWIG.
Document with Doxygen and Sphinx. For C++ documentation, Doxygen [20] is one of the most standard tools for generating formatted, browsable, and printable documentation from annotated sources. Its equivalent for Python is Sphinx [21]. Writing and verifying documentation is a tedious task, and the redundancy between C++ and Python wrapped components must be limited. As illustrated below, AutoWIG parses the Doxygen documentation in the C++ source code (see Figure 1) and formats it into Sphinx documentation. This documentation string is then injected into the Python components.
.. note:: The documentation is also used for illustrating the Doxygen to Sphinx conversions
.. todo:: Any problem concerning method overloading should be added in this class ...
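To make the kind of conversion involved more concrete, here is a minimal sketch that rewrites a few Doxygen commands as Sphinx markup. It is not the converter used by AutoWIG: the doxygen_to_sphinx function, the _DIRECTIVES table and the sample comment are assumptions made for illustration, and only the \brief, \param, \returns, \note and \todo commands are handled.

```python
import re

# Hypothetical mapping from a few Doxygen commands to Sphinx markup.
_DIRECTIVES = {
    "brief": "{}",
    "note": ".. note:: {}",
    "todo": ".. todo:: {}",
    "return": ":returns: {}",
    "returns": ":returns: {}",
}


def doxygen_to_sphinx(comment):
    """Convert a Doxygen comment block into a Sphinx-style docstring."""
    converted = []
    for raw in comment.splitlines():
        # Drop comment markers such as '///', '/**', '*' and '*/'.
        text = raw.strip().lstrip("/*!").strip()
        if not text:
            continue
        match = re.match(r"[\\@](\w+)\s+(.*)", text)
        if match is None:
            converted.append(text)  # plain description line
            continue
        command, rest = match.groups()
        if command == "param":
            name, _, description = rest.partition(" ")
            converted.append(":param {}: {}".format(name, description.strip()))
        elif command in _DIRECTIVES:
            converted.append(_DIRECTIVES[command].format(rest))
        else:
            converted.append(rest)  # unknown command: keep its text only
    return "\n".join(converted)


# Sample comment written for this sketch (not taken from Figure 1).
comment = r"""
/// \brief Compute the probability mass function.
/// \param value Number of successes.
/// \returns The probability of observing value successes.
/// \note Used here only to illustrate the conversion.
"""
docstring = doxygen_to_sphinx(comment)
print(docstring)
```

The resulting string could then be assigned to the __doc__ attribute of the corresponding Python component, so that the documentation is written once, in the C++ headers.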

Results
In the following section, we present some examples using AutoWIG in order to emphasize particular aspects of the wrapping process. Therefore, most of the presented examples are truncated or modified for the sake of clarity and simplicity. Nevertheless, these examples are all fully available and reproducible on a notebook server (see Section 7.4 and supplementary materials for details).

Wrapping a basic library
We here aim at presenting the interactive wrapping workflow. For the sake of simplicity, we consider a basic example of a C++ library (see the header presented in Figure 2).

>>> import autowig
Assuming that the header is located at './basic/binomial.h', we parse it with relevant compilation flags. Since most of the AutoWIG guidelines are respected, the default controller implementation is suitable.

>>> wrappers.write()
Once the wrappers are written to disk, we need to compile and install the Python bindings. Finally, we can use the C++ library in the Python interpreter.

Wrapping a subset of a very large library
Sometimes, for a very large library, only a subset of the available C++ components is useful for end-users. Wrapping such libraries therefore requires AutoWIG to be able to consider only a subset of the C++ components during the Generate step. The Clang library is a complete C/C++ compiler. Clang is a great tool, but its stable Python interface (i.e., libclang) lacks some useful features that are needed by AutoWIG. In particular, class template specializations are not available in the abstract syntax tree. Fortunately, most of the classes that would be needed during the traversal of the C++ abstract syntax tree are not template specializations. We therefore proposed to bootstrap the Clang Python bindings using the libclang parser of AutoWIG. This new Clang Python interface is called ClangLite and is able to parse class template specializations. As for libclang, this interface covers only a subset of the Clang library, sufficient for providing the new clanglite parser.
In order to wrap a library subset, the user needs to define a controller implementation that specifies which C++ components will be considered during the Generate step. The controller implemented is the following:
1. Considering all user-defined types as non-exportable. This is done by setting the boost_python_export property of classes and enumerations to False (lines 3-6).
Assuming that the asg already contains all C++ components from the Clang library and that the clanglite controller has been defined in the Python interpreter, we need to register the clanglite controller as a controller implementation and then select it.

>>> autowig.controller['clanglite'] = clanglite_controller
>>> autowig.controller.plugin = 'clanglite'
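The register-then-select pattern used above can be illustrated with a small self-contained sketch. It does not reproduce the AutoWIG implementation: the PluginManager class, the subset_controller function and the dictionary standing in for the ASG are all hypothetical, and the boost_python_export key only mirrors the property named in the text.

```python
class PluginManager:
    """Toy registry (hypothetical): implementations are stored by name and
    the selected one is applied when the manager is called."""

    def __init__(self):
        self._implementations = {}
        self.plugin = None  # name of the selected implementation

    def __setitem__(self, name, implementation):
        self._implementations[name] = implementation

    def __call__(self, asg, **kwargs):
        if self.plugin not in self._implementations:
            raise KeyError("no implementation selected")
        return self._implementations[self.plugin](asg, **kwargs)


def subset_controller(asg, keep=("clang::Decl",)):
    """Hypothetical controller: mark every component as non-exportable,
    except the components listed in `keep`."""
    for name, node in asg.items():
        node["boost_python_export"] = name in keep
    return asg


controller = PluginManager()
controller["subset"] = subset_controller  # register the implementation
controller.plugin = "subset"              # select it

# A plain dictionary stands in for the ASG in this sketch.
asg = {"clang::Decl": {}, "clang::Sema": {}}
asg = controller(asg)
print(asg)
```

Keeping registration and selection separate means that several controllers can coexist in the same session, and that switching between them is a one-line change.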
After the generation and compilation of wrappers (using the same procedure as the one described in Section 6.1), these bindings enabled us to propose a new parser implementation called clanglite. This has been done by writing Python code responsible for the traversal of the AST and the completion of an existing ASG. Contrary to the libclang parser, the AST traversed by the clanglite parser contains template classes and their specializations. This parser is therefore more efficient and is selected by default in AutoWIG as soon as the ClangLite bindings are installed.

Wrapping a template library
A template library is a library that contains only template classes that can be instantiated. Wrapping such libraries therefore requires AutoWIG to be able to consider various C++ template class instantiations during the Parse step. The Standard Template Library (STL) [13] is a C++ library that provides a set of common C++ template classes such as containers and associative arrays. These classes can be used with any built-in or user-defined type that supports some elementary operations (e.g., copying, assignment). It is divided into four components called algorithms, containers, functional and iterators. STL containers (e.g., std::vector, std::set) are used in many C++ libraries. In such a case, it does not seem relevant that every wrapped C++ library contains wrappers for usual STL containers (e.g., std::vector< double >, std::set< int >). We therefore proposed Python bindings for some sequence containers (e.g., vector of the std namespace) and associative containers (e.g., set, unordered_set of the std namespace). These template instantiations are done for various C++ fundamental types (e.g., int, unsigned long int, double) and for the string of the std namespace. For ordered associative containers, only the std::less comparator was used.
After the generation and compilation of wrappers (using the same procedure as the one described in Section 6.1), the user can use C++ containers in the Python interpreter. Note that in order to have a functional Python package, some methods can be dynamically added to wrapped classes within modules. For instance, in the stl/vector.py module:
• The __iter__ method, which enables iteration over a wrapped vector and its conversion to a Python list, is added to all wrapped std::vector class template instantiations.
• The __str__ and __repr__ methods, which enable representations of vectors in the Python interpreter, are added to all wrapped std::vector class template instantiations.
>>> v
(-1, 0, 1)
Moreover, the stl/__init__.py module imports all Python objects of the stl/__stl.so library at its root to simplify class names (e.g., stl.VectorInt instead of stl.__stl.VectorInt). Some additional features are automatically added in AutoWIG wrappers. For example, for functions returning non-constant references (e.g., int& operator[] (size_type pos); of the std::vector< int > instantiation), an additional wrapping is done using the following decorator. This decorator is then treated as an overloaded method in wrappers. In this particular example, it makes it possible to define the __getitem__ and __setitem__ methods in the stl/vector.py module.
If this decorator is not written, there is no way to use the __setitem__ method in Python. Moreover, since Python users are more familiar with Python containers, each method taking a C++ container as a constant reference or copy parameter tries to automatically convert Python objects into the corresponding C++ container. Therefore, as illustrated below, Python lists of integers are automatically converted into C++ vectors of integers.
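The dynamic addition of special methods to a wrapped class can be sketched without any compiled code. In the following example, the VectorInt class is a pure-Python stand-in for a wrapped std::vector< int >, and the set_at method is a hypothetical name playing the role of the extra overload generated for the reference-returning operator[]. None of this is the content of the actual stl/vector.py module.

```python
class VectorInt:
    """Pure-Python stand-in for a wrapped std::vector< int > that exposes
    only C++-like methods."""

    def __init__(self, values=()):
        self._values = list(values)

    def size(self):
        return len(self._values)

    def at(self, pos):
        return self._values[pos]

    def set_at(self, pos, value):
        # Hypothetical setter standing for the additional wrapping of
        # `int& operator[] (size_type pos)`.
        self._values[pos] = value


def _iter(self):
    # Iterate using only the C++-like interface.
    return (self.at(pos) for pos in range(self.size()))


def _repr(self):
    return "(" + ", ".join(str(value) for value in self) + ")"


def _getitem(self, pos):
    return self.at(pos)


def _setitem(self, pos, value):
    self.set_at(pos, value)


# Special methods are attached after the class has been created, as a
# Python module can do for classes imported from a compiled library.
VectorInt.__iter__ = _iter
VectorInt.__repr__ = _repr
VectorInt.__str__ = _repr
VectorInt.__len__ = VectorInt.size
VectorInt.__getitem__ = _getitem
VectorInt.__setitem__ = _setitem

v = VectorInt([-1, 0, 2])
v[2] = 1          # relies on the setter added for operator[]
print(v)          # (-1, 0, 1)
print(list(v))    # [-1, 0, 1]
```

Attaching the methods from Python keeps the generated C++ wrappers generic, while the Python layer carries the conventions that Python users expect.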

Wrapping dependent libraries
StructureAnalysis is a set of libraries including statistical models for the analysis of structured data (mainly sequences and tree-structured data):
• StatTool is a library containing classes for the parametric modeling of univariate and multivariate data (see Figure 3).
• SequenceAnalysis is a library containing statistical functions and classes for Markovian models (e.g., hidden variable-order Markov and hidden semi-Markov models) and multiple change-point models for sequences (see Figure 4). The SequenceAnalysis library depends on the StatTool library.
These libraries have been extensively used for the identification and characterization of developmental patterns in plants from the tissue to the whole-plant scale. They were previously interfaced with AML (a home-made, domain-specific programming language), and some work has been done to switch to Python. Nevertheless, the complexity of writing wrappers with Boost.Python limited the number of components available in Python in comparison to AML. One advantage of having a statistical library written in C++ available in Python is that developers can benefit from all other Python packages. As illustrated in Figures 3-4, this is particularly useful for providing visualizations for statistical model assessment using, for example, the Matplotlib [22] Python package.
The StatTool library. In order to wrap a C++ library that will be used as a dependency by other libraries, the user needs to save the ASG resulting from the wrapping process. In the StatTool case, we first generate the wrappers (using the same procedure as the one described in Section 6.1). Then, we use the pickle Python package to serialize the StatTool ASG in the 'ASG.pkl' file. After the compilation of the wrappers, the user can use mixture models in the Python interpreter. For instance, we considered an example concerning the identification of preformed and neoformed parts in plants.
>>> from structure_analysis import stat_tool
>>> his = stat_tool.Histogram("meri.his")
The data (his) consist of the number of elongated organs of 424 shoots of wild cherry tree (Prunus avium). These shoots were sampled in different architectural positions (from the trunk to peripheral positions of the trees) and were representative of the full range of growth potential. The proximal part of a shoot always consists of preformed organs (i.e., organs contained in the winter bud). This preformed part may be followed by a neoformed part consisting of organs differentiated and elongated during the current growing season. We estimated mixtures of parametric discrete distributions on the basis of these data. The number of components (2) was selected between 1 and 4 using the Bayesian information criterion.
>>> mixt.plot()
As illustrated in Figure 3, the data are well fitted by the mixture model and:
• The first component corresponds to entirely preformed shoots.
• The second component corresponds to mixed shoots consisting of a preformed part followed by a neoformed part.
The SequenceAnalysis library. In order to wrap a C++ library that has dependencies, the user needs to combine the ASGs resulting from the wrapping of its dependencies before performing its own wrapping. In the SequenceAnalysis case, we first construct an empty ASG.

>>> asg = AbstractSemanticGraph()
Then, we use the pickle Python package to de-serialize the StatTool ASG (assumed to be serialized in the '../stat_tool/ASG.pkl' file) and merge it with the current ASG. After the generation and compilation of wrappers (using the same procedure as the one described in Section 6.1), the user can use multiple change-point models [see 23, 24, for applications of multiple change-point models] in the Python interpreter. Multiple change-point models are used to delimit segments within sequences, for which the characteristics of variables (or vectors in the multivariate case) are homogeneous within each segment while differing markedly from one segment to another (e.g., piecewise constant mean and variance for a Gaussian change in the mean and variance model). For instance, we considered the classic example of well-log data [25,26,27].
>>> from structure_analysis import sequence_analysis
>>> seq = sequence_analysis.Sequences("well_log_filtered_indexed.seq")
The data (seq) consist of 4050 measurements of the nuclear-magnetic response of underground rocks. The data were obtained by lowering a probe into a bore-hole. Measurements were taken at discrete time points by the probe as it was lowered through the hole. The underlying signal is roughly piecewise constant, with each constant segment relating to a single rock type that has constant physical properties. The change points in the signal occur each time a new rock type is encountered. Outliers were removed before the data were analyzed. We estimated Gaussian change in the mean and variance models on the basis of the well-log filtered data. The number of segments (16) was selected using the slope heuristic [27] with a slope estimated using log-likelihood of overparametrized models ranging from 30 up to 80 change points.
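The save-then-merge workflow used for StatTool and SequenceAnalysis can be sketched with the standard pickle module. In this sketch, plain dictionaries stand in for ASGs, the merge function is a hypothetical helper, and the node names and the temporary file location are assumptions chosen for illustration.

```python
import os
import pickle
import tempfile


def merge(asg, other):
    """Hypothetical helper: add to `asg` the nodes of `other` that it does
    not already contain."""
    for name, node in other.items():
        asg.setdefault(name, node)
    return asg


# Dependency side: serialize the graph once its wrappers are generated.
stat_tool_asg = {"stat_tool::Distribution": {"kind": "class"}}
path = os.path.join(tempfile.mkdtemp(), "ASG.pkl")
with open(path, "wb") as handle:
    pickle.dump(stat_tool_asg, handle)

# Dependent side: start from an empty graph, then load and merge the
# graph of the dependency before adding the components of the library.
asg = {}
with open(path, "rb") as handle:
    asg = merge(asg, pickle.load(handle))
asg["sequence_analysis::Sequences"] = {
    "kind": "class",
    "uses": ["stat_tool::Distribution"],
}
print(sorted(asg))
```

Since the dependency graph is loaded rather than rebuilt, the headers of the dependency do not need to be parsed again, and components already wrapped upstream can be recognized instead of being wrapped twice.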

Related work
Python and R are interpreted languages implemented in C. Like many other scripting languages, they provide a C API (i.e., Application Programming Interface) to allow foreign libraries implemented in C or in a language compatible with C (e.g., C++ or Fortran) to extend the language. This design feature has been a key element for the adoption of the Python language as a glue language, by providing efficient standard libraries implemented in compiled languages. This C API is designed to be stable but low-level. It does not provide support for object-oriented languages, and every type and function has to be manually wrapped. Note that although this approach is only practical for exposing a few functions and objects, it is also the basis of all other wrapper tools, which generate C API code.
Several semi-automatic solutions (e.g., Cython, SWIG and Boost.Python) have been proposed to simplify and ease the process of wrapping large C++ libraries. SWIG [8,28] implements its own compiler that simplifies the process of wrapping large C and C++ libraries into a large number of different languages, and in particular R and Python. While SWIG is capable of wrapping most of the C++ features, it requires configuration files to include interface and conversion specifications. If there is a change in the library, these configuration files may become out of date. Cython [6] is another semi-automatic solution. Cython both enables Python users to compile Python code to C for optimizing execution of scientific code, and makes it possible for developers to call C or C++ code from Python. Cython is intensively used by several Python scientific libraries [29,30] that optimized critical parts of their code by writing subparts of the package in Cython. It was originally developed as part of the Sage project [2] to integrate numerous packages and libraries written in C, C++ and Fortran. However, Cython requires re-declaration of every class and function to wrap a C or C++ library. Finally, Boost.Python [7] and Rcpp [9] depend on metaprogramming to provide high-level abstractions (e.g., registration of classes and inheritance, automatic conversion of registered types and classes, management of smart pointers, C++ object-oriented interface to Python objects, ...). However, all the wrappers have to be written and kept in sync with the code of the library, which requires considerable knowledge from developers.
Recently, several projects have provided automatic solutions for wrapping existing C++ libraries. They mainly rely on the same kind of architecture:
• A parser or compiler that extracts information about the list of C++ functions or classes and their signatures.
• Strategies to convert this abstract view of the C++ code into manual or semi-automatic wrapper tools.
• The generation of the Python or R bindings based on this information.
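The three-stage architecture listed above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the list of function names stands in for the output of a real parser, the names themselves are hypothetical, the filtering rule is arbitrary, and the standard string.Template class is used in place of a full templating engine. The generated text uses the BOOST_PYTHON_MODULE macro and the boost::python::def function of Boost.Python, and the sketch ignores overloaded functions, which would require explicit casts.

```python
from string import Template

MODULE = Template('''#include <boost/python.hpp>
#include "$header"

BOOST_PYTHON_MODULE($module)
{
$definitions
}
''')


def generate(module, header, functions):
    """Render a Boost.Python module exposing the given free functions."""
    definitions = "\n".join(
        '    boost::python::def("{0}", &{1});'.format(name.split("::")[-1], name)
        for name in functions
    )
    return MODULE.substitute(module=module, header=header, definitions=definitions)


# Stage 1 (stand-in for a parser): qualified names of the functions found.
functions = ["basic::factorial", "basic::binomial", "basic::detail::helper"]

# Stage 2 (strategy): keep only what should be exposed; here, anything
# outside a `detail` namespace (an arbitrary rule chosen for this sketch).
exported = [name for name in functions if "::detail::" not in name]

# Stage 3 (generation): produce the wrapper source code.
source = generate("_basic", "basic/binomial.h", exported)
print(source)
```

Separating the three stages is what allows each of them to be replaced independently, for instance by changing the parser without touching the templates.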
The first difficulty is to parse large C++ code and provide information on its structure. For this, tools like Doxygen or GCC-XML have been used. While Doxygen was first developed to automatically extract and render documentation of C++ libraries, it provides an XML representation of the C++ interface that can be used to describe functions and classes. Later, GCC-XML was developed to offer a representation of a C++ library in XML using the GCC compiler. This tool was developed for one of the first automatic wrapping tools, CABLE, used to wrap the large visualization library VTK [31]. However, maintaining such a tool is complex and GCC-XML does not support the C++11 standard. In AutoWIG, we use the LLVM/Clang technologies [17]  While they can easily be integrated in a development workflow, it is not easy for developers to drive and specialize them using a scripting language. AutoWIG and Py++ provide a Python interface and offer introspection facilities for C++ libraries from Python. Like Py++, AutoWIG generates Boost.Python wrappers. However, Py++ depends on GCC-XML and requires writing a full parser and code generator in Python. It allows developers to implement a fully automatic system based on their library design pattern, but is rather complex to implement.

Extensibility
As stated above, the plugin architecture of AutoWIG enables nonintrusive extensibility. This is of great interest when considering the addition of other source or target languages.
The addition of a target language principally consists in writing Mako templates [19]. As an example, let us consider the R language. In order to propose automatic R bindings for C++ libraries, the templates written could be based on the Rcpp [9] library. This is particularly interesting since Rcpp wrappers are quite similar to Boost.Python ones. As a matter of fact, the implementation of an r_cpp generator has the highest priority in future work. The major difficulties encountered are the lack of some features in Rcpp (e.g., enumeration wrapping) and the particular organization of R packages, which must be taken into account.
The addition of a source language is more problematic since it could lead to the addition of new proxy classes in the abstract semantic graph. For example, while the addition of the C, Objective C or Objective C++ languages should be relatively easy since it can be done using the Clang parser and C++ proxy classes, the addition of the Fortran language requires more work. In fact, the Open Fortran Parser [33] could be used for this purpose, but it would require reimplementing the transformation of an abstract syntax tree into an abstract semantic graph. Moreover, any addition of a source language must be followed by the addition of a target language generator, since wrapper technologies are dependent on source languages. For Fortran, once the parser is implemented, this would additionally require implementing a generator, potentially based on the F2Py [10] tool.

Toward a reference guide generator
In its current stage, AutoWIG translates the Doxygen [20] documentation into a Sphinx one [21] but only incorporates it in the wrappers. This means that both the Doxygen and Sphinx tools must be used to generate the C++ and Python reference guides, respectively. Writing a generator that would produce Sphinx-compatible files containing the C++ reference guide could be of great interest since it would make it possible to aggregate both C++ and Python documentation within the same tool.

Installation and usage
The installation of AutoWIG has been tested on Linux, Mac OS X and Windows with Python 2.7. Nevertheless, the most effective wrapping process relies on the ClangLite extension, which has not yet been released for Windows. Note that wrappers generated with AutoWIG do not depend on AutoWIG and can be built on any operating system, regardless of the operating system on which the wrappers were generated. On each of these operating systems, AutoWIG binaries are available through the Conda package management system. Note that these binaries must be installed in a specific environment that will be used for wrapper generation but not for compiling these wrappers, since conflicts can occur between AutoWIG's requirements and those of the wrapped library.

Concluding remarks
AutoWIG greatly simplifies the process of incorporating compiled libraries within a scripting language interpreter. It provides the concept of ASG as a data model for abstracting C++ code. AutoWIG can therefore be used for C++ code introspection in a Python interpreter to discover and analyze C++ library components. This enabled us to propose an automatic generation of Python bindings for C++ libraries respecting some guidelines. This generation of Python bindings is also combined with the automatic generation of a pythonic interface (e.g., use of special functions, error translation, memory management and Sphinx formatted documentation) using Mako, a template language classically used in web frameworks. Some compilation problems also led us to consider a tool for parsing compiler errors, which is particularly useful when wrapping class template specializations.
Note that particular attention has been paid to the AutoWIG architecture:
• It has been designed as a library. This choice has been made since it enables interactive wrapping of compiled libraries in the high-level scripting language, Python. This interactive use of AutoWIG increases the user's ability to supervise or debug the wrapping process and reduces the level of expertise required to use this software.
• It has been designed as a plugin-oriented architecture. This choice has been made for extensibility purposes, to enhance the adoption of AutoWIG by developers by simplifying the integration of external contributions. While only C++ to Python bindings have been implemented, the AutoWIG plugin architecture eases the addition of source (such as C) or target (such as R) languages.
In Section 6, we demonstrated the efficiency of using AutoWIG to wrap large and complex C++ libraries, such as Clang. Such an approach can be used to wrap other very large scientific libraries in an automatic way and enhance their diffusion to large communities of scientists that only use high-level scripting languages such as Python and R.