Computers and Chemical Engineering

A comprehensive computer-aided mixture/blend design methodology for formulating a general mixture design problem where the number, identity and composition of mixture constituents are optimized simultaneously is presented in this work. Within this approach, Generalized Disjunctive Programming (GDP) is employed to model the discrete decisions (number and identities of mixture ingredients) in the problems. The identities of the components are determined by designing molecules from UNIFAC groups. The sequential design of pure compounds and blends, and the arbitrary pre-selection of possible mixture ingredients can thus be avoided, making it possible to consider large design spaces with a broad variety of molecules and mixtures. The proposed methodology is ﬁrst applied to the design of solvents and solvent mixtures for maximising the solubility of ibuprofen, often sought in crystallization processes; next, antisolvents and antisolvent mixtures are generated for minimising the solubility of the drug in drowning out crystallization; and ﬁnally, solvent and solvent mixtures are designed for liquid–liquid extraction. The GDP problems are converted into mixed-integer form using the big-M approach. Integer cuts are included in the general models leading to lists of optimal solutions which often contain a combination of pure and mixed solvents.


Introduction
The importance of mixture and product design has long been established across the field of chemical engineering ( Gani and Ng, 2015 ). The design of candidate chemicals, be it a single molecule, a mixture or formulation, at minimum cost and time, is essential in improving process and product performance, but is challenging because it requires finding the optimal number, identities and compositions of mixture components and using nonlinear property models. The use of trial-and-error methods to find the most appropriate combination of chemicals and processing technology that will lead to the desired product/process attributes only allows a limited exploration of the design space due to the effort, cost, and time required. Thus, the development of systematic methodologies, tools, and strategies that combine predictive property models within a computer-assisted search for mixture and product design is crucial to sustainability in a highly competitive environment. Computer-Aided Mixture/blend Design (CAM b D) Gani, 2004 ) is a promising approach for * Corresponding author.
identifying suitable mixtures or blends that meet predefined target properties and optimise a given performance measure.
In recent work, Austin et al. (2017) proposed a mixture design approach based on COSMO-RS ( Klamt, 1995 ) and COSMO-SAC ( Lin and Sandler, 2002 ) to predict mixture thermodynamic properties (i.e., molecular volumes, sigma profiles and sigma moments). The mixture problem was decomposed into molecular design and mole fraction subproblems, where the design variables were projected on a lower dimensional space, that of the sigma moments for each unknown component in the mixture. The design problems were solved using derivative-free optimization algorithms described in earlier work from the same group ( Austin et al., 2016 ) which enabled the reduction of the search space. This design approach was applied to the design of solvents and binary solvent mixtures for liquid-liquid extraction and chemical reactions.
A four-step methodology was proposed by Cignitti et al. (2015) for the design of single compounds, mixtures or blends. Within this computer-aided product design framework the product needs and target properties were first defined and then converted into a CAMD or CAM b D formulation. Next, the CAMD/CAM b D problem was formulated as a MINLP problem which was finally solved by employing a decomposition-based algorithm proposed by Karunanithi et al. (2005) . The proposed approach was demonstrated though a solvent design case study for a single-stage liquid extraction process. The same four-step framework was adopted by Zhang et al. (2015) in a CAMD model, where both firstand second-order groups were considered simultaneously in the MILP/MINLP formulations in an attempt to avoid excluding optimal solutions from the feasible region. The proposed CAMD model was applied to solvent, polymer and surfactant design case studies.
A general CAM b D methodology was proposed by  for the design of mixtures and products, based on the use of Generalized Disjunctive Programming (GDP) ( Raman and Grossmann, 1994 ), for formulating the CAM b D problem. The design approach considered for the first time the simultaneous determination of the optimal number of mixture ingredients, their optimal identities and compositions. The desired components participating in the final mixture were selected from a list of candidate molecules. Since the number of ingredients in the optimal mixture were not defined a priori , optimal solutions may consist of a pure compound or a mixture of compounds. Different relaxation techniques of the GDP mixture problems, including the Big-M (BM) approach ( Nemhauser and Wolsey, 1999 ) and Hull Reformulation (HR) ( Lee and Grossmann, 20 0 0 ), were investigated in later work ( Jonuzaj and Adjiman, 2017 ) in order to address the challenges of formulating and solving generalised mixture problems. The design methodology and the two different solution strategies were applied to the design of solvent mixtures for separation processes.
The design and selection of working fluid mixtures for Organic Rankine Cycles (ORC) has received increased attention in recent years. A critical review on designing or selecting fluids and fluid mixtures for ORC is given by Linke et al. (2015) . Here, we discuss briefly recent studies that focus on the design/selection of fluid mixtures. Mavrou et al. (2015b) studied the use of different working fluid mixtures in solar ORC systems employing Flat Plate Collectors (FPC) with heat storage. The performance of several binary mixtures was assessed based on a multi-criteria methodology where several operating and design parameters of the solar ORC system were considered. The working fluid mixtures investigated in their work were either selected from predefined lists of common fluids used in ORC, or designed based on the methodology proposed by Papadopoulos et al. (2013) . In later work the authors ( Mavrou et al., 2015a ) conducted a sensitivity analysis for the se-lection of working fluid mixtures, in which the simultaneous selection of optimal mixtures and ORC system specifications under operating variability was addressed.
Molina-Thierry and Flores-Tlacuahuac (2015) also studied the selection of multicomponent working fluid mixtures for ORC. Within their approach, the optimal identities and compositions of fluids in the mixture, as well as the optimum processing conditions of the Rankine cycle were determined simultaneously. The working fluids that participate in the optimal mixture were selected from a predefined set of commonly used organic compounds. Interestingly, although the number of components in the mixture was not a decision variable and the design problem considered mixtures with three, six and eleven ingredients, the best solutions were obtained for binary mixtures or mixtures with four compounds. The optimal mole fractions of the other compounds in the mixture were found to be zero. This approach is different to the general mixture design methodology proposed by  , where the number of mixture ingredients is a decision variable. In most mixture design problems, setting the decisions variables to zero can create numerical difficulties due to the presence of highly nonlinear phase equilibrium relations (the natural logarithms of mole fractions, which are a frequent feature, are not defined at zero). If the proposed approach was applied to problems where singularities could not be avoided, then only solutions in which all the compounds would be present in the final mixture could be identified and thus, the overall performance would be affected. A mixture of six or eleven components, for example, would yield quite different results compared to an optimal binary mixture.
In a different approach, Lee and Mitsos (2017) proposed a hybrid two-stage methodology for the selection of working fluid blends for ORC that yield maximum recovery of cryogenic energy from a liquefied natural gas (LNG) evaporation process. Within their sequential approach, an MINLP problem, where optimal working fluids are selected from a large list of candidate compounds, is first formulated and solved with a stochastic solver. Next, the composition of a ternary fluid mixture and the operating conditions are optimized as an NLP problem, which is solved to global optimality using a deterministic algorithm. The proposed decompositionbased method gradually reduces the search space, which can ease the solution of complex mixture problems and yield promising solutions.
In the area of chemical product design, Mattei et al. (2012) conducted a study for the design of emulsion-based chemical products, such as pharmaceutical and household products. The proposed systematic methodology consisted of a seven-step hierarchical framework based on computer-aided methods and tools. In the first step, main and secondary consumer needs were identified and converted into target properties. In the following five steps the formulation was built up by adding, one-by-one, different classes of chemicals that satisfy a set of relevant target properties. In particular, suitable active ingredients (AIs) that meet the main needs of the product were selected from databases in step 2, followed by the identification of dispersed and continuous phase solvents in steps 3 and 4, respectively. The solvents were used to dissolve the selected AI and other chemicals present in the formulation, ensuring that the product forms a single liquid phase.
Step 5 consists of the selection of suitable emulsifiers (usually surfactants) to ensure the desired formation and stability of the emulsion. Suitable additives were finally added in step 6 to enhance the quality of the formulated product by satisfying the secondary needs. In the seventh and last step, the overall composition of all selected ingredients was determined. The proposed comprehensive approach was further investigated in later work by the authors ( Mattei et al., 2014 ), where the selection and design of surfactants, as well as the determination of some key properties (e.g., cloud point) were highlighted. The design methodology was illustrated through two case  JID: CACE [m5G;21:32 ] studies for the design of an emulsified UV sunscreen and an emulsified hand-wash.

ARTICLE IN PRESS
In the field of bioprocesses, Hernández et al. (2017) presented a methodology for selecting suitable organic wastes and waste blends for the optimal production of biofuels using anaerobic digestion. Within this study, the optimal blend of biomass wastes that yielded the best composition of biogas, which was used as a source of energy and to produce other chemicals (DME, methanol, ethanol), was determined. The design procedure consisted of two main parts: (a) the biogas production from waste via anaerobic digestion and (b) the use of the generated biogas as a fuel after removing CO 2 , or to produce biosyngas via dry reforming. The use of biogas as a source of energy or for producing other chemicals is affected by the different types of feedstocks and their composition. The optimal waste or waste blends were chosen from a variety of feedstocks including cattle slurry and cattle manure, pig slurry and pig manure, urban food waste, urban green waste and sludge, based on the cost of biomasses and the credit obtained from the digestate. Two different models were used to estimate the digestate price and they yielded two optimal feedstock sources for the production of biogas, including sludge and a blend of 65% cattle slurry and 35% urban food waste.
Formulating molecular and mixture problems without restricting the design space has proved challenging because it requires finding an optimal molecule or mixture (defining identities, number and compositions of ingredients) using nonlinear property models, which results in highly nonlinear problems with a large combinatorial space. In order to circumvent the combinatorial explosion in the number of possible designs and facilitate the solution of mixture problems, most studies in the literature have thus been focused on a reduced version of the general CAM b D problem.
In several methodologies, the identities of some or all components that participate in a mixture are selected from a given set of candidate molecules. Within these approaches there is a risk of excluding from the design space molecular structures which exhibit poor performance as pure compounds, but, when combined in a mixture, can lead to better performance. Only a few studies, such as the work of Austin et al. (2017Austin et al. ( , 2016 , Buxton et al. (1999) and Papadopoulos et al. (2013) , address the design of all mixture ingredients from functional groups rather than from a list, using predictive methods such as UNIFAC ( Fredenslund et al., 1975 ), COSMO-RS ( Klamt and Schüürmann, 1993 ) or COSMO-SAC ( Lin and Sandler, 2002 ). Within these more general methodologies only problems where the number of mixture ingredients is fixed to two (binary mixtures) has been considered. The restriction to binary mixtures results in a simpler design problem, but restricts the design space.
In the context of general mixture design,  addressed the design of multicomponent mixtures by proposing a generalised design approach where the number of components in the mixture is a decision variable. In this approach, however, the mixture ingredients were chosen from a given list of candidate compounds. This is consistent with some existing design practices, especially in the pharmaceutical industry where the list of organic solvents that can currently be used in a drug manufacturing is restricted due to toxicological reasons, as they cannot be completely removed from the final product ( Grodowska and Parczewski, 2010 ). Nevertheless, there is a long-term need to identify new components, as new environmental and safety regulations reduce the lists of allowed compounds ( REACH, 2017 ). The ability to design processing fluids (solvents, heat-transfer fluids) and products from functional groups can help to challenge established practices and identify innovative solutions.
The purpose of the present work is to create a comprehensive approach that bridges the concepts of molecular and mixture design by finding at the same time the optimal number of ingredi-ents, their proportions, which specific molecules should be used, and what atom groups are required. This is achieved by presenting a general CAM b D methodology for the design of molecules and mixtures based on Generalized Disjunctive Programming. Within this approach, all discrete decisions (the number of components, the composition of each component in terms of functional groups) are represented by Boolean variables. The optimal values of these decisions, together with continuous variables such as mixture composition, are made simultaneously in the proposed approach. By designing the pure components and mixture ingredients from functional groups, the selection of compounds from a limited set of choices is avoided, making it possible to consider larger design spaces that consist of a wide variety of molecules and mixtures. This means that we can take full advantage of the nonideality of mixing promising "good" candidates with "bad" candidate molecules to achieve better performance (e.g., higher solubility of paracetamol can be achieved with a binary mixture of a good solvent like acetone and a bad solvent like water than with pure acetone ( Granberg and Rasmuson, 20 0 0 )). Furthermore, by not fixing a priori the number of mixture constituents, the explicit evaluation of every choice of the number of components is avoided and the best solution is identified without specifying in advance whether a pure compound or a mixture is sought. The general mixture design problem expressed within a mathematical framework using GDP is converted into mixed integer form using the big-M approach prior to its solution.
The design methodology is demonstrated through three example problems (case studies 1a, 1b and 2) where the design of solvents/antisolvents and solvent/antisolvent mixtures for separation processes, namely crystallization and liquid extraction, is presented. In case study 1a, the design of optimal solvents and solvent mixtures that maximise the solubility of ibuprofen is discussed, whereas in case study 1b, the design of antisolvents and antisolvent mixtures that minimise the solubility of the drug is investigated. In case study 2, optimal solvents and solvent mixtures are determined to separate acetic acid from water in a single-stage liquid extraction process. Integer cuts are introduced in the general mixture formulations and a list of optimal solutions (i.e., list of mixtures with different number, identity and compositions of ingredients) is obtained for each problem.
The paper is organized as follows. In Section 2 , the mathematical formulations of the generalized mixture design problem are presented. In Sections 3 and 4 , the proposed approach is applied to three example problems, followed by the main conclusions of this work which are summarized in Section 5 .

The generalized mixture design problem
A general formulation of mixture design problems derived by integrating GDP into a CAM b D framework is described in this section. The GDP formulations where the molecules are designed from a set of atom groups are first presented, followed by the MINLP model obtained by reformulating the GDP problem via the big-M approach.

Definition of key index sets
The mixture problem is constructed in a systematic way and is presented here in the context of the formulation of a generalized model, where any number of components is designed up to a user-defined maximum value and the molecules are designed based on a list of atom groups subject to structural, pure component and mixture property constraints. The main design variables are the number of the components that participate in the mixture, their identities (which specific groups and how many groups are required) and their compositions.  JID: CACE [m5G;21:32 ] The following index sets are defined in order to derive the problem formulations: the set I = { 1 , . . . , N c } represents the number of components in the mixture, with N c being the total number of components. The final mixture consists of known fixed molecules that must be present in the mixture and of unknown components that are to be designed from atom groups. The number of fixed components in the mixture is defined as N , whereas the maximum number of components to be designed as N max . The designed components are described by the set N = { 1 , . . . , N max } .

ARTICLE IN PRESS
The chemical identity of each molecule (fixed and unknown) is defined using functional groups (building blocks such as CH 3 , COOH) that can be used in the calculation of relevant physical properties.
The groups are represented by the set G = { 1 , . . . , N G } , where N G is the total number of groups. The compounds to be designed can be branched or straight chain molecules (represented by groups g ∈ G ), aromatic molecules or single group molecules. The aromatic and molecular groups included in the set G are represented by the sets . . , N k } is a set used to define how many groups of a given type appear in a designed molecule. A set of 49 groups used previously for the design of solvents for chemical reactions ( Struebing et al., 2017 ) is employed here to describe the design of molecules and mixtures with the proposed formulation, and it is included in Table B.1 in Appendix B for completeness. It is noted that the generalized model is valid for different sets of atom groups (smaller/larger lists or different types) which can be used in a wide range of applications.

General formulation
In previous work  proposed a general GDP formulation for mixture design where the number, identity and compositions of mixture ingredients are optimised simultaneously, with components that participate in the mixture being selected from a predefined list. Following their derivation, the GDP model of the general mixture problem where molecules are designed from atom groups is given below: where f is the objective function to be optimised and is a function of continuous variables represented by the m -dimensional vector x , which denotes operating conditions, pure components and mixture properties (e.g., mole fractions, activity coefficients), or process variables (e.g., flowrates, temperatures). The set of inequalities r ( x ) ≤ 0 represents general constraints that must hold regardless of the discrete choices. The functions included inside the disjunctions are vectors of linear/nonlinear conditional constraints that depend on the discrete decisions, as represented by Boolean variables. Two different sets of disjunctions are included in this formulation. The first set involves disjunctions for designing each component from a set of functional groups, G , and the identity of a group g in a designed component i in the mixture is determined through a Boolean variable . The presence of groups in a designed component i is represented through inclusive OR disjunctions ( Grossmann and Trespalacios, 2013 ) ∨ g ∈ G Y N +1 ,g , which means that at least one group is present in the first designed component N + 1 (i.e., at least one molecule is designed). The conditional constraints included in the first set of disjunctions consists of chemical feasibility constraints (e.g., the octet rule to ensure that the designed molecule has zero valency Odele and Macchietto, 1993 ), represented by the vector h f i,g (x, ˆ y ) ≤ 0 ; chemical complexity constraints (e.g., constraints to ensure that the number of groups of specific type are within lower and upper limits), given by h c i,g (x, ˆ y ) ≤ 0 ; design constraints (e.g., constraints that ensure the designed molecule/mixture meets specific physical property or health and safety criteria), given by h d i,g (x ) ≤ 0 ; and pure component (e.g., constraints to predict critical temperature, pressure and volume, normal boiling and melting points, heat of fusion, etc.) and mixture property constraints (e.g., solubility, miscibility, selectivity and distribution coefficient functions), represented by h i, g ( x ) ≤ 0. The different sets of constraints are described in detail in the following sections. The binary variables ˆ y included in chemical feasibility and chemical complexity constraints define the types of molecular structures being designed (acyclic, monocyclic, bicyclic) and indicate the presence of branches in aromatic molecules, as will be described in the next sections. These constraints could also be formulated as embedded (inner) disjunctions ( Grossmann and Trespalacios, 2013 ) determined by Boolean variables ˆ Y and included within the disjunctions for the design of mixture ingredients from groups. This disjunctive representation, often seen in superstructures of PSE problems ( Caballero et al., 2006;Grossmann and Trespalacios, 2013;Yeomans and Grossmann, 1999 ), is avoided here by incorporating directly binary variables into the equations to formulate the discrete decisions.
The second set of disjunctions involves conditional constraints that depend on the number of mixture ingredients, n , and that are active when the corresponding Boolean variable ˜ Y n is True. ˜  JID: CACE [m5G;21:32 ] pend on the number of components in the mixture. Common examples of these functions include mixture property and/or process design constraints, such as phase equilibrium relations (e.g., solidliquid equilibrium (SLE), liquid-liquid equilibrium (LLE)), stability functions (e.g., miscibility equation Gani et al., 1991 to ensure that the designed solvent mixture is in a single liquid phase at given temperature and pressure), mass/mole balances of a process unit, etc., which are described in Sections 3 and 4 . The mole fraction, x i , of component i in the final mixture is set to be greater than a user-defined value x L i if the component is present in the mixture and zero otherwise. This prevents the generation of mixtures with unrealistic compositions, in which some components are present in very small quantities. Exactly one disjunction (i.e., exclusive OR disjunctions) for the number of designed components must be active; this is ensured by the expression

ARTICLE IN PRESS
propositional logic relations involving Boolean and auxiliary variables, and express the relationships between the disjunctive sets. A set of logic relations is included in this model to avoid degeneracy of the solutions by ensuring that no two identical components are designed in the mixture. Additional logic propositions are required to relate the Boolean variables for the number of designed components in the mixture, ˜ Y n , n = 1 , . . . , N max , to the Boolean variables for the identity of each group g in designed molecule i, Y i, g , i = N + 1 , . . . , N c , g = 1 , ..., N g . Similar relations are derived to relate the Boolean variables ˜ Y n to the discrete variables ˆ y that define the specific structures of the designed molecules. All logic relations can be translated into linear algebraic equations by replacing the Boolean variables Y i, g and ˜ Y n with the binary variables y i, g and ˜ y n , respectively, and applying Boolean algebra rules ( Raman and Grossmann, 1991;Williams, 1985 ). All the logic propositions and their equivalent algebraic equations are summarized in Table A.1 in Appendix A . A more detailed description of the derivation of logic relations for GDP mixture models and how they are converted into linear algebraic inequalities can be found in  .

Chemical feasibility and complexity constraints
The design of pure solvents from atom groups was recently studied by Struebing et al. (2017) and was successfully applied to chemical reactions. The chemical feasibility and chemical complexity constraints proposed in their study are modified and employed in this work for general mixture design and they are described in the following paragraphs.

The chemical feasibility constraints
to ensure that only chemically-meaningful molecules are designed. Such constraints include equations for the design of cyclic, acyclic, aromatic and single group molecules, the octet rule ( Odele and Macchietto, 1993 ) and the modified bonding rule ( Buxton et al., 1999 ). Thus, the following equation allows only the design of acyclic, monocyclic or bicyclic molecules: where the binary variable ˆ y i,t is used to define the type of molecular structure being designed, with ˆ y i,a = 1 indicating the design of an acyclic molecule, ˆ y i,m = 1 the design of a monocyclic molecule and ˆ y i,b = 1 the design of a bicyclic molecule. When designing cyclic molecules only aromatic compounds are considered in this work and thus the following constraint is derived based on the fact that exactly six aromatic groups are necessary to create a monocyclic molecule and ten aromatic groups are necessary to create a bicyclic molecule: where n i,g is the number of groups of type g that appear in component i . The variable n i,g should take only integer values and thus, it is converted into a pseudo-integer variable as follows ( Floudas, 1995;Struebing et al., 2017 ): where the binary variable ˆ y i,g,k relates to group g in molecule i and the maximum number of groups of type g that appear in a designed molecule i is set via a user-specified value, K . If the value of K is chosen to be 3 ( k = { 1 , 2 , 3 } ), for example, then the maximum number of occurrences of group g in a molecule is 7. The binary variable ˆ y i,g,k is related to the binary variable y i, g , which represents the presence or absence of a group g in a designed component i , as shown below: The following octet rule was proposed by Odele and Macchietto (1993) to ensure that the designed molecule does not have any free bonds: where v g is the valency of group g (cf. Table B.3 in Appendix B ). m is a vector of continuous variables, the values of which are determined by the design of acyclic and cyclic (monocyclic-bicyclic) molecules, using the relation Thus, m i takes the value of 1 if an acyclic compound is designed, 0 if a monocyclic compound is designed and −1 if a bicyclic compound is designed. It should be noted that Eq. (5) does not prevent a single-molecule group ( g ∈ G 1 ) from appearing alongside a combination of groups forming a chemically-feasible molecule since the valency of molecular groups is zero and thus the octet rule is satisfied. In order to avoid the design of such molecules, we employ the following constraints ( Struebing et al., 2017 ) to ensure that no other groups are included in the design when a single-molecule group is selected: where n max i,g is the maximum number of a group g that can appear in each designed component. The presence of single-molecule groups is also considered in the following modified bonding rule proposed by Struebing et al. (2017) , based on the modified bonding rule of Buxton et al. (1999) and bonding rule of Odele and Macchietto (1993) : Eq. (8) ensures that adjacent groups are connected by a single covalent bond since atom groups are defined so that double or triple bonds appear within atom groups (e.g., CH 2 = CH). and an upper bound on the number of groups that appear in the molecule, as shown below:

Chemical complexity constraints
where n min i and n max i are the minimum and maximum number of groups, respectively, in each designed component i . In addition, the number of groups of each type is constrained by a specific upper bound as shown in the following general form: is a linear function that represents the upper limit derived from the possible structures in which each group can participate, i.e., occurrences of a specific group in acyclic, monocyclic and bicyclic molecules. The linear functions Table B.4 in Appendix B for completeness. Further restrictions are imposed on the functional ( G F ) subclass by Eq. (11) : where n U i,g denotes the upper limit on the number of functional groups of type g in designed molecule i . The values of n U i,g are presented in Table B.5 in Appendix B . Eq. (11) ensures that acyclic and monocyclic components consists of only a limited number of functional groups, whereas such groups do not appear in bicyclic molecules. The chemical stability of the designed compounds is also improved by allowing at most one carbon-carbon double bond group g ( g ∈ G Db ) in the molecule, as shown below ( Struebing et al., 2017 ): The design of branched molecules is also considered in this work. In the case of cyclic molecules, the design of branched components requires the presence of an aromatic group g ∈ G Br = { aC , aCCH , aCCH 2 } that links the aromatic ring with a side chain.
The presence of each branched aromatic group g in the designed component i is denoted with a binary variable ˆ y i,g ( i ∈ I ; g ∈ G Br ), which is used to constrain the number of the branching groups in the designed molecule as follows ( Foli ć et al., 2008 ): Thus, if a branched monocyclic or bicyclic compound is designed (i.e., ˆ y i,g = 1 ), the number of branching aromatic groups of type g in molecule i , n i,g , takes a positive value; otherwise it becomes zero. In the specific case of branched monocyclic components, another binary variable, ˆ y i,m, aC , is introduced to define whether an aC group is present in the designed molecule. ˆ y i,m, aC takes the value of 1 if a monocyclic compound is designed (i.e., ˆ y i,m = 1 ) and an aC group is included in the set of groups that make up the molecule (i.e., ˆ y i, aC = 1 ). The relation between the three binary variables is expressed with the following algebraic form: or equivalently: In order to reduce the complexity of the designed branched molecules, the following constraints are derived to ensure that at most one side chain appears in a monocyclic compound (i.e., at most one aromatic group g ∈ G Br participates in the molecule), and monocyclic and bicyclic components consist of at most one and two aC groups, respectively: The groups that appear in side chains are classified into chainending, G CE , and non-chain-ending groups, G NCE . A chain-ending group can be connected to a non-chain-ending group or attached directly to the aromatic backbone of the designed cyclic molecule. Both chain-ending and non-chain-ending groups can be connected to the same aromatic group aCCH, leading to the design of a more complex molecule with two branches. The complexity of such compounds is reduced by enforcing that one of the free bonds of aCCH be connected to CH 3 , as shown in Eq. (18) : Further restrictions on the structure of the designed molecules are introduced by forcing chain-ending and non-chain-ending groups to appear at most three times in an acyclic compound and at most once in an aromatic one, as shown in Eqs. (19) and (20) , respectively: Note that ˆ y i, aCCH does not appear in Eq. (20) as relevant restrictions are imposed via Eq. (18) . Chemical complexity constraints are user-specified restrictions on the structure of the design molecules. They can be modified, reduced or increased based on heuristics or preferences of the user.

Design constraints
Design constraints ( h d i,g (x ) ≤ 0 ) can be derived based on physical properties or health and safety performance of the candidate molecule/mixture. In solvent mixture design, for example, the designed mixture ingredients must be in liquid phase, which is enforced by constraining the normal melting ( T melt, g ) and boiling ( T boil, g ) points of each solvent, as shown in Eqs. (21) and (22) , respectively. The melting and boiling points of the designed components (left-hand-side of Eqs. (21) and (22) ) are calculated based on the first-order group contribution (GC) method proposed by Marrero and Gani (2001) : where T max melt,g and T min boil,g are user-defined maximum melting and minimum boiling temperatures, respectively; 147.450 and 222.543 are reference values used in the GC method of Marrero and Gani (2001) ; T melt, g and T boil, g are the contributions of each group g to the melting and boiling points, respectively, and they are presented in Table B.3 in Appendix B .

Property constraints and physical model
Finally, the set of constraints represented by the vector h i, g ( x ) ≤ 0 consists of pure component or mixture property constraints, such as the liquid phase activity coefficient of each component in mixture (it participates in phase equilibrium relations, e.g., SLE, LLE), which is evaluated using group contribution methods (e.g., UNIFAC, Fredenslund et al., 1975 ). Some functions of the group contribution models depend on the identity of the group and thus are included in the disjunctions.

Reformulation of problem (G-GDP) as an MINLP via Big-M approach
The general GDP mixture problem (G-GDP) is converted into mixed integer form using the Big-M approach as shown below: F n ) such that when the binary variables y i, g and ˜ y n are zero the constraints are always satisfied. In order to avoid numerical difficulties arising from tight bounds and machine precision, the big-M parameters of nonlinear functions are relaxed bounds rather than exact bounds. Conditional constraints can be converted into mixed integer form either as equalities ( h (x ) = 0 ) or as two inequalities with opposite signs ( ±h ( x ) ≤ 0), without loss of generality.

List of optimal solutions
The main objectives of computer-aided molecular and mixture/blend design is to identify promising molecules, mixtures or blends for various applications. When designing molecules from atom groups, the optimal structure generated from the CAM b D model for a specific process or product could be a new compound that is difficult to synthesise or an existing but expensive compound. Thus, the second or even third optimal solution with different molecular structures may be required. Working towards generating a list of optimal molecules and/or mixtures, the following integer cut is introduced in the (G-BM) mixture design problem ( GAMS Development Corporation, 2017a ): where n sol i,g,l is the optimal number of group of type g in molecule i generated in previous solutions l = 1 , . . . , N l . The advantage of including integer cuts in the general mixture problem is that it helps to obtain a list of diverse solutions which can be different molecular structures, or mixtures with different number and/or identities of components.

Case study 1: Solvent mixture design for the crystallization of ibuprofen
Crystallization is a key separation and purification unit in the manufacturing process of many pharmaceuticals (intermediates and/or final API products) as it offers high purity and recovery of chemicals, while maintaining low energy consumption and low environmental impact ( Crafts, 2007 ). The careful design or selection of suitable solvents and solvent mixtures for crystallization is essential to ensure the high performance and economics of the process. Solubility is one of the key properties in crystallization and it is used as a main criterion for selecting molecules ( Frank et al., 1999;Jonuzaj and Adjiman, 2017; or designing molecules ( Austin et al., 2016;Karunanithi et al., 2006 ) that can be solvents and solvent mixtures for many crystallization processes, such as cooling and drowning out crystallization. Frank et al. (1999) reviewed several methods based on solubility to screen solvents from a database for the crystallization of organic solids. The design and selection of suitable solvents or solvent mixtures based on solubility for the crystallization of ibuprofen (a common anti-inflammatory drug), is a well-studied problem in the literature. Karunanithi et al. (2006) and Austin et al. (2016) have investigated the design of appropriate solvents and binary solvent mixtures for the crystallization of the drug, based on solubility, considering also other factors such as flammability, viscosity and toxicity of the solvent and its effect on crystal morphology.  have addressed the determination of multicomponent mixtures for maximizing the solubility of ibuprofen, with optimal solvents being selected from a predefined set of candidate compounds. In this work, we design optimal solvent and antisolvent (drowning out agent) mixtures with fixed and unknown number of mixture ingredients for maximising and minimising the solubility of ibuprofen, respectively, as described in the following subsections. The molecules are designed from atom groups (UNI-FAC groups) and the solubility of the drug is considered to be the main criterion for design. It is noted that this case study is employed to illustrate the ability to design "solvent" from atom groups without making any assumptions on whether the "solvent" contains one or more compounds. We do not take advantage of the full possibility of the design methodology, since aspects such as the environmental impact of the solvent or the ease of recovery are not taken into account. These metrics can be integrated into a CAM b D model, as shown in the work of Karunanithi et al. (2005) .

Case study 1a: Solvent mixture design for maximising the solubility of ibuprofen
In all types of crystallization, achieving high solubility of the solute in a solvent or solvent mixture is an essential feature for the successful performance of the process, as solvent capacity has a strong impact on the throughput that can be achieved given a fixed equipment size. In cooling crystallization, for example, which is a commonly used separation mode in the fine chemicals industry, the solid compound of interest dissolves in a solvent mixture at high temperature and it is then recovered as pure solid from the mixed solution after decreasing the temperature so that the solute becomes much less soluble and crystallizes out of the solution ( Karunanithi and Achenie, 2007 ). The selection/design of suitable solvents affects process efficiency and good solvents that yield high solubility of the solid compound are usually required, so that a sufficient amount of solid is dissolved without increasing the temperature to exceedingly high values. This may not be achieved if a small molecular design space is considered, e.g. if conventional compounds are selected from restricted databases.
In the proposed general approach, the design of molecules from a large set of atom groups (UNIFAC groups) is considered; this can  JID: CACE [m5G;21:32 ] generate a very large and diverse chemical space. We apply the mixture design methodology to the design of solvents and solvent mixtures that maximise the solubility of ibuprofen, x ibu (i.e., the mole fraction of ibuprofen in the mixture), at ambient conditions. A full description of the problem statement was given in previous work , where solvent molecules were selected from a small list, and here we present a brief summary of the main features of the case study. Ibuprofen is represented with atom groups as 4 × aCH, 3 × CH 3 , aCCH 2 , aCCH, CH, COOH and its solubility in the solvent mixture is calculated as follows ( Gmehling et al., 1978;Sandler, 1999 ):

ARTICLE IN PRESS
where γ ibu is the liquid phase activity coefficient of ibuprofen at composition x , mixture temperature T = 300 K and pressure P = 1 atm; T m and H fus are the normal melting point and enthalpy of fusion of ibuprofen and their values are taken to be 347.15 K and 25.5 kJ/mol, respectively ; R is the gas constant.
The following miscibility constraint for every binary pair of solvent molecules is employed in order to ensure that the solvents are mutually miscible in the proportions and at the temperature relevant to the mixture ( Gani et al., 1991;Smith et al., 2001 ): , with x i and x j being the mole fractions of components i and j , respectively, in the multi-component mixture. Although these constraints do not guarantee that the final mixture consists of a single stable liquid phase, it is much less likely to undergo phase separation if the binary pairs are stable. The activity coefficient of ibuprofen and of each designed solvent is estimated using the UNIFAC group contribution method ( Fredenslund et al., 1975 ), and the formulation proposed by Smith et al. (2001) , which is presented in a form convenient for programming, is employed in this work. These equations are given in  .

Problem formulations
In order to compare the proposed methodology with the standard (restricted) CAM b D problem, we first apply a simplified version of the model to the case study, in which the number of solvents in the mixture is fixed. Next, the generalized model is solved, with all the decision variables (number, identity and composition of mixture ingredients) being optimised simultaneously. Hence, two formulations are considered: Formulation 1aR, a restricted problem, where we design mixtures with one ( N = 1 − problem D1), two ( N = 2 − problem D2) or three ( N = 3 − problem D3) solvents; Formulation 1aG, a general problem, where the number of components in the mixture is bounded by a maximum number and we design mixtures with up to three solvents ( N max = 3 − problem D4). The possible components in the mix-  Table B.1 in Appendix B , and each designed molecule i consists of at most 7 groups ( n max i = 7 ). The objective here is to maximize the solubility of ibuprofen at 300 K and thus, the objective function is f (x ) = x ibu . In both formulations (1aR, 1aG), the general constraints, r ( x ), consist of the solubility function ( Eq. (24) ) and those equations of the UNIFAC model that are expressed in terms of ibuprofen only (i.e., that do not depend on the identity or number of the designed solvent molecules); thus the general constraints do not depend on the logic decisions. It is noted that in Formulation 1aR, nearly all UNIFAC equations are treated as general constraints. The conditional constraints that depend on the identity of groups in each molecule and/or number of solvents in the mixture are included in the appropriate disjunctions, based on which formulation considered.
In the restricted problem of Formulation 1aR, ˜ Y n is no longer a decision variable and thus the formulation consists only of disjunctions for assigning the number of groups of type g to components i in the mixture, which include the chemical feasibility, chemical complexity and design constraints ( Eqs. (1) -(22) ) described in Section 2.2 ; mixture property constraints, such as the stability function ( Eq. (25) ); and those equations from the UNIFAC model (e.g., the molecular van der Waals volume, r i , and molecular surface area, q i , of designed component i ) used to predict the activity coefficient of each solvent component. The conditional constraints can be converted into MINLP using the big-M approach as described in the previous section. Here, however, a simper MINLP reformulation can be constructed to facilitate the solution of the problem. As shown in Eq. (3) , the number, n i,g , of groups of type g ∈ G in a designed molecule i ∈ I is defined by using the binary variable ˆ y i,g,k ( i ∈ I, g ∈ G, k ∈ K ), whereas the rest of the equations within disjunctions are expressed in terms of the pseudo-integer variables n i,g and the binary variables ˆ y . These relations force the number of groups n i,g and all the relevant variables that depend on the identity of the groups to become zero when no atom group of type g is present in the designed component i (i.e., n i,g = 0 when ˆ y i,g,k = 0 ). Thus, the use of large big-M parameter values, which can sometimes lead to poor relaxations of the MINLP model, can be avoided in this simple case. Furthermore, the use of binary variables ˆ y i,g,k is sufficient to calculate the number of groups in the molecule and hence variables y i, g that represent the presence of a group in a designed component can be eliminated, reducing the size of the MINLP. Finally, logic relations are included in Formulation 1aR in order to eliminate degenerate solutions, so that the design of identical components as two distinct mixture components is avoided (see Table A.1 ).
The general problem of Formulation 1aG consists of both disjunctions for the identity and number of components, defined by the logic variables Y i, g and ˜ Y n , respectively, as described in Section 2.2 . The conditional constraints included in the first set of disjunctions are converted into mixed-integer form as described in Formulation 1aR. The second set of disjunctions consists of all the linear and nonlinear constraints that depend on the number of components in the mixture, such as Eqs. (1) -(22) , the mole fraction, x i , of each designed component in the mixture, the miscibility constraints (25) , as well as equations of the UNIFAC model used to predict the activity coefficient of the designed solvent molecules, and they are converted into mixed-integer form using the big-M approach. The big-M parameter values for the nonlinear functions are chosen to be relaxed bounds, so that numerical difficulties arising from tight bounds are avoided. Finally, all the logic relations that appear in Formulation 1aG and their equivalent algebraic form are presented in Table A.1 .

Results and discussion
All models reported in this paper were implemented and solved in GAMS ( GAMS Development Corporation, 2017b ) version 24.8.3, running on a single core of a dual 6 core Intel Xeon E5-1660 machine at 3.30 GHz. SBB ( Bussieck and Drud, 2001 Table 1 Optimal solubility of ibuprofen, x ibu , obtained when solving the restricted and generalised problems of case study 1a. The lower bound of solvent mole fractions in the mixture is set to x L all models are summarized in Table 1 . GAMS files for the resulting MINLP models of Formulations 1aR and 1aG can be found at https://www.zenodo.org/record/1154203 , as noted in the data statement given in Section 5 . In Formulation 1aR, where the design of mixtures with one, two and three solvents is considered, a pure solvent, dimethyl sulfoxide (DMSO), yields the highest solubility of ibuprofen, with a value of 0.5183. Its molecular structure is represented with a single UNI-FAC group (i.e., (CH 3 ) 2 SO, as shown in Fig. 1 a). DMSO is a colorless liquid with a low level of toxicity and it is miscible with many organic solvents and water ( Gaylord Chemical Company, 2014a ). It is a commonly used polar solvent that dissolves a wide range of organic materials, active pharmaceutical ingredients (API), polymers and several inorganic salts. The solubility of ibuprofen in DMSO reported in the Gaylord Chemical database ( Gaylord Chemical Company, 2014b ) is 376.2 g/100 g DMSO (i.e., x ibu = 0 . 5876 ) at 298 K, which is close to the solubility value found with UNIFAC.
Dimethylformamide (DMF) and ethyl pentafluorobenzene (5 ×aCF, aCCH 2 , CH 3 ) are identified as additional optimal solvent components when binary and ternary mixtures are designed to dissolve ibuprofen. The chemical structures of the optimal designed components are presented in Fig. 1 . Mixtures with two and three components yield lower solubilities than the pure solvent, with values of 0.5077 and 0.4955, respectively. The mole fractions of the second (DMF) and third (ethyl pentafluorobenzene) solvents are at the user-specified lower bound, which is set to be x L i = 0.1. Tight bounds are used for the minimum mole fraction of solvents, so that the generation of mixtures with unrealistically small compositions of some solvents is avoided. The performance of each mixture type (mixtures with one, two or three solvents) is markedly affected if a sufficient amount of solvents is added in the mixture.
Formulation 1aG ( N unknown) involves designing an optimal mixture with at most three solvents, and the results validate the optimal solutions obtained in the restricted models, by confirming that the highest solubility is achieved with DMSO and the mole fraction of ibuprofen is 0.5183.
The values presented for computational time in Table 1 correspond to the runs of Formulations 1aR and 1aG where the best solutions were found with SBB. In problems D1 ( N = 1 ), D2 ( N = 2 ) and D3 ( N = 3 ), the CPU time increases rapidly with the number of components, because of the increased size and complexity of the formulations. The general problem, D4 ( N ≤ 3), requires significantly less computational time than problem D3, where a ternary mixture is designed, and it appears to be more effective than enumerating all options in the restricted problem ( N = 1 , 2 , 3 ). The final solutions achieved cannot be guaranteed to be global which means that the reported computational times are affected by the starting points. However, similar initial guesses are given to all models (e.g., the same starting points are used in both D3 and D4), so that an initial comparison of the problems can be carried out.
Attempts were made to solve the problems to global optimality using BARON version 17.1.2 ( Tawarmalani and Sahinidis, 2005 ) and only the smallest problem (D1), where the number of designed solvents is fixed to one, is solved globally in 35.61 CPU seconds. The results verify the optimal solution obtained with the SBB MINLP solver, i.e., the maximum solubility is found to have a value of 0.5183 with DMSO as the optimal solvent, as shown in Table 1 . Convergence to global optimality was not reached in 60,0 0 0 CPU seconds when larger, more challenging problems for the design of binary and ternary mixtures were solved. The solutions reported with SBB for problems D2, D3 and D4 were not identified by the global solver.
The importance of designing mixtures from a large set of atom groups and not restricting the design space is shown in this work. It is observed that better results can be achieved when solvents are designed from groups instead of selecting them from predefined lists. The solvent mixtures generated in this study yield higher solubility values than the optimal solvent mixtures determined in  , where the optimal components (i.e., chloroform, methanol and ethanol) were selected from a set of candidate of 9 compounds. An increase of 48% in the solubility is observed when ibuprofen is dissolved in the ternary mixture from groups compared to its solubility in the optimal ternary mixture of chloroform, methanol and ethanol found in previous work.

Case study 1b: Antisolvent mixture design for minimising the solubility of ibuprofen
The use of common crystallization techniques, such as cooling or evaporative crystallization, may not be suitable for purifying heat-sensitive chemicals that decompose or react with solvents at higher temperatures, or chemicals that exhibit a weak dependence of solubility on temperature over a practical range of temperatures ( Berry et al., 1997 ). In these cases, drowning out crystallization, another common method for the purification of fine chemicals is used. It is based on the addition of a poor solvent, often referred to as a drowning-out agent or antisolvent, to a solute/solvent mix-  JID: CACE [m5G;21:32 ] ture, which reduces the solubility of the solute in the mixture and results in crystal formation. The choice of solvent/antisolvent mixtures in drowning out crystallization is an important decision which affects the success of the process. Water and some organic solvents (e.g., acetonitrile) are often selected as anti-solvents based on knowledge-based methods and database searches. As in most crystallization techniques, solubility is one of the key features of the process and it is considered as one of the main criteria for designing or selecting suitable drowning-out agents ( Frank et al., 1999 ). Only a limited number of modelling approaches have addressed the problem of choosing or designing suitable solvents and antisolvents for drowning out crystallization. Frank et al. (1999) presented a procedure to screen and select candidate solvents that dissolve organic solids, based on the use of existing methods for estimating the solubility of the solids (e.g., UNIFAC, Robbins chart, Hansen solubility parameters, etc.). Karunanithi et al. (2006) and Austin et al. (2016) studied the design of optimal solvent-antisolvent mixtures that result in maximum potential recovery in the drowning out crystallization of ibuprofen. These existing design methodologies have only addressed the design of binary mixtures that consist of a solvent and an antisolvent (i.e., one optimal antisolvent is added to a solution). The design of antisolvent mixtures has not been considered so far.

Problem statement
In this work, the design of antisolvents and antisolvent mixtures for minimising the solubility of ibuprofen at 300 K and 1 atm is addressed, given an initial solvent in which the solute is dissolved. The drug is crystallized from a pure solvent by adding an antisolvent or an antisolvent mixture to the solution. It is assumed that there are no impurities present. The solute should be highly soluble in the initial solvent and its solubility should decrease greatly when a pure anti-solvent or an antisolvent mixture is added. The objective of the design problem is given below as: where x ibu is the mole fraction of ibuprofen in the solventantisolvent mixture and it is calculated using the solid-liquid equilibrium relation which depends on the enthalpy of fusion and melting temperature of the solid and its liquid-phase activity coefficient, as presented in Eq. (24) . The activity coefficient is evaluated using the UNIFAC ( Fredenslund et al., 1975 ) group contribution model given in  . The miscibility of the solvent and antisolvent needs to be examined in order to ensure that the final mixture is in one liquid phase at the process conditions. The miscibility constraint for every binary pair of compounds presented in Eq. (25) is employed in this case study to ensure that each binary pair of molecules is miscible for the relative composition, temperature and pressure of interest, as an indication of the stability of the overall mixture.
The optimal solvent structure that participates in the solventantisolvent mixture is defined based on the findings of the previous study (case study 1a), where optimal solvents and solvent mixtures are designed to maximise the solubility of ibuprofen at ambient conditions. As shown in Section 3.1 , DMSO appears to be the optimal solvent that yields the highest solubility of ibuprofen, with a value of 0.518. Thus, the identity of the solvent component in the mixture is set to be DMSO, with a mole fraction value of 0.482. The fixed components in the mixture, N , consist of ibuprofen and DMSO. An upper limit of two moles is set on the total number of moles in the mixture ( N total = 2 moles), so that a reasonable amount of antisolvent(s) is added to the mixture.
The design specifications and process conditions included in previous problem case for maximising the solubility of ibuprofen (case study 1a) are also used in this study (cf. Section 3.1 ).

Problem formulations
Following the strategy discussed in case study 1a, the application of the proposed methodology to the design of antisolvent mixtures is based on two problem formulations: (i) Formulation 1bR, where the number of solvent and antisolvent components is fixed (restricted problem); (ii) Formulation 1bG, where the number of components in the mixture is not fixed but bounded by a maximum N max (generalized problem). In Formulation 1bR, mixtures with one antisolvent ( N = 1 -problem E1) and two antisolvents ( N = 2 -problem E2) are designed, whereas Formulation 1bG involves the design of mixtures with at most two antisolvents ( N max ≤ 2 -problem E3). The components in the mixture thus include ibuprofen (ibu) the fixed solvent component (DMSO) and the designed antisolvents ( c 1 , c 2 ). The antisolvent molecules are designed from a set of 49 atom groups, presented in Table B.1 in Appendix B .
Formulation 1bR consists of disjunctions for assigning a group g from the set G to a designed component i in the mixture that ensure that the relevant conditional constraints are active if the corresponding Boolean variable Y i, g is true. Formulation 1bG involves disjunctions for the identity and the number of components in the mixture, which are defined by the Boolean variables Y i, g and ˜ Y n , respectively. A detailed description of the GDP restricted and general problems is given in Section 3.1.1 where the problem formulations for maximizing the solubility of ibuprofen are presented. It should be noted that the disjunctive constraints that depend only on the identity of the fixed solvent component are expressed in terms of DMSO and thus, they are treated as general constraints, in the set r ( x ) ≤ 0.
The GDP formulations are converted into mixed integer form using the big-M approach and the MINLP models of the restricted and generalised problems can be accessed through https://zenodo. org/record/1154203 .

Results and discussion
Optimal solutions presented in Table 2 were obtained with SBB ( Bussieck and Drud, 2001 ). Problems E1 and E2 correspond to the models of the restricted problem (Formulation 1bR) and involve the design of one antisolvent ( N = 1 ) and a binary mixture with two antisolvents ( N = 3 ), respectively. Model E3 represents the general problem of Formulation 1bG which includes the design of a mixture with at most two antisolvents ( N ≤ 2).
The results of the restricted problems E1 and E2 show that the lowest solubility of ibuprofen, with a mole fraction value of 0.0025, is achieved with the addition of water (H 2 O) up to the maximum amount allowed to DMSO. Water is a poor solvent for many organic compounds (including ibuprofen) , and it is commonly used as an antisolvent in drowning out crystallization ( Frank et al., 1999 ). A ternary mixture of DMSO, water and acetic acid (CH 3 , COOH) yields a greater solubility value of 0.0072. The mole fraction of acetic acid is at its lower bound and thus, only a small amount of acetic acid is added to the mixture, which accounts for small differences in the corresponding solubilities of the binary and ternary mixtures. The best solution obtained in restricted problem E1, where the addition of water yields the lowest solubility of ibuprofen, is also found in the generalized problem E3 where the number of antisolvents in the mixture is unknown.
The values of the CPU time presented in Table 2 correspond to the runs where the best solutions were found. It can be observed that the computational performance of the general model E3 is more advantageous than solving problems E1 and E2 sequentially. We note, however, that the problems were solved locally and thus the computational cost of each problem is affected by the initial guesses given to the MINLP solver. An overall comparison of all models could be conducted if the problems were solved globally,  Table 2 Optimal solubility of ibuprofen, x ibu , obtained when solving the restricted and generalised problems of case study 1b. The lower bound of solvent mole fractions in the mixture is set to x L The mole fraction of the fixed solvent, DMSO, is set to 0.2410.  Table 3 Ranking of optimal solutions obtained for the general mixture problem (D4) of case study 1a when including integer cuts. The set i represents the designed solvent molecules and x i is the mole fraction of each solvent in the mixture ( i ∈ { c 1 , c 2 , c 3 }). The lower bound on solvent mole fractions in the mixture is set to  Table 4 List of optimal solutions obtained for the generalised mixture problem (E3) of case study 1b when including integer cuts. The set i represents the designed antisolvent molecules and x i is the mole fraction of each antisolvent in the mixture ( i ∈ { c 1 , c 2 }). DMSO is set to be the solvent component in the mixture.

List of optimal solutions
The design of molecules from a large list of atom groups may sometimes lead to the design of components with undesired properties (e.g. high toxicity values) or new molecular structures that are difficult to synthesize in a laboratory. In order to address these issues, the integer cuts presented in Eq. (23) are introduced in the generalised problem formulation, so that lists of optimal solutions with different molecules and mixtures are generated. The ten best solutions obtained with the general models of both crystallization problem cases (D4 and E3) are ranked in descending order and presented in Tables 3 and 4 , respectively.
In case study 1a, the list of optimal solutions includes mainly binary solvent mixtures (apart from the first solution), with DMSO being the main component for dissolving ibuprofen. The small differences in the solubility values among the results obtained are due to the small amount of the second designed solvents added in each mixture (mole fraction values of each solvent are at the lower bound). It can be observed that the second best solution is achieved with the same binary mixture (i.e., a mixture of DMSO and DMF) obtained in model D2 of the restricted problem. The mixture of DMSO, DMF and ethyl pentafluorobenzene found in model D3 yields a solubility value of 0.4955 which is lower than the solubilities obtained in the first ten solutions, and thus no ternary mixtures are included in the generated set. Ethyl pentafluorobenzene, though, is found to provide the third best solution in a binary mixture with DMSO. JID: CACE [m5G;21:32 ] In case study 1b, the top ten solutions consist of optimal mixtures with one or two antisolvent components. The first two optimal solutions with the lowest solubility values validate the results obtained in models E1 and E2 of the restricted problem (see Table 2 ). It is seen that extremely low solubility values can be achieved when water is added to the DMSO + ibuprofen mixture (see first four solutions of Table 4 ), in contrast with the higher solubility values found for ibuprofen in different antisolvent components (see last six solutions of Table 4 ). DMF is found to be the second best pure solvent which yields a solubility value of 0.4628. There is a large difference in solubility between the first and second pure solvents and binary mixtures provide alternative high performance options.

Case study 2: Separation of acetic acid from water by liquid-liquid extraction
The problem of identifying optimal solvent mixtures from a given set of candidate compounds to separate acetic acid from water in a single stage liquid extraction process provides o more challenging case study due to the participation of all mixture components in the liquid-liquid equilibrium equation.
Liquid-liquid extraction has been proven to be an economical and environmentally friendly method for the purification of acetic acid from an aqueous stream ( Alkaya et al., 2009;Austin et al., 2017;IJmker et al., 2014;Karunanithi et al., 2005 ). As shown in the single stage extraction unit illustrated in Jonuzaj and Adjiman (2017) , a feed mixture of acetic acid and water and a solvent mixture are first mixed and are then separated into extract and raffinate phases in a settler. The design specifications given in Seader et al. (2011) are used in this case study: an aqueous solution with 2.5 mol % acetic acid and a feed flowrate of 708 kmol/h at 298 K and 1 atm are considered. The raffinate phase should contain at most 0.3% mol of acetic acid.
The objective here is to design an optimal solvent mixture that minimises the solvent-to-feed ratio and also achieves a large fraction of acetic acid and small fraction of water in the extract phase, as shown in the single objective function presented below: where F S and F are the solvent and feed flowrates, respectively, while x E, a and x E, w are the mole fractions of acetic acid and water, respectively, in the extract phase. The problem involves liquidliquid equilibrium (LLE) relations between extract and raffinate phases, as described in Eq. (27) : where x E, i and x R, i are the mole fractions of component i in the extract and raffinate streams, respectively, and γ p, i ( p ∈ { E, R }) is the liquid phase activity coefficient of component i in stream p at temperature T , composition x p and pressure P . The activity coefficient values are calculated using the modified UNIFAC (Dortmund) model ( Gmehling et al., 1993; and the relevant equations are presented in Jonuzaj and Adjiman (2017) . The mole balances for each component in the system are described in Eqs. (28) -(31) , respectively: x F,a F = x M,a M (28) where M is the flowrate of the output stream of the mixer, and E and R are the extract and raffinate flowrates, respectively; x p, i represents the mole fraction of component i in stream p . The molecular structures of acetic acid and water are represented through the groups CH 3 , COOH and H 2 O, respectively, whereas the solvent molecules are designed from a set of 25 atom groups ( G = { 1 , . . . , 25 } ) presented in Table B.2 in Appendix B .
Each solvent structure i consists of at most 10 groups ( n max i = 10 ). We note that a reduced list of groups is used in this case study, compared to the set of 49 groups employed in the crystallization problem, due to the smaller number of available interaction parameters in the publicly available Dortmund UNIFAC database.

Problem formulations
Once again, a GDP formulation, where the number of components is fixed (restricted problem, 2R), and a formulation with unknown number of solvents (generalized problem, 2G), are presented in this section. In Formulation 2R, mixtures with one (problem F1), two (problem F2) or three solvents (problem F3) are designed and in Formulation 2G, mixtures with at most three solvents (problem F4) are identified. The components present in the final mixture include acetic acid ( a ) and water ( w ), which are the fixed components in the mixture, and the designed solvent molecules ( c 1 , c 2 , c 3 ).
The general constraints in the GDP models consist of the LLE relations and mole balances that hold for acetic acid and water, as well as the UNIFAC model equations to predict the activity coefficients of the fixed components, as they do not depend on the logic decisions (i.e., identity and number of the designed molecules). All the equations that run over the designed components in the mixture ( c 1 , c 2 , c 3 ), such as chemical feasibility and chemical complexity constraints, design constrains, LLE equations, mole balances and UNIFAC equations for the designed solvents, depend on the identity of the groups in each molecule and on the number of solvents in the mixture. Thus, these conditional constraints are included in appropriate disjunctions for assigning each group g to component i and for the number of designed components in the final mixture, and they are active when the corresponding logic variables Y i, g and ˜ Y n are true. The disjunctive constraints are converted into mixed integer form using the big-M approach.
As discussed in case study 1a, the number, n i,g , of each group g in a designed molecule given in Eq. (3) and several constraints derived for the design of solvent molecules (described in Section 2.2 ) are formulated via the binary variables ˆ y . These relations force the number of groups n i,g and all related variables to become zero when a group g is not present in the designed component. Therefore, the use of big-M parameters which may lead to poor relaxations is avoided when transforming these equations in the MINLP models.
The logic propositions and the corresponding linear algebraic inequalities presented in Table A.1 are included in the resulting GDP and MINLP formulations. The MINLP models and all the parameter values (e.g., big-M parameters) included in the problems can be found at https://www.zenodo.org/record/1154203 .

Results and discussion
The results of the restricted (Formulation 2R) and generalized (Formulation 2G) problem statements are summarized in Tables 5 and 6 . In Formulation 2R (F1-F3), the minimum objective function (with a value of 1.3129) and the highest recovery of acetic acid in the extract phase (with a value of 93.21%) are obtained with a pure solvent, pentafluorobenzyl acetate (5 ×aCF, aCCH 2 , CH 3 COO). The best binary and ternary mixtures generated include pentafluorobenzyl propionate (5 ×aCF, aCCH 2 , CH 2 COO, CH 3 ) and tetrafluoro-  JID: CACE [m5G;21:32 ]   Problem 4-methylbenzyl acetate (4 ×aCF, aCCH 3 , aCCH 2 , CH 3 COO) as the second and third designed solvent molecules, respectively, and they result in worse objective function values than the pure component, due to the increased flowrates ( F S ) of the solvent mixtures (see Table 6 ). As shown in Fig. 2 , similar molecular structures are designed in all cases as optimal solvent components in the mixture, with aCF being the most frequent group in all molecules. Therefore, the percentage of second and third solvent molecules do not affect significantly the separation process and the results obtained with the optimal pure component and the optimal solvent mixtures are very similar. We note that pentafluorobenzyl acetate and pentafluorobenzyl propionate are not commonly used as solvents, and currently very expensive ( Synquest Laboratories, 2010; JID: CACE [m5G;21:32 ]  List of optimal solutions obtained for the general mixture problem (F4) of case study 2 when including integer cuts. i represents the solvent molecules and x i is the mole fraction of each solvent in the mixture ( i = c 1 , c 2 , c 3 ).
The computational times presented in Table 5 for Formulation 2R (F1-F3) increase with the number of components in the mixture. The solution of Formulation 2G (F4) requires less CPU time than solving Formulation 2R with three solvents (F3). Similar initial guesses were given in each problem, so that the solution times are not affected significantly by the choice of starting points.
It can be observed that the identity of the solvent molecules has a significant impact on the performance of the process. The molecules and mixtures designed in this work yielded better solutions than the solvents selected from a given list of 8 compounds presented in previous work ( Jonuzaj and Adjiman, 2017 ). In particular, the solvent mixture flowrate ( F S ) is decreased by 48% when pentafluorobenzyl acetate is used to separate acetic acid from water, which may lead to reduced capital and operating costs.

List of optimal solutions
Integer cuts, presented in Eq. (23) , are included in the generalised problem (F4) of case study 2 and a list of twelve optimal solutions is presented in Tables 7 and 8 . The list contains pure solvents, as well as binary and ternary solvent mixtures. Only ten of the solutions were generated by adding the integer cuts to F4; the second and third solutions (in brackets) of the list were obtained by solving problems F2 and F3 (see Table 5 ), but were not found in the first ten runs of the generalised problem when introducing integer cuts. This indicates that on at least two occasions, the SBB solver converged to a local solution.
In most solutions in the list, solvent mixtures containing aromatic compounds that consist of one fluorobenzene ring and one or two side chains, are designed. If fluorobenzene rings are excluded from the design of monocyclic compounds, then a ternary mixture with acyclic solvents, butanone (2 × CH 3 , CH 2 CO), methyl butanone (2 × CH 3 , CH, CH 3 CO) and pentanone (2 × CH 2 , CH 3 , CH 3 CO) in position 13, yields an optimal solution with higher objective function and flowrate values, but better acetic acid recovery than any of the top 12 solvents. Hence, a trade-off between reduced costs and improved separation and recovery of acetic acid can be achieved with different solvent designs.

Conclusions
This study provides a general and comprehensive mathematical formulation for the design of optimal molecules and mixtures based on a computer aided mixture/blend design (CAM b D) framework. Within this systematic approach the number of mixture components, the identities of the components and their compositions are determined simultaneously. The general formulation makes it possible to consider large and diverse design spaces by designing the optimal molecular structures of all mixture components simultaneously from a large set of atom groups (building blocks), and it focuses on identifying the best pure or blended components, without specifying a priori whether one seeks a single compound or a mixture. Disjunctions are employed to model the effect of discrete decisions (i.e., the number of mixture ingredients, the identities and numbers of groups in a molecular structure and of molecules in the final mixture), on the problem equations, leading to a GDP formulation.
The proposed approach has been applied successfully to different case studies in the context of separation processes, where optimal solvent, antisolvents, solvent and antisolvent mixtures are  designed from UNIFAC groups. In the first problem studied (case study 1a), solvents and solvent mixtures were designed to maximise the solubility of ibuprofen which is an important feature of any crystallization mode. Next, antisolvents and antisolvent mixtures were designed to minimize the solubility of the drug in drowning out crystallization (case study 1b). The third example problem explored the design of optimal solvents and solvent mixtures for separating acetic acid from water in a single stage liquid extraction process (case study 2). In applying the methodology to the case studies, formulations with a fixed number of mixture ingredients (restricted problem -Formulations 1aR, 1bR, 2R) and an unknown number of mixture ingredients (generalized problem -Formulations 1aG, 1bG, 2G) have been considered. Disjunctions for the identity of groups/molecules and for the number of solvent components were transformed into mixed-integer form using the big-M approach. All problems were solved using a local MINLP solver in reasonable time (less than 6 CPU minutes). In all design problems (case study 1a, case study 1b and case study 2), the solution of the generalized formulations required less time than the enumeration of all possible mixture "sizes" ( N = 1, 2, 3) and thus, appears to be more effective than solving the restricted formulations repeatedly.
Lists of promising solutions -i.e., optimal mixtures with different number (pure, binary, ternary, etc.) and identity (e.g., DMSO, DMF, etc.) of ingredients -were obtained by introducing integer cuts in the generalized models of the generalized formulations. Mixtures were found to provide alternative high performance options that are not accessible with pure solvents alone.
This works shows that improved solutions can be achieved by designing components from atom groups rather than from lists and datasets of candidate molecules. Further verification of the designed molecular structures should, however, be conducted, especially when new molecules are designed. The GDP formalism provides a useful framework for formulating and solving mixture problems, making it possible to consider large design spaces and to avoid numerical difficulties arising from the absence of groups in a molecular structure and of components in the final mixture (i.e., with a mole fraction of zero).
In the context of separation solvent and process design, future work will focus on extending the problem formulations to provide a more realistic assessment of performance (e.g., include environmental and safety metrics Karunanithi et al., 2005;Karunanithi et al., 2006 ). Future investigations will also be directed at tackling product design applications, especially formulated products ( Mattei et al., 2012;Yunus et al., 2014 ) which exhibit an increased number of mixture ingredients and where the general mixture design formulation may be especially beneficial: the explicit evaluation of every choice of number of components can be avoided in this way, making it possible to consider larger design spaces. In addition, the development of suitable algorithms to achieve global solutions should be considered, in order to carry out a comprehensive comparison of the proposed formulations.  . . . , N c Relations between ˜ Y n and Y i,g ¬ ˜ . . . , N c Relations between ˜ Y n and ˆ y Let ˆ P i,t ≡ ( ˆ y i,t = 1) and ˆ [m5G;21:32 ]  Upper bounds provided by linear functions ˆ f that define if each group g is allowed to appear in a designed molecule i . Groups with an * have been deactivated in the studied problems due to lack of sufficient data for accurate predictions.