Understanding and Designing a High-Performance Ultrafiltration Membrane Using Machine Learning

Ultrafiltration (UF), one of the mainstream membrane-based technologies, is widely used in water and wastewater treatment. Increasing demand for clean and safe water requires the rational design of UF membranes with antifouling potential that maintain high water permeability and removal efficiency. This work employed a machine learning (ML) method to establish and understand the correlation of five membrane performance indices, as well as three major performance-determining membrane properties, with membrane fabrication conditions. The loading of additives, specifically nanomaterials (A_wt %), at amounts of >1.0 wt % was found to be the most significant feature affecting all of the membrane performance indices. The polymer content (P_wt %), molecular weight of the pore maker (M_Da), and pore maker content (M_wt %) also made considerable contributions to predicting membrane performance. Notably, M_Da was more important than M_wt % for predicting membrane performance. The feature analysis of ML models in terms of membrane properties (i.e., mean pore size, overall porosity, and contact angle) provided an unequivocal explanation of the effects of fabrication conditions on membrane performance. Our approach can provide practical aid in guiding the design of fit-for-purpose separation membranes through data-driven virtual experiments.


Text S2 Ultrafiltration membrane fabrication
Three ultrafiltration membranes were prepared using the nonsolvent-induced phase inversion (NIPS) method. In brief, homogeneous casting solutions were prepared by dissolving a given amount of polymer and pore maker in organic solvent at an elevated temperature of 60 °C under moderate stirring. For membranes incorporating nanomaterials, the nanomaterial was first dispersed in the organic solvent and sonicated for 2 h to obtain a nanomaterial suspension. Subsequently, the polymer and pore maker were added to this suspension to prepare the casting solution. The homogeneous casting solution was cooled to room temperature and degassed in a vacuum oven at ambient temperature for 1 h. The prepared solution was then cast onto a glass plate using a doctor blade at a gap height of 200 µm. After that, the cast film on the glass plate was immediately immersed in a coagulation bath of fresh DI water at room temperature to trigger the phase inversion. Detailed membrane fabrication conditions are summarized in Table S6.
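
The composition of a casting solution can be worked out from the weight fractions defined in Table S1. The sketch below is illustrative only. It assumes that P_wt% and M_wt% are percentages of the polymer + pore maker + solvent mass, and that A_wt% is a percentage of the polymer mass added on top of that batch. The function name, the batch size, and the example percentages are hypothetical and are not taken from Table S6.

```python
def casting_solution_masses(batch_g, p_wt, m_wt, a_wt=0.0):
    """Return component masses (g) for one batch of casting solution.

    Assumptions (not stated in the source): p_wt and m_wt are percentages of
    the polymer + pore maker + solvent mass (batch_g); a_wt is a percentage
    of the polymer mass and is added on top of the batch.
    """
    if min(p_wt, m_wt, a_wt) < 0:
        raise ValueError("weight percentages must be non-negative")
    if p_wt + m_wt >= 100:
        raise ValueError("polymer and pore maker leave no room for solvent")

    polymer = batch_g * p_wt / 100.0
    pore_maker = batch_g * m_wt / 100.0
    solvent = batch_g - polymer - pore_maker  # solvent makes up the remainder
    additive = polymer * a_wt / 100.0         # relative to the base polymer
    return {
        "polymer_g": polymer,
        "pore_maker_g": pore_maker,
        "solvent_g": solvent,
        "additive_g": additive,
    }


# Hypothetical recipe: 50 g batch, 16 wt% polymer, 4 wt% pore maker,
# 1 wt% nanomaterial relative to the polymer.
recipe = casting_solution_masses(50.0, 16.0, 4.0, 1.0)
```

If the additive is instead counted inside the batch mass, the solvent mass would need to be reduced accordingly; the source does not say which convention was used.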

Table S1 The input features and targets used in the data sets.

Input features | Unit | Description of input features
P (SMILES) | / | The base polymer used to fabricate membranes.
P_MW | g mol^-1 | The molecular weight of base polymer.
δ | mN m^-1 | The surface tension of base polymer.
P_wt% | wt% | The weight fraction of base polymer in the casting solution used to fabricate membranes.
M | / | The pore maker used in the casting solution.
M_wt% | wt% | The weight fraction of pore maker in the casting solution.
M_Da | Da | The molecular weight of pore maker.
A | / | The additive type blended into the polymer matrix.
A_wt% | wt% | The weight fraction of incorporated additive to the base polymer.
S | / | The organic solvent used to prepare casting solution.
γ | MPa^1/2 | The solubility parameter of the organic solvent used.
Mean pore radius | nm | The mean pore size of the fabricated membrane.
Overall porosity | % | The overall porosity of the fabricated membrane.
Contact angle | degree | The static contact angle of the fabricated membrane.
Surface roughness | nm | The root mean square roughness of the UF membrane.
TMP | bar | The applied hydraulic pressure through membrane performance test.
C | / | The contaminants used for removal efficiency measurement.
C_Da | Da | The molecular weight of contaminants.
C_mg/L | mg L^-1 | The initial concentration of contaminants in the feed solution.
F | / | The model foulant used for membrane fouling evaluation.
F_mg/L | mg L^-1 | The initial concentration of model foulant in the feed solution.
Water permeability | LMH bar^-1 | The pure water permeability of the UF membrane.
Removal efficiency | % | The rejection rate towards the contaminants.
Flux decline ratio | % | The flux decline ratio when using solution containing contaminants as feed solution.
Flux recovery ratio | % | The flux recovery ratio after physical cleaning.
Reversible fouling ratio | % | The reversible fouling ratio of the membrane.

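
The five performance indices listed in Table S1 can be illustrated with a short calculation. The source gives only verbal descriptions, so the formulas below are definitions commonly used in the membrane literature and are an assumption here, not necessarily those used to build the data sets. The function names, symbols (j_w0 for the initial pure water flux, j_p for the flux with foulant feed, j_w1 for the pure water flux after physical cleaning), and example numbers are hypothetical.

```python
def water_permeability(flux_lmh, tmp_bar):
    """Pure water permeability in LMH bar^-1: flux divided by applied pressure."""
    if tmp_bar <= 0:
        raise ValueError("TMP must be positive")
    return flux_lmh / tmp_bar


def removal_efficiency(c_feed, c_permeate):
    """Rejection (%) from feed and permeate concentrations (same units)."""
    if c_feed <= 0:
        raise ValueError("feed concentration must be positive")
    return (1.0 - c_permeate / c_feed) * 100.0


def fouling_indices(j_w0, j_p, j_w1):
    """Return (flux decline, flux recovery, reversible fouling) ratios in %.

    Assumed definitions (not given in the source):
      flux decline ratio       = (1 - j_p / j_w0) * 100
      flux recovery ratio      = (j_w1 / j_w0) * 100
      reversible fouling ratio = ((j_w1 - j_p) / j_w0) * 100
    """
    if j_w0 <= 0:
        raise ValueError("initial flux must be positive")
    fdr = (1.0 - j_p / j_w0) * 100.0
    frr = j_w1 / j_w0 * 100.0
    rfr = (j_w1 - j_p) / j_w0 * 100.0
    return fdr, frr, rfr


# Hypothetical fluxes in LMH: initial 100, fouled 40, after cleaning 85.
fdr, frr, rfr = fouling_indices(100.0, 40.0, 85.0)
```

Under these assumed definitions the flux decline ratio is the sum of the reversible fouling ratio and the irreversible part (100 % minus the flux recovery ratio).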
Table S2 Screening candidates for encoder methods.

Candidates (encoding methods) | Working principle
BackwardDifferenceEncoder | The mean of the dependent variable for a level is compared with the mean of the dependent variable for the prior level.
JamesSteinEncoder | For each feature value, the James-Stein estimator returns a weighted average of the mean target value for the observed feature value and the mean target value regardless of the feature value.
OneHotEncoder | Maps each category to a vector that contains 0 and 1.
BaseNEncoder | Encodes the categories into arrays of their base-N representation. A base of 1 is equivalent to one-hot encoding (not really base-1, but useful); a base of 2 is equivalent to binary encoding. N = number of actual categories is equivalent to vanilla ordinal encoding.
HelmertEncoder | The mean of the dependent variable for a level is compared to the mean of the dependent variable over all previous levels.

Table S3 Percentage of missing values for each feature.

Table S5 The range of candidate hyperparameters for each ML algorithm.

ML algorithm | Hyperparameters | Range
CatBoost

Table S6 The detailed conditions for fabricating and testing UF membranes.
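
To make two of the working principles in Table S2 concrete, the following sketch encodes a categorical feature with one-hot and base-N encoding using only the Python standard library. It is a simplified illustration, not the encoder implementation used in this work: categories are indexed from 0 in sorted order, which may differ from the indexing of a library encoder. The function names and the example solvent labels are hypothetical.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


def base_n_encode(values, base=2):
    """Write each category's ordinal index as fixed-width base-N digits."""
    if base < 2:
        raise ValueError("this sketch only supports base >= 2")
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    # Smallest number of digits that can represent every category index.
    width = 1
    while base ** width < len(categories):
        width += 1
    rows = []
    for v in values:
        n = index[v]
        digits = []
        for _ in range(width):
            digits.append(n % base)
            n //= base
        rows.append(digits[::-1])  # most significant digit first
    return categories, rows


# Hypothetical solvent column (feature S).
solvents = ["NMP", "DMAc", "DMF", "NMP"]
cats, onehot = one_hot_encode(solvents)
_, binary = base_n_encode(solvents, base=2)
```

With three categories, one-hot encoding needs three columns while base-2 encoding needs two, which is why base-N encoders are attractive for features with many levels such as the additive type.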

Figure S1. Data distribution of the numeric input features.

Figure S2. Types of additives involved in UF membrane as collected from the literature.

Figure S6. The SHAP plot for ML models accounting for membrane properties. (a) Removal efficiency. (b) Flux decline ratio. (c) Flux recovery ratio. (d) Reversible fouling ratio. Feature number (e.g., fp_498) denotes the feature position in the Morgan fingerprint vector.

Figure S7. The chemical structure of feature fp_1654.
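
The SHAP plots in Figure S6 attribute each prediction to the input features using Shapley values. As a minimal sketch of the underlying idea, the code below computes exact Shapley values for a toy model by enumerating all feature subsets. It is not the procedure used to produce Figure S6: features outside a subset are simply replaced by a fixed baseline value, whereas SHAP implementations average over background data and use faster approximations for tree models. The function name and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption, see the text above).
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(v):
    """Hypothetical model with one interaction term between features 0 and 2."""
    return 2.0 * v[0] + 3.0 * v[1] + v[0] * v[2]


phi = shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

The attributions sum to the difference between the prediction at x and at the baseline, and the interaction term is split equally between the two features involved. The enumeration cost grows exponentially with the number of features, so this form is only practical for very small examples.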

Table S4 Predictive performance of different configurations of machine learning algorithms and encoder methods for the water permeability data set.

Table S7 Predictive performance of the ML models with membrane properties included as input features.

Table S8 Predicted and experimental results of the fabricated membranes.
a Experimental results; b Predicted values

Table S9 Predicted and characterized membrane properties of the fabricated membranes.