Privacy preserving distributed learning classifiers – Sequential learning with small sets of data

Background: Artificial intelligence (AI) typically requires a significant amount of high-quality data to build reliable models, and gathering enough data within a single institution can be particularly challenging. In this study we investigated the impact of using sequential learning to exploit very small, siloed sets of clinical and imaging data to train AI models. Furthermore, we evaluated the capacity of such models to achieve equivalent performance when compared to models trained with the same data over a single centralized database. Methods: We propose a privacy-preserving distributed learning framework that learns sequentially from each dataset. The framework is applied to three machine learning algorithms: Logistic Regression, Support Vector Machines (SVM), and Perceptron. The models were evaluated using four open-source datasets (Breast cancer, Indian liver, NSCLC-Radiomics dataset, and Stage III NSCLC). Findings: The proposed framework achieved predictive performance comparable to a centralized learning approach. Pairwise DeLong tests showed no significant difference between the compared pairs for each dataset. Interpretation: Distributed learning contributes to preserving medical data privacy. We foresee that this technology will increase the number of collaborative opportunities to develop robust AI, becoming the default solution in scenarios where collecting enough data from a single reliable source is logistically impossible. Distributed sequential learning provides a privacy-preserving means for institutions with small but clinically valuable datasets to collaboratively train predictive AI while preserving the privacy of their patients. Such models perform similarly to models that are built on a larger central dataset.


Introduction
The application of artificial intelligence (AI) (i.e., machine/deep learning models) within the clinical decision making process, also referred to as precision medicine, has become a research topic of increasing interest [1,2]. The rising number of published AI models in the literature that support diagnosis/prognosis is a testament to this.
The most common way to train AI models, often referred to as "centralized training", is when the data is sourced from a single centralized database and the training of the classification AI model is local to a single machine. This approach, however, is not ideal during collaborative efforts where data sharing and centralization are strictly regulated by legal and ethical considerations. For instance, the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) act as safeguards to protect the privacy of patient data. Distributed learning (i.e., federated learning, ensemble learning, or sequential learning) offers a promising solution to this centralization barrier, allowing development and validation of predictive models while preserving the privacy of the patient data. Federated learning, the most conventional form of distributed learning, involves a master server that coordinates the initialization and aggregation of learning within a consortium of partners [3,4]. Ensemble learning consists of training independent models on local data, and each model's predictions on new data are grouped into a single global prediction [5].
Sequential distributed learning is an extension of distributed learning enabling the partners of a consortium to iteratively update a model with their respective local datasets. The last model in the queue is the final model [6,7]. These approaches are particularly appealing in the case of small datasets (e.g., low clinical volume or rare diseases) in which the amount of data available to a single center is below the threshold needed to develop robust and generalizable AI. Since the performance and the robustness of an AI model are directly related to the number of samples on which it was trained and validated [2], the scarcity of data coupled with the lengthy procedures required to centralize data can derail initiatives to develop clinical decision support tools.
While distributed learning has been well established with applications in multicentric studies [3,6,8-10], and previous work on ensemble distributed learning on small local datasets has indicated promising performance [11,12], the impact of the network data-scape (e.g., small batch sizes) has yet to be systematically investigated for sequential distributed learning. In this work, we investigate the performance of Stochastic Gradient Descent (SGD) based classifiers trained using a sequential distributed learning approach. We evaluate the influence on model performance when using micro batch sizes (as small as n = 1) to replicate cases where a participating institution (or partner) may only provide a single case record to the consortium to support training. To this end, we examine the influence of micro batch sizes on sequential learning model performance compared to the equivalent (i.e., same data) centralized model using a variety of radiomics and clinical open-source datasets.

Model optimizers - stochastic gradient descent
This work also explores stochastic gradient descent (SGD) which is an iterative optimization method. It is a commonly used optimization technique applied to various machine and deep learning algorithms [13]. Upon each training iteration, the SGD optimizer fine-tunes the algorithm, minimizing the error of the model. As opposed to standard gradient descent optimizers, where the error is reduced over the entirety of the training dataset, SGD randomly selects small training batches and approximates the gradient for the random batch. The iterative process of batch selection is performed by randomly shuffling the dataset and minimizing over all batches, offering the advantage of avoiding local minima and reducing model optimization time.
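The shuffle-then-iterate-over-batches loop described above can be sketched as follows. This is a minimal illustration for logistic regression written with NumPy; the function name `sgd_logistic`, the learning rate, the batch size, and the toy data are all assumptions chosen for the example and are not taken from the study.

```python
import numpy as np

def sgd_logistic(X, y, lr=0.1, batch_size=8, epochs=50, seed=0):
    """Mini-batch SGD for logistic regression (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # one random batch
            p = 1.0 / (1.0 + np.exp(-(X[idx] @ w + b)))  # predicted probabilities
            err = p - y[idx]                       # gradient of the log-loss w.r.t. the logit
            w -= lr * X[idx].T @ err / len(idx)    # gradient estimated on the batch only
            b -= lr * err.mean()
    return w, b

# Hypothetical toy data: two features, linearly separable labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = sgd_logistic(X, y)
accuracy = float((((X @ w + b) > 0) == (y == 1)).mean())
```

Setting `batch_size=1` gives the single-sample update that the extreme distributed cases in this work rely on.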

Challenges in medical image analysis
Multicentric studies are needed to develop robust AI and to demonstrate the clinical relevance of imaging AI. Such studies face many challenges, including: 1) data collection (described in section "Medical data sharing"); 2) data heterogeneity, caused by differences in acquisition and reconstruction settings among the different medical centers [14]. To ensure better model building in a heterogeneous domain, the raw data and/or the features derived from it must be harmonized [15,16]; and 3) inter-reader variability: the automation of manual tasks, such as organ and lesion delineation, requires learning from ground truth masks delineated manually by experienced radiologists [17]. Differences in the experience and training of the clinicians lead to variation in the ground truth delineations, which in turn represents a challenge in segmentation model training and validation [17].

Medical data sharing
Despite the efforts made to share medical data in public repositories, including The Cancer Imaging Archive (TCIA; https://www.cancerimagingarchive.net/) and the NIH BioLINCC (https://biolincc.nhlbi.nih.gov/home/), among others [18], data sharing remains very difficult, especially for low-prevalence rare diseases. Within the context of rare diseases, data sharing limitations can hinder research and development, as well-documented cases may be limited in number. This proves especially difficult in situations where a single institution may want to extract hidden insights using machine learning approaches, such as a diagnostic or prognostic biomarker. Initiatives such as the European Joint Program on Rare Diseases (EJP RD; https://www.ejprarediseases.org/index.php/about/) have begun to address this issue and have illustrated the potential of data in driving precision medicine and accelerating rare disease diagnosis/prognosis.
The importance of datatype (e.g., genotype, phenotype and endotype, among others) in modeling patients with rare diseases is well demonstrated within the literature [19,20]. However, de-identification of patient data prior to sharing does not necessarily guarantee preservation of privacy [21], as patient personal information can potentially be re-identified from the de-identified features (e.g., up to 99.98% of the American population in any dataset can be identified using only 15 demographic features) [22]. This risk increases as the dimensionality of the data increases. In order to protect sensitive patient information, data acquisition and sharing are therefore tightly regulated by ethical and legal constraints [23]. In this context, distributed sequential learning is an important approach to facilitate data analysis across institutions while preserving data privacy.

Distributed learning
Distributed learning was first applied to clinical decision support systems in 2013 [2]. Distributed learning infrastructures enable the efficient training of machine/deep learning models by isolating training data in the respective local databases of each collaborating center. Distributed learning can be applied in various forms. In federated learning, each of the collaborators connects to a master server that initializes and updates learning. After initialization, each collaborating center trains a portion of the model on its local data and then provides the resulting model weights to the master server. The master server in turn aggregates the weights, updates the model, and shares the updated model weights with the collaborators within the network. Each collaborator then retrains its local model based on the updated weights and sends the result back to the master server to close the loop, which operates until a convergence threshold is reached [3,4]. Another form of distributed learning is sequential learning, which differs in its learning management architecture: 1) learning orchestrated by a cloud server, such as the Personal Health Train (PHT; https://www.dtls.nl/fair-data/personal-health-train/) [9,24], or 2) decentralized learning as applied in Chained-Distributed Machines Learning (C-DistriM) [6]. Each iteration in a sequential learning process corresponds to an update of the model from one collaborator. This type of learning is slower than federated learning, where the learning is parallel, but is not subject to the logistical concerns (mainly related to the variation of internet connection speed across the partners) associated with federated learning [25].
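To make the contrast between the parallel (federated) and sequential schemes concrete, the sketch below runs one round of each on toy data. It is only an illustration under stated assumptions: the text says the master server "aggregates the weights" without specifying how, so the size-weighted average used in `federated_round` is an assumption, and `local_update`, `federated_round`, `sequential_pass`, and the toy partners are hypothetical names and data.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, steps=20):
    """A few full-batch logistic-regression gradient steps on one partner's data."""
    w = w.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def federated_round(w_global, partners):
    """Parallel scheme: every partner starts from the same weights,
    then the server combines them (assumed: size-weighted average)."""
    sizes = np.array([len(y) for _, y in partners], dtype=float)
    local = np.stack([local_update(w_global, X, y) for X, y in partners])
    return (local * sizes[:, None]).sum(axis=0) / sizes.sum()

def sequential_pass(w, partners):
    """Sequential scheme: the model visits the partners one after another."""
    for X, y in partners:
        w = local_update(w, X, y)
    return w

# Hypothetical consortium of three partners with 30 records each.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
partners = [(X[:30], y[:30]), (X[30:60], y[30:60]), (X[60:], y[60:])]

w_fed = federated_round(np.zeros(3), partners)
w_seq = sequential_pass(np.zeros(3), partners)
```

In both schemes only model weights leave a partner; the records themselves stay local.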
In distributed learning, data is not visible to the researchers. For this reason, researchers have to rely on the statistical information derived from the local data to build a global model. To reach optimal performance, some modeling steps such as feature selection and inference have to be adapted [4,26,27]. In this regard, the literature has demonstrated that both federated learning and distributed learning achieve performance comparable to traditional centralized learning approaches [4,6,10,28,29].

Machine learning classifiers
In this work we consider three machine learning classifiers, depicted in Fig. 1, in distributed sequential settings:
1. Support vector machines (SVM), a supervised learning algorithm, applied mostly towards classification, but also for regression and the detection of outliers. SVMs work by establishing two parallel hyperplanes, separating the different classes of the feature space. The best fit is established as the one that maximizes the distance between both hyperplanes [30]. To accommodate data variability (i.e., linearly separable or not), various kernels, such as linear and radial basis function kernels, have been established to optimize the distance between hyperplanes [31]. In this work we applied linear SVM techniques.
2. Logistic regression, a statistical method used for analyzing a feature space in which there are one or more independent variables that identify a predefined outcome. The assumption is that multiple linear regressions of the independent variables are transformed using a logit function to form a conditional probability of the outcome variable. Logistic regression assumes that the feature space possesses a linear relationship with the outcome, making it a linear algorithm with a nonlinear transform [30].
3. Perceptron, a single-layer neural network used for linear classification. The hidden layer mimics the design of a network of neurons within the human brain. Similarly, a perceptron network predicts classifications based on patterns within a series of input features correlating to a specific outcome [30]. The process is as follows: 1) the input features are multiplied by their corresponding weights, which are randomly selected at the first iteration; 2) the results of step (1) are summed to generate a weighted sum; 3) the outcome (output) is calculated by applying the activation function to the weighted sum, which maps the outcome into values ranging between two predefined values (labels) such as [0,1].
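The three perceptron steps listed above can be written out directly. The sketch below is illustrative only: it assumes a step activation, uses the classic perceptron update rule, and starts from zero weights (rather than random ones) so that the run is deterministic; the function names and the logical-AND toy task are hypothetical.

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Steps 1-3: weight the inputs, sum them, apply a step activation."""
    weighted = x * w                 # step 1: multiply features by their weights
    total = weighted.sum() + b       # step 2: weighted sum (plus a bias term)
    return 1 if total > 0 else 0     # step 3: step activation onto the labels {0, 1}

def perceptron_update(x, y, w, b, lr=1.0):
    """Classic perceptron rule: weights change only on a misclassified sample."""
    error = y - perceptron_predict(x, w, b)
    return w + lr * error * x, b + lr * error

# Toy task: learn logical AND, one sample at a time.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = np.zeros(2), 0.0   # zero initialization is an assumption made for reproducibility
for _ in range(10):
    for xi, yi in zip(X, y):
        w, b = perceptron_update(xi, yi, w, b)

predictions = [perceptron_predict(xi, w, b) for xi in X]
```

Because each update uses a single sample, the same loop applies unchanged when successive samples come from different partners.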

Data
In this study, four open-source datasets were collected from two different public repositories: the UC Irvine Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php) and cancerdata.org (https://www.cancerdata.org/). The characteristics of these datasets are illustrated in Table 1 [36].
These datasets were used to train and test the selected machine learning classifiers. Each of these datasets consists of a feature space corresponding to a binary outcome, as illustrated in Table 1. In addition to these four datasets, we extended our analysis to test sequential distributed learning on deep neural networks applied to smaller subsets of the MNIST dataset [37].
The breast cancer Wisconsin dataset consists of features calculated from a digitized image of a fine needle aspirate (FNA) of a breast mass [38], and an outcome defined as "malignant" or "benign". An ANOVA test was used to select the robust features. 50% of the features were discarded based on ANOVA's F-ratio, reducing the total number of features from 30 to 15.
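A minimal sketch of F-ratio based selection is given below, assuming the standard one-way ANOVA F statistic computed per feature against the class label and keeping the top half of the features. The helper names `anova_f` and `keep_top_half` and the synthetic data are hypothetical; the exact procedure used in the study is not described beyond the 50% cut.

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F-ratio of each feature (column) against the class label."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2   # between-class sum of squares
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)        # within-class sum of squares
    df_between = len(classes) - 1
    df_within = len(y) - len(classes)
    return (between / df_between) / (within / df_within)

def keep_top_half(X, y):
    """Keep the 50% of features with the largest F-ratio."""
    f = anova_f(X, y)
    k = X.shape[1] // 2
    keep = np.sort(np.argsort(f)[::-1][:k])
    return X[:, keep], keep

# Synthetic stand-in: 30 features, of which the first 15 depend on the label.
rng = np.random.default_rng(0)
y = np.array([0] * 100 + [1] * 100)
X = rng.normal(size=(200, 30))
X[:, :15] += 2.0 * y[:, None]

X_sel, kept = keep_top_half(X, y)
```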
The Indian liver dataset [33] consists of a set of demographic and clinical features (all patient records were collected from the North East of Andhra Pradesh, India) for patients with liver disease. A Pearson pairwise feature correlation was performed. Highly correlated features (i.e., with Pearson correlation coefficient > 0.7) were discarded, reducing the total number of features from 11 to 8. Four patients had missing values corresponding to one feature; the missing data were imputed based on the mean value of the corresponding feature vector.
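A correlation filter of this kind can be sketched as below. The text gives the 0.7 threshold but not which member of a correlated pair was discarded or whether the absolute value was used; the greedy rule here (keep a feature only if its absolute correlation with every feature already kept is at most 0.7) is therefore an assumption, as are the name `drop_correlated` and the toy data.

```python
import numpy as np

def drop_correlated(X, threshold=0.7):
    """Greedy filter: keep a feature unless it correlates strongly with one already kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))   # pairwise Pearson correlations
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], keep

# Toy data: the third column is almost a copy of the first.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 100))
X = np.column_stack([a, b, a + 0.1 * rng.normal(size=100), c])

X_kept, kept = drop_correlated(X)
```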
The NSCLC-Radiomics dataset [34,35] consists of radiomics features extracted from gross tumor volumes (GTV) of standard CT images corresponding to 422 patients, using RadiomiX (Radiomics/Oncoradiomics SA, Liège, Belgium), which is based on quantitative image analysis technology. Gross tumor volume segmentations were performed by trained oncologists. Of the 421 records, 44 subjects were discarded during the radiomic features calculation phase because their GTV segmentations contained multiple unconnected volumes. In these cases, the signature feature "compactness" cannot be calculated since it is defined for a single volumetric object. The outcome (survival) in NSCLC-Radiomics was converted into two-year survival (binary). No new feature selection was performed; the four predictive features reported in the original study [39] were used.
Data from Ref. [36] is referred to as the Stage III NSCLC dataset. The dataset consists of a combination of clinical and dosimetric features and a clinical outcome (survival) for lung cancer patients. Missing data were imputed using the scikit-learn (version 0.22) imputation transformer. The imputation was based on the mean values of each feature. No feature selection was performed for this dataset; instead, the predictive features reported in the original study [36] were used to train the models.
As a means to mitigate classifier scaling bias, features in all training sets were independently normalized to the interval [0,1] and the same normalization factor was applied to their respective validation and test sets. The primary objective of this work was to assess model performance variability across unique training scenarios in centralized vs. distributed SGD training approaches. Improving the prediction performance of models trained with these datasets was out of the scope of this work.
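The preprocessing described above (mean imputation, then [0,1] normalization with factors learned on the training set and reused on the other sets) can be sketched as follows. This NumPy version is an illustration, not the scikit-learn transformer used in the study; computing the imputation means on the training set only is an assumption, and `fit_preprocessor` and `apply_preprocessor` are hypothetical names.

```python
import numpy as np

def fit_preprocessor(X_train):
    """Learn imputation means and min-max factors from the training set only."""
    means = np.nanmean(X_train, axis=0)
    filled = np.where(np.isnan(X_train), means, X_train)
    lo, hi = filled.min(axis=0), filled.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    return means, lo, span

def apply_preprocessor(X, params):
    """Apply the training-set factors to any split (train, validation, or test)."""
    means, lo, span = params
    filled = np.where(np.isnan(X), means, X)
    return (filled - lo) / span

X_train = np.array([[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]])
X_test = np.array([[2.0, 20.0], [7.0, np.nan]])

params = fit_preprocessor(X_train)
X_train_s = apply_preprocessor(X_train, params)
X_test_s = apply_preprocessor(X_test, params)
```

Note that a test value larger than the training maximum (7.0 in the first column here) maps outside [0,1], which is expected when the factors come from the training set alone.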

Experiment design
Three commonly used machine learning (ML) classifiers were selected to conduct this study (Support Vector Machine (SVM), Logistic Regression, and Perceptron). Each classifier satisfies the inclusion criteria: 1. The classifier can be trained in a sequential manner, 2. The classifier has previously been applied and accepted in the medical image analysis scientific community [8,40].
Each dataset was split into training, validation, and test sets (60% training, 20% validation, and 20% testing). Training, validation, and test sets were stratified based on the outcome label to guarantee an equal percentage of positive and negative samples in each subset. The validation data was used for hyperparameter tuning and the test set was used to evaluate the model performance.
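A stratified 60/20/20 split can be sketched by splitting each class separately and pooling the pieces. The function name `stratified_split`, the seed, and the toy label vector are assumptions made for the example.

```python
import numpy as np

def stratified_split(y, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return train/validation/test indices with the same class balance in each."""
    rng = np.random.default_rng(seed)
    parts = [[], [], []]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))   # shuffle within the class
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].extend(idx[:n_train])
        parts[1].extend(idx[n_train:n_train + n_val])
        parts[2].extend(idx[n_train + n_val:])          # remainder goes to the test set
    return [np.sort(np.array(p)) for p in parts]

# Toy labels: 70 negative and 30 positive samples.
y = np.array([0] * 70 + [1] * 30)
train, val, test = stratified_split(y)
```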
For each dataset, we simulated four training cases:
Centralized. A centralized learning approach where the entirety of the training set was used by a single partner to fit the model, used as the reference for distributed learning approaches.
Case 1. A distributed learning approach composed of 2 partners (2 subsets), with the data randomly distributed between the partners (67% and 33% of the dataset).
Case 2. A distributed learning approach representing an extreme case where each partner contributes a single datapoint (i.e., from a single patient). In this case the model was updated at each iteration, incorporating one additional datapoint.
Case 3. A repeat of Case 2, with the exception of randomly shuffling the dataset to observe the effect that the order of the training data (of medical centers) has on the resulting model.
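One way to simulate these partner configurations is with incremental fitting in scikit-learn. The sketch below uses `SGDClassifier.partial_fit` with a hinge loss (a linear SVM) on synthetic data; it is an assumption that this mirrors the study's implementation, which is not described at the API level, and the helper `train_sequentially`, the single pass over the partners, and the toy data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def train_sequentially(partner_batches, classes, alpha=1e-4, seed=0):
    """Pass one linear SVM (hinge loss) from partner to partner, one update each."""
    model = SGDClassifier(loss="hinge", alpha=alpha, random_state=seed)
    for X_p, y_p in partner_batches:
        # classes must be given because a partner may hold only one label
        model.partial_fit(X_p, y_p, classes=classes)
    return model

# Synthetic stand-in for a normalized training set.
rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 4))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
classes = np.array([0, 1])

cut = int(0.67 * len(y))
case1 = [(X[:cut], y[:cut]), (X[cut:], y[cut:])]            # two partners, 67% / 33%
case2 = [(X[i:i + 1], y[i:i + 1]) for i in range(len(y))]   # one datapoint per partner
order = rng.permutation(len(y))
case3 = [(X[i:i + 1], y[i:i + 1]) for i in order]           # Case 2 with shuffled order

models = {name: train_sequentially(batches, classes)
          for name, batches in [("case1", case1), ("case2", case2), ("case3", case3)]}
```

Swapping `loss="hinge"` for `loss="perceptron"` gives the perceptron variant; the name of the logistic loss differs between scikit-learn versions, so it is left out of this sketch.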

Centralized model
For each classifier and dataset pair, we trained a central model and used it as the reference to compare the performance of each corresponding distributed model. Hyperparameter tuning was performed for optimal performance. The primary tuned parameters were the regularization parameter used to calculate the learning rate, herein referred to as alpha (α), and the number of iterations (epochs). Default values with respect to the classifier were used for the remaining parameters, such as tolerance and penalty. To tune the hyperparameters we defined a set of alpha values ranging between 1e-7 and 1, as illustrated in Fig. 2. For each classifier:
1. The validation set performance (determined by the area under the curve (AUC) of the Receiver Operating Characteristic (ROC) curve) was estimated for each parameter α.
2. The resulting models were compared based on their performances.
3. The best α parameter was then selected according to the model comparison outcome.
4. Each classifier was subsequently retrained using the best corresponding α parameter.
5. Finally, the performance of the global model was evaluated against the appropriate test set.
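The five steps above amount to a grid search over α scored by validation AUC. The sketch below illustrates that loop with a small NumPy logistic regression; here α acts purely as an L2 penalty with a fixed learning rate, which is a simplification of how α is described in the text, and retraining on the training set alone in step 4 is an assumption. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def fit_logistic(X, y, alpha, lr=0.5, epochs=200):
    """Gradient descent on the log-loss with an L2 penalty of strength alpha."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y) / len(y) + alpha * w)
    return w

def tune_alpha(X_tr, y_tr, X_val, y_val, alphas):
    """Steps 1-3: score every alpha on the validation set and pick the best."""
    val_auc = {a: auc(y_val, X_val @ fit_logistic(X_tr, y_tr, a)) for a in alphas}
    best = max(val_auc, key=val_auc.get)
    return best, val_auc

# Synthetic data split 60/20/20 (already shuffled, so plain slicing is enough here).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=300) > 0).astype(float)
X_tr, y_tr = X[:180], y[:180]
X_val, y_val = X[180:240], y[180:240]
X_te, y_te = X[240:], y[240:]

alphas = [10.0 ** k for k in range(-7, 1)]        # 1e-7, 1e-6, ..., 1
best_alpha, val_auc = tune_alpha(X_tr, y_tr, X_val, y_val, alphas)
w_final = fit_logistic(X_tr, y_tr, best_alpha)    # step 4: retrain with the best alpha
test_auc = auc(y_te, X_te @ w_final)              # step 5: evaluate on the test set
```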

Distributed learning models
Hyperparameter tuning was also performed for distributed learning cases, as illustrated in Fig. 3. Model optimization was performed over 5 key steps:
1. For each simulated partner, estimate the validation set performance (AUC) corresponding to each parameter α.
4. Retrain each classifier using the best corresponding α parameter in a sequential manner.
5. Finally, evaluate the performance of the global model against the appropriate test set.
Finally, a pairwise comparison of the final aggregated models' AUC values corresponding to each classifier and dataset was performed using DeLong tests [42].
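For readers unfamiliar with the test, the sketch below implements the paired DeLong comparison of two AUCs measured on the same test set, using the structural-component formulation. Whether the study used exactly this paired variant is an assumption, and `delong_test`, `_placements`, and the simulated scores are hypothetical.

```python
import math
import numpy as np

def _placements(y, scores):
    """Structural components: one value per positive (V10) and per negative (V01)."""
    pos, neg = scores[y == 1], scores[y == 0]
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y, scores_a, scores_b):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    v10_a, v01_a = _placements(y, scores_a)
    v10_b, v01_b = _placements(y, scores_b)
    auc_a, auc_b = float(v10_a.mean()), float(v10_b.mean())
    s10 = np.cov(np.vstack([v10_a, v10_b]))   # 2x2 covariance over positives
    s01 = np.cov(np.vstack([v01_a, v01_b]))   # 2x2 covariance over negatives
    m, n = len(v10_a), len(v01_a)
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n)
    if var <= 0:                               # identical rankings: no evidence of a difference
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal p-value
    return auc_a, auc_b, float(z), float(p)

# Simulated scores from three hypothetical models on one test set.
rng = np.random.default_rng(0)
y = np.array([0] * 100 + [1] * 100)
scores_a = y + rng.normal(scale=0.8, size=200)
scores_b = scores_a + rng.normal(scale=0.2, size=200)   # nearly the same model
scores_c = rng.normal(size=200)                         # uninformative model

auc_a, auc_b, z_ab, p_ab = delong_test(y, scores_a, scores_b)
_, auc_c, z_ac, p_ac = delong_test(y, scores_a, scores_c)
_, _, _, p_same = delong_test(y, scores_a, scores_a)
```

A p-value above 0.05, as reported for every pair in this study, means the test does not detect a difference between the two AUCs; it does not by itself prove that they are equal.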
To consider the impact of shuffling the local training datasets, and their size on model performance in cases where partners have multiple datapoints each, we extended the experiments conducted in this study. To sufficiently realize these experiments, we sourced the Modified National Institute of Standards and Technology (MNIST) dataset [37], a commonly used large dataset suited to test deep neural networks. Data description, the different data splits, model architecture, and results are available in Supplementary Materials (Section A1).

Results based on dataset
The combination of 4 datasets, 4 training cases, and 3 model architectures resulted in 48 uniquely trained models. Table 2 summarizes model performance for each architecture and training use case, reported as the AUC and a 95% confidence interval (CI). Models trained with the breast cancer dataset outperformed models trained with other datasets in all model architectures. The Indian liver dataset had notably better performance in specific training use cases and model architectures when compared to classification performance for either NSCLC dataset. Logistic regression and perceptron architectures had improved performance over SVM for classification in either NSCLC dataset. Fig. 4 depicts the AUC values corresponding to each study case for each pair of classifier and dataset. For each classifier, the derived AUC values per use case (centralized, Case 1, Case 2, Case 3) trended with a high degree of similarity but were not identical. Shuffling and local dataset size variability produced observable differences in the ROC curves. However, pairwise DeLong tests [42] comparing the ROC curves of the training scenarios found no statistically significant differences (p-values > 0.05), as summarized in Table 3, organized by classifier and training dataset. Furthermore, each model trained in a distributed fashion did not differ significantly from

Results based on classifier architecture
In most cases the average absolute difference in the AUC values was below 5%. The average differences in the AUC of the centralized training over the Breast cancer, Indian Liver, NSCLC-Radiomics, and Stage III NSCLC datasets were 0.67%, 1.75%, 8.33%, and 6.24%, respectively. Differences in the AUC for each training scenario versus each classifier have been summarized in Table 1, Supplementary Materials (Section C). The maximum average difference of the AUC values for the distributed learning classifiers per dataset increases up to 8.74%, 7.66%, and 8.64% for Case 1, Case 2, and Case 3, respectively. It should be noted that in extreme cases certain scenarios had AUC differences above 10%, highlighted in Table 1 of the Supplementary Materials (Section C). These results suggest that the optimal classifier is highly dependent on the characteristics of the dataset.

Discussion and future work
High-quality datasets with sufficient training data are required for machine learning models to converge and generalize [2]. When sharing patient data between institutions, there are important ethical and legal considerations to be managed.
The results presented in this work demonstrate that sequential distributed learning on small, isolated datasets (including extreme cases where the model is updated using a single datapoint at a time) achieves performance equivalent to that of models trained with conventional centralized learning. Similar conclusions were observed in the case of multiclass classification using the MNIST dataset [37]. We observed, by applying a pairwise DeLong [42] comparison, that the AUCs of distributed learning models do not differ with statistical significance from those of models trained in centralized scenarios.
The results in Tables 2 and 3 and the ROC curves indicate that there is a difference in the performance of different classifiers, and this difference can vary from one dataset to another. We noted that the average AUC difference between the classifiers can increase up to 8.33%, 8.74%, 7.66%, and 7.66% with respect to each use case (centralized, Case 1, Case 2, Case 3) and dataset (Breast cancer, Indian Liver, NSCLC-Radiomics dataset, Stage III NSCLC). Even though this margin may be perceived as inconsequential, the clinical risk of decisions based on predictions must be considered, as with all changes in model performance. Cases with AUC differences above our 10% threshold (highlighted in red) indicate that the specific classifier is suboptimal for the dataset in question. Thus, with respect to learning [43], we recommend selecting the classifier based on the comparative performance of the different centralized and distributed classifiers, or basing the selection on justified criteria related to the characteristics of the data that will be used to fit the model. Previous reports on distributed ensemble learning [11,12] showed the potential of applying this particular type of distributed learning to small siloed datasets. For example, Tuladhar et al. [12] report that grouping models learned locally from either artificial neural networks, SVM, or random forests could efficiently exploit small sets of data to build global models. These results suggest that the application of ensemble learning on small datasets is feasible. Other authors have demonstrated that grouping local logistic regression models is promising in the case of small datasets [11]. In addition, they proposed a model update based on the distributed sets of data information to improve the global model performance on small datasets. Results of these studies [11,12] showed an overall improvement in global model performance compared to models trained on a single institution's data. These results, however, cannot be extended to the case of distributed sequential learning.
Our results demonstrate that sequential distributed learning can be beneficial for the application of AI for outcome prediction by medical institutes holding very small datasets. Practical examples of small datasets include 1) pediatric cases, which tend to suffer from small sample sizes [44], 2) early phase clinical trials, where the sample size tends to be around 20 subjects [45], and 3) rare diseases, as they have a very low prevalence (<5/10000 in the European population) [46], making it nearly impossible for a single medical center to collect enough data to train machine learning models. Even with these limitations, and with considerably small datasets (20-100 datapoints), researchers have been using machine learning to build diagnosis and prognosis models for rare diseases [47]. The generalizability of trained models is directly related to the quality and quantity of the training data [48]. In this regard, distributed learning provides an opportunity to develop generalizable models with small high-quality datasets in multi-center applications while also mitigating the need to share data and maintaining the privacy of all patient information, such as imaging, genomic, or clinical insight.
Batch size is well known to have an effect on final model performance [49], where evidence suggests that large batch size does not always relate to better model performances [50]. Conversely, in distributed learning applications, 1) a smaller batch size has been linked to the privacy of the training data, as it considerably reduces the ability to reproduce training data from shared model weights in case of weights leakage [51], 2) it has been well documented that the order of training partners in a distributed network influences the performance of the model [28]. Our results suggest that the centralized and distributed models are not statistically different. Therefore, we see distributed sequential learning as a viable tool for multicentric precision medicine studies, particularly in applications with small datasets such as rare diseases and could also be applied in pediatrics and early phase clinical trials.
The tuning of each classifier prior to training of the final model is an essential step in achieving robust and generalizable models, as this is dependent on the nature of the data used in training. The need for tuning hyperparameters stems from the fact that the classifiers investigated in this work use SGD as an optimizer, and thus cannot avoid this optimization step. The main parameter that needs to be optimized is the learning rate, as it controls the manner in which the model is modified according to the estimated error at every iteration/update of the model weights. The process of learning rate selection is challenging, as a small value theoretically facilitates better performance but can increase training time significantly. On the other hand, a larger learning rate value can result in an unstable training phase, as the model updates very quickly in each iteration, causing it to converge to a flat (i.e., less optimal) minimum.
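The two ends of this trade-off can be seen on a deliberately simple toy problem. The sketch below runs plain gradient descent on f(w) = w² with three learning rates; it only illustrates slow progress versus instability on a one-dimensional quadratic and is not a model of the classifiers or datasets in this study. The function name and the chosen rates are assumptions.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 (gradient 2w) from w0 and return the final |w|."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return abs(w)

small = gradient_descent(lr=0.001)    # stable but slow: still close to the start
moderate = gradient_descent(lr=0.1)   # converges close to the minimum at 0
large = gradient_descent(lr=1.1)      # each step overshoots: the iterate diverges
```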
Tuning model hyperparameters is also imperative for distributed classifiers, as we showed in this study. Furthermore, we observed that there is a need to investigate different combinations of hyperparameters and number of iterations. Hyperparameter and training settings such as the number of iterations, coupled with a set of model selection criteria (based for example on a comparison of model accuracies or model parameters) [52], can be beneficial to reduce the risk of overfitting. However, this leads in turn to one limitation, related to the longer execution time in comparison to traditional centralized training. This increase in time is accounted for by the need to investigate all the training data across all the participating partners to set optimal hyperparameters. In addition to this, it is important to consider communication costs, as all parameter tuning and model updates occur over the internet. Characterizing the time required for training is challenging, as the duration is highly dependent on each partner's internet bandwidth. Future directions of this work will include analysis to characterize the model training duration in a distributed fashion, identify the scalability of the infrastructure to accommodate larger loads by increasing available computational power either through scale up (additional hardware) or scale out (additional nodes), and investigate the elasticity or ability to dynamically handle varying loads of data.

Conclusion
This study demonstrates 1) the proof-of-concept of sequential distributed learning applied to small sets of data, narrowed down to a single datapoint at a time, and 2) the opportunities associated with this type of distributed infrastructure for the application of AI in low-prevalence diseases. We simulated three different distributed learning cases using three classifiers and four different datasets. Our results indicate that sequentially training the models using (extremely) small datasets delivers statistically similar performance (p-values > 0.05) in comparison to the conventional centralized approach. This work provides a validation of the potential of distributed learning in the case of small datasets and a new opportunity for data-driven outcome modeling in rare disease research. Furthermore, this work can be used to continuously update predictive models as new data becomes available. Finally, future work is planned to estimate and optimize the scalability of sequential distributed learning infrastructures in real-world settings.

Funding
Authors acknowledge financial support from ERC advanced grant (ERC-ADG-2015, n° 694812 - Hypoximmuno). This research is also supported by the Dutch Technology Foundation STW (grant n° P14-19 Radiomics STRaTegy), which is the applied science division of NWO, Aspasia NWO (grant n° 91716421) and the Technology Program of the Ministry of Economic Affairs. Authors also acknowledge financial support from SME Phase 2 (RAIL -n

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Fadila Zerka, Akshayaa Vaidyanathan, Fabio Bottari, Martin Gueuning, Hanif Gabrani-Juma, Mariaelena Occhipinti are salaried employees/receive remuneration from Radiomics (Oncoradiomics SA). Dr Philippe Lambin reports, within and outside the submitted work, grants/sponsored research agreements from Varian medical, Radiomics (Oncoradiomics SA), ptTheragnostic/DNAmito, Health Innovation Ventures. He received an advisor/presenter fee and/or reimbursement of travel costs/external grant writing fee and/or in kind manpower contribution from Radiomics (Oncoradiomics SA), BHV, Merck, Varian, Elekta, ptTheragnostic and Convert pharmaceuticals. Dr Lambin has shares in the company Radiomics (Oncoradiomics SA), Convert pharmaceuticals SA and The Medical Cloud Company SPRL and is co-inventor of two issued patents with royalties on radiomics (PCT/NL2014/050248, PCT/NL2014/050728) licensed to Radiomics (Oncoradiomics SA) and one issued patent on mtDNA (PCT/EP2014/059089) licensed to ptTheragnostic/DNAmito, three non-patented inventions (software) licensed to ptTheragnostic/DNAmito, Radiomics (Oncoradiomics SA) and Health Innovation Ventures and three non-issued, non-licensed patents on Deep Learning-Radiomics and LSRT (N2024482, N2024889, N2024889). Ralph T.H. Leijenaar has shares in the company Radiomics (Oncoradiomics SA) and is co-inventor of an issued patent with royalties on radiomics (PCT/NL2014/050728) licensed to Radiomics (Oncoradiomics SA). Sean Walsh and Wim Vos have shares in the company Radiomics (Oncoradiomics SA). Michel Dumontier has shares in The Medical Cloud Company SPRL. The rest of the co-authors have no known competing financial interests or personal relationships to declare.