An agnostic study of associations between ABO and RhD blood group and phenome-wide disease risk

Background: There are multiple known associations between the ABO and RhD blood groups and disease. However, no systematic population-based study has examined associations between blood group and a large number of disease categories. Methods: Using SCANDAT3-S, a comprehensive nationwide blood donation-transfusion database, we modeled outcomes for 1217 disease categories across 70 million person-years of follow-up accrued from 5.1 million individuals. Results: After adjustment for multiple testing, we discovered 49 associations between a disease and ABO blood group and 1 association between a disease and RhD blood group. We identified new associations, such as a decreased risk of kidney stones in blood group B compared with blood group O. We also expanded previous knowledge on other associations, such as those between pregnancy-induced hypertension and blood groups A and AB (compared with blood group O) and RhD-positive status (compared with RhD-negative). Conclusions: Our findings provide strong further support for previously known associations and also indicate new associations of interest. Funding: Swedish Research Council.


Sample-size estimation
• You should state whether an appropriate sample size was computed when the study was being designed
• You should state the statistical method of sample size computation and any required assumptions
• If no explicit power analysis was used, you should describe how you decided what sample (replicate) size (number) to use
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:

Replicates
• You should report how often each experiment was performed
• You should include a definition of biological versus technical replication
• The data obtained should be provided and sufficient information should be provided to indicate the number of independent biological and/or technical replicates
• If you encountered any outliers, you should describe how these were handled
• Criteria for exclusion/inclusion of data should be clearly stated
• High-throughput sequence data should be uploaded before submission, with a private link for reviewers provided (these are available from both GEO and ArrayExpress)
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
As this was a registry study, no sample-size calculations were carried out. However, in "Materials and Methods", subsection "Outcomes", we address the issue of insufficient events per disease category.
The "Replicates" section, as defined above, does not apply to this epidemiological study. In this study, independent statistical tests were carried out in a multiple-comparisons setting; this is addressed in "Materials and Methods", subsection "Statistical methods".
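As an illustration of what adjusting many independent tests for multiple comparisons involves, the following is a minimal Python sketch of the Benjamini-Hochberg false discovery rate procedure. The choice of this procedure, the function name `benjamini_hochberg` and the example p-values are assumptions made purely for illustration; the adjustment actually used in the study is the one described in "Statistical methods".

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative sketch only: not necessarily the adjustment used in the study.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, one per disease category tested
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
# Tests that remain significant at a 5% false discovery rate
significant = [p <= 0.05 for p in adj]
```

In this made-up example, the third and fourth tests have raw p-values below 0.05 but no longer pass the threshold after adjustment, which is the behavior a phenome-wide analysis relies on to limit false discoveries.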

Statistical reporting
• Statistical analysis methods should be described and justified
• Raw data should be presented in figures whenever informative to do so (typically when N per group is less than 10)
• For each experiment, you should identify the statistical tests used, exact values of N, definitions of center, methods of multiple test correction, and dispersion and precision measures (e.g., mean, median, SD, SEM, confidence intervals) and, for the major substantive results, a measure of effect size (e.g., Pearson's r, Cohen's d)
• Report exact p-values wherever possible alongside the summary statistics and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission: (For large datasets, or papers with a very large number of statistical tests, you may upload a single table file with tests, Ns, etc., with reference to sections in the manuscript.)

Group allocation
• Indicate how samples were allocated into experimental groups (in the case of clinical studies, please specify allocation to treatment method); if randomization was used, please also state if restricted randomization was applied
• Indicate if masking was used during group allocation, data collection and/or data analysis
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:

Additional data files ("source data")
• We encourage you to upload relevant additional data files, such as numerical data that are represented as a graph in a figure, or as a summary table
• Where provided, these should be in the most useful format, and they can be uploaded as "Source data" files linked to a main figure or table
• Include model definition files including the full list of parameters used
• Include code used for data analysis (e.g., R, MatLab)
• Avoid stating that data files are "available upon request"
Please indicate the figures or tables for which source data files have been provided:
Statistical analysis methods are addressed in the "Statistical methods" section and further discussed in the "Discussion". Raw and adjusted p-values, unadjusted confidence intervals, and event counts are partly presented in Figures 1, 2 and 3, and are fully searchable in Supplementary Tables 3-6.
The main cohort consisted of all person-time for all individuals in the database, up to the start of blood donation for subjects who became blood donors and otherwise to the end of follow-up. The validation cohort consisted of all person-time from the start of blood donation until the end of follow-up.
Figures 1-2 are provided as live HTML figures as well as static images. The live HTML figures include the relevant JavaScript libraries (plotly), and the information that can be extracted when browsing them corresponds to most of the source information. As previously stated, this information is also available in the Supplementary Tables, which are also provided as searchable HTML files. Information on model parameterization is provided in the "Statistical methods" section. A Jupyter notebook with the SAS code will be available upon publication at https://github.com/SCANDAT/ABO.
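The main and validation cohort definitions given above amount to splitting each individual's follow-up at the start of blood donation. The Python sketch below shows one way such a split could be computed. The function name `split_person_time`, its parameters, the 365.25-day year and the example dates are all hypothetical and are not taken from the study's SAS code.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_YEAR = 365.25  # assumed conversion factor, for illustration only


def split_person_time(
    entry: date, end: date, first_donation: Optional[date]
) -> Tuple[float, float]:
    """Return (main_cohort_years, validation_cohort_years) for one individual.

    Hypothetical helper: person-time before the start of blood donation
    counts toward the main cohort, person-time from the start of donation
    onward counts toward the validation cohort.
    """
    if first_donation is None or first_donation >= end:
        # Never donated during follow-up: all person-time is main cohort
        return ((end - entry).days / DAYS_PER_YEAR, 0.0)
    # Donation cannot split follow-up earlier than the entry date
    cut = max(first_donation, entry)
    main_years = (cut - entry).days / DAYS_PER_YEAR
    validation_years = (end - cut).days / DAYS_PER_YEAR
    return (main_years, validation_years)


# Made-up example: follow-up 2000-2010, donation starting in 2004
donor = split_person_time(date(2000, 1, 1), date(2010, 1, 1), date(2004, 1, 1))
# Made-up example: same follow-up, individual never donated
non_donor = split_person_time(date(2000, 1, 1), date(2010, 1, 1), None)
```

Under this sketch, each individual's main and validation person-time always sum to the total follow-up time, so no person-time is counted in both cohorts.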