Main

Each year, malaria kills about 1 million children and causes debilitating illness in more than 500 million people1. Underlying this massive global health problem is a remarkable biological phenomenon, the co-evolution of three eukaryotic genomes2,3,4,5,6,7,8,9 (Table 1). The disease is caused by single-celled parasites of the genus Plasmodium, which invade, and reproduce in, human erythrocytes. The parasites are then transmitted from one person to another by blood-sucking mosquitoes of the genus Anopheles.

Table 1 Malaria involves three eukaryotic genomes

The evolutionary 'arms race' between the parasite, its vector and the human host is central to the problem of controlling disease. Plasmodium populations are continually evolving to resist antimalarial drugs and have sophisticated genetic mechanisms of evading the human immune system, presenting a major problem for the development of a vaccine against malaria2,7,10. Anopheles populations are likewise evolving to resist the insecticides that are used to control malaria, but they also have genetic defences against the parasite that might provide clues to new control strategies11,12. Malaria has also been a strong force for recent evolutionary selection in the human genome9,13, and uncovering all of the human genetic factors that confer resistance to malaria would provide clues to the molecular basis of protective immunity that would be invaluable for vaccine developers.

The genetic basis of human resistance to malaria can now be investigated systematically at the level of the whole genome, by using genome-wide association (GWA) analysis. In a typical GWA study, the genotype of thousands of individuals is determined at the positions of half a million or more single nucleotide polymorphisms (SNPs)14 (see page 728). The ultimate goal of GWA analysis is to uncover all of the DNA sequence variants that affect an individual's risk of disease, without sequencing the whole genome, by using statistical inferences based on common patterns of variation in the genome.

An important question that can be addressed by GWA analysis is why only some children develop severe malaria (that is, life-threatening forms of the disease15,16) in communities in which every child is repeatedly infected with Plasmodium falciparum, the species of parasite that is responsible for most deaths from malaria. Only a small proportion of P. falciparum infections progress to severe malaria, and epidemiological data indicate that about 25% of the risk is determined by human genetic factors17 (Box 1). A typical study design is to recruit individuals with severe malaria (cases) in a hospital setting and to recruit control individuals essentially randomly from the general population. By comparing allele frequencies at a set of SNPs between cases and controls, it is possible to estimate the effect of different sequence variants on an individual's risk of developing severe malaria. Because the risk of developing severe malaria is probably determined by many genetic and environmental factors operating at different stages of infection, the effect of any one factor might be small, so a large number of individuals must be studied to obtain statistically significant results.
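The case-control comparison described above can be illustrated for a single SNP. The following is a minimal sketch, not the analysis pipeline used by any particular study: the function name and the allele counts are invented for illustration, and it uses a simple allelic odds ratio with a Pearson chi-square test, which is only one of several tests that might be applied.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Compare allele counts at one SNP between cases and controls.

    Returns the odds ratio, a Woolf 95% confidence interval, the Pearson
    chi-square statistic (1 degree of freedom) and its p-value.
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    if min(a, b, c, d) <= 0:
        raise ValueError("all four allele counts must be positive")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Woolf's method: standard error of log(odds ratio).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (ci_low, ci_high), chi2, p_value


# Invented counts: 1,000 cases and 1,000 controls, two alleles per person.
odds_ratio, ci, chi2, p_value = allelic_association(300, 1700, 400, 1600)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
      f"chi2 = {chi2:.1f}, p = {p_value:.1e}")
```

An odds ratio below 1 in this sketch would mean that the minor allele is less common in cases than in controls, as expected for a protective variant. Because a GWA study repeats such a test at hundreds of thousands of SNPs, the threshold for declaring significance has to be far stricter than for a single test, which is one reason that large sample sizes are needed.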

Similar approaches could, in principle, be used to investigate the emergence and molecular basis of drug resistance in Plasmodium populations or insecticide resistance in Anopheles populations. However, this cannot be put into practice until genomic variation in Plasmodium and Anopheles populations is better understood3,18,19,20,21. A complicating factor for GWA studies of P. falciparum is that in a single infection, the parasites that are transmitted can have different genotypes, so an individual who is infected frequently can carry a parasite population of great genetic complexity. Another is that, in Africa, where malaria is most prevalent, the P. falciparum genome has low levels of linkage disequilibrium19,21. Linkage disequilibrium is a fundamental concept to consider in GWA analysis; it refers to the correlation between genotypes that is observed at neighbouring positions in the genome. The lower the level of linkage disequilibrium, the more positions in the genome need to be genotyped for an effective GWA study. Recent technological advances in massively parallel sequencing of single DNA molecules22 might help to overcome both of these problems, by enabling P. falciparum to be genotyped at a very large number of positions in the genome and by helping to distinguish the different parasite genotypes that can constitute a single infection.
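Linkage disequilibrium between two SNPs is commonly summarized by the squared correlation r² between their alleles. The sketch below is illustrative only: it assumes phased haplotypes with alleles coded 0 or 1 (data from diploid individuals would first need phasing or a genotype-based estimator), and the function name and example haplotypes are invented.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic SNPs, computed from phased haplotypes.

    Each haplotype is a pair (allele_at_snp_a, allele_at_snp_b), coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    return d * d / denom


# Invented examples: alleles always travel together, or are independent.
high_ld = [(1, 1)] * 4 + [(0, 0)] * 4
low_ld = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2
print(r_squared(high_ld), r_squared(low_ld))
```

When r² between a genotyped SNP and an ungenotyped variant is high, the genotyped SNP serves as a good proxy for the variant. When r² is low across most neighbouring pairs, as reported for P. falciparum in Africa, few such proxies exist and more positions must be genotyped directly.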

In this Commentary, we describe how a global research network has been established to investigate the effects of genomic variation in humans on the biology and pathology of malaria. We focus on the human genome because the tools for genotyping and the framework for population genetics are further advanced than those for Plasmodium and Anopheles species. More specifically, we outline the practical reasons why malaria is more challenging to study by GWA analysis than many other common diseases, and we describe how we have established several projects that bring together large-scale studies carried out in multiple locations to address key scientific questions. We also describe the procedures that we use for standardizing and integrating data from different investigators, as well as the policies that we have developed to deal with issues of sample and data ownership, data release, intellectual property and ethics.

Challenges of GWA studies of malaria

The genetic analysis of human resistance to malaria is challenging at several levels, ranging from the practical and ethical issues of clinical research in the developing world to the statistical genetic issues arising from the great diversity of the populations that are affected.

Recruiting a large number of individuals with severe malaria presents challenges because most of the burden of malaria falls on poor communities with underfunded health services and no systematic medical records. A considerable proportion of children with severe malaria die within hours of reaching a hospital; therefore, for the clinical phenotype of malaria to be classified properly, research information must be gathered at the time of hospital admission. This implies considerable responsibilities on the part of the research team for ensuring standards of medical care, particularly in a resource-poor setting. Also, it is not feasible to take large amounts of blood from children who are ill with malaria, many of whom are anaemic, so it is often necessary to use whole-genome amplification to obtain enough DNA for genotyping at numerous SNP positions. This can reduce genotyping efficiency and thus diminish statistical power, making an even larger sample size necessary23.

In addition, designing an appropriate SNP-genotyping strategy for GWA studies of malaria is complicated by the large amount of genomic variation in Africa. Because of the low levels of linkage disequilibrium in populations in Africa, genotypes need to be determined at the positions of more SNPs than in studies of European populations. On the basis of the initial data from the International HapMap Project (http://www.hapmap.org), it was estimated that a GWA study of about 1.5 million SNPs in an African population would be approximately equivalent to a study of 0.6 million SNPs in a European population, in terms of the ability to tag a high proportion of common sequence variants6. But it is difficult to estimate how many SNPs will be required to tag all common variants until resequencing studies have generated a comprehensive list of common sequence variants in different African populations24.

Furthermore, the ethnic diversity of African populations presents numerous statistical challenges for GWA studies. Many African communities consist of several ethnic groups, and minor differences in the ethnic composition of the case groups and the control groups can lead to false-positive genetic associations. To exclude such artefacts, studies need to be designed carefully, and statistical genetic methods that correct for population structure need to be applied25.
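A small numerical sketch shows how such a false-positive association can arise, and how a stratified analysis removes it. The counts are invented, and the Mantel-Haenszel estimator used here is a simple stand-in chosen for illustration; it is not necessarily the method described in the cited work, and it assumes that each individual's ethnic group is recorded.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table of allele counts.

    a, b: minor and major allele counts in cases;
    c, d: minor and major allele counts in controls.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata, each given as a tuple (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Invented allele counts for two ethnic groups. Within each group the allele
# is equally common in cases and controls, so there is no true association.
# The groups differ in allele frequency (0.1 versus 0.5) and contribute
# cases and controls in different proportions.
strata = [
    (80, 720, 20, 180),    # group 1: mostly cases, allele frequency 0.1
    (100, 100, 400, 400),  # group 2: mostly controls, allele frequency 0.5
]

pooled = tuple(sum(column) for column in zip(*strata))
crude = odds_ratio(*pooled)            # ignores ethnic group
adjusted = mantel_haenszel_or(strata)  # compares like with like

print(f"crude OR = {crude:.2f}, stratified OR = {adjusted:.2f}")
```

In this example, the crude analysis suggests a strongly protective allele purely because the case and control groups differ in ethnic composition, whereas the stratified estimate recovers the absence of any effect.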

Another problem arises from the genetic differences between populations in Africa, as opposed to within a single population. Signals of association are not expected to be constant across GWA studies carried out at different locations in Africa. For example, differences in haplotype structure can result in variable signals of association around a causal variant of a disease, particularly in genomic regions that have recently undergone evolutionary selection. Also, different populations can harbour different factors that confer resistance to malaria. One example of this is two resistance-associated forms of haemoglobin (haemoglobin S and haemoglobin C) that result from different SNPs at adjacent locations in HBB, the gene that encodes the β-chain of haemoglobin — these SNPs have different patterns of distribution in West Africa26,27.

But such differences between populations can also be highly informative. For example, they can aid in uncovering genetic factors that have evolved in specific populations and in investigating interactions between genes and the environment. Importantly, differences in the patterns of linkage disequilibrium between populations can help to distinguish a causal variant from neighbouring polymorphisms. This is necessary because many SNPs that have been associated with particular diseases are not the causal variants but show an association signal simply as a result of correlation with the causal variant (because of linkage disequilibrium). Thus, GWA studies carried out at multiple sites in Africa could provide a rich resource for identifying causal variants.

Developing a global research network

In the past, research into the human genetic factors that affect resistance to malaria has been characterized by multiple research groups each pursuing relatively small studies on their own samples. But the chance of making a discovery, and replicating the finding, is greatly increased if there are effective mechanisms for different research groups to share data and thereby enlarge the number of samples that are studied. The concept of forming a network for sharing data on the genomic epidemiology of malaria — which was to become the Malaria Genomic Epidemiology Network (MalariaGEN) — originated from work that was funded in 2003 by the Bill & Melinda Gates Foundation and by the UK Medical Research Council. The purpose of this funding was to develop web-based software that would allow the integration of clinical and genetic data collected by different research groups. This funding also supported a workshop on the ethical and ownership issues involved in sharing data, which was held in Accra, Ghana, in January 2004 and attended by scientists and clinical researchers from ten research groups in Africa.

MalariaGEN was established in 2005, with joint funding from the Bill & Melinda Gates Foundation (through the Foundation for the National Institutes of Health) and the Wellcome Trust, as part of the Grand Challenges in Global Health initiative28 (http://www.gcgh.org). The purpose of this joint funding was to discover mechanisms of protective immunity to malaria by combining analysis of human genome variation with large-scale epidemiological studies in malaria-endemic regions. Five objectives necessary for achieving this goal were identified: building a global network for sharing data on the genomic epidemiology of malaria; collecting DNA and clinical data from individuals with different phenotypes of malaria; characterizing genetic variation in populations in malaria-endemic regions; identifying genetic variants that provide protection against severe malaria; and defining the immunological mechanisms by which such genetic variants exert their protective effect.

The group of researchers who came together to tackle these objectives, the MalariaGEN investigators, are mainly leaders of clinical, epidemiological or immunological research projects in malaria-endemic areas, and they contribute samples and data to the MalariaGEN programme. Other MalariaGEN investigators contribute expertise and technical resources related to high-throughput analysis of genomic variation, statistical genetics or biomedical ethics. The host institutions of MalariaGEN investigators, the MalariaGEN partner institutions, are located in 15 malaria-endemic countries and 6 other countries (for additional information, see http://www.malariagen.net/resource/1), and the institutions in malaria-endemic countries have well-established study sites, where individuals are recruited to participate in research. Most of these study sites are in sub-Saharan Africa: in Burkina Faso, Cameroon, Gambia, Ghana, Kenya, Malawi, Mali, Nigeria, Senegal, Sudan and Tanzania. There are also MalariaGEN study sites in Papua New Guinea, Sri Lanka, Thailand and Vietnam.

To address the complexities involved in setting up such a global research network, MalariaGEN investigators agreed, at an inaugural meeting in Oxford, United Kingdom, in July 2005, to establish the network in four stages. The first stage was to establish a set of principles and processes, agreed by all investigators, to regulate a central resource of DNA samples and phenotypic data (Box 2 and see http://www.malariagen.net/resource/1). More specifically, this involved standardizing scientific definitions and procedures, enabling partners to gain secure access to the data resource via the Internet, and developing rules about data sharing, intellectual property and appropriate consent.

The second stage was to define a core scientific programme of large-scale experiments and statistical analysis, which would use data and expertise from multiple investigators, and the results of which would belong jointly to all of the investigators involved. Projects that are part of this core programme are called Consortial Projects (Box 2). There are four such projects so far, and each has a specific objective and a plan of action (Table 2). After a Consortial Project has been defined, each investigator decides whether he or she wishes to contribute to the project.

Table 2 MalariaGEN Consortial Projects

The third stage was to find ways of assisting investigators in malaria-endemic countries to develop clinical and epidemiological studies that would advance the core scientific programme. Investigators were invited to submit funding proposals for projects at their study sites that would contribute to Consortial Projects, using the research infrastructure of the local partner institution and founded on the scientific interests and expertise of the local investigators. Funding was allocated after proposals had been reviewed by a group of investigators that represented the network as a whole (with members from Cameroon, Gambia, Ghana, Italy, Kenya, Malawi, Mali, Sri Lanka, Sudan, Tanzania and the United Kingdom). This group evaluated both the scientific design and the feasibility of the clinical and epidemiological studies proposed, taking into account the infrastructure and expertise of the local partner institution and study site.

The fourth stage was to strengthen the capacity to manage data, and to carry out statistical and genetic analyses, at partner institutions in malaria-endemic countries. A fellowship programme in data analysis was established. After an open application process, a data fellow was appointed at each partner institution. Most of the MalariaGEN data fellows work on the team of a MalariaGEN investigator and have responsibilities for managing the team's data. All data fellows receive training and support in data management, statistical genetics and computing skills. This training is provided by a team of expert statisticians, geneticists and computer programmers who work at the MalariaGEN Resource Centre, which is based at two locations in the United Kingdom, at the University of Oxford and at the Wellcome Trust Sanger Institute near Cambridge. Members of the MalariaGEN Resource Centre organize regular data-analysis workshops, both in the United Kingdom and at partner institutions in malaria-endemic countries. These workshops provide structured teaching, together with an opportunity for data fellows to share their experiences and to analyse their own data with hands-on assistance from an expert.

Dealing with data

Sharing data is a simple concept but, when many investigators and partner institutions are involved, it can be complex to put into practice. There is the technical issue of how to amalgamate data from different research groups. There needs to be transparency about the ownership and permitted uses of the data and samples contributed by investigators. Procedures need to be established for releasing data and, where appropriate, for protecting intellectual property. This section outlines how MalariaGEN has dealt with each of these areas.

Standardizing and integrating data

Standardizing and integrating data from multiple study sites is central to MalariaGEN's mission. As an example, Consortial Project 1 (Table 2), which is the core project of MalariaGEN's programme, depends on there being a standardized clinical definition of severe malaria. Severe malaria consists of several overlapping clinical syndromes, often referred to as subphenotypes: these include cerebral malaria (which is characterized by coma), profound anaemia and respiratory distress. Some genetic factors confer resistance generally to severe malaria, whereas others might be specific for a subphenotype. The clinical definition of severe malaria therefore depends on a combination of observations, some of which (for example, respiratory distress) can be quantified less precisely than others (for example, anaemia, through measuring haemoglobin concentration), and there is ongoing research into how to minimize the diagnostic error rate. After consulting MalariaGEN investigators — and after a joint meeting with the Severe Malaria in African Children network16, in Yaoundé, Cameroon, in November 2005 — a standardized case report form was agreed (see http://www.malariagen.net/resource/1). This form is not intended to replace the case report forms used by individual investigators but rather to provide a template for extracting core information from different clinical data sets in a standardized manner, while giving investigators the freedom to collect data in the way that is most appropriate to their own research.

In a large research network, there will be site-to-site variation in the way in which clinical and epidemiological information is recorded and stored at the local level, so investigators and data fellows are encouraged to have an active role in data standardization and integration. This is facilitated by web-based software developed specifically for this purpose by the MalariaGEN Resource Centre. Investigators collect data using the database format that is best supported at their institution, and they periodically upload their data via a secure, password-protected interface to a personalized section of the MalariaGEN website, which cannot be accessed by others. Tools are provided for checking data integrity and for transferring data into the database for the relevant Consortial Project. The process of data transfer generally requires the investigator to recode or transform certain variables in their own data set to match the format of the project database, and the web-based software assists and documents this process.
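The kind of recoding involved can be sketched as follows. This is not the MalariaGEN software: every field name, code and unit below is hypothetical, invented only to show how local variables might be transformed to a shared format while a log documents each step. The sketch assumes that all raw values arrive as text, as they would from a spreadsheet export.

```python
# Hypothetical mapping from one site's local field names and codes to an
# assumed project format. None of these names come from MalariaGEN.
SITE_MAPPING = {
    "sex": ("sex", lambda v: {"M": "male", "F": "female"}[v.strip().upper()]),
    "hb_g_per_l": ("haemoglobin_g_per_dl", lambda v: round(float(v) / 10, 1)),
    "coma": ("coma", lambda v: v.strip().lower() in {"y", "yes", "1"}),
}


def standardize(record, mapping):
    """Recode one local record; return (standardized_record, log)."""
    standardized, log = {}, []
    for local_name, (target_name, transform) in mapping.items():
        if local_name not in record:
            standardized[target_name] = None
            log.append(f"{local_name}: missing in local data, left blank")
            continue
        raw = record[local_name]
        try:
            value = transform(raw)
        except (KeyError, ValueError):
            # Unrecognized code or non-numeric value: flag it for review.
            value = None
            log.append(f"{local_name}: could not recode {raw!r}, left blank")
        else:
            log.append(f"{local_name} -> {target_name}: {raw!r} recoded as {value!r}")
        standardized[target_name] = value
    # Fields outside the mapping, such as local identifiers, are not transferred.
    for local_name in record:
        if local_name not in mapping:
            log.append(f"{local_name}: not part of the project format, dropped")
    return standardized, log


local_record = {"local_id": "A-017", "sex": "f", "hb_g_per_l": "62", "coma": "Yes"}
standardized, log = standardize(local_record, SITE_MAPPING)
print(standardized)
for entry in log:
    print(entry)
```

Keeping the mapping as data rather than burying it in code means that each site's recoding decisions can be reviewed and repeated, which is in the spirit of a process that both assists and documents the transfer.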

MalariaGEN investigators are also working on the standardization of immunological assays as part of Consortial Project 2, which involves investigating the genetic determinants of the immune response in different populations and environmental settings (Table 2). In the first phase of this project, antibody measurements are being carried out at a central reference laboratory to ensure that data from different study sites can be directly compared. In the long term, the project seeks to develop robust methods and standardized reagents that will enable reference laboratories to be established at partner institutions.

Sharing data and establishing rules of ownership

MalariaGEN is a data-sharing community in which independent investigators with different projects and research objectives contribute to a central repository of DNA samples and a central database of core phenotypic data for each Consortial Project. General principles of data sharing and ownership were agreed at the inaugural meeting of MalariaGEN (Box 2 and see http://www.malariagen.net/resource/1). The major findings of each Consortial Project will be published in scientific journals, with all investigators who contributed to the project listed as authors. In addition, investigators are encouraged to analyse the data that have been generated from their own samples, and to incorporate any additional clinical or experimental data that they have for these samples; these analyses may then be published independently of the findings of the Consortial Project.

One of the most important considerations when building the database for each Consortial Project was protecting the anonymity of research participants. The MalariaGEN database contains no personal identifiers and is not linked to databases at local study sites. However, one of MalariaGEN's key principles is that investigators should be able to analyse data generated from samples that they contributed and to amalgamate these data with locally held phenotypic data. A standard operating procedure was therefore developed to ensure that the local databases held by partner institutions that contain data generated by MalariaGEN are designed and used according to appropriate ethical guidelines (see http://www.malariagen.net/resource/1).

Releasing data and protecting data as intellectual property

Because the scientific benefits of GWA studies are cumulative, the value of a single study can be increased substantially if the data for individual subjects are available to the wider scientific community, provided that the identity of these individuals is securely protected14,29. MalariaGEN's policy on this topic was developed in consultation with all MalariaGEN investigators and with ethics-review boards at several MalariaGEN partner institutions (see http://www.malariagen.net/resource/1). In broad terms, the data-release policy seeks to permit research that is consistent with the nature of informed consent and the uses of the samples agreed by the relevant ethics-review boards. A key concern that arose from the consultation was to guard against the data being used in a way that might lead to any form of ethnic stigmatization. Another concern was to ensure that the timeline for data release is fair for investigators in malaria-endemic countries who have contributed resources and data to a project, because these investigators generally have less capacity for analysing genetic data than researchers in rich countries. Balancing the benefits of prompt data release with the need to protect the interests of partner institutions, MalariaGEN's current policy is to release GWA data 9 months after contributing investigators have had access to the complete data set. Data are placed in the European Genotype Archive (http://www.ebi.ac.uk/ega) and are then made available on application to an independent data-access committee (as described on the MalariaGEN Data Access web page, http://www.malariagen.net/access). As an additional check and balance, a working group is being established to represent partner institutions and ethics-review boards in malaria-endemic countries, and this group will be kept informed about applications for access to data and consulted about any proposed changes to the data-release policy.

It was important for MalariaGEN to develop guidelines on the circumstances in which data should be protected as intellectual property before publication, with careful consideration of arguments for and against patenting discoveries30. On the one hand, if a scientific discovery could lead to health benefits, then every effort should be made to make these benefits available to those who need them most, a process that could involve patenting the discovery. On the other hand, there is an argument for releasing data as openly as possible when there are no immediate applications for improving health and when open access to the data could drive innovations that might lead to health benefits. Arguably, for genomic epidemiology data, the prompt release of scientific findings is, in general, the appropriate course of action, but occasionally there might be discoveries that are exceptions to this. MalariaGEN's current policy is that intellectual-property protection should be sought if all three of the following conditions are satisfied: the discovery must be directly relevant to a medical application; it must be probable that the intellectual property will be licensed for development immediately; and the discovery must have been shown to require intellectual-property protection as a stimulus for further development (see http://www.malariagen.net/resource/1). In such cases, intellectual property will be licensed to non-profit organizations if possible. And, if financial benefits arise, then MalariaGEN will seek to ensure that these benefits flow to the communities who participated in the research.

Engaging with ethical issues

A range of ethical and social issues arise in establishing a network to share data between investigators in many countries. Ensuring ethical standards for the conduct of clinical research in developing countries raises many complex issues31. And the accumulation of detailed genomic information about individuals is raising new questions for society in general32. This combination of ethical and social issues needs to be addressed appropriately33. MalariaGEN has therefore established a team with expertise in medical ethics, which works with investigators and partner institutions to assess the ethical and social issues at different study sites, with the aim of establishing best practices for the ethical conduct of research carried out by MalariaGEN. This ethics team also develops training materials for investigators and ethics-review boards and has held workshops in Kenya, Mali, Thailand and Vietnam. To support investigators in tackling specific ethical issues and to gain an understanding of local practices, members of the team have also visited study sites in Cameroon, Gambia, Ghana, Kenya, Malawi, Mali, Papua New Guinea, Senegal and Sudan.

One of the most important aspects of this work is to find effective ways of communicating with research participants34. For example, when a very sick child is brought from her village to a busy government hospital, and her parents are asked whether part of the diagnostic blood sample can be used for a research project, it is often difficult to convey the distinction between medical diagnosis and medical research. Terms such as 'research', 'genetics', 'laboratory' and 'database' might be meaningless unless a concise and effective way is found of translating these concepts into the local language (by using examples and metaphors drawn from local experience), without creating anxiety by information overload. After consulting investigators and ethics-review board members, MalariaGEN has developed a template and guidelines for obtaining informed consent from participants in genetic studies of resistance to malaria (see http://www.malariagen.net/resource/1). To understand how guidelines can be put into practice most effectively, the ethics team is also undertaking empirical research on the process of gaining informed consent at different study sites, with the objective of establishing best practice across MalariaGEN study sites, while being sensitive to local culture and practices.

The ethics team is also working to develop models of consultation at the community level that are appropriate for diverse cultural settings. A sensitive issue for many communities is the potential abuse of genetic data relating to ethnicity, which could result in stigmatization. Qualitative research is being carried out to understand the perspectives of communities and other stakeholders on the collection and use of information about ethnicity in genomic epidemiology projects. The aim is to develop guidelines for the publication and release of data about ethnicity that will provide the maximum scientific benefit while safeguarding the interests of participants and their communities.

Many of the ethical and social challenges confronting MalariaGEN stem from the diversity inherent in a large scientific enterprise with partners in rich and poor countries that span multiple disciplines, from clinical research and community-based research to state-of-the-art genomics and bioinformatics. Often, partners need to agree on an appropriate balance between standardization and shared practices on the one hand, and diversity and sensitivity to local circumstances on the other hand. MalariaGEN's procedures for data integration and guidelines for informed consent are examples of this process.

Looking forward

In September 2008, an ambitious plan for the elimination of malaria was announced — the Global Malaria Action Plan (http://www.rbm.who.int/gmap). This plan, which is supported by major international development agencies and governments around the world, seeks to halve the number of malaria cases worldwide by 2010 and to eliminate deaths from malaria almost completely by 2015. But it cannot succeed without effective insecticides and antimalarial drugs. And even if the plan's goals for the next decade are achieved, the chance of controlling and eliminating malaria over the long term will be greatly increased if an effective vaccine becomes available.

The new science of genomic epidemiology could assist these efforts to eliminate malaria, by providing more effective ways of monitoring the emergence of parasite resistance to antimalarial drugs and of mosquito resistance to insecticides, and by providing new leads for malaria vaccine development based on a better understanding of the natural mechanisms of protective immunity.

If genomic epidemiology is to make a contribution in this way, there need to be mechanisms in place to help researchers both in malaria-endemic countries and worldwide to pool their resources. Research groups in malaria-endemic countries need access to the technical expertise and infrastructure for the large-scale analysis of genomic variation. And research groups worldwide need to combine forces to analyse the massive amounts of data being generated by these studies, leading the way for important discoveries to be made. The MalariaGEN community is endeavouring to learn how to build and maintain the relationships, shared values and best practices that underpin this new type of scientific collaboration.