THE UNITED KINGDOM HUMAN GENOME MAPPING PROJECT RESOURCE CENTRE

M.J. BISHOP

INTRODUCTION
The UK Human Genome Mapping Project Resource Centre (HGMP-RC) is a UK Medical Research Council (MRC) Unit located on the Hinxton Hall Genome Campus established by the Wellcome Trust. The Campus is situated at Hinxton Hall, a 22 hectare parkland site 20 km south of the university city of Cambridge and 4 km east of the M11 motorway to London. In addition to the HGMP-RC, the site houses the Sanger Centre and the European Bioinformatics Institute (EBI), and will incorporate facilities for international conferences, a restaurant, sports and recreational facilities, and on-site accommodation for visitors.
The Sanger Centre, directed by John Sulston, is a new research centre established jointly by the Wellcome Trust and the MRC to provide a major focus in the UK for mapping and sequencing the human genome and the genomes of other organisms. Major projects include the sequencing of the nematode and yeast genomes and the physical mapping of the human genome. The informatics group develops new software, including ACEDB, which provides a highly graphical interface to genome data.
The EBI is an outstation of the European Molecular Biology Laboratory (EMBL) and will expand on the work of the EMBL Data Library, which was established in 1982 to build a database of nucleotide sequences and today supplies this and many other kinds of biological information to scientists throughout the world. The EBI will continue the work of the Data Library in preparing and distributing databases of biological interest, in collaboration with an extensive network of other European centres. It will expand in areas crucial to the long-term achievement of this service goal, including technology tracking in biology and informatics, research and development, training and user support. Specific areas for research will include comparison algorithms, networked information resources and new database design.
The HGMP-RC provides biological materials and services relating to the human and mouse genome projects. In addition, it provides an online computing service, user support and extensive training courses. The computing group is also involved in a number of collaborative development projects in genome informatics.
HGMP-RC is a member of the European Data Resource Project, along with the Centre de BioInformatique, CNRS-INSERM, Villejuif, France, and the Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Germany, which is the project coordinator. The purpose of the project is to provide computing services, training and workshops for the data-handling aspects of the Human Genome Analysis Programme funded by the Commission of the European Union.
HGMP-RC is also a member of the European Molecular Biology Network (EMBnet). This is a foundation of centres providing computing facilities for molecular biology. The majority of nodes are national nodes established in many European countries; the national node in the UK is SEQNET at the Daresbury Laboratory. In addition to the national nodes there are a number of specialist sites, of which the HGMP-RC is one. The project was established with funding from the BRIDGE programme of the Commission of the European Union.

HGMP-RC BIOLOGICAL SERVICES
The UK Human Genome Mapping Project Resource Centre provides specialist resources and services for scientists engaged in genome mapping and gene isolation studies. Access to the facilities is open to all registered users of the HGMP Resource Centre. Full registration is restricted primarily to UK- or EU-based academic users with a major interest in human genome mapping. Biological materials are provided free of charge to such users on the understanding that they return the results to the HGMP-RC. A modest charge is made to other categories of users.
The HGMP-RC is a repository for a collection of oligonucleotide primers. These include a set of 300 fluorescently labelled microsatellite markers designed by Dr. John Todd for human genome exclusion mapping. There are also defined primary and secondary mapping primers for the mouse genome, and a custom oligonucleotide synthesis service operates. The catalogue of primers is available to registered users on the online service, and to anyone via Gopher and the World Wide Web. An example of a query using the Xmosaic interface is shown in Figure 1.
A Probe Bank has been established at the Imperial Cancer Research Fund (ICRF) Clare Hall Laboratories, and the probes are distributed by the HGMP-RC. The Probe Bank has two objectives. The first is to supply DNA markers on request. The majority of those initially held in the Probe Bank detected RFLPs. Probes will be supplied in one of two formats: as bacteria containing recombinant plasmids (currently using DH5-alpha as host) or as purified DNA (approximately 10 µg of DNA). The second is to isolate new DNA markers, particularly to bridge some of the gaps in current genetic maps. We hope that this exercise can become an area of collaboration with other groups in the UK Human Genome Project. Any group with a particular interest in isolating new markers, or wishing to develop new strategies, should contact Dr. Nigel Spurr, UK DNA Probe Bank, ICRF Clare Hall Laboratories, Blanche Lane, South Mimms, Potters Bar, Hertfordshire, EN6 3LD, UK (Tel: +44 171 242 0200, Fax: +44 1707 649527, Email: ns@clh.icnet.uk).
The HGMP-RC houses five human total genomic libraries and supplies clones (some with chromosomal assignments) on request. A panel of monochromosomal somatic cell hybrids offers an additional physical mapping resource. This provides a method of assigning probes to a single chromosome, but it may not be totally reliable because fragments of other chromosomes may be present in the hybrids. A service to map cosmids or YACs to chromosomes by fluorescent in situ hybridisation is provided: the DNA is forwarded to external laboratories, which do the work and return the results to the HGMP-RC. Requests are encouraged from users who wish to characterise a set of related clones.
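To show why a hybrid panel assignment can be ambiguous, the following is a minimal sketch, not the HGMP-RC procedure, that assigns a probe from the set of hybrids giving a human-specific signal. It assumes each hybrid is recorded as retaining exactly one human chromosome; the hybrid names are hypothetical. A probe positive in more than one hybrid is flagged rather than assigned, since a contaminating fragment of another chromosome could explain the extra signal.

```python
from typing import Dict, Iterable, List, Tuple


def assign_chromosome(
    panel: Dict[str, str], positive_hybrids: Iterable[str]
) -> Tuple[List[str], str]:
    """Assign a probe to a chromosome from a monochromosomal hybrid panel.

    panel maps each (hypothetical) hybrid name to the one human chromosome
    it is assumed to retain; positive_hybrids are the hybrids that gave a
    human-specific signal with the probe.
    """
    positives = set(positive_hybrids)
    unknown = positives - set(panel)
    if unknown:
        raise ValueError(f"hybrids not in panel: {sorted(unknown)}")

    # Each positive hybrid implicates the chromosome it carries.
    candidates = sorted({panel[h] for h in positives})
    if not candidates:
        status = "no signal"
    elif len(candidates) == 1:
        status = "assigned"
    else:
        # Several chromosomes implicated: a fragment of another chromosome
        # retained in one hybrid may be responsible, so confirm independently.
        status = "ambiguous"
    return candidates, status


panel = {"HY-1": "1", "HY-7": "7", "HY-X": "X"}
print(assign_chromosome(panel, ["HY-7"]))          # (['7'], 'assigned')
print(assign_chromosome(panel, ["HY-7", "HY-X"]))  # (['7', 'X'], 'ambiguous')
```

An independent method, such as the in situ hybridisation service described above, would be the natural way to resolve an "ambiguous" result.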
Fifty 'full length' human and rodent cDNA libraries prepared from a variety of tissues are available as a 100 ng DNA ligation mixture together with an agar culture of the host strain for transformation.
A collection of 15,360 isolated cDNA clones from a foetal brain library and a foetal adrenal library has been gridded onto filters for hybridisation screening.
There is a collection of partially sequenced cDNA clones, details of which may be obtained from the cDNA database on the online service. The sequences may also be found in the EMBL or GenBank data files.
A human CpG island library is available (Cross et al., 1994). CpG islands are short stretches of DNA containing a high density of non-methylated CpG dinucleotides. Many are thought to lie at the 5' ends of genes (about 60% of genes are associated with one), and they often extend into the coding region. For this reason they are important tools for discovering and mapping human genes.
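The text does not describe how the library was constructed; the sketch below only illustrates the sequence-level idea of a CpG island. It computes the G+C fraction and the observed/expected CpG ratio, and applies the widely used Gardiner-Garden and Frommer style thresholds (at least 200 bp, G+C above 50%, observed/expected CpG above 0.6). Those thresholds are assumptions here, not criteria stated in this article.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Under independence the expected number of CpGs is C * G / N.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden and Frommer style thresholds (assumed defaults)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CG" * 100))                   # (1.0, 2.0)
print(looks_like_cpg_island("CCCCGGGG" * 25))  # False: G+C rich but CpG-poor
```

The second example shows why the ratio matters: a G+C-rich stretch is not an island unless CpG dinucleotides themselves are over-represented, which is what escaping methylation produces.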
For genetic mapping in the mouse, or for genetic mapping of other mammalian probes onto a mouse 0.3 cM anchor locus map, fifty individual mouse DNAs from the Mus musculus/Mus spretus interspecific backcross are available (in solution or dot blotted) for typing by PCR, RFLP analysis or hybridisation.
The European Human Cell Bank (EHCB) is a specialist collection of cell lines derived from patients with genetic disorders or chromosome abnormalities, held within the European Collection of Animal Cell Cultures (ECACC) established at the PHLS Centre for Applied Microbiology and Research, Porton, Wiltshire. Since May 1990 the EHCB has received a block grant from the HGMP to provide a Human Cell Banking Service. This includes a subsidised EBV transformation service for peripheral blood samples from patients with genetic disorders or chromosome abnormalities. Over 80 laboratories throughout the UK now contribute to the collection. Not all blood samples are EBV transformed immediately; some are held as untransformed lymphocytes until requested. The EHCB holds over 10,000 samples, approximately 6000 of which are already established as lymphoblastoid cell lines, many with HGMP funding. The EHCB also offers non-subsidised services, such as the provision of larger cell pellets for DNA extraction and of purified DNA. For further details of the services offered, or to obtain material, contact: Dr. Bryan Bolton, ECACC, CAMR, Salisbury, SP4 0JG, UK. Tel: +44 1980 612512. Fax: +44 1980 611315.

HGMP-RC INTERNET SERVICES
A number of services at the HGMP-RC are offered with unrestricted access to users of the Internet. These are file transfer (anonymous FTP), Gopher (an information server) and a World Wide Web (WWW) server, which provides hypertext information across networks with links to other sources of information, including Gopher, Wide Area Information Server (WAIS) and other WWW servers.
Files on the anonymous FTP server include various manuals, including the HGMP-RC Computing Handbook and the manuals for the Genome Data Base (GDB) and Online Mendelian Inheritance in Man (OMIM), as well as documentation for the programs in the online service menu.
Gopher is a distributed document delivery service. It allows people to access various types of data residing on multiple computer hosts in a seamless fashion, by presenting the user with a hierarchical set of menus of documents. A menu item may point to a document held on a Gopher server at another site, so the user can move between sites without needing to know where each document is stored. In addition to browsing through menus of documents, Gopher users can submit queries to do searches; the response to a query is a list of documents that match the search criteria. The contents of the HGMP-RC Gopher server are shown in Table 1.
The Xmosaic program displays pages of text in which some words are underlined or differentiated in colour. Clicking on the underlined text causes Xmosaic to retrieve the associated document from wherever it resides on the network. The top level of the HGMP-RC WWW server home page is shown in Table 2.
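To make the Gopher menu mechanism concrete, here is a minimal sketch of the client side of the protocol as specified in RFC 1436: each menu line carries an item type, display text, a selector, a host and a port, and a client fetches an item by opening a TCP connection to that host and port and sending the selector. The menu text, host names and selectors below are invented for illustration; they are not the HGMP-RC's actual menus.

```python
from typing import List, NamedTuple


class GopherItem(NamedTuple):
    item_type: str  # e.g. "0" text file, "1" menu, "7" search (RFC 1436)
    display: str    # text shown to the user
    selector: str   # opaque string sent back to the server to fetch the item
    host: str       # server holding the item (may be at another site)
    port: int


def parse_menu(text: str) -> List[GopherItem]:
    """Parse a Gopher menu: one TAB-separated item per line, ended by '.'."""
    items = []
    for line in text.splitlines():
        if line == ".":
            break  # a lone full stop terminates the menu
        fields = line[1:].split("\t")
        if not line or len(fields) < 4:
            continue  # skip blank or malformed lines
        display, selector, host, port = fields[:4]
        items.append(GopherItem(line[0], display, selector, host, int(port)))
    return items


def request_for(item: GopherItem, query: str = "") -> bytes:
    """Bytes a client sends after connecting to item.host:item.port."""
    if item.item_type == "7":
        # Search servers expect the selector, a TAB, then the search words.
        return f"{item.selector}\t{query}\r\n".encode("ascii")
    return f"{item.selector}\r\n".encode("ascii")


# Hypothetical menu; host names and selectors are invented.
menu = (
    "1Genome databases\t/db\tgopher.example.org\t70\r\n"
    "7Search primer catalogue\t/primers\tother-site.example.net\t70\r\n"
    ".\r\n"
)
for item in parse_menu(menu):
    print(item.item_type, item.display, "->", item.host)
```

Because every item names its own host, a menu on one server can list documents stored elsewhere, which is what makes the navigation between sites seamless for the user.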

UK MRC HGMP-RC WELCOME TO THE UK MRC HGMP RESOURCE CENTRE
This is a collection of useful locations and services.Most of the information is public, but the Menu is only available if you have registered as a user of the HGMP services.

HGMP-RC COMPUTING SERVICES
Any bona fide applicant will be allowed access to the computing facilities, unless and until it becomes clear that such unrestricted access is straining the capacity of the system.
The objectives of HGMP-RC computing are to establish and make available databases of genes, genetic markers and map locations, and to develop new computing environments and methods for the acquisition and analysis of such data.
Genetics and other databases, application software and miscellaneous services are available on the HGMP servers, and on other systems around the world, through the HGMP-RC Menu system. For a previous account of the system and services available see Rysavy et al., 1992. The fields of genetic mapping and molecular biology in general are changing fast, and for scientific workers to remain competitive it is essential to have access to the very latest information and programs. The HGMP-RC facility aims to provide this in as easy and intuitive a manner as possible. A dedicated facility can do this better and more comprehensively than the smaller resources available to individual departments or units.
Appropriate use is made of new technology to make access to information and services simple and effective. The facilities are connected to the SuperJANET academic research network and hence to the worldwide Internet, and there are excellent network connections to the HGMP-RC from academic sites both nationally and internationally. It is possible to access the HGMP-RC from your local desktop computer and then to use resources throughout the world.
The major relevant software packages are available and regularly updated. The HGMP-RC provides the latest versions of programs and data, often being aware of new developments before they have been publicly released. An outline of the scope of the online service is given in Table 3.
There is extensive online help for the programs. User support and user requirements are taken extremely seriously, and suggestions for improving the services offered are quickly acted on.

HGMP-RC COMPUTING TRAINING
Computer training courses are an important service of the HGMP-RC. The aim is to increase the effectiveness of a person's use of the computing resources and to illustrate particularly useful applications. Regular courses are held at sites around the UK on subjects ranging from general use of the programs at the HGMP-RC to advanced courses on specific applications. A book relating to the content of the courses has recently been published (Bishop, 1994).
There are regular general five-day courses (usually six per year). These give an excellent introduction to using the menu, to basic UNIX and editing skills, and to the most frequently used applications. In addition, there are specialised courses for particular subject areas and applications, as outlined in Table 4. To make the best use of the linkage programs, it is essential to attend one of the advanced three-day linkage courses.

HGMP-RC COMPUTING DEVELOPMENTS
The HGMP-RC is engaged in a number of computing developments, relating to biological projects and in collaboration with other centres. These are concerned with database development, the storage and maintenance of data, and the display of data in ways that are useful and intuitive for the user. The philosophy adopted is to use commercial relational database management systems, such as Sybase, Oracle and Ingres, for the storage and maintenance of data. Tools for displaying data are developing rapidly in sophistication and ease of use, and are becoming highly graphical and interactive. The Xmosaic program used for the WWW provides a uniform means of interfacing to a variety of databases. Where possible, tools developed elsewhere are reused with local data. Notable examples are the ACEDB program from Richard Durbin and Jean Thierry-Mieg and the "Encyclopedia of the Mouse Genome" software from the Jackson Laboratory.

YAC Consortium and Database
The aim is to provide YAC libraries, YAC clones and a service to map probes to YACs. Ordering YAC clones into overlapping contiguous stretches (contigs) is important to provide access to the physical DNA. The data are being compiled at ICRF in the Reference Library Database (RLDB), which has an Xmosaic interface. Ordering of the CEPH mega-YAC library is well advanced, and the data may be accessed at Genethon via the WWW.
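The article does not say how the RLDB orders clones. One widely used idea is marker (for example STS) content mapping, in which two YACs that contain the same marker are assumed to overlap. Under that assumption, the sketch below groups clones into contigs with a union-find structure. It does not order clones within a contig, and a single chimeric YAC would wrongly merge two contigs; clone and marker names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_contigs(yac_markers: Dict[str, Set[str]]) -> List[List[str]]:
    """Group YACs into contigs: clones sharing any marker are taken to overlap."""
    parent = {yac: yac for yac in yac_markers}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Union each YAC with the first YAC already seen to contain each marker.
    first_with_marker: Dict[str, str] = {}
    for yac, markers in yac_markers.items():
        for marker in markers:
            if marker in first_with_marker:
                root_a, root_b = find(first_with_marker[marker]), find(yac)
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                first_with_marker[marker] = yac

    contigs: Dict[str, List[str]] = defaultdict(list)
    for yac in yac_markers:
        contigs[find(yac)].append(yac)
    return sorted(sorted(members) for members in contigs.values())


# Hypothetical clone and marker names.
hits = {"y1": {"A", "B"}, "y2": {"B", "C"}, "y3": {"D"}, "y4": {"C", "E"}}
print(build_contigs(hits))  # [['y1', 'y2', 'y4'], ['y3']]
```

Here y1 and y4 share no marker directly but are joined through y2, which is how contigs grow as more clones are screened.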
All four major YAC libraries (CEPH mega-YAC, ICI, ICRF and St. Louis), plus some chromosome-specific libraries, are available for screening, either by PCR or by hybridisation to high density gridded filters.
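The pooling scheme used for PCR screening is not described here, so the sketch below assumes the simplest arrangement: DNA from each row and each column of a 96-well plate is pooled, each pool is tested by PCR, and positive clones are inferred from the intersections of positive pools. Real schemes add further pooling levels for larger libraries; the plate layout and well naming are assumptions.

```python
from itertools import product
from typing import Iterable, List

ROWS = "ABCDEFGH"       # rows of a 96-well microtitre plate
COLUMNS = range(1, 13)  # columns 1 to 12


def candidate_wells(positive_rows: Iterable[str],
                    positive_columns: Iterable[int]) -> List[str]:
    """Wells lying at the intersection of PCR-positive row and column pools."""
    rows = sorted(set(positive_rows))
    columns = sorted(set(positive_columns))
    if not set(rows) <= set(ROWS) or not set(columns) <= set(COLUMNS):
        raise ValueError("row or column outside a 96-well plate")
    # One positive clone lights up one row and one column pool; k positives
    # can give up to k * k candidates, which must then be retested singly.
    return [f"{r}{c:02d}" for r, c in product(rows, columns)]


print(candidate_wells(["C"], [7]))           # ['C07']
print(candidate_wells(["B", "F"], [3, 10]))  # ['B03', 'B10', 'F03', 'F10']
```

Twenty PCR reactions (8 rows plus 12 columns) thus stand in for 96, at the cost of ambiguity when a plate holds more than one positive clone.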
Data are collected centrally by ICRF and made available online. The RLDB software is developed by Dr. G. Zehetner. Users are able to request clones that have been previously characterised. The data are being used to construct a physical map, which will complement other genome maps and will also be made publicly available.

cDNA Consortium and Database
In the search for human genetic disease genes, localisation to within a few megabases can be achieved by genetic linkage, and hence to a few YAC clones. Candidate genes may then be obtained by finding cDNA clones that correspond to sequences contained in these YACs. The cDNA consortium provides cDNA libraries, cDNA clones, cDNA clones gridded on filters and a service to sequence cDNA clones of interest. There is also work on CpG island libraries, which consist of genomic DNA enriched for the 5' ends of genes. A cDNA database has been constructed for the project.
The cDNA database was developed by Mr. G. Williams at the HGMP Resource Centre. It is implemented in Sybase and contains information about the libraries, together with the results of BLAST searches for the clones that have been sequenced. The data are maintained by Dr. Y. Umrania. The database is available to registered users of the Resource Centre.
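The structure of the cDNA database is not given in the text. As a hedged illustration of how library details and BLAST results can sit together in a relational design, the sketch below uses Python's built-in sqlite3 module in place of Sybase; the table names, column names and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for Sybase.
SCHEMA = """
CREATE TABLE library (
    library_id INTEGER PRIMARY KEY,
    tissue     TEXT NOT NULL,
    species    TEXT NOT NULL
);
CREATE TABLE clone (
    clone_id   TEXT PRIMARY KEY,
    library_id INTEGER NOT NULL REFERENCES library(library_id)
);
CREATE TABLE blast_hit (
    clone_id   TEXT NOT NULL REFERENCES clone(clone_id),
    subject    TEXT NOT NULL,  -- database entry matched by the clone sequence
    score      REAL NOT NULL
);
"""


def make_db(libraries, clones, hits) -> sqlite3.Connection:
    """Create an in-memory database and load rows into the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO library VALUES (?, ?, ?)", libraries)
    conn.executemany("INSERT INTO clone VALUES (?, ?)", clones)
    conn.executemany("INSERT INTO blast_hit VALUES (?, ?, ?)", hits)
    return conn


def best_hits(conn: sqlite3.Connection, tissue: str):
    """Highest-scoring BLAST hit for each sequenced clone from one tissue."""
    # SQLite-specific: with a single MAX() aggregate, the bare column
    # h.subject is taken from the row holding the maximum score.
    return conn.execute(
        """
        SELECT c.clone_id, h.subject, MAX(h.score)
        FROM clone AS c
        JOIN library AS l ON l.library_id = c.library_id
        JOIN blast_hit AS h ON h.clone_id = c.clone_id
        WHERE l.tissue = ?
        GROUP BY c.clone_id
        ORDER BY c.clone_id
        """,
        (tissue,),
    ).fetchall()
```

Keeping hits in their own table lets one clone carry any number of matches, and lets a query join them back to the library and tissue of origin.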

Mouse Backcross Database
The mouse is a primary model organism for the Human Genome Project, and considerable emphasis has been placed on the genetic and physical mapping of the mouse genome worldwide. One of the primary goals of the mouse genome project is the development of a high resolution (< 1 cM) genetic map that will form the basis for the construction of a complete physical map of the mouse genome. EUCIB (the European Collaborative Interspecific Backcross) aims to provide the resources for high resolution genetic mapping of the mouse genome.
A 1000-animal interspecific backcross between Mus musculus C57BL/6 and Mus spretus has been completed and DNAs prepared. Each backcross progeny mouse has been scored for 3-4 markers per chromosome, completing an anchor map of 70 loci across the mouse genome. A 1000-animal cross provides a genetic resolution of 0.3 cM with 95% confidence. Completion of the anchor map allows the identification of pools of animals recombinant in individual chromosome regions, and allows rapid two-stage hierarchical mapping of new loci. New markers are first analysed through a panel of 40-50 mice in order to identify linkage to a chromosome region. The new marker is then analysed through a panel of mice identified as carrying recombinants within that chromosome region.
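The 0.3 cM figure can be checked with a short calculation. In a backcross each progeny mouse represents one informative meiosis, so if two loci have recombination fraction theta, the chance that none of n mice is recombinant between them is (1 - theta)^n. Requiring at least one recombinant with 95% probability, 1 - (1 - theta)^n = 0.95, gives theta = 1 - 0.05^(1/n). The sketch assumes the usual approximation that 1 cM corresponds to 1% recombination at such short distances.

```python
def resolution_cm(n_meioses: int, confidence: float = 0.95) -> float:
    """Smallest distance (cM) at which a cross of n_meioses shows at least
    one recombinant with the given probability.

    Solves 1 - (1 - theta)**n = confidence for the recombination fraction
    theta, then uses 1 cM ~ 1% recombination (valid for small distances).
    """
    theta = 1.0 - (1.0 - confidence) ** (1.0 / n_meioses)
    return 100.0 * theta


print(round(resolution_cm(1000), 2))  # 0.3
print(round(resolution_cm(50), 1))    # 5.8
```

The same formula gives roughly 6 cM for a 50-mouse panel, consistent with the first-stage panel being used only to place a new marker in a chromosome region before the recombinant mice from that region are typed.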
The backcross data are held in MBx, a database developed in SYBASE by Mr. D. Tailor. MBx is designed to store mouse, locus, probe and allele data. Allele data are presented as a scrollable matrix on the screen. When a new marker is analysed through the backcross, MBx provides lod score information to indicate possible linkage to a chromosomal region. With the aid of the allele matrix, mice recombinant in this chromosomal region may be selected for the second stage of hierarchical screening. In addition, at each stage MBx not only calculates the available lod scores for closely linked markers but also determines the genetic order with respect to closely linked markers by minimising the number of recombinants.
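MBx's internal algorithms are not described beyond the paragraph above, so the following is only a sketch of the two standard calculations it names. It assumes each mouse is typed as one letter per locus ('B' or 'S' for the allele inherited from the F1 parent, a hypothetical encoding, with '-' for untyped). It computes a two-point backcross LOD score at the maximum-likelihood recombination fraction, and finds the locus order with the fewest recombinants by brute force.

```python
import math
from itertools import permutations
from typing import Dict, List, Tuple

MISSING = "-"  # genotype not scored for this mouse


def typed_pairs(a: str, b: str) -> List[Tuple[str, str]]:
    """Genotype pairs for the mice typed at both loci."""
    return [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]


def lod_backcross(a: str, b: str) -> Tuple[float, float]:
    """Return (theta_hat, LOD) for linkage between two backcross loci.

    LOD = log10[theta^R (1 - theta)^(n - R) / 0.5^n] at theta = R / n,
    where n mice are typed at both loci and R of them are recombinant.
    """
    pairs = typed_pairs(a, b)
    n = len(pairs)
    r = sum(1 for x, y in pairs if x != y)
    if n == 0 or 2 * r >= n:
        return 0.5, 0.0  # no data, or no evidence for linkage
    theta = r / n
    lod = n * math.log10(2) + (n - r) * math.log10(1 - theta)
    if r:
        lod += r * math.log10(theta)  # the term vanishes when r == 0
    return theta, lod


def best_order(loci: Dict[str, str]) -> Tuple[List[str], int]:
    """Order loci to minimise the total recombinants between neighbours.

    Brute force over permutations, so only practical for a handful of
    closely linked loci. An order and its reverse are equivalent.
    """
    names = sorted(loci)
    if len(names) < 2:
        return names, 0
    best: Tuple[List[str], int] = ([], -1)
    for order in permutations(names):
        if order[0] > order[-1]:
            continue  # mirror image of an order already considered
        total = sum(
            sum(1 for x, y in typed_pairs(loci[p], loci[q]) if x != y)
            for p, q in zip(order, order[1:])
        )
        if best[1] < 0 or total < best[1]:
            best = (list(order), total)
    return best


theta, lod = lod_backcross("BBBBBSSSSS", "BBBBSSSSSS")
print(theta, round(lod, 2))  # 0.1 1.6
print(best_order({"A": "BBBBSSSS", "B": "BBBSSSSS", "C": "BBSSSSSS"}))
```

Minimising recombinants favours the order that avoids implying double crossovers between closely linked loci, which are rare over short distances.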
The genetic maps are derived from the data held in MBx and formatted for the Jackson Laboratory's Encyclopedia of the Mouse Genome (version 2.0) software. The maps are updated daily from the MBx database.

Integrated Genomic Database (IGD)
The Integrated Genomic Database (IGD) aims to provide access to constituent working databases, especially those associated with the EU and national European programmes. The project is run by Dr. O. Ritter at DKFZ in collaboration with Dr. H. Lehrach and Mr. S. Bryant at ICRF and Dr. M.J. Bishop at the HGMP-RC in the UK, and with Dr. J. Thierry-Mieg at CNRS in Montpellier, France. Dr. V. Markowitz at Lawrence Berkeley Laboratory (LBL) is collaborating on the provision of database tools. IGD will use the ACEDB software, run as a networked client/server system, as its graphical user interface. The first release was made in October 1994.
The project intends to build an integrated information system to handle human genome data. The system will collect data of interest into a comprehensive database and will provide users with a set of tools to retrieve, display, analyse and edit the information on their local computers. Interfaces to software for sequence and structure analysis, genetic linkage analysis and physical map assembly will also be part of the system. Figure 2 shows correlations of maps on chromosome 21.
The project will design a comprehensive database for genome-related conceptual and experimental objects, and populate it with data from major public databases, including the Protein Data Bank (PDB), the EMBL Data Library, GenBank, Swiss-Prot, the Protein Identification Resource (PIR), Prosite, Enzyme, REBASE, GDB, OMIM, Entrez, Seqanalref, the Mouse Genome Database (MGD), the Mouse Encyclopedia and others. Besides consensus information on genomic loci, maps and phenotypes, we will import raw data from experimental databases and projects, including RLDB, cDNA, DNA ProbeBank, CEPH/Genethon, Mouse Backcross and other sources.
There will be an interface between the integrated database and external analysis software such as GCG, LINKAGE, SIGMA and CRI-MAP.
The system will provide graphical displays for complex objects such as chromosome genetic and physical maps, clone grids and sequence feature maps. Using mainly the mouse, users will navigate through the data and invoke operations in a consistent and intuitive manner.
The IGD system is designed to be open and extensible. We will provide tools for evolutionary extension of the database schema, for rapid import of data from new databases, for the development of interfaces to new analysis programs and for the implementation of new display methods.
The IGD front-end will use software components that are available at no cost and portable over a wide variety of hardware platforms, ranging from PCs through UNIX workstations to supercomputers. Upon completion, the IGD system will provide genome researchers with powerful tools to retrieve reference data, manage their own experimental data, carry out analyses, exchange data with collaborators and submit data to resource databases, all in a single framework that is freely available, widely portable and easily extensible.

Comparative Mapping Database
In addition to human and mouse gene mapping, there is considerable interest in other mammals, including the farm animals (ox, pig, sheep), with the aim of making improvements of commercial value. The PIGMAP project involves 17 European laboratories and plans to place 200 polymorphic genetic markers on the pig genome to give a resolution of 20 cM. The database of the PIGMAP project is PIGBASE, which is compiled at the Roslin Institute, Edinburgh. Figure 3 shows a comparison of genetic linkage and cytogenetic maps for pig chromosome 6.
A comparative mapping database is being compiled at Roslin in collaboration with the HGMP-RC to allow easy comparison of gene order in mammals. Gene order appears to be quite highly conserved, and this gives clues about where to look for a gene that has been localised in one species but not yet in another. It also provides an interesting legacy of chromosomal rearrangements during vertebrate evolution, which may shed light upon phylogenetic relationships.
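The text does not describe how the Roslin comparative database is queried. Assuming gene order is conserved across the relevant segment, the sketch below illustrates how conserved order gives clues about where to look: a gene mapped in species A is bracketed in species B by its nearest neighbours that have already been mapped there. Gene names and positions are invented; when the flanking genes fall on different chromosomes, the sketch declines to predict, since that pattern may mark a rearrangement.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, float]  # (chromosome, position in cM)


def predict_location(order_a: List[str], map_b: Dict[str, Location],
                     gene: str) -> Optional[Tuple[str, float, float]]:
    """Suggest an interval in species B for a gene mapped only in species A.

    order_a lists genes in map order along one chromosome of species A;
    map_b holds genes already mapped in species B. The nearest flanking
    genes of `gene` that are mapped in B bracket the predicted interval.
    """
    i = order_a.index(gene)  # raises ValueError if gene is not on the map
    left = next((g for g in reversed(order_a[:i]) if g in map_b), None)
    right = next((g for g in order_a[i + 1:] if g in map_b), None)
    flanks = [map_b[g] for g in (left, right) if g is not None]
    if not flanks:
        return None  # no mapped neighbours to borrow
    chromosomes = {chrom for chrom, _ in flanks}
    if len(chromosomes) != 1:
        return None  # flanks disagree: possible rearrangement breakpoint
    positions = [pos for _, pos in flanks]
    return chromosomes.pop(), min(positions), max(positions)


# Hypothetical genes and positions, for illustration only.
order_human = ["G1", "G2", "G3", "G4"]
pig_map = {"G1": ("6", 10.0), "G2": ("6", 14.0), "G4": ("6", 22.0)}
print(predict_location(order_human, pig_map, "G3"))  # ('6', 14.0, 22.0)
```

The refusal to predict across disagreeing flanks is the same evidence that, collected over many genes, records the chromosomal rearrangements mentioned above.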

Figure 1. The Xmosaic interface used to query the HGMP-RC Primers database.

These are the CEPH, ICI, ICRF and St. Louis YAC libraries and the ICRF P1 library. Screening of the ICI YAC library, by hybridisation or by PCR, is facilitated by the provision of high density gridded colony filters and DNA clone pools of varying complexity.

Human genome data
OMIM (McKusick's Online Mendelian Inheritance in Man)
GDB/OMIM (Human Genome Database from Johns Hopkins)
IGD (Integrated Genome Database from DKFZ)
GV (GnomeView - GDB and GenBank Genome database)
SIGMA (System for Integrated Genome Map Assembly)
Genethon Map Data (Genethon Human CEPH YAC Map)
LDB (Morton's Location Database)
MHCDB (Database of the human MHC)
Mouse genome data
MGD (Jackson Laboratory's Mouse Genome Database)
Mouse Genetics and Maps (MRC Radiobiology Unit Databases)
The Encyclopedia of the Mouse (Mouse Genome Database)
MIT (Whitehead/MIT Genome Center Genetic Map of the Mouse)
MBx (Mouse Backcross Database)
Other genome data
ACeDB (A Caenorhabditis elegans Database)
FLYBASE (Drosophila Database)
EcD (Escherichia coli Database)
AAtDB (An Arabidopsis thaliana Database)
AScDB (A Saccharomyces cerevisiae Database)

Figure 2. The correlation of cytogenetic, genetic, restriction and overlapping clone maps for human chromosome 21 displayed by the ACEDB software as part of IGD.

Figure 3. Genetic linkage and cytogenetic maps for pig chromosome 6 displayed by the Xmosaic software.

Table 2. The HGMP-RC WWW Home Page.

Table 3. Outline contents of the HGMP-RC online service menu.

Table 4. List of computing courses held at the HGMP-RC.