The cloning, DNA sequence, and overexpression of the gene araE coding for arabinose-proton symport in Escherichia coli K12.

A lambda placMu1 insertion was made into araE, the gene for arabinose-proton symport in Escherichia coli. A phage containing an araE'-'lacZ fusion was recovered from the lysogen and its restriction map compared with that of the 61-min region of the E. coli genome to establish the gene order thyA araE orf lysR lysA galR; araE was transcribed toward orf. A 4.8-kilobase SalI-EcoRI DNA fragment containing araE was subcloned from the phage lambda d(lysA+ galR+ araE+) into the plasmid vector pBR322. From this plasmid a 2.8-kilobase HincII-PvuII DNA fragment including araE was sequenced and also subcloned into the expression vector pAD284. The araE gene was 1416 base pairs long, encoding a hydrophobic protein of 472 amino acids with a calculated Mr of 51,683. The amino acid sequence was homologous with the xylose-proton symporter of E. coli and the glucose transporters from a human hepatoma HepG2 cell line, human erythrocytes, and rat brain. The overexpressed araE gene product was identified in Coomassie-stained sodium dodecyl sulfate-polyacrylamide gel electrophoresis gels of cell membranes as a protein of apparent Mr 35,000 ± 1,150. Arabinose protected this protein against reaction with N-ethylmaleimide.

Genetical Techniques-The E. coli strains, phages, and plasmids are listed in Table I. General procedures were performed as described by Miller (10), and infections with phage λplacMu1 used the method of Bremer et al. (11). Liquid cultures were grown in the minimal medium of Ashworth and Kornberg (12), as modified by Henderson et al. (13), supplemented with 20 mM glycerol, 10 or 1 mM arabinose, and 80 µg/ml amino acids as appropriate. Induction of lysogens of λplacMu1 by ultraviolet light was performed as described by Davis and Henderson (14). Because there was no DNA homology at either end of the phage, excision occurred by illegitimate recombination; the process was therefore nonrandom and of low frequency (10⁴-10⁶ plaque-forming units/ml). Lac+ phages were identified by plating on a lawn of a lac strain containing X-gal¹ (25 µg/ml) and inducer. Spi- phages (15, 16) were selected on a lawn, in Tryptone soft agar, of the P2 lysogen RB341 on minimal succinate (20 mM) plates. Transformations were performed by the method of Hanahan (17).
Although the araA or araB mutation relieved the toxicity of arabinose conferred by the araD mutation (18), growth on certain substrates (including glycerol) was inhibited by arabinose. To overcome this inhibition, minimal medium cultures were grown as described (13) with glycerol as the carbon and energy source to a culture density of 0.3-0.45 mg dry mass cells/ml (15-17 h at 30 °C) and then induced with arabinose (final concentration 1 mM) for 1 h, by which time the culture had reached a density of 0.5-0.65 mg dry mass/ml. The cells were then harvested and prepared for transport measurements and enzyme assays.
Construction and Characterization of ara::λplacMu Strains-E. coli strain JM2433 (Table I) has a double deficiency in enzymes of arabinose metabolism and a deletion through the lac operon. It is therefore a suitable recipient in which to select for infection by λplacMu (conferring a Lac+ phenotype) and subsequently to screen for arabinose inducibility of growth on lactose, i.e. for in-frame insertions downstream of an arabinose promoter.
Strain JM2433 was infected with λplacMu1 and the integration helper λpMu507 (11). Colonies selected on arabinose (1 mM) plus lactose (5 mM) as carbon sources (and therefore expressing the lac genes) were replicated onto lactose minimal medium and onto arabinose plus lactose. Five colonies out of 167 grew on arabinose plus lactose, but not on lactose alone. The λplacMu insertions in these strains could lie downstream of any of three known arabinose promoters: the ara operon located at 1 min (7, 19), the araE transport gene(s) at 61 min (4, 6, 20), or the araF,G operon at 45 min (2, 21). To identify insertions in the araE gene, its known co-transducibility with lysA (3, 6, 7) was tested (Table II). The insertion in strain JM2443 was co-transducible with lysA from strain CBK140 and was therefore likely to be in araE. (The insertions in strains JM2442 and JM2444 were similarly shown to map in the araA,B,C,D region.) Given that the insertion is in the araE gene, the gene order with respect to both adjacent genes, lysA and thyA, was determined as follows. The Lys- NmR derivatives of strain JM2443 were also tested

¹ The abbreviations used are: X-gal, 5-bromo-4-chloro-3-indolyl β-D-galactoside; kb, kilobases; SDS, sodium dodecyl sulfate.

TABLE II
Location of a λplacMu insertion near lysA
Insertions of λplacMu that caused an arabinose plus lactose-positive phenotype were isolated (see text). Phage P1 grown on strain K10 (wild type) was used to transduce these recombinants to Ara+, and such transductants were shown to retain arabinose-inducible β-galactosidase. Next, P1 grown on strain CBK140 (lysA::Tn5) was used to transduce the mutants to neomycin resistance (NmR). These recombinants were tested for their ability to grow on arabinose plus lactose. Only Lys- transductants were considered.

thyA. This result agrees with that of Kolodrubetz and Schleif (4), but contradicts that of Macpherson et al. (6).
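The co-transduction test can be made quantitative. The following is a minimal sketch, assuming Wu's formula C = (1 - d/L)³ for P1 transduction and a transducing fragment length L of about 2 min of the E. coli map; the function names and counts are hypothetical and are not the data of Table II.

```python
def cotransduction_frequency(linked: int, selected: int) -> float:
    """Fraction of selected transductants (e.g. NmR from lysA::Tn5) that also carry the unselected marker."""
    if selected <= 0:
        raise ValueError("need at least one selected transductant")
    return linked / selected


def wu_distance(freq: float, fragment_min: float = 2.0) -> float:
    """Invert Wu's formula C = (1 - d/L)**3 to estimate the map distance d in minutes.

    fragment_min (L) is an assumed P1 transducing-fragment length, not a value from this study.
    """
    if not 0.0 < freq <= 1.0:
        raise ValueError("frequency must lie in (0, 1]")
    return fragment_min * (1.0 - freq ** (1.0 / 3.0))


# Hypothetical counts: 40 of 100 NmR transductants inherited the unselected marker
c = cotransduction_frequency(linked=40, selected=100)
print(f"cotransduction {c:.2f}, distance ~{wu_distance(c):.2f} min")
```

Because C falls off as a cube, modest differences in co-transduction frequency correspond to small map distances, which is why such tests can order closely linked genes such as thyA, araE, and lysA.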
The Preparation of a λ Phage Containing the araE Gene-The lysogen JM1647 contained two phages, λd(araE+ lysA+ galR+) and λcI857 S7 as integration helper (6). Since the phages were cI857, they were induced by heating an exponential culture of the lysogen to 42 °C. The phages present in the mixed suspension were separated by cesium chloride buoyant density centrifugation (below). The denser phage, λd(araE+ lysA+ galR+), was about 50 kb in length.
Induction of λcI857 Lysogens-A 5-ml inoculum was grown overnight with shaking at 33 °C and diluted 100-fold into L-medium. This was incubated at 33 °C with shaking to a culture density of about 0.3 mg dry mass/ml (absorbance, 0.45). The temperature was raised rapidly to 42 °C by the addition of an equal volume of L-medium at 52 °C. Incubation was continued for 1.5-3 h. The resulting cells were then used to prepare phage (10) or membranes (see below) as appropriate. For plasmid-containing strains, ampicillin (100 µg/ml) was included in all steps until the heat shock.
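The rapid temperature shift relies on simple mixing arithmetic. This sketch assumes no heat loss and equal heat capacities for the two portions of L-medium; the 50-ml volume is an arbitrary example, not a value from the protocol.

```python
def mixed_temperature(v1_ml: float, t1_c: float, v2_ml: float, t2_c: float) -> float:
    """Temperature after mixing two volumes of the same medium (no heat loss assumed)."""
    return (v1_ml * t1_c + v2_ml * t2_c) / (v1_ml + v2_ml)


def hot_volume_needed(v1_ml: float, t1_c: float, t_hot_c: float, t_target_c: float) -> float:
    """Volume of hot medium to add so that the mixture reaches t_target_c."""
    if not t1_c < t_target_c < t_hot_c:
        raise ValueError("target must lie between the two starting temperatures")
    return v1_ml * (t_target_c - t1_c) / (t_hot_c - t_target_c)


# Equal volumes at 33 °C and 52 °C, as in the protocol
print(mixed_temperature(50, 33, 50, 52))   # 42.5
print(hot_volume_needed(50, 33, 52, 42))   # 45.0
```

An equal volume at 52 °C therefore overshoots 42 °C by only about 0.5 °C, which in practice is offset by heat loss to the flask.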
DNA Preparations-Phages were propagated in the lytic cycle as described by Maniatis et al. (22). Remaining cells were lysed by the addition of chloroform. DNase and RNase were added (each to about 1 µg/ml) and the suspension was incubated for 1 h at room temperature. The phages were concentrated by precipitation with polyethylene glycol 6000, resuspended in λdil (22), and sedimented at 40,000 × g for 3 h. Phages in mixed lysates were separated by a cesium chloride block gradient (3/5 M) followed by an isopycnic cesium chloride (4 M) gradient. Phage DNA was isolated using the formamide extraction procedure of Davis et al. (23).
Plasmid DNA was extracted by treating cells with lysozyme and Triton X-100 as follows.² An overnight culture (5 ml) of plasmid-containing organism in L-medium (22) plus ampicillin (100 µg/ml) was harvested and washed once in 0.1 M NaCl (5 ml). The cell pellet was resuspended in 10 mM Tris acetate (pH 8.0), 10% sucrose (200 µl), and the suspension was placed in a microcentrifuge tube on ice. The following three additions were made at 5-min intervals on ice: 5 mg/ml lysozyme (10 µl); 100 mM Tris acetate, 100 mM EDTA (pH 7.7) (100 µl); 1% Triton X-100 (200 µl). The mixture was centrifuged at 12,000 × g for 15 min to sediment chromosomal DNA, and the plasmid DNA was obtained from the supernatant by phenol/chloroform extraction followed by ethanol precipitation (22).
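The final concentrations in the lysis mixture follow from C₁V₁ = C₂V₂ summed over the four additions. This is a minimal sketch that ignores the volume of the cell pellet; the data structure is a hypothetical encoding of the steps above.

```python
from collections import defaultdict

# (volume added in µl, {component: stock concentration}) in order of addition
ADDITIONS = [
    (200, {"Tris acetate (mM)": 10, "sucrose (%)": 10}),   # cell resuspension
    (10, {"lysozyme (mg/ml)": 5}),
    (100, {"Tris acetate (mM)": 100, "EDTA (mM)": 100}),
    (200, {"Triton X-100 (%)": 1}),
]


def final_concentrations(additions):
    """Apply C1*V1 = C2*V2 to every component; returns (total volume in µl, concentrations)."""
    total_ul = sum(volume for volume, _ in additions)
    amounts = defaultdict(float)
    for volume, stocks in additions:
        for component, stock in stocks.items():
            amounts[component] += stock * volume  # amount contributed by this addition
    return total_ul, {c: amount / total_ul for c, amount in amounts.items()}


total, conc = final_concentrations(ADDITIONS)
print(f"total volume {total} µl")
for component, value in conc.items():
    print(f"{component}: {value:.3g}")
```

In the 510-µl mixture this gives roughly 0.1 mg/ml lysozyme, 20 mM EDTA, and 0.4% Triton X-100.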

Restriction Analysis-Restriction enzymes were obtained from New England Biolabs, Bethesda Research Laboratories, and Amersham International. Digestions were performed as described by Maniatis et al. (22), and the products were separated by electrophoresis on 0.7 or 0.9% agarose gels (depending on fragment size) in 40 mM Tris acetate, 1 mM EDTA (pH 7.7). DNA fragments were purified by the method of Dretzen et al. (24).
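Comparing restriction maps amounts to predicting fragment sizes from cut-site coordinates on linear (phage) or circular (plasmid) DNA. A minimal sketch follows; the pBR322 coordinates used in the example (4,361 bp; EcoRI at 4359, SalI at 651) are taken from the commonly used pBR322 map and should be checked against the reference map in use.

```python
def fragment_sizes(cuts, length, circular=False):
    """Fragment lengths (bp, largest first) produced by cutting at the given coordinates.

    A circular molecule cut once is simply linearized.
    """
    positions = sorted(set(cuts))
    if any(not 0 <= p < length for p in positions):
        raise ValueError("cut coordinate outside the molecule")
    if not positions:
        return [length]
    if circular:
        sizes = [b - a for a, b in zip(positions, positions[1:])]
        sizes.append(length - positions[-1] + positions[0])  # fragment spanning the origin
    else:
        bounds = [0] + positions + [length]
        sizes = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return sorted(sizes, reverse=True)


# pBR322 cut with EcoRI and SalI (coordinates assumed from the standard map)
print(fragment_sizes([4359, 651], 4361, circular=True))  # [3708, 653]
```

The predicted sizes can then be matched against bands on the 0.7 or 0.9% agarose gels, remembering that fragments below a few hundred base pairs resolve poorly on low-percentage gels.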
Subcloning of the araE Gene into an Expression Vector-A 2.8-kb HincII-PvuII restriction fragment containing araE (see text) was isolated and ligated into the expression vector pAD284. The ligation mixture was transformed into strain AD5230 (Table I) and spread onto ampicillin plates. The resultant recombinant plasmids were screened for the N- (Gal- at 42 °C) phenotype, and the DNA was extracted and examined by restriction digestion. The plasmids were then transferred from strain AD5230 into strain AD5827 (N+, cI857), a host that provided the antiterminator and thus allowed efficient expression from PL (15).

TABLE III
Association of β-galactosidase activity with a membrane fraction of E. coli strain JM2443
Cultures were grown as described under "Experimental Procedures" to a density of about 0.5 mg dry mass/ml. They were harvested, washed, and resuspended to about 20 mg/ml in 200 mM Tris-Cl (pH 8.0). The cell suspension was passed once through a French press at 20,000 p.s.i. and centrifuged at 1,500-3,000 × g to remove unbroken cells and larger cell debris. This cell-free extract was further centrifuged at 145,000 × g to obtain a cytoplasmic fraction (supernatant) and a membrane fraction (pellet resuspended to the same volume as the cytoplasmic fraction). All operations were performed at 0-5 °C except the β-galactosidase assays on the extract and fractions ("Experimental Procedures"). Strain EJ18 has an operon fusion, i.e. cytoplasmic β-galactosidase (25
Numbers in parentheses are µmol/mg/min.

The Arabinose-H+ Symport Protein 8005
Transport Assays-The accumulation of 14C-radiolabeled sugars and sugar-promoted pH changes in bacterial suspensions were measured as described by Henderson et al. (13) and Henderson and Macpherson (8).
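Sugar-promoted pH changes are usually converted into proton uptake by calibrating the lightly buffered suspension with a known pulse of acid. The following is a minimal sketch of that arithmetic, assuming a linear pH response over the small range measured; the function names and numbers are hypothetical, and the exact calibration procedure of Refs. 8 and 13 may differ.

```python
def buffering_factor(acid_added_nmol: float, ph_drop: float) -> float:
    """nmol H+ per pH unit, from a calibration pulse of a known amount of HCl."""
    if ph_drop <= 0:
        raise ValueError("calibration pulse must lower the pH")
    return acid_added_nmol / ph_drop


def proton_uptake(alkaline_shift: float, factor_nmol_per_ph: float, dry_mass_mg: float) -> float:
    """Sugar-promoted H+ uptake in nmol H+/mg dry mass, assuming a linear pH response."""
    return alkaline_shift * factor_nmol_per_ph / dry_mass_mg


# Hypothetical trace: 50 nmol HCl lowers pH by 0.05; arabinose raises it by 0.02 in 2 mg cells
factor = buffering_factor(50, 0.05)
print(proton_uptake(0.02, factor, 2.0))
```

Expressing uptake per mg dry mass is what allows values such as those quoted for strains JM2433 and araA mutants to be compared directly.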
β-Galactosidase Assays-Liquid cultures were lysed and assayed for β-galactosidase by the method of Miller (10). Assays on plate-grown colonies were done as described by Davis et al. (25).
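Miller's method expresses activity of permeabilized cultures in Miller units. This sketch implements Miller's standard formula; whether results in this study were reported in these units is not stated here (Table III quotes specific activities), and the example absorbances are hypothetical.

```python
def miller_units(a420: float, a550: float, a600: float, minutes: float, volume_ml: float) -> float:
    """β-galactosidase activity in Miller units.

    a420: o-nitrophenol absorbance; a550 corrects A420 for light scattering by cell debris;
    a600: culture density; minutes: reaction time; volume_ml: culture volume assayed.
    """
    if minutes <= 0 or volume_ml <= 0 or a600 <= 0:
        raise ValueError("time, volume and culture density must be positive")
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * a600)


# Hypothetical assay: 0.1 ml of culture (A600 = 0.5) incubated for 20 min
print(miller_units(a420=0.5, a550=0.0, a600=0.5, minutes=20, volume_ml=0.1))
```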
Labeling of AraE Protein with Radioactive N-Ethylmaleimide-This was performed by the dual-isotope method described by Henderson and Macpherson (8), except that whole cells, rather than vesicles, were labeled and bacterial membranes were then prepared from them (see below). The preferentially labeled proteins were separated by SDS-polyacrylamide gel electrophoresis on 15% gels, which were electroblotted onto cellulose nitrate paper for slicing and determination of radioactivity by scintillation counting (8).
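In the dual-isotope method the protected protein is recognized by an elevated isotope ratio in the gel slices that contain it. This is a minimal sketch, assuming that [³H]N-ethylmaleimide labels the sample protected by arabinose and [¹⁴C]N-ethylmaleimide the unprotected control (the assignment in Ref. 8 may differ); it ignores the correction for ¹⁴C counts spilling into the ³H channel that real dual-label counting requires, and the counts are hypothetical.

```python
from statistics import median


def ratio_profile(h3_cpm, c14_cpm, min_c14=50.0):
    """3H/14C ratio per gel slice, normalized to the median ratio; None for low-count slices."""
    ratios = [h / c if c >= min_c14 else None for h, c in zip(h3_cpm, c14_cpm)]
    valid = [r for r in ratios if r is not None]
    if not valid:
        raise ValueError("no slice has enough 14C counts")
    baseline = median(valid)
    return [None if r is None else r / baseline for r in ratios]


def protected_slices(h3_cpm, c14_cpm, threshold=2.0):
    """Indices of slices whose normalized ratio reaches the threshold."""
    profile = ratio_profile(h3_cpm, c14_cpm)
    return [i for i, r in enumerate(profile) if r is not None and r >= threshold]


# Hypothetical counts (cpm) for five slices
h3 = [100, 110, 95, 400, 105]
c14 = [100, 100, 100, 100, 100]
print(protected_slices(h3, c14))  # [3]
```

Normalizing to the median ratio makes the comparison insensitive to the overall labeling efficiency of each isotope.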
Preparation of Bacterial Membranes and Cytoplasm-Spheroplasts were prepared by the method of Witholt et al. (26) and lysed by osmotic shock in ice-cold deionized water (8). The membrane and cytoplasmic fractions of this preparation were separated by centrifugation (40,000 × g for 20 min at 4 °C). The membrane pellet was resuspended, and the supernatant was freeze-dried to concentrate the cytoplasmic proteins.
Separation of Proteins-SDS-polyacrylamide gel electrophoresis was performed as described by Henderson and Macpherson (8).
Fluorography-Amplify (Amersham International) was used according to the manufacturer's instructions before drying the gel and exposing it to x-ray film at -80 °C.
DNA Sequencing-The dideoxy chain termination method of Sanger et al. (27) was used, with random clones generated by sonication (28). The oligonucleotides were separated on wedge-shaped gels cast and electrophoresed in the LKB Macrophor apparatus (14, 29). The contiguous sequence was assembled using R. Staden's Database System installed on the University of Cambridge IBM 3080 by M.
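Random (sonication-generated) clones leave gaps unless enough are read. A minimal sketch using the Lander-Waterman approximation, assuming reads fall independently and uniformly on the 2.8-kb target; the 250-nucleotide read length is a hypothetical figure, not one reported here.

```python
import math


def fraction_unsequenced(n_reads: int, read_len: int, target_len: int) -> float:
    """Expected fraction of the target not covered by any read (Poisson approximation)."""
    coverage = n_reads * read_len / target_len
    return math.exp(-coverage)


def reads_needed(max_unsequenced: float, read_len: int, target_len: int) -> int:
    """Smallest number of random reads expected to leave at most max_unsequenced uncovered."""
    coverage = -math.log(max_unsequenced)
    return math.ceil(coverage * target_len / read_len)


n = reads_needed(0.001, read_len=250, target_len=2800)
print(n, fraction_unsequenced(n, 250, 2800))
```

Shotgun projects of this kind typically finish the last gaps with directed sequencing, because the number of reads needed grows logarithmically with the required completeness.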

Biochemical Characterization of the Fusion of araE with lacZ-In strain JM2443 (see "Experimental Procedures") β-galactosidase activity was induced by arabinose and was associated with the membrane fraction (Table III). No arabinose-proton symport activity was detected (Fig. 1, Table IV), but there was an arabinose-inducible lactose (or lactose analogue)-proton symport (Fig. 1, Table IV). These were the properties expected of an in-frame lacZ fusion in an arabinose-inducible gene coding for a membrane protein, such as araE. The parent strain (JM2433), in contrast, possessed arabinose-proton symport and lacked methyl-β-D-thiogalactoside-proton (lactose-proton; 30-32) symport (Fig. 1, Table IV). Its average arabinose-proton uptake of 331 nmol of H+/mg dry mass cells compared with about 50 nmol of H+/mg dry mass cells for E. coli carrying the araA mutation (5). This indicated that the mutation in strain JM2433 was in araB (ribulokinase), since an active arabinose isomerase (araA gene product) would convert arabinose to ribulose, explaining the enhanced arabinose influx (Fig. 1, Table IV). It was also consistent with the relatively low level of induction of β-galactosidase that would result from conversion of the inducer, arabinose, to ribulose. The methyl-β-D-thiogalactoside-proton symport activity in strain JM2443 was similar in extent to measurements on wild-type lactose operons (31, 32). Uptake of radiolabeled arabinose was present in both strains (Table IV).

(11). b, location of the E. coli DNA insert by comparison of phage λd(araE+ lysA+ galR+) with phage λcI857 S7; the insert is in the opposite orientation to those in λp1Φ(araE'-'lac) and λp2Φ(araE'-'lac). c, aligned restriction maps of the inserts from phage λp2Φ(araE'-'lac), phage λd(araE+ lysA+ galR+), and the lysR region of E. coli (kindly given by J. C. Patte; cf. Ref. 33). The restriction sites within the XhoI-EcoRI region were confirmed and extended after its ligation into plasmid pBR322 (text and Fig. 3).
The vertical arrows show the location where the araE gene fused with lacZ in the λp2Φ(araE'-'lac) phage, and so identify the approximate location of araE in phage λd(araE+ lysA+ galR+) and in the E. coli DNA. The secondary attachment site where the prophage, from which λd(araE+ lysA+ galR+) was derived, originally inserted is shown. d, precise location of the araE, orfX, lysR, and galR genes and of the restriction sites, derived from the DNA sequences of this work, Stragier and Patte (33), and Stoner and Schleif (9).

containing phage ("Experimental Procedures"), two distinguishable types were plaque-purified on a nonselective host (CSH25). Phage λp1Φ(araE'-'lac) was Spi+ and derived from the arabinose X-gal screen, and λp2Φ(araE'-'lac) was Spi- and was obtained by selection on the P2 lysogen RB341 ("Experimental Procedures"). DNA was prepared from these phages ("Experimental Procedures"), and their restriction maps were compared with that of λplacMu (Fig. 2a). The maps were consistent with the expected presence or absence of the lac, red, and gam regions (Fig. 2a). Furthermore, the E. coli DNA inserts and their restriction sites were readily located adjacent to the BamHI site in the phage-carried lacZ gene (Fig. 2a).
Similarly, DNA was prepared from the λd(araE+ lysA+ galR+) and λcI857 S7 phages derived from E. coli strain JM1647 (Ref. 6, "Experimental Procedures"). Comparison of their restriction maps located the E. coli DNA insert in the former phage (Fig. 2b) and identified the restriction sites within it.
The Location and Direction of Transcription of the araE Gene-Expanded restriction maps of the E. coli DNA inserts in the λd(araE+ lysA+ galR+) and λp2Φ(araE'-'lac) phages are aligned with the lysA region of the E. coli genome (33) in Fig. 2c. The araE promoter must be located to the left of the point where the insertion in strain JM2443 had occurred, which can be calculated from the BamHI site in lacZ located 1.6 kb away from it (Fig. 2c).

FIG. 3. Subcloning of the 4.8-kb XhoI-EcoRI fragment into plasmid pBR322, and of a HincII-PvuII fragment into plasmid pAD284, yielding plasmids pMM25, pMM26, and pMM27. Plasmid pMM25 was used to confirm that the XhoI-EcoRI fragment contained a functional, intact araE gene (see text and Fig. 4). It was the source of the PvuII-HincII fragment for DNA sequencing. This fragment was also ligated into the expression vector pAD284 to obtain the plasmid pMM27 and the control plasmid pMM26, with the insert in the opposite, unexpressed, orientation.

incomplete open reading frame orfY (15 amino acids) was the C-terminal end of the araE coding sequence (33). This was later confirmed by DNA sequencing (below). The position of the secondary attachment site where λd(araE+ lysA+ galR+) originally inserted was also deduced (Fig. 2c).
The Subcloning of the araE Gene and Biochemical Characterization of Its Phenotype-A 4.8-kb XhoI-EcoRI restriction fragment (Fig. 2c), large enough to contain the araE gene, had ends convenient for cloning into the high copy number vector pBR322 (34). The λd(araE+ lysA+ galR+) DNA was digested with these two enzymes and ligated into pBR322 digested with EcoRI and SalI (Fig. 3); XhoI and SalI generate identical 5'-TCGA cohesive ends, so the XhoI end of the fragment can join the SalI end of the vector. The ligation mixture was transformed into strain DH1 (Table I) restriction enzymes. The DNA of the resultant plasmid, pMM25, was isolated and the positions of its restriction sites confirmed (Fig. 3).
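The compatibility of XhoI and SalI ends can be checked mechanically from the recognition sites and cut positions. This is a minimal sketch that handles only palindromic sites (so HincII, with its degenerate site GTYRAC, is left out), and the enzyme table is a hypothetical helper, not a complete database.

```python
# Recognition site (5'->3', top strand) and position of the top-strand cut within it.
ENZYMES = {
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
    "SalI": ("GTCGAC", 1),   # G^TCGAC
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "PvuII": ("CAGCTG", 3),  # CAG^CTG (blunt)
}


def end_type(enzyme):
    """Return ("5'", overhang), ("3'", overhang) or ("blunt", "") for a palindromic site."""
    site, top_cut = ENZYMES[enzyme]
    bottom_cut = len(site) - top_cut  # mirrored cut on the complementary strand
    if top_cut < bottom_cut:
        return ("5'", site[top_cut:bottom_cut])
    if top_cut > bottom_cut:
        return ("3'", site[bottom_cut:top_cut])
    return ("blunt", "")


def can_ligate(enzyme_a, enzyme_b):
    """True if the ends match (self-complementary overhangs from palindromic sites, or both blunt)."""
    return end_type(enzyme_a) == end_type(enzyme_b)


def junction(left_enzyme, right_enzyme):
    """Top-strand sequence of the hybrid site formed when the two ends are joined."""
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


def recut_by(seq):
    """Enzymes in the table whose recognition site occurs in seq."""
    return [name for name, (site, _) in ENZYMES.items() if site in seq]


hybrid = junction("XhoI", "SalI")
print(can_ligate("XhoI", "SalI"), hybrid, recut_by(hybrid))  # True CTCGAC []
```

A general consequence is that an XhoI/SalI hybrid junction (CTCGAC) is cut by neither enzyme, so such a cloning junction cannot be reopened with either one.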
When the araE araF recA strain "23 (Table I) was transformed with plasmid pMM25, it acquired both arabinose uptake and arabinose-proton symport (Table IV, Fig. 4). Curiously, the transformants were not Ara+ as judged by growth on minimal arabinose plates, although the specific growth rate of this strain in minimal arabinose liquid cultures was enhanced (0.43 h⁻¹) compared with the untransformed strain (0.09 h⁻¹). The absence of enhanced growth on solid media was difficult to explain, but the other phenotypic characteristics clearly confirmed the presence of the araE gene on plasmid pMM25.
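The specific growth rates translate directly into doubling times. A minimal sketch, assuming the quoted rates are natural-logarithm specific growth rates (μ = ln(x₁/x₀)/t) during exponential growth.

```python
import math


def specific_growth_rate(x0: float, x1: float, hours: float) -> float:
    """Specific growth rate (h^-1) from two culture densities measured `hours` apart."""
    return math.log(x1 / x0) / hours


def doubling_time(mu_per_h: float) -> float:
    """Doubling time in hours for exponential growth at specific rate mu."""
    return math.log(2) / mu_per_h


for label, mu in [("with pMM25", 0.43), ("untransformed", 0.09)]:
    print(f"{label}: doubling time {doubling_time(mu):.1f} h")
```

On this assumption the plasmid shortens the doubling time on arabinose from about 7.7 h to about 1.6 h.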
From the expected position of the fusion with lacZ (Fig. 2) and our suggestion (above) that orfY (33) was the C-terminal end of araE, it was deduced that the intact araE gene was located within a 2.8-kb HincII-PvuII restriction fragment (Figs. 2 and 3). This was isolated and ligated into the expression vector pAD284 in strain AD5827 ("Experimental Procedures," Fig. 3). Two plasmids were obtained: pMM27, in which the araE gene was in the correct orientation for expression from PL, and pMM26, with the reverse, incorrect, orientation (Fig. 3). Plasmid pMM26 was important for control experiments, as it enabled the same host with the same DNA (vector and insert) to be subjected to the induction conditions. Transport of [14C]arabinose and arabinose-proton symport activities of strain AD5827 (pMM27), but not of strain AD5827 (pMM26), were increased after heat shock (Table IV). This confirmed the successful expression of araE from PL.
Analysis of the Membrane Proteins of Induced Overexpressing Cells-Membranes were prepared from samples of the following cultures: thermo-induced AD5827 (pMM27), uninduced AD5827 (pMM27), and thermo-induced AD5827 (pMM26). These were solubilized (37 °C, 20 min) and the proteins separated by SDS-polyacrylamide gel electrophoresis (Fig. 5). In several experiments a protein of apparent Mr 35,000 ± 1,150 was present in induced AD5827 (pMM27) but not in either control (see, e.g. Fig. 5), consistent with the apparent Mr found by Macpherson et al. (6). Furthermore, this protein (Mr 35,000, 35,000, 37,000, and 38,000 in four experiments) was protected against reaction with N-ethylmaleimide by arabinose (Fig. 6), confirming that it was the araE gene product (6). AraE constituted about 7% of the total membrane protein (estimated by scanning the absorbance of the Coomassie-stained gel at 560 nm), an amplification of about 10-50-fold. The protein could not be detected when solubilization was performed at 100 °C for 1.5 min. Whole cell preparations showed an additional protein of Mr 25,000 (30), which may correspond to the product of orfX (calculated Mr 25,207 (33)).
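Apparent Mr values on SDS gels are read from a standard curve, conventionally a straight line of log₁₀(Mr) against relative mobility (Rf). This is a minimal least-squares sketch; the marker set and the band mobility are hypothetical, and, as the next section notes, hydrophobic membrane proteins often migrate anomalously, so the result is an apparent rather than a true mass.

```python
import math


def fit_standard_curve(standards):
    """Least-squares fit of log10(Mr) = intercept + slope * Rf for (Rf, Mr) marker pairs."""
    xs = [rf for rf, _ in standards]
    ys = [math.log10(mr) for _, mr in standards]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two markers")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def apparent_mr(rf, curve):
    """Interpolate the apparent Mr of a band from its relative mobility."""
    intercept, slope = curve
    return 10 ** (intercept + slope * rf)


# Hypothetical marker set and band mobility, for illustration only
markers = [(0.25, 66_000), (0.42, 45_000), (0.53, 36_000), (0.62, 29_000), (0.71, 24_000), (0.80, 20_100)]
curve = fit_standard_curve(markers)
print(round(apparent_mr(0.52, curve)))
```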
DNA Sequencing-The sequence of the 2.8-kb DNA fragment was determined (Fig. 7). At the position anticipated for the gene by the restriction analyses (above) was an open reading frame of 1416 base pairs corresponding to a protein of 472 amino acids. There was complete agreement with the partial sequence of orfY determined by Stragier and Patte (33) and two discrepancies with the N-terminal bases determined by Stoner and Schleif (Ref. 9, Fig. 7). The reading frame was confirmed by sequencing the araE'-'lacZ fusion (Fig. 7) and by its correspondence with the promoter and N terminus determined by Stoner and Schleif (9). At the C-terminal end of araE (orfY, above) was a sequence typical of a ρ-independent terminator (Fig. 7; Refs. 33 and 35). The other open reading frame in the sequence corresponded precisely to orfX, identified by Stragier and Patte (33), which starts immediately after the terminator. The frequency of use of optimal codons, calculated by the method of Ikemura (36), was 0.74, similar to XylE (0.65), LacY (0.62), and MelB (0.57), but lower than the values for OmpA (0.92) and Lpp (0.98), which are expressed at much higher levels.
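Locating an open reading frame in a sequenced fragment can be sketched as a scan of each reading frame for ATG...stop stretches. This minimal version searches the given strand only (the reverse complement must be scanned separately), ignores alternative start codons, and reports lengths excluding the stop codon, so 1416 bp corresponds to 1416/3 = 472 codons.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Return (start, end, frame) of the longest ATG-initiated ORF on this strand.

    `end` is the index of the stop codon, so end - start is the coding length
    without the stop. Returns (0, 0, None) if no terminated ORF is found.
    """
    seq = seq.upper()
    best = (0, 0, None)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the frame
            elif start is not None and codon in STOPS:
                if i - start > best[1] - best[0]:
                    best = (start, i, frame)
                start = None
    return best


toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"  # hypothetical test sequence
start, end, frame = longest_orf(toy)
print(start, end, frame, (end - start) // 3, "codons")
```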
Analysis of the Amino Acid Sequence-As expected for an integral membrane protein, AraE was highly hydrophobic (hydropathic index of 0.63; Ref. 37), with 12 hydrophobic domains separated by hydrophilic segments (37). The Mr of the protein calculated from the sequence was 51,683, larger than the apparent Mr of 36,000 (Figs. 5 and 6). Such discrepancies are shown by other integral membrane proteins (38-40), which appear smaller on SDS-polyacrylamide gels than their sequences predict.
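Both quantities in this paragraph can be computed from a protein sequence. This is a minimal sketch, assuming average residue masses (as tabulated by common tools such as ExPASy ProtParam), the Kyte-Doolittle hydropathy scale, and the 19-residue window with a 1.6 threshold often used to flag membrane-spanning segments; the authors' own parameters (Ref. 37) may differ, so the values need not match those in the text exactly.

```python
KD = {  # Kyte-Doolittle hydropathy values
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
RESIDUE_MASS = {  # average residue masses (Da)
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N and C termini


def molecular_weight(protein: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def gravy(protein: str) -> float:
    """Mean hydropathy over the whole sequence (grand average of hydropathy)."""
    return sum(KD[aa] for aa in protein) / len(protein)


def hydrophobic_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """Merge overlapping windows whose mean hydropathy reaches the threshold into (start, end) segments."""
    segments = []
    for i in range(len(protein) - window + 1):
        if sum(KD[aa] for aa in protein[i:i + window]) / window >= threshold:
            end = i + window
            if segments and i <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # overlaps: extend current segment
            else:
                segments.append((i, end))
    return segments


toy = "D" * 10 + "L" * 20 + "D" * 10  # hypothetical sequence with one hydrophobic stretch
print(round(molecular_weight(toy), 1), round(gravy(toy), 2), hydrophobic_segments(toy))
```

Applied to the 472-residue AraE sequence (Fig. 7), the segment count is one way to check the 12 hydrophobic domains described above, bearing in mind that window size and threshold shift the boundaries.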

DISCUSSION
The structure and function of membrane proteins are poorly understood compared with those of soluble proteins. This is largely due to their hydrophobic character, which makes them difficult to examine by standard techniques (41, 42). A further complication with proteins such as AraE is their low abundance in the membrane, even when fully induced in wild-type strains.