Genome-wide CRISPR screening identifies new regulators of glycoprotein secretion

Background: The fundamental process of protein secretion from eukaryotic cells has been well described for many years, yet gaps in our understanding of how this process is regulated remain. Methods: With the aim of identifying novel genes involved in the secretion of glycoproteins, we used a screening pipeline consisting of a pooled genome-wide CRISPR screen, followed by secondary siRNA screening of the hits to identify and validate several novel regulators of protein secretion. Results: We present approximately 50 novel genes not previously associated with protein secretion, many of which also had an effect on the structure of the Golgi apparatus. We further studied a small selection of hits to investigate their subcellular localisation. One of these, GPR161, is a novel Golgi-resident protein that we propose maintains Golgi structure via an interaction with golgin A5. Conclusions: This study has identified new factors for protein secretion involved in Golgi homeostasis.

Protein secretion is a fundamental and well-known process in cell biology, in which proteins are transported from the endoplasmic reticulum (ER) to the Golgi via coat protein-coated vesicles and subsequently to the plasma membrane [1][2][3] . The localisation and activity of proteins involved in this secretory pathway must be tightly regulated to ensure correct spatiotemporal distribution of membranes and cargo proteins along the pathway. While the molecular machinery of secretion is relatively well understood, our knowledge remains incomplete, particularly regarding the regulation of protein secretion. For example, evidence suggests that multiple trafficking routes for transport within the Golgi exist 4 . The complex process of sorting proteins at the Golgi into vesicles for their correct destinations also requires further investigation 5-7 .
Recently, genome-wide CRISPR screening has emerged as a powerful strategy to identify novel gene functions; this type of screening has the advantages of being unbiased and more reliable than previously used methods, in terms of introducing genetic mutations 8 . As such, it provides an opportunity to uncover new information about the regulation of protein secretion. Pooled CRISPR screening has been used for a wide range of applications, including the investigation of drug-resistance mechanisms in cancer cells 9 , the genetics of pluripotency 10 , autophagy regulators 11,12 and host factors required for viral infection 13 . We previously demonstrated that unbiased pooled genome-wide CRISPR screening could effectively reveal key players required for glycoprotein secretion, using galectin-3 retention at the cell surface to assess glycoprotein secretion 14 . Galectin-3 is a cytosolic protein that is exported to the extracellular space without entering the conventional secretory pathway. Vesicular and non-vesicular modes of secretion have been proposed, but the precise series of events involved in the unconventional secretion of galectin-3 remains ill-defined 15 . Once outside the cell, galectin-3 binds to β-galactosides resident on the cell surface 16 and can therefore be used as an indirect measure of glycoprotein transport to the cell surface via the ER-Golgi secretory pathway. Using a binding partner of glycans as a readout, rather than one specific glycoprotein, allows regulators of general glycoprotein secretion to be discovered. It is also possible that the hits identified may not exclusively regulate glycoprotein secretion but may be more general regulators of protein secretion; further experiments would be required to make this distinction.
Here, we devised a powerful experimental pipeline, using an improved pooled genome-wide CRISPR screen, followed by two arrayed secondary screening methods using siRNA knockdown to identify new factors involved in glycoprotein secretion and Golgi apparatus architecture. Combining these methods brings together the speed and reduced cost of pooled CRISPR screening with the advantages of an arrayed siRNA screen, in which genotype and phenotype remain linked 17 . Using this approach, we were able to validate 55 novel hits that are important for glycoprotein secretion. We found that many of these hits are also important for maintenance of the Golgi architecture. One of these hits, GPR161, is a novel Golgi-localised protein whose knockdown decreases glycoprotein secretion and disrupts the Golgi structure.

Cell culture
HeLa cells, suspension HeLa cells expressing Cas9 (sHeLa-Cas9) and HeLa cells expressing horseradish peroxidase fused to a signal sequence to direct its secretion (HeLa-ss-HRP) were cultured in 10% culture medium: 10% (v/v) FBS, 100 U Penicillin / 0.1 mg ml -1 streptomycin, 2 mM L-glutamine in high glucose DMEM, in 5% CO 2 at 37°C. See Table 1 for a list of all reagents used and their sources.
CRISPR screen: transduction with Brunello library
Suspension HeLa cells were passaged 24 h before transduction. The lentiviral Brunello library, which targets 19,114 genes in the human genome with four unique guides per gene 18 , was used to transduce 1 × 10 8 suspension HeLa cells expressing Cas9 (sHeLa-Cas9) in polybrene media (10 µg ml -1 polybrene, 10% (v/v) FBS, 100 U Penicillin / 0.1 mg ml -1 streptomycin, 2 mM L-glutamine in high glucose DMEM (Sigma, D6546)) at an MOI of 0.50 by spinoculation at 1000 × g for 30 min at 20°C. This gave a transduction efficiency of 40% and therefore an estimated 31% of cells carrying exactly one guide, as calculated from the Poisson distribution, in which the probability that a cell is infected with at least one virus is 1 − P(0, MOI) 19 . Cells were incubated for 24 h at 37°C, 5% CO 2 before passage into puromycin media (1 µg ml -1 puromycin, 10% (v/v) FBS, 100 U Penicillin / 0.1 mg ml -1 streptomycin, 2 mM L-glutamine in high glucose DMEM). Cells were cultured in puromycin media for seven days to retain only transduced cells. The transduction efficiency was measured as the percentage of live cells after 48 h in puromycin media.
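The single-guide estimate quoted above can be checked with a short calculation. The sketch below assumes that the number of viral integrations per cell follows a Poisson distribution and back-calculates the effective MOI from the measured 40% transduction efficiency; the function names are illustrative and are not part of the published pipeline.

```python
import math

def poisson_pmf(k, moi):
    """P(k, MOI): probability that a cell receives exactly k viral integrations."""
    return math.exp(-moi) * moi ** k / math.factorial(k)

def effective_moi(transduction_efficiency):
    """Invert 1 - P(0, MOI) = efficiency to recover the effective MOI."""
    return -math.log(1.0 - transduction_efficiency)

def fraction_single_guide(transduction_efficiency):
    """Fraction of all cells carrying exactly one guide, P(1, MOI)."""
    moi = effective_moi(transduction_efficiency)
    return poisson_pmf(1, moi)

efficiency = 0.40  # measured transduction efficiency from the text
moi = effective_moi(efficiency)
single = fraction_single_guide(efficiency)

print(f"effective MOI: {moi:.2f}")
print(f"cells with exactly one guide: {single:.1%} of all cells")
print(f"share of transduced cells with one guide: {single / efficiency:.1%}")
```

Under these assumptions the effective MOI is close to the nominal 0.50, and roughly 31% of all cells carry exactly one guide, in line with the figure reported in the text.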
CRISPR screen: cell sorting by flow cytometry
Cells were washed once with DMEM, incubated with 50 µg ml -1 anti-galectin-3 antibody conjugated to Alexa Fluor 647 for 30 min at 4°C, then washed again. Approximately 3 × 10 7 cells were sorted at the NIHR Cambridge BRC cell phenotyping hub on a BD FACS AriaIII and FACS Aria Fusion (using a 100 µm nozzle and run at a pressure of 25 psi); Alexa Fluor 647 fluorescence was detected with a 670/30 BP detector on the AriaIII and Aria Fusion cell sorters. The target cell population to be sorted was gated based on the lowest anti-galectin-3-AF647 fluorescent signal, using Purity as the sort precision mode. After cell sorting, cells were cultured in 20% serum media for 24-48 h, then cultured in culture media. When the population had expanded to 5 × 10 7 cells, two samples were taken; 5 × 10 6 cells were frozen for guide sequencing, and 1 × 10 6 were assessed by flow cytometry. Gating leniency was decreased with progressive rounds, as shown in Figure 1. After the third sort, two distinct negative populations were observed, so a fourth sort was performed to separate these two populations, gating each population around a 5-10% peak. For flow cytometry of 1 × 10 6 cells after cell sorting, cells were immunostained as above, with an additional propidium iodide stain for 5 min, and analysed on an Accuri TM C6 (BD Biosciences) equipped with lasers providing 488 nm and 640 nm excitation sources. Alexa Fluor 647 fluorescence was detected with an FL4 detector (675/25 BP).

Amendments from Version 1
This version includes modifications suggested by the reviewers, which were essentially text corrections and additional discussion of the data in relation to previous literature.

CRISPR screen: Library preparation and sequencing
To prepare samples for sequencing, genomic DNA from the different cell populations was extracted from frozen cell pellets using Gentra Puregene Cell kit and concentration of DNA was measured on the Nanodrop. PCR was carried out in quadruplicate to amplify an amplicon containing the guide, with primers used to attach barcodes, stagger regions and sequencing adaptors for use in sequencing, as previously described by Doench et al. 18 . Briefly, each well was set up to contain 10 µg genomic DNA, 0.5 µM uniquely barcoded P7 primer, 0.5 µM P5 stagger primer mix, 200 µM each dNTP, 7.5 units ExTaq and 1x ExTaq buffer in a total volume of 100 µl. PCR cycles were: initial denature step at 95°C for 1 min; 28 cycles of 95°C for 30 s, 53°C for 30 s, 72°C for 30s; final extension at 72°C for 10 min. One replicate from each sample was analysed on a 2% agarose gel to confirm amplification was successful. PCR products were pooled, adding 30 µl from each PCR reaction, then purified using AMPure XP magnetic beads to retain only DNA fragments larger than 100 bp 18 . An equal volume of the pooled product and AMPure XP magnetic beads were mixed at room temperature for 5 min, then placed on a magnet to retain beads. Beads were washed three times with 70% ethanol and purified PCR product was eluted with 500 µl EB buffer. Sequencing was carried out by the CRUK genomics facility using SE50 sequencing on the Illumina HiSeq 4000.
CRISPR screen: Bioinformatics analysis using MaGECK
Data were analysed using both MaGECK-MLE 20 and MaGECK-RRA 21 (version 0.5.7), using an enrichment (positive) sort type. In all analyses, median normalisation was used. PCR replicates were treated as technical replicates and sort replicates were treated as biological replicates. Each population was given an identification code; descriptions are available in data file S2 (see Underlying data) 22 . Genes were annotated with their function using UniProt 22 , and with the organelle they reside in using Gene Ontology subsets 23,24 .
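To illustrate what median normalisation does to raw sgRNA counts, the sketch below implements one common form of it, the median-of-ratios size factor. This is an assumption about the general approach rather than a reproduction of the MaGECK source code, and the sample names and counts are toy values, not data from this study.

```python
import math
from statistics import median

def median_ratio_size_factors(counts):
    """Compute one size factor per sample.

    counts: dict mapping sample name -> list of raw sgRNA counts,
            with guides in the same order for every sample.
    """
    samples = list(counts)
    n_guides = len(counts[samples[0]])
    ratios = {s: [] for s in samples}
    for i in range(n_guides):
        values = [counts[s][i] for s in samples]
        if min(values) <= 0:
            continue  # geometric mean is undefined; skip guides with a zero count
        geo_mean = math.exp(sum(math.log(v) for v in values) / len(values))
        for s in samples:
            ratios[s].append(counts[s][i] / geo_mean)
    # The size factor is the median ratio of a sample's counts to the per-guide mean.
    return {s: median(ratios[s]) for s in samples}

def normalise(counts):
    """Divide each sample's counts by its size factor."""
    factors = median_ratio_size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}

# Toy example: the second sample is the same library sequenced twice as deeply.
raw = {
    "unsorted": [100, 200, 300, 400],
    "N1": [200, 400, 600, 800],
}
normalised = normalise(raw)
print(normalised)
```

After normalisation the two toy samples have matching counts, so any remaining differences between real samples reflect guide enrichment rather than sequencing depth.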
Secondary screen: HRP assay
Based on having an unknown or unclear function in secretion, hits from the MaGECK analyses were selected to create a shortlist of 368 genes to be used in a secondary screen. This shortlist was screened in an arrayed siRNA screen, using siGenome smart pools (Dharmacon) in which each plate contained two replicates of non-targeting siRNA as a negative control. To screen for conventional secretion, HeLa cells expressing ss-HRP were reverse transfected with the secondary screen library, as previously described 25 . To identify genes required for HRP secretion, a line with a gradient of 1 was fitted to the normalised cell lysate (CL) versus normalised supernatant (SN) data.
Outliers from this line, as identified by the ROUT method 26 , are genes whose knockdown results in up- or downregulation of HRP secretion. The same method was also used to identify outliers from the line y = 1 for the normalised CL:SN ratio. The Gene Ontology (GO) Term Mapper was used to classify the identified hits into broad categories, using the ontology aspect "process" and ontology category "Generic slim" (GO_Slim) for Homo sapiens (GOA @EBI + Ensembl) proteins 27 . A gene was defined as related to protein transport if it was annotated with any of the GO_Slim annotations: transport (GO:0006810), transmembrane transport (GO:0055085) or vesicle-mediated transport (GO:0016192).
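The transport-related classification rule described above amounts to a set-membership test on each gene's GO_Slim annotations. A minimal sketch follows; the three GO identifiers are those named in the text, while the gene names and their annotations are hypothetical placeholders.

```python
# GO_Slim terms that define a gene as related to protein transport (from the text).
TRANSPORT_SLIM_TERMS = {
    "GO:0006810",  # transport
    "GO:0055085",  # transmembrane transport
    "GO:0016192",  # vesicle-mediated transport
}

def is_transport_related(slim_terms):
    """True if any of a gene's GO_Slim annotations is a transport term."""
    return bool(TRANSPORT_SLIM_TERMS & set(slim_terms))

def split_hits(annotations):
    """Split genes into transport-annotated hits and hits without such annotation.

    annotations: dict mapping gene name -> iterable of GO_Slim identifiers.
    """
    known, novel = [], []
    for gene, terms in annotations.items():
        (known if is_transport_related(terms) else novel).append(gene)
    return sorted(known), sorted(novel)

# Hypothetical annotations, for illustration only.
example = {
    "GENE_A": ["GO:0016192", "GO:0006950"],
    "GENE_B": ["GO:0006950"],
    "GENE_C": [],  # no GO_Slim annotations at all
}
known, novel = split_hits(example)
print("transport-annotated:", known)
print("not transport-annotated:", novel)
```

Genes with no GO_Slim annotations fall into the second group, which matches how uncharacterised hits are counted among those not annotated with transport terms.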

Secondary screen: Golgi morphology
HeLa cells were reverse-transfected as described above, but in 96-well ViewPlates. At 72 h after transfection, cells were fixed in ice cold methanol for 5 min, washed in PBS, stained with appropriate primary and secondary antibodies and ProLong with DAPI, and imaged using an Opera Phenix High-Content Screening System to obtain unbiased confocal images. Analysis was performed in CellProfiler 28 (version 3.1.5) to define cells and Golgi, and to measure Golgi intensity, size, shape and granularity. Briefly, cells were defined as secondary objects by propagation from nuclei, using a global "minimum cross entropy" thresholding method. Golgi were defined within a broad pixel diameter range of 2-60 px, using an adaptive, two-class, Otsu thresholding method. The classifier of CellProfiler Analyst (CPA) 29 (version 2.2.1) was then trained using cells randomly chosen from the whole experiment. These cells were manually classified as having aberrant or intact Golgi until the sensitivity of the classifier reached an acceptable level: 82% and 77% for the intact Golgi and aberrant Golgi classes, respectively; here, 65 cells were used. The CPA classifier was then used to classify all cells as having intact or aberrant Golgi.
Transfection with cDNA
HeLa cells at ~50% confluency were transfected with a mix of cDNA at 1 µg µl -1 and 0.2% (v/v) TransIT in OptiMEM. Cells were incubated at 37°C, 5% CO 2 for 4 h, then medium was changed to fresh culture medium; cells were then incubated for a further 20 h.

Immunofluorescence microscopy
For immunofluorescence microscopy, cells were cultured on coverslips, fixed with either 4% paraformaldehyde in PBS for 5 min and permeabilised with 0.1% Triton X100 in PBS for 5 min, or fixed and permeabilised with ice cold methanol for 5 min. Cells were then blocked in 10% (v/v) FBS / 1x PBS for 30 min, incubated with primary antibodies for 2 h, washed three times with PBS, and incubated with secondary antibodies for 30 min. Primary and secondary antibodies, and the concentration they were used at for immunofluorescence, are listed in Table 1. Samples were mounted using ProLong Gold antifade reagent with DAPI and observed using a Leica SP8 laser confocal microscope.

Analysis of hits
To find interacting partners of proteins of interest, we used the BioPlex network, a resource of proteins shown to interact by affinity purification-mass spectrometry 30,31 . Default parameters were used to search the network.

Co-localisation analysis
To measure co-localisation between golgin A5 and TGN46, 17 cells were selected from each treatment from confocal micrographs and analysed using the Coloc 2 plugin in Fiji (downloaded 25th May 2018) 32 . For cells with TMEM220 or GPR161 knocked down, only cells with aberrant Golgi were selected from confocal micrographs to be confident that siRNA knockdown was effective. Pearson's coefficient with no threshold was recorded for each cell analysed. Data were tested for normality and equality of variance by the Shapiro-Wilk test and Bartlett's K-squared test, respectively. After passing normality tests, data were analysed by one-way ANOVA followed by Tukey HSD post-hoc test between groups. All statistical analysis was performed using R (version 3.5.1).
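For readers unfamiliar with the co-localisation metric, the sketch below shows how Pearson's coefficient with no threshold can be computed from the paired pixel intensities of two channels. It is a plain-Python illustration of the formula, not the Coloc 2 implementation, and the pixel values are invented for the example.

```python
import math

def pearson_coefficient(channel_a, channel_b):
    """Pearson's correlation coefficient between two equally sized pixel lists.

    Every pixel pair is used (no intensity threshold is applied).
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)

# Invented intensities for one cell: channel 1 (e.g. golgin A5), channel 2 (e.g. TGN46).
golgin_a5 = [10, 20, 30, 40]
tgn46_overlapping = [20, 40, 60, 80]   # signal rises and falls together
tgn46_excluded = [40, 30, 20, 10]      # signal is mutually exclusive

print(pearson_coefficient(golgin_a5, tgn46_overlapping))
print(pearson_coefficient(golgin_a5, tgn46_excluded))
```

Values near 1 indicate that the two signals vary together across pixels, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion; one such coefficient per cell then feeds into the group comparison described above.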
An earlier version of this article can be found on bioRxiv, DOI: https://doi.org/10.1101/522334.

Results
A genome-wide CRISPR screen revealed novel regulators of glycoprotein secretion
We carried out a genome-wide CRISPR screen with the aim of identifying novel genes involved in glycoprotein secretion, using the level of galectin-3 on the surface of live cells as a readout of glycoprotein secretion via the ER-Golgi pathway. The workflow for the screen is shown in Figure 1A. Suspension HeLa cells stably expressing Cas9 were transduced with the Brunello lentiviral guide (sgRNA) library, an optimised library for the human genome 18 , at a low multiplicity of infection such that the majority of the transduced cells received exactly one sgRNA. After transduction, we performed three rounds of fluorescence-activated cell sorting with increasing selection stringency for low levels of cell surface galectin-3. This method initially allows more cells through cell sorting, then becomes gradually more stringent by decreasing the gating percentage, increasing the number of true positives in later sorted populations. This resulted in the enrichment of two distinct populations with low levels of cell-surface galectin-3 after the third sort, in population N3 ( Figure 1B). Because population N3 contains many cells with very little or no galectin-3 on the cell surface, the other populations may carry more galectin-3, transferred from the extracellular pool, or may be exposed to a higher effective concentration of anti-galectin-3 antibody. A fourth round of cell sorting was used to separate these distinct populations, resulting in populations M4 and N4 ( Figure 1B). We extracted genomic DNA from all populations and carried out deep sequencing to determine the raw sgRNA counts in each population, available in data file S1 (see Underlying data) 22 . MaGECK analysis allowed the identification of hits from population N1 onwards; full MaGECK results are available in data file S2 (see Underlying data) 22 . Hits enriched in populations N1 and N4 are shown in Figure 1C.
The identity of the hits changed as cell sorting progressed. This was particularly true of hits that localise to the ER: in total, 64 hits localise to the ER in population N1, compared to 29 in population N4 ( Figure 1D). As many genes in the ER are involved in critical biological processes, it is likely that these genes were lost in later sorts because cells lacking them have a growth or survival defect. For example, many of the enzymes involved in N-linked glycosylation, such as the asparagine-linked glycosylation (ALG) enzymes ALG1, ALG3, ALG5, ALG6 and ALG8, are identified as hits in population N1 but are lost in N4. Conversely, some hits are enriched after multiple cell sorting rounds in N4 but were not identified in N1 ( Figure 1E). As such, we took a selection of hits from across all populations forward into the secondary screen. MaGECK-MLE and MaGECK-RRA analysis revealed that there is little difference between the later populations: N3, M4 and N4 (data file S2, see Underlying data) 22 . As we intended to validate hits with further screening, we used a false discovery rate (FDR) of 0.5 to classify hits. This lenient FDR is validated by the presence of hits previously shown to play a role in secretory and glycosylation pathways, such as ribosomal proteins involved in protein translocation into the ER machinery, enzymes involved in N-linked glycosylation, RAB proteins and trafficking protein particle complex (TRAPPC) proteins ( Figure 1E). This shows that our screen was very efficient in identifying previously described regulators of glycoprotein secretion. Additionally, many of the hits identified did not have a clear link to functions in protein secretion. Therefore, we hypothesised that many genes enriched in our screen with either no characterised function, or with no described function in the secretory pathway, may be new regulators of protein secretion.
An RNAi secondary screen validated hits from the CRISPR screen
Due to the lenient FDR used for MaGECK, it was important to validate the hits. Around 400 hits, identified from populations N1-N4/M4 of the CRISPR screen, were selected for validation by secondary screening. Of these 400 hits, 113 had no annotated function in UniProt; 32 also had no GO annotations. As many of these hits were uncharacterised, they had the potential to represent novel regulators of protein secretion. Twelve positive controls annotated with GO terms related to protein secretion were also included in the screen to validate the statistical analysis. For secondary screening, we used siRNA knockdown of hits in HeLa cells that stably express ss-HRP, which is a glycosylated protein secreted into the extracellular space 33,34 . HeLa-ss-HRP cells were transfected with arrayed siRNA in a 96-well plate. At 72 h after transfection, we collected the supernatant, lysed the cells and assessed HRP by chemiluminescence.
Knockdown that led to a decrease in the ratio of supernatant luminescence to cell lysate luminescence indicated genes that were important for glycoprotein secretion. Knockdown of 92 of the genes screened resulted in less secretion of HRP ( Figure 2A). Positive controls also resulted in less HRP secretion, as expected (Figure 2A, B). The 92 genes validated by the siRNA screen are listed in Figure 2B and 2C. Of these hits, 55 were not annotated with GO_Slim terms related to secretion or protein transport; 22 of these had no GO_Slim annotations and 6 had no GO annotations at all ( Figure 2C).
Further screening revealed many genes with altered glycoprotein secretion also had a fragmented aberrant Golgi apparatus
To further probe how the identified genes affect glycoprotein secretion, the 92 genes identified in the HRP screen were further analysed for changes in Golgi apparatus morphology using the same siRNA knockdown. At 72 h after siRNA transfection, cells were fixed and analysed by immunofluorescence, using an antibody against GM130, a marker of cis-Golgi membranes. Images were collected and cells were classified as having aberrant or intact cis-Golgi using the machine learning platform within CellProfiler, as shown in Figure 3A and 3B (raw results are provided in S6, Underlying data) 22 . Silencing of many of the genes resulted in an increase in aberrant cis-Golgi compared to the control (~20%). Genes with more than the median percentage of cells with aberrant Golgi are shown in Figure 3C, and all genes are shown in Figure E1 (see Extended data) 22 . Based on having unknown functions in secretion, five genes from the top half of these hits were chosen to further investigate the mechanism by which these hits lead to a secretion defect: GPR161, TMEM220, FAM98B, FAM102B and MXRA7. We confirmed that the knockdown of these five genes led to a perturbation in the architecture of the Golgi and affected protein secretion. This can be seen by the more diffuse localisation of two Golgi-resident proteins, TGN46 and GM130; by the more diffuse localisation of SEC31, a marker of ER exit sites; and by the defect in the transport of the glycoprotein MHC-I to the cell surface ( Figure 3D).

Subcellular localisation of new regulators of glycoprotein secretion
Given the poor characterisation of GPR161, TMEM220, FAM98B, FAM102B and MXRA7 in the literature and the limited resources available to study their localisation, we obtained cDNA constructs of each of these genes fused to a FLAG-tag, to study the localisation of the proteins inside the cells (Figure 4). HeLa cells were transiently transfected with cDNA, then fixed 24 h after transfection. Fixed cells were immunostained for FLAG and either calnexin or TGN46, markers of the ER and the Golgi, respectively. GPR161 localised to the Golgi, as seen by its co-localisation with TGN46 ( Figure 4A), whereas TMEM220 and MXRA7 both partially colocalised with calnexin, indicating that they are found at the ER ( Figure 4B, C). FAM98B was primarily found in the cytosol, with a portion of FAM98B also localising to the nucleus ( Figure 4D). Finally, FAM102B was found both at the plasma membrane and in the cytosol ( Figure 4E). However, it is important to note that the overexpression of these FLAG-tagged constructs may not reflect the endogenous protein localisation, as a dominant-negative effect might occur when using the cDNA construct. We believe this was the case for TMEM220, as the TGN46 immunostaining seems compromised in the transfected cells ( Figure 4B). It was therefore important to validate the localisation of the proteins using specific antibodies. We studied the localisation of the endogenous proteins GPR161 and TMEM220 and confirmed that GPR161 localised mainly to the Golgi (Figure 5Ai). This was further demonstrated using siRNA against GPR161; here the Golgi localisation of GPR161 disappeared ( Figure 5Bi). However, endogenous TMEM220 localised to the Golgi (Figure 5Aii), in contrast to the cDNA overexpression, which showed an ER localisation. This Golgi residence was similarly confirmed using siRNA (Figure 5Bii). It is therefore likely that the overexpression of TMEM220 causes a perturbation in the trafficking of proteins from the ER to the Golgi.

GPR161 is a novel Golgi resident protein involved in protein secretion
Our study shows that GPR161 is a new Golgi-resident protein.
To understand its role in secretion, we used the BioPlex network, a resource of proteins shown to interact by affinity purification-mass spectrometry, to look at its interacting partners 30,31 .
Interestingly, the BioPlex network shows that GPR161 interacts with golgin A5 (also known as golgin 84). Golgin A5 is a coiled-coil membrane protein that plays a role in intra-Golgi vesicle capture 35 and contributes to maintaining Golgi morphology, likely by its interaction with Rab1A 36-38 . We showed that golgin A5 and GPR161 colocalised ( Figure 5C), and the knockdown of GPR161 decreased the colocalisation between golgin A5 and TGN46 ( Figure 5D). This appears to be specific to GPR161, since the knockdown of TMEM220, which we also found localises to the Golgi, resulted in similar perturbation of Golgi, but did not affect golgin A5-TGN46 colocalisation. Therefore, an aberrant Golgi structure alone does not change golgin A5 localisation to the Golgi ( Figure 5D). These data suggest that GPR161 may serve to recruit golgin A5 to the Golgi membrane.

Discussion
Here we identified novel genes involved in glycoprotein secretion using a combination of pooled CRISPR screening and siRNA screening. This work expands on our results from a previous CRISPR screen measuring cell-surface galectin-3, which had identified a number of regulators of glycosylation and protein trafficking; however, most of those hits had well-known functions 14 . Although galectin-3 itself is unconventionally secreted, this type of screen should primarily identify genes responsible for the transport of glycoproteins to the cell surface, as secreted galectin-3 can be transferred between cells 14,39 . Here, we sought to build on the findings of our previous screen by using galectin-3 to identify novel proteins with roles in glycoprotein secretion. We used the improved Brunello sgRNA library 18 rather than the GeCKOv2 library, and an altered cell sorting strategy which would allow more hits through cell sorting initially, in order to increase the number of true positives in later populations. The Brunello library is an optimised library for the human genome, which was designed to increase the percentage of active guides while reducing off-target effects compared to other libraries 18 . Furthermore, here we analysed populations at all stages of the cell sorting rounds performed, rather than only the final sorted population.

Figure 5. (A) Colocalisation microscopy shows that both (i) GPR161 and (ii) TMEM220 colocalise with TGN46 and GM130, two Golgi markers, and not with calnexin, a marker of the ER. (B) siRNA knockdown of (i) GPR161 or (ii) TMEM220 demonstrates that the antibodies used to detect these endogenous proteins only specifically detect protein at the Golgi. (C) GPR161-FLAG (red) colocalises with golgin A5 by microscopy. (D) Microscopy shows that knockdown of either GPR161 or TMEM220 leads to Golgi fragmentation, seen by the TGN46 staining, but only GPR161 knockdown results in less colocalisation of golgin A5 and TGN46. Quantification of this change in colocalisation, measured by Pearson's coefficient, shows that GPR161 knockdown cells have significantly less colocalisation than control cells (***p < 0.001), but there is no significant difference in colocalisation between control cells and TMEM220 knockdown cells (p = 0.302).
Overall, our data suggest that the enhanced cell sorting strategy we employed here does improve results. While there is little enrichment visible by anti-galectin-3 staining in the earlier sorted populations, deep sequencing showed that enrichment had taken place. Many of the hits enriched in the first sorted population, N1, localise to the ER and the plasma membrane; later populations had fewer hits localised to the ER. This is likely due to problems with cell survival; over the extended time period and stress derived from the four sorts, cells with ER defects may be less able to survive. Enrichment of two distinct galectin-3 negative populations was visible by flow cytometry after the third round of cell sorting, in population N3. The most negative population within N3 consisted of cells that had little or no galectin-3 bound to the cell surface but were still able to secrete galectin-3 into the supernatant, so the excess of extracellular galectin-3 could bind to both the mid-negative population and the unaffected population of cells. Combined with the higher effective concentration of anti-galectin-3 antibody during staining, this meant that the unaffected population appeared to have more galectin-3 staining than after earlier sorts. Across all sorted populations, the enrichment of many hits that are known to be in secretory and glycosylation pathways provides an initial validation of the success of the CRISPR screen. In comparison to our previous screen, we identified more hits known to have roles in the maturation or secretion of glycoproteins; moreover, we found a much higher number of genes not known to have such roles. These hits may have roles in wider protein secretion; further work is required to investigate this possibility.
Due to the large number of hits identified by the MaGECK algorithms, we selected hits to be taken forward for secondary screening for protein secretion. Via this screening method, we identified 92 hits that result in a secretion defect, of which 55 are not annotated with GO terms related to secretion. The 92 hits validated for secretion represent approximately one quarter of the hits screened, highlighting the importance of secondary screening after a CRISPR screen due to the high chance of false positives. However, the fact that 55 of the validated hits have previously unknown or unclear functions in secretion demonstrates the power of this screening strategy. In comparison to other siRNA screens investigating protein secretion, there is very little overlap with our validated list of 92 hits 40-42 ; however, this is not surprising as many of our hits taken forward for secondary screening were selected based on having an unclear function in protein secretion. Additionally, these screens looked specifically at the early secretory pathway 40 , or started with smaller-scale libraries 41,42 , so an overlap of hits after secondary screening in the different pipelines may not be expected. To further validate these hits, we investigated Golgi morphology. There was a clear increase in the percentage of cells with aberrant Golgi morphology for many of the hits. As siRNA knockdown efficiency was not quantified for each gene, the percentage of cells with aberrant Golgi does not give a definitive ranking of the scale of the effect each gene has on protein secretion. Additionally, while the presence of aberrant Golgi morphology gives a reason for the altered secretion and therefore validates both the CRISPR screen and HRP assay results, it does not explain how each individual gene product contributes to protein secretion. As such, we selected five hits with little known about their function in protein secretion to investigate further. 
Most of these had very strong phenotypes in both assays, such as MXRA7, but some had a weaker, yet still significant, phenotype, including GPR161; we studied this selection in order to validate the full list of genes as a resource for further investigation.
Among the five hits that we studied for further mechanistic insight, GPR161 and TMEM220 localised to the Golgi, whereas MXRA7 localised to the ER. FAM98B and FAM102B localised to the cytosol or the plasma membrane. Further studies will be required to fully understand how each gene regulates protein secretion. Interestingly, a recent study has shown the importance of the extracellular matrix (ECM) in regulating Golgi organisation and function via the activation of ARF1 43 . MXRA7, a component of the ECM, might play a similar role.
Little is known about FAM98B and FAM102B, but FAM102B shows a clear localisation at the plasma membrane, so it could regulate protein secretion through an as-yet unidentified signalling pathway. FAM98B shows cytosolic and nuclear localisation, and interacts with other such proteins to form a complex involved in shuttling RNA between the nucleus and cytoplasm 44 , suggesting that FAM98B may affect protein secretion by a transcriptional route.
Our further investigation of the two proteins that localised to the Golgi, TMEM220 and GPR161, suggests that they regulate Golgi morphology by different mechanisms. TMEM220 is a transmembrane protein that has been shown to interact with both actin and testis-specific glyceraldehyde-3-phosphate dehydrogenase (GAPDH) in the BioPlex network 30,31 . Actin, and the cytoskeleton in general, is well known to contribute to protein trafficking along the secretory pathway; for example, it is recruited to the Golgi by a complex of SPCA1 and active cofilin, where it is suggested to form a membrane domain required to initiate sorting of secretory cargo 7,33,45 . It is possible that TMEM220 acts as an anchor for actin on the Golgi membrane. This possibility needs to be further investigated using in vitro experiments. Alternatively, TMEM220 could affect protein secretion via an interaction with GAPDH, which has recently been implicated in protein secretion via the inhibition of COPI vesicle biogenesis 46 .
GPR161 is a G-protein-coupled receptor (GPCR) that is involved in neural tube development and acts as a regulator of cell signalling pathways, including Shh signalling, protein kinase A (PKA) signalling, retinoic acid signalling and Wnt signalling 47,48 . This was a particularly interesting hit as it has recently become clear that GPCRs function at membranes other than the plasma membrane 49 , and previous work has suggested that a Golgi-resident GPCR regulates transport from the Golgi, although the specific GPCR involved remains unknown 50 . Our results showing that GPR161 is both localised to the Golgi and involved in regulation of Golgi morphology and glycoprotein traffic fit with this emerging idea that GPCRs can regulate protein trafficking. The BioPlex network for GPR161 shows an interaction with golgin A5, as well as with five PKA subunits 30,31 ; other work has also demonstrated an interaction between GPR161 and PKA 51 . The interaction with PKA subunits is interesting, as a PKA signalling pathway has previously been shown to regulate retrograde transport from the Golgi to the ER, which indirectly also affects anterograde traffic 52 . However, previous work found that binding of PKA regulatory subunits to GPR161 results in its transport to the plasma membrane to signal through PKA 51 . We did not observe any localisation of either overexpressed or endogenous GPR161 at the plasma membrane, suggesting that a different mechanism may be important in our system. Furthermore, our data suggest that the interaction between GPR161 and golgin A5 is important for maintenance of the Golgi architecture and function. One mechanism for this may be that GPR161 acts to recruit golgin A5 to the Golgi. To confirm this mechanism, further experiments involving mutated forms of GPR161 and golgin A5 will have to be performed, along with higher resolution imaging.
In summary, here we describe an optimised CRISPR screening strategy that successfully identifies new regulators of glycoprotein secretion. Our secondary screening validated 55 hits not previously known to be directly involved in protein secretion; many of these hits also regulate Golgi morphology. We hope that these validated hits can serve as a resource for other researchers investigating protein secretion and Golgi morphology. We also highlight GPR161, a particularly interesting protein given that Golgi-localised GPCRs have recently been implicated in protein trafficking. We find that it is a novel Golgi-localised protein that appears to interact with golgin A5 in order to maintain Golgi structure.

Data availability
Figure 1C-E)
-S3.zip (raw FCS files from flow cytometry staining of sorted samples, used to create Figure 1B).
-S4.xlsx (raw chemiluminescence data, used to create

Version 1
28 August 2019 Reviewer Report https://doi.org/10.21956/wellcomeopenres.16623.r36252 © 2019 Stephens D. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

David Stephens
School of Biochemistry, Faculty of Life Sciences, University of Bristol, Bristol, UK

This is an interesting manuscript that applies CRISPR gene editing approaches to screen for components of the glycoprotein secretion machinery. It appears to be carefully conducted with appropriate analysis applied to the screen. I am not an expert in that area and so cannot comment on the design, execution, or analysis of the screen further.
The results are clearly presented and some key "hits" defined. I would have liked to have read more about the overlap between the current work and previous efforts, e.g. whole genome RNAi screens in mammalian cells and flies, particularly given the previous use of ssHRP as a reporter.
Fundamentally, the screen is really testing those genes required for localization of galectin-3 to the cell surface; this is not quite the same as a global analysis of glycoprotein secretion. Perhaps validation with other lectins, or of specific glycoproteins, or even an unbiased glycoproteomics approach might have been appropriate, at least for the "hits" that were worked up. Really, I would have liked to see some analysis of other endogenous glycoproteins as well as non-glycosylated secretory proteins. In particular, it is not clear to me whether the effects seen are indeed general or specific to galectin-3.
The analysis of five of the key "hits" is quite preliminary but the work is carefully conducted and nicely presented. This provides some framework for others to build on these data. The link between defects in glycoprotein secretion and Golgi organization is not really explored and this could have been built on to provide more insight.
A fundamental limitation for me is that none of the validation work has been done in any system other than HeLa cells. Given the literature on the role of GPR161 in cilia, one might have hoped to see some work-up done in ciliated cells.
I expect that this will stand alone as a useful data resource and can certainly be built on by others.
Specific comments:
RNAi efficacy (important for Figure 5) is not validated anywhere that I could see.
The colocalization analysis is limited. GolginA5 and TGN46 should not overlap significantly as they localize to distinct parts of the Golgi/TGN (5D), therefore I have doubts over the robustness of this readout. In Figure 5C colocalization of GPR161 and GolginA5 does not appear robust to me and is not quantified. The images shown are low resolution and more detailed analysis is warranted. Nocodazole treatment of cells to scatter the stacked Golgi would be useful here.
The integrity of the microtubule network should be checked, as disruption of this could result in altered Golgi morphology (but is not likely to affect secretion per se). The two phenotypes might be unconnected.