Advancing brain barriers RNA sequencing: guidelines from experimental design to publication

RNA sequencing (RNA-Seq) in its varied forms has become an indispensable tool for analyzing differential gene expression and thus for characterizing specific tissues. To understand the genetic signature of the brain barriers, RNA-Seq has also been introduced into brain barriers research. This has led to the availability of both bulk and single-cell RNA-Seq datasets over the last few years. If appropriately performed, RNA-Seq studies provide powerful datasets that allow for a significant deepening of knowledge on the molecular mechanisms that establish the brain barriers. However, RNA-Seq studies comprise complex workflows that require many options and variables to be considered before, during and after the sequencing itself. In the current manuscript, we build on the interdisciplinary experience of the European PhD Training Network BtRAIN (https://www.btrain-2020.eu/), where bioinformaticians and brain barriers researchers collaborated to analyze and establish RNA-Seq datasets on vertebrate brain barriers. The obstacles BtRAIN identified in this process have been integrated into the present manuscript, which provides guidelines along the entire workflow of brain barriers RNA-Seq studies, from the overall experimental design to the interpretation of results. Focusing on the vertebrate endothelial blood–brain barrier (BBB) and the epithelial blood-cerebrospinal fluid barrier (BCSFB) of the choroid plexus, we provide a step-by-step description of the workflow, highlighting the decisions to be made at each step and explaining the strengths and weaknesses of the individual choices. Finally, we propose recommendations for accurate data interpretation and for the information to be included in a publication to ensure appropriate accessibility of the data and reproducibility of the observations by the scientific community.
Next-generation transcriptomic profiling of the brain barriers provides a novel resource for understanding the development, function and pathology of these barrier cells, which is essential for understanding CNS homeostasis and disease. The continuous advancement and increasing sophistication of RNA-Seq will require interdisciplinary approaches between brain barrier researchers and bioinformaticians, as successfully performed in BtRAIN. The present guidelines build on the BtRAIN interdisciplinary experience and aim to facilitate collaboration between brain barriers researchers and bioinformaticians to advance RNA-Seq study design in the brain barriers community.


Brain barriers: terms and definitions
Central nervous system (CNS) homeostasis is ensured by endothelial, epithelial, mesothelial and glial brain barriers that divide the CNS into compartments [1]. CNS barriers allow undisturbed neuronal function within the parenchyma while ensuring immune surveillance at the borders of the CNS.
For clarity, we define here some general terms that lack a cohesive reference within the brain barriers community. For the purposes of this manuscript, the blood-brain barrier (BBB) is localized at the level of the endothelial cells of the CNS microvasculature, which includes capillaries, pre-capillary arterioles and post-capillary venules. BBB characteristics are not intrinsic to CNS microvascular endothelial cells but rather rely on continuous crosstalk among the cellular and acellular elements in and around CNS microvessels, which are collectively referred to as the neurovascular unit (NVU). The NVU comprises the BBB endothelial cells, the endothelial basement membrane with a high number of embedded pericytes, and the glia limitans composed of the parenchymal basement membrane and astrocytic endfeet [2]. The blood-cerebrospinal fluid barrier (BCSFB) is formed by the epithelial cells surrounding the choroid plexuses (ChP), which extend into the cerebrospinal fluid (CSF)-filled brain ventricles (Fig. 1).
The known functions of the BBB and BCSFB include inhibiting the free diffusion of molecules from the blood into the CNS while ensuring rapid efflux of toxic metabolites out of the CNS [3]. In addition, both the BBB and the BCSFB control immune cell entry into the CNS [4][5][6]. The present manuscript does not address the following CNS barriers: the arachnoid mater, which establishes a BCSFB between the dura mater, whose vessels lack a BBB, and the CSF-filled subarachnoid space [7]; the pia mater, which is localized at the surface of the brain and spinal cord and embraces the subarachnoid arteries [4]; and the glia limitans, which ensheathes the entire CNS parenchyma [4].

Endothelial cells of the BBB are biochemically unique
The BBB endothelial cells are characterized by the presence of molecularly unique, complex and continuous tight junctions, in addition to adherens junctions, a lack of fenestrations and a low rate of pinocytotic activity [8,9]. Moreover, BBB endothelial cells express specific enzymes and transporters that allow efficient transport of nutrients into the CNS and efflux of toxic metabolites out of the CNS [10,11]. Despite these unique biochemical characteristics, endothelial cells of the BBB share some properties with endothelial cells in peripheral microvascular beds. For example, all endothelial cells develop adherens junctions and may express tight junction proteins, but at the BBB, adherens junctions are accompanied by complex and continuous tight junction strands surrounding the entire circumference of the brain microvascular endothelial cells [1]. A better understanding of the unique structural and functional characteristics of the BBB endothelium would significantly improve our understanding of how BBB impairment contributes to neurological disorders.
Phenotypic characteristics of the brain barriers are ultimately regulated at the transcriptional level. Indeed, the analysis of transcription profiles has been a useful tool in biomedical research, with an increasing impact over the last few decades. Accordingly, different research groups have begun to investigate the BBB transcriptome using different methodologies. For instance, the first RNA-Seq study of CNS cells, including the BBB, was published 10 years ago [12], while a recent study made use of claudin-5-GFP reporter mice to sort GFP+ endothelial cells from the mouse brain for subsequent single-cell RNA-Seq (scRNA-Seq) [13]. This study identified zonated transcriptional profiles of brain endothelial cells along the arteriovenous axis [13]. Bulk RNA-Seq has also been applied to endothelial cells isolated from the brain and peripheral tissues of VE-Cadherin-CreERT2-Rosa-tdTomato mice in health and disease, including mouse models of stroke, multiple sclerosis, traumatic brain injury and seizures [14]. This molecular profiling defined core BBB genes expressed by brain endothelial cells that become deregulated in pathology, suggesting potential therapeutic targets common to multiple neurological disorders [14]. These approaches underscore the relevance of transcriptomic profiling of brain barrier cells for advancing our understanding of the molecular pathways underlying brain barriers function and dysfunction.

Progression of transcriptional analysis techniques: from Sanger to next generation sequencing
Throughout the 1980s, Sanger sequencing was used to identify transcripts within tissues and cells, while quantitative methods such as quantitative real-time polymerase chain reaction (qRT-PCR) came into prominence in the 1990s. These low-throughput, or 'first-generation', methods are still used today for specific purposes, but they are laborious and costly relative to their output and therefore not suitable for establishing the full transcriptome of an entire tissue. The new millennium brought high-throughput techniques for transcriptomic analysis, first microarrays and then next-generation sequencing (NGS) technologies. The most relevant in the present context, and the most commonly used NGS technique, is RNA-Seq, which has advanced the characterization and quantification of transcriptomes, including whole-transcriptome sequencing, in a much less laborious and time-consuming fashion than previous methods [15][16][17].
New technologies bring opportunities for a more in-depth understanding of known mechanisms and for the discovery of novel pathways, but also new challenges and problems that must be addressed; this has been the task of bioinformatics and its researchers, who are highly specialized data analysts. In their short history, RNA-Seq methods have seen a sharp decline in costs coupled with improvements in the underlying technology. This has translated into an exponential increase in the number of studies and groups taking advantage of this technology and in the amount of large datasets produced and published (Fig. 2).

Transcriptome analysis and brain barriers: challenges and manuscript objectives
A major challenge when comparing transcriptome profiles of a given cell type is understanding the source of the cells and how they were isolated. In the present context, it is important to highlight that we found discrepancies in the protocols used for isolating brain endothelial cells to be common but remarkably underreported. The inaccuracy begins with a lack of consensus in the nomenclature for the different CNS vascular segments isolated and analyzed, with some laboratories referring to brain microvessels when isolating pure capillary fractions and others referring to capillaries when the isolated microvessels in fact comprise a mixture of arterioles, venules and capillaries.
Fig. 1 The blood-brain barrier in the context of the neurovascular unit and the blood-CSF barrier. The blood-brain barrier (BBB) is located within the neurovascular unit (NVU, left scheme) at the level of the brain parenchymal microvasculature and is composed of endothelial cells tightly connected by unique tight junctions. It separates the brain parenchyma from the peripheral blood. Endothelial cells produce a basement membrane in which pericytes are embedded. Astrocyte endfeet closely contact the microvessels, and astrocytes lay down the parenchymal basement membrane. The choroid plexus (ChP) stroma is separated from the CSF space by the blood-CSF barrier (BCSFB, right scheme), which is composed of ChP epithelial cells tightly connected by apical tight junctions. The apical side of the epithelium faces the CSF, while the basolateral side, resting on an epithelial basement membrane, faces the ChP stroma. The ChP stroma is highly vascularized with blood vessels lacking a BBB and is populated by immune cells. The endothelial cells produce their own endothelial basement membrane.
Considering the reported zonated gene expression of endothelial cells along the CNS vascular tree [13], transcriptome profiling studies of the BBB can hardly be compared, as most published studies lack an in-depth description of the CNS endothelial cell isolation procedures.
To unveil the full power of transcriptome profiling, it is thus essential to establish a solid intersection between transcriptome profiling, bioinformatic analysis and classical brain barriers research. In this manuscript, we highlight the intersection of transcriptomic profiling (with an emphasis on RNA-Seq) and brain barriers research (with an emphasis on the endothelial BBB and the epithelial BCSFB). We start by addressing considerations for the overall experimental design and then elaborate on the multiple essential intermediate steps of the workflow, including a comparison of BBB isolation methodologies for RNA-Seq, data analysis and publishing recommendations. It is not our intention to establish rigid rules on how to perform an RNA-Seq study in the field of brain barriers. Rather, based on our collaborative approaches in BtRAIN, we aim to raise awareness of the relevance of each experimental step and to highlight the relative strengths and weaknesses of the available alternatives. We then summarize what we consider essential information to be included in original manuscripts that use RNA-Seq to define BBB signature genes in health and disease. We are convinced that appropriate availability of this information will improve the comparability and reproducibility of different studies and thus advance the quality and cost-efficiency of studies in the field. By setting the stage for datasets that allow for meta-analysis-based research, our suggestions will furthermore support the implementation of the 3R principles of experimental animal research by reducing and refining animal experimentation.

Considerations for the experimental design of a BBB RNA-Seq study
Experimental design is possibly the most important step of any transcriptomic experiment, as the success of the project heavily depends on the choices made at this early stage. The first step is to have clear and well-defined goals. Questions that should be addressed before starting an experiment include: (i) Is the intent of the experiment to define the transcriptome of brain endothelial cells along the entire vascular tree, or rather solely of BBB endothelial cells in CNS microvessels or even capillaries? (ii) Is the aim to compare the transcriptome of brain endothelial cells at different stages of, e.g., development or under specific pathological conditions? (iii) Is the intent to define the transcriptome of a specific tissue (e.g. ChP epithelial cells vs kidney epithelial cells), a specific time point (e.g. embryonic vs post-natal BBB development) or a specific pathological condition (e.g. multiple sclerosis vs Alzheimer's disease)? (iv) What are the […]
There are also extrinsic factors that influence experimental design in the form of practical limitations. These are (i) biological sample availability, e.g. human CNS tissue is scarce and may not be obtained in the quality required for RNA-Seq analysis; (ii) costs, e.g. pre-sequencing optimization costs as well as the cost of sequencing a sufficient number of samples per group to the required depth; (iii) time, e.g. the time required for breeding experimental animals to obtain the required brain barriers genotype, for optimizing the isolation protocol for the tissue of interest (be it BBB, NVU, whole cortex or others) and for validating the results; and (iv) human resources, as a transcriptomics project might involve several scientists, from the principal investigator to wet lab researchers and technicians, sequencing facility technicians and bioinformaticians (Fig. 3).
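The cost item above largely comes down to simple arithmetic linking the number of groups, the number of biological replicates and the sequencing depth per sample. A minimal Python sketch of this planning calculation is given below; the function name and all numbers (30 million reads per sample, 400 million reads per lane) are hypothetical placeholders, and the actual output per lane and the depth appropriate for a given question should be confirmed with the sequencing facility and bioinformaticians.

```python
import math


def sequencing_budget(n_groups: int, replicates_per_group: int,
                      reads_per_sample_millions: float,
                      reads_per_lane_millions: float) -> dict:
    """Estimate sample number, total reads and lanes for a bulk RNA-Seq design.

    All arguments are planning assumptions (hypothetical values); the usable
    output per lane depends on the instrument and the facility.
    """
    if min(n_groups, replicates_per_group) < 1:
        raise ValueError("need at least one group and one replicate per group")
    if reads_per_sample_millions <= 0 or reads_per_lane_millions <= 0:
        raise ValueError("read numbers must be positive")
    n_samples = n_groups * replicates_per_group
    total_reads_millions = n_samples * reads_per_sample_millions
    # Round up to whole lanes (assumes lanes are not shared with other projects).
    lanes = math.ceil(total_reads_millions / reads_per_lane_millions)
    return {"samples": n_samples,
            "total_reads_millions": total_reads_millions,
            "lanes": lanes}


# Hypothetical design: 2 conditions x 4 biological replicates at 30 M reads each.
print(sequencing_budget(2, 4, 30, 400))
# {'samples': 8, 'total_reads_millions': 240, 'lanes': 1}
```

Keeping the replicate number as an explicit parameter makes it easy to compare how adding biological replicates, rather than sequencing each sample more deeply, changes the overall sequencing requirement.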

Guidelines
• Define a clear goal for the transcriptomics study: what is the specific intention of the project?
• Consider both extrinsic and intrinsic factors when designing an experiment.
• Plan the experiment with advice from the experts involved in the different steps, such as sequencing facility staff and bioinformaticians.

Vascular heterogeneity in the CNS to be considered when characterizing the BBB transcriptome
The vasculature is heterogeneous throughout the CNS [18] (Fig. 4). This heterogeneity is reflected in the transcriptome, and therefore should be considered prior to isolating CNS microvessels for a transcriptomic study of the BBB. Two main factors are the capillary density and the BBB properties, which may in addition be affected by age, sex or the pathological conditions investigated. Capillary density is related to the metabolic demands, and thus neuronal activity, of the respective CNS regions [19]. The gray matter (GM) of the cerebral cortex harbors many neuronal cell bodies and is therefore more metabolically active than the white matter (WM), where the myelinated axonal fibers run. Thus, GM harbors a higher density of capillaries when compared to WM [20]. In addition, there are specific regions in the CNS, such as the hippocampus, characterized by a remarkable heterogeneity in capillary density [21].
The cellular and molecular characteristics of the NVU components are also heterogeneous throughout the CNS. Endothelial cells of the BBB show some of the most pronounced regional differences. Indeed, expression of the endothelial junction proteins occludin, claudin-5 and α-catenin is higher in the WM than in the GM [22]. In the blood-spinal cord barrier, the endothelium is less tight and is characterized by lower pericyte coverage [18]. Astrocytes also show heterogeneity along the brain vasculature, including higher expression of glial fibrillary acidic protein (GFAP), an intermediate filament, in WM relative to GM [22,23]. In contrast, the expression of aquaporin-4 (AQP4), a water channel localized at astrocyte endfeet, is more homogeneous throughout the perivascular glia limitans [22,24]. Moreover, endothelial cells do not form a BBB throughout the whole CNS. In particular, microvessels within the circumventricular organs (CVOs) lack BBB properties. CVOs are localized around the brain ventricles and fulfill neurosecretory and neurosensory functions. The CVOs include the subfornical organ, the vascular organ of the lamina terminalis, the area postrema, the median eminence, the neurohypophysis, the pineal gland, and also the ChPs (Fig. 4). CVOs contain fenestrated microvessels that allow for the free diffusion of blood components into the CVO stroma. Thus, co-isolation of microvessels from the CVOs should be avoided when aiming to specifically analyze the BBB transcriptome [25].
The vascular tree presents gradual phenotypic heterogeneity, a phenomenon known as zonation, accompanied by transcriptional differences [13]. The organization of tight junctions (TJs), the rate of pinocytosis, the expression of enzymes such as alkaline phosphatase and Na+/K+-ATPase, and the expression of transporters, efflux pumps or adhesion molecules differ between endothelial cells of brain arterioles, capillaries and venules (Table 1) [26], in line with the different functions of these vascular segments. In addition, the mural cell subsets of these microvascular segments differ, with smooth muscle cells embracing arterioles and, to a lesser degree, venules, while pericytes are highly concentrated at the level of the capillaries [27,28] (Fig. 5).
Fig. 3 The research question will guide the initial experimental setup based on intrinsic factors. Then, extrinsic limitations should be taken into account to adjust and refine the overall design.
Vascular heterogeneity can also be induced by CNS pathology, e.g. neuroinflammation, neurodegeneration or brain tumors, which may lead to focal alterations in the NVU. These may range from changes in the cellular composition of the NVU, e.g. pericyte loss during stroke or perivascular accumulation of inflammatory cells in multiple sclerosis, to alterations in vessel diameter as observed in brain tumors. All these alterations will affect the outcome of established brain barrier cell isolation protocols with respect to the purity of the brain barrier cells as well as RNA stability.

Guidelines
• Consider the CNS capillary density of the region of interest to obtain sufficient RNA yield: capillary density is higher in GM than in WM.
• Consider regional differences in BBB properties. Clearly indicate from which CNS region microvessels, capillaries or endothelial cells were isolated.
• Consider whether you wish to analyze the transcriptome of endothelial cells from a specific vascular segment or a specific region of the CNS.
• Consider general or regional alterations in BBB properties when studying CNS pathologies. Clearly indicate from which CNS region and at what disease stage the microvessels, capillaries or endothelial cells were isolated.

Regional heterogeneity among the four choroid plexuses to consider when analyzing the BCSFB transcriptome
The choroid plexuses (ChPs) protrude into each of the brain ventricles; thus, there are two lateral (telencephalic) ChPs, one in the third ventricle (diencephalic) and one in the fourth ventricle (hindbrain or myelencephalic). There is increasing evidence that the gene expression profile of each ChP reflects its positional identity. Whole ChPs from the mouse lateral and fourth ventricles present differential transcriptomes and secretomes, as assessed by RNA-Seq and mass spectrometry [41]. A recent single-nucleus RNA-Seq study revealed a unique cellular composition in each of the mouse ChPs [42]. Of note, the most regionalized cell types were epithelial cells and fibroblasts, while ChP endothelial cells were more homogeneous across the ventricles.
Although most transcriptomic studies have focused on the lateral ChPs, the choice of ChP will influence the sequencing results and the comparison with available datasets. The heterogeneity of the ChP among the four ventricles in the human brain remains to be characterized.
Fig. 4 Regional differences in the brain microvasculature. Schematic representation of a brain sagittal section (left) and a spinal cord transverse section (right). Capillary density is higher in the CNS gray matter than in the white matter, according to their differential metabolic activity. The white matter of the corpus callosum is highlighted. The microvessels in the circumventricular organs (CVOs, highlighted in blue) lack BBB characteristics; rather, they are fenestrated and thus permeable to blood components. CVOs include the subfornical organ, the vascular organ of the lamina terminalis, the area postrema, the median eminence, the neurohypophysis, the pineal gland, and the choroid plexus.

Guidelines
• Consider regional transcriptional heterogeneity among the four ChPs. The choice of ChP should be stated in the methods.

Brief recommendations on how to select a reporter mouse
Many genetically modified mouse models have been developed for studying brain barriers development and function. For a general overview of the genetic tools available to study BBB function, we recommend the review by Sohet et al. [43], and for the BCSFB, the report by Johnson et al. [44]. In the context of this manuscript, brain barriers reporter mouse lines that allow brain barrier endothelial or epithelial cells to be distinguished from other CNS cells by expression of a fluorescent reporter are of specific interest. While many of these mouse lines have been developed for imaging purposes, they also allow for sorting of the cells of interest from the CNS based on expression of their fluorescent reporter [12,13,45]. For the purpose of this manuscript, we will simply mention some recommendations and possible pitfalls when using genetic mouse models in RNA-Seq studies in the brain barriers field. Before starting any experiment, a thorough understanding of the genetic mouse model used is needed. To this end, online tools such as http://www.informatics.jax.org/ can provide a detailed overview of the mouse line of interest. The original literature on how the mouse line was created and on the expression pattern of the respective reporter needs to be carefully evaluated. This includes considering whether the promoter driving expression of the fluorescent reporter is cell-specific and efficient, which may also depend on the age or disease state investigated. Furthermore, inducible expression systems, e.g. those based on Cre recombinase or TET-ON or TET-OFF regulation, need to be tested for their specificity, leakiness and completeness in driving reporter expression [46,47].
Specificity and intensity of brain barriers reporter expression should always be tested in-house before using the respective mouse model for sorting brain barriers cells.
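The specificity and completeness of reporter expression can be summarized as two ratios computed from cell counts obtained during such in-house testing, for example by co-staining reporter-expressing tissue for an independent marker of the target cell type. The sketch below assumes such counts are available; the class and function names, and the counts themselves, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReporterValidationCounts:
    """Cell counts from a hypothetical in-house validation of a reporter line.

    'marker' stands for any independent identification of the target cell type
    (e.g. immunostaining for an endothelial or epithelial protein).
    """
    reporter_pos_marker_pos: int  # reporter+ target cells
    reporter_pos_marker_neg: int  # reporter+ non-target cells (off-target expression)
    reporter_neg_marker_pos: int  # target cells without reporter (incomplete labeling)


def specificity(c: ReporterValidationCounts) -> float:
    """Fraction of reporter-positive cells that are target cells."""
    return c.reporter_pos_marker_pos / (c.reporter_pos_marker_pos + c.reporter_pos_marker_neg)


def completeness(c: ReporterValidationCounts) -> float:
    """Fraction of target cells that express the reporter."""
    return c.reporter_pos_marker_pos / (c.reporter_pos_marker_pos + c.reporter_neg_marker_pos)


counts = ReporterValidationCounts(180, 20, 60)  # hypothetical counts
print(specificity(counts), completeness(counts))  # 0.9 0.75
```

For inducible systems, leakiness can be estimated in an analogous way, e.g. by counting reporter-positive cells in non-induced control animals.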

Considerations for protocols on the isolation of CNS microvessels, capillaries or single endothelial cells
An RNA-Seq study of CNS microvessels, capillaries or single BBB endothelial cells necessarily relies on the protocol used to isolate the target tissue or cells. Due to the highly complex structure of the NVU and the aforementioned heterogeneity of NVU components throughout the CNS vasculature, a detailed description of the protocol used to isolate the material for RNA-Seq analysis is mandatory. Each step of the isolation protocol is critical and has a direct impact on sample purity as well as on the yield of material obtained for sequencing.
Table 1 (excerpt recovered from extraction; column headings not recovered)
Transferrin receptor (TFRC): - | +++ [13,37]
Alkaline phosphatase: +++ | +++ [29]
Mg2+-ATPase: +++ | + [29]
5′-nucleotidase: +++ | + [29]
γ-Glutamyl transpeptidase (GGTP): ? | +++ [38,39]
Bidirectional vesicular horseradish peroxidase transport: +++ | + [29,40]
Fig. 5 Heterogeneity in microvasculature diameter determines the outcome of filtration steps for BBB enrichment. (a) The CNS vascular tree from arteries to veins. An indicative range of vessel diameter for each vascular section is provided, along with other cell types that may be co-isolated. Arteries and veins have a diameter > 100 μm, arterioles and venules from 100 μm to 50 μm. The brain microvasculature has a diameter smaller than 50 μm and consists of pre-capillary arterioles, capillaries and post-capillary venules. Capillaries are generally considered those with a diameter < 10 μm and often show diameters of about 5 μm. APC: antigen-presenting cell. (b) Enzymatic digestion and mechanical disruption during CNS vascular isolation protocols alter the physical properties of the microvasculature fragments, influencing the downstream steps of the isolation protocol, particularly size-dependent filtration across nylon membranes. Small vascular fragments obtained by mechanical disruption can allow undesired vessels to pass through the filter. Enzymatic digestion leads to swelling of the blood vessels, preventing their elution through the filter.
To date, many different protocols for the isolation of endothelial cells of the BBB, or CNS microvessels, or brain endothelial cells have been published. Most classical isolation protocols consist of a combination of mechanical disruption and enzymatic digestion and a subsequent size selection of the isolated vascular segments by filtration through nylon membranes with different pore sizes (Fig. 5). Additional enrichment of the respective vascular segment is usually achieved by density gradient separation [48][49][50]. More refined techniques can be incorporated to increase purity, such as selection with antibody-coupled magnetic beads or FACS, or taking advantage of fluorescent reporter mice [51], as discussed above.

Tissue disaggregation: enzymatic digestion versus mechanical disruption
The first step for the isolation of CNS microvessels or BBB endothelial cells is to properly slice or cut the tissue into small pieces to facilitate enzyme-based digestion or the subsequent Dounce homogenization steps [52]. Typical techniques of brain tissue disaggregation are either enzymatic or mechanical. Enzymatic digestion is often performed using a combination of enzymes such as collagenase and dispase as well as DNase. Mechanical disruption, on the other hand, is usually performed using Dounce homogenizers with loose pestles of different sizes, depending on the amount of tissue and the species [53,54]. This loosens the tissue by shear forces without affecting cell viability. Mechanical techniques may prove more effective but may also be too harsh, depending on factors such as the CNS region or the age of the individual (e.g. aged tissue is more susceptible to damage). Therefore, the choice of tissue disaggregation technique can influence the rest of the protocol and should be described in detail in the methods of the study.

Microvessel hierarchy selection depending on size: filtration steps
Filtration of the dissociated tissue or CNS vascular fragments enriches for a specific component of the brain homogenate. One of the most common methods for the isolation of CNS microvessels, or specifically capillaries, is to perform either a single filtration or a series of filtrations through nylon or polyester membranes, followed by gradient centrifugation steps with Percoll™, dextran or serum albumin, in order to separate microvessel fragments from cellular debris, myelin and other undesired cell types [55][56][57][58].
The enrichment of a certain type of CNS microvessel (e.g. capillaries) over the others depends on vessel diameter and can be achieved by using different filter pore sizes. Therefore, the combination of larger (~100 μm) to smaller (~20 μm) filter pore sizes, and the use of one or several filters in sequence, can determine the final vascular segment that is isolated [59]. The choice of meshes should thus take into account the different vessel calibers of the CNS vascular tree (Fig. 5a). Generally speaking, arteries have a diameter ≥ 100 μm, arterioles and venules between 100 and 50 μm, post-capillary venules and pre-capillary arterioles between 50 and 10 μm, and capillaries are considered to have a diameter ≤ 10 μm [60][61][62]. The diameter of the arteries decreases from the surface of the brain towards the deeper regions [63]. Moreover, variability in brain artery diameter between different mouse strains has been observed [64], underscoring the necessity of a detailed description of the source of the sequencing material. In humans, CNS vessel diameters are affected by the health status of the donor [65], while in rats, age was shown to contribute to reduced capillary diameter in the brain stem [66].
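To document which part of the vascular tree an isolate actually represents, fragment diameters measured on images of the final fraction can be assigned to the approximate caliber ranges given above and summarized as proportions. The Python sketch below is one simple way to do this under stated assumptions: the thresholds follow the ranges in the text (class boundaries are approximate, and a diameter of exactly 50 μm is assigned to arterioles/venules here), the function names and diameters are hypothetical, and diameters measured after enzymatic digestion may be inflated by swelling.

```python
from collections import Counter
from typing import Dict, Iterable


def vascular_segment(diameter_um: float) -> str:
    """Assign a vessel diameter (um) to the approximate caliber ranges in the text."""
    if diameter_um >= 100:
        return "artery/vein"
    if diameter_um >= 50:
        return "arteriole/venule"
    if diameter_um > 10:
        return "pre-capillary arteriole/post-capillary venule"
    return "capillary"


def fraction_composition(diameters_um: Iterable[float]) -> Dict[str, float]:
    """Proportion of measured fragments falling into each caliber class."""
    diameters = list(diameters_um)
    if not diameters:
        raise ValueError("no diameters provided")
    counts = Counter(vascular_segment(d) for d in diameters)
    return {segment: n / len(diameters) for segment, n in counts.items()}


# Hypothetical diameters measured on images of the final isolate.
print(fraction_composition([4.5, 6.0, 8.2, 12.0, 25.0]))
# {'capillary': 0.6, 'pre-capillary arteriole/post-capillary venule': 0.4}
```

Reporting such proportions alongside the filter pore sizes gives a quantitative basis for calling a fraction "capillaries" or "microvessels".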
Although size selection represents a possible choice for defining the CNS microvascular segment that is isolated, some technical details have to be kept in mind. Other steps of the protocol can strongly influence the physical properties of the isolated microvessels or capillaries (Fig. 5b). For instance, mechanical disruption reduces the length of the vascular fragments and might therefore impact size selection. Enzymatic digestion causes swelling of the microvascular fragments, increasing their diameter. Because of these technical aspects, researchers should use size filtration as a guideline and empirically determine which fraction of the vascular tree has been purified at the end of the isolation protocol (e.g. by visual inspection of the vascular fragments). Publicly available scRNA-Seq datasets of different BBB cell types provide an additional tool for this validation [13]. These datasets contain information on specific cell markers that can be used to complement the isolation. However, it should be clear that the expression of a few chosen markers might not successfully identify specific microvascular segments, and that gene expression differences along brain endothelial cells follow a gradient rather than discrete segments.
In sum, size selection is a critical step in the isolation protocol for microvessels, capillaries or individual BBB endothelial cells, and ambiguity in terminology should be avoided by providing accompanying qualitative and quantitative information.
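Because arteriovenous zonation is a gradient, marker expression can be summarized as a continuous score rather than used to assign discrete segment labels. A minimal sketch, assuming normalized expression values per sample or cell are available: the score below is the mean log1p expression of an arterial marker set minus that of a venous marker set. This is a generic illustration, not the method used in [13], and the gene names are placeholders to be replaced with markers taken from a reference dataset.

```python
import math
from typing import Dict, Sequence


def mean_log_expression(expr: Dict[str, float], genes: Sequence[str]) -> float:
    """Mean log1p-transformed expression of a gene set; genes absent from expr count as 0."""
    if not genes:
        raise ValueError("gene set is empty")
    return sum(math.log1p(expr.get(g, 0.0)) for g in genes) / len(genes)


def arteriovenous_score(expr: Dict[str, float],
                        arterial_genes: Sequence[str],
                        venous_genes: Sequence[str]) -> float:
    """Position along the arteriovenous axis: > 0 leans arterial, < 0 leans venous.

    The result is a continuous value, not a discrete segment label.
    """
    return mean_log_expression(expr, arterial_genes) - mean_log_expression(expr, venous_genes)


# Placeholder marker sets and normalized counts (hypothetical).
ARTERIAL = ["arterial_marker_1", "arterial_marker_2"]
VENOUS = ["venous_marker_1", "venous_marker_2"]
sample = {"arterial_marker_1": 12.0, "arterial_marker_2": 8.0,
          "venous_marker_1": 1.0, "venous_marker_2": 0.0}
print(round(arteriovenous_score(sample, ARTERIAL, VENOUS), 2))  # 2.03
```

Scores of this kind can be compared between isolates or plotted as a distribution, which reflects the gradient nature of zonation better than a yes/no call based on a few markers.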

Guidelines
• Tissue processing methods can affect size selection. Indicate the method used and, if possible, the state of the tissue after dissociation.
• Indicate the size-dependent filtration steps, if used, including the pore size of the filter mesh and the combination of more than one filter.
• For animal studies, indicate the strain, age and sex of the animals used. For reporter mouse studies, indicate the precise mouse line used as listed on the MGI homepage (http://www.informatics.jax.org).
• For human studies, provide the age, sex and relevant clinical information of the individuals.
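The reporting items listed in these guidelines can be captured in a structured record per sample and checked for completeness before data deposition or submission. The sketch below is hypothetical: the field names and the choice of required items are our illustration of the guidelines above, not an established metadata standard, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IsolationMetadata:
    """Reporting items for a brain barrier RNA-Seq sample (field names are illustrative)."""
    species: str
    strain: Optional[str] = None                 # animal studies
    reporter_line: Optional[str] = None          # MGI designation, if a reporter mouse is used
    clinical_information: Optional[str] = None   # human studies
    age: Optional[str] = None
    sex: Optional[str] = None
    cns_region: Optional[str] = None
    disease_stage: Optional[str] = None          # pathology studies
    dissociation_method: Optional[str] = None    # enzymatic (which enzymes) and/or mechanical
    filter_pore_sizes_um: List[float] = field(default_factory=list)  # in order of use, if filtered

    def missing(self) -> List[str]:
        """Names of core reporting items that are still empty."""
        specific = ["clinical_information"] if self.species == "human" else ["strain"]
        common = ["age", "sex", "cns_region", "dissociation_method"]
        return [name for name in specific + common if not getattr(self, name)]


mouse = IsolationMetadata(species="mouse", strain="C57BL/6J", age="P60", sex="female",
                          cns_region="cortex", dissociation_method="enzymatic, collagenase/dispase",
                          filter_pore_sizes_um=[100.0, 20.0])
print(mouse.missing())  # []
human = IsolationMetadata(species="human", age="65 years")
print(human.missing())  # ['clinical_information', 'sex', 'cns_region', 'dissociation_method']
```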

Strategies for BBB purity refinement: selection with antibody-coupled magnetic beads
Selection via magnetic beads coated with specific antibodies represents a useful and precise method to isolate microvessels or endothelial cells of the BBB. This technique can be used for positive selection of the material of interest or for negative selection of possible contaminants, alone or combined. For example, to purify endothelial cells, positive selection can be achieved by using beads coated with anti-CD31 and/or VE-cadherin antibodies [67,68]. On the other hand, beads coated with antibodies against CD68, PDGFRβ, NG2 or GLAST might be used to specifically select macrophages, pericytes and astrocytes respectively [69][70][71][72][73], or to deplete these cells in those cases where a pure endothelial fraction is required. Positive and negative selection may be combined in order to improve the specificity of the technique. Despite the high selectivity and improved final purity that the bead-mediated selection offers by targeting specific cell types, some disadvantages need to be considered. First, the state of the vessel suspension and the physical interaction between different BBB components is a critical factor. During the isolation protocol, the brain vessels are not revealed in a single cell suspension but rather as vascular fragments that consist of tightly connected cell types, such as endothelial cells, pericytes and astrocyte endfeet [59]. Prior to bead selection, additional disaggregation steps, including enzymatic or mechanical disruption, might aid in obtaining a higher fraction of single cells versus microvessel fragments by weakening the interaction between different cell types, which may enhance the disassociation and determine the outcome of the isolation [54]. 
The close interaction between the different components of the BBB [74,75] makes obtaining a complete single-cell suspension from CNS microvessels challenging; if dissociation is not performed properly, the availability of antibody binding sites is limited and isolation efficiency is reduced. An additional aspect to consider is that enzymatic digestion may alter the cell surface: receptors can be internalized or lost by shedding, and surface epitopes can be damaged, which may ultimately reduce the cell surface expression of the antigens chosen for positive selection. Therefore, incomplete digestion and disaggregation may lead to low yields despite high purity, which must be taken into account when performing sequencing analysis since it may influence the downstream procedure. Also, extended purification protocols aiming at single brain endothelial cell suspensions bear the risk of inducing changes in endothelial gene expression due to the loss of tight junction interactions, as cross-talk between mature cell-cell junctions and the nucleus is well established.
Another important factor is that the selection is based on generally accepted markers for the cell population of interest, with the above-mentioned limitations. In addition, heterogeneity in marker expression along the brain microvasculature might influence targeting efficiency. Therefore, it is good practice to refer to the most recent studies that better define different cell populations of the brain vasculature [13,76], in order to improve the targeting strategy and the selectivity of the technique.
When using magnetic beads for positive selection, it is also important to know whether and how to separate the cells from the beads afterwards. This step may become essential when the isolated fraction is intended for cell culture. In the first protocols using this isolation method, incubation of the cells in trypsin/EDTA at 37 °C released the beads once selection was achieved [77]. Improvements in this technique have led to procedures that require a less aggressive approach, or that do not require detachment of the beads at all [78], since the beads affect neither growth nor survival of the isolated cells. Depending on the manufacturer, some beads have been shown to detach spontaneously from cultured cells after several days, whereas others may require a DNase treatment to cleave the DNA linker that attaches the microbead to the antibody [52,70]. In any case, most currently used magnetic beads are compatible with subsequent analyses.

Guidelines
• Clearly indicate the antibody used to coat the beads and the rationale behind the choice; refer to recent publications (if possible) to define the cell population that will ideally be targeted.
• Clearly indicate the amount of material obtained after bead selection, including the number of isolated cells and the amount of RNA extracted from them; this is useful information for the study itself and for future reference.
• If choosing positive selection of the target cell, clearly indicate whether the beads had to be separated from the cells.
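The purity and yield figures recommended above can be derived from simple cell counts. The following Python sketch assumes that total and marker-positive cell numbers are available for both the input and the selected fraction (e.g. from cell counting combined with marker staining); the class and function names are hypothetical and not part of any bead manufacturer's software.

```python
from dataclasses import dataclass


@dataclass
class SelectionCounts:
    """Hypothetical cell counts before and after magnetic bead selection."""
    input_total: int    # all cells loaded onto the magnetic column
    input_target: int   # marker-positive target cells in the input
    output_total: int   # all cells in the selected (eluted) fraction
    output_target: int  # marker-positive target cells in the selected fraction


def purity(c: SelectionCounts) -> float:
    """Fraction of selected cells that are target cells."""
    return c.output_target / c.output_total


def recovery(c: SelectionCounts) -> float:
    """Fraction of the input target cells retained after selection."""
    return c.output_target / c.input_target


def enrichment(c: SelectionCounts) -> float:
    """Fold increase in target-cell frequency from input to selected fraction."""
    input_frequency = c.input_target / c.input_total
    return purity(c) / input_frequency


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    counts = SelectionCounts(input_total=1_000_000, input_target=200_000,
                             output_total=150_000, output_target=135_000)
    print(f"purity {purity(counts):.1%}, recovery {recovery(counts):.1%}, "
          f"enrichment {enrichment(counts):.1f}-fold")
```

Reporting recovery and enrichment alongside purity makes the trade-off between high purity and low yield discussed above visible to readers.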

Specific separation of microvessel cell populations: fluorescence-activated cell sorting (FACS)
FACS is a powerful technique that uses flow cytometry to selectively separate cell populations from complex pools of different cells. FACS-based selection has been used in transcriptome profiling studies of the brain barriers [13,14,79]. These approaches include high-purity sorting of cells expressing fluorescent reporters in transgenic mice; however, any cell that expresses the reporter construct will be selected, which may introduce contamination into the downstream analysis. For example, FACS has been successfully used to enrich brain endothelial cells isolated from claudin-5-GFP mice [13] or from tamoxifen-treated Rosa-tdTomato; VE-Cadherin-CreERT2 mice [14]. As an alternative to fluorescent reporter mice, other studies have isolated brain endothelial cells via FACS after staining with fluorochrome-conjugated antibodies against CD31 [80,81], or with a combination of antibodies against CD31 and CD13 to isolate endothelial cells and pericytes, respectively, from different microregions of the mouse brain [82].
In any case, the final sample after FACS is a highly enriched fraction of the cell type of interest. However, a large amount of starting material is often necessary to obtain a sufficient yield after sorting, depending on the needs of the downstream application and/or analysis [83,84]. Moreover, sorting may induce an oxidative stress response in the endothelial cells that needs to be considered.
As already mentioned for bead selection, obtaining a viable single-cell suspension is also crucial when isolating cell populations of the brain barriers prior to FACS. As for bead selection, a combination of mechanical and enzymatic digestion is often used to improve cell-cell dissociation. A good single-cell suspension reduces the number of false positives and negatives produced by antibody staining (when not relying on reporter mice) or by reporter proteins. For the same reason, doublet exclusion should be tightly controlled to ensure the best purity and reliability of the sorted material, together with a strict gating strategy adapted to the experimental needs [85]. In addition, the abundance of the population of interest is critical: for example, if endothelial cells make up less than 15% of the total, the sorted cells might not be viable. A density gradient enrichment before FACS could potentially solve this issue [51]. In general, FACS poses technical challenges, and a fine balance between sorting time, quality of the starting material and viability of the sorted cells needs to be tested experimentally.
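Because the sorted yield depends on the abundance of the target population, it can help to estimate the required starting material before a sort. The Python sketch below is a back-of-the-envelope calculation, assuming that the target fraction, the sort recovery and the post-sort viability are known or estimated from pilot runs; these parameters are instrument- and protocol-dependent, the function names are hypothetical, and the 15% default simply mirrors the example threshold given in the text.

```python
import math


def starting_cells_needed(target_sorted: int, target_fraction: float,
                          sort_recovery: float, viability: float) -> int:
    """Estimate how many single cells must be loaded on the sorter.

    target_fraction: proportion of events falling in the target gate (0-1)
    sort_recovery:   fraction of gated target events actually deposited (0-1),
                     an assumed, instrument-dependent value
    viability:       fraction of sorted cells expected to be viable (0-1)
    """
    for name, value in (("target_fraction", target_fraction),
                        ("sort_recovery", sort_recovery),
                        ("viability", viability)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    return math.ceil(target_sorted / (target_fraction * sort_recovery * viability))


def needs_pre_enrichment(target_fraction: float, threshold: float = 0.15) -> bool:
    """Flag suspensions in which the target population is below the threshold,
    where a density-gradient enrichment before sorting may help."""
    return target_fraction < threshold


if __name__ == "__main__":
    n = starting_cells_needed(50_000, target_fraction=0.10,
                              sort_recovery=0.5, viability=0.8)
    print(f"load at least {n:,} cells; pre-enrich first: {needs_pre_enrichment(0.10)}")
```

For example, `starting_cells_needed(50_000, 0.25, 0.5, 0.5)` returns 800000, illustrating how quickly the required input grows as the target population becomes rarer.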

Guidelines
• The flow conditions and the instrument used should be indicated.
• The precise scatter and fluorescence gating strategy used for sorting the target cells should be included in the supplement of the research article, together with a detailed description of the isolation protocol and of any staining steps performed prior to sorting, including their duration.
• The duration of sorting and the cell yield obtained should be described.
• The time of RNA extraction following FACS should be clearly stated, as experimental designs differ in this respect, e.g. RNA extraction immediately after FACS or after several hours due to transport from the sorting facility back to the laboratory.

Laser capture microdissection (LCM)
Laser capture microdissection (LCM) allows CNS microvessels to be dissected from a tissue section with the help of a microscope and a laser. Dissected CNS microvascular endothelial cells can then be captured by adsorption, ejection, gravity or aspiration. LCM provides a snapshot of the transcriptomic profile of the BBB, in contrast to methodologies that require long incubation times. One of the main limitations of this laborious technique is its low yield, which can be circumvented by using kits designed to isolate RNA from small numbers of cells [86] or by performing rounds of RNA amplification prior to downstream analysis [87]. However, capturing small cells such as BBB endothelial cells by LCM may be challenging, and contamination from astrocytic endfeet and/or pericytes is a major concern. To improve the cellular purity of the preparation, thinner sections can be used, decreasing the chance of including cells above or below the plane of interest. Alternatively, LCM on vessel cross sections provides better purity than on longitudinal sections, albeit with lower yield [88]. To aid the visualization of endothelial cells, rapid immunohistochemistry may be coupled to LCM [89], a technique known as immuno-LCM. The reproducibility of immuno-LCM for studying BBB gene expression in mice has been demonstrated [90], and the technique has been further validated in human postmortem frozen [87] and formalin-fixed paraffin-embedded (FFPE) brain sections [89].

Guidelines
• Consider the balance between yield and purity of the isolated CNS microvascular endothelial cells when deciding between cross sections and longitudinal sections.
• Use RNA isolation kits specifically designed for small numbers of cells, or perform RNA amplification prior to sequencing.
• Test for and consider possible cellular contaminants co-isolated with the BBB endothelial cells.

BBB in vitro models: cultured primary brain endothelial cells and brain endothelial cell lines
Isolated brain microvascular fragments or single cells can be cultured and used as an in vitro model of the BBB endothelium. Primary cultures of brain endothelial cells maintain most in vivo BBB characteristics and hence are powerful tools to study various aspects of BBB properties. However, these cultures often offer restricted capacity for genetic manipulation (e.g. transfection) and can be maintained only for a limited time and/or number of passages. Nevertheless, numerous primary cultures of brain endothelial cells have been established from mouse [58], rat [91] and human [92,93] brain. In contrast, in vitro BBB models based on immortalized cell lines are much easier to handle, as in many cases the cells can be cultured and passaged as needed, and they tolerate genetic manipulation much better. This makes cell lines very suitable for high-throughput screening, also because they are a much more homogeneous cell population than primary cultures, in which contaminating cells are often found. As a major drawback, BBB endothelial cell lines do not retain BBB characteristics such as high tightness and very low permeability to the same degree as primary brain endothelial cells; the best-suited BBB cell line therefore needs to be selected carefully according to the specific scientific question.
In both cases, the additional BBB cell types found in vivo are not always modeled in the in vitro systems, e.g. the presence and anatomical arrangement of the astrocytic endfeet found in the NVU in vivo. To overcome these limitations, sophisticated co- or tri-cultures of brain endothelial cells with pericytes or astrocytes from different sources have been established that, to a certain degree, mimic the in vivo NVU structure [94][95][96].
Both BBB cell lines and primary brain endothelial cell cultures have been successfully used in transcriptomic studies. For example, bulk RNA-seq has been performed on the human cerebral microvascular endothelial cell line hCMEC/D3 [97] and on primary mouse brain microvascular endothelial cells (pMBMECs) [79]. Interestingly, a comparative microarray analysis of freshly isolated and cultured pMBMECs and the endothelioma cell line bEnd.5 has highlighted important differences in the mRNA levels of genes associated with BBB characteristics [69].
Recent advances in stem cell technology have furthermore allowed the derivation of human in vitro BBB models from stem cell sources, including human cord blood-derived circulating endothelial progenitor cells [98] and human induced pluripotent stem cells (hiPSCs; summarized in [99]). hiPSCs derived from an individual open the entirely novel opportunity to study BBB dysfunction in individual patients, as they provide a scalable and renewable source of brain microvascular endothelial cells. The hiPSC-derived in vitro BBB models presently available are very well characterized with respect to their barrier properties and expression of BBB-specific transporters and efflux pumps [100][101][102]. At the same time, RNA-Seq analysis has shown that hiPSC-derived brain microvascular endothelial cells, like most hiPSC-derived cells, do not fully recapitulate all aspects of the BBB [98]. For example, present hiPSC-derived in vitro BBB models still lack expression of the full array of trafficking molecules required for immune cell interaction with the BBB.

Guidelines
• Consider the effect of culture-induced mRNA expression changes in in vitro BBB models, e.g. due to medium composition.
• Use RNA-Seq profiling of in vitro BBB models, especially hiPSC-derived in vitro BBB models, to benchmark them against the BBB in vivo.

Considerations for isolating the entire choroid plexus vs choroid plexus epithelium
The ChP consists of a highly vascularized stroma populated by immune cells, surrounded by a layer of highly specialized epithelial cells that form the BCSFB. Unlike the endothelium of brain microvessels, the ChP endothelium is fenestrated and does not form a BBB [103]. Transcriptomic studies of the entire ChP tissue will therefore capture not only the transcriptome of the epithelial cells forming the BCSFB, but also that of the endothelial cells, stromal fibroblasts and immune cells of the ChP. Alternatively, the ChP epithelial cells can be isolated to focus on the transcriptome of the BCSFB.
Using the whole ChP greatly simplifies tissue isolation, but the cellular heterogeneity within the ChP complicates subsequent analysis and interpretation of the results, especially in bulk RNA-seq studies. However, such studies also provide information on the other components of the ChP, such as the vasculature or immune populations. Many RNA-seq studies have taken this approach, particularly those focused on humans [104,105].
Alternatively, if the barrier component of the ChP is the focus of the study, the epithelial cells can be isolated. While this yields cleaner data, dissociating the ChP may be challenging (see below). To overcome these difficulties, single-nucleus RNA-seq has emerged as an option for tissues that are hard to dissociate, such as the ChP [42].
The research question and the technical limitations will determine whether the whole ChP or the isolated epithelium will be sequenced.

Methods for isolating the choroid plexus epithelium
The techniques for isolating ChP epithelial cells are similar to those for the BBB. Of note, the ChP epithelium is composed of large cuboidal cells, which are easier to dissect microscopically by LCM than the thin BBB endothelium. Indeed, LCM has been used to isolate the human ChP epithelium for microarray studies of the BCSFB [106][107][108]. In animal models, mechanical disruption of the entire ChP surgically removed from the brain ventricles is typically combined with enzymatic digestion. To release epithelial adherens and tight junctions as well as integrin-mediated adhesive contacts to the epithelial basement membrane, calcium removal is recommended, for example by using the chelator EDTA [109,110] or calcium-free medium [111], but it is not an essential requirement [112]. Further purification can be achieved by FACS using an epithelial marker such as TTR [41]. However, TTR expression has recently been identified in ChP macrophages [113], and markers should be chosen with awareness of their limitations. To our knowledge, the only human primary ChP epithelial cells available are commercial ones (ScienCell), and no isolation protocol has been published to date.

Guidelines
• The research question and the technical limitations will determine whether the whole ChP or the isolated epithelial cells will be sequenced. This should be specified in the methods.
• The ChP can be particularly challenging to dissociate. The techniques for isolating the ChP epithelium include LCM and mechanical and enzymatic digestion. The resulting purity should be assessed and reported.

Pre-sequencing tissue or cell purity assessment
Before performing RNA-seq, it is good practice to ensure that the chosen isolation strategy results in the desired brain barrier cell purity. Knowing the degree of brain barrier cell purity is essential, as it allows possible biases in the downstream analysis to be recognized and reduced, improving the biological meaningfulness of any RNA-seq study. Pre-sequencing purity can be assessed by different techniques, often used in combination. Common contaminants when isolating endothelial cells of the BBB are pericytes or pericyte fragments and astrocytic endfeet, which can hardly be avoided (Fig. 5, Table 1). These contain RNA and are thus readily detectable by qPCR of specific markers such as platelet-derived growth factor receptor beta (PDGFR-β) or GFAP, respectively. Immunostaining provides information about the location of the probed proteins, while flow cytometry allows quantitative detection of contaminants with higher sensitivity, although it requires a considerably higher number of cells than immunofluorescence imaging. qPCR can therefore be combined with immunofluorescence imaging to estimate purity at both the RNA and the protein level. All of these techniques depend, to differing degrees, on described cell markers (Table 2). Another potential source of contamination when isolating endothelial cells of the BBB is the ChP, in particular the ChP epithelium. Indeed, in the vast majority of preparations the ChP is not removed from the processed material. Performing qPCR for ChP-specific markers such as transthyretin or keratin-8 makes it possible to determine the presence of ChP mRNA in the brain endothelial preparation. For example, low expression of claudin-3 mRNA has been reported in freshly isolated brain microvessels, despite recent evidence showing the absence of claudin-3 expression in the mouse brain microvasculature.
This could be due to contamination of the isolated brain microvessels with ChP epithelial cells, which express claudin-3 [79].
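A common way to express such qPCR purity checks is the comparative 2^-ΔΔCt method, comparing the isolated fraction with a calibrator such as whole-brain homogenate. The Python sketch below assumes approximately 100% amplification efficiency for all primer pairs and a stable reference gene; the gene symbols, the calibrator choice and all Ct values are hypothetical placeholders, not data from the studies cited here.

```python
def relative_expression(ct_gene_sample: float, ct_ref_sample: float,
                        ct_gene_calibrator: float, ct_ref_calibrator: float) -> float:
    """2^-ΔΔCt fold change of a gene in the sample versus a calibrator.

    Both Ct values are normalized to the same reference gene; assumes ~100%
    amplification efficiency (one doubling per cycle).
    """
    delta_sample = ct_gene_sample - ct_ref_sample
    delta_calibrator = ct_gene_calibrator - ct_ref_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: (isolated microvessels, whole-brain calibrator).
REFERENCE = (18.0, 18.0)  # a housekeeping reference gene
MARKERS = {
    "Cldn5 (endothelium)": (22.0, 26.0),
    "Pdgfrb (pericytes)": (25.0, 26.0),
    "Gfap (astrocytes)": (29.0, 23.0),
    "Ttr (ChP epithelium)": (30.0, 24.0),
}


def purity_report(markers=MARKERS, reference=REFERENCE) -> dict:
    """Fold change of each marker in the isolate relative to the calibrator:
    values > 1 mean enriched, values < 1 mean depleted."""
    return {
        gene: relative_expression(ct_s, reference[0], ct_c, reference[1])
        for gene, (ct_s, ct_c) in markers.items()
    }


if __name__ == "__main__":
    for gene, fold in purity_report().items():
        print(f"{gene:22s} {fold:8.3f}-fold vs whole brain")
```

Values well below 1 for contaminant markers indicate depletion relative to the whole tissue but do not by themselves give an absolute purity; a residual signal, as in the claudin-3 example above, still warrants investigation.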

Isolation and purification of RNA from BBB endothelial cells or microvessels
RNA isolation methods have to be chosen depending on the type and availability of starting material, on the one hand, and the intended RNA-seq analyses, on the other (Fig. 6). Inappropriate RNA isolation methods can result in low RNA quantity and/or quality and consequently in less accurate and irreproducible results, or even in complete failure of the analysis [150]. Contamination with extrinsic RNA or DNA, or with nucleases that degrade RNA, can negatively affect the results. General measures to avoid these issues include thorough and regular cleaning of work areas and equipment with decontamination solution, as well as the use of clean gloves, aerosol-barrier pipette tips, and DNase- and RNase-free plasticware. Additionally, RNA samples should be handled carefully at the temperature recommended by the manufacturer of the isolation kit [151].
Commercially available RNA isolation protocols generally follow two main steps: (i) sample lysis, homogenization and clearing, and (ii) RNA purification. Isolation kits and protocols must be chosen according to the type of sample (e.g. cell culture, frozen tissue, FFPE tissue, etc.) and RNA molecules to be purified (e.g. small or large RNA molecules).

Sample lysis, homogenization and clearing
Cell lysis is commonly performed using a guanidinium thiocyanate-based buffer combined with a strong reducing agent, such as tris(2-carboxyethyl)phosphine hydrochloride, 2-mercaptoethanol or dithiothreitol, to ensure complete cell lysis and protein denaturation (lysis buffer). The procedure varies according to the characteristics of the starting sample, e.g. adherent cell culture, cell suspension, frozen tissue or formalin-fixed paraffin-embedded (FFPE) tissue (Fig. 7).

Cultured adherent cells
Cultured adherent cells, like endothelial cells, can be trypsinized prior to cell lysis, or they can be lysed directly in the culture vessel by removing the medium and adding lysis buffer directly onto the cell monolayer. Cells should be lysed before storage to avoid transcriptomic alterations during freezing. The cell lysate can be safely stored at −80 °C.

Cell suspensions
Cell suspensions, such as microvessel fragments or single endothelial cells, can be pelleted by gentle centrifugation (≤ 500×g). After complete removal of the supernatant, the cell pellet is resuspended in lysis buffer. As for adherent cells, cells should be lysed before storage. Sorted cells can be collected directly in lysis buffer, and isolated single cells in mild lysis buffer.
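Centrifugation limits such as ≤ 500×g are given as relative centrifugal force (RCF), whereas many benchtop centrifuges are set in rpm. The standard conversion is RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. The Python sketch below applies it; the 10 cm radius in the example is an assumption, and the actual value should be taken from the rotor documentation.

```python
import math

# Standard constant in RCF = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def max_rpm_for_rcf(rcf_limit: float, radius_cm: float) -> int:
    """Highest whole-number rpm that does not exceed the RCF limit."""
    rpm = math.floor(math.sqrt(rcf_limit / (RCF_CONSTANT * radius_cm)))
    # Guard against floating-point rounding pushing the result over the limit.
    while rpm > 0 and rcf_from_rpm(rpm, radius_cm) > rcf_limit:
        rpm -= 1
    return rpm


if __name__ == "__main__":
    radius = 10.0  # assumed rotor radius in cm; check the rotor's specification
    rpm = max_rpm_for_rcf(500, radius)
    print(f"<= 500 x g at r = {radius} cm: set at most {rpm} rpm "
          f"({rcf_from_rpm(rpm, radius):.0f} x g)")
```

Rounding down rather than to the nearest integer keeps the pelleting step at or below the gentle-centrifugation limit.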

Tissue samples
At collection, tissue samples such as the ChP should be stored in an RNA stabilization solution or snap-frozen in liquid nitrogen. Samples in stabilization solution can be stored for up to 4 weeks at 4 °C, or at −20 °C for long-term storage [152]. Samples snap-frozen in liquid nitrogen can be safely stored at −80 °C for more than 20 years [153]. Tissue samples may need to be disrupted using different techniques, such as the TissueRuptor, TissueLyser, ZR BashingBead Lysis Tubes or thorough grinding under liquid nitrogen using a mortar and pestle [152]. Remaining tissue debris and other precipitates may need to be removed by centrifugation, and the supernatant can then be used for subsequent RNA isolation.

FFPE tissue samples
FFPE tissue samples derived from brain microvessels must be subjected to deparaffinization using xylene or other commercially available solutions. Subsequently, tissue and protein digestion is performed using proteinase K. Next, formaldehyde-derived crosslinks of nucleic acids and proteins must be reversed by incubating at more than 80 °C. Finally, the sample might be cleared by centrifugation and the supernatant can be used for subsequent RNA isolation [154].

RNA purification
During experimental design and before performing RNA purification, it is necessary to identify which RNA molecules are relevant to the research question. Messenger RNA (mRNA) is the RNA that is translated by ribosomes into proteins. mRNA is characterized by a coding sequence flanked by 5′ and 3′ untranslated regions and a long stretch of adenine nucleotides at the 3′ end (poly-A tail). Several other types of non-coding RNA have important roles in cell biology; e.g. ribosomal RNA (rRNA) and transfer RNA (tRNA) are necessary for translation. Additionally, other RNA families are important for the regulation of gene expression, for example microRNA (miRNA), with a size of ca. 22 nucleotides, other small RNAs (< 200 nt), and long non-coding RNA (lncRNA), with sizes greater than 200 nucleotides.
Regarding the RNA content of a cell, it is important to note that only mRNA and many lncRNAs have a poly-A tail at the 3′ end. Additionally, rRNA represents the majority of the RNA in the cell.
The combination of a highly concentrated chaotropic salt (e.g. guanidinium thiocyanate) in the lysis buffer with an organic solvent (typically ethanol or isopropanol) allows nucleic acids to adsorb to the silica matrix of spin columns. Although this solid-phase extraction allows efficient and easy isolation of purified nucleic acids, several details must be taken into consideration. The final concentration of ethanol or isopropanol in the mixture with lysis buffer is essential to promote the adsorption of RNA to the silica matrix [150]. Different ethanol or isopropanol concentrations result in the isolation of RNA molecules of different sizes, e.g. small RNA molecules (containing miRNAs) of 16 to 200 nucleotides, or large RNA molecules (containing mRNA and lncRNA) of more than 200 nucleotides. Therefore, the protocol and the corresponding ethanol/isopropanol concentration must be selected carefully before starting the isolation. Because genomic DNA (gDNA) contamination can affect RNA-seq analyses, thorough digestion of gDNA remnants in RNA samples is mandatory. gDNA removal columns or integrated on-column DNase digestion are included in most RNA isolation kits [152].

Fig. 6 Overview of the main steps for processing a CNS tissue sample into BBB-related material ready for RNA isolation. Fresh samples are dissociated by mechanical disruption, enzymatic digestion, or a combination of both. Typically, tissue is first mechanically disaggregated into smaller pieces to facilitate exposure to the enzyme solution. Dissociated tissue is then selected according to size by passing through one or a series of filters, by a density gradient, or both. This process isolates the microvessels. For isolating single barrier cells, tissue dissociation (particularly enzymatic digestion) can be repeated [1] after the initial size selection steps. The single-cell suspension can be further purified or enriched for certain cell types [2] by using a fluorescence-activated cell sorter (FACS) or magnetic microbeads labeled with an antibody against a cell marker. Alternatively, if the tissue of interest is frozen or formalin-fixed paraffin-embedded (FFPE), a common approach is to isolate the microvessels by laser capture microdissection (LCM).

Guidelines
• The RNA extraction protocol should be selected based on tissue type and quantity, as well as on the intended sequencing and analysis.
• Specific protocols are required for the isolation of total RNA including miRNA.
• Correct sample homogenization and clearing are essential for efficient RNA isolation and reproducible analysis.
• Genomic DNA contamination can have a considerable impact on the sequencing results; therefore, gDNA removal or digestion is mandatory.

Fig. 7 Overview of commonly used RNA isolation protocols. Preparation of BBB-derived samples according to the type of sample. Cells in suspension are first collected by centrifugation, while adherent cultured cells are commonly trypsinized; then lysis buffer is added and cells are homogenized before proceeding to RNA isolation. Fresh frozen tissue can be mechanically disrupted in lysis buffer; debris should be removed by centrifugation before RNA isolation. Formalin-fixed paraffin-embedded (FFPE) tissue is first deparaffinized, and tissue disruption can be achieved by enzymatic (proteinase K) and/or mechanical means; de-crosslinking is followed by addition of lysis buffer, and then RNA is isolated.

RNA quantification and quality control
RNA concentrations are best determined by fluorometric quantification or qPCR. For example, fluorometric quantification can reliably measure RNA concentrations as low as 0.2 ng/μl. Spectrophotometric quantification is not recommended because it is inaccurate, especially for small amounts of RNA, and unreliable in the presence of contaminants that absorb at wavelengths close to those of DNA and RNA, e.g. phenol.
Nonetheless, the spectrophotometer can be a useful tool to detect contamination. For example, absorbance at 230 and 280 nm (commonly expressed as the A260/A230 and A260/A280 ratios) can indicate contamination with guanidinium salts and proteins, respectively.
Quality control is best performed using an automated capillary electrophoresis platform that calculates a score for RNA quality. Depending on the platform used, this score may have different names (e.g. RNA Integrity Number, RNA Quality Score, RNA Quality Number). It ranges from 1 (highly degraded) to 10 (intact) and is calculated by an algorithm that incorporates several features of the RNA electropherogram, such as the ratio of 28S to 18S ribosomal RNA [155]. A quality score for total RNA higher than 8 is recommended for most RNA-seq library preparation techniques. For FFPE tissue, samples with scores around 2 can be used for RNA-seq with specific library preparation protocols [156,157]. RNA quality scores of the respective RNA isolates are often missing from RNA-seq datasets in public databases, such as the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) or the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra). Given their impact on study reproducibility and cross-study meta-analyses, it is advisable to include them in publications as well as in public databases.
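The quality criteria above can be encoded as a simple pre-sequencing checklist. The Python sketch below flags samples whose quality score is not higher than 8 (FFPE samples destined for a dedicated protocol are exempt) and samples whose absorbance ratios suggest carry-over of protein or guanidinium salts. The 1.8 cut-off for both ratios is a commonly used rule of thumb assumed here, not a value from this article, and should be agreed on with the sequencing facility; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    sample: str
    a260_280: float       # low values can indicate protein contamination
    a260_230: float       # low values can indicate guanidinium salt carry-over
    quality_score: float  # RIN/RQN-type score from capillary electrophoresis
    ffpe: bool = False


def qc_flags(qc: RnaQc, min_ratio: float = 1.8, min_score: float = 8.0) -> list:
    """Return a list of warnings; an empty list means no flag was raised."""
    flags = []
    if qc.a260_280 < min_ratio:
        flags.append("A260/280 low: possible protein contamination")
    if qc.a260_230 < min_ratio:
        flags.append("A260/230 low: possible guanidinium salt carry-over")
    # FFPE samples are expected to score low and are routed to an
    # FFPE-specific library protocol, so the score threshold is skipped.
    if not qc.ffpe and qc.quality_score <= min_score:
        flags.append(f"quality score {qc.quality_score} not above {min_score}")
    return flags


if __name__ == "__main__":
    samples = [
        RnaQc("microvessels_1", 2.05, 2.10, 9.1),
        RnaQc("microvessels_2", 1.65, 1.20, 6.4),
        RnaQc("ffpe_section_1", 1.95, 1.90, 2.3, ffpe=True),
    ]
    for s in samples:
        print(s.sample, qc_flags(s) or "pass")
```

Keeping the flags as text rather than a single pass/fail makes it easier to report the reason for excluding a sample alongside the quality score in the publication and in GEO or SRA metadata.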
Human BBB and BCSFB samples from a clinical setting are often suboptimally preserved, which may affect how closely the transcriptome resembles the in vivo situation. With some exceptions [158][159][160][161][162][163], biopsies from human brains are rarely used for RNA-seq, and postmortem material is used instead. Two main factors should be considered when using postmortem brain or spinal cord samples: the premortem agonal state of the patient and the postmortem delay until sample collection. Prior to death, the patient may have suffered from fever, sepsis or hypoxia, or may have received oxygen, all of which strongly and selectively affect the levels of certain mRNAs [164]. Postmortem delay of tissue retrieval and preparation hampers RNA integrity as a result of transcript degradation, possibly in a nonrandom way [165]. RNA integrity can strongly affect transcript levels [165]; in particular, samples with low RNA integrity show an upregulation of translation-related pathways [166]. The time to sample preservation should be minimized, but this is usually beyond the researchers' control. Samples with low RNA quality can be excluded using a threshold on the RNA Integrity Number (RIN), or a mathematical model that accounts for the differential decay of different transcripts can be applied, thus increasing statistical power [165].
Other factors that may lead to RNA degradation are the handling and storage conditions and, if applicable, the sectioning process. However, there are reports of a remarkable RNA stability in postmortem human brain samples [167,168].
In summary, we advise that the starting material fulfill three criteria:
• The RNA quality should be assessed, and the library preparation protocol should be chosen accordingly.
• The quantity of isolated RNA must be sufficient for library preparation. Commercially available kits allow the preparation of libraries from 0.1 to 500 ng of total RNA.
• The RNA samples must be free of contaminants such as proteins, salts, sugars or DNA.
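The quantity criterion amounts to a short calculation from the fluorometric concentration and the eluate volume. The Python sketch below checks whether enough RNA is available and how much eluate to load. The 0.1 to 500 ng window reflects the range quoted above across commercial kits, whereas any individual kit specifies its own, usually narrower, range; the 0.2 ng/μl floor mirrors the fluorometric sensitivity mentioned earlier. The function and parameter names are hypothetical.

```python
def plan_library_input(conc_ng_per_ul: float, eluate_ul: float,
                       kit_min_ng: float = 0.1, kit_max_ng: float = 500.0,
                       detection_limit_ng_per_ul: float = 0.2) -> tuple:
    """Return (ng of RNA to load, µl of eluate to load) for library preparation.

    Raises ValueError if the concentration is below the assumed reliable
    fluorometric range or the total RNA is below the kit's minimum input.
    """
    if conc_ng_per_ul < detection_limit_ng_per_ul:
        raise ValueError("concentration below the reliable fluorometric range")
    total_ng = conc_ng_per_ul * eluate_ul
    if total_ng < kit_min_ng:
        raise ValueError(f"only {total_ng:.2f} ng available, kit needs {kit_min_ng} ng")
    load_ng = min(total_ng, kit_max_ng)  # never exceed the kit's maximum input
    return load_ng, load_ng / conc_ng_per_ul


if __name__ == "__main__":
    # e.g. 2 ng/µl in 14 µl eluate: load all 28 ng (14 µl)
    print(plan_library_input(2.0, 14.0))
    # e.g. 50 ng/µl in 30 µl eluate: cap at 500 ng (10 µl)
    print(plan_library_input(50.0, 30.0))
```

In practice the kit's maximum input volume also limits how dilute a sample can be, which this sketch does not model.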

Guidelines
• RNA integrity values are essential both for selecting the enrichment method and for sample selection.
• Assess RNA quantity and quality of the samples prior to sequencing. A quality threshold should be agreed on by the wet-lab researcher, the sequencing technician and the bioinformatician.

Design and preparation of sequencing libraries
When designing an RNA-Seq experiment, the following aspects must be considered: sequencing strategy, library preparation, number of replicates and sequencing depth (Fig. 3).
The sequencing strategy is important to guarantee the quality of the analysis at a reasonable cost. Paired-end sequencing means sequencing from both ends of each cDNA fragment. Since the two reads of a fragment are aligned as a pair, this strategy is preferable for de novo transcriptome assembly and for studying isoform expression or poorly annotated transcriptomes. Single-end sequencing means that only one end of each cDNA fragment is sequenced. This approach is more cost-effective than paired-end sequencing and is suitable for studying gene expression in well-annotated organisms, such as human, mouse, rat and zebrafish. Additionally, the read length can be chosen depending on the purpose of the study. Longer reads (150 to 300 bases) are more costly but offer higher resolution for studying alternatively spliced RNA isoforms or poorly annotated transcriptomes. Shorter reads (e.g. 75 bases) are suitable for studying gene expression in well-annotated organisms.
The sequencing depth (number of sequenced reads per sample) is important for the detection of differentially expressed genes, especially for lowly abundant transcripts. A minimum sequencing depth of 10 to 30 million reads per sample is recommended for RNA-Seq experiments [169,170]. However, in human samples, increasing the sequencing depth beyond 10 million reads per sample was shown to improve the identification of differentially expressed genes less than increasing the number of replicates [171].
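The depth recommendation translates directly into how many libraries can be pooled on one sequencing run. The sketch below shows this arithmetic; the run yield and the fraction of reads that survive filtering are hypothetical inputs that should be replaced with the figures quoted by the sequencing facility.

```python
import math


def max_samples_per_run(run_yield_reads, target_depth, usable_fraction=0.9):
    """Libraries that can be pooled while each still reaches target_depth reads.

    run_yield_reads and usable_fraction (share of reads passing filters) are
    hypothetical inputs; real values depend on the instrument and run quality.
    """
    usable_reads = run_yield_reads * usable_fraction
    return math.floor(usable_reads / target_depth)


def runs_needed(n_samples, run_yield_reads, target_depth, usable_fraction=0.9):
    """Number of runs required to sequence n_samples at target_depth each."""
    per_run = max_samples_per_run(run_yield_reads, target_depth, usable_fraction)
    if per_run < 1:
        raise ValueError("target depth exceeds the usable yield of a single run")
    return math.ceil(n_samples / per_run)


# Hypothetical run yielding 400 million reads, 30 million reads targeted per sample
print(max_samples_per_run(400e6, 30e6))  # 12
print(runs_needed(24, 400e6, 30e6))      # 2
```

Raising the target depth from 10 to 30 million reads triples the sequencing needed for the same design, which is why depth and replicate number should be planned together.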
Inclusion of replicates in RNA-Seq experiments is important to assess technical and biological variation: • Technical variability in sequencing is usually low.
Nonetheless, the technical variability introduced during sample collection and library preparation can be estimated using technical replicates or RNA spike-in controls, such as the mix developed by the External RNA Controls Consortium (ERCC) [172]. The ERCC RNA spike-in is a mixture of 92 synthetic RNA molecules, each at a defined concentration. This mixture can be used to technically validate sequencing results and to estimate technical variability. • Biological variability is the natural variation due to physiologic differences among subjects or over time in the same subject. Biological variance is usually more pronounced than technical variance and must be addressed with greater care. Biological variability can be observed at different levels depending on the sample type (e.g. cell line, mouse strain or human tissue). For instance, when studying the BBB in human samples, biological heterogeneity might be introduced by several variables, such as genetic background, lifestyle, hormonal levels, medical history, sex or age. In contrast, when studying the BBB in animal models (e.g. mouse), the variables that might introduce biological variability are mostly age, sex and strain. This reduced number of variables is expected to introduce less biological variability in samples from animal models than in samples from humans. Although the variability introduced by such variables can be minimized by using, inter alia, sex- and age-matched samples, uncontrollable biological heterogeneity should be accounted for by using biological replicates. For example, human BBB samples cannot always be collected at the same stage of the male or female hormonal cycles. Since hormones such as steroids and estradiol are known to regulate BBB permeability and tight junction protein expression, respectively, it is important that such biological variability is taken into account by using biological replicates [173].
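One simple way to use spike-ins for technical validation is to check how well the observed read counts track their known input concentrations on a log scale. The sketch below computes a Pearson correlation between log2 expected concentration and log2 counts; the concentrations and counts in the example are hypothetical and are not the actual ERCC values, which come with the manufacturer's documentation.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spike_in_agreement(expected_conc, observed_counts, min_count=1):
    """Correlate log2 expected concentration with log2 read counts of spike-ins.

    Spike-ins with fewer than min_count reads are dropped because log2(0) is
    undefined. Returns (correlation, number of spike-ins used).
    """
    pairs = [(math.log2(c), math.log2(n))
             for c, n in zip(expected_conc, observed_counts) if n >= min_count]
    if len(pairs) < 3:
        raise ValueError("too few detected spike-ins to assess agreement")
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


# Hypothetical concentrations and counts for six spike-ins (not real ERCC values)
conc = [0.5, 2, 8, 32, 128, 512]
counts = [0, 21, 75, 340, 1290, 5010]
r, n_used = spike_in_agreement(conc, counts)
print(f"r = {r:.3f} over {n_used} detected spike-ins")
```

The least concentrated spike-ins are often not detected at all, so the number of spike-ins actually used is reported alongside the correlation.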
Against this backdrop, the number of biological replicates must be increased when studying very heterogeneous samples, species or strains. Currently, most RNA-Seq experiments include at least three biological replicates. A recent study using human whole-blood RNA-Seq data showed that the power to detect differentially expressed genes with a twofold or higher change is 87% and 98% with three and five biological replicates, respectively, whereas the power to detect smaller changes (1.25-fold) is only 17% and 25%, respectively [174]. Therefore, a sequencing depth of 10 million reads and a minimum of 3 to 5 replicates are recommended to reliably detect major changes in gene expression. Since adding replicates is more beneficial than increasing sequencing depth, 12 replicates are recommended to detect minor changes in gene expression (e.g. 1.25-fold) [170,175,176].
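Why small fold changes require many more replicates can be illustrated with a textbook normal approximation of statistical power for a two-group comparison on the log2 scale:

power ≈ Φ(δ/SE − z₁₋α/₂) + Φ(−δ/SE − z₁₋α/₂), with δ = |log2 fold change| and SE = σ·√(2/n).

This is not the method used in [174]. It ignores count noise, small-sample t-distributions and multiple testing, and the biological standard deviation σ (sd_log2 below) is an assumed value, so the absolute numbers will differ from the published estimates; only the trends are meant to be informative.

```python
from math import log2, sqrt
from statistics import NormalDist


def approx_power(fold_change, n_per_group, sd_log2, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group test on log2 expression.

    sd_log2: assumed biological SD of log2 expression within each group.
    Ignores count noise, t-distribution effects and multiple testing.
    """
    std_normal = NormalDist()
    delta = abs(log2(fold_change))               # effect size on the log2 scale
    se = sd_log2 * sqrt(2 / n_per_group)         # SE of the difference in group means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    return std_normal.cdf(delta / se - z_crit) + std_normal.cdf(-delta / se - z_crit)


# Assumed biological SD of 0.5 log2 units; compare replicate numbers and fold changes
for n in (3, 5, 12):
    print(n, round(approx_power(2.0, n, 0.5), 2), round(approx_power(1.25, n, 0.5), 2))
```

Because log2(1.25) is only about a third of log2(2), the standard error has to shrink roughly threefold to reach the same power, and the standard error only falls with the square root of the number of replicates.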
Since ribosomal RNA makes up most of the cellular RNA, rRNA depletion or mRNA enrichment should be performed prior to library preparation. Ribosomal RNA depletion or mRNA enrichment methods must be carefully chosen depending on the RNAs of interest and the integrity of the RNAs. For instance, ribosomal RNA depletion can be preferable for degraded material if poly-A bias is observed.
It is important to understand, and to choose carefully between, stranded and unstranded library preparation protocols. In the widely used dUTP method for stranded RNA-Seq library preparation, deoxyuridine triphosphate (dUTP) is used instead of deoxythymidine triphosphate (dTTP) during second-strand synthesis. Thanks to this alteration, the second strand can be degraded using uracil-N-glycosylase prior to PCR amplification, so that only the first cDNA strand is amplified and sequenced (Fig. 8). A stranded RNA-Seq library thus retains information on the DNA strand from which each RNA was transcribed. Stranded RNA-Seq performs better at resolving read ambiguity in overlapping genes transcribed from opposite strands and at identifying antisense transcripts. Therefore, strand-specific RNA-Seq is preferred over unstranded protocols. However, when well-annotated genomes are available (e.g. human or mouse) or when analyzing samples with low RNA input, unstranded RNA sequencing can still be considered [177].
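Whether a library actually behaves as stranded can be checked after alignment by asking how often reads fall on the same strand as the annotated gene; tools such as RSeQC's infer_experiment.py report comparable fractions. The sketch below classifies a library from these two tallies. It assumes the tallies come from read 1 (or single-end reads) on non-overlapping genes, and the tolerance is an assumed value, not a community standard. For the dUTP protocol, read 1 is expected to map mostly antisense to the gene, which the sketch reports as "reverse".

```python
def infer_strandedness(sense_reads, antisense_reads, tolerance=0.1):
    """Classify a library from read-1 orientation relative to annotated genes.

    sense_reads: reads on the same strand as the gene.
    antisense_reads: reads on the opposite strand.
    tolerance: assumed cut-off, not a community standard.
    """
    total = sense_reads + antisense_reads
    if total == 0:
        raise ValueError("no reads assigned to annotated genes")
    sense_fraction = sense_reads / total
    if sense_fraction >= 1 - tolerance:
        return "forward"       # read 1 follows the transcript strand
    if sense_fraction <= tolerance:
        return "reverse"       # read 1 is antisense, as expected for dUTP libraries
    if abs(sense_fraction - 0.5) <= tolerance:
        return "unstranded"    # both orientations roughly equally frequent
    return "ambiguous"         # inspect further (e.g. genomic DNA contamination)


print(infer_strandedness(480_000, 9_520_000))  # reverse
```

Setting the wrong strandedness in the counting step silently discards most reads in a stranded library, so this check is worth running before quantification.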
Although they are not the main focus of this manuscript, we also want to briefly highlight 3′ RNA sequencing and small RNA sequencing protocols, since they have important applications in the field of brain barriers.
3′ end RNA sequencing is an alternative to full-length RNA-Seq. Library preparation techniques are almost identical, but unlike RNA-Seq, 3′ end RNA sequencing includes an enrichment step that, following fragmentation, excludes all but the outermost 3′ fragments adjacent to the poly-A tail. Because of the information lost during this enrichment for terminal fragments, 3′ end RNA-Seq is not suited for studying open reading frames and, consequently, alternatively spliced transcript isoforms. On the other hand, 3′ end RNA-Seq generates one read per transcript molecule and allows the incorporation of unique molecular identifiers (UMIs), as applied in the Massive Analysis of cDNA Ends (MACE) method [178], so that PCR duplicates can be recognized and removed. The technique also quantifies alternative polyadenylation events, which are important for mRNA properties such as stability [179]. Notably, 3′ end RNA sequencing generates reliable transcriptome profiles even from poor-quality samples and is currently the main method used in single-cell RNA-Seq [180]. Thus, this technique can be recommended for quantification of gene expression and consequently for determining the downstream molecular and cellular mechanisms essential for brain barriers differentiation and function [181].
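The benefit of UMIs is that reads sharing both the same gene and the same UMI can be collapsed into a single original molecule, removing PCR duplicates from the counts. A minimal sketch, assuming reads have already been assigned to genes and their UMIs extracted; exact UMI matching is assumed, whereas dedicated tools also merge UMIs that differ by sequencing errors. The gene names and UMIs are illustrative.

```python
from collections import defaultdict


def collapse_umis(assigned_reads):
    """Count unique molecules per gene from (gene, umi) pairs.

    Reads with identical gene and UMI are treated as PCR duplicates of one molecule.
    Returns (molecules_per_gene, reads_per_gene).
    """
    umis_per_gene = defaultdict(set)
    reads_per_gene = defaultdict(int)
    for gene, umi in assigned_reads:
        umis_per_gene[gene].add(umi)
        reads_per_gene[gene] += 1
    molecules = {gene: len(umis) for gene, umis in umis_per_gene.items()}
    return molecules, dict(reads_per_gene)


reads = [("Cldn5", "AACGTT"), ("Cldn5", "AACGTT"), ("Cldn5", "GGTCAA"), ("Ttr", "CCATGA")]
molecules, raw = collapse_umis(reads)
print(molecules)  # {'Cldn5': 2, 'Ttr': 1}
print(raw)        # {'Cldn5': 3, 'Ttr': 1}
```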
Several studies have shown the important role of small RNAs in the molecular mechanisms that control BBB function [182]. For example, the microRNA miR-27a-3p was found in intracerebral hemorrhage patients to regulate BBB function and edema formation via up-regulation of AQP11 [183]. Small RNA sequencing is therefore an important tool for better understanding the role of small RNAs at the BBB. However, small RNA molecules (e.g. microRNA, piwi-interacting RNA) cannot be sequenced using regular RNA-Seq library preparation techniques because of their small size. Small RNA-Seq library preparation is usually performed by ligation of adapters to the RNA molecules followed by reverse transcription. The most recent library preparation kits avoid the generation of adapter-dimer by-products and allow the use of minute amounts of starting material.
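Because small RNA inserts (around 22 nucleotides for mature microRNAs) are shorter than typical read lengths, each read runs into the 3′ adapter, which has to be trimmed before mapping; reads that are left with no insert correspond to adapter dimers. The sketch below uses exact matching only and assumes the Illumina TruSeq small RNA 3′ adapter sequence. The correct sequence must be taken from the kit actually used, and dedicated trimmers such as cutadapt also tolerate mismatches. The 18 to 30 nucleotide length window is likewise an illustrative assumption.

```python
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"  # assumed 3' adapter; check the kit documentation


def trim_3p_adapter(read, adapter=ADAPTER_3P, min_overlap=6):
    """Remove a full or partial (read-end) 3' adapter by exact matching."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]                       # full adapter inside the read
    # Adapter truncated by the end of the read: longest adapter prefix at the read end
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read                                 # no adapter found


def is_small_rna_length(insert, min_len=18, max_len=30):
    """Keep inserts in an assumed small RNA length window; drops adapter dimers."""
    return min_len <= len(insert) <= max_len


insert = "ACGTACGTACGTACGTACGTAC"               # hypothetical 22-nt insert
read = insert + ADAPTER_3P[:10]                 # read ends inside the adapter
trimmed = trim_3p_adapter(read)
print(trimmed == insert, is_small_rna_length(trimmed))  # True True
```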

Guidelines
• Correct RNA quantification and quality control are essential to ensure the quality of the sequencing results and their reproducibility. • Strandedness and sequencing strategy have a major impact on the analysis and the results and must therefore be carefully chosen. • Biological replicates are essential, and the minimum sample size is affected by extrinsic and intrinsic factors. • The selection of the correct library preparation technique is crucial for accurate analysis and should be made according to the research question.

Sequencing platforms
After completion of the first human genome sequence, the demand for cheaper and faster sequencing methods accelerated the development of NGS. Nowadays, NGS platforms enable low-cost, high-throughput analyses by sequencing billions of reads in parallel. Two of the short-read sequencing methods most used in research labs are Ion Torrent (Life Technologies) and Illumina sequencing [184]. Although both technologies rely on sequencing-by-synthesis, Illumina platforms detect the fluorescence generated by the incorporation of fluorescently labeled nucleotides during DNA synthesis, while Ion Torrent platforms detect pH changes induced by the release of a hydrogen ion during the incorporation of a nucleotide into a growing DNA strand [185,186]. The advantages of Ion Torrent platforms are less expensive equipment and a relatively short run time (as low as 2 h per run), while their disadvantages are increased error rates when sequencing homopolymers, lower throughput (up to 80 million reads) and a higher cost per base. In contrast, Illumina platforms require expensive equipment and run times of up to 3.5 days. The advantages of Illumina platforms are their high throughput (up to 20 billion reads per run) and the relatively low cost per sequenced base [184]. While Ion Torrent platforms can only perform single-end sequencing, with read sizes of around 200 to 400 base pairs, Illumina sequencing platforms can perform single-end or paired-end sequencing and sequence reads between 50 and 600 base pairs. Since Illumina and Ion Torrent have similar capabilities to detect differential gene expression between samples [187], the choice of the best suited platform should be based on the sequencing strategy, cost and time. Additionally, several other technologies can be used depending on the research question at hand. For instance, single-molecule real-time sequencing allows the sequencing of reads up to 100,000 bases long and therefore offers a valuable tool to study alternative splicing events [188]. The high cost of this technology, particularly for expression profiling studies, is its major limitation. Nanopore sequencing offers an alternative for sequencing long reads. This technology uses a small, portable sequencer and can sequence reads longer than 2 million bases [189]. Although the sequencing throughput is very low, the cost of the sequencing devices and reagents is also relatively low.

Fig. 8 Library preparation protocols. Commonly used library preparation protocols for RNA sequencing. For RNA-Seq, ribosomal RNA (rRNA) is first depleted from the total RNA; random primers are used for reverse transcription; dUTPs are used for second-strand synthesis; Y-shaped adaptors are then ligated, and the dUTP-containing second strand is depleted, which preserves the strand information. 3′ RNA-Seq uses oligo(dT) primers for reverse transcription, which selects mRNA; the second adapter is incorporated into the cDNA molecules by template switching. Small RNA-Seq uses adaptors that ligate to the small RNAs and allow the reverse transcription.

Guidelines
• Full transcript RNA sequencing can be performed using nanopore or single-molecule real-time sequencing and might reveal the impact of alternative splicing variants in the BBB or BCSFB function. • Gene expression profiling studies from BBB or BCSFB samples can be performed using both Illumina and Ion Torrent platforms. • Sequencing technology and platform should be chosen according to four main criteria: -Sequencing strategy (single-end or paired-end).

Data Analysis: Where to focus
Data analysis for RNA-Seq is a multi-step process that can be achieved with a plethora of technologies and strategies. A proper analysis should take into consideration all the steps mentioned throughout this manuscript, as they can influence the results and their interpretation. There is no optimal "one size fits all" pipeline for all transcriptomic projects in the field of brain barriers, although the overarching steps will mostly be the same (Fig. 9). Analyses are often divided into Upstream and Downstream. For the purposes of the current manuscript, Upstream Analysis spans from the raw FASTQ files to the Count Matrix Table, and Downstream Analysis comprises everything after that. A key to successful data analysis is clear communication between the wet-lab brain barriers researcher, the sequencing facility staff and the bioinformaticians, so that the respective limitations are considered and the project goals can be optimally achieved. We believe that there are several good publications and resources on how to optimize an analytical pipeline and benchmark the different tools [174,190].
We will therefore not discuss this in detail, but will instead focus on steps that, in our experience, are often overlooked, and provide some recommendations.

Fig. 9 Overview of the main steps for RNA-Seq data analysis. Raw data goes through quality control steps and, if necessary, pre-processing steps are implemented. The next step is alignment, which in brain barriers studies is most commonly done by mapping to a reference genome. One more round of quality control is recommended, based on the alignment metrics. After sorting, the files can be indexed and visualized in a genome browser. Counting can be performed at different levels (gene, exon or transcript), and there are multiple algorithms for normalization, both for inter-sample and intra-sample normalization. Finally, after the differential expression analysis, further information can be obtained with steps like gene ontology analysis, gene set enrichment analysis or pathway analysis.

Quality control and data pre-processing: often neglected, always important
For the purposes of this paper, we will start after demultiplexing, with the FASTQ files. These are the raw files that sequencing facilities and companies most commonly provide to researchers.
Along with the FASTQ files, researchers usually also receive a text file with a hash (a string of characters computed from the file contents) for each FASTQ file. These hashes can be created with different algorithms (e.g. MD5, computed with md5sum) and are used to ensure that the files were not corrupted during transfer. Therefore, these hashes should always be used to check the integrity of the FASTQ files.
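Checking these hashes is easily scripted; the command-line tool md5sum with its -c option performs the same check. The sketch below assumes the provider delivers an md5sum-style file with one `<hash>  <filename>` line per FASTQ file, located in the same directory as the files; the file name in the usage line is hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large FASTQs fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file):
    """Return {filename: True/False} for every entry of an md5sum-style file."""
    checksum_file = Path(checksum_file)
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        actual = md5_of_file(checksum_file.parent / name)
        results[name] = actual == expected.lower()
    return results


# Usage (hypothetical file name): list any FASTQ whose hash does not match
# bad = [f for f, ok in verify_checksums("md5sums.txt").items() if not ok]
```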
Some companies also perform pre-processing steps, for instance trimming low-quality reads (quality scores are encoded on the Phred scale with a Phred+33 or Phred+64 offset, depending on the version and type of the sequencing platform). It is important to check whether these steps were implemented and, if not, which steps should be. Removing low-quality reads, for instance, can improve the efficiency of the analysis while also removing possible errors.
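To make the encoding concrete: each character of a FASTQ quality string encodes a Phred score Q = ASCII code − offset (33 for Phred+33, 64 for the older Phred+64), and Q relates to the probability P of a wrong base call as P = 10^(−Q/10), so Q = 30 means a 1 in 1000 chance of error. The sketch below decodes a quality string and trims low-quality bases from the 3′ end. The threshold of 20 and the simple stop-at-first-good-base rule are illustrative assumptions; dedicated trimming tools use more elaborate algorithms.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (offset 33 or 64)."""
    scores = [ord(ch) - offset for ch in quality]
    if any(q < 0 for q in scores):
        raise ValueError("negative Phred score: wrong offset for this file?")
    return scores


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def trim_3p_low_quality(seq, quality, min_q=20, offset=33):
    """Drop bases from the 3' end while their Phred score is below min_q."""
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


print(phred_scores("I5#"))                      # [40, 20, 2]
print(trim_3p_low_quality("ACGTAC", "IIII##"))  # ('ACGT', 'IIII')
```

Decoding with the wrong offset shifts every score by 31, which is why the encoding reported by the QC tool should be checked before any quality-based trimming.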
The most commonly used software for pre-alignment quality control (QC) is FastQC [191]. It is easy to use (via the command line or a graphical interface) and creates intuitive reports (in HTML format). These reports contain information about quality score encoding, the number of reads per sample, read length, the presence of contaminants and adapter sequences, low-quality reads and read quality per position. A drop in quality is common at the beginning and, mostly, at the end of reads; in paired-end data (consisting of Read 1 and Read 2), Read 2 will generally have lower quality.
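The per-position quality profile in such reports summarizes the distribution of Phred scores at each read position across all reads. A minimal sketch computing only the mean per position (assuming Phred+33 encoding; FastQC itself reports more detailed statistics) could be used, for instance, to compare Read 1 and Read 2 of a paired-end sample.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Mean Phred score at each read position, averaged over reads reaching it.

    Reads may differ in length (e.g. after trimming).
    """
    totals, counts = [], []
    for quality in quality_strings:
        for i, ch in enumerate(quality):
            if i == len(totals):          # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


print(mean_quality_per_position(["II5", "I5"]))  # [40.0, 30.0, 20.0]
```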
Based on this information it should be decided if the raw data is ready for alignment, if it requires pre-processing, such as removal of repetitive sequences, or if the sequencing or pre-sequencing steps need to be optimized and/or repeated.
After mapping the raw data (see next section), a genome browser, such as the Integrative Genomics Viewer (IGV) [192], can be used to visualize the aligned reads. Tools like IGV [193] allow graphical visualization of BAM files. Importantly, the same version of the reference genome as the one used to map the raw data must be selected. Although this is not their primary purpose, such tools provide a visual way to detect issues such as poly-A bias. Poly-A bias (also known as 3′ bias) is most common in cases of RNA degradation, which is expected with human post-mortem CNS samples, as previously mentioned.
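Beyond visual inspection, 3′ bias can be summarized as the ratio of read coverage near the 3′ end to coverage near the 5′ end of transcripts; tools such as RSeQC (gene body coverage) and Picard (CollectRnaSeqMetrics) report related metrics. The sketch below computes such a ratio for one transcript. The per-base coverage vector (ordered 5′ to 3′), the 20% window size and the example profiles are assumptions for illustration.

```python
def three_prime_bias(coverage, window_fraction=0.2):
    """Mean coverage of the 3'-most window divided by that of the 5'-most window.

    coverage: per-base read depth along one transcript, ordered 5' -> 3'.
    Values well above 1 point to 3' bias, e.g. from degraded RNA with poly-A selection.
    """
    if not coverage:
        raise ValueError("empty coverage vector")
    window = max(1, int(len(coverage) * window_fraction))
    five_prime = sum(coverage[:window]) / window
    three_prime = sum(coverage[-window:]) / window
    if five_prime == 0:
        return float("inf") if three_prime > 0 else float("nan")
    return three_prime / five_prime


intact = [10] * 100                  # hypothetical uniform coverage
degraded = [1] * 50 + [9] * 50       # hypothetical coverage piling up at the 3' end
print(three_prime_bias(intact), three_prime_bias(degraded))  # 1.0 9.0
```

In practice, such a ratio is only meaningful when aggregated over many expressed transcripts, since individual genes can show uneven coverage for biological reasons.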
Mapping will also generate metrics regarding overall alignment rates, reads aligned to genes, reads aligned uniquely versus reads aligned multiple times and unmapped reads. A close look at these metrics can uncover problems such as sample contamination or DNA still present in the sample.
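Collecting these metrics across samples makes outliers easy to spot; a high multi-mapping rate may, for example, hint at residual rRNA, and a high unmapped rate at contamination. The sketch below converts read tallies into percentages and flags samples. The thresholds and sample names are hypothetical and should be set together with the bioinformatician, for example based on the distribution across all samples of the study. In practice, the tallies would be parsed from the aligner's log files, which is not shown here.

```python
def mapping_summary(total, unique, multi):
    """Percentages of uniquely mapped, multi-mapped and unmapped reads."""
    if total <= 0 or unique < 0 or multi < 0 or unique + multi > total:
        raise ValueError("inconsistent read counts")
    return {
        "unique_pct": 100 * unique / total,
        "multi_pct": 100 * multi / total,
        "unmapped_pct": 100 * (total - unique - multi) / total,
    }


def flag_samples(tallies, min_unique_pct=70.0, max_multi_pct=20.0):
    """Names of samples whose mapping profile falls outside the assumed thresholds.

    tallies: {sample: (total_reads, uniquely_mapped, multi_mapped)}
    """
    flagged = []
    for sample, (total, unique, multi) in tallies.items():
        summary = mapping_summary(total, unique, multi)
        if summary["unique_pct"] < min_unique_pct or summary["multi_pct"] > max_multi_pct:
            flagged.append(sample)
    return flagged


tallies = {"ctrl_1": (20_000_000, 17_200_000, 1_400_000),   # hypothetical numbers
           "ctrl_2": (21_000_000, 11_000_000, 6_500_000)}
print(flag_samples(tallies))  # ['ctrl_2']
```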
Finally, after differential expression analysis, one can test for batch effects and, if necessary, correct for them, either based on known variables (e.g. samples sequenced or prepared on different days) or blindly, by estimating unknown sources of variation from the data. Visualizations such as PCA plots or heatmaps, colored by group and labeled by sample, are also very informative regarding sample variability.

Guidelines
• Quality control must be performed at every step of the analysis. • There are no hard rules on quality control metrics, though all applied quality measures must be reported in the publication of the data. • Mapping metrics can reveal problems not detected at the raw data level and should therefore be taken into account when analyzing the data. • Data visualization (post-mapping or post-differential expression) can be informative regarding read distribution and sample variability, respectively.

Overall steps: how to get the desired information from your transcriptomics study
While the analytical design can be adapted to the purposes of the analysis, most RNA-Seq experiments aiming to identify differentially expressed genes follow the same general workflow. From the raw files, the data are aligned, either by mapping to an available, curated reference genome or by performing a de novo transcriptome assembly. In the field of brain barriers, the vast majority of experiments use well-established vertebrate animal models, human tissue or in vitro models of the BBB and BCSFB, including cell line models. Therefore, alignment is usually done by mapping to a reference genome of the species of interest. There are multiple sources for reference genomes (NCBI RefSeq, https://www.ncbi.nlm.nih.gov/refseq/, and Ensembl, https://www.ensembl.org/index.html, being the most used for these species), as well as multiple versions and builds. When writing a manuscript, it is imperative to state the source, version and build of the reference genome and annotation used; otherwise, precise replication will not be possible.
The next major step is quantification, in which the mapped reads are counted by coordinates and then grouped. This grouping can be done at the gene, exon or transcript level, depending on the objective of the study (e.g. is alternative splicing of interest?). This choice should be reported in publications to allow replication.
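The counting step can be pictured as assigning each aligned read to the gene whose coordinates it overlaps. The sketch below is a deliberately simple gene-level counter: reads overlapping exactly one gene are counted, reads overlapping several genes are tallied as ambiguous and discarded, and strand is ignored (for a stranded library, the read strand would also have to match the gene). Dedicated tools such as featureCounts and HTSeq-count offer several, more refined overlap rules. The coordinates are hypothetical and 0-based, half-open.

```python
from collections import Counter


def count_reads_per_gene(reads, genes):
    """Gene-level counts from aligned read intervals.

    reads: iterable of (chrom, start, end), 0-based half-open coordinates.
    genes: {gene_id: (chrom, start, end)} merged gene loci.
    Returns (counts per gene, ambiguous reads, reads without a feature).
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    ambiguous = no_feature = 0
    for chrom, start, end in reads:
        hits = [gene_id for gene_id, (g_chrom, g_start, g_end) in genes.items()
                if g_chrom == chrom and start < g_end and g_start < end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            ambiguous += 1        # overlaps more than one gene
        else:
            no_feature += 1       # intergenic, or gene not in the annotation
    return dict(counts), ambiguous, no_feature


genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 150, 300)}
reads = [("chr1", 110, 120), ("chr1", 160, 170), ("chr1", 250, 260), ("chr2", 0, 10)]
print(count_reads_per_gene(reads, genes))  # ({'geneA': 1, 'geneB': 1}, 1, 1)
```

Which rule is used for ambiguous reads changes the counts of overlapping genes, which is one more reason to report the counting tool and its settings.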
After quantification, the count matrix contains the raw counts, which need to be normalized. Several normalization algorithms are commonly used, with different purposes and efficiencies. FPKM (Fragments Per Kilobase of transcript per Million mapped reads) and RPKM (Reads Per Kilobase of transcript per Million mapped reads) are commonly used for intra-sample comparison, as they normalize for both gene length and library depth. However, they have been shown to be inconsistent compared with other methods such as TPM (Transcripts Per Million), and if intra-sample comparison is the objective, TPM has proven more robust [194]. On the other hand, if inter-sample comparison is the objective, the two most commonly used methods are TMM (Trimmed Mean of M-values) from edgeR [195] and RLE (Relative Log Expression) from DESeq2 [196]. Both forgo gene length normalization, as it is irrelevant here (inter-sample comparison compares the same gene across samples, which therefore has the same length). TMM and RLE have been shown to be consistent and to perform well [197,198]. However, there are differences between them, even if the overall concepts are similar. Recommendations for their use are based on sample size: with more than 12 biological replicates, DESeq2 is recommended, while with 12 or fewer replicates, both can be used [176].
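The differences between these measures are easiest to see written out. The sketch below implements RPKM and TPM for a single sample and median-of-ratios size factors, the idea behind the RLE normalization of DESeq2. It is written from the published definitions, not taken from the edgeR or DESeq2 code, and TMM is not shown; for real analyses, the implementations in the respective packages should be used. The counts and gene lengths are hypothetical.

```python
import math
from statistics import median


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads (one sample)."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million (one sample): length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


def median_of_ratios_size_factors(count_matrix):
    """Size factors per sample, following the RLE / median-of-ratios idea.

    count_matrix: one list per gene with its counts across samples.
    Genes with a zero count in any sample are skipped (geometric mean would be 0).
    """
    n_samples = len(count_matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in count_matrix:
        if min(gene_counts) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts for three genes (rows) in two samples (columns);
# sample 2 was sequenced twice as deeply as sample 1
matrix = [[100, 200], [40, 80], [10, 20]]
factors = median_of_ratios_size_factors(matrix)
normalized = [[c / f for c, f in zip(row, factors)] for row in matrix]
print(factors)     # approximately [0.707, 1.414]
print(normalized)  # each gene now has approximately equal values in both samples
print(tpm([100, 40, 10], [2000, 1000, 500]))
```

TPM values of a sample always sum to one million, which is what makes proportions within a sample comparable, whereas the size factors only rescale whole samples relative to each other.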
With the normalized values calculated, the following steps are differential expression analysis and gene ontology (GO) and pathway analysis. We will discuss cut-off values and their importance in the next section and briefly mention GO and pathway analysis.

Guidelines
• Always reference the source, version and build of the reference genome and annotation. • Indicate the counted feature (exon, gene or transcript). • Normalization of the counts is mandatory. The normalization algorithm should be selected taking into account the different strengths and intended uses of the available methods.

Differential expression analysis: cut-offs, candidate selection and cell/tissue purity assessment
After differential expression analysis, a common approach to selecting differentially expressed genes (DEGs) is to apply cut-offs to the statistics obtained: usually one measure of statistical significance (the probability of a false positive), one of relative expression (fold change) and one of absolute expression. One of the statistics obtained is the p-value, and raw p-values should not be used for DEG selection. P-values are computed for the comparison between groups on a per-gene basis, without taking into account the number of genes tested. However, when many comparisons are performed in parallel (as is the case with tens of thousands of genes), the number of false positives increases. To appreciate this, it is necessary to understand what p-values actually represent. A p-value is the probability of obtaining a result for that gene at least as extreme as the one observed if the null hypothesis were true (i.e. if there were actually no difference between the two conditions for that gene). As such, if a p-value threshold of 0.01 is applied to 20,000 genes, about 200 false positives would be expected even if no gene were truly differentially expressed. To solve this issue, a multiple testing correction is applied. There are a variety of algorithms, some of them very strict. Bonferroni is probably the simplest and the strictest one: the α (chosen significance level, usually 0.05 or 0.01) is divided by the number of tests (genes) to obtain a new α. However, while this increases confidence in the genes that pass the new significance threshold (fewer false positives), it is likely to cause a concomitant increase in the false negative rate. A more appropriate correction is the Benjamini-Hochberg procedure [199], which controls the false discovery rate (FDR) and produces adjusted p-values that can be judged against the same initial α.
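Both corrections are short enough to write out. The sketch below returns Bonferroni- and Benjamini-Hochberg-adjusted p-values for a list of raw p-values, following the standard definitions; in practice, the adjusted values reported by DESeq2 or edgeR (or a statistics library) should be used. The four p-values are made up.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                         # from the largest p down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))          # approximately [0.04, 0.16, 0.12, 0.02]
print(benjamini_hochberg(raw))  # approximately [0.02, 0.04, 0.04, 0.02]
# Expected false positives at an unadjusted threshold of 0.01 with 20,000 null genes
print(0.01 * 20_000)            # approximately 200
```

At α = 0.05, Bonferroni retains two of the four tests in this example, whereas Benjamini-Hochberg retains all four, illustrating the trade-off between false positives and false negatives described above.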
Another option is to ignore the p-value (adjusted or not) and focus solely on the fold change. In simple terms, the fold change indicates the magnitude of the difference in expression of a gene between conditions. This allows genes with small differences to be discarded, as these may be difficult or impossible to validate by qPCR and especially at the protein level, owing to low sequencing depth and the discrepancy between RNA and protein abundance [200]. It should also be mentioned that thresholds for log2FC (the base-2 logarithm of the fold change, whose sign indicates the direction of regulation, up or down) are not as well defined and accepted.
A commonly used solution is to apply cut-offs for both the FDR and the log2FC, potentially also combined with an extra cut-off for overall expression.
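Such a combined filter takes only a few lines of code. In the sketch below, the field names and the default thresholds (FDR < 0.05, |log2FC| ≥ 1, i.e. at least twofold, and a mean normalized count of at least 10) are illustrative assumptions, not recommendations of this manuscript; they should be matched to the output columns of the DE tool used and to the aim of the study. For reference, a 1.25-fold change corresponds to a log2FC of about 0.32.

```python
import math


def select_degs(results, fdr_max=0.05, abs_log2fc_min=1.0, min_mean_count=10.0):
    """Gene IDs passing FDR, effect-size and expression cut-offs.

    results: iterable of dicts with hypothetical keys 'gene', 'padj', 'log2fc'
    and 'mean_count' (mean normalized count across samples).
    """
    selected = []
    for row in results:
        padj = row["padj"]
        if padj is None or math.isnan(padj):
            continue  # no adjusted p-value, e.g. gene filtered out before testing
        if (padj < fdr_max
                and abs(row["log2fc"]) >= abs_log2fc_min
                and row["mean_count"] >= min_mean_count):
            selected.append(row["gene"])
    return selected


results = [  # hypothetical DE output
    {"gene": "geneA", "padj": 0.001, "log2fc": 1.8, "mean_count": 350.0},
    {"gene": "geneB", "padj": 0.001, "log2fc": 0.3, "mean_count": 900.0},
    {"gene": "geneC", "padj": 0.020, "log2fc": -2.1, "mean_count": 4.0},
    {"gene": "geneD", "padj": float("nan"), "log2fc": 3.0, "mean_count": 1.0},
    {"gene": "geneE", "padj": 0.030, "log2fc": -1.2, "mean_count": 55.0},
]
print(select_degs(results))  # ['geneA', 'geneE']
```

Reporting the exact thresholds, as recommended for all quality measures, allows readers to reproduce the DEG list from the deposited results.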
If the goal of the experiment is mechanism or process discovery, pathway and enrichment analysis can provide guides of where to proceed in the functional validation. However, there are a few issues that we found problematic and should be considered.
Firstly, these analyses are heavily dependent on the quality of the respective databases used. For instance, the more a respective field is studied (e.g. cancer) the more information about that field is included in the database. The corollary is that the less that is known in a specific research field or about a molecule or mechanism, the less information will be available [201].
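Many over-representation analyses rest on a hypergeometric (one-sided Fisher's exact) test, which also shows concretely why database coverage matters: a gene that is not annotated to a term can never contribute to the overlap k, however relevant it may be. A minimal sketch under these assumptions, with made-up numbers; the p-values of the many terms tested must themselves be corrected for multiple testing.

```python
from math import comb


def overrepresentation_p(k, n_selected, K, N):
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    N: background genes (e.g. all genes tested for DE)
    K: background genes annotated to the term in the database
    n_selected: selected genes (e.g. DEGs)
    k: selected genes annotated to the term
    """
    if not (0 <= k <= min(K, n_selected) and K <= N and n_selected <= N):
        raise ValueError("inconsistent counts")
    upper = min(K, n_selected)
    tail = sum(comb(K, i) * comb(N - K, n_selected - i) for i in range(k, upper + 1))
    return tail / comb(N, n_selected)


# Made-up example: 15,000 background genes, 300 DEGs, a term with 120 annotated genes,
# 12 of which are DEGs (about 2.4 would be expected by chance)
print(overrepresentation_p(12, 300, 120, 15_000))
```

The choice of background (all annotated genes versus only the genes tested for differential expression) also changes N and K, and should therefore be reported.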
Secondly, there is a trend to use commercial tools for pathway and gene ontology analysis. While these tools appear to be solid, their restricted availability complicates the reproducibility of the results. Thus, we recommend the use of open-source tools for the same purpose instead. Finally, while it is possible to estimate cell and tissue purity from the results of transcriptomics studies, the methodologies for doing so should not be relied upon. Extensive datasets, such as the one from Vanlandewijck and colleagues [13], can provide useful and necessary data points but will need to be supplemented with deep bulk RNA-Seq datasets. This is especially true for genes such as claudin-3 and claudin-12, which were previously thought to be signature claudins of the BBB, a view that has recently been disproven or put in another perspective [79,202].

Guidelines
• In RNA-Seq studies, p-values require multiple testing correction. However, overly harsh multiple testing correction can increase the false negative rate for differentially expressed genes. • Individual cut-offs and combinations of cut-offs each have their strengths and should be chosen according to the objective.
• The intrinsic limitation of pathway and enrichment analysis should be taken into account.

Validation of transcriptomics results: Challenges and recommendations
Journals or reviewers often request validation of RNA-Seq results in two ways. The first is by qPCR. This RNA-level validation can be performed either on the same samples as the RNA-Seq or on different ones. Whether qPCR validation is actually necessary is an open debate. In our opinion, while not absolutely necessary, qPCR validation can provide valuable information. When done on the same samples, qPCR can validate the sequencing process, while validation using a different set of biological replicates can help confirm the results [203]. The other validation approach is at the protein level. This can be achieved by different methodologies, including Western blots (WB), immunohistochemistry (IHC) and immunofluorescence (IF) staining. WBs tend to be more informative for quantification, whereas IHC and IF staining reveal the cellular and subcellular localization of the protein under investigation. Protein-level validation of the candidates chosen from the RNA-Seq results should be performed prior to functional studies. Common issues are the lack of correlation between protein and RNA levels, or the failure to reproduce at the protein level the differential expression detected between groups, owing to the RNA-protein discrepancy [200].
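When qPCR is used for validation, results are commonly expressed with the 2^−ΔΔCt method, where ΔCt = Ct(target) − Ct(reference) in each condition and ΔΔCt = ΔCt(treated) − ΔCt(control), which gives a log2 fold change that can be compared with the RNA-Seq log2FC. The sketch below assumes close to 100% amplification efficiency for target and reference genes (an assumption of this method) and uses made-up mean Ct values. It checks only whether both methods agree on the direction of change, which is a deliberately crude criterion.

```python
def ddct_log2_fold_change(ct_target_treated, ct_ref_treated,
                          ct_target_control, ct_ref_control):
    """log2 fold change (treated vs control) by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for target and reference genes.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return -dd_ct  # log2(2 ** -ddCt)


def same_direction(qpcr_log2fc, rnaseq_log2fc):
    """Crude concordance: both methods report a change in the same direction."""
    return qpcr_log2fc * rnaseq_log2fc > 0


# Made-up mean Ct values (target, reference) for treated and control samples
log2fc_qpcr = ddct_log2_fold_change(24.0, 18.0, 26.0, 18.0)
print(log2fc_qpcr, 2 ** log2fc_qpcr)     # 2.0 4.0
print(same_direction(log2fc_qpcr, 1.6))  # True
```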

Data storage and availability
Upon publication of the manuscript, all relevant data should be made accessible. This necessarily includes the raw data but can also include processed data and any relevant metadata. There are a number of specialized data repositories, including GEO (https://www.ncbi.nlm.nih.gov/geo/), SRA (https://www.ncbi.nlm.nih.gov/sra) and the European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena). As most data repositories allow data to remain private until publication, it is recommended to upload the data in advance. This also makes it possible to give reviewers access tokens. In addition to public data repositories, local copies of the data should be stored and maintained.
As a complement, some groups have created web interfaces that allow other researchers to explore their results. These include BBBomics (http://bioinformaticstools.mayo.edu/bbbomics/) [97], the Vascular Single Cells Database (http://betsholtzlab.org/VascularSingleCells/database.html) [13,76], Brain RNA-Seq (https://www.brainrnaseq.org/) [204,205], Single Cell Analysis of Mouse Cortex (http://linnarssonlab.org/cortex/) [206] and the Allen Brain Map (https://portal.brain-map.org/). These web interfaces usually represent studies focusing on molecular expression in specific tissues or cell types and can be valuable tools when designing a new experiment or performing quality control and purity assessment of bulk RNA-Seq data.

Publishing recommendations for RNA-Seq studies in the field of brain barriers
• Include a description of the selected region/tissue/cell type (e.g. capillary endothelial cells from the prefrontal cortex) as well as of the regions/tissues/cell types discarded (through different purification steps).
• Provide a detailed barriers isolation protocol using a schematic overview that addresses the following points:
-Relevant details on the tissue source: age and sex, the strain for animals, and detailed information on the individuals for humans, as applicable.
-Rationale for the specific brain barriers isolation methodologies used.
-In-depth information on the workflow of the isolation protocol, including precise information on the digestion and size-dependent selection steps applied.
-For bead selection, the advantages and disadvantages of the antibodies used.
-Obtained yield of biological material.
-Test for purity and its result, including the rationale for testing for specific contaminants and the methodology applied.
• The description of the RNA extraction protocol can be less extensive than that of the tissue isolation. However, it should clearly indicate RNA integrity values (or range) as well as quantification.
• Add sequencing-specific information, including:
• When referring to protocols in the methods section, cite the original paper and make the differences from it clear. Alternatively, include a detailed protocol in the supplementary material.
• At the time of publication, make the raw data, the analyzed data and any relevant metadata publicly available.

Emerging transcriptomics applications for the field of brain barriers
As mentioned above, the present manuscript has focused on the application of bulk RNA-Seq in brain barriers research. New transcriptomics applications are, however, emerging and may be of equal significance in the near future. scRNA-Seq is perhaps the most established and best known in brain barriers research. Several studies have used this methodology to great effect, revealing a previously unknown diversity of cells in the brain and the brain vasculature [13,206]. Single-nucleus RNA-Seq (snRNA-Seq) is an alternative to scRNA-Seq. While it is limited in that only nuclear RNA is captured and sequenced, it has the advantage of not requiring live dissociated cells. As such, it was used in conjunction with scRNA-Seq to compare adult and embryonic transcriptomic profiles of the ChP in cases where dissociation failed to produce viable cells [42].
Spatial RNA-Seq is a recent technique that sequences tissue sections in such a way that the transcriptomic profiles can be attributed back to different locations in the section, each location corresponding to a uniquely barcoded spot. Briefly, a tissue section is placed on a special microscope slide coated with spatially barcoded oligonucleotides, which maintains positional information throughout the sequencing process. Combined with an image of the section acquired beforehand, this makes it possible to attribute specific transcriptomic profiles to regions in the tissue section. At present, the resolution of spatial RNA-Seq allows about 5000 spots per slide to be distinguished, each spot having a diameter of 55 µm and a center-to-center distance of 100 µm. Thus, this technology is not well suited for precisely assigning transcriptomic information to the fine structures of the microvasculature (such as BBB microvessels), but rather to focal areas in the brain and spinal cord, allowing regional differences in the transcriptome of the brain and spinal cord in health and disease to be determined [207,208]. As the resolution and the technology improve, spatial RNA-Seq has the potential to become a valuable tool for brain barriers research.
As a final note, it is important to point out that none of the technologies mentioned will replace bulk RNA-Seq. Rather, they provide alternative or supplementary approaches to bulk RNA-Seq, each with its own strengths and weaknesses. For instance, scRNA-Seq usually requires live, viable, dissociated cells, and transcripts with low or medium expression levels will be underrepresented. The choice of technology should be based on the objectives of the study in question as well as on extrinsic factors such as cost. This decision should be made early in the process, and input from a sequencing facility technician and a bioinformatician is valuable to the wet-lab researcher.

Concluding remarks
This manuscript is intended to provide a valuable and exhaustive source of information for brain barriers researchers and bioinformaticians planning RNA-Seq analyses in brain barriers research, based on the experience gained in the BtRAIN network. Our intention is that this manuscript encourages closer interaction between classical brain barriers researchers and bioinformaticians in planning and performing RNA-Seq analyses. We are convinced that considering the issues raised here will allow future publications to provide more specific and accurate information, which is a prerequisite for data comparison and replication.

Glossary
• Alignment and Mapping: Alignment is the process of attributing reads to their corresponding location in the genome or transcriptome. Mapping usually refers specifically to aligning the reads to a reference genome, whereas alignment can also refer to de novo assembly.
• BAM file: Binary form of a SAM (Sequence Alignment/Map) file, which is the file resulting from the alignment. These files can be sorted by read name or by genomic coordinates.
• Batch effect: Variability unrelated to the biological question that can obscure the results. It usually has a technical source and is often hard to pinpoint. It can arise from different laboratories, RNA extraction on different days, and many other factors.
• Checksum (e.g. MD5, computed with md5sum): Algorithm that creates a hash (a string of characters) corresponding to a file. Since even small changes to the file completely alter this string, it can be used to detect corruption of the file during transfer.
• Demultiplexing: In silico process of assigning the reads back to their corresponding sample. Because samples are often multiplexed (pooled and sequenced together), demultiplexing is an essential step.
• Downstream Analysis: Usually refers to the more flexible statistical steps that start with the counts matrix. It includes, amongst others, normalization, differential expression analysis, batch effect correction, multiple testing correction and annotation.
• FASTQ: File format that stores the biological sequence alongside the corresponding PHRED scores. It is usually the file with which the analysis starts.
• Fluorescence-activated cell sorting (FACS): Technique that uses flow cytometry to sort a heterogeneous mixture of cells into two or more groups based on their light scattering and fluorescence characteristics.
• Genome Browser: Tool for direct visualization of the reads aligned to a reference genome, using coordinate-sorted BAM files.
• Laser capture microdissection (LCM): Technique for the dissection and isolation of cells or tissue under direct microscopic visualization using a laser. Dissected material can be captured by absorption, ejection, gravity or aspiration.
• Paired or single-end reads: Single-end reads are sequenced from only one end of the DNA fragment, whereas paired-end reads are sequenced from both ends. This affects certain applications; for instance, paired-end reads are better suited for the discovery of novel alternative splicing sites.
• PHRED score: Logarithmic quality score for base-calling errors, Q = -10 × log10(P), where P is the probability that the base call is wrong. The higher the score, the lower the probability of an error at that base. A PHRED score of 3, for instance, corresponds to a 50% probability of a correct call, whereas a PHRED score of 30 corresponds to a 99.9% probability of a correct call.
• Principal component analysis (PCA): Statistical method that transforms the provided data (for RNA-Seq usually the gene expression values) into a set of uncorrelated variables, the principal components. Usually plotted in two or three dimensions, it can be used in an exploratory fashion to assess the sources of variability in the data.
• Quantification: Also referred to as counting. In this step reads are assigned to features, such as genes, transcripts or exons.
• Raw files: Files that have not been processed, usually referring to the FASTQ files.
• RNA sequencing (RNA-Seq): High-throughput method for both mapping and quantifying transcriptomes.
• Transcriptome: Complete set of transcripts (RNA molecules) in a cell, and their quantities, for a specific developmental stage or physiological condition.
• Transcriptomics: The study of transcriptomes and their functions.
• Upstream Analysis: Usually refers to the analysis up to the generation of a count file. This includes, but is not restricted to, trimming, alignment, sorting, counting and the initial quality control steps.
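Several glossary entries (FASTQ, PHRED score, checksum, demultiplexing) describe simple, well-defined operations. The Python sketch below illustrates them under stated assumptions: quality strings use the common Phred+33 ASCII encoding, each FASTQ record spans exactly four lines, and the sample barcodes are made-up examples. All function names are hypothetical helpers, not part of any sequencing software; in practice, demultiplexing is performed by the sequencing facility's software and checksums are usually computed with standard tools such as `md5sum`.

```python
import hashlib
import io
from typing import Iterator, TextIO


def phred_to_error_prob(q: int) -> float:
    """PHRED definition: Q = -10 * log10(P_error), hence P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Convert an ASCII-encoded quality string to PHRED scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, list[int]]]:
    """Yield (read_id, sequence, phred_scores) from a FASTQ stream with 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()                          # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, decode_quality(qual)


def md5_checksum(stream, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a binary stream, read in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_seq: str, barcodes: dict[str, str], max_mismatch: int = 1) -> str:
    """Demultiplex one read: return the single sample whose barcode matches within max_mismatch."""
    hits = [name for name, bc in barcodes.items()
            if len(bc) == len(index_seq) and hamming(index_seq, bc) <= max_mismatch]
    return hits[0] if len(hits) == 1 else "undetermined"


if __name__ == "__main__":
    fastq = io.StringIO("@read1\nACGTACGT\n+\nII##5555\n")
    for read_id, seq, quals in read_fastq(fastq):
        print(read_id, seq, quals)                 # read1 ACGTACGT [40, 40, 2, 2, 20, 20, 20, 20]

    print(phred_to_error_prob(3), phred_to_error_prob(30))  # about 0.50 and 0.001

    # A one-character change yields a completely different checksum.
    print(md5_checksum(io.BytesIO(b"ACGT\n")))
    print(md5_checksum(io.BytesIO(b"ACGA\n")))

    barcodes = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}  # hypothetical barcodes
    print(assign_sample("ACGTAA", barcodes))       # one mismatch from sample_A -> sample_A
    print(assign_sample("GGGGGG", barcodes))       # matches neither -> undetermined
```

Requiring a unique match in `assign_sample` reflects the design choice that an index read close to two barcodes is safer left unassigned than attributed to the wrong sample.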