An optical window into brain function in children and adolescents: A systematic review of functional near-infrared spectroscopy studies

Despite decades of research, our understanding of functional brain development throughout childhood and adolescence remains limited due to the challenges posed by certain neuroimaging modalities. Recently, there has been a growing interest in using functional near-infrared spectroscopy (fNIRS) to elucidate the neural basis of cognitive and socioemotional development and identify the factors shaping these types of development. This article, focusing on the fNIRS methods, presents an up-to-date systematic review of fNIRS studies addressing the effects of age and other factors on brain functions in children and adolescents. Literature searches were conducted using PubMed and PsycINFO. A total of 79 fNIRS studies involving healthy individuals aged 3-17 years that were published in peer-reviewed journals in English before July 2020 were included. Six methodological aspects of these studies were evaluated, including the research design, experimental paradigm, fNIRS measurement, data preprocessing, statistical analysis, and result presentation. The risk of bias, such as selective outcome reporting, was assessed throughout the review. A qualitative synthesis of study findings in terms of the factor effects on changes in oxyhemoglobin concentration was also performed. This unregistered review highlights the strengths and limitations of the existing literature and suggests directions for future research to facilitate the improved use of fNIRS in developmental cognitive neuroscience research.


Introduction
During the first two decades of life, many aspects of cognitive, social, and emotional functions undergo continual development. For example, executive functions emerge early in life and continue to improve and differentiate throughout childhood, reaching an adult-like level by middle to late adolescence (Best and Miller, 2010; Romine and Reynolds, 2005). Social skills progress from a reliance on one's egocentric perspective to considering the perspectives and understanding the intentions and false beliefs of others, with mentalizing abilities continuing to mature throughout childhood and into young adulthood (Dumontheil et al., 2010; Wellman et al., 2001). The ability to recognize and empathize with the emotions of others while identifying, expressing, and regulating one's own emotions according to various contexts also undergoes gradual improvement from birth to late adolescence or early adulthood (Lawrence et al., 2015; Zeman et al., 2006). While genes exert a major influence on these forms of development, physical and psychosocial environmental factors also play pivotal roles (Tucker-Drob et al., 2013; Wang and Saudino, 2013). For instance, prenatal factors (e.g., although the extent to which regions within these networks are engaged appears to be age dependent (Houdé et al., 2010; Yaple and Arsalidou, 2018). Notwithstanding these findings, our understanding of functional brain development, especially development pertinent to social interactions and speech production, remains limited. This is because fMRI, which is the most popular neuroimaging modality, is relatively expensive, sensitive to motion artifacts, and uncomfortable for some children.
In the context of these challenges, there has been an emerging interest in using functional near-infrared spectroscopy (fNIRS) to study functional brain development, and this article focuses on the methods involved in this approach. This technique utilizes near-infrared light, to which living tissues are relatively transparent, to measure hemodynamic changes that occur at the cortical surface (Villringer and Chance, 1997). When near-infrared light is delivered to the scalp by light-emitting optodes, it travels through the skin, skull, cerebrospinal fluid, and brain tissue in a banana-shaped trajectory before leaving the scalp and being detected by light-receiving optodes (Okada and Delpy, 2003). When neurons are active, neurovascular coupling occurs, which leads to an increase in oxyhemoglobin concentration (HbO) and a decrease in deoxyhemoglobin concentration (HbR). Since the absorptivity of these two chromophores in the near-infrared region varies as a function of wavelength, an fNIRS device utilizes at least two wavelengths of near-infrared light to measure cerebral hemodynamics using algorithms such as the modified Beer-Lambert law (Delpy et al., 1988). Notably, three fNIRS techniques can be used (Ferrari and Quaresima, 2012). Continuous-wave fNIRS can only measure relative, but not absolute, hemoglobin concentration; however, it remains the most commonly used technique due to its cost-effectiveness. Time- and frequency-domain fNIRS can measure absolute hemoglobin concentration but have rarely been applied in research due to their high instrument costs.
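The two-wavelength logic of the modified Beer-Lambert law can be illustrated with a minimal sketch. It assumes the usual formulation in which the change in optical density at each wavelength equals (ε_HbO·ΔHbO + ε_HbR·ΔHbR) multiplied by the source-detector distance and the differential pathlength factor (DPF). The function name, argument layout, and the numerical extinction coefficients in the usage note are hypothetical placeholders for illustration, not values taken from any reviewed study or device.

```python
def mbll_two_wavelengths(delta_od, extinction, distance_cm, dpf):
    """Solve the 2x2 modified Beer-Lambert system for (dHbO, dHbR).

    delta_od:    (dOD at wavelength 1, dOD at wavelength 2)
    extinction:  ((eps_HbO_1, eps_HbR_1), (eps_HbO_2, eps_HbR_2))
    distance_cm: source-detector separation
    dpf:         (DPF at wavelength 1, DPF at wavelength 2)
    All names and units here are illustrative assumptions.
    """
    # Effective optical path length at each wavelength.
    path1 = distance_cm * dpf[0]
    path2 = distance_cm * dpf[1]

    # Coefficient matrix of the linear system A @ [dHbO, dHbR] = delta_od.
    a11, a12 = extinction[0][0] * path1, extinction[0][1] * path1
    a21, a22 = extinction[1][0] * path2, extinction[1][1] * path2

    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("Wavelengths do not separate HbO from HbR.")

    # Cramer's rule for the 2x2 system.
    d_hbo = (a22 * delta_od[0] - a12 * delta_od[1]) / det
    d_hbr = (a11 * delta_od[1] - a21 * delta_od[0]) / det
    return d_hbo, d_hbr


if __name__ == "__main__":
    # Placeholder extinction coefficients (arbitrary units, NOT real values).
    eps = ((1.0, 3.0), (2.5, 1.8))
    print(mbll_two_wavelengths((-1.8, 14.685), eps, 3.0, (6.0, 5.5)))
```

Because the DPF enters the path length directly, an incorrect DPF rescales the recovered concentration changes, which is why the age dependence of this factor matters in developmental samples.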
Since fNIRS is relatively resilient to motion artifacts and can be used in a natural setting, it has been increasingly used in research on lifespan development. Applications of fNIRS in infancy or early childhood (Lloyd-Fox et al., 2010; Moriguchi and Hiraki, 2013; Vanderwert and Nelson, 2014) and aging (Yeung and Chan, 2020) have been reviewed narratively and systematically, respectively. However, much remains unknown regarding its application in childhood and adolescence, during which cognitive and socioemotional functions undergo continual development. Compared to infants and toddlers, children and adolescents have more independence and autonomy that allow them to make their own decisions, explore new places, and encounter new people. These life experiences can have long-term consequences on their cognition and mental well-being, as well as brain health, because childhood and adolescence constitute critical periods for the development of higher-order cognitive and adaptive emotional skills (Khan et al., 2015; Kim et al., 2013; Larsen and Luna, 2018). The first onset of many commonly occurring mental disorders, including impulse-control, anxiety, substance use, and schizophrenic disorders, also usually occurs in childhood or adolescence (Kessler et al., 2007; Meyer and Lee, 2019). Since fNIRS is less expensive than fMRI and more resistant to motion artifacts than fMRI and EEG, which enables widespread use, its application can help speed up our understanding of the neural mechanisms underlying various aspects of child and adolescent development.
In addition, previous developmental reviews have addressed various methodological issues related to applying fNIRS in infants and young children and summarized the findings of some key studies. They also provided some suggestions for improving the fNIRS setup, task design (e.g., jitter block durations), and data interpretation (e.g., reporting both HbO and HbR). Nevertheless, it is not clear whether these suggestions have been adopted in developmental studies since then. In addition, since these reviews were published, some empirical investigations and methodological reviews have advanced certain aspects of fNIRS data preprocessing and analysis (Brigadoi et al., 2014; Huppert, 2016; Pinti et al., 2019; Scholkmann et al., 2014). Such advanced knowledge and other methodological issues, including those pertinent to statistical power, age correction, artifact removal methods, statistical test assumptions, and quality control, have not been incorporated or addressed in previous reviews, although they are important in ensuring high signal quality and accurate data interpretation. Furthermore, there is currently a large variety of approaches across different labs, and there is a need for more standardization to facilitate reproducibility. Thus, a systematic review and critical evaluation of the methodological aspects of developmental fNIRS studies is timely and useful for ensuring good and standardized research practice in the field.
Accordingly, this article aimed to bridge the knowledge gaps by providing a systematic review and critical evaluation of the methodological aspects of fNIRS studies with healthy children and adolescents. The research design, experimental paradigm, fNIRS measurement, data preprocessing, statistical analysis, and result presentation were evaluated. This review highlights the strengths and limitations of the existing literature and suggests directions for future research to facilitate the improved use of fNIRS in developmental cognitive neuroscience research.

Search strategy and study selection
This review conformed to the standard methodological guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement (Moher et al., 2009) and was not registered in any registry. A literature search was performed using PubMed and PsycINFO on June 21, 2020 to identify fNIRS studies including healthy children and adolescents. The keywords used were "child*" or "adolescen*" or "teen*" and "near-infrared spectroscopy" or "nirs" or "fnirs". No limits were set for the search.
Following the initial search, the titles and abstracts of the articles were screened. A study was included if it applied fNIRS to study brain function, included at least one sample group with a mean age between 3 and 17 years, and examined the effects of age or other factors (i.e., individual differences or interventions) on cerebral oxygenation in children and adolescents. The article had to be written in English and published as original research in a peer-reviewed journal. The gray literature was excluded to ensure that all included studies were of high quality. Also, studies on psychiatric and neurological disorders or any other medical conditions were excluded to ensure that typically developing children and adolescents were included. After the screening, the full texts of the screened articles were retrieved for eligibility assessment based on the same inclusion and exclusion criteria.
These criteria were set to ensure that the review was representative of the fNIRS literature in healthy children and adolescents, who are in critical developmental periods characterized by growing independence and autonomy. Studies with infants or toddlers were not included because many of them had been covered in previous reviews (Lloyd-Fox et al., 2010; Vanderwert and Nelson, 2014), and research practice in infant and toddler studies can be considerably different from that in child and adolescent studies in terms of the fNIRS setup and task design. In addition, studies focusing on atypical development (i.e., neurodevelopmental disorders) were not included because they have been covered in recent reviews on autism spectrum disorder and attention-deficit/hyperactivity disorder (Mauri et al., 2018).

Data extraction
In addition to the first authors' names and publication years, the following information about the research design, experimental paradigm, fNIRS measurement, data preprocessing, statistical analysis, and result presentation was extracted from each identified article: (a) research design: study type, factors, sample characteristics, a priori power analysis, and screening test for psychopathology; (b) experimental paradigm: functional domain and construct, task, task conditions, paradigm design, intertrial interval, and block duration; (c) fNIRS measurement: model, manufacturer, technique, wavelengths, sampling rate, emitter-detector spacing, emitter number, detector number, channel number, brain regions, and presence of channel registration; (d) data preprocessing: toolbox/software, manual and automatic removal, simple frequency filtering, motion/systemic correction, detrending, autocorrelation correction, other artifact removal methods, and age correction of differential pathlength factor; (e) statistical analysis: test for factor effects, normality check for fNIRS signals, channel clustering (for region-of-interest analysis), channel-wise multiple comparison correction, and use of a hemodynamic response function model; (f) result presentation: brain function, fNIRS measures, main effects and contrasts, HbO and HbR results, analysis of both HbO and HbR, significant results found for HbR only, and image reconstruction for factor effects. All attributes and results were synthesized qualitatively rather than quantitatively due to the large variability in fNIRS systems and data analysis across studies. No study authors were contacted for additional information.
These aspects were reviewed in light of the features or limitations of fNIRS: (a) research design: due to the relatively low signal-to-noise ratio of fNIRS compared to fMRI, sufficient statistical power and homogeneous samples are important in fNIRS studies; (b) experimental paradigm: since fNIRS measurement is limited to certain cortical areas, fNIRS may be preferentially applied to study some types of functions. Task parameters, such as trial and block durations, also influence the estimation of the hemodynamic response; (c) fNIRS measurement: the nature of signals measured by an fNIRS device critically depends on its specification (e.g., optode arrangement) and where the optodes are placed on the scalp; (d) data preprocessing: artifacts are common in fNIRS signals, and the adequacy and efficiency of artifact removal/modeling determine the quality of fNIRS measurements; (e) statistical analysis: various data analysis methods exist for fNIRS, each with its assumptions, and a violation of the assumptions may yield inaccurate results; (f) result presentation: fNIRS yields (changes in) HbO and HbR, and the relationship between them informs the quality of signals and the strength of evidence.

Data coding
Each identified study was coded based on the aforementioned attributes. While some attributes were coded as binary (i.e., yes/no) or continuous, others were coded in categories as follows. For study type, studies examining age effects were classified as cross-sectional or longitudinal, while those examining non-age effects were classified as correlational if individual differences were studied and as experimental or quasi-experimental (e.g., experiments without control) if interventions were studied. Functional domains were categorized into executive function, language, social function, working memory (WM), arithmetic, attention, sensory function, motor function, emotion, or rest. Paradigm designs were coded as blocked if a task involved the repeated or continuous presentation of one task condition and alternation between task conditions, as event-related if a task involved different discrete and transient events presented in a randomized order, and as mixed if a task involved alternation between task conditions, with different events presented in a randomized order during one condition.
Preprocessing for fNIRS data typically involves trial and channel rejection, simple frequency filtering, motion/systemic correction, detrending, and/or temporal autocorrelation correction (Scholkmann et al., 2014), which were coded as follows: (a) whether artifactual data were removed manually, guided by visual inspection or video recording, or using (semi-)automatic algorithms based on amplitude and/or standard deviation (SD) criteria; (b) whether cardiac and respiratory activities, Mayer waves, and slow drifts were attenuated using simple lowpass, highpass, and/or moving-average filtering; (c) whether motion and systemic artifacts were corrected using advanced algorithms, including the movement artifact reduction algorithm (MARA; i.e., spline interpolation), wavelet filtering, correlation-based signal improvement (CBSI), functional-systemic separation, or temporal derivative distribution repair (TDDR); (d) whether detrending was performed using methods other than simple highpass filtering, including discrete cosine transform (DCT)-based highpass filtering, wavelet minimum description length (wavelet-MDL) detrending, or polynomial fitting; (e) whether temporal autocorrelation was corrected using precoloring or prewhitening; (f) whether other methods such as independent component analysis (ICA) and the autoregressive iteratively reweighted least squares (AR-IRLS) algorithm were applied to model and correct for noise.
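As an illustration of one of the motion-correction approaches coded above, the following is a minimal sketch of correlation-based signal improvement (CBSI). It assumes the commonly cited formulation in which HbO and HbR are forced to be perfectly negatively correlated: the corrected HbO is half of (HbO − α·HbR), the corrected HbR is the corrected HbO divided by −α, and α is the ratio of the standard deviations of HbO and HbR. The function name and the plain-list interface are hypothetical, and the sketch is not the implementation used by any reviewed study or toolbox.

```python
from statistics import pstdev


def cbsi(hbo, hbr):
    """Correlation-based signal improvement (sketch under stated assumptions).

    hbo, hbr: equal-length sequences of HbO and HbR changes for one channel.
    Returns (hbo_corrected, hbr_corrected) as lists.
    """
    if len(hbo) != len(hbr):
        raise ValueError("HbO and HbR must have the same number of samples.")
    sd_hbr = pstdev(hbr)
    if sd_hbr == 0:
        raise ValueError("HbR has zero variance; alpha is undefined.")

    # Amplitude ratio between the two chromophores.
    alpha = pstdev(hbo) / sd_hbr

    # Components shared with the same sign by HbO and HbR (typical of
    # motion artifacts) cancel; anticorrelated components are retained.
    hbo_corr = [0.5 * (x - alpha * y) for x, y in zip(hbo, hbr)]
    hbr_corr = [-v / alpha for v in hbo_corr]
    return hbo_corr, hbr_corr


if __name__ == "__main__":
    print(cbsi([1.0, 2.0, 3.0, 4.0], [-0.5, -1.0, -1.5, -2.0]))
```

Note that after this correction HbR is a scaled mirror image of HbO, so the two indices no longer provide independent evidence; this is one way in which preprocessing can combine HbO and HbR.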
A test for factor effects refers to statistical tests (e.g., ANOVA) used to assess the effects of certain factors on brain function. Brain function refers to the dimension of fNIRS signals examined, including the level, laterality, latency, connectivity, network organization, and interpersonal synchrony of changes in HbO and/or HbR. fNIRS measures include HbO and/or HbR variables representing different aspects of brain function. Main effects and contrasts indicate the key (pair of) task conditions used to assess the factor effects on fNIRS measures. Where contrasts between two task conditions and between a task condition and a non-task condition were present, the former was chosen. Finally, for HbO and/or HbR results, '+' and '-' indicate significant positive and negative factor effects, respectively, while 'o' indicates no significant factor effect. When multiple comparison corrections were performed, the results after correction were considered.
Overall, there was only one coder, but it is unlikely that any of the review's conclusions would be influenced by inclusion of more coders. This is because this review focused on the methodological aspects of fNIRS studies, and the coding was straightforward and objective for most items. Nevertheless, a second coder was sought to classify the functional domain of tasks, where judgments can be split. Any discrepancy was resolved by consensus.

Risk of bias assessment
Throughout the review, the risk of bias assessment was conducted similarly to a recent systematic review on the application of fNIRS in aging research (Yeung and Chan, 2020). Underpowered studies are prone to false-negative and false-positive errors (Forstmeier et al., 2017), while multichannel fNIRS studies that fail to correct for multiple comparisons at the channel or regional level also have an increased risk of false-positive errors. Thus, it was noted whether studies performed a priori power analysis and applied multiple comparison correction when testing the factor effects on fNIRS variables in multiple regions. Notably, developmental changes in skull and brain structures lead to differences in the distance of light traveling from source to detector in the head, affecting changes in hemoglobin concentration measured by fNIRS (Duncan et al., 1996). Therefore, it was also noted whether studies controlled for these age-related differences by adjusting the differential pathlength factor for age or by using effect size or other unitless metrics that are independent of the assumed differential pathlength factor.
Artifacts are common in fNIRS signals and may lead to deviant observations that distort parametric tests. Thus, it was examined whether studies performed normality checks for fNIRS signals before statistical analysis. Moreover, it is common for fNIRS studies to analyze only HbO because this index typically has a higher signal-to-noise ratio and exhibits a higher correlation with blood-oxygen-level-dependent (BOLD) signals compared to HbR (Cui et al., 2011; Strangman et al., 2002). However, some researchers may also analyze HbR because only this index can yield significant results. Thus, whether studies reported both fNIRS indices when addressing factor effects and whether they reported significant results for HbR only were examined to determine the potential presence of reporting bias.
Note. EF = executive function; HbR = deoxyhemoglobin concentration; hrf = hemodynamic response function; IQ = intelligence quotient; SES = socioeconomic status; WM = working memory. When more than one main effect and contrast existed, the underlined main effect and contrast was chosen for the qualitative synthesis of evidence. The symbols "+", "-", and "±" represent significant positive, negative, and mixed effects (after multiple comparison correction), respectively, whereas the symbol "o" represents a nonsignificant effect. A mixed effect means that significant positive and negative effects are both present, and the directionality depends on the region/channel. The order of symbols follows the order of main effects and contrast.
published articles increased to 14. A total of 16 articles were published in 2019 and five were published between January 1 and June 21, 2020.
The total sample size of each study ranged from 8 to 392, with a median of 38. When considering only individuals aged below 18 years, the median sample size was 33. Only three (4%) of the studies provided a priori power analysis or a rationale to justify their sample sizes. Also, only five (6%) of the studies administered standardized tests to screen for childhood psychopathology. These tests include the Child Behavior Checklist, Kiddie Schedule for Affective Disorders and Schizophrenia (Present and Lifetime version), Mini-International Neuropsychiatric Interview, Strengths and Difficulties Questionnaire, and Symptom Checklist-90 (n = 1 each).

Experimental paradigm
Supplementary Table 3 details the experimental paradigm employed in each study. Overall, 25 (32%) of the studies were concerned with executive functions: 17 studies assessed inhibition using the Stroop ( n = 8), Go/No-go ( n = 6), flanker ( n = 3), delay of gratification ( n = 1), and/or modified Real Animal Size ( n = 1) tests. Nine studies examined shifting using the Dimensional Card Sorting ( n = 6), Wisconsin Card Sorting ( n = 2), or modified Real Animal Size ( n = 1) tests. One study assessed monitoring and creative thinking using a reality judgment task, while one other study used the Torrance Test of Creative Thinking-Figural Test. Additionally, 23 (29%) of the studies evaluated language functions: 10 examined word processing using a repetition or reading task ( n = 6), a learning task ( n = 2), a lexical selection task ( n = 1), or a comprehension task (i.e. the Pyramids and Palm Trees Test) ( n = 1). Seven studies examined verbal fluency using the phonemic fluency test ( n = 4) and/or the semantic fluency task ( n = 4). Five studies examined sentence processing using a listening task ( n = 3) or a judgment task ( n = 2). Moreover, one study examined story processing using a listening task.
Notably, 12 (15%) of the studies examined social functions: three studies examined social information processing using a watching or listening task, four studies examined parent-child or interpersonal dyads using a cooperation task ( n = 3) or the Disruptive Behavior Diagnostic Observation Schedule ( n = 1), two studies assessed face recognition using an old/new paradigm, and one study each examined action imitation, gesture recognition, and social appreciation, respectively. Additionally, 10 (13%) of the studies assessed WM using tasks that emphasized the maintenance of spatial ( n = 4), visual ( n = 2), or visual and verbal ( n = 1) WM, the manipulation of spatial WM ( n = 1), and/or the updating of visual ( n = 1) or spatial ( n = 1) WM. Furthermore, five (6%) of the studies assessed emotional function using a frustration-based emotion regulation task ( n = 3), a facial emotion recognition task ( n = 1), or a perception task ( n = 1). Also, four (5%) of the studies examined mental arithmetic skills with a task involving all four arithmetic operations (n = 1) or only addition ( n = 1), subtraction ( n = 1), or multiplication ( n = 1) problems. Another four (5%) of the studies focused on rest with closed ( n = 3) or open ( n = 1) eyes. Additionally, two (3%) of the studies assessed attention, including sustained ( n = 1; video game playing) and selective attention ( n = 1; triad classification). Finally, two (3%) of the studies examined sensory functions using movie-viewing tasks, while one (1%) of the studies examined motor functions involving cycling.
Overall, 58 (73%) of the studies employed the blocked design exclusively. Among these studies, active task blocks and rest or non-task blocks were of variable duration in 13 and 18 studies, respectively. Also, 17 (22%) of the studies employed the event-related design exclusively, with the rapid design (i.e., average intertrial intervals < 6 seconds) and jittering interstimulus intervals being used in three and six studies, respectively. Two (3%) of the studies employed both the blocked and event-related designs (depending on the task), while two (3%) of the studies employed a mixed design. Additionally, after excluding three studies that focused on rest only, 69 (91%) of the studies employed one task, while seven (9%) studies employed at least two tasks. A pool of 85 tasks was employed across studies, each of which included 1 to 10 task conditions, with a median of two conditions. Among these 85 tasks, 73 (86%) included more than one task condition, whereas the remainder included only one task condition.
The sampling rate of data acquisition was unknown in four studies. For the remaining studies, the sampling rate ranged from 0.8 to 50 Hz, with a median of 10 Hz. Source-detector spacing was unknown in three studies. For the remaining studies, this value ranged from 20 to 50 mm, with a median of 30 mm. Notably, 71 (90%) of the studies employed a fixed source-detector separation. Probe distances varied in eight studies, which was largely due to caps adjustable to head sizes being used and the probes being placed according to the international 10-20 system. For all studies, the number of sources used ranged from 1 to 20, with a median of 8. The number of detectors used ranged from 1 to 32, with a median of 20. The number of available measurement channels ranged from 1 to 80, with a median of 22. Furthermore, measurements over the frontal, temporal, parietal, and occipital lobes were performed in all (100%), 23 (29%), 21 (27%), and five (6%) of the studies, respectively. All but four (95%) of the studies measured brain activity in both hemispheres. Moreover, 53 (67%) of the studies performed spatial registration of the channels, which was aided by 3D digitizers in 13 (25%) studies.
The removal of artifactual data was performed manually in 24 (30%) of the studies, 20 of which relied on a visual inspection, while the remainder relied on video recording. This removal was also performed using (semi-)automatic algorithms in 24 (30%) of the studies, 13 of which were based on SD only, 9 on amplitude only, and 2 on both criteria. Additionally, simple frequency filtering was performed in 48 (61%) of the studies. Specifically, the included studies applied lowpass filtering ( n = 36; range: 0.05-1 Hz, median: 0.3 Hz), highpass filtering ( n = 28; range: 0.007-0.05 Hz, median: 0.01 Hz), and/or moving-average filtering ( n = 11; three with a 5 s window, three with unknown parameters, two with a Savitzky-Golay filter, two with five data points, and one with three-point triangular smoothing). Notably, 26 studies applied both lowpass and highpass filtering (i.e., bandpass filtering), while one study applied both highpass and moving-average filtering.
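The two simplest operations described above, moving-average smoothing and SD-based automatic rejection, can be sketched as follows. This is an illustrative sketch only: the five-point window mirrors one of the settings reported above, whereas the rejection threshold `k`, the function names, and the choice to flag values by their distance from the mean are assumptions, since the criteria differed across the reviewed studies.

```python
from statistics import mean, pstdev


def moving_average(signal, window=5):
    """Centered moving average; the window shrinks at the signal edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def flag_outliers(values, k=3.0):
    """Flag values lying more than k SDs from the mean (k is hypothetical)."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [False] * len(values)
    return [abs(v - m) > k * sd for v in values]


if __name__ == "__main__":
    print(moving_average([0.0, 0.0, 10.0, 0.0, 0.0]))
    print(flag_outliers([0.0] * 9 + [10.0], k=2.0))
```

A single extreme value inflates the SD it is judged against, so a criterion of this kind becomes less sensitive as artifacts become larger or more frequent.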
Regarding the age-dependent differential pathlength factor used to estimate changes in hemoglobin concentration, 47 studies did not require age correction because these studies analyzed the effect size or other unitless metrics, including beta estimates (n = 26), Z-scores (n = 9), Pearson's r (i.e., connectivity) and other network measures (n = 7), and ratios between conditions or chromophores (n = 5). For the remaining 32 studies, only five (15%) corrected the differential pathlength factor for age using an equation.
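Why a unitless metric does not require age correction can be shown with a small sketch. It assumes the simplified case in which the same correction applies at both wavelengths, so that replacing an assumed differential pathlength factor (DPF) with an age-corrected one multiplies every concentration change in a channel by one constant. The function names and the DPF values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def rescale_for_dpf(values, assumed_dpf, corrected_dpf):
    """Rescale concentration changes computed with an assumed DPF.

    Under the modified Beer-Lambert law the estimated change is inversely
    proportional to the DPF, so the correction is a constant factor.
    """
    factor = assumed_dpf / corrected_dpf
    return [v * factor for v in values]


def zscore(values):
    """Standardize values to zero mean and unit (population) SD."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


if __name__ == "__main__":
    raw = [0.2, 0.5, -0.1, 0.4]                      # illustrative HbO changes
    corrected = rescale_for_dpf(raw, assumed_dpf=6.0, corrected_dpf=5.0)
    print(corrected)          # amplitudes change with the DPF
    print(zscore(raw))        # standardized values...
    print(zscore(corrected))  # ...match up to floating-point rounding
```

Raw amplitudes, by contrast, shift with the DPF, so comparing them across age groups without correction can confound age-related optical differences with age-related functional differences.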

Statistical analysis
Supplementary Table 6 summarizes the statistical analyses conducted in each study. To determine the relationship between different factors and fNIRS variables, 76 (96%) of the studies used at least one parametric test, including the t-test, ANOVA, ANCOVA, Pearson's correlation, partial correlation, multiple regression, and/or a linear mixed model. In contrast, only six (9%) studies used a non-parametric test, including Spearman's correlation or the Mann-Whitney U test. Moreover, seven (9%) of the studies tested for the normality of fNIRS variables. Normality was achieved in five studies, with four proceeding with parametric tests. Normality was violated in two studies, and one of these studies nevertheless proceeded with parametric tests.
After excluding one study using a single-channel fNIRS system, 37 (47%) studies clustered channels and performed a region-of-interest analysis to determine factor effects. Also, after excluding 29 studies involving factor effects on brain functions across regions or broadly in the two hemispheres, 22 (44%) studies applied multiple comparison correction at the channel or regional level. These correction methods included false discovery rate correction (n = 13), Bonferroni correction (n = 5), Dubey/Armitage-Parmar adjustment (which considers the spatial correlation among channels) (n = 2), Sun's tube formula correction (n = 2), and an arbitrary adjustment of the p-value threshold (n = 1).
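The two most frequently used corrections can be sketched for a set of channel-wise p-values. The sketch assumes that false discovery rate correction refers to the Benjamini-Hochberg step-up procedure, which the reviewed studies did not necessarily all use; the function names and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a channel only if its p-value is at most alpha / n_channels."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(p_values)
    # Channel indices ordered from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its rank-specific threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Every channel ranked at or below max_rank is declared significant.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    channel_p = [0.001, 0.012, 0.025, 0.045, 0.2]  # illustrative p-values
    print(bonferroni(channel_p))
    print(benjamini_hochberg(channel_p))
```

With these illustrative values, Bonferroni retains only the first channel while the Benjamini-Hochberg procedure retains the first three, which shows how the choice of correction can change which channels are reported as significant.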

Result presentation
Supplementary Table 7 presents the fNIRS results and how these were presented in each study. Sixty-four (81%) of the studies examined the level of changes in brain activity in terms of mean change over a task period ( n = 34), beta value or Pearson's r ( n = 27), mean change around the peak ( n = 2), an unspecified change ( n = 2), rate of change ( n = 1), or ratio of the range of change ( n = 1). Three of these and one other study examined the laterality of changes in fNIRS signals by investigating the ratio ( n = 3) or difference in the rate of change ( n = 1) of brain activity between the two hemispheres. Among these 65 studies on the level and/or laterality of brain activity, 27 (42%) relied on modeling the hemodynamic response. Also, five (6%) of the studies examined connectivity in terms of the Pearson's r ( n = 3), partial coherence ( n = 1), or Granger causality ( n = 1) between two regions. Furthermore, four (5%) of the studies examined interpersonal synchrony in terms of the wavelet coherence ( n = 3) or Pearson's r ( n = 1) between the brain activity of two individuals, while one (1%) of the studies examined the similarity of changes in brain activity between twins. Four studies examined functional network organization ( n = 3) and its laterality ( n = 1) using graph measures. Finally, one study each examined the latency of brain activation, the variability of brain activity, and the similarity of brain activity between the resting and task states, respectively.
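Several of the measures above, including connectivity between two regions and one of the interpersonal synchrony indices, reduce to Pearson's r between two time series. A minimal sketch follows; the function name and the example series are hypothetical, and real analyses would typically operate on preprocessed HbO or HbR signals rather than raw values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Series must have equal length of at least 2.")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("Correlation is undefined for a constant series.")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    region_a = [0.1, 0.3, 0.2, 0.5, 0.4]   # illustrative HbO time series
    region_b = [0.2, 0.4, 0.1, 0.6, 0.5]
    print(pearson_r(region_a, region_b))
```

Because r is unchanged when either series is multiplied by a positive constant, such measures are among the unitless metrics that do not depend on the assumed differential pathlength factor.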
Regarding the task conditions that formed the basis of factor effects, some studies did not require contrasting conditions. Three studies examined the effect of different factors on the resting-state connectivity or brain network configuration. Moreover, three studies examined connectivity throughout a task, while one study compared the similarity of brain activity between the resting and task states. For the remaining 72 studies, only 37 (51%) contrasted one task condition with other task conditions, while the remainder only contrasted a task condition with the rest or non-task period.
Notably, 77 (97%) of the studies analyzed HbO, while two studies analyzed HbR only. After excluding seven studies that combined HbO and HbR as a result of data preprocessing or analysis, 26 (36%) of the 72 studies analyzed both indices when determining the factor effects on brain function. Interestingly, among these 26 studies, no studies reported significant results for HbR only, whereas 10 (38%) studies reported significant results for HbO only. Also, 11 (42%) of these 26 studies noted significant factor effects supported by opposite or consistent changes in HbO and HbR. Finally, among all the 79 studies included in the present review, image reconstruction was used to illustrate the factor effects on brain function in 35 (44%) of the studies, of which 20 (57%) involved interpolation between channels.
In light of the high heterogeneity of studies and the limited number of studies in each functional domain and age group, the study findings are not discussed here (but see Table 1 for individual study findings and Supplementary Table 8 for a qualitative synthesis of evidence). Readers interested in a narrative presentation of the key findings of developmental fNIRS studies are referred to the reviews by Moriguchi and Hiraki (2013) and Vanderwert and Nelson (2014) .

Discussion
Recently, there has been growing interest in the use of fNIRS to study brain function in the context of lifespan development, largely because of the increasing recognition that this technique is cost-effective, resistant to motion artifacts, and usable in natural settings. Applications of fNIRS in infants or young children ( Lloyd-Fox et al., 2010 ;Moriguchi and Hiraki, 2013 ;Vanderwert and Nelson, 2014 ) and older adults have been reviewed narratively and systematically, respectively. However, fNIRS application in healthy children and adolescents, who exhibit increasing independence and autonomy and undergo continual cognitive and socioemotional development, remains largely unknown. This article systematically reviews 79 studies that addressed the effects of age and other factors (e.g., individual differences) on brain function in healthy children and adolescents. This discussion introduces key points from the existing fNIRS literature, highlights its limitations, and provides recommendations and directions for future developmental fNIRS research. After analyzing the various methodological aspects of these studies, it is suggested that future studies make fuller use of fNIRS while avoiding the potential pitfalls of this technique. Table 2 lists the methodological issues of previous studies and the recommendations for the field; the suggestions draw on the literature addressing these issues (i.e., methodological studies and reviews) and on the author's research experience.

Considerations for the design of an fNIRS study
Most fNIRS studies involving children and adolescents to date have generated cross-sectional and correlational evidence. Notably, relatively few longitudinal and experimental studies have sought to identify the causal roles of age and other factors in children's and adolescents' brain function. Notwithstanding these contributions, few studies have considered age-related differences in skull and brain structures or brain-scalp distance ( Beauchamp et al., 2011 ). Failure to control for these differences can lead to the underestimation of brain activity in older individuals (due to larger brain-scalp distances) and increase between-subject variability. Thus, it is essential to correct for these confounding factors either by using effect sizes ( Schroeter et al., 2003 ) or by correcting the differential pathlength factor for age using the general equation ( Scholkmann and Wolf, 2013 ), or the generalized model (which additionally corrects for sex and head circumference) if the study involves 5- to 11-year-olds only ( Whiteman et al., 2017 ). Although general linear modeling with the canonical hemodynamic response function yields beta estimates that are independent of the assumed differential pathlength factor, age may affect the shape of the hemodynamic response function ( Kamran et al., 2018 ). Thus, future work would benefit from characterizing the relationship between age and the hemodynamic response function to improve the accuracy of the model-based analytic approach.
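To illustrate the age correction of the differential pathlength factor (DPF), the following is a minimal sketch rather than a validated implementation. The function names are hypothetical, and the coefficient values are assumed to match the general equation of Scholkmann and Wolf (2013); they should be checked against that paper before any use. The rescaling step relies only on the modified Beer-Lambert law, under which an estimated concentration change is inversely proportional to the DPF.

```python
def dpf(wavelength_nm: float, age_years: float) -> float:
    """Age- and wavelength-dependent differential pathlength factor.

    ASSUMPTION: the coefficients below are taken to be those of the
    general equation in Scholkmann and Wolf (2013); verify before use.
    """
    return (223.3
            + 0.05624 * age_years ** 0.8493
            - 5.723e-7 * wavelength_nm ** 3
            + 0.001245 * wavelength_nm ** 2
            - 0.9025 * wavelength_nm)


def rescale_concentration(delta_c: float,
                          dpf_assumed: float,
                          dpf_corrected: float) -> float:
    """Rescale a concentration change that was computed with a fixed DPF.

    Under the modified Beer-Lambert law the estimate scales with 1 / DPF,
    so the correction is the ratio of the assumed to the corrected DPF.
    """
    return delta_c * dpf_assumed / dpf_corrected
```

For example, `rescale_concentration(delta_hbo, 6.0, dpf(800.0, 7.0))` would convert a value computed with a fixed DPF of 6 into one based on the age-specific DPF of a 7-year-old at 800 nm; the fixed value of 6 and the wavelength are illustrative choices only.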
The median total sample size of the reviewed studies is 38. Considering a power of 0.80 and an alpha level of 0.05, this sample size is just enough to detect a medium-to-large correlation ( r = 0.39) or a large difference ( d = 0.83) between two equal-sized groups. Notably, this argument assumes that all statistical tests are driven by a priori hypotheses, or that only one test is performed to compare two groups or evaluate a bivariate relationship. The sample size will not be enough to detect effects of these sizes if some statistical tests are exploratory and require multiple comparison correction (such as when comparing more than two groups and channels/regions), or when performing multiple regression or multivariate analysis. Accordingly, it is speculated that many fNIRS studies might have been underpowered. It is well known that underpowered studies are prone to committing false-positive and false-negative errors ( Forstmeier et al., 2017 ). Therefore, it is suggested that future studies conduct and report a priori power analyses to justify their sample sizes based on studies that shared similar task and sample characteristics and/or factors of interest (see Table 1 ). Notably, due to random measurement error and variation in task and sample characteristics across studies, the estimated required sample size based on a single study, especially one that is underpowered, is unlikely to be highly accurate. Thus, researchers who are planning a new study should estimate the required sample size based on the weighted mean effect size derived from most, if not all, of the studies that share similar features.
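The power considerations above can be explored with a short sketch. It uses textbook normal approximations (the Fisher z method for correlations and the two-sample normal formula for standardized mean differences), not the exact calculations of dedicated power software, and the function names are hypothetical. The review does not state whether its figures assume one- or two-sided tests, so the sidedness is left as a parameter. The pooling function shows one common way to form a weighted mean effect size (Fisher z values weighted by n - 3); other weighting schemes are possible.

```python
from math import atanh, ceil, tanh
from statistics import NormalDist

_Z = NormalDist().inv_cdf  # standard normal quantile function


def _critical_sum(alpha: float, power: float, two_sided: bool) -> float:
    """Sum of the normal quantiles for the alpha level and the power."""
    z_alpha = _Z(1 - alpha / 2) if two_sided else _Z(1 - alpha)
    return z_alpha + _Z(power)


def n_total_for_r(r, alpha=0.05, power=0.80, two_sided=True):
    """Approximate total n needed to detect a correlation r (Fisher z)."""
    return ceil((_critical_sum(alpha, power, two_sided) / atanh(abs(r))) ** 2 + 3)


def n_per_group_for_d(d, alpha=0.05, power=0.80, two_sided=True):
    """Approximate n per group for two equal-sized groups (normal approx.)."""
    return ceil(2 * (_critical_sum(alpha, power, two_sided) / abs(d)) ** 2)


def weighted_mean_r(studies):
    """Pool (r, n) pairs by averaging Fisher z values with weights n - 3."""
    total_weight = sum(n - 3 for _, n in studies)
    mean_z = sum((n - 3) * atanh(r) for r, n in studies) / total_weight
    return tanh(mean_z)
```

Under these approximations, two-sided tests require roughly 50 participants for r = 0.39 and roughly 23 per group for d = 0.83, whereas one-sided tests require roughly 40 participants and 18 per group; exact t-based calculations give slightly larger values. A planned study could pass the result of `weighted_mean_r` to `n_total_for_r` to base its sample size on several comparable studies rather than one.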
According to the present review, very few studies screened for childhood psychopathology in their samples using standardized tests. A wealth of evidence suggests that many mental disorders first emerge in childhood and adolescence ( Kessler et al., 2007 ;Meyer and Lee, 2019 ), and that neurodevelopmental disorders are becoming more commonly diagnosed ( Boyle et al., 2011 ). Thus, future studies would benefit from administering standardized inventories, such as the Strengths and Difficulties Questionnaire ( Goodman, 1997 ), to screen for common childhood psychopathology and improve the characterization and homogeneity of their study samples.
Most fNIRS studies involving children and adolescents have examined executive, language, and social functions separately, using one task at a time. Accumulated evidence suggests that executive function differentiates into distinct constructs throughout childhood ( Brydges et al., 2014 ;Xu et al., 2013 ) and that executive, language, and socioemotional development are interrelated ( Kuhn et al., 2014 ;Riggs et al., 2006 ). Thus, future fNIRS studies comparing developmental changes in brain function among different tasks are required to enhance our holistic understanding of cognitive and socioemotional development throughout childhood. Whenever possible, such investigations should be performed within individuals to minimize between-subject variance and strengthen the evidence.

Table 2
Suggestions to enhance research practice in developmental functional near-infrared spectroscopy (fNIRS) studies in light of the methodological issues identified in this review.

| Methodological issue | Suggestion | Rationale |
| --- | --- | --- |
| 1) Little consideration of age-related differences in head structures | Use effect sizes ( Schroeter et al., 2003 ) or correct the differential pathlength factor for age using the general equation ( Scholkmann and Wolf, 2013 ), or the generalized model (which additionally corrects for sex and head circumference) if the study involves 5- to 11-year-olds only ( Whiteman et al., 2017 ) | To reduce between-subject variability and ensure that age-related differences in fNIRS variables reflect age-related differences in brain functions |
| 2) Inadequate sample size or statistical power | Perform and report a priori power analysis based on studies that shared similar sample and task characteristics and/or factors of interest (i.e., look up Table 1 ) | To minimize the risks of committing false-positive and false-negative errors |
| 3) Insufficient sample characterization | Administer standardized inventories to screen for common child and adolescent psychopathology | To reduce between-subject variability |
| 4) Use or reporting of only one task at a time | Use at least two tasks to contrast different functions, preferably in a within-subject manner | To inform the relationships between the developmental trajectories across functional domains |
| 5) Tasks designed in ways that may hinder an accurate estimation of brain activity | Jitter interblock and interstimulus intervals when using the blocked design, or adopt the (rapid) event-related design if possible | To avoid habituation and anticipation effects and allow a separation of neural responses to different trials |
| 6) Lack of regional comparisons | Compare regions using effect sizes ( Schroeter et al., 2003 ), or after correcting the differential pathlength factor for region and other factors if the study involves 5- to 11-year-olds only ( Whiteman et al., 2017 ) | To inform the spatial patterns of functional brain development |
| 7) Variable preprocessing pipelines | Apply a standardized preprocessing pipeline (see Pinti et al., 2019 ) | To improve the reproducibility of results |
| 8) Insufficient artifact removal | Apply wavelet filtering (and spline interpolation) to correct for motion artifacts ( Brigadoi et al., 2014 ;Di Lorenzo et al., 2019 ) and a customized bandpass filter ( Pinti et al., 2019 ) to correct for high-frequency noise and slow drifts; alternatively, model noise structures using the autoregressive iteratively reweighted least squares algorithm ( Barker et al., 2013 ) | To enable an accurate estimation of the hemodynamic response due to neuronal activity |
| 9) Contrast between task and rest periods only | Contrast at least two conditions that differ in the psychological process of interest only | To clarify the conceptual meaning of fNIRS variables |
| 10) Little consideration of parametric test assumptions | Check normality using Shapiro-Wilk tests, and remove outliers by conducting Grubbs' tests (Grubbs, 1969) and/or calculating Cook's distance (Cook, 1977) | To enhance the accuracy of parametric test results |
| 11) No multiple comparison correction | When testing multiple regions or channels, apply false discovery rate correction ( Singh and Dan, 2006 ) and report both the corrected and uncorrected results | To avoid inflated Type I error rates |
| 12) Incomplete reporting of fNIRS indices | Analyze and report both the HbO and HbR data | To inform signal quality and provide a complete picture of the hemodynamic response |
| 13) Little consideration of the possible confound between age and channel signal quality | Examine if age correlates with the correlation between (changes in) HbO and HbR ( Cui et al., 2010 ), the coefficient of variation of raw light intensity values ( Piper et al., 2014 ), or the amount of data removed | To ensure that age-related differences in fNIRS variables reflect age-related differences in brain functions |

Note. HbO = oxyhemoglobin concentration; HbR = deoxyhemoglobin concentration.

The blocked design has likely served as the primary experimental design for fNIRS because it offers higher detection power than the event-related design. However, the blocked design does not separate neural responses to different trials and is prone to rapid habituation and anticipation, which affect the magnitude and onset of hemodynamic responses. To overcome these challenges, researchers who are interested in neural responses to different trial types (e.g., correct and incorrect trials) and who want to prevent habituation and anticipation effects may consider adopting the event-related design. However, this design often involves longer experiments and increased boredom among participants when stimuli are presented slowly. To harness the advantages of this design and increase time efficiency, rapid event-related fNIRS with brain activity modeled using a canonical hemodynamic response function and its temporal derivative can be employed ( Heilbronner and Münte, 2013 ;Plichta et al., 2007 ). Also, the present review reveals that most previous studies used fixed durations for task and/or non-task periods. Jittering interblock and interstimulus intervals should therefore help minimize anticipatory effects.
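As a rough illustration of jittering, the sketch below draws each interstimulus (or interblock) interval from a uniform range instead of using a fixed duration, and derives the resulting onset times. The function names, the uniform distribution, and the example durations are all assumptions made for illustration; an actual study would choose them to suit its paradigm.

```python
import random


def jittered_intervals(n_trials, min_s, max_s, seed=None):
    """Draw one interval per trial uniformly between min_s and max_s (seconds)."""
    rng = random.Random(seed)  # seeded generator for a reproducible schedule
    return [rng.uniform(min_s, max_s) for _ in range(n_trials)]


def build_onsets(n_trials, stim_s, min_isi_s, max_isi_s, seed=None):
    """Return stimulus onset times (seconds) separated by jittered intervals."""
    onsets = []
    t = 0.0
    for isi in jittered_intervals(n_trials, min_isi_s, max_isi_s, seed):
        onsets.append(t)
        t += stim_s + isi  # next onset follows the stimulus plus a jittered gap
    return onsets
```

For instance, `build_onsets(20, 2.0, 4.0, 8.0, seed=1)` yields 20 onsets whose spacing varies between 6 and 10 s, so participants cannot anticipate the exact timing of the next stimulus.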
Most of the reviewed studies measured activity in multiple brain regions, with measurements often being confined to the frontal lobes since the forehead region offers better optode-scalp contact, and thus better signal quality, compared to posterior regions. While approximately 40% of the measurements covered frontal and non-frontal regions, only a few studies made regional comparisons. Moreover, while some evidence suggests that structural brain development follows inferior-to-superior and posterior-to-anterior gradients ( Colby et al., 2011 ;Gogtay et al., 2004 ;Krogsrud et al., 2016 ), the question of whether functional brain development follows similar patterns remains unresolved. Since executive, language, and socioemotional functions engage the frontal, parietal and/or temporal lobes as well as their interactions ( Houdé et al., 2010 ;McKenna et al., 2017 ;Pelphery and Carter, 2008 ), it would be interesting to determine whether there is a parallel between structural and functional brain development. While a comparison between regions is difficult for fNIRS due to differences in brain-scalp distance across regions ( Beauchamp et al., 2011 ), such a comparison could be possible if the differential pathlength factor is corrected for the region. Currently, such correction is only available for 5- to 11-year-olds ( Whiteman et al., 2017 ) and adults ( Zhao et al., 2002 ) using limited wavelengths; therefore, future work would benefit from determining the differential pathlength factor for different scalp regions across childhood and adolescence. Alternatively, effect size metrics can be employed, but whether the hemodynamic response function differs among brain regions remains to be determined.

Considerations for the preprocessing, analysis, and presentation of fNIRS data
The preprocessing of fNIRS data varied greatly across studies, and many studies did not report their preprocessing pipelines. Most studies relied on custom scripts, with a few reporting the use of open-source MATLAB toolboxes. Recently, Pinti et al. (2019) proposed a standardized procedure for fNIRS experimental design, data acquisition, data preprocessing, and statistical inference. To enhance the reproducibility of methods and results, fNIRS researchers are encouraged to follow this procedure.
Cardiac and respiratory activities, Mayer waves, other physiological noise, and motion are common sources of artifacts in fNIRS. If insufficiently treated, these artifacts reduce detection power and can lead to erroneous conclusions. While most studies have relied on trial and channel rejection as well as simple frequency filtering to attenuate artifacts, only a few have utilized more advanced algorithms to effectively remove or account for noise. Recent years have seen empirical comparisons among different motion correction and other artifact removal methods ( Brigadoi et al., 2014 ;Di Lorenzo et al., 2019 ;Hu et al., 2015 ;Huppert, 2016 ;Pinti et al., 2019 ). Based on the recent literature, the combination of wavelet filtering (and spline interpolation) for motion correction and bandpass filtering with a higher-order finite impulse response filter, with cutoffs determined by inspecting the frequency spectrum of fNIRS signals ( Pinti et al., 2019 ), constitutes an effective preprocessing pipeline under most circumstances. Notably, these algorithms have been implemented in some open-source toolboxes, including HomER2. Alternatively, fNIRS data can be left unfiltered and analyzed with the autoregressive iteratively reweighted least squares (AR-IRLS) method, which applies autoregressive prewhitening and robust regression to model noise structures in a general linear model ( Barker et al., 2013 ) and is implemented in the AnalyzIR toolbox ( Santosa et al., 2018 ).
While most fNIRS studies have used a task with multiple task conditions, only half of them addressed the effect of their factors of interest on brain function by contrasting one task condition with another. For the remaining half, investigations were based on the contrast between one task condition and a rest or non-task period. Furthermore, non-significant results might have prevented researchers from using the contrast between task conditions. Nonetheless, task conditions and the resting state differ in numerous unspecific ways (e.g., attention and motivation). To isolate the factor effects on brain activity related to a specific psychological process, future studies should contrast at least two task conditions differing only in the psychological process of interest. A parametric design that involves the manipulation of task difficulty and has been adopted by some recent studies ( Chevalier et al., 2019 ;Sagiv et al., 2019 ) is highly encouraged. For example, studies on verbal fluency development may contrast between a low set size condition (i.e., high demand), a high set size condition (i.e., low demand), and a repetition condition (i.e., no demand) to elucidate the functional brain development of word retrieval.
While most studies used parametric tests for statistical analysis, few have verified the normality of fNIRS variables. Since artifacts are common in fNIRS signals, deviant observations that distort the results of parametric tests may be present. Thus, future work would benefit from checking and reporting the normality of fNIRS variables. When normality is violated, data transformation, objective outlier removal (e.g., Grubbs' test and Cook's distance), or non-parametric tests can improve normality or account for any non-normal data. Also, some multichannel fNIRS studies applied multiple comparison correction when addressing the factor effects on activity in multiple brain regions. While a region of interest analysis does not warrant multiple comparison correction because a priori hypotheses are tested, such correction provides information regarding the strength of evidence: results that survive correction constitute stronger evidence than those that do not. Future studies are encouraged to apply multiple testing correction, preferably false-discovery-rate correction ( Singh and Dan, 2006 ). Results that do not survive correction can also be reported since they may warrant further investigation.
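A minimal sketch of false-discovery-rate correction across channels or regions follows. It implements the standard Benjamini-Hochberg step-up procedure, which is assumed here to be the variant intended; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests are significant at FDR level q.

    Benjamini-Hochberg step-up procedure: find the largest rank k such
    that the k-th smallest p-value is <= (k / m) * q, then reject the
    null hypothesis for the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```

Reporting both the uncorrected p-values and the output of such a procedure for each channel lets readers see which effects survive correction and which may merit follow-up.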
One-third of the reviewed studies analyzed both HbO and HbR, whereas most of the remaining studies analyzed HbO only. Studies involving young adults noted that HbO seemingly has a higher signal-to-noise ratio and correlates better with BOLD signals when compared to HbR ( Cui et al., 2011 ;Strangman et al., 2002 ), yet contradictory evidence exists ( Huppert et al., 2006 ). In line with these observations, the present review highlights that fNIRS studies with children and adolescents are generally more likely to yield significant results for HbO than for HbR and that the analysis and reporting of HbR do not appear to reflect reporting bias. Importantly, some studies have reported significant opposite effects for these two indices, which provides strong evidence for links between different factors and brain function. Since the relationship between changes in HbO and HbR provides invaluable insights into the origin of these changes (with negative relationships indicating neuronal activity and positive relationships reflecting systemic or artifactual activities; Cui et al., 2010 ), the analysis of both HbO and HbR constitutes an important quality control step for fNIRS. Future work would benefit from clarifying whether the correlation between changes in HbO and HbR in response to events is moderated by age and other factors.
Due to developmental differences in attention span ( Lin et al., 1999 ), younger children can concentrate on the task for shorter durations compared to older children. Thus, when testing younger children with fNIRS, it is important to keep each run of the task paradigm short and include longer breaks; otherwise, the results can be confounded by a lack of task engagement. In addition, younger children are less capable of sitting still for long periods of time compared to older children. Accordingly, younger children may exhibit more prominent motion artifacts, which hinder the estimation of brain activity. Unfortunately, few studies have examined the potential confound between age and channel signal quality, but there are several ways to check whether channel signal quality differs across age groups. First, the relationship between age and the correlation between changes in HbO and HbR can be examined ( Cui et al., 2010 ). However, this approach relies on the assumption that the coupling of changes in HbO and HbR is age-invariant, which is not necessarily true (Fabiani et al., 2014). Alternatively, the relationship between age and the coefficient of variation, calculated by dividing the standard deviation of raw light intensity values by the mean value (during rest), can be examined ( Piper et al., 2014 ). The relationship between age and the amount of data removed can also be examined.
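The signal-quality checks described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the use of the sample standard deviation is an assumption, and each participant is summarized by a single quality value before it is correlated with age.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coefficient_of_variation(raw_intensity):
    """Standard deviation of raw light intensity divided by its mean."""
    return stdev(raw_intensity) / mean(raw_intensity)


def hbo_hbr_coupling(delta_hbo, delta_hbr):
    """Correlation between HbO and HbR changes for one channel."""
    return pearson_r(delta_hbo, delta_hbr)


def quality_vs_age(ages, quality_values):
    """Correlate participants' ages with a per-participant quality metric."""
    return pearson_r(ages, quality_values)
```

A correlation from `quality_vs_age` that is clearly different from zero, whether the metric is the coefficient of variation, the HbO-HbR coupling, or the proportion of data removed, would suggest that age effects on fNIRS variables may partly reflect differences in signal quality.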

Where fNIRS stands in developmental neuroscience
The cost-effective, user-friendly, and motion-insensitive properties of fNIRS have made it a suitable tool to study brain dynamics in large groups of young children interacting with each other. Specifically, the present review reveals that 64 (81%) of the included studies involved children between 3 and 7 years of age and contributed to our understanding of brain functions in these children. These findings extend the pertinent fMRI work, which usually investigates older populations (aged at least 8 years) ( Houdé et al., 2010 ;McKenna et al., 2017 ). Additionally, numerous fNIRS studies have demonstrated how different environmental and personality factors affected brain functions in groups of children with sample sizes of over 100 ( Sugiura et al., 2015 ). Notably, the median sample size for individuals aged 3-17 years in these studies is 33, which contrasts with the typical sample size of less than 30 in fMRI studies ( McKenna et al., 2017 ). Furthermore, some fNIRS hyperscanning studies have generated evidence that sheds light on the neural basis of interpersonal interactions in children (particularly parent-child dyads) in ways that are infeasible with other neuroimaging modalities ( Nguyen et al., 2020 ). In summary, fNIRS studies have opened new lines of investigation into how brain functions develop early in life and the factors that influence brain development.
Like fNIRS, electroencephalography (EEG) is low-cost and suitable for a wide developmental range. Since EEG has better temporal resolution but poorer spatial resolution than fNIRS, the choice of tool depends on whether the research emphasizes the temporal dynamics or spatial localization of brain activity. Depending on the type of EEG/fNIRS systems and the number of electrodes/channels, the setup time can vary greatly. During EEG or fNIRS setup, the experimenters need to keep younger children engaged by talking to them. It can be difficult to get some children to wear the EEG or fNIRS cap. For these children, the experimenters should try to familiarize them with the equipment, such as by asking them to touch it or to imagine wearing a swim cap. In EEG research, both EEG power and coherence (level and functional coupling of electrical activity) can be obtained during a prolonged period of rest or cognitive events on the order of minutes. In fNIRS research, however, a task block is often limited to 20-30 s and at most 60 s. When blocks are too long, the hemodynamic response may become nonlinear (i.e., diminished after long stimulation periods; Wagner et al., 2005 ); slow drift (< 0.01 Hz) and stimulation frequencies may also overlap, which makes it difficult to remove the slow drifts without attenuating task-related signal changes. Also, longer trials are typically needed for fNIRS studies due to the sluggish hemodynamic response. Furthermore, fNIRS investigations into the resting state are often restricted to the study of functional connectivity or other network properties; while fNIRS measures vascular functional connectivity, EEG measures neuronal functional connectivity ( Wolf et al., 2011 ).
In conclusion, fNIRS has some limitations that developmental researchers must consider. This technique cannot measure hemodynamic changes in deep brain structures, does not provide neuroanatomical information, and has a lower signal-to-noise ratio and spatial resolution when compared to fMRI. Since both fNIRS and fMRI are based on the hemodynamic response, most fMRI paradigms are readily applicable for use with fNIRS. However, due to a higher noise level, more trials are needed for fNIRS studies under most circumstances. Also, evidence supporting the reliability and validity of fNIRS stems from studies of young adults, and the reliability and validity of this technique remain to be explicitly determined for children and adolescents. Furthermore, fNIRS has potential pitfalls related to data interpretation, and erroneous conclusions may be drawn when study and task designs as well as data preprocessing and analysis are inadequate. Notwithstanding these limitations and potential pitfalls, the unique properties of fNIRS make it a promising tool for understanding the neural basis of cognitive and socioemotional development throughout childhood and adolescence as well as identifying the individual differences and factors that shape these forms of development.

Declaration of Competing Interest
The author has no conflict of interest to disclose.

Acknowledgment
The author would like to thank Tsz L. Lee for his assistance with coding.