A Meta-Regression of Racial Disparities in Wellbeing Outcomes During and After Foster Care

Children in foster care face heightened risk of adverse psychosocial and economic outcomes compared with children in the general population. Yet the effects of foster care as an intervention are heterogeneous. Heterogeneous outcomes by race and ethnicity are of particular interest, given that Black and Indigenous youth experience foster care at higher rates than other racial/ethnic groups and experience group differences in setting, duration, and exits to permanency. This meta-regression explores racial disparities in education, employment, mental health, and behavioral outcomes during and following foster care. A systematic search of PsycINFO, ERIC, and Academic Search Complete using a series of search terms for studies published between January 2000 and June 2021 identified 70 articles and 392 effect sizes that provided outcomes of US-based foster care by race/ethnicity. Findings reveal that Black foster care-impacted persons (FCIPs) have 20% lower odds (95% CI: .68–.93) of achieving employment or substantial financial earnings and 18% lower odds (95% CI: .68–1.00) of mental health concerns compared to White FCIPs. Hispanic FCIPs have 10% lower odds (95% CI: .84–.97) of achieving stable housing compared to non-Hispanic FCIPs. Moderator analyses revealed that certain study features (i.e., publication type, timing of the study, location of the study, and placement status of the participants) significantly affect the gaps between Black and non-Black FCIPs and between Hispanic and non-Hispanic FCIPs. The findings have important implications for racial disparities in foster care outcomes and highlight important gaps and missing information in published studies.


Search Terms
To locate all studies that addressed outcomes of foster care by race/ethnicity, the lead author conducted a systematic search of PsycINFO, ERIC, and Academic Search Complete using a series of search term combinations. These search terms addressed the type of placement, including "foster care," "kinship care," "out of home care," "congregate care," "foster youth," "foster child*," and "ag* out." These terms were separated by "OR" and were combined, using "AND," with terms related to specific types of outcomes. Outcome terms included "behavior* problem" or "externalizing" or "internalizing," "mental health" or "depress*," "crim*" or "delinq*," "educat*" or "academic achievement," "housing" or "homeless" or "housing stability," "earning*," "employ*," "pregnan*" or "teen parent" or "teen moth*," and "drug use" or "substance use" or "substance abuse" or "drug abuse." Finally, the term "outcome" was included in all search phrases with "AND."
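The term combinations described above can be sketched in a few lines of Python. This is only an illustration: the helper names (`or_clause`, `build_queries`) are hypothetical, the exact query syntax accepted by each database is not given in the text, and treating "earning*" and "employ*" as separate outcome groups is an assumption based on how the terms are listed.

```python
# Placement terms, joined with OR (taken from the search description above).
PLACEMENT_TERMS = [
    "foster care", "kinship care", "out of home care", "congregate care",
    "foster youth", "foster child*", "ag* out",
]

# Outcome term groups; terms within a group are joined with OR.
# Treating "earning*" and "employ*" as separate groups is an assumption.
OUTCOME_GROUPS = [
    ["behavior* problem", "externalizing", "internalizing"],
    ["mental health", "depress*"],
    ["crim*", "delinq*"],
    ["educat*", "academic achievement"],
    ["housing", "homeless", "housing stability"],
    ["earning*"],
    ["employ*"],
    ["pregnan*", "teen parent", "teen moth*"],
    ["drug use", "substance use", "substance abuse", "drug abuse"],
]


def or_clause(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    quoted = ['"' + term + '"' for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_queries(placement_terms, outcome_groups):
    """Return one Boolean search string per outcome group (hypothetical helper)."""
    placement = or_clause(placement_terms)
    return [
        placement + " AND " + or_clause(group) + ' AND "outcome"'
        for group in outcome_groups
    ]


queries = build_queries(PLACEMENT_TERMS, OUTCOME_GROUPS)
for query in queries:
    print(query)
```

Each printed string pairs the full placement clause with one outcome clause and the required term "outcome", which mirrors the AND/OR structure described in the paragraph.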

Exclusions during study selection
Two hundred and thirty-five studies were excluded during full-text review. The most common reasons for exclusion during full-text review (only the primary reason for exclusion was recorded) were: not including outcomes by race (n = 124), not focusing on the general population of care-as-usual (n = 55), having a study design that did not focus on the analysis of outcomes (n = 13), and focusing on outcomes that were not the focus of this review (n = 12). Other reasons for full-text exclusion, each accounting for fewer than 10 studies, were: not enough information to calculate an effect size (n = 6), dissertation duplicate (n = 7), non-US (n = 6), data duplicate with missing values or poor fit (n = 4), only one racial/ethnic group (n = 3), missed duplicate (n = 1), wrong publication type (n = 1), and policy brief duplicate (n = 1). After full-text review, 87 studies met the inclusion criteria. During data extraction, 16 additional studies were excluded. The most common reason for exclusion during data extraction was not providing enough information to calculate an effect size (n = 12), followed by wrong population (n = 2) and not enough information about the racial groups (n = 2).

Coding
Inter-rater Agreement. To develop inter-rater agreement, coders completed ten studies, then compared the extracted data. This process was repeated three times before proceeding with the full sample. The overall inter-rater agreement for the whole sample was 79% (including early rounds).
Adjustments to data. If a study reported effect sizes separated by relevant populations (e.g., heterosexual and homosexual), the two effect sizes were combined and averaged (Rosenthal, 1991; Tsaousis, 2016). This was done for three studies (Chapman et al., 2014; Hook & Courtney, 2011; Jewell et al., 2010). For missing data, we followed the steps outlined by Pigott and Polanin (2020): we inferred from the study where possible and contacted authors if necessary. We contacted two authors who included all of the necessary information for the effect size but failed to define what race meant in their models (Huffhines et al., 2020; Strong-Blakeney, 2013); we heard back from one of the authors, and the study was included in the analysis. For moderators, we chose to leave values as missing if the information was not available. Finally, we focused on outcomes that fell within seven different domains (sexual health, mental health, homelessness, high-risk behavior, education, employment/earnings, criminal behavior). To do this, we combined multiple indicators/variables that assessed similar concepts falling within each broader domain, which is similar to the approach taken by other published meta-analyses (Dam et al., 2018; DuBois et al., 2011; Eby et al., 2013).
We originally had an additional domain, employment/education, which included effect sizes assessing educational achievement or employment status. This domain had only seven effect sizes, which came from two studies. These two studies (Cheatham et al., 2020; Rosenberg & Kim, 2018) had other effect sizes in the education domain. We decided to drop the employment/education domain because it was too small for analysis by racial comparison group and the studies were already represented in the education domain. These effect sizes, however, are included in the overall positive outcome analysis.
When a study had an outcome scored in the opposite direction from the others within its domain (e.g., Greeson (2009) measured material hardship, with a higher value representing more material hardship, while the other measures in the earnings/employment domain had higher values representing more positive outcomes, such as being employed and amount of earnings), the standardized mean difference (SMD), if available, was multiplied by -1, and the odds ratio was then calculated from the new SMD. This was also done with odds ratios: for two studies, we inverted the odds ratios so that the scale was consistent with the others in the domain (Hill, 2010; Shpiegel & Cascardi, 2018). The domains were then grouped into positive outcomes (education, employment/earnings) and negative outcomes (criminal behavior, high-risk behavior, homelessness, mental health difficulties, and high-risk sexual behavior).
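As a minimal sketch of the direction reversal described above, the snippet below flips the sign of an SMD and converts it to an odds ratio. The text does not state which SMD-to-odds-ratio conversion was used; the logistic conversion ln(OR) = d·π/√3 is assumed here, and the function name and input values are purely illustrative.

```python
import math

# Assumed conversion factor: ln(OR) = d * pi / sqrt(3) (logistic approximation).
LOGIT_SCALE = math.pi / math.sqrt(3)


def smd_to_odds_ratio(d, reverse_scored=False):
    """Convert a standardized mean difference to an odds ratio.

    If the outcome is scored in the opposite direction from the rest of its
    domain, the SMD is multiplied by -1 before conversion.
    """
    if reverse_scored:
        d = -d
    return math.exp(d * LOGIT_SCALE)


# Illustrative value only, not taken from any included study.
d_hardship = 0.30
or_as_reported = smd_to_odds_ratio(d_hardship)
or_realigned = smd_to_odds_ratio(d_hardship, reverse_scored=True)

print(round(or_as_reported, 3), round(or_realigned, 3))
```

Under this conversion, flipping the sign of the SMD gives the reciprocal of the odds ratio, so reversing an SMD and inverting an odds ratio are the same operation on two different scales.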
We aimed to make White the reference group for all of our race/ethnicity comparisons. For studies that did not include White or similar as the reference group, we inverted the odds ratios to move White or similar (e.g., non-Black or non-Hispanic) to the reference group. This was done for eight different studies (Calix, 2009; Greeson, 2009; Milum, 2011; Orgel, 2007; Shin, 2003; Shpiegel & Simmel, 2016; Somers et al., 2020; Zima et al., 2000). For studies that were extracted as frequencies or binary proportions, we reversed the direction of the scale to put White or similar as the reference group. This was done for five studies (Garcia et al., 2012; Harris et al., 2009, 2010; O'Brien et al., 2010; Villegas et al., 2014).
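The reference-group change can be illustrated as follows. This is a sketch with made-up numbers, and the function names are hypothetical. Inverting an odds ratio also swaps and inverts its confidence limits, and for frequency data the same result follows from choosing which group goes in the numerator.

```python
def swap_reference_group(odds_ratio, ci_lower, ci_upper):
    """Invert an odds ratio and its confidence interval.

    The old upper limit becomes the new lower limit, and vice versa.
    """
    return 1.0 / odds_ratio, 1.0 / ci_upper, 1.0 / ci_lower


def odds_ratio_from_counts(focal_events, focal_non_events,
                           ref_events, ref_non_events):
    """Odds ratio for the focal group relative to the reference group."""
    focal_odds = focal_events / focal_non_events
    ref_odds = ref_events / ref_non_events
    return focal_odds / ref_odds


# Illustrative numbers only: a study reporting White vs Black (Black as reference).
reported = (2.0, 1.25, 4.0)
realigned = swap_reference_group(*reported)
print(realigned)  # Black vs White, with White as the reference group

# Illustrative 2x2 counts: the same table read in both directions.
white_vs_black = odds_ratio_from_counts(40, 60, 30, 70)
black_vs_white = odds_ratio_from_counts(30, 70, 40, 60)
print(round(white_vs_black, 3), round(black_vs_white, 3))
```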

Sensitivity Tests
For the first sensitivity test, we dropped 20 effect sizes that were extremely highly correlated with others from the same study. For example, in Harris et al. (2009), one effect size (ES) assesses completing high school with a diploma or GED, while two additional ES assess completing high school with a diploma and completing high school with a GED, respectively. For the sensitivity analyses, we dropped the variable that included both GED and diploma. Other dropped ES looked at Total Behavior Problems (while the individual problem behaviors were assessed in other ES), public assistance receipt (we dropped all but current public assistance), and amount above the poverty line (we dropped 3x over the poverty line). The other sensitivity test involved trying different values for the assumed within-study effect size correlation (rho). Our primary analyses used rho = .08; for the sensitivity tests, we ran all analyses with rho = .07 and rho = .09. These alternative values made little to no difference to the estimates.
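To give a rough sense of why small changes in rho matter so little, the sketch below computes the variance of the unweighted mean of several effect sizes from one study under a common assumed correlation. This is not the estimator used in the analyses, which is not specified in this section; the function name and the sampling variances are illustrative assumptions.

```python
import math


def variance_of_study_mean(variances, rho):
    """Variance of the unweighted mean of correlated effect sizes.

    Assumes every pair of effect sizes within the study has correlation rho:
    Var(mean) = (sum_i v_i + sum_{i != j} rho * sqrt(v_i * v_j)) / k**2
    """
    k = len(variances)
    total = sum(variances)
    for i in range(k):
        for j in range(k):
            if i != j:
                total += rho * math.sqrt(variances[i] * variances[j])
    return total / k ** 2


# Illustrative sampling variances for three effect sizes from one study.
example_variances = [0.04, 0.05, 0.06]

for rho in (0.07, 0.08, 0.09):
    print(rho, round(variance_of_study_mean(example_variances, rho), 5))
```

With values this close together, the study-level variance changes only slightly across rho = .07, .08, and .09, which is consistent with the small differences reported above.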

Publication Bias
In addition to analyzing the full sample, we produced funnel plots and ran Egger regression tests for each racial comparison/domain combination because we anticipated high levels of heterogeneity within the whole sample. Egger tests were conducted when the models had ten or more effect sizes (Borenstein et al., 2009). For the whole sample, visual inspection of the funnel plot revealed a small amount of asymmetry, and the Egger regression indicated the presence of small-study effects (β1 = -.52, SE = .175, p < .01). We suspect that much of this is due to heterogeneity within the dataset (I² = 84.62), especially given our focus on different racial comparisons within different domains. Thus, we produced funnel plots and ran Egger regression tests for small-study effects for each racial comparison within each domain to see if we could parse out the source of the asymmetry. We found no evidence of small-study effects in the majority of the racial group comparisons within each domain. However, we found evidence of small-study effects for the Black vs Non-Black comparison in the education and earnings/employment domains, and thus in the positive outcomes as well. Relatedly, Black vs White had a p < .05 value in the employment/earnings domain (the Black vs White comparison makes up most of the Black vs Non-Black comparison, so this was expected). Hispanic vs White was significant at the p < .05 level for the positive outcomes and the high-risk behavior domain. Due to issues related to the file drawer effect (Rosenthal, 1979), we chose to include publication type (peer review vs dissertation) as a moderator in a second round of Egger regression tests. We found that when publication type was included in the model, the evidence of small-study effects disappeared. This suggests that while there was some evidence of small-study effects in our sample, it was largely tied to the publication type of the study. Finally, we applied the trim and fill method, which resulted in no imputations, suggesting low asymmetry. However, due to high degrees of heterogeneity in the sample, the trim and fill findings need to be interpreted with caution.
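For readers unfamiliar with the Egger test, the following is a minimal sketch of its classical form: the standardized effect (effect divided by its standard error) is regressed on precision (one over the standard error), and an intercept far from zero signals funnel plot asymmetry. It is a simplified stand-in, not the model fitted in the analyses, which may have used a different parameterization and included moderators. The function name, the `min_k` argument, and the data are hypothetical; a p-value would additionally require a t distribution, which is omitted here.

```python
import math


def egger_test(effects, std_errors, min_k=10):
    """Classical Egger regression via ordinary least squares.

    Regresses effect/SE on 1/SE and returns the intercept (asymmetry term),
    its standard error, the t statistic, and the degrees of freedom.
    """
    n = len(effects)
    if n < min_k:
        raise ValueError("too few effect sizes for an Egger test")
    x = [1.0 / se for se in std_errors]                  # precision
    z = [y / se for y, se in zip(effects, std_errors)]   # standardized effect
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    df = n - 2
    resid_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt((resid_ss / df) * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return {"intercept": intercept, "se": se_intercept, "t": t_stat, "df": df}


# Made-up log odds ratios and standard errors, for illustration only.
log_ors = [-0.40, -0.10, -0.35, 0.05, -0.25, -0.60, -0.15, -0.30, 0.10, -0.45, -0.20, -0.05]
ses = [0.30, 0.12, 0.25, 0.10, 0.20, 0.35, 0.15, 0.22, 0.08, 0.28, 0.18, 0.11]

print(egger_test(log_ors, ses))
```

The `min_k=10` default mirrors the rule above that Egger tests were run only for models with ten or more effect sizes.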

Appendix D
Full Results for Bivariate Meta-Regressions

Table 1.
Coding Scheme: All Variables Except Domain

Table 1.
Bivariate Meta-Regressions with Covariates: Black vs Non-Black Comparisons
a Degrees of freedom < 4; the significance level for df < 4 is p < .001.
b Negative domains include: high risk behavior, mental health concerns, justice system involvement, homelessness, sexual behavior.
c Positive domains include: education, education/employment, employment/earnings.
d Black vs Non-Black combines all racial comparisons that looked at Black vs another racial group; this includes Black vs White.

Table 2.
Bivariate Meta-Regressions with Covariates: Hispanic vs Non-Hispanic Comparisons
a Degrees of freedom < 4; the significance level for df < 4 is p < .001.
b Negative domains include: high risk behavior, mental health concerns, justice system involvement, homelessness, sexual behavior.
c Positive domains include: education, education/employment, employment/earnings.
d Hispanic vs Non-Hispanic combines all racial comparisons that looked at Hispanic vs another racial group; this includes Hispanic vs White.