
Using Ordinary Least Squares in Higher Education Research: A Primer

Higher Education: Handbook of Theory and Research

Part of the book series: Higher Education: Handbook of Theory and Research (HATR, volume 39)


Abstract

This chapter serves as a primer on using ordinary least squares (OLS) in higher education research. It provides an overview of this commonly used quantitative approach, which includes simple linear regression models and multiple linear regression models. The first section of the chapter reviews current literature to identify the goals of OLS and differentiate them from those of basic descriptive analyses and bivariate analyses. It then discusses the types of research questions that may be answered by OLS. The second section walks readers through an example application of OLS using a real-world dataset, reviewing the definitions, key components, and analytic steps in using OLS. The third section addresses important considerations in testing statistical assumptions before applying OLS and the influence of assumption violations. The fourth section discusses the significance of considering heterogeneous effects in contemporary higher education. The chapter closes with topics related to interpreting findings, as well as the broader application of OLS in higher education research contexts.

Nicholas Hillman was the Associate Editor for this chapter.


Notes

  1.

    The statistical property of unbiasedness refers to whether “an estimator whose expected value of its sampling distribution equals the true value of the population parameter” (Ezell & Land, 2005, p. 943). When the sample estimate is neither an underestimate nor an overestimate of the unknown population parameter, it is unbiased.

  2.

    The public-use data is available at https://nces.ed.gov/datalab/. One major difference between the public-use data and restricted-use data is around how analytic weights are provided in this complex survey design. For example, variance estimation is provided through both Balanced Repeated Replication (BRR) and a Taylor series linearization, but only the BRR variance estimation method is supported for users of public-use data (Duprey et al., 2020).

  3.

    The dependent variable indicates the total tuition and fees charged at the primary institution during the first academic year in postsecondary education after high school completion or exit. This value accounted for students’ attendance status to reflect students’ number of months enrolled full- or part-time in a given academic year. The primary institution was identified based on transcript records with the earliest start date excluding summer enrollments immediately following high school completion/exit.

  4.

    Note that the hierarchical procedure, sometimes called the block-wise entry procedure, adds variables to or removes variables from a regression model in multiple steps. This is different from hierarchical linear modeling, which is used when data have a nested structure (e.g., class sections, departments, colleges).

  5.

    Note that percent refers to the rate of change, which is different from a change in percentage points. For example, suppose the baseline share of undergraduate enrollment that is Pell-eligible is 30%. If it increases by 10 percent in a given year, the new share is 30% + (30% × 10%) = 33%. If it increases by 10 percentage points in a given year, the new share is 30% + 10% = 40%.

  6.

    IPEDS defines race/ethnicity based on categories developed in 1997 by the Office of Management and Budget (OMB). These categories “describe groups to which individuals belong, identify with, or belong in the eyes of the community” (NCES, n.d.). In particular, individuals indicating their ethnicity as Hispanic/Latino are defined as “a person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race” (NCES, n.d.).
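
The definition of unbiasedness quoted in note 1 can be written compactly. The symbols below are chosen here for illustration and may differ from the chapter's own notation: $\theta$ is the unknown population parameter and $\hat{\theta}$ is its estimator.

```latex
% Bias is the gap between the estimator's expected value and the true parameter.
\operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta,
\qquad
\hat{\theta} \text{ is unbiased} \iff E[\hat{\theta}] = \theta
```

A positive bias corresponds to systematic overestimation and a negative bias to systematic underestimation, matching the wording of the note.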


Corresponding author

Correspondence to Xiaodan Hu.


Appendices

Appendix A: Data Preparation for Illustration Replication

The public-use HSLS:09 data can be found at https://nces.ed.gov/datalab/. Once the student-level dataset was downloaded and opened in Stata, the following commands were run to generate the "OLS example data.dta" file.

***sample selection***
*Generate sample with known primary first-year institution identified by 2017
. keep if X5PFYEAR > 0
(14,815 observations deleted)
*Exclude students who did not have any dual credit indicated in their postsecondary transcript
. drop if X5HSCRDERN <= 0
(6,954 observations deleted)
*Exclude students with no valid values of the tuition level of a primary first-year institution
. drop if X5PFYTUITION < 0
(1 observation deleted)

***variable selection***
**sociodemographic characteristics in 11th grade**
*sex*
. recode X2SEX (1=0) (2=1) (-9=.)
. label define X2SEXF 0 "Male" 1 "Female", replace
. label values X2SEX X2SEXF
*race/ethnicity*
. recode X2RACE (-9=.)
*family income*
. recode X2FAMINCOME (-8=.)
*parental education*
. recode X2PAREDU (-8=.)
*dependent indicator*
. gen dependent = 0
. replace dependent = 1 if P2DEPENDNUM>0

**students’ expectations of college and costs in 11th grade**
*educational expectations*
. replace X2STUEDEXPCT=. if X2STUEDEXPCT<0
*financial aid expectation*
. gen finaidexp = 0
. replace finaidexp=1 if S2QUALNEED==1| S2QUALACHIEVE==1
*the importance of cost of attendance when choosing a college*
. recode S2COSTATTEND (-9=.)(-8=.)(-7=.)

**academic performance**
*high school overall GPA (honor-weighted)*
. replace X3TGPAWGT=. if X3TGPAWGT < 0
*the number of AP/IB credits earned*
. recode X3TCREDAPIB (-9=.)(-8=.)(-7=.)(-6=.)(-1=.)

**high school characteristics**
*high school location*
. recode X3LOCALE (-9=.)(-8=.)(-7=.)(-1=.)
*high school control*
. recode X3CONTROL (-9=.)(-8=.)(-7=.)(-1=.) (3=2)

**enrollment intensity**
. recode X5PFYENRLSTAT (-9=.)

**Create a hypothetical unit var for independence testing because the public-use data suppressed school ID**
. egen SimuID = group(X3REGION X3CONTROL X3LOCALE X2SCHOOLCLI)

**Fill missing values with the variable's median value**
. fillmissing X2FAMINCOME X2PAREDU X3TGPAWGT X2STUEDEXPCT X3TCREDAPIB X3LOCALE X3CONTROL S2COSTATTEND X5PFYENRLSTAT SimuID, with(median)

**Keep only relevant variables for demonstration purposes**
. keep X5PFYTUITION X5HSCRDERN X2SEX X2RACE X2FAMINCOME X2PAREDU dependent S2COSTATTEND finaidexp X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB X3LOCALE X3CONTROL X5PFYENRLSTAT firstgen lowinc SimuID STU_ID

*Save file*
. save "OLS example data.dta", replace
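
For readers who do not use Stata, the two data-cleaning ideas in Appendix A (recoding negative reserve codes to missing, then filling missing values with the variable's median) can be sketched in Python. This is an illustrative sketch and not the chapter's code: the function names, the set of reserve codes, and the toy values are assumptions made for this example, and it is not guaranteed to reproduce the behavior of the `fillmissing` command in every case.

```python
from statistics import median

# Assumed set of negative reserve codes, modeled on the codes recoded in Appendix A.
# Which codes apply differs by variable in the actual data.
MISSING_CODES = {-9, -8, -7, -6, -1}


def recode_missing(values, missing_codes=MISSING_CODES):
    """Replace reserve codes with None, analogous to `recode var (-9=.)`."""
    return [None if v in missing_codes else v for v in values]


def fill_with_median(values):
    """Fill None entries with the median of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to compute a median from; return the column unchanged.
        return list(values)
    med = median(observed)
    return [med if v is None else v for v in values]


# Toy income-category column (made-up values, not HSLS data).
income = [3, -8, 5, 7, -9, 4]
cleaned = fill_with_median(recode_missing(income))
print(cleaned)
```

With an even number of observed values, the median can fall between two categories (here 4.5), which is worth keeping in mind when median-filling ordinal variables such as income brackets.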

Appendix B: HSLS Variables Used for the Illustrated Example

| Role / block | Variable | Type or coding | HSLS variable |
|---|---|---|---|
| Dependent variable | Tuition and fees charged at primary first-year institution (logged) | continuous | X5PFYTUITION |
| Independent variable | The number of known dual credits earned | discrete | X5HSCRDERN |
| Control: sociodemographic characteristics in 11th grade | Sex | 0 = Male; 1 = Female | X2SEX |
| Control: sociodemographic characteristics in 11th grade | Race/Ethnicity | 1 = White; 2 = Black/African American, non-Hispanic; 3 = Hispanic; 4 = Asian or Native Hawaiian/Pacific Islander, non-Hispanic; 5 = American Indian or Alaska Native, non-Hispanic; 6 = More than one race, non-Hispanic | X2RACE |
| Control: sociodemographic characteristics in 11th grade | Total family income from all sources | 1 = less than or equal to $15,000; 2 = > $15,000 and <= $35,000; 3 = > $35,000 and <= $55,000; 4 = > $55,000 and <= $75,000; 5 = > $75,000 and <= $95,000; 6 = > $95,000 and <= $115,000; 7 = > $115,000 and <= $135,000; 8 = > $135,000 and <= $155,000; 9 = > $155,000 and <= $175,000; 10 = > $175,000 and <= $195,000; 11 = > $195,000 and <= $215,000; 12 = > $215,000 and <= $235,000; 13 = above $235,000 | X2FAMINCOME |
| Control: sociodemographic characteristics in 11th grade | Parents’/guardians’ highest level of education | 1 = No postsecondary degree; 2 = Associate’s degree; 3 = Bachelor’s degree; 4 = Master’s degree; 5 = Ph.D./M.D./law/other high-level professional degree | X2PAREDU |
| Control: sociodemographic characteristics in 11th grade | Number of dependents on respondent’s parents | discrete | P2DEPENDNUM |
| Control: students’ expectations of college and associated costs in 11th grade | How far in school sample member thinks the respondent will get | 1 = No postsecondary degree or don’t know; 2 = Associate’s degree attempt or attainment; 3 = Bachelor’s degree attempt or attainment; 4 = Master’s degree attempt or attainment; 5 = Ph.D./M.D./law degree/high-level professional degree attempt or attainment | X2STUEDEXPCT |
| Control: students’ expectations of college and associated costs in 11th grade | Whether students expect to qualify for financial aid based on financial need or academic achievement | 0 = No; 1 = Yes | S2QUALNEED, S2QUALACHIEVE |
| Control: students’ expectations of college and associated costs in 11th grade | Importance of cost of attendance when choosing college/school | 1 = Very important; 2 = Somewhat important; 3 = Not at all important | S2COSTATTEND |
| Control: academic performance in 12th grade | Overall GPA, honors-weighted | discrete | X3TGPAWGT |
| Control: academic performance in 12th grade | Credits earned in AP/IB combined | discrete | X3TCREDAPIB |
| Control: high school characteristics | Location | 1 = Urban; 2 = Suburban; 3 = Town; 4 = Rural | X3LOCALE |
| Control: high school characteristics | Control | 1 = Public; 2 = Catholic or other private | X3CONTROL |
| Control: enrollment status in first year of college | Students’ enrollment intensity status | 1 = Exclusively full time; 2 = Exclusively part time; 3 = Mixed full time and part time | X5PFYENRLSTAT |

Appendix C: Diagnosis and Data Recoding to Address Assumption Violations

. use "OLS example data.dta", clear

**Stata code used in Section 2**
*Generate descriptive summary of the dependent and focal independent variables only (Table 1)
. sum X5PFYTUITION X5HSCRDERN, detail
*Run simple regression (Table 2 & Table 3)
. regress X5PFYTUITION X5HSCRDERN
*Generate scatterplot with CI (Figure 1a)
. twoway (scatter X5PFYTUITION X5HSCRDERN) (lfitci X5PFYTUITION X5HSCRDERN, lcolor(black) color(%50)) ,legend(order(2 "95% CI" 3 "Fitted values")) scheme(s1mono) xtitle("The number of dual credits earned in high school") ytitle("Tuition and Fees Charged", size(small))
*Use hierarchical procedures to select control variables (Table 4)
. nestreg: regress X5PFYTUITION X5HSCRDERN (i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent) (i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT) (X3TGPAWGT X3TCREDAPIB) (i.X3LOCALE i.X3CONTROL) (i.X5PFYENRLSTAT)
*Test omitted variable
. regress X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.X5PFYENRLSTAT
. linktest
. estat ovtest
*Test overfitting
. overfit: regress X5PFYTUITION
. overfit: regress X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.X5PFYENRLSTAT
*Generate descriptive summary of all variables
. sum X5PFYTUITION X5HSCRDERN X2SEX X2RACE X2FAMINCOME X2PAREDU dependent S2COSTATTEND finaidexp X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB X3LOCALE X3CONTROL X5PFYENRLSTAT, detail
*Generate scatter plot matrix (results omitted)
. graph matrix X5PFYTUITION X5HSCRDERN X2SEX X2RACE X2FAMINCOME X2PAREDU dependent S2COSTATTEND finaidexp X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB X3LOCALE X3CONTROL X5PFYENRLSTAT
*Calculate the correlation matrix [Option 1]
. correlate X5PFYTUITION X5HSCRDERN X2SEX X2RACE X2FAMINCOME X2PAREDU dependent S2COSTATTEND finaidexp X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB X3LOCALE X3CONTROL X5PFYENRLSTAT
*Calculate the correlation matrix [Option 2] (Table 5)
. pwcorr X5PFYTUITION X5HSCRDERN X2SEX X2RACE X2FAMINCOME X2PAREDU dependent S2COSTATTEND finaidexp X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB X3LOCALE X3CONTROL X5PFYENRLSTAT, star(.05) bonferroni

**Stata code used in Section 3**
**1. multicollinearity**
*Perform multiple linear regression model
. regress X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent X3TGPAWGT i.X2STUEDEXPCT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.S2COSTATTEND i.finaidexp i.X5PFYENRLSTAT
*Calculate VIF for each independent variable
. vif
*Recode race*
. recode X2RACE(8=1)(3=2)(4=3)(5=3)(2=4)(7=4)(1=5)(6=6)
. label define X2RACEF 1 "White, non-Hispanic" 2 "Black, non-Hispanic" 3 "Hispanic" 4 "AAPI, non-Hispanic" 5 " Amer. Indian/Alaska Native, non-Hispanic" 6 "More than one race, non-Hispanic", replace
. label values X2RACE X2RACEF
*Recode parent education level*
. recode X2PAREDU (2=1)(3=1)(4=2)(5=3)(6=4)(7=5)
. label define X2PAREDUF 1 "No postsecondary degree" 2 "AA" 3 "BA" 4 "Master" 5 "Doctoral or Prof", replace
. label values X2PAREDU X2PAREDUF
*Recode education aspiration*
. recode X2STUEDEXPCT (2=1)(3=1)(4=1)(5=2)(6=2)(7=3)(8=3) (9=4)(10=4)(11=5)(12=5)(13=1)
. label define X2EDEXPF 1 "No postsecondary degree or don't know" 2 "AA attempt or attainment" 3 "BA attempt or attainment" 4 "Master attempt or attainment" 5 "Doctoral or Prof attempt or attainment", replace
. label values X2STUEDEXPCT X2EDEXPF
*Rerun regression model and calculate VIF (code omitted)

**2. linearity**
*Perform revised multiple linear regression model
. regress X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent X3TGPAWGT i.X2STUEDEXPCT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.S2COSTATTEND i.finaidexp i.X5PFYENRLSTAT
*Generate standardized residuals
. predict r, resid
*Plot standardized residuals against the predictor variables
. scatter r X5HSCRDERN, scheme(s1mono)
**Plot augmented partial residuals with lowess smoothed line (Figure 3a)
. acprplot X5HSCRDERN, scheme(s1mono) lowess
*Plot the pattern to the residual against the fitted (predicted) values with a reference line at y=0
. rvfplot, yline(0) scheme(s1mono)
*Conduct numerical tests: Breusch-Pagan / Cook-Weisberg test for heteroskedasticity (the Breusch-Pagan test)
. estat hettest
*Drop standardized residuals for subsequent analyses
. drop r
*Transform dependent variable with natural log
. gen log_X5PFYTUITION = ln(X5PFYTUITION)
*Perform revised multiple linear regression model
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent X3TGPAWGT i.X2STUEDEXPCT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.S2COSTATTEND i.finaidexp i.X5PFYENRLSTAT
*Generate standardized residuals
. predict r, resid
*Plot standardized residuals against the predictor variables
. scatter r X5HSCRDERN, scheme(s1mono)
**Plot augmented partial residuals with lowess smoothed line (Figure 3b)
. acprplot X5HSCRDERN, scheme(s1mono) lowess
**Plot augmented partial residuals with lowess smoothed line for other continuous/discrete variables
. scatter r X3TGPAWGT, scheme(s1mono)
. acprplot X3TGPAWGT, scheme(s1mono) lowess
. scatter r X3TCREDAPIB, scheme(s1mono)
. acprplot X3TCREDAPIB, scheme(s1mono) lowess

**3. Normality**
*Plot a kernel density plot to be overlaid on a normal density plot (Figure 4)
. kdensity r, normal scheme(s1mono)
*Plot a standardized normal probability (p-p) plot (Figure 5)
. pnorm r, scheme(s1mono)
*Plot the quantiles of a var against the quantiles of a normal distribution (Figure 6a)
. qnorm r, scheme(s1mono)
*Conduct numerical tests: perform the shapiro-wilk w test for normality with 4<=n<=2000 observations
. swilk r
*Conduct numerical tests for inter-quartile range and symmetric distribution
. iqr r
*Drop standardized residuals for subsequent analyses
. drop r

**Address influential observations**
*Perform revised multiple linear regression model
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent X3TGPAWGT i.X2STUEDEXPCT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.S2COSTATTEND i.finaidexp i.X5PFYENRLSTAT
*Predict the studentized residual
. predict r, rstudent
*Display stem-and-leaf plots
. stem r
*Drop influential observations
. drop if abs(r) > 3
*Re-plot the quantiles of a var against the quantiles of a normal distribution after excluding influential observations (Figure 6b)
. qnorm r, scheme(s1mono)
*Drop studentized residuals for subsequent analyses
. drop r

**4. equal variance (homoscedasticity)**
*Perform revised multiple linear regression model
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent X3TGPAWGT i.X2STUEDEXPCT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.S2COSTATTEND i.finaidexp i.X5PFYENRLSTAT
*Plot the pattern to the residual against the fitted (predicted) values with a reference line at y=0 (Figure 7b)
. rvfplot, yline(0) scheme(s1mono)
*Conduct numerical tests: Breusch-Pagan / Cook-Weisberg test for heteroskedasticity (the Breusch-Pagan test)
. estat hettest

**5. independence**
*Generate standardized residuals
. predict r, resid
*Plot the standardized residuals against the unit variable (Figure 8)
. scatter r SimuID, scheme(s1mono) xtitle(SimuID)

**Save file**
. save "OLS example data final.dta", replace
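
As a reminder of what the `vif` step in Appendix C reports, the variance inflation factor for predictor $j$ can be written as follows. The notation is an assumption made here for illustration: $R_j^2$ denotes the R-squared from regressing predictor $j$ on all the other predictors in the model.

```latex
% Variance inflation factor for predictor j.
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

When predictor $j$ is uncorrelated with the other predictors, $R_j^2 = 0$ and $\mathrm{VIF}_j = 1$. As $R_j^2$ approaches 1, the factor grows without bound, which signals that the coefficient's variance is inflated by multicollinearity.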

Appendix D: Full Model Specification for Results Interpretation

. use "OLS example data final.dta", clear

**Test interaction effect between the focal independent variable and institutional control**
. regress log_X5PFYTUITION c.X5HSCRDERN##i.X3CONTROL i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent X3TGPAWGT i.X2STUEDEXPCT X3TCREDAPIB i.X3LOCALE i.S2COSTATTEND i.finaidexp i.X5PFYENRLSTAT
*Plot interaction effect of c.X5HSCRDERN*i.X3CONTROL (Figure 9a)
. margins X3CONTROL, at (X5HSCRDERN = (1(1)21))
. marginsplot, scheme(s1mono) yscale(range(8.4 9.2))

**Test interaction effect between the focal independent variable and Hispanic race**
. gen Hispanic=0
. replace Hispanic=1 if X2RACE==3
. regress log_X5PFYTUITION c.X5HSCRDERN i.X2RACE c.X5HSCRDERN#i.Hispanic i.X2SEX i.X2FAMINCOME i.X2PAREDU i.dependent X3TGPAWGT i.X2STUEDEXPCT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.S2COSTATTEND i.finaidexp i.X5PFYENRLSTAT
*Plot interaction effect of c.X5HSCRDERN*i.Hispanic (Figure 9b)
. margins Hispanic, at (X5HSCRDERN = (1(1)21))
. marginsplot, scheme(s1mono) yscale(range(8.4 9.2))
*Generate coefficients for Hispanic and non-Hispanic group
. margins, dydx(X5HSCRDERN) at(Hispanic=(0 1))

***Subgroup analysis***
*Generate a scatterplot with 95% CI for non-Hispanic Students Subgroup (Figure 10a)
. twoway (lfitci log_X5PFYTUITION X5HSCRDERN if Hispanic==0) (scatter log_X5PFYTUITION X5HSCRDERN if Hispanic==0, msymbol(o)), legend(order(1 "95% CI" 2 "Fitted values" 3 "non-Hispanic Students")) xtitle("The number of dual credits earned in high school") ytitle("Tuition and Fees Charged", size(small)) graphregion(color(white)) scheme(s1mono)
*Generate a scatterplot with 95% CI for Hispanic Students Subgroup (Figure 10b)
. twoway (lfitci log_X5PFYTUITION X5HSCRDERN if Hispanic==1) (scatter log_X5PFYTUITION X5HSCRDERN if Hispanic==1, msymbol(o)), legend(order(1 "95% CI" 2 "Fitted values" 3 "Hispanic Students")) xtitle("The number of dual credits earned in high school") ytitle("Tuition and Fees Charged", size(small)) graphregion(color(white)) scheme(s1mono)
*Run multiple linear regression for the non-Hispanic student subgroup
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.X5PFYENRLSTAT if Hispanic==0
. est store nonHispanicStudents
*Run multiple linear regression for the Hispanic student subgroup
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.X5PFYENRLSTAT if Hispanic==1
. est store HispanicStudents
*Compare regression coefficients across groups
. suest nonHispanicStudents HispanicStudents
. test [nonHispanicStudents_mean]X5HSCRDERN = [HispanicStudents_mean]X5HSCRDERN
*Generate output tables with user-written command outreg2 (Table 6)
. outreg2 [nonHispanicStudents HispanicStudents] using regsub_output, stats(coef se) alpha(0.001, 0.01, 0.05) asterisk(coef) dec(3) adjr2 replace
. seeout

**Final Model Specifications**
*Include the dependent variable and the focal independent variable
. regress log_X5PFYTUITION X5HSCRDERN
. est store Model1
*Add the block of sociodemographic variables
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent
. est store Model2
*Add the block of cost expectation variables
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT
. est store Model3
*Add the block of academic performance variables
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB
. est store Model4
*Add the block of high school characteristics variables
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL
. est store Model5
*Add the block of enrollment intensity variables
. regress log_X5PFYTUITION X5HSCRDERN i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.X5PFYENRLSTAT
. est store Model6
*Add the interaction term (Table 7)
. regress log_X5PFYTUITION X5HSCRDERN c.X5HSCRDERN#i.Hispanic i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.X5PFYENRLSTAT
. est store Model7
*Calculate robust standard errors
. regress log_X5PFYTUITION X5HSCRDERN c.X5HSCRDERN#i.Hispanic i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.X5PFYENRLSTAT, vce(robust)
. est store Model8
*Generate output tables with user-written command outreg2 (Table 8)
. outreg2 [Model1 Model2 Model3 Model4 Model5 Model6 Model7 Model8] using reg_output, stats(coef se) alpha(0.001, 0.01, 0.05) asterisk(coef) dec(3) adjr2 replace
. seeout
*Generate standardized beta
. regress log_X5PFYTUITION X5HSCRDERN c.X5HSCRDERN#i.Hispanic i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.X5PFYENRLSTAT, beta
*Generate predicted values holding covariates at their means
. regress log_X5PFYTUITION X5HSCRDERN c.X5HSCRDERN#i.Hispanic i.X2SEX i.X2RACE i.X2FAMINCOME i.X2PAREDU i.dependent i.S2COSTATTEND i.finaidexp i.X2STUEDEXPCT X3TGPAWGT X3TCREDAPIB i.X3LOCALE i.X3CONTROL i.X5PFYENRLSTAT
. margins, at(X5HSCRDERN =(3 6 9 12 15 18 30)) atmeans
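
The interaction specification with a logged outcome in Appendix D can be summarized in equation form. This is a sketch under assumed notation, not the chapter's own presentation: $Y_i$ is tuition and fees, $D_i$ is the number of dual credits (X5HSCRDERN), $H_i$ is the Hispanic indicator, and $\mathbf{x}_i$ collects the remaining covariates, including the race/ethnicity categories that absorb the main effect of $H_i$.

```latex
% Log-linear model with an interaction between dual credits and the Hispanic indicator.
\ln(Y_i) = \beta_0 + \beta_1 D_i + \beta_3 (D_i \times H_i) + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + \varepsilon_i

% Marginal effect of one additional dual credit on log tuition, by group.
\frac{\partial \ln(Y_i)}{\partial D_i} = \beta_1 + \beta_3 H_i

% Implied percent change in tuition for one additional dual credit.
\%\Delta Y = 100 \times \left[ \exp(\beta_1 + \beta_3 H_i) - 1 \right]
```

Under this reading, $\beta_1$ is the slope for non-Hispanic students and $\beta_1 + \beta_3$ is the slope for Hispanic students, which corresponds to what `margins, dydx(X5HSCRDERN) at(Hispanic=(0 1))` is used to report. For small coefficients, $100 \times \beta$ is a close approximation to the exact percent change.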


Copyright information

© 2024 Springer Nature Switzerland AG


Cite this entry

Hu, X. (2024). Using Ordinary Least Squares in Higher Education Research: A Primer. In: Perna, L.W. (eds) Higher Education: Handbook of Theory and Research. Higher Education: Handbook of Theory and Research, vol 39. Springer, Cham. https://doi.org/10.1007/978-3-031-38077-8_13
