Stereotypical Descriptions of Computer Science Careers Are Not Representative of Most Computer Scientists

Using data from a large self-initiated online survey, we find that the career interests of many current and aspiring computer scientists diverge from the official profile of computer scientists established and promoted by the U.S. government, specifically that from the Department of Labor’s Occupational Information Network (O*NET). Five distinct profiles of career interests emerged from the data. Latent profile analysis suggests that many women in the profession value social and artistic expression in a way not currently recognized by official representations of computer scientists’ interests. Acknowledging a more nuanced and comprehensive picture of those interests has important implications for career guidance and workforce development and might help to address women’s underrepresentation in this STEM discipline.


Introduction
Accurate information about the characteristics of jobs and the workers who fill them is critical for career guidance, reemployment counseling, workforce development, and human resource management (1)(2). To meet these needs, the U.S. Department of Labor maintains a publicly available database that provides official comprehensive descriptions for over 900 occupations (www.onetonline.org). Official descriptions of computer science (CS) related occupations create the impression that computer scientists are relatively uninterested in social tasks (working with, communicating with, and teaching other people; 2-3).
This description is aligned with the "geek-programmer" stereotype, which casts computer scientists as socially awkward and unsociable (4). Information that confirms the "geek-programmer" stereotype may lead women to "opt-out" of CS careers (5). It is widely acknowledged that people pursue careers they perceive to be aligned with their occupational interests, or preferences for work-related activities and contexts (6). A long history of research suggests that gendered socialization causes women to be more interested in social tasks that allow them to interact with and help others (5)(6)(7)(8)(9). To the extent women perceive their social interests to be mismatched with the work environment and those who occupy it, women may be reluctant to pursue careers in CS (4;5;9). However, the extent to which computer scientists are uninterested in social tasks is debatable.
Despite being presented as representative, the U.S. government's official profiles of computer scientists' interests are not informed by actual computer scientists but by the subjective judgement of graduate students in vocational psychology, who may not have any particular experience in the professions they characterize (2,11). Because they rely entirely on the human judgment of psychology graduate students to create official interest profiles (2, 10, 11), official statistics may inadvertently perpetuate biases about who should, and should not, pursue careers in CS. Such statistics are used by a wide array of private citizens, educational institutions, and other organizations promoting career and workforce development (2).
The present study is the first to test whether official depictions of the career interests of computer scientists reflect realities in the early 21st century. To do this, we describe the occupational interests of actual and aspiring computer scientists, known as interest profiles. We then compare these interest profiles to those assumed by the U.S. government. We find that actual profiles are not just more varied than assumed but may also appeal to women more readily.
We used Latent Profile Analysis (LPA), a probabilistic and person-centered statistical technique (12), to identify groups of people (i.e., latent profiles) who share similar response patterns on the dominant framework of vocational interests, Holland's RIASEC model, which characterizes interests as Realistic (i.e., interests in practical tasks), Investigative (i.e., interests in analytic activities), Artistic (i.e., interests in creative work), Social (i.e., interests in working with others), Enterprising (i.e., interests in selling and leading), and Conventional (i.e., interests in working with numbers and machines; 3). These occupational interests are described further in the Methods section. Following best practices (13), we conducted two LPAs using samples of 500 people randomly selected from the following two groups: (1) interested but not employed in computer science and (2) actual computer scientists (see Method). This gave us a total sample size of 1,000.

Results
To examine the interest profiles for those interested and employed in computer science, we first conducted latent profile analyses (LPA) on each of the sub-samples: unemployed adults interested in computer science (n = 500) and adults employed in computer science (n = 500).
Profiles in this study are defined solely based on their variance on the occupational interests, allowing us to test what profiles naturally emerge for those interested in and employed in CS. Model fit indices were used to determine the optimal number of profiles. The number of profiles can differ between samples and analyses, with some models being more parsimonious (having fewer profiles) than others.
Aligned with previous research (13-16), we estimated models ranging from two to ten profiles, preferring solutions that did not have classes with a small number of individuals. We used Akogul and Erisoglu's (2017) analytic hierarchy process (AHP) to select the model with the optimal number of profiles. The AHP takes a holistic approach when considering information criteria, such as Akaike's Information Criterion (AIC) and Bayesian Information Criterion (BIC), to select the best model. The AHP has been tested on common real and synthetic datasets and found to produce more accurate results than relying on a single information criterion alone (17). We found four distinct profiles for those employed in CS (AIC = 9827.48, BIC = 10029.78) and three profiles for those who are unemployed but interested in CS (AIC = 10045.42, BIC = 10218.22). Names and descriptions for each latent profile for both employed and aspiring computer scientists are provided in Table 1.
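To make the role of the information criteria concrete, the following minimal sketch computes AIC and BIC from a model's log-likelihood using their standard definitions (AIC = 2k - 2 ln L; BIC = k ln n - 2 ln L). The log-likelihoods and parameter counts are hypothetical values chosen for illustration, not results from this study, and the sketch does not reproduce the AHP procedure itself; it only shows why the two criteria can disagree and why a method that weighs several criteria together may be useful.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# HYPOTHETICAL candidate models: (profiles, log-likelihood, free parameters).
# These numbers are illustrative assumptions, not values from the paper.
candidates = [(2, -5100.0, 19), (3, -4980.0, 26), (4, -4965.0, 33)]
n_obs = 500  # size of each sub-sample in the study

table = [(k, aic(ll, p), bic(ll, p, n_obs)) for k, ll, p in candidates]
best_by_aic = min(table, key=lambda row: row[1])[0]
best_by_bic = min(table, key=lambda row: row[2])[0]

for k, a, b in table:
    print(f"{k} profiles: AIC = {a:.2f}, BIC = {b:.2f}")
print("Preferred by AIC:", best_by_aic, "| preferred by BIC:", best_by_bic)
```

In this toy example AIC prefers the four-profile model while BIC, which penalizes additional parameters more heavily at n = 500, prefers the three-profile model.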
An official profile for CS was created by aggregating O*NET projected interest scores for each job included in the sub-samples. Latent interest profiles resembling the official profile were found in both samples and these profiles are referred to as "Stereotypical". However, only 30% of actual computer scientists sampled fit into the "Stereotypical" profile. The other 70% of computer scientists belonged to one of the three following groups: Artistic, Uninterested, and Multi-Interested. Figures 1 and 2 show how these distinct profiles compare to the official profile established by the US government. In corresponding colors, difference scores are provided for each latent profile, showing the distance between the latent profiles and O*NET projections for each interest type. Each of the four groups shows higher social interests than the official profile.
Note. Interests with scores equal to or greater than 3.5 were labeled as "Higher". Interests with scores lower than 3.5 were labeled as "Lower".
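The difference scores and the "Higher"/"Lower" labels described above can be sketched as follows. The 3.5 cut-off comes from the note above; the profile means and official scores are hypothetical placeholders, and the sketch assumes that both are expressed on the same response scale. It is an illustration of the comparison, not the authors' analysis code.

```python
INTERESTS = ["Realistic", "Investigative", "Artistic",
             "Social", "Enterprising", "Conventional"]
THRESHOLD = 3.5  # cut-off stated in the note above


def label(score: float) -> str:
    """Label an interest score as 'Higher' (>= 3.5) or 'Lower' (< 3.5)."""
    return "Higher" if score >= THRESHOLD else "Lower"


def difference_scores(latent: dict, official: dict) -> dict:
    """Latent-profile mean minus official score, per interest type."""
    return {k: round(latent[k] - official[k], 2) for k in INTERESTS}


# HYPOTHETICAL values for illustration only (not data from the study).
latent_profile = {"Realistic": 3.1, "Investigative": 4.6, "Artistic": 5.0,
                  "Social": 4.2, "Enterprising": 3.4, "Conventional": 3.9}
official_profile = {"Realistic": 3.0, "Investigative": 5.5, "Artistic": 2.5,
                    "Social": 2.0, "Enterprising": 2.5, "Conventional": 5.0}

diffs = difference_scores(latent_profile, official_profile)
for interest in INTERESTS:
    print(f"{interest:13s} {label(latent_profile[interest]):6s} "
          f"difference = {diffs[interest]:+.2f}")
```

A positive difference means the latent profile reports more interest than the official profile assigns for that interest type.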
Chi-square tests were then used to determine whether significant gender differences existed within each profile. Gender was significantly related to profile membership for both currently employed (χ²(3) = 58.35, p < .001) and aspiring (χ²(2) = 21.65, p < .001) computer scientists. Groups containing relatively larger proportions of women least resemble the official profile (see Fig. 1). Across samples, the Artistic group contained the largest proportion of women and differed the most from the official profile. Differing the least from the official profile, the Stereotypical groups are made up predominantly of men.
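As a minimal sketch of the test used here, the code below computes Pearson's chi-square statistic and its degrees of freedom for a profile-by-gender contingency table. The counts are a toy example chosen so the arithmetic is easy to follow; they are not the study's data. With four profiles and two gender categories the test has (4 - 1)(2 - 1) = 3 degrees of freedom, matching the χ²(3) reported for the employed sample.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# TOY counts for illustration only: rows are profiles, columns are (men, women).
toy_counts = [
    [30, 10],  # profile A
    [10, 30],  # profile B
    [20, 20],  # profile C
    [20, 20],  # profile D
]

stat, df = chi_square_independence(toy_counts)
print(f"chi-square({df}) = {stat:.2f}")
# For df = 3, the critical value at alpha = .05 is 7.815; a statistic above it
# indicates that gender and profile membership are not independent.
```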

Discussion
The central finding of this investigation is that the occupational interest profiles of computer scientists are more diverse than previously assumed. The U.S. Department of Labor has portrayed computer scientists as relatively uninterested in social tasks (18). We found the interest profiles of actual computer scientists do not reflect this pattern and are generally much more varied. The social interests of actual and aspiring computer scientists appear to be widely underestimated. While computer scientists are generally portrayed as having very little interest in social tasks (4, 5, 14), we found they actually have moderate to high social interests.
Men and women also differed in their profile membership. Aligned with previous research (5, 7-9), we found women to be over-represented in profiles with high interest in social tasks. For example, women are over-represented in the group that is most interested in helping others and creating things (i.e., Artistic) in both samples. Therefore, the argument that women opt out of CS careers due to a lack of interest appears to be built on faulty premises, both about the actual interests of women and about the characteristics of CS jobs. All occupational profiles, regardless of gender, showed higher interest in social tasks than officially reported. Computer scientists appear to have a diverse set of interests, and many are interested in working with and helping others.
Women's lack of representation in CS careers not only denies society the advantage of their abilities and perspectives but also limits their participation in many well paid, high-growth professions (5). Displaying a more nuanced and comprehensive picture of computer scientists' interests might help address women's underrepresentation in CS. Unfortunately, that nuance is not often captured in the US government's standard recommendations. To the extent that current recommendations are internalized, and used in practice, by teachers, mentors, and others, people who might otherwise be engaged and successful may be discouraged from CS careers. Therefore, we urge guidance counselors, researchers, and policy makers to use these official profiles with caution until more resources are focused toward collecting and maintaining high quality data.

Methods
From this larger sample, we randomly selected 500 participants from the following two groups: (1) employed in CS and (2) unemployed and interested in CS.

Employed in CS
From the larger sample described above, we randomly selected 500 participants who were employed in CS-related careers. The demographics of this group were similar to the group of aspiring computer scientists (described below). Of these 500 computer scientists, 56% were men and 44% were women. Most participants had either an undergraduate (49%) or postgraduate (33%) degree. This sample represented the ages typical of working Americans. Only 11% were between the ages of 18-25 years old, 28% between 26-33 years old, 32% between 34-45 years old, 17% between 46-55 years old, 11% between 56-65 years old, and only 2% were 66 years old or older.

Unemployed and Interested in CS
We also randomly selected 500 participants from the larger sample who identified themselves as being unemployed and interested in CS-related careers. Of these 500 individuals, 52% were men and 48% were women. Like the larger sample they were selected from, most had either an undergraduate (43%) or postgraduate (29%) degree. Different ages were also well represented; 26% were between 18-25 years old, 17% between 26-33 years old, 19% between 34-45 years old, 12% between 46-55 years old, 13% between 56-65 years old, and 12% were 66 years old or older.

Individual-Level Occupational Interests
A shortened version of the popular Personal Globe Inventory (PGI; 20-21) was used to assess individuals' occupational interests. This shortened version was developed using item response theory (22) and has been validated (21)(22). Participants were asked how much they enjoyed 20 different work activities on a scale from 1 ("Strongly dislike") to 7 ("Strongly like"). Each activity is tied to one of the following six RIASEC career interest types: Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. For each interest type, the corresponding definitions, reliability coefficients, and items are listed below. Reliability was calculated using the larger, overall sample as described in Glosenberg et al. (2019).
Realistic (omega reliability coefficient = .78): Involves concrete practical activities and the use of machines, tools, and materials. Realistic interests were measured by asking participants how much they would enjoy the following tasks: (1) Install electrical wiring and (2) Oversee building construction.
Investigative (omega reliability coefficient = .65): Involves analytical or intellectual activity aimed at troubleshooting or at creating and using knowledge. Investigative interests were measured by asking participants how much they would enjoy the following tasks: (1) Categorize different types of wildlife and (2) Write a scientific article.
Artistic (omega reliability coefficient = .92): Involves creating work in music, writing, performance, sculpture, or unstructured intellectual endeavors. Artistic interests were measured by asking participants how much they would enjoy the following tasks: (1) Sculpt a statue and (2) Paint a portrait.
Social (omega reliability coefficient = .63): Involves working with others in a helpful or facilitative way. Social interests were measured by asking participants how much they would enjoy the following tasks: (1) Seat patrons at a restaurant, (2) Interview people for a survey, (3) Help children with learning problems, and (4) Teach people to dance.
Enterprising (omega reliability coefficient = .70): Involves selling, leading, and manipulating others to attain personal or organizational goals. Enterprising interests were measured by asking participants how much they would enjoy the following tasks: (1) Oversee a hotel, (2) Manage an office, (3) Interview people for a survey, and (4) Seat patrons at a restaurant.
Conventional (omega reliability coefficient = .85): Involves working with things, numbers, or machines to meet predictable organizational demands or standards. Conventional interests were measured by asking participants how much they would enjoy the following tasks: (1) Prepare financial reports, (2) Oversee a data analyst group, (3) Maintain office financial records, and (4) Manage an electrical power station.
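The item-to-scale mapping listed above can be expressed as a small scoring routine. The sketch below assumes that each scale score is the mean of its items on the 1 to 7 response scale; the text does not state the aggregation rule, so averaging is an assumption made for illustration, and the respondent's answers are hypothetical.

```python
# Item-to-scale mapping as listed in the text above.
SCALE_ITEMS = {
    "Realistic": ["Install electrical wiring", "Oversee building construction"],
    "Investigative": ["Categorize different types of wildlife",
                      "Write a scientific article"],
    "Artistic": ["Sculpt a statue", "Paint a portrait"],
    "Social": ["Seat patrons at a restaurant", "Interview people for a survey",
               "Help children with learning problems", "Teach people to dance"],
    "Enterprising": ["Oversee a hotel", "Manage an office",
                     "Interview people for a survey",
                     "Seat patrons at a restaurant"],
    "Conventional": ["Prepare financial reports", "Oversee a data analyst group",
                     "Maintain office financial records",
                     "Manage an electrical power station"],
}


def scale_scores(responses: dict) -> dict:
    """Average the 1-7 item ratings within each scale (ASSUMED scoring rule)."""
    return {scale: sum(responses[item] for item in items) / len(items)
            for scale, items in SCALE_ITEMS.items()}


# HYPOTHETICAL respondent: neutral (4) on every item except the Artistic ones.
all_items = {item for items in SCALE_ITEMS.values() for item in items}
respondent = {item: 4 for item in all_items}
respondent["Sculpt a statue"] = 6
respondent["Paint a portrait"] = 7

scores = scale_scores(respondent)
for scale, score in scores.items():
    print(f"{scale:13s} {score:.2f}")
```

Because two items appear under both Social and Enterprising in the list above, the mapping covers 16 distinct activities.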

Official O*NET Occupational Interest Profiles
Occupational Interest Profiles (OIPs) were assessed using the RIASEC scores assigned to each job by the U.S. Department of Labor's detailed occupational database of incumbent workers, the Occupational Information Network (23)(24).
To calculate Occupational Interest Profiles (OIPs), O*NET used two teams of three vocational psychology graduate students to establish interest scores for each occupation included in their database. These teams read information about each job's tasks, requirements, and generalized work activities to provide RIASEC ratings for over 900 jobs. Although the information provided to raters (e.g., job's tasks, requirements, work activities) is gathered by a stratified randomized sampling and surveying of actual job incumbents across the United States (24), the teams of graduate students, and these teams of graduate students alone, decided the appropriate interest profiles for each job.
Participants of our survey identified their job title using a dynamic keyword search that matched the job title they entered in lay terms (e.g., teacher, farmer) to the exact job titles used by O*NET (e.g., elementary school teachers, farmworkers). Table 2. Table 2) are available at https://osf.io/f5nrm/?view_only=7b9360268ba14b40a0e051cf3a5020ef.

Figure 2
Interest Profiles of Those Interested in CS Compared to the Official Profile.