1 Introduction

The mobility restrictions imposed on billions of people during the COVID-19 pandemic in the first half of 2020 successfully decreased the reproduction rate of the virus (Rocklöv et al. 2020; World Health Organization 2020). However, quarantine and isolation also come at a tremendous cost to people’s well-being (Brooks et al. 2020) and productivity (Lipsitch et al. 2020).

While prior research (Brooks et al. 2020) identified numerous factors either positively or negatively associated with people’s well-being during disastrous events, most of this research was cross-sectional and included a limited set of predictors. Further, whether productivity is affected by disastrous events and, if so, why precisely, has not yet been investigated in a peer-reviewed article to the best of our knowledge. This is especially relevant since many companies, including tech companies, have instructed their employees to work from home (Duffy 2020) at an unprecedented scope. Thus, it is unclear whether previous research on remote work (Donnelly and Proctor-Thomson 2015) still holds during a global pandemic while schools are closed, and professionals often have to work in non-work dedicated areas of their homes. It is particularly interesting to study the effect of quarantine on software engineers as they are often already experienced in working remotely, which might help mitigate the adverse effects of the lockdown on their well-being and productivity. Therefore, there is a compelling need for longitudinal applied research that draws on theories and findings from various scientific fields to identify variables that uniquely predict the well-being and productivity of software professionals during the first 2020 quarantine, for both the current and potential future lockdowns.

In the present research, we build on the literature discussed above to identify predictors of well-being and productivity. Additionally, we also include variables that were identified as relevant by other lines of research. Furthermore, we chose a different setting, sampling strategy, and research design than most of the prior literature. This is important for several reasons.

First, many previous studies included only one or a few variables, thus masking whether other variables primarily drive the identified effects. For example, while boredom is negatively associated with well-being (Farmer and Sundberg 1986), it might be that this effect is mainly driven by loneliness, as lonely people report higher levels of boredom (Farmer and Sundberg 1986), or vice versa. Only by including a range of relevant variables is it possible to identify the primary variables, which can subsequently be used to write or update guidelines to maintain one’s well-being and productivity while working from home. Second, this approach simultaneously allows us to test whether models developed in an organizational context, such as the two-factor theory (Herzberg et al. 2017), can also predict people’s well-being in general and whether variables that were associated with well-being for people being quarantined also explain productivity. Third, while previous research on the (psychological) impact of being quarantined (Brooks et al. 2020) is relevant, it is unclear whether this research is generalizable and applicable to the COVID-19 pandemic. In contrast to previous pandemics, during which only some people were quarantined or isolated, the COVID-19 pandemic strongly impacted billions globally. For example, previous research found that quarantined people were stigmatized, shunned, and rejected (Lee et al. 2005); this is unlikely to repeat as the majority of people are now quarantined. Fourth, research suggests (Karesh et al. 2012) that pandemics are becoming increasingly likely due to a range of factors (e.g., climate change, human population growth) which make it more probable that pathogens such as viruses are transmitted to humans. This implies that it would be beneficial to prepare ourselves for future pandemics that involve lockdowns.
Fifth, the trend toward remote work has been accelerated by the COVID-19 pandemic (Meister 2020), which makes it timely to investigate which factors predict well-being and productivity while working from home. The possibility to study this under extreme conditions (i.e., during quarantine) is especially interesting as it allows us to include more potential stressors and distractors of productivity. This is critical. As outlined above, previous research on the advantages and challenges of remote work can presumably not be generalized to the population because mainly people from certain professions and with specific living and working conditions might have chosen to work remotely. Sixth and finally, a longitudinal design allowed us to test for causal inferences. Specifically, in wave 1, we identified variables that explain unique variance in well-being and productivity, which we measured again in wave 2. This is important because it is possible that, for example, the amount of physical activity predicts well-being or that well-being predicts physical activity. Additionally, we are able to test whether well-being predicts productivity or vice versa; previous research found that they are interrelated (Krekel et al. 2019; Carolan et al. 2017).

The software engineering community has never before faced a lockdown and quarantine scenario on the scale seen during the global spread of the COVID-19 virus. As a result, we cannot build on pre-existing literature to provide tailored recommendations for software professionals. Accordingly, in the present research, we integrate theories from the organizational (Herzberg et al. 2017) and psychological (Masi et al. 2011; Ryan and Deci 2000) literature, as well as findings from research on remote work (Lascau et al. 2019; Anderson et al. 2015; Bloom et al. 2015) and recommendations by health (NHS 2020a; Danish Health Authority 2020) and work (CIPD 2020) authorities targeted at the general population, from which we derived our independent variables (or predictors). This longitudinal investigation provides the following contributions:

  • First, by including a range of variables relevant to well-being and productivity, we are able to identify those variables that are uniquely associated with these two dependent variables for software professionals and thus help improve guidelines and tailor recommendations.

  • Second, a longitudinal design allows us to explore which variables predict (rather than are predicted by) well-being and productivity of software professionals.

  • Third, due to the current mobility restrictions imposed on billions of people, we provide a unique study to understand the effects of working remotely on people’s well-being and productivity.

Our results are relevant to the software community because the number of knowledge workers who are at least partly working remotely is increasing (Gallup 2020), yet the impact of working remotely on people’s health and productivity is not well understood (Mann and Holdsworth 2003). So far, the only available evidence concerns the distribution of working activities of developers working from home during the lockdown, which seems to be the same as on a typical office day (Russo et al. 2021). We focus on well-being and productivity as dependent variables because both are crucial for our way of living. According to the Universal Declaration of Human Rights, well-being is a fundamental human right, and productivity allows us to maintain a certain standard of living and affects our overall well-being. For this reason, we investigated which factors are most strongly associated with our two dependent variables. To do so, we started with those factors suggested by the literature (e.g., boredom, anxiety, routines) and validated those associations through multiple statistical analyses (Russo and Stol 2019). Thus, our research question is:

Research Question

What are the relevant predictors of well-being and productivity for software engineers working remotely during a pandemic?

In the remainder of this paper, we describe the related work about well-being in quarantine and productivity in remote work in Section 2, followed by a discussion of the research design of this longitudinal study in Section 3. The analysis is described in Section 4, and results are discussed in Section 5. Implications and recommendations for software engineers, companies, and any parties interested in remote work are then outlined in Section 6. Finally, we conclude this paper by outlining future research directions in Section 7.

2 Related Work

2.1 Well-Being in Quarantine

To slow down the spread of pandemics, it is often necessary to quarantine a large number of people (Rocklöv et al. 2020; World Health Organization 2020) and enforce social distancing to limit the spread of the infection (Anderson et al. 2020). This typically implies that only people working in essential professions such as healthcare, police, pharmacies, or food chains, such as supermarkets, are allowed to leave their homes for work. If possible, people are asked to work remotely from home. However, such measures are perceived as drastic and can have severe consequences on people’s well-being (Brooks et al. 2020; Lunn et al. 2020).

Previous research has found that being quarantined can lead to anger, depression, emotional exhaustion, fear of infecting others or getting infected, insomnia, irritability, loneliness, low mood, post-traumatic stress disorders, and stress (Sprang and Silman 2013; Hawryluck et al. 2004; Lee et al. 2005; Marjanovic et al. 2007; Reynolds et al. 2008; Bai et al. 2004). The fear of getting infected and infecting others, in turn, can become a substantial psychological burden (Kim et al. 2015; Prati et al. 2011). Also, a lack of necessary supplies such as food or water (Wilken et al. 2017) and insufficient information from public health authorities add to stress levels (Caleo et al. 2018). The severity of the psychological symptoms correlated positively with the duration of being quarantined, and symptoms can still appear years after quarantine has ended (Brooks et al. 2020). This makes it essential to understand what differentiates those whose mental health is more negatively affected by being quarantined from those who are less strongly affected. However, a recent review found that no demographic variable was conclusive in predicting whether someone would develop psychological issues while being quarantined (Brooks et al. 2020). Moreover, prior studies investigating such predictors focused solely on demographic factors (e.g., age or number of children (Hawryluck et al. 2004; Taylor et al. 2008)). This suggests that additional research is needed to identify psychological and demographic predictors of well-being. For example, prior research suggested that a lack of autonomy, which is an innate psychological need (Ryan and Deci 2000), negatively affects people’s well-being and motivation (Calvo et al. 2020), yet evidence to support this claim in the context of a quarantine is missing.

To ease the intense pressure on people while being quarantined or in isolation, research and guidelines from health authorities provide a range of solutions on how an individual’s well-being can be improved. Some of these factors lie outside the control of individuals, such as the duration of the quarantine or the information provided by public authorities (Brooks et al. 2020). In this study, we therefore focus on those factors that are within the control of individuals. However, investigating such factors independently might make little sense since they are interlinked. For example, studying the relations of anxiety and stress with well-being in isolation is less informative, as both anxiety and stress are negatively associated with well-being (De Castella et al. 2014; Spitzer et al. 2006). However, knowing which of the two has a more substantial impact on people’s well-being above and beyond the other is crucial, as it allows, inter alia, policymakers, employers, and mental health support organizations to provide more targeted information, create programs that are aimed at reducing people’s anxiety or stress levels, and improve people’s well-being, since anxiety and stress are conceptually independent constructs. For example, stress usually has a more specific cause, is temporary, and is easier to treat (e.g., by working less). In contrast, anxiety is more unspecific, longer-lasting, and can require professional attention (Johnston 2020). Thus, it is essential to study these variables together rather than separately.

2.2 Productivity in Remote Work

The containment measures not only come at a cost for people’s well-being but also negatively impact their productivity. For example, the International Monetary Fund (IMF) estimated in October 2020 that world GDP would drop by 4.4% as a result of the containment measures taken to reduce the spread of COVID-19, with countries particularly hit by the virus, such as Italy, expected to experience a drop of over 10% (IMF 2020). This expected drop in GDP would be significantly larger if many people were unable to work remotely from home. However, previous research on the impact of quarantine typically focused on people’s mental and physiological health, thus providing little evidence on the effect on productivity of those who are still working. Luckily, the literature on remote work, also known as telework, allows us to get a broad understanding of the factors that improve and hinder people’s productivity during quarantine.

The number of people working remotely had been growing in most countries already before the COVID-19 pandemic (Owl Labs 2019; Gallup 2020). Of those working remotely, 57% do so for all of their working time. The vast majority of remote workers (97%) would recommend others to do the same (Buffer 2020), suggesting that the advantages of remote work outweigh the disadvantages. The majority of people who work remotely do so from their home (Buffer 2020).

Working remotely has been associated with a better work-life balance, increased creativity, positive affect, higher productivity, reduced stress, and fewer carbon emissions because remote workers commute less (Owl Labs 2019; Buffer 2020; Anderson et al. 2015; Bloom et al. 2015; Vega et al. 2015; Baruch 2000; Cascio 2000). However, working remotely also comes with its challenges. For example, challenges faced by remote workers include collaboration and communication (named by 20% of 3,500 surveyed remote workers), loneliness (20%), not being able to unplug after work (18%), distractions at home (12%), and staying motivated (7%) (Buffer 2020). While these findings are informative, it is unclear whether they can be generalized. For instance, if mainly those with a long commute or those who feel comfortable working from home prefer to work remotely, the findings could not be generalized to the general working population.

A pandemic such as the one caused by COVID-19 in 2020 forces many people to work remotely from home. Being in a frameless and previously unknown work situation without preparation intensifies common difficulties in remote work. Adapting to the new environment itself and dealing with additional challenges add to the difficulties already identified and experienced by remote workers, and could intensify an individual’s stress and anxiety and negatively affect their working ability. The advantages of remote work might, therefore, be reduced or even reversed. Substantial research is needed to understand further what enables people to work effectively from home while being quarantined (Kotera and Correa Vione 2020). The current situation already shows how important research in this field is. Forecasts indicate that remote work will grow on an even larger scale than it did over the past years (Owl Labs 2019; Gallup 2020); therefore, research results on predictors of productivity while working remotely will increase in importance. Some guidelines have been developed to improve people’s productivity, such as those proposed by the Chartered Institute of Personnel and Development, an association of human resource management experts (CIPD 2020). Examples include designating a specific work area, wearing working clothes, asking for support when needed, and taking breaks. However, while potentially intuitive, empirical support for those particular recommendations is still missing.

Adding to the complexity, the measurement of productivity, especially in software engineering, is a debated issue, with some authors suggesting not to consider it at all (Ko 2019). Nevertheless, individual developers’ productivity has a long tradition of investigation (Sackman et al. 1968). Prior work on developer productivity primarily focused on developing software tools to improve professionals’ productivity (Kersten and Murphy 2006) or identifying the most relevant predictors, such as task-specific measurements and years of experience (Dieste et al. 2017). Similarly, understanding which skillsets of developers are relevant for productivity has also been a typical line of research (Li et al. 2015). Ultimately, as LaToza et al. pointed out, measuring productivity in software engineering is not just about using tools; instead, it is about how they are used and what is measured (LaToza et al. 2020).

3 Research Design

There are dozens of definitions and operationalizations of well-being (Linton et al. 2016). In the present research, we adopt a common broad and global definition of subjective well-being, following Diener et al. (2009), who defined well-being as “the fact that the person subjectively believes his or her life is desirable, pleasant, and good” (p. 1). In other words, well-being can be understood as whether a person is overall satisfied with their life and believes the conditions of their life are excellent (Diener et al. 1985). Psychological variables such as anxiety, loneliness, or stress can be understood as parts of general well-being or as determinants thereof (Keyes and Waterman 2003). We consider those variables as determinants and assess the degree to which they play a role in software engineers’ overall well-being.

The variables measured in the present two-wave longitudinal study are displayed in Fig. 1. To facilitate its interpretation, we categorized the variables into four broad, partly overlapping sets of predictors. To summarize, while the initial selection of predictors is theory-driven, based on previous research or recent guidelines, the selection of predictors included in the second wave is data-driven. In other words, we used a two-step approach to select our variables: the initial selection of 51 predictors was based on existing theory, which we then reduced, based on how strongly they are associated with well-being and productivity, for an initial multiple regression analysis and the subsequent longitudinal analysis. This approach helped us to focus on the most relevant predictors while keeping their number manageable.

Fig. 1 Overview of the independent and dependent variables

During the COVID-19 pandemic, many governments and organizations have called for volunteers to support self-isolation (see, for example, NHS 2020b, City of New York 2020). While also relevant to the community at large, research suggests that acts of kindness positively affect people’s well-being (Buchanan and Bardi 2010). Additionally, volunteering has the benefit of leaving one’s home for a legitimate reason and reducing cabin fever. We, therefore, decided to include volunteering as a potential predictor for well-being.

Coping strategies such as making plans or reappraising the situation are, in general, effective for one’s well-being (Webb et al. 2012; Carver et al. 1989). For example, altruistic acceptance (accepting restrictions because doing so serves a greater good) while being quarantined was negatively associated with depression rates three years later (Liu et al. 2012). Conversely, believing that the quarantine measures are redundant because COVID-19 is nothing but an ordinary flu or was intentionally released by the Chinese government (i.e., beliefs in conspiracy theories) will likely lead to dissatisfaction because of greater feelings of non-autonomy. Indeed, beliefs in conspiracy theories are associated with lower well-being (Freeman and Bentall 2017).

We further propose that three needs are relevant to people’s well-being and productivity (Ryan and Deci 2000). Specifically, we propose that the needs for autonomy and competence are unmet for many people who are quarantined, which negatively affects well-being and motivation (Calvo et al. 2020). Further, we propose that the need for competence is unmet mainly for people who cannot maintain their productivity level. This might especially be the case for those living with their families. In contrast, the need for relatedness might be oversatisfied for those living with their families.

Another important factor associated with one’s well-being is the quality of one’s social relationships (Birditt and Antonucci 2007). As people have fewer opportunities to engage with others they know less well, such as colleagues in the office or their sports teammates, the quality of existing relationships becomes more important, as having more good friends facilitates social interactions either in person (e.g., with their partner in the same household) or online (e.g., video chats with friends).

Moreover, we expect that extraversion is linked to well-being and productivity. For example, extraverted people prefer more sensory input than introverted people (Ludvigh and Happ 1974), which is why they might struggle more with being quarantined. Extraversion correlated negatively with support for social distancing measures (Carvalho et al. 2020), which is a proxy of stimulation (e.g., being closer to other people will more likely result in sensory stimulation). Finally, research on productivity predictors while working from home can be theoretically grounded in models of job satisfaction and productivity, such as Herzberg’s two-factor theory (Herzberg et al. 2017). This theory states that causes of job satisfaction can be clustered into motivators and hygiene factors. Motivators are intrinsic and include advancement, recognition, work itself, growth, and responsibilities. Hygiene factors are extrinsic and include the relationship with peers and supervisor, supervision, policy and administration, salary, working conditions, status, personal link, and job security. Both factors are positively associated with productivity (Bassett-Jones and Lloyd 2005). As there are few differences between remote and on-site workers in terms of motivators and hygiene factors (Green 2009), the two-factor theory provides a good theoretical predictor of productivity of people working remotely.

3.1 Participants

Our two-wave study covers an extensive set of 51 predictors, as identified above. Based on the literature mentioned earlier, we expected the strength of the association between the predictors and the outcomes well-being and productivity to range from medium to large. Therefore, we assumed for our power analysis a medium-to-large effect size of f² = .20 and a power of .80. A power analysis with G*Power 3.1.9.4 (Faul et al. 2009) revealed that we would need a sample size of 190 participants.

To collect our responses, we used Prolific, a data collection platform commonly used in Computer Science (see e.g., Hosio et al. 2020). We opted for this solution because of the high reliability, replicability, and data quality of dedicated platforms, especially as compared with, e.g., mailing lists (Peer et al. 2017; Palan and Schitter 2018).

Specifically, the use of crowdsourcing platforms allows us to (i) avoid overloading members of mailing lists or groups on social media (e.g., LinkedIn, Discord) with unsolicited participation requests; (ii) recruit participants of the target population (e.g., only software engineers) using automatic screening options, or by running ad hoc screening studies; (iii) recruit only participants who are interested in the research; (iv) have a high degree of control with regard to data quality, since participants’ submissions can be rejected without payment, which lowers their acceptance rate and influences future recruitment; (v) compensate participants for their time so that they take care with their responses due to a contractual obligation; and (vi) minimize self-selection bias, since potential candidates are randomly assigned to each study (if they meet the inclusion criteria), lowering the probability that opinionated individuals take part in the survey. In sum, it is a convenient, fair, and efficient way to recruit survey informants (Bethlehem 2010). For these reasons, crowdsourcing platforms are commonly used in studies published in top-tier outlets (Anumanchipalli et al. 2019; Kraft-Todd et al. 2018; Berens et al. 2020).

To administer the surveys, we used Qualtrics and shared them on the Prolific platform. In order to ensure data quality and consistency, and to account for potential dropout of participants between the two waves, we invited almost 500 participants who were identified as software engineers in a previous study (Russo and Stol 2020) to participate in a screening study in April 2020. The 483 candidates had already passed a multi-stage screening process, as described by Russo & Stol, to ensure the highest possible data quality through cluster sampling (Baltes and Ralph 2020).

To run a coherent and reliable investigation, we only recruited software engineers who were living similar experiences both from a professional and personal perspective (i.e., working remotely during a lockdown). Thus, we performed a screening study completed by 305 software professionals who agreed to participate in a multi-wave study. From the 305 candidates, we excluded those living in countries with unclear or mixed policies or early reopening (e.g., Denmark, Germany, Sweden) and professionals working from home during the lockdown fewer than 20 h a week (i.e., excluding the unemployed and developers who had to work in their offices). In both waves, all participants stated that they were working from home during the lockdown (a negative answer to either of these two conditions would have resulted in discarding the delivered responses from our data set).

As a result of this screening, in the first wave of data collection, which took place in the week of April 20–26, 2020, 192 participants completed the first survey. Participation in the second wave (May 4–10) was high (96%), with 184 completed surveys. Participants have been uniquely identified through their Prolific ID, which was essential to run the longitudinal analysis while allowing participants to remain anonymous.

Additionally, to enhance our responses’ reliability, in each survey we included three test items (e.g., “Please select response option ‘slightly disagree’”). As none of our participants failed two or more of the three test items, and all participants reported working remotely and answered the survey in an appropriate time frame, we did not exclude anyone.

The 192 participants’ mean age was 36.65 years (SD = 10.77, range = 19–63; 154 men, 38 women). Participants were compensated in line with the current US minimum wage (average completion time 1202 seconds, SD = 795.41). Out of our sample of 192 participants, 63 were based in the UK, 52 in the USA, 19 in Portugal, 10 in Poland, 7 in Italy, 6 in Canada, and the remaining 35 participants in other countries in Europe. A minority of 30 participants reported living alone, with most participants (162) reporting living together with others, including babies, children, and adults. Our participants are employed primarily at private companies (156), followed by 30 participants employed at a public institution. Six participants indicated that they either worked for a different type of company or were unsure how to categorize their employer. When asked in our screening study what percentage of their time participants were working remotely (i.e., not physically in their office) over the past 12 months, 54.7% reported 25% or less of their time, 15.6% between 25% and 50%, 2.1% between 50% and 75%, and 27.1% at least 75% of their time.

3.2 Longitudinal Design

We employed a longitudinal design, with two waves set two weeks apart from each other towards the end of the lockdown, which allowed us to test for internal replication. Also, running this study towards the end of the lockdowns in the vast majority of countries allowed participants to provide a more reliable interpretation of lockdown conditions. We chose a period of two weeks because we wanted to balance change in our variables over time against the end of a stricter lockdown, which was being discussed across many countries when we ran wave 2. Many of our variables are thought to be stable over time. That is, a person’s score on X at time 1 is strongly predictive of that person’s score on X at time 2 (indeed, the test-retest reliabilities we found support this assumption, see Table 1). The closer the temporal distance between waves 1 and 2, the higher the stability of a variable. In other words, if we had measured the same variables again after only one or two days, there would not have been much variance left to be explained by any other variable, because X measured at time 1 would already explain almost all variance of X measured at time 2. At the same time, we aimed to collect data for wave 2 while people were still quarantined. If people had still been in lockdown at time 1 but the lockdown had been eased by time 2, this would have introduced a major confounding factor. Thus, to balance those two conflicting design requirements, we opted for a two-week break between the two waves.

Table 1 Correlations r at time 1 and 2, unstandardized regression coefficients B, and test-retest reliabilities rit

We describe the measures of the two dependent (or outcome) variables in Section 3.3. Predictors (or independent variables) are explained in Sections 3.4, 3.5, 3.6, and 3.7. Wherever possible, we relied on validated scales. If this was not possible (e.g., COVID-19 specific conspiracy beliefs), we created a scale. In those cases, we followed scale development guidelines, including avoiding negatives and especially double-negatives, two statements within one item, and less common expressions (Boateng et al. 2018). The questionnaires are reported in the Supplemental Materials, while the summary of the measurement instruments with their readabilities is listed in Table 9. Test score reliability has been measured using Cronbach’s alpha and is reported for each instrument. If the instrument was used in wave 1 and wave 2, we report both Cronbach’s alpha values (i.e., αtime1, αtime2); if we used it only in the first wave, we report only the result for wave 1 (α1). Additionally, we also explore whether there are any mean changes in the variables we measured at both times (e.g., has people’s well-being changed?), and mean differences between genders and between people based in different countries.
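As an illustration of the reliability statistic named above, the following minimal sketch computes Cronbach’s alpha from its standard definition, α = k/(k − 1) × (1 − Σ item variances / variance of the sum scores). It is not the analysis code of this study; the function name and the example responses are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 in the denominator) for items and sum scores.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(r) != k for r in responses):
        raise ValueError("every respondent must answer all k items")

    # Transpose so that each tuple holds all answers to one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("sum scores have zero variance")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical 7-point Likert answers of five respondents to three items.
example = [[5, 6, 5], [3, 3, 4], [7, 6, 7], [2, 3, 2], [4, 4, 5]]
print(round(cronbach_alpha(example), 2))
```

Values close to 1 indicate that the items of an instrument vary together across respondents, which is the sense in which the alpha values reported for each instrument are read.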

3.3 Measurement of the Dependent Variables

Well-Being was measured with an adapted version of the 5-item Satisfaction with Life Scale (Diener et al. 1985). We adapted the items to measure satisfaction with life in the past week, which is in line with recommendations that the scale can be adapted to different time frames (Pavot and Diener 2009). Example items include “The conditions of my life in the past week were excellent” and “I was satisfied with my life in the past week”. Responses were given on a 7-point Likert scale ranging from 1 (Strongly disagree) to 7 (Strongly agree; αtime1 = .90, αtime2 = .90).

Productivity

was measured relative to the expected productivity. We contrasted productivity in the past week with the participant’s expected productivity (i.e., productivity level without the lockdown). As we recruited participants working in different positions, including freelancers, we could use neither objective measures of productivity nor supervisor assessments and therefore relied on self-reports. We expect limited effects of socially desirable responding as the survey was anonymous. We operationalized productivity as a function of time spent working and efficiency per hour, compared to a normal week. Specifically, we asked participants: “How many hours have you been working approximately in the past week?” (Item P1) and “How many hours were you expecting to work over the past week assuming there would be no global pandemic and lockdown?” (Item P2). Finally, to measure perceived efficiency, we asked: “If you rate your productivity (i.e., outcome) per hour, has it been more or less over the past week compared to a normal week?” (Item P3). Responses to the last item were given on a bipolar slider measure ranging from ‘100% less productive’ to ‘0%: as productive as normal’ to ‘≥ 100% more productive’ (coded as −100, 0, and 100). To compute an overall score of productivity for each participant, we used the following formula: productivity = (P1/P2) × ((P3 + 100)/100). Values below 1 reflect that people were less productive than normal, and values above 1 indicate that they were more productive than usual. For example, if a person worked only 50% of their normal time in the past week but was twice as efficient, their total productivity was considered the same as in a normal week.
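The productivity formula can be written as a short function. This is a minimal sketch of the formula stated above; the function and parameter names are our own, and the input checks are assumptions added for illustration.

```python
def productivity(hours_worked, hours_expected, efficiency_change):
    """Relative productivity = (P1 / P2) * ((P3 + 100) / 100).

    hours_worked      -- item P1, hours worked in the past week
    hours_expected    -- item P2, hours expected without the lockdown
    efficiency_change -- item P3, slider value coded from -100 to 100
    """
    if hours_expected <= 0:
        raise ValueError("expected hours must be positive")
    if not -100 <= efficiency_change <= 100:
        raise ValueError("efficiency change must lie between -100 and 100")
    return (hours_worked / hours_expected) * ((efficiency_change + 100) / 100)


# Half the usual hours at twice the usual efficiency equals a normal week.
print(productivity(hours_worked=20, hours_expected=40, efficiency_change=100))
```

A score of 1 corresponds to a normal week, so the two components can offset each other, as in the 50% time and doubled efficiency example.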

We preferred this approach over the use of other self-report instruments, such as the WHO’s Health at Work Performance Questionnaire (Kessler et al. 2003), because we were interested in the change of productivity while being quarantined as compared to ‘normal’ conditions. The WHO’s questionnaire, for example, also assesses productivity in comparison to other workers. We deemed this unfit for our purpose as it is unclear to what extent software engineers who work remotely are aware of other workers’ productivity. Also, our measure consists of only three items and showed good test-retest reliability (Table 1). Test-retest reliability is the agreement or stability of a measure across two or more time-points. A coefficient of 0 would indicate that responses at time 1 are not linearly associated with those at time 2, which is typically undesired. Higher coefficients are an additional indicator of the reliability of the measures, although they can be influenced by a range of factors such as the internal consistency of the measure itself and external factors. For example, the test-retest reliability for productivity (r = .50) is lower than for most other variables such as needs or well-being, but this is because the latter constructs are operationalized as stable over time. In contrast, productivity can vary more extensively due to external factors such as the number of projects or the reliability of one’s internet connection.
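To make the notion of test-retest reliability concrete, the sketch below computes a Pearson correlation between the scores of the same participants at two time points. The data are invented for illustration and the function name is hypothetical; it does not reproduce the values in Table 1.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical productivity scores of six participants in wave 1 and wave 2.
wave1 = [0.8, 1.0, 1.2, 0.6, 1.1, 0.9]
wave2 = [0.9, 1.0, 1.0, 0.7, 1.3, 0.8]
print(round(pearson_r(wave1, wave2), 2))
```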

3.4 Psychological Factors

Self-Discipline

was measured with three items of the Brief Self-Control Scale (Tangney et al. 2004). Example items include “I am good at resisting temptation” and “I wish I had more self-discipline” (recoded). Responses were registered on a 5-point scale ranging from 1 (Not at all) to 5 (Very; α = .64).

Coping Strategies

were measured using the 28-item Brief COPE scale, which measures 14 coping dimensions (Carver 1997). Example items include “I’ve been trying to come up with a strategy about what to do” (Planning) and “I’ve been making fun of the situation” (Humor). Responses were on a 5-point scale ranging from 0 (I have not been doing this at all) to 4 (I have been doing this a lot). The internal consistencies were satisfactory to very good for two-item scales: Self-distraction (α = .65), Active coping (α = .61), Denial (α = .66), Substance use (α = .96), Use of emotional support (α = .77), Use of instrumental support (α = .75), Behavioral disengagement (α1 = .76, α2 = .71), Venting (α = .65), Positive reframing (α = .72), Planning (α = .76), Humor (α = .83), Acceptance (α = .61), Religion (α = .83), and Self-blame (α1 = .75, α2 = .71).

Loneliness

was measured using the 6-item version of the De Jong Gierveld Loneliness Scale (Gierveld and Tilburg 2006). The items are equally distributed between two factors: emotional (α1 = .68, α2 = .69; e.g., “I often feel rejected”) and social (α1 = .84, α2 = .87; e.g., “There are plenty of people I can rely on when I have problems”). Participants indicated how lonely they felt during the past week. Responses were given on a 5-point scale ranging from 1 (Not at all) to 5 (Every day).

Compliance

with official recommendations was measured using three items of a compliance scale (Wolf and Maio 2020). The items are ‘Washing hands thoroughly with soap’, ‘Staying at home (except for groceries and 1x exercise per day)’ and ‘Keeping a 2m (6 feet) distance to others when outside.’ Responses were given on a 7-point scale ranging from 1 (never complying to this guideline) to 7 (always complying to this guideline, α = .71).

Anxiety

was measured using an adapted version of the 7-item Generalized Anxiety Disorder scale (Spitzer et al. 2006). Participants indicated how often they had experienced anxiety in different situations over the past week. Example items are “Feeling nervous, anxious, or on edge” and “Not being able to stop or control worrying”. Responses were given on a 5-point scale ranging from 1 (Not at all) to 5 (Every day; α1 = .93, α2 = .93). Additionally, we measured specific COVID-19 and future pandemic related concerns with two items (Nelson et al. 2020): “How concerned do you feel about COVID-19?” and “How concerned do you feel about future pandemics?” Responses to these items were given on a 5-point scale ranging from 1 (Not at all concerned) to 5 (Extremely concerned; α = .82).

Stress

was measured using a four-item version of the Perceived Stress Scale (Cohen 1988). Participants indicate how often they experienced stressful situations in the past week. Example items include “In the last week, how often have you felt that you were unable to control the important things in your life?” and “In the last week, how often have you felt confident about your ability to handle your personal problems?”. Responses were registered on a 4-point scale ranging from 1 (Never) to 4 (Very often; α1 = .80, α2 = .77).

Boredom

was measured using the 8-item version (Struk et al. 2017) of the Boredom Proneness Scale (Farmer and Sundberg 1986). Example items include “It is easy for me to concentrate on my activities” and “Many things I have to do are repetitive and monotonous”. Responses were on a 7-point Likert scale ranging from 1 (Strongly disagree) to 7 (Strongly agree; α1 = .87, α2 = .87).

Daily Routines

were measured with five items: “I am planning a daily schedule and follow it”, “I follow certain tasks regularly (such as meditating, going for walks, working in timeslots, etc.)”, “I am getting up and going to bed roughly at the same time every day during the past week”, “I am exercising roughly at the same time (e.g., going for a walk every day at noon)”, and “I am eating roughly at the same time every day”. Responses were taken on a 7-point Likert scale ranging from 1 (Does not apply at all) to 7 (Fully applies; α1 = .75, α2 = .78).

Conspiracy Beliefs

were measured with a 5-item scale that we designed for this study. The first two items were adapted from the Flexible Inventory of Conspiracy Suspicions (Wood 2017), whereas the latter three are based on more specific conspiracy beliefs: “The real truth about Coronavirus is being kept from the public.”, “The facts about Coronavirus simply do not match what we have been told by ‘experts’ and the mainstream media”, “Coronavirus is a bio-weapon designed by the Chinese government because they are benefiting from the pandemic most”, “Coronavirus is a bio-weapon designed by environmental activists because the environment is benefiting from the virus most”, and “Coronavirus is just like a normal flu”. Responses were collected on a 7-point Likert scale ranging from 1 (Totally disagree) to 7 (Totally agree; α = .83).

Extraversion

was measured using the 4-item extraversion subscale of the Brief HEXACO Inventory (de Vries 2013). Responses were given on a 5-point Likert scale ranging from 1 (Strongly disagree) to 5 (Strongly agree; α1 = .71, α2 = .69). Low scores on extraversion are an indication of introversion. Since we found at wave 1 that extraversion and well-being were positively correlated contrary to our hypothesis (see below), and, in our view, contrary to widespread expectations, we decided to measure in wave 2 what participants’ views are regarding the association between extraversion and well-being. We measured expectations with one item: “Who do you think struggles more with the current pandemic, introverts or extraverts?” Response options were ‘Introverts’, ‘Both around the same’, and ‘extraverts’.

Autonomy, Competence, and Relatedness

needs of self-determination theory (Ryan and Deci 2000) were measured using the 18-item balanced measure of psychological needs scale (Sheldon and Hilpert 2012). Example items include “I was free to do things my own way” (need for autonomy; α1 = .72, α2 = .76), “I did well even at the hard things” (competence; α1 = .77, α2 = .77), and “I felt unappreciated by one or more important people” (recoded; relatedness; α1 = .79, α2 = .78). Participants were asked to report how true each statement was for them in the past week. Responses were given on a 5-point scale ranging from 1 (no agreement) to 5 (much agreement).

Extrinsic and Intrinsic Work Motivation

was measured with the 6-item extrinsic regulation and 3-item intrinsic motivation subscales of the Multidimensional Work Motivation Scale (Gagné et al. 2015). The extrinsic regulation subscale measures social and material regulations. Specifically, participants were asked to answer some questions about why they put effort into their current job. Example items include “To get others’ approval (e.g., supervisor, colleagues, family, clients ...)” (social extrinsic regulation; α = .85), “Because others will reward me financially only if I put enough effort in my job (e.g., employer, supervisor...)” (material extrinsic regulation; α = .71) and “Because I have fun doing my job” (intrinsic motivation; α = .94). Responses were given on a 7-point scale ranging from 1 (not at all) to 7 (completely).

Mental Exercise

was measured with two items: “I did a lot to keep my brain active” and “I performed mental exercises (e.g., Sudokus, riddles, crosswords)”. Participants indicated the extent to which the items were true for them in the past week on a 7-point scale ranging from 1 (Not at all) to 7 (Very; α = .56).

Technical Skills

was measured with one item: “How well do your technological skills equip you for working remotely from home?” Responses were given on a 7-point scale ranging from 1 (Far too little) to 7 (Perfectly).

3.5 Physiological Factors

Diet

was measured with two items (European Social Survey 2014): “How often do you eat fruit, excluding drinking juice?” and “How often do you eat vegetables or salad, excluding potatoes?”. Responses were given on a 7-point scale ranging from 1 (Never) to 7 (Three times or more a day; α = .60).

Quality of Sleep

was measured with one item: “How has the quality of your sleep overall been in the past week?” Responses were given on a 7-point scale ranging from 1 (very low) to 7 (perfectly).

Physical Activity

was measured with an adapted version of the 3-item Leisure Time Exercise Questionnaire (Godin and Shephard 1985). Participants were asked to report how many hours in the past week they had been exercising mildly, moderately, and strenuously. The overall score was computed as follows (Godin and Shephard 1985): 3 × mild + 5 × moderate + 9 × strenuous. Missing responses for one or more of the exercise types were treated as 0.
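The scoring rule above, including the treatment of missing responses as 0, can be sketched as follows. The function name and the use of `None` to mark a missing response are assumptions made for illustration.

```python
def exercise_score(mild=None, moderate=None, strenuous=None):
    """Weighted exercise score: 3 * mild + 5 * moderate + 9 * strenuous.

    Each argument is the number of hours reported for that intensity;
    a missing response (None) is treated as 0, as described in the text.
    """
    mild, moderate, strenuous = (
        0 if hours is None else hours for hours in (mild, moderate, strenuous)
    )
    return 3 * mild + 5 * moderate + 9 * strenuous


# Two hours of mild, one of moderate, and one of strenuous exercise.
print(exercise_score(mild=2, moderate=1, strenuous=1))
# Only mild exercise was reported; the other two responses are missing.
print(exercise_score(mild=2))
```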

3.6 Social Factors

Quality and Quantity of Social Contacts Outside of Work

were measured with three items. We adapted two items from the social relationship quality scale (Birditt and Antonucci 2007) and added one item to measure the quantity: “I feel that the people with whom I have been in contact over the past week support me”, “I feel that the people with whom I have been in contact over the past week believe in me”, and “I am happy with the amount of social contact I had in the past week.” Responses were given on a 6-point Likert scale ranging from 1 (Strongly disagree) to 6 (Strongly agree; α1 = .73, α2 = .77).

Volunteering

was measured with three items that measure people’s behavior over the past week: “I have been volunteering in my community (e.g., supported elderly or other people in high-risk groups)”, “I have been supporting my family (e.g., homeschooling my children)” and “I have been supporting friends, and family members (e.g., listened to the worries of my friends)”. Responses were given on a 7-point scale ranging from 1 (Not at all) to 7 (Very often; α = .45).

Quality and Quantity of Communication with Colleagues and Line Managers

was measured with three items: “I feel that my colleagues and line manager have been supporting me over the past week”, “I feel that my colleagues and line manager believed in me over the past week”, and “Overall, I am happy with the interactions with my colleagues and line managers over the past week.” Responses were given on a 6-point Likert scale ranging from 1 (Strongly disagree) to 6 (Strongly agree; α1 = .88, α2 = .92).

3.7 Situational Factors and Demographics

Distractions at Home

was measured with two items: “I am often distracted from my work (e.g., noisy neighbors, children who need my attention)” and “I am able to focus on my work for longer time periods” (recoded). Responses were given on a 5-point scale ranging from 1 (Not at all) to 5 (Very often; α1 = .64, α2 = .63).

Whether participants lived alone or with other people was assessed by asking them how many babies, toddlers, children, teenagers, and adults they were currently living with. We asked about the five groups separately because it allowed us to explore whether, for example, toddlers had a different impact on well-being and productivity than teenagers. However, the numbers of babies, toddlers, children, teenagers, and adults the participants were living with were uncorrelated with their well-being and productivity, rs ≤ .19. Therefore, we summed them up into one variable, which we called people (i.e., the number of people the participant was living with).

Financial Security

was measured with two items that reflect the current but also the expected financial situation (Glei et al. 2019): “Using a scale from 0 to 10 where 0 means ‘the worst possible financial situation’ and 10 means ‘the best possible financial situation’, how would you rate your financial situation these days?” and “Looking ahead six months into the future, what do you expect your financial situation will be like at that time?”. Responses were given on an 11-point scale ranging from 0 (the worst possible financial situation) to 10 (the best possible financial situation; α = .81).

Office Set-Up

was measured with three items: “In my home office, I do have the technical equipment to do the work I need to do (e.g., appropriate PC, printer, stable and fast internet connection)”, “On the computer or laptop I use while working from home I do have the software and access rights I need”, and “My office chair and desk are comfortable and designed to prevent back pain or other related issues”. Responses were given on a 7-point Likert scale ranging from 1 (Strongly disagree) to 7 (Strongly agree; α = .65).

Demographic Information

was assessed with the following items: “What is your gender?”, “How old are you?”, “What type of organization do you work in” (public, private, unsure, other), “What is your yearly gross income?” (< US$20,000, US$20,000–40,000, US$40,001–60,000, US$60,001–80,000, US$80,001–100,000, > US$100,000; converted to the participant’s local currency), “In which country are you based?”, “What percentage of your time have you been working remotely (i.e., not physically in your office) over the past 12 months?”, “In which region/state and country are you living?”, and “Is there still a lockdown where you are living?”.

4 Analysis

The data analysis consists of two parts. First, we used the data from time 1 to identify the variables that explain variance in participants’ well-being and productivity beyond the other variables. Second, we used the Pearson product-moment correlation coefficient (r) to identify which variables were correlated at r ≥ .30 with well-being and productivity, to test whether they predict our two outcomes over time. r is an effect size that expresses the strength of the linear relation between two variables. We used .30 as a threshold because we are interested in identifying variables correlated with at least a medium-sized magnitude (Cohen 1992) with one or both of our outcome variables. Also, a correlation of ≥ .30 indicates that the effect is among the top 25% in individual difference research (Gignac and Szodorai 2016). Finally, selecting an effect size of this magnitude provides effective type-I error control, as in total we performed 103 correlation tests at time 1 alone (51 independent variables correlated with the two dependent variables, which were also correlated with each other). Given a sample size of 192, this effectively changes our alpha level to .0001, which is conservative. This means that it is improbable that we erroneously find an effect in our sample even though there is no effect in the population (i.e., commit a type-I or false-positive error).
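The link between the r ≥ .30 threshold, the sample size of 192, and an effective alpha level of roughly .0001 can be checked with a short calculation. The sketch below uses the Fisher z approximation to the p-value of a correlation rather than the exact t-test, so it is an approximation under that assumption and not the authors’ computation; the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_p_value(r, n):
    """Two-sided p-value for a correlation r with sample size n.

    Uses the Fisher z approximation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal deviate under the null hypothesis of r = 0.
    """
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# 51 predictors x 2 outcomes, plus the correlation between the two outcomes.
n_tests = 51 * 2 + 1
p_at_threshold = approx_p_value(0.30, 192)

print(n_tests)
print(p_at_threshold < 0.0001)
```

Under this approximation, any correlation that reaches the .30 threshold in a sample of 192 has a p-value below .0001, which is the sense in which the effect size threshold doubles as a strict significance criterion.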

We neither transformed the data for any analysis nor added any control variables. Unless otherwise indicated above, scales were formed by averaging the items. The collected dataset is publicly available to support other researchers in understanding the impact of (enforced) work-from-home policies.
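Forming a scale score by averaging items, after recoding reverse-keyed items, can be sketched as follows. The helper names, the zero-based item positions, and the example responses are hypothetical; the recoding rule (scale minimum + scale maximum − response) is the conventional one and is assumed here.

```python
def reverse_code(score, scale_min, scale_max):
    """Recode a reverse-keyed response, e.g. 2 becomes 4 on a 1-5 scale."""
    return scale_min + scale_max - score


def scale_score(responses, reversed_items=(), scale_min=1, scale_max=5):
    """Average the item responses after recoding the reverse-keyed items.

    `reversed_items` lists the zero-based positions of items to recode.
    """
    recoded = [
        reverse_code(value, scale_min, scale_max) if index in reversed_items else value
        for index, value in enumerate(responses)
    ]
    return sum(recoded) / len(recoded)


# Hypothetical two-item distraction scale (1-5); the second item is recoded.
print(scale_score([4, 2], reversed_items={1}))
```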

4.1 Analysis of Time 1 Data

To test which of the variables listed in Fig. 1 explains unique variance in well-being and productivity, we performed two multiple regression analyses with all variables that were correlated with the two outcome variables with ≥ .30. In the first analysis, well-being is the dependent variable; in the second analysis, we use productivity as the dependent variable. This allows us to identify the variables that explain unique variance in the two dependent variables. However, one potential issue of including many partly correlated predictors is multicollinearity, which can lead to skewed results. If the Variance Inflation Factor (VIF) is larger than 10, multicollinearity is an issue (Chatterjee and Price 1991). Therefore, we tested whether the variance inflation factor would exceed 10 before performing any multiple regression analysis.
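For readers unfamiliar with the VIF, the sketch below computes it from a correlation matrix of the predictors, using the fact that the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix (equivalently, 1/(1 − R²) when predictor j is regressed on the other predictors). The correlation values are invented for illustration, and the helper names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(corr):
    """VIF of each predictor: the diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Hypothetical correlations among three predictors.
example_corr = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
vifs = variance_inflation_factors(example_corr)
print([round(v, 2) for v in vifs])
print(all(v < 10 for v in vifs))
```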

4.2 Analysis of Longitudinal Data

To analyze the data from both time-points, we performed a series of structural equation modeling analyses with one predictor variable and one outcome variable using the R package lavaan (Rosseel 2012). Unlike many other types of analyses, structural equation modeling adjusts for reliability (Westfall and Yarkoni 2016). Specifically, models were designed with one predictor (e.g., stress) and one outcome (e.g., well-being), both as measured at time 1 and at time 2. We allowed autocorrelations (e.g., between well-being at time 1 and at time 2) and cross-paths (e.g., between stress at time 1 and well-being at time 2). Autocorrelations are essential because, without them, we might erroneously conclude that, for example, stress at time 1 predicts well-being at time 2, although it is the part of stress that overlaps with well-being which predicts well-being at time 2 (Rogosa 1980). To put it simply, we can only conclude that X1 predicts Y2 if we control for Y1. No items or errors were allowed to correlate. Allowing such correlations is usually done to improve the model fit but has also been criticized as atheoretical: which items and errors should be allowed to correlate can only be determined after the initial model is computed. Therefore, it is a data-driven approach that places too much emphasis on model fit (Gana and Broc 2019; Hermida et al. 2015; MacCallum et al. 1992). We compared the standard maximum likelihood (ML) estimator, the robust maximum likelihood (MLR) estimator, and the MLM estimator (maximum likelihood with robust standard errors and a Satorra-Bentler scaled test statistic); the regression (or path) coefficients and associated p-values were not affected by the estimator type. As fit indices, we report the CFI, RMSEA, and SRMR. To assess whether the fit indices are sufficient (i.e., from which point onward the data fit the model well), we relied on the following cut-off values (Hair et al. 2006; Kline 2015): CFI ≥ .90, and RMSEA and SRMR ≤ .08.
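The reasoning that X1 can only be said to predict Y2 once Y1 is controlled for can be illustrated with simulated data. The sketch below is not a structural equation model and does not use lavaan: it is an ordinary least-squares regression on observed scores, with invented effect sizes, in which stress at time 1 has no direct effect on well-being at time 2. All names and parameter values are assumptions for illustration.

```python
import random
from math import sqrt


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def correlation(a, b):
    a, b = center(a), center(b)
    return dot(a, b) / sqrt(dot(a, a) * dot(b, b))


def two_predictor_slopes(y, x1, x2):
    """OLS slopes of y on x1 and x2 (with intercept), via the normal equations."""
    y, x1, x2 = center(y), center(x1), center(x2)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Simulated data: stress and well-being overlap at time 1, but well-being at
# time 2 depends only on well-being at time 1 (no true cross-lagged effect).
rng = random.Random(42)
n = 5000
stress_t1 = [rng.gauss(0, 1) for _ in range(n)]
wellbeing_t1 = [-0.6 * s + 0.8 * rng.gauss(0, 1) for s in stress_t1]
wellbeing_t2 = [0.7 * w + 0.7 * rng.gauss(0, 1) for w in wellbeing_t1]

raw_r = correlation(stress_t1, wellbeing_t2)
autoregressive, cross_lagged = two_predictor_slopes(
    wellbeing_t2, wellbeing_t1, stress_t1
)
print(round(raw_r, 2), round(autoregressive, 2), round(cross_lagged, 2))
```

In this simulation the raw correlation between stress at time 1 and well-being at time 2 is clearly negative, whereas the cross-lagged slope is close to zero once well-being at time 1 is included, which is the pattern the autocorrelation paths are meant to guard against.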

5 Results

5.1 Correlations

The pattern of correlations was overall consistent with the literature. At time 1, 16 variables were correlated with well-being at r ≥ .30 (Tables 1 and 13). Stress, r = −.58, quality of social contacts, r = .49, and need for autonomy, r = .48, were most strongly associated with well-being (all p < .0001). The pattern of results from the 14 coping strategies was also in line with the literature (Carver et al. 1989): self-blame, r = −.36, p < .001, behavioral disengagement, r = −.31, p < .001, and venting, r = −.28, p < .001, were negatively correlated with well-being. Interestingly, generalized anxiety was more strongly associated with well-being than COVID-19 related anxiety (r = −.46 vs. −.25), which might suggest that specific worries have a less negative impact on well-being. This also suggests that our findings are at least partly COVID-19 independent; namely, if people were terrified by this virus, COVID-19 related anxiety would have been a stronger predictor than generalized anxiety.

Footnote 5: […] predicting well-being (the dependent variable or outcome) is −.60. This indicates that a person who has a well-being level of 6 has a stress level that is .60 units lower than a person who has a well-being level of 5.

Contrary to our expectations, extraversion was positively correlated with well-being at both waves 1 and 2. The pattern of the associations was similar at time 2. One possible reason why participants misjudged how much introverts struggle with working from home is that introverts usually have to avoid unwanted social interactions, whereas in quarantine they have to actively put effort into having social interactions. The added challenge of investing more energy than usual to avoid becoming too lonely, and of changing their usual behavioral pattern, demands much more from introverts than from extraverts (Davidson et al. 2015; Wei 2020).

At time 1, four variables were correlated with productivity at r ≥ .30 (Tables 1 and 13): need for competence, r = −.37, distractions, r = −.34, boredom, r = −.33, and communication with colleagues and line managers, r = .30. Surprisingly, work motivations were uncorrelated with well-being at α = .001. At time 2, only distraction was still correlated with productivity, r = −.26, p < .001 (see also Table 14). The strength of association of most variables with productivity dropped between time 1 and 2, which means that those variables associated with productivity at wave 1 were no longer or less strongly associated with productivity at wave 2. The strengths of the correlations remained the same when we computed Spearman’s rank correlation coefficients rather than Pearson’s correlations (Spearman’s coefficient is a non-parametric version of Pearson’s r and also ranges between −1 and 1, see Tables 13 and 14).

5.1.1 Additional Analysis Regarding Extraversion

The counter-intuitive finding that well-being and extraversion are positively correlated surprised us. Thus, we added additional questions at time 2 to better understand this phenomenon. The purpose of this further investigation is only to provide a more nuanced interpretation of the results of our quantitative analysis; it is not a stand-alone research about extraversion during the lockdown.

Interestingly, the finding that extraversion is positively correlated with well-being during lockdown is contrary to most participants’ expectations. When asked whether introverts or extraverts struggle more with the COVID-19 pandemic, only 2 participants correctly predicted introverts, whereas 136 stated extraverts and 46 believed that both groups struggle equally. This highlights the value of our research because people’s intuition can be blatantly wrong.

An analysis of the statements in which informants (I) explained their choice provides a more articulated explanation. We now report selected quotes from participants, together with their level of extraversion in wave 1. Some informants reported direct experiences supporting the feeling that extraverts struggle more than introverts.

“I’m introverted, and I don’t feel the pandemic has affected me at all. Rules aren’t hard to follow and haven’t feel bad. I feel for extraverts; they would struggle a bit with the rules.” [I-101, extraversion score = 2.75]

“I’m an extravert; my wife is an introvert. I’m really struggling. She’s fine.” [I-92, extraversion score = 5.00]

Nonetheless, a minority of participants provided alternative interpretations. According to those, both introverts and extraverts have difficulties in reaching out to people, although in different ways. The reasoning behind such answers is that the two personality types struggle with different challenges.

“Both types need company, just that each needs company on their own terms. Introverts prefer deeper contact with fewer people and extraverts less deep contact with a greater number of people.” [I-80, extraversion score = 3.75]

“Extraverts miss human contact; introverts find it even harder to mark their presence online (e.g., in meetings).” [I-160, extraversion score = 3.50]

Interestingly, one informant provided an insightful interpretation that is aligned with our results.

“Introverts usually have more difficulty communicating with others, and confinement worsens the situation because they will not try to talk to others through video conferences.” [I-136, extraversion score = 2.75]

The lack of a structured working setting, in which introverts are routinely involved, causes further isolation. Being ‘forced’ to work remotely significantly increased the difficulty of engaging with social contacts. This means that introverts have to put much more effort into interacting with others instead of their typical behavior of reduced interaction in office-based environments. Whereas extraverts find it easier to maintain their social contacts in some way, introverts might struggle more. Thus, the lockdown had a more negative impact on the well-being of introverts than of extraverts, as shown in Table 1.

5.2 Unique Influence — Multiple Regression Analyses

To test which of the predictors had a unique influence on well-being and productivity, we included all variables that were correlated with either outcome at r ≥ .30 at time 1. This is a conservative test because many predictors are correlated with each other and thus take variance from each other. Also, it allowed us to repeat the same analysis at time 2 because all predictors that correlated with either well-being or productivity at time 1 with r ≥ .30 were included at time 2. In a first step, we tested whether multicollinearity was an issue. This was not the case, with VIF < 4.1 for all four regression models and thus clearly below the often-used threshold of 10 (Chatterjee and Price 1991).

Sixteen variables correlated with well-being at r ≥ .30 (Table 1). Together, they explained a substantial amount of variance in well-being at time 1, R² = .44, adj. R² = .39, F(16, 167) = 8.21, p < .0001, and at time 2, R² = .47, adj. R² = .42, F(16, 162) = 8.90, p < .0001. At time 1, stress (negatively), social contacts, and daily routines uniquely predicted well-being at α = .05 (see Table 1, column 3, and Table 2). At time 2, need for competence and autonomy, stress, quality of social contacts, and quality of sleep uniquely predicted well-being at α = .05 (see Table 1, column 7, and Table 4). Thus, stress and quality of social contacts significantly predicted well-being at both time points. Four variables correlated with productivity at r ≥ .30 (Table 1). Together, they explained 18% of the variance in productivity at time 1, R² = .18, adj. R² = .16, F(4, 179) = 9.60, p < .0001, and 8% at time 2, R² = .08, adj. R² = .06, F(4, 173) = 4.02, p = .004. At both time points, none of the four variables explained variance in productivity beyond the other three variables, suggesting that they are all associated with productivity but that we lack the statistical power to disentangle the effects (Tables 3, 4, and 5). We also visualized the regression coefficients alongside their respective confidence intervals (see Figs. 5, 6, 7, and 8 in the Appendix).
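The relation between the reported R² and adjusted R² values can be verified with the standard adjustment formula, adj. R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of predictors and n is recovered from the degrees of freedom of the F test (n = k + residual df + 1). This is a minimal sketch; the function names are our own.

```python
def sample_size_from_f(df_model, df_residual):
    """Sample size implied by the degrees of freedom of a regression F test."""
    return df_model + df_residual + 1


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# (label, R-squared, model df, residual df) as reported in the text.
reported = [
    ("well-being, time 1", 0.44, 16, 167),
    ("well-being, time 2", 0.47, 16, 162),
    ("productivity, time 1", 0.18, 4, 179),
    ("productivity, time 2", 0.08, 4, 173),
]
for label, r2, df_model, df_residual in reported:
    n = sample_size_from_f(df_model, df_residual)
    print(label, n, round(adjusted_r2(r2, n, df_model), 2))
```

The adjustment penalizes models with many predictors relative to the sample size, which is why the gap between R² and adjusted R² is larger for the 16-predictor well-being models than for the 4-predictor productivity models.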

Table 2 Predictors of well-being wave 1
Table 3 Predictors of productivity wave 1
Table 4 Predictors of well-being wave 2
Table 5 Predictors of productivity wave 2

There is an ostensible discrepancy between some correlations and the estimates of the regression analyses, which requires further explanation. An especially large discrepancy appeared for the variable need for competence, which correlated positively with well-being at time 1 and 2, r = .41 with p < .001, and r = .38 with p < .001, but was negatively associated with well-being when controlling for other variables in both regression analyses, B = −.20, p = .24, and B = −.33, p = .04. This suggests that including a range of other variables, which serve as control variables, affects the results. Indeed, exploratory analyses revealed that need for competence was no longer associated with well-being when we included need for autonomy. That is, when we performed a multiple regression with the needs for autonomy and competence as the only predictors, need for competence became non-significant. Need for competence also includes an autonomy component, which might explain this. It is easier to fulfill one’s need for competence while being at least somewhat autonomous (Ryan and Deci 2000). Further, including generalized anxiety and boredom reversed the sign of the association: need for competence became negatively associated with well-being. Including those two variables removes the variance that is associated with enthusiasm (boredom reversed) and courage (generalized anxiety reversed), which might explain the shift to a negative association with well-being. Together, controlling for need for autonomy, generalized anxiety, and boredom takes away positive aspects of need for competence, leaving a potentially cold side that might be closely related to materialism, which is negatively associated with well-being (Dittmar et al. 2014).

5.3 Longitudinal Analysis

After identifying the independent variables that are most strongly related to well-being and productivity, we now turn to the longitudinal analysis, which allows us to assess whether any of our sixteen predictors (or independent variables) predicts one of our dependent variables at time 2 or is predicted by it. Test-retest reliabilities were satisfactory for all variables, supporting the quality of our data (last column of Table 1).

5.3.1 Structural Equation Modeling

In total, we performed 20 structural equation modeling (SEM) analyses to test whether well-being and productivity are predicted by or predict any of the independent variables: 16 models for well-being, including one model in which we tested whether well-being predicts productivity or vice versa, and four models for productivity. Because the large number of models increases the probability of a false positive, we used a conservative error rate of .005. We use a different threshold for the longitudinal analysis than for the correlation analyses because the number of tests differed. Occasionally, the model fit indices indicated that the data did not fit the models well (cf. Table 10, last three columns). This was especially the case for the models including the needs for autonomy, competence, and relatedness, which we do not discuss further.

One example of our SEM analyses is presented in Fig. 2, where we looked at the causal relations between stress and well-being in waves 1 and 2. The boxes represent the items and the circles the variables (e.g., stress). The arrows between the items and the variables represent the loadings, that is, how strongly each of the items contributes to the overall variable score (e.g., item 3 of the stress scale contributes least and item 4 most to the overall score at both time points). The circular arrows represent errors. The bidirectional arrows between the variables represent the covariances, which are comparable to correlations. The single-headed arrows show causal impacts over time. The arrows between the same variables (e.g., well-being 1 and well-being 2) show how strongly a variable at time 1 predicts itself at time 2 and are comparable to the test-retest correlations. The most critical arrows are those between well-being 1 and stress 2 and between stress 1 and well-being 2. They show whether one variable causally predicts the other.

Fig. 2 SEM analysis of stress and well-being in waves 1 and 2

To provide a better understanding of our SEM analyses, we will guide the reader through the example shown in Fig. 2. The values (of this and all SEM analyses) are displayed in Table 10. Columns 2-4 of Table 10 show that stress and well-being were significantly associated at time 1, B = −0.75, SE = .13, p < .001. This association was mirrored at time 2, B = −0.15, SE = .05, p = .001 (columns 5-7). Columns 8-10 show that stress at time 1 did not significantly predict well-being at time 2, B = −0.00, SE = .16, p = .99. Columns 8-10 of the second part of Table 10 also show that well-being at time 1 did not predict stress at time 2, B = 0.03, SE = .05, p = .55. Columns 2-4 of the second part show the autocorrelation of well-being, that is, how strongly well-being at time 1 predicts well-being at time 2, B = 0.71, SE = .09, p < .001. Autocorrelations can be broadly understood as the unstandardized version of the test-retest correlations (reliability) reported in Table 1. Columns 5-7 of the second part show the autocorrelation of stress, which is also significant, B = 0.99, SE = .16, p < .001. The last three columns indicate that the data fit the proposed model reasonably well, CFI = .93, RMSEA = .07, SRMR = .07. It is worth noting that the coefficient between well-being and stress is −.75 at time 1 (t1) but only −.15 at time 2 (t2). This is likely an SEM artifact because the variances of both well-being and stress are larger at t1 than at t2: 2.17 and 0.55 at t1 vs. 0.65 and 0.08 at t2 (see the double-headed arrows in Fig. 2). Because the standard errors also differ for the two coefficients, .13 at t1 vs. .05 at t2 (cf. Table 10), both coefficients are significant, p ≤ .001. The correlation analysis supports this view, since well-being and stress are correlated with r = −.58 at t1 and with r = −.54 at t2 (cf. Table 1), clearly suggesting that the relations between the two variables are very similar across both time points.
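The point about the differing variances can be checked with a small calculation: dividing a covariance by the product of the two standard deviations yields a correlation. The sketch below assumes that the B values quoted above are covariances and that the quoted variances belong to well-being and stress, in that order; the function name is ours, and this is not the R analysis code that accompanies the paper.

```python
import math


def covariance_to_correlation(cov: float, var_x: float, var_y: float) -> float:
    """Standardize a covariance using the variances of the two variables."""
    return cov / math.sqrt(var_x * var_y)


# Values quoted in the text (assumed order: well-being variance, stress variance).
r_t1 = covariance_to_correlation(-0.75, 2.17, 0.55)
r_t2 = covariance_to_correlation(-0.15, 0.65, 0.08)

print(f"implied standardized association at t1: {r_t1:.2f}")
print(f"implied standardized association at t2: {r_t2:.2f}")
```

Under these assumptions, the implied standardized associations are roughly −.69 at t1 and −.66 at t2, that is, of similar size despite the very different unstandardized coefficients.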

We conclude our SEM analyses by acknowledging that no model revealed a significant association over time between two different variables at α = .005. Thus, no variable at time 1 (e.g., stress) is able to explain a significant amount of variance in another variable (e.g., well-being) at time 2. We only found a negative tendency regarding Distraction → Productivity, B = −.154, p = .006.

5.3.2 Mixed Effects Modeling

Additionally, we explored whether there are any mean changes between time 1 and 2, separately for all 18 variables, using mixed effects modeling. For example, has well-being increased over time? This would suggest that people adapted further within a relatively short period of two weeks to the threat from COVID-19. Table 6 shows that the arithmetic mean (M) of well-being has indeed slightly increased between time 1 and 2, M = 4.14 vs. M = 4.34. A closer look revealed that 91 participants reported higher well-being at time 2 compared to time 1, 23 reported the same level of well-being, and 70 a lower level of well-being. Further, on average, people’s scores on behavioral disengagement and quality of social contacts increased, whereas emotional loneliness and the quality of communication with line managers and coworkers decreased.
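A within-subject mean change of this kind can be sketched as follows. With only two waves and complete data, the time effect in a random-intercept model closely corresponds to the mean of the within-person differences, so the sketch uses a paired comparison as a simplified stand-in for the mixed effects models. The scores are hypothetical and the function names are ours.

```python
import math
import statistics


def paired_mean_change(time1, time2):
    """Return the mean within-person change and its paired t statistic."""
    if len(time1) != len(time2):
        raise ValueError("Both waves must contain the same participants.")
    diffs = [after - before for before, after in zip(time1, time2)]
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / standard_error


def direction_counts(time1, time2):
    """Count participants whose score went up, stayed the same, or went down."""
    diffs = [after - before for before, after in zip(time1, time2)]
    higher = sum(d > 0 for d in diffs)
    same = sum(d == 0 for d in diffs)
    lower = sum(d < 0 for d in diffs)
    return higher, same, lower


# HYPOTHETICAL well-being scores of five participants at both waves.
wave1 = [4.0, 3.5, 5.0, 4.5, 3.0]
wave2 = [4.5, 3.5, 5.5, 4.0, 3.5]

mean_change, t_statistic = paired_mean_change(wave1, wave2)
print(f"mean change = {mean_change:.2f}, t = {t_statistic:.2f}")
print("higher / same / lower:", direction_counts(wave1, wave2))
```

The second helper produces the same kind of breakdown as the counts of participants with higher, equal, and lower well-being reported above.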

Table 6 Within-subject comparisons to analyze mean changes over time

5.4 Exploratory Between Gender and Country Analyses

Further, we tested for gender mean differences by comparing women and men across 65 variables (cf. Table 11). Because of the large number of comparisons, we set our significance threshold to .001. With this threshold, only the coping strategy self-distraction resulted in a significant difference, with women reporting higher levels of it (e.g., “I’ve been turning to work or other activities to take my mind off things”). Other comparisons were in the expected direction but not statistically significant. For example, women tended to score higher on anxiety on average, which is in line with the literature (Feingold 1994).

Finally, we explored whether there would be any mean differences between participants based in the United Kingdom (n = 63) and the United States of America (n = 52). We selected only those two countries because there were 19 or fewer participants in each of the other countries. We again used a threshold of .001. With this threshold, only the work motivation material-extrinsic resulted in a significant difference, with people based in the USA reporting higher levels of it on average. This suggests that the US-based professionals in our sample are more driven by materialistic motivation (e.g., promotions, money) than the UK-based professionals.

5.5 Conceptual Replication Analysis

Our finding that office-setup is not significantly related to well-being and productivity seems to contradict a recent cross-sectional study by Ralph et al. (2020) that investigated how the fear of bioevents, disaster preparedness, and home office ergonomics predict well-being and productivity among software developers. In that study, ergonomics was positively related to both well-being and productivity. To measure ergonomics, the authors created six items concerning distractions, noise, lighting, temperature, chair comfort, and overall ergonomics. The first two items are closely related to our measure of distraction, which was negatively associated with well-being, r = −.23, and productivity, r = −.34, in wave 1 of our sample. In contrast, the remaining four items are more closely associated with office-setup in our survey, which was positively but not significantly associated with well-being, r = .14, and productivity, r = .10.

To better understand this inconsistency with our result, we ran a replication analysis using Ralph et al.’s data. To test whether the effect of ergonomics is mainly driven by distraction and noise, we combined the first two items into the variable ergonomics-distractions (recoded, higher scores indicate less distraction) and the other four items into the variable ergonomics-others. Indeed, ergonomics-distractions was more strongly correlated with well-being, r = .25, and productivity, r = .29, than was ergonomics-others, rs = .19 and .19, respectively. This suggests that our findings replicate those of Ralph et al. and emphasizes the importance of distinguishing between distraction and office set-up.
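The splitting of the six ergonomics items into two composites can be sketched as follows. This is an illustration under stated assumptions rather than the replication code itself: the 1 to 5 response scale, the item order, the toy responses, and all function names are hypothetical.

```python
import math
import statistics

SCALE_MIN, SCALE_MAX = 1, 5  # ASSUMED response scale, not stated in the text


def reverse_score(value):
    """Recode an item so that higher scores indicate less distraction."""
    return SCALE_MIN + SCALE_MAX - value


def split_composites(items):
    """Average items 1-2 (recoded) and items 3-6 into two composite scores."""
    distractions = statistics.mean(reverse_score(v) for v in items[:2])
    others = statistics.mean(items[2:])
    return distractions, others


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# HYPOTHETICAL respondents: six ergonomics items plus a well-being score.
responses = [
    ([4, 5, 2, 3, 2, 2], 2.8),
    ([2, 1, 4, 4, 5, 4], 5.6),
    ([3, 3, 3, 4, 3, 3], 4.2),
    ([5, 4, 3, 2, 3, 3], 3.0),
    ([1, 2, 5, 4, 4, 5], 6.0),
]

composites = [split_composites(items) for items, _ in responses]
well_being = [score for _, score in responses]
r_distractions = pearson_r([c[0] for c in composites], well_being)
r_others = pearson_r([c[1] for c in composites], well_being)
print(f"ergonomics-distractions vs. well-being: r = {r_distractions:.2f}")
print(f"ergonomics-others vs. well-being:       r = {r_others:.2f}")
```

Comparing the two correlations side by side is what allows the contribution of distraction and noise to be separated from that of the remaining office set-up items.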

6 Discussion

6.1 Implications and Recommendations

The COVID-19 pandemic and the subsequent lockdown have been a major professional change for many software engineers. In the present research, we investigated how a range of relevant variables are associated with and predict software engineers’ well-being and productivity. The first significant outcome of this research is that many variables are associated with well-being and productivity. The strength of the associations ranges from small to large (Cohen 1992). Also, well-being and productivity were positively associated. This implies that neglecting people’s well-being might also negatively impact productivity. Together, our findings support Ralph et al.’s (2020) recommendation that pressuring employees to keep the average productivity level without taking care of their well-being will likely lower productivity. However, we would also like to present an alternative interpretation: having productive employees will strengthen their sense of achievement and improve their well-being. This alternative interpretation derives from the fact that we did not find any causal relations. The absence of causal relations is partly driven by most variables’ high stability over time, which leaves little variance to be explained by any other variable. However, it can also imply that many variables influence each other, such as well-being and productivity. Further, some of our predictors can likely be hierarchically organized. For example, introversion can lead to loneliness, resulting in more anxiety, which can cause lower levels of well-being. It will be interesting for future research to develop hierarchical models of emotions and other variables we used as predictors. This would further improve our understanding of the predictors of well-being and productivity. Since we started this investigation only after the pandemic had begun, we could not contrast our results with non-remote pre-pandemic settings. Instead, we are providing evidence-based findings to help software engineers and organizations to work remotely.

In the following, we focus on practical recommendations based on the most reliable predictors of well-being and productivity that we identified through our longitudinal design. Table 7 summarizes the findings and recommendations for the predictors of well-being: the need for autonomy, stress, daily routines, social contacts, need for competence, extraversion, and quality of sleep. Table 8 summarizes those for distractions and boredom, which are related to productivity.

Table 7 Summary of key findings & recommendations for Well-Being
Table 8 Summary of significant key findings & recommendations for Productivity

Persistent high stress levels are related to adverse outcomes in the workplace (Bazarko et al. 2013) and for people’s well-being. To reduce stress, it could be helpful for some people to practice mindfulness-based stress reduction, as Bazarko et al. (2013) recommend. Such practices can be performed at home, and participating in such a program can lead to lower stress levels and a lower risk of work burnout. Grossman et al. (2004) recommended other stress reduction methods. Moreover, Naik et al. (2018) found that mindfulness meditation practices, slow breathing exercises, mindful awareness during yoga postures, and mindfulness during stressful situations and social interactions can reduce stress levels. Together, the results of these studies suggest that mindfulness practices, even when performed at home, can reduce stress, which could also improve software engineers’ well-being while being quarantined. While mindfulness practices seem to be effective methods to impact people’s well-being positively, they might not work for everyone. For some individuals, getting physically active by exercising or going for a run, taking time to disconnect and read a book, letting loose while dancing, or even getting creative and painting might have the same or a similar effect. For example, our exploratory analysis revealed that the coping strategy self-distraction (e.g., reading or watching a movie to unwind from work) was more frequently used by women, which is in line with the literature (Solomon et al. 2005). This may indicate that self-distraction as a coping strategy is more effective for women than for men, although more frequent use does not by itself establish effectiveness. More research is therefore needed to identify adequate coping strategies for men as well.

As part of the overall quality of life, the quality of social contacts has a significant impact on people’s well-being. Therefore, employers should be interested in enabling their employees to spend time with people they value and encourage them to build strong, meaningful relationships within their work environment. Creating a virtual office (e.g., using an online working environment such as ‘Wurkr’) gives people the impression of sharing a physical workspace online, so that they can communicate more comfortably and work together from anywhere. For example, to simplify conversations, the Slack plugin ‘Donut’ (slack 2020) randomly connects employees for coffee breaks to get to know each other better by spending some time chatting virtually. Besides, our finding that quality of social contact, but not living alone, is associated with well-being is in line with the literature. Quality of contact with one’s partner and family independently predicted depression negatively, whereas the frequency of these contacts did not (Teo et al. 2013). Together, this suggests that findings from the literature can overall be generalized to people being quarantined.

Organizing the day in a structured way at home appears to be beneficial for software professionals’ well-being. People tend to overwork when working remotely (Buffer 2020). This could be further magnified during quarantine where usual daily routines are disrupted, and thus working might become the only meaningful activity to do. Therefore, it is essential to develop new daily routines not to be entirely absorbed by work and prevent burnout (Brooks et al. 2020). Hence, scheduling meetings and designating time specifically for hobbies or spending time with family and friends is helpful while working from home and helps to satisfy employees’ needs for social contacts.

To fulfill people’s need for autonomy, it is necessary to allow employees to act on their values and interests (Wang et al. 2016). While coordinating collaborative workflows and managing projects remotely comes with its challenges (Buffer 2020), it is crucial for remote workers to have flexibility in how they structure, organize, and perform their tasks (Wang et al. 2016). It is, therefore, helpful to delegate work packages instead of individual tasks. This makes it easier for individuals to work self-directedly and thus to fulfill their need for autonomy.

To fulfill employees’ need for competence, it is necessary to provide them with the opportunity to grow personally and advance their skill set (Legault et al. 2006). Two of the most required and in-demand skills in remote work environments are communication skills and the ability to use virtual tools, such as presentation tools or collaborative project planning tools (Buffer 2020). Raising awareness of the unique requirements of virtual communication is crucial for a smooth working process. Working remotely requires specific communication skills, such as mindful listening (McManus et al. 2006) or asynchronous communication, which allows people to work more efficiently (Järvelä and Häkkinen 2002). Collaborative tools such as GitHub, Trello, Jira, Google Docs, Klaxoon, Mural, or Slack can simplify work processes and enable interactive workflows. Besides the training and development of employees’ specific virtual skill set, it is also recommended to invest in employees’ personal development within the company. Offering employees the opportunity to grow will evolve their role and strengthen their loyalty towards the employer and, therefore, employee retention (Kossivi et al. 2016).

Introverted software professionals seem to be more negatively affected by the lockdown than their more extraverted peers. This finding is counter-intuitive since extraverted people prefer more direct contacts than introverted people (Ludvigh and Happ 1974). Our interpretation of these results is that it is even more challenging for introverts to reach out to colleagues and friends when contact opportunities are more limited. This is because being introverted does not mean that there is no need for social contacts at all. In the office, introverts had opportunities to be involved with colleagues in both a structured and an unstructured fashion. At home, this is much more difficult, as they have to be more proactive to reach out to colleagues in a more formalized setting, such as an online collaboration platform (e.g., MS Teams). Accordingly, software organizations should regularly organize both formal and informal online meeting occasions, where introverted software engineers feel a lower entry barrier to participate.

Quality of sleep is also a relevant predictor for well-being. Although it might sound obvious, there is a robust association between sleep, well-being, and mindfulness (Howell et al. 2008). In particular, Howell et al. found that mindfulness predicts sleep quality, and quality of sleep and mindfulness predict well-being.

Distractions at home are a challenging obstacle to overcome while working remotely. Designating a specific work area in the home and communicating non-disturbing times with other household members are easy and quick first steps to minimize distractions at the workplace at home. Another obstacle that distracts remote workers more frequently is cyberslacking, which is understood as spending time on the internet for non-work-related reasons during working hours (Dictionary 2020). Cyberslacking and its contribution to distractions at home for remote workers were not included in this study but would be worth exploring in future research.

When people experience boredom, it makes them feel “[...] unchallenged while they think that the situation and their actions are meaningless” (Van Tilburg and Igou 2012, p. 181). Especially people who thrive in a social setting at work are in danger of being bored quickly while working in isolation from their homes. The recommendations enumerated above, such as assigning interesting, personally tailored, and challenging work packages, using collaborative tools to hold yourself accountable, and having social interactions while working remotely, also help reduce boredom at work. Ideally, employees are intrinsically motivated and feel fulfilled by what they do. If this is not the case over a more extended period, and the experienced boredom is not a negative side effect of being overwhelmed while being quarantined, it might be reasonable to discuss a new field of action and area of responsibility with the employee.

To conclude, working from home certainly comes with its challenges, of which we have addressed several in this study. However, at least software engineers appear to adapt to the lockdown over time, as people’s well-being increased and the perceived quality of their social contacts improved. Similar results were reported in a survey study of 2,595 remote workers in New Zealand (Walton et al. 2020). Walton et al. found that productivity was similar to or higher than pre-lockdown, and 89% of professionals would like to continue to work from home at least one day per month. That study also reveals that the most critical challenges were switching off, collaborating with colleagues, and setting up a home office. On the other hand, working from home led to a drastic saving of time otherwise allocated to daily commuting, a higher degree of flexibility, and increased savings. A range of further recommendations of effective self-guided interventions to tackle anxiety, depression, and stress is summarized by Fischer et al. (2020).

6.2 Threats to Validity

Limitations are discussed using Gren’s five-facets framework (Gren 2018).

Reliability

This study used a two-wave longitudinal design, where 96% of the initial participants, identified through a multi-stage selection process, also participated in the second wave. Further, the test-retest reliabilities were high, and the internal consistencies (Cronbach’s α) ranged from satisfactory to very good.

Construct Validity

We identified 51 variables drawn from the literature, and each was measured with a suitable instrument. Where possible, we used validated instruments. Otherwise, we developed and reported the instruments used. To support construct validity, we also reported the Cronbach’s alpha of all variables across both waves. Regarding the two dependent variables, we used a validated scale for well-being and developed a new one for productivity. We made this choice since it related well to the lockdown environment our participants were facing. Thus, we chose the Satisfaction with Life Scale for well-being, and productivity was operationalized as a proportion of time spent working and efficiency per hour, compared to the estimated regular productivity without the pandemic. However, we note that despite the many variables in our study, we still might have missed one or more variables that would have been relevant to our analysis.

Conclusion Validity

To draw our conclusions, we used multiple statistical analyses such as correlations, between-subject t-tests, multiple linear regressions, and structural equation modeling. To ensure reliable conclusions, we used conservative thresholds to reduce the risk of false-positive results. The thresholds depended on the number of comparisons for each test. Additionally, we did not include covariates, nor did we stop the data collection based on the results or perform any other practice associated with increasing the probability of false-positive results (Simmons et al. 2011). However, we could not draw any causal conclusion since all 20 SEM analyses provided non-significant results, using a threshold of significance that reduces the risk of false-positive findings. Also, we have not measured participants’ perception of the severity of the lockdown measures. Thus, we cannot test whether these perceptions moderate the associations we found. However, it is unlikely they would have impacted our findings, as depression and worries were found to be only weakly associated with perceptions of how the government and public reacted to the lockdown measures in spring 2020 (Fetzer et al. 2020). Further, we do not have sufficient participants from different countries in our sample to test whether objective government responses (i.e., the strictness of the lockdown; Hale et al. 2020) moderate the associations we found. With our data, we can only provide indirect evidence that this is unlikely to be the case: When comparing participants from the UK and the USA (the lockdown was stricter in the UK by the time we collected the data; Hale et al. 2020), we found few between-country mean differences. Nevertheless, we acknowledge that this is an open research question that we cannot fully answer with our data. Finally, we made both raw data and R analysis code openly available on Zenodo.

Internal Validity

The primary aim of the present study was to identify causal relations, but the study did not lead to any causal conclusion. We cannot say that the analyzed variables influence well-being or productivity or vice versa. We are also aware that our study relies on self-reports, limiting the study’s validity. Further, we adjusted some measures (e.g., productivity). Participants were not asked to report their perceived productivity directly but to make a comparison, from which we computed productivity afterward in our analysis. We also conducted an extensive screening process, selecting over 190 software engineers from the 483 initially suitable subjects, identified by a previous study of Russo and Stol, through a multi-stage cluster sampling strategy (Baltes and Ralph 2020). Typical problems related to longitudinal studies (e.g., attrition of the subjects over a long-term period) do not apply: The dropout rate between the two waves was low (4%). We ran this study towards the end of the lockdown of the COVID-19 pandemic in spring 2020. In this way, participants were able to report well-grounded judgments of their conditions. Waves were set two weeks apart, which ensured that lockdowns had not yet been lifted during the data collection of wave 2, while the waves were still far enough apart for the variables to show sufficient variability between the two time points. Since this was a pandemic, the surveyed countries’ lockdown conditions were similar (due to the WHO’s standardized recommendations). However, we did not consider region-specific conditions (e.g., severity of virus spread) and recommendations. Also, lockdown timing differed among countries. To control for these potential differences, we asked participants at each of the two waves whether lockdown measures were still in place and whether they were still working from home. Since all our participants answered both questions affirmatively, we did not exclude anyone from the study.

External Validity

We determined our sample size with an a priori power analysis. As with any longitudinal study, we designed this study to maximize internal validity (Kehr and Kowatsch 2015). Accordingly, we focused on finding significant effects rather than working with a representative sample of the software engineering population (as did Russo and Stol (2020), with N ≈ 500, where the research goal focused on the generalizability of results). Additionally, we made an effort to estimate to what extent our findings depend on the current situation and whether they would also be useful to inform researchers and practitioners interested in remote work in general, beyond exceptional circumstances and potentially beyond software engineers (i.e., knowledge workers in general). First, we also measured participants’ remote work experience in the past 12 months. This was uncorrelated with well-being and productivity, indicating that the extent to which people were working remotely before the lockdown was irrelevant. Second, we measured both generalized anxiety and COVID-19 specific anxiety. As reported in the subsection “Correlations”, generalized anxiety is more relevant for people’s well-being than COVID-19 specific anxiety. This suggests that our findings are at least partly independent of COVID-19: If people were terrified by COVID-19, COVID-19 specific anxiety would have been a stronger predictor than generalized anxiety. Third, many of our findings relate to the findings reported in the psychological literature. This study demonstrates that those findings also hold in a sample of professional software engineers while expanding the literature substantially through our design, including a large set of relevant variables.

7 Conclusion

The COVID-19 pandemic disrupted software engineers in several ways. Abruptly, lockdown and quarantine measures changed the way of working and relating to other people. Software engineers, in line with most knowledge workers, started to work from home with unprecedented challenges. Most notably, our research shows that high stress levels, the absence of daily routines, and the quality of social contacts are some of the variables most related to well-being. Similarly, low productivity is related to boredom and distractions at home.

We base our results on a longitudinal study, which involved 192 software professionals. After identifying 51 relevant variables related to well-being or productivity during a quarantine from the literature, we ran a correlation study based on the results gathered in our first wave. For the second wave, we selected only the variables correlated with at least a medium effect size with well-being or productivity. Afterward, we ran 20 structural equation modeling analyses, testing for causal relations. We could not find any significant relation and therefore cannot say whether the dependent variables are caused by the independent ones or vice versa. Accordingly, we ran several multiple regression analyses to identify unique predictors of well-being and productivity, where we found several significant results.

This paper shows that, on average, software engineers’ well-being increased between the two waves of our study. Also, there is a correlation between well-being and productivity. Out of 51 factors, nine were reliably associated with well-being and productivity. Based on our findings, we proposed some actionable recommendations that might be useful to deal with potential future pandemics.

Software organizations might start to experimentally ascertain whether adopting these recommendations will increase professionals’ productivity and well-being. Our research findings indicate that granting a higher degree of autonomy to employees might be beneficial, on average. However, while extended autonomy might be experienced positively by those with a high need for autonomy, it might be perceived as stressful by those who prefer structure. Because it is unlikely that any intervention will have the same effect on all people (since there is a substantial variation for most variables), it is essential to have individual differences in mind when exploring any intervention’s effects. Thus, adopting incremental interventions based on our findings, where organizations can get feedback from their employees, is recommended.

Future work will explore several directions. Cross-sectional studies with representative samples will test whether our findings are generalizable and provide a better understanding of the underlying mechanisms between the variables. We will also investigate the effectiveness of specific software tools and their effect on software engineering professionals’ well-being and productivity with particular regard to the relevant variables.