Abstract

In recent years, electronic services delivered via the World Wide Web have become increasingly important to governments. Substantial investments have been made to provide crucial services and information to citizens, businesses, and governmental institutions. This paper presents the development of a short questionnaire to measure user satisfaction with e-Government portals. After two validations of the instrument with cantonal e-Government portals in Switzerland, a final set of 15 items remained, which was tested with 2498 participants. The final version showed high internal consistency (Cronbach's α = .91), good item difficulties (.51 to .82) and discriminatory power coefficients (.49 to .81), as well as a moderate average homogeneity of .47. The survey was translated into five languages.

1. Introduction

In recent years, significant progress has been made in the development of e-Government services. The amount of information and the number of public services delivered online grow constantly, with many benefits for citizens, businesses, and governmental institutions. In this context, it is crucial to implement methods to measure, maintain, and optimize the quality of e-Government portals. It is particularly important to analyze how well such services are perceived from the users' point of view.

User satisfaction is often measured with online questionnaires. The main goal of this paper is to develop a short questionnaire that can be used to measure user satisfaction with e-Government portals. This survey is named ZeGo ("Zufriedenheit im e-Government", German for e-Government user satisfaction). For the scale construction, an exploratory approach was chosen; the item analysis is based on classical test theory.

2. Theoretical Background

2.1. E-Government and User Satisfaction

E-Government stands for the exchange of information over the Internet, providing governmental services and communication with citizens, businesses, and public institutions. According to Jaeger [1], three types of interaction can be distinguished in e-Government activities. Government-to-government (G2G) initiatives facilitate communication between parts of a government, leading to higher consistency and efficiency. Government-to-business (G2B) initiatives, on the one hand, raise businesses' awareness of opportunities to work with the government and, on the other hand, provide new business opportunities for governments; both sides profit from reduced costs and increased efficiency. Government-to-citizen (G2C) initiatives enhance citizens' involvement and facilitate their interactions with the government. Jaeger [1] notes that G2C interactions offer the widest range of information and services, their main purpose being improved relations between a government and its citizens. West [2] argues that the exchange is no longer one-way: instead of citizens being forced to go to government offices, the government can now reach citizens actively via the Internet, while citizens profit from online access at any time. In a survey undertaken by the European Commission [3], the most frequently mentioned reasons for citizens to use e-services are saving time and gaining flexibility.

A considerable number of publications report the development and testing of ways to enhance the usability of e-Government services. De Meo et al. [4], for instance, developed a multiagent system capable of suggesting to users the services most interesting to them. Other authors also propose ways to support the easier selection of e-services by users (e.g., [5–7]). Horst et al. [8] emphasize the role of risk perception and trust in the context of e-Government.

To ensure e-Government success, it is important to evaluate the effectiveness of the information and services offered. Reddick [9], as well as Janssen et al. [10], claims that most approaches analyze e-Government from a supply-side perspective, without focusing on demand-side interactions. According to Peters et al. [11], most studies still focus on the effectiveness of e-Government websites without considering user satisfaction as a main concept. Wang et al. [12] argue that most e-Government evaluations are carried out without reference to the behavior of the citizens using these systems. Stowers [13] mentions that user satisfaction is the least used metric in federal e-Government studies in the US, despite the fact that citizens' perception of the usefulness and ease of use of e-Government websites directly enhances their intention to continue using them [14]. There are, however, models for evaluating e-Government with a user-centric approach [12], and user satisfaction is recognized as an important factor in most models for developing and maintaining e-Government projects (e.g., [15–17]). Still, there is little research showing how user satisfaction can be measured in the context of e-Government. These circumstances call for a better evaluation of user satisfaction with e-Government portals and a more precise consideration of the construct of "user satisfaction".

The construct "user satisfaction" in relation to computers is most often described as an affective attitude. Bailey and Pearson [26, page 531] define user satisfaction as the "sum of one's positive and negative reactions to a set of factors." Doll and Torkzadeh [27, page 261] describe it as "the affective attitude toward a specific computer application by someone who interacts with the application directly." Current authors also describe it as an affective reaction to a given information system [18]. For this paper, user satisfaction is regarded as a "psychological tendency that is expressed by evaluating a particular entity with some degree of favor or disfavor" [15, page 296]. According to Huang et al. [19], user satisfaction is the construct most often used to measure the success of an information system. Nevertheless, there is little empirical research showing under which conditions user satisfaction arises (though see [20]).

According to Fishbein [21], an attitude can be understood as the sum of the products of a person's expectations and evaluations. To measure user satisfaction, it would therefore theoretically be necessary to know all expectations. Web users have formed certain expectations regarding efficiency and effectiveness [22], but also regarding, for example, web design [23]. If these expectations are fulfilled, it can be assumed that users will be satisfied with the system. In this concept, user satisfaction is regarded as the output of a comparison between expectations and the perceived performance of the application [24, 25]. It is therefore expected that user satisfaction with e-Government sites is achieved if the expectations of their users are fulfilled. The items of this questionnaire focus on capturing the cognitive components of the user's attitude toward e-Government services.
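Expressed formally, Fishbein's expectancy-value model is usually rendered as

    A_o = \sum_{i=1}^{n} b_i e_i

where A_o denotes the attitude toward object o, b_i the strength of belief (expectation) i held about the object, e_i the evaluation of attribute i, and n the number of salient beliefs. (This is the standard textbook rendering of the model; the notation is ours, not taken from [21].)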

2.2. User Satisfaction Questionnaires

This section gives a brief overview of questionnaires that were developed to measure user satisfaction in different contexts.

Bailey and Pearson [26] developed a tool with 39 items to gather information about user satisfaction with computers. The questionnaire, however, is more than 20 years old. At the time it was constructed, computers had very limited capabilities and were mostly used for data processing; several items therefore deal solely with the satisfaction of data-processing personnel. Technological advancements and the development of interactive software led to the need to provide usable interfaces. Doll and Torkzadeh [27] developed a questionnaire with 12 items designed to measure end-user satisfaction with specific applications. They postulate that user satisfaction is composed of five factors (content, accuracy, format, ease of use, and timeliness). Harrison and Rainer [28] confirmed the validity and reliability of this tool and showed that it could be used as a generic measuring tool for computer applications. Lewis [29] developed and validated another questionnaire with 19 items, the Computer System Usability Questionnaire (CSUQ). He regards usability as the prime factor influencing user satisfaction (thus the name). The analysis of his data revealed three factors influencing user satisfaction: system usefulness, information quality, and interface quality.

The focus of most satisfaction scales lies in the development of application-independent instruments that can be used in various contexts. Considering the broad range of tools and applications, this is often a very difficult task. However, it can be assumed that users have context-specific expectations that arise depending on the information system used. In recent years, satisfaction scales and tools have been developed for specific areas such as online shopping [30–33], company websites [34], business-to-employee systems [19], mobile commerce interfaces [35], knowledge management systems [36], intranets [37], mobile Internet [38], mobile banking [39], ERP systems [40], and the information systems of small companies [41].

3. Development and First Validation

This section describes how the first version of ZeGo was developed and validated.

3.1. Development of ZeGo

Based on the screening of theoretical approaches and empirical data, a first item pool was generated by the authors and a full-time e-Government manager. These items were screened and consolidated into a first draft of ZeGo. The core element is a short questionnaire containing 15 items designed to measure user satisfaction with e-Government portals. The version shown in Table 1 was used for the first validation of ZeGo.

Four questions (1, 11, 12, and 15) of ZeGo are open-ended, and question 14 is binary; because of their scale type, these items are excluded from the statistical analysis. For all other questions, Likert scales were used (see Table 1): respondents specify their level of agreement or disagreement with a positively worded statement. Here, a five-point scale with labeled extreme points was used (1 = does not apply at all, 5 = applies completely). A higher number expresses a higher level of agreement and thus higher satisfaction. Interval measurement is assumed for the rating scale, allowing the corresponding statistical analyses; assuming interval measurement for a rating scale without prior empirical validation is a widely used research practice [42].

To ensure high reliability and validity, the use of between five and seven categories for a Likert scale is recommended [43]. With the five-point scale, participants have two options for a positive attitude, two for a negative one, and one neutral option. According to Mummendey [44], participants choose the middle option for multiple reasons: (1) they do indeed have a neutral attitude, (2) they do not know how to answer the question, (3) they think that the question is irrelevant, (4) they refuse to answer, or (5) they want to express their dislike of the question (protest answer). Therefore, an additional "I don't know" option was introduced to the instrument, diminishing the negative impact of some of these undesired answers.

It is crucial that participants do not have to spend more than 10 to 15 minutes answering the survey, in line with recommendations by Batinic [45]. Referring to similar instruments [27, 29], a maximum of 15 items was chosen.

3.2. Methodology

To validate ZeGo, it was implemented as an online survey and tested in cooperation with the portal of the canton of Basel, Switzerland. The e-Government website of Basel had about 30,000 unique visitors per month. Participants were recruited via a banner placed on the corresponding website http://www.bs.ch/. The survey started with a short introductory text that highlighted the importance of participants' feedback, the type of questions, the length of the survey, and the anonymity of the enquiry. On the following pages, all 15 questions (see Table 1) were presented on separate screens. When submitting incomplete pages, users were prompted by posterior error messages [46] to choose one of the given options. After the user satisfaction questions had been answered, the survey ended with nine demographic items. The demographic part was put at the end to avoid the user satisfaction answers being influenced by concerns that feedback could be backtracked [47]. The first version of ZeGo was fielded in January 2005 for one month.

In total, 476 citizens participated in the survey, yielding 462 valid responses (14 responses had to be excluded for various reasons; see Section 3.3). Regarding gender distribution, the first version of ZeGo was returned by 71.7% male and 28.3% female participants. This represents a slight overrepresentation of male participants: 57% of Swiss Internet users are male and 43% female (Bundesamt für Statistik, 2006). The mean age was 44 years (SD = 13).

3.3. Results

Before the data were analyzed, 14 participants had to be excluded. Eight responses were discarded because the "I don't know" option had been chosen for more than half of the items. Six people were excluded because they answered all 13 items exclusively with the best or the worst item score (in these cases, we assume that the participants had no real interest in the survey and just wanted to take part in the raffle). The sample for the item analysis therefore consists of 462 participants. Table 2 gives an overview of the total missing values for each analyzed item after exclusion of the 14 participants. Item 15, with 137 (29.7%) missing cases, differs markedly from the rest: participants with no knowledge of other governmental sites could not answer the comparison item 15, which is reflected in its high number of missing values.

To avoid the sample size reduction caused by listwise and pairwise deletion, the Expectation-Maximization (EM) algorithm was used to replace missing values. EM is an iterative method that derives the expectation of the missing data based on estimates of the variables and computes the parameters maximizing this expectation. The replacement of missing values with EM has proven to be a valid and reliable method and outperforms listwise and pairwise deletion in many respects [48, 49]. There are virtually no differences between the "all values" and the "EM" values. As mentioned before (see Section 3.1), not all items can be used for this item analysis; there are open-ended, binomial, and demographic items, so only 11 questions are included in the item analysis (see Table 3).
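As an illustration, the following is a minimal EM-style imputation for approximately multivariate-normal item scores: it alternates between estimating the mean and covariance (M-step) and replacing missing cells with their conditional expectations (E-step). This is a sketch of the general idea, not the exact routine used in the study, and all names are ours:

    import numpy as np

    def em_impute(X, n_iter=100, tol=1e-6):
        """EM-style imputation for a 2-D array X with np.nan marking missing
        cells, assuming rows are draws from a multivariate normal distribution."""
        X = np.array(X, dtype=float)
        missing = np.isnan(X)
        # Initialize missing cells with the column means (ignoring NaNs).
        col_means = np.nanmean(X, axis=0)
        X[missing] = np.take(col_means, np.nonzero(missing)[1])
        for _ in range(n_iter):
            X_prev = X.copy()
            mu = X.mean(axis=0)              # M-step: update mean ...
            sigma = np.cov(X, rowvar=False)  # ... and covariance estimates
            for i in range(X.shape[0]):      # E-step: conditional expectations
                m = missing[i]
                if not m.any():
                    continue
                o = ~m
                # E[X_m | X_o] for a multivariate normal distribution
                reg = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
                X[i, m] = mu[m] + reg @ (X[i, o] - mu[o])
            if np.abs(X - X_prev).max() < tol:
                break
        return X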

For interval-scaled item responses, it is advisable to calculate the discriminatory power as the product-moment correlation of the item score with the total test score [50]. The discriminatory power of an item thus describes the item's correlation with the total score of the test. Cronbach's α describes the extent to which a group of indicators can be regarded as measurements of a single construct (here: user satisfaction). Table 4 lists the discriminatory power and Cronbach's α for each item. The discriminatory coefficients range between .38 (item 6) and .78 (item 10) with a mean of .60 (SD = .12). Three items show a coefficient below .50 (items 2, 6, and 8). According to Borg and Groenen [51], the lowest acceptable discriminatory power is .30; no item falls below this threshold. All items are in an acceptable to good range.
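As an illustration of how these coefficients are obtained, the following sketch computes Cronbach's α and the item-total correlations for a (participants × items) score matrix. It is a minimal implementation under our own naming; the corrected variant, which removes the item from the total score, is included because it is common in practice:

    import numpy as np

    def cronbach_alpha(X):
        """Cronbach's alpha for an (n_participants, n_items) score matrix."""
        k = X.shape[1]
        item_vars = X.var(axis=0, ddof=1).sum()
        total_var = X.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars / total_var)

    def discriminatory_power(X, corrected=False):
        """Product-moment correlation of each item with the total test score.
        With corrected=True the item itself is removed from the total first."""
        total = X.sum(axis=1)
        r = []
        for j in range(X.shape[1]):
            t = total - X[:, j] if corrected else total
            r.append(np.corrcoef(X[:, j], t)[0, 1])
        return np.array(r)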

The homogeneity analysis examines whether all items of ZeGo measure the same construct ("user satisfaction") and whether there are items that overlap (i.e., measure similar aspects of the construct). If the items of a test correlate with each other, it can be assumed that they measure similar aspects of the common construct. This can be explored in the intercorrelation matrix, which shows significant correlations for all items, with no negative correlations.

The intercorrelations of the 11 items are relatively moderate. The average homogeneity index for the scale is .41, and the homogeneity indices for the individual items range from .27 to .52, with the lowest values for items 2, 6, and 8. One explanation for the relatively moderate indices could lie in the complexity of the construct "user satisfaction", which requires the items to be heterogeneous in order to cover the whole spectrum.
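Under a common definition, an item's homogeneity index is its average correlation with the remaining items, and the scale's index is the mean of these values (our reading; the paper does not spell out the formula). A minimal sketch:

    import numpy as np

    def homogeneity_indices(X):
        """Average inter-item correlation per item for an
        (n_participants, n_items) score matrix."""
        R = np.corrcoef(X, rowvar=False)      # item intercorrelation matrix
        np.fill_diagonal(R, np.nan)           # exclude the self-correlation
        per_item = np.nanmean(R, axis=1)      # homogeneity index per item
        return per_item, np.nanmean(per_item) # item indices and scale average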

Cronbach's α for the first version of ZeGo is relatively high, indicating good reliability for this instrument. Table 4 shows that the internal consistency would increase the most by exclusion of item 6.
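The "α if item deleted" values reported in Table 4 can be reproduced with a loop of this kind (a minimal sketch with our own function names, restating the α formula so the snippet is self-contained):

    import numpy as np

    def cronbach_alpha(X):
        k = X.shape[1]
        return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                              / X.sum(axis=1).var(ddof=1))

    def alpha_if_deleted(X):
        """Cronbach's alpha recomputed with each item removed in turn."""
        return np.array([cronbach_alpha(np.delete(X, j, axis=1))
                         for j in range(X.shape[1])])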

3.4. Discussion

The first validation of ZeGo shows promising results. At the same time, it becomes clear that some problematic items need to be modified or deleted. The discussion of the scale and of the individual items forms the basis for the second version of ZeGo (see Section 4).

There is a clear tendency for responses to lie in the upper part of the five-point scale. This finding is not surprising: it can be expected that Basel, as one of the biggest cities in Switzerland, would develop an acceptable e-Government site. For ZeGo, this means the instrument must differentiate well in the upper scale range.

Here, only the problematic items will be discussed. An item can be regarded as being problematic if the statistical parameters show insufficient values (see Section 3.3).

Item 6
The statistical parameters of item 6 are somewhat weaker than those of the rest: it shows low homogeneity, discriminatory power, and reliability. It can be argued that design and colors are not necessarily connected to the other user satisfaction questions: a participant can be satisfied with an e-Government site while disliking its design and colors. However, this item is useful if design and colors are so poor that user satisfaction decreases. Because of this, and despite the weaker statistical values, the item was provisionally retained for the second validation.

Item 8
This item shows relatively low homogeneity and discriminatory power. Furthermore, it has 12.3% missing values. It seems that the completeness of a website is difficult for users to assess. For the same reasons as for item 6, the question was provisionally retained for the second validation.

Items 9 and 10
The statistical parameters of these two items are good. A qualitative analysis of the registered comments raised the issue of whether the questions suffice to examine content quality. More information about the content would facilitate the work of webmasters in improving content-specific website aspects. Therefore, two new items were added for the second version: "The information found on the website http://www.website.com/ is credible" and "I know what content to expect on http://www.website.com/." Credibility as well as expectations are important aspects of user satisfaction. With an average completion time of 6.25 minutes, there was no reason not to add two questions to ZeGo.

Items 14 and 15
These two items stand out due to the high number of users who did not answer them. Both were intended to investigate whether other e-Government sites are better, but it seems that many users do not know other e-Government portals. Additionally, both questions cover nearly the same issue as item 13. Due to this similarity and the high percentage of missing answers, these items were discarded from the second version of ZeGo.

The item order was also reconsidered. To facilitate filling out the survey, all Likert-scale items were grouped together, and all open-ended questions were moved to the end. Only item 11 was placed at the beginning to ease the start of ZeGo: assuming that every user pursues a goal on a website, writing down these goals was judged an easier opening task than reflecting on qualitative aspects of the website.

The first validation of the 15 ZeGo items led to the deletion of two items (14 and 15) and the development of two new questions concerning the content. The second version of the scale therefore again contains 15 items, presented in an optimized order.

4. Modification and Second Validation

This section describes how the second version of ZeGo was developed and validated.

4.1. Modifications

Based on the validation of the first version (see Section 3.3), the second version of ZeGo once again contained 15 items (see Table 5).

4.2. Methodology

To validate the second version of ZeGo, it was again implemented as an online survey, this time in cooperation with all 26 cantons of Switzerland. Participants were recruited in all cantons via banners or text links placed on the respective e-Government websites.

Due to the multilingual situation in Switzerland, all questions were translated into five languages: English, French, German, Italian, and Rhaeto-Romanic. The English version of the survey is included in this paper; all other language versions can be accessed at http://www.zego-study.ch/. Due to the small sample sizes of the French, Italian, English, and Rhaeto-Romanic versions, only the German version was considered for the statistical validation. The survey started with a short introductory text that highlighted the importance of the participants' feedback, the type of questions, the length of the survey, and the anonymity of the enquiry. On the following pages, all 15 questions (see Table 5) were presented one by one, and the survey ended with some demographic items. The ZeGo survey was launched in October 2006. Originally, each canton was to gather data for four weeks, but due to the large variance in the number of participants, it soon became clear that four weeks would not be sufficient for many cantons. The deadline was therefore extended until at least 100 participants were registered, leading to a timeframe of 4 to 10 weeks per canton.

Participants who provided no answers (most commonly people who only wanted to take part in the raffle) were excluded from the analysis. A total of 2524 completed surveys remained in the pool. Regarding gender distribution, the sample was composed of 63.3% male and 26.8% female participants, while 9.9% did not indicate their sex. This again led to a slight overrepresentation of male participants (see Section 3.2). The mean age was 40 years (SD = 13).

4.3. Results

In total, 2524 responses were registered. The missing data consisted of unanswered items and, in particular, "I don't know" statements. Fifteen participants were discarded because they had not answered at least half of the items. For the other 11 excluded participants, it was assumed that they did not fill out the questionnaire seriously, because they answered all 12 items exclusively with the best or the worst item score. The sample for the item analysis therefore consists of 2498 participants. Table 6 gives an overview of the total missing values for each item after the exclusion of these 26 participants. Again, the Expectation-Maximization (EM) algorithm was used to replace missing values, leading to virtually no differences between the raw and the EM-completed values.

Most missing values are due to the "I don't know" option. The two items with the most missing values are item 8 with 277 missing cases (11.1%) and item 13 with 168 missing cases (6.7%). This is not surprising, because these two questions assume that the user has explored many aspects of the website, and not all users had the opportunity or time to do so. However, the missing rates are lower than in the first validation.

The average individual mean scale score for the second version of ZeGo is 3.86 (SD = .71); the lowest mean is 1.08 and the highest 4.93. Figure 1 shows the mean scores for items 2 to 13. The distribution of the mean scale score is again shifted toward the upper end of the scale (skewness = −1.01) and shows a sharp peak (kurtosis = .71). Kolmogorov-Smirnov and Shapiro-Wilk tests confirm that the acquired data are not normally distributed. The means for items 2 to 13 lie within a range of 0.99, bounded by the lowest score (item 8) and the highest score (item 11), with an average standard deviation of .99. Table 7 gives a brief overview of the most important statistical parameters.
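Distribution checks of this kind are available in standard statistics libraries; the following sketch shows how they could be run, with simulated placeholder data standing in for the actual ZeGo scores:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    # Placeholder data standing in for the 2498 individual mean scale scores.
    scores = np.clip(rng.normal(3.86, 0.71, size=2498), 1, 5)

    print("skewness:", stats.skew(scores))
    print("kurtosis:", stats.kurtosis(scores))  # excess kurtosis (normal = 0)

    # Kolmogorov-Smirnov test against a fitted normal distribution
    ks_stat, ks_p = stats.kstest(scores, "norm",
                                 args=(scores.mean(), scores.std(ddof=1)))
    print("Kolmogorov-Smirnov:", ks_stat, ks_p)

    # Shapiro-Wilk test of normality
    sw_stat, sw_p = stats.shapiro(scores)
    print("Shapiro-Wilk:", sw_stat, sw_p)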

For most items, the frequency distributions are slightly asymmetric, with most responses at the upper end of the scale, indicating negative skewness. Regarding kurtosis, most items are close to zero, with four exceptions: items 2, 7, 9, and 11 show sharper peaks than a normal distribution.

The item difficulty indices shown in Table 7 are again computed taking the item variances into account. They range from .51 (item 8) to .82 (item 2); the average index is .64 (SD = .10).
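The paper does not spell out its variance-based difficulty formula; as a point of reference, the following sketch computes the widely used normalized-mean difficulty index for multi-point items (function name and scoring range are ours, and this is not necessarily the variance-corrected variant used in the study):

    import numpy as np

    def item_difficulty(X, low=1, high=5):
        """Difficulty index per item for Likert items scored from `low` to `high`:
        the mean item score rescaled to [0, 1]. Higher values indicate more
        agreement, i.e., an 'easier' item."""
        X = np.asarray(X, dtype=float)
        return (X.mean(axis=0) - low) / (high - low)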

Table 8 lists the discriminatory power and Cronbach's α for each item of the second version. The coefficients range from .49 (item 2) to .81 (item 10) with a mean of .65 (SD = .12). This time, no item shows values that are too low or too high.

Regarding homogeneity, the intercorrelation matrix (see Table 9) shows significant correlations for all items, with no negative correlations. The average homogeneity index for the second version is now .47 (SD = .08), and the homogeneity indices for the individual items range from .36 to .57. Again, the relatively moderate indices are explained by the complexity of the construct "user satisfaction", which requires heterogeneous items to cover the whole spectrum.

Cronbach's α for the second version of ZeGo is again relatively high (α = .91), confirming the good reliability of the survey. Table 8 shows that no item deletion would bring a relevant improvement in internal consistency.

4.4. Discussion

The second validation of ZeGo reveals a stable measuring instrument. All statistical parameters (item difficulties, discriminatory power, homogeneity, and internal consistency) are within reasonable to good ranges. Some items merit critical discussion.

Item 2
The statement of item 2 refers to future use of the governmental site: "In the future, I will use http://www.website.com/ again to complete similar tasks." The difficulty index is relatively high and the discriminatory power moderate. It seems that some users marked a high value even though they were not very satisfied with the website. This suggests that many users have no alternative to the website: they are forced to use the portal of their canton even if they are unsatisfied. Considering that the item provides information about whether the user plans to use the website again, and that its statistical values are acceptable, it is suggested that item 2 remain in the item pool.

Item 6
The statistical values of item 6 are still somewhat weaker than those of the rest, once again showing relatively low homogeneity, discriminatory power, and reliability. The retest with more than one website led to improved parameters; it is therefore suggested to leave the item in the instrument.

Item 8
This item had low statistical parameters in the first validation. In the second validation, nothing speaks for its elimination from ZeGo except the relatively high number of missing values. Therefore, item 8 remains in the item pool.

The first validation showed the necessity of adding two new items, intended to examine the content more precisely.

Item 11
Credibility is a crucial subject for e-Government websites, but it seems to concern aspects that go beyond "user satisfaction" and probably has only an indirect influence on the construct. This could explain the relatively low homogeneity index. The item will remain in the item pool because its statistical parameters are good and it provides a first insight into the credibility of e-Government portals.

Item 12
Judging by its statistical parameters, item 12, which asks whether expectations regarding the content of e-Government sites are fulfilled, is a good and justifiable question for ZeGo.

5. Conclusions

Both validations of ZeGo show high Cronbach's α values, evidence of excellent internal consistency. The homogeneity indices increased to a moderate level in the second validation. Given that the construct of "user satisfaction" is complex and implies heterogeneous items, these values can be regarded as satisfactory. Thus, the overall reliability and validity of ZeGo are good. The items were revised, making it very likely that the most important aspects of user satisfaction with e-Government services were covered, indicating good content validity. Objectivity with respect to experimenter independence is not an issue in an online study. The validation of ZeGo appears to be government independent: the e-Government websites of the 26 cantons differ, sometimes considerably, and nevertheless show similar statistical parameters. In this paper, a clear and identifiable need of the citizen-government relationship was addressed: how can user satisfaction with e-Government services be measured? With an explorative approach, 15 items were generated and tested. The first validation showed that two items had to be excluded; two new items concerning the content were added, and the resulting measuring instrument was validated again, turned out to be stable, and can be recommended for use.