Enhancing Usability Assessment with a Novel Heuristics-Based Approach Validated in an Actual Business Setting

We have long been committed to improving usability evaluation, and one of the proposals we have worked on the most is the use and improvement of the Heuristic Evaluation (HE) technique. With this in mind, we proposed an improvement which was tested in an experiment. This article describes an experiment carried out in a real business and professional context. Fifteen usability experts from a reputable company evaluated eight websites (four supermarket and four bank platforms) using our HE proposal for the first time in real-world scenarios. The experiment analyzed two main aspects: firstly, whether individual or group evaluations affect the final result, and secondly, whether the heuristic evaluation technique is effective in a real business and professional context. Regarding the Usability Percentage (UP) metric, the results indicate that there was little difference between group and individual evaluations: the mean UP for group evaluations was 57.88%, while the mean UP for individual evaluations was 56.66%. The experiment provided sufficient information to suggest a new version of our HE methodology, specifically designed to improve results in real-life contexts. Furthermore, the experiment's findings support the proposal of this new methodology, which is better suited to the business environment.


Introduction
Studying everything related to the Human-Computer Interaction discipline provides significant benefits to the design of any interactive system. This is because everything that is built must ultimately interact with a human at some point Dix et al. [2003] Gerlach and Kuo [1991]. Concepts such as Usability or User eXperience (UX) have become as essential as the functionality of any technology-mediated system. Usability is an internal quality of interactive systems defined as "the ability of software to be understood, learned, used and appealing to the user, under specific conditions of use" Bevan et al. [1991]. User eXperience (UX) refers to the overall perception of a user when interacting with a product, system or service, and how that interaction impacts their emotions, attitudes and satisfaction Allam et al. [2013]. In this context, companies are increasingly adopting User-Centered Design (UCD) methodologies to improve their products through Quality of Use ISO/IEC [2020] ISO/IEC [2017]. In addition, new UX consultancy firms are emerging worldwide to assist in these tasks and guide online businesses.
Usability evaluation techniques emerged many years ago, and are still evolving, to find mechanisms for measuring what usability and UX mean in terms of the real quality of use of interactive systems Fernández and Macías [2021] Goldbort [2016]. The justification for this research stems from the fundamental importance of Human-Computer Interaction (HCI) in the design of interactive systems. Given the increasing adoption of User-Centered Design (UCD) methodologies by companies to enhance their products' Quality of Use, there is a growing need for effective usability evaluation techniques. These techniques aim to measure the real quality of use of interactive systems and are continuously evolving to meet the demands of modern digital environments.
In this work, we focus on one of the best-known and most widely used techniques, Heuristic Evaluation (HE) Nielsen [1994] Nielsen and Molich [1990]. We take one of these proposals (developed by Granollers Granollers [2018b] Granollers [2018a], and mentioned in the latest edition of the Interaction Design book Preece et al. [2015] as a highly interesting proposal) and carry out a set of experiments in a business context in order to validate and improve the above-mentioned method.
The article is organised as follows: firstly, the background and context of the study are presented, followed by the methodology, the results obtained in the evaluation and a brief discussion. The final part presents potential threats to the validity of the research, conclusions and future work.

Background and related work
Technological advances, increased market competition and more sophisticated customer expectations have made usability, once a luxury, a necessity. In this context, HE is a very useful method because of its effectiveness and the large body of studies, methodological variations and new proposals that have appeared to make usability evaluations more and more effective in several contexts: web pages Bonastre and Granollers [2014], charts of LIS journals Alcaraz et al. [2021b], Augmented Reality (AR) systems Derby and Chaparro [2021] and even Virtual Reality (VR) interfaces Patnaik and Adrian [2022], Cheiran et al. [2021].
However, several authors have explored how usable heuristics are when a group of UX experts carries out an evaluation Mutlu [2023], or how easy the heuristics are to understand so that evaluators can judge the system accurately Hvannberg et al. [2006]. In this sense, we have been working for a long time on related aspects to improve usability evaluations. In particular, HE is one of the techniques in which we have been most involved. Members of our research group wrote the chapter "Heuristic evaluation" in the book "Human-Computer Interaction" [González, 2001], as well as several research works in which we studied how to improve the heuristic evaluation technique. First, we used classical heuristics to perform evaluations of Spanish university websites Lorés J [2005] Gonzalez MP [2008] and e-commerce environments Pascual-Almenara and Granollers-Saltiveri [2021]. Subsequently, we created lists of heuristic principles to analyze static graphs Alcaraz et al. [2021a]. Then, we analyzed and unified the heuristics of Nielsen and Tognazzini in a new proposal Granollers [2018b], which allowed us to obtain a more generic list of heuristics Granollers [2010b] or one more directed to e-commerce environments Granollers [2010a].

Methodology of Heuristic Evaluation
In this work, we analyse whether this new heuristic evaluation proposal (developed by Granollers Granollers [2018b]) works well in a real business and professional context. The methodology proposed by Granollers has the following characteristics: a complete list of 15 principles, as mentioned before, resulting from analyzing and synthesizing the Usability Heuristic Principles for the Design of User Interfaces by J. Nielsen and the Interface Design Principles by B. Tognazzini (see Table 1). The selection of these heuristics stems from a strategic combination of Nielsen's focus on system usability and Tognazzini's emphasis on interaction design. By integrating both perspectives, a more comprehensive and optimized list of criteria for evaluating the usability and interaction of digital systems can be achieved. This approach ensures a holistic assessment that accounts for both usability principles and design elements, thereby enhancing the effectiveness of heuristic evaluation in assessing digital system performance.
From the analysis of Nielsen's and Tognazzini's lists, a total of 60 specific questions was obtained.
Each question is scored with one of only four answers: "Yes" (value 1), "Yes, but some cases missing" (value 0.66), "Not always" (value 0.33) and "No" (value 0).
Questions answered "Not applicable" or "Not a problem" are not computed, i.e. it is as if those questions did not exist. Questions answered "Warning (impossible to check)" are counted, but no marks are assigned to them (see Table 2).
Finally, with the values of all the answers we obtain a value called the Usability Percentage (UP), which gives a global idea of the usability level of the analyzed interface.
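The scoring rules above can be sketched in code. The following is a minimal illustration of the calculation, not part of the original methodology's tooling; the function name and answer labels are our own choices:

```python
# Sketch of the Usability Percentage (UP) calculation described above.
# Answer values follow the paper's four-point scale.
ANSWER_VALUES = {
    "Yes": 1.0,
    "Yes, but some cases missing": 0.66,
    "Not always": 0.33,
    "No": 0.0,
}

def usability_percentage(answers):
    """Compute UP from a list of answer strings.

    - "Not applicable" / "Not a problem" answers are excluded entirely.
    - "Warning (impossible to check)" answers count toward the number of
      questions but contribute no marks.
    """
    excluded = {"Not applicable", "Not a problem"}
    counted = [a for a in answers if a not in excluded]
    if not counted:
        return 0.0
    score = sum(ANSWER_VALUES.get(a, 0.0) for a in counted)
    return 100.0 * score / len(counted)

# Example: two "Yes", one "Not always", one warning, one excluded answer.
example = ["Yes", "Yes", "Not always",
           "Warning (impossible to check)", "Not applicable"]
print(round(usability_percentage(example), 2))  # → 58.25
```

In the example, four of the five answers are counted (the warning contributes no marks but enlarges the denominator), giving (1 + 1 + 0.33 + 0) / 4 = 58.25%.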
The resulting heuristic set had never been tested before, and this document shows how it was used in a real business environment.

Study context
Sperientia [studio-lab] 1 , a research laboratory in User Experience headquartered in Mexico and specialized in evaluating and researching the user experience of digital products and services, agreed to participate in this study as it could be useful for improving their work. Fifteen usability experts were organized into three teams or "labs 2 " (LabX, LabY, LabZ), each consisting of between 3 and 6 experts; only senior usability experts with 3-5 years of experience were selected. The number of experts in each lab depended on the number of people working at each of the company's sites. The experts were responsible for the different evaluations, the discussion of the results and the final surveys. The experiment consisted of evaluating 8 websites: four supermarket platforms (HEB, LaComer, Walmart and Chedraui) and four bank platforms (Santander, BBVA, Banorte and HSBC), covering two relevant sectors, food and banking, where millions of users make transactions and purchases online. Each website was evaluated by the three Sperientia labs using the same instrument, the heuristic methodology proposed by Granollers. See Section 2 for more details.
In order to evaluate whether it is more effective to conduct evaluations individually and then share the results with the other experts in the laboratory, or whether it is better to perform them as a group from the beginning, the type of evaluation of each website was alternated between individual and group evaluations. Of the 8 evaluations carried out, 4 were done in groups and 4 individually (see Table 4).

Launching the Study
The evaluation of the supermarket and bank websites was carried out from February to May 2021 with 15 expert evaluators from Sperientia [studio-lab]. To familiarize the evaluators with the sites to be assessed before starting the heuristic evaluation, a set of tasks was proposed for each site: 4 tasks for the supermarkets and 3 tasks for the banks (see Table 3). All the tasks were directly related to habitual use.
To carry out the study, the usability evaluators first performed the heuristic evaluations, either as a group or individually. Subsequently, they answered a survey in Google Forms to obtain proposals for improving the methodology.

Usability evaluation
To carry out the usability evaluation, all evaluators used the heuristic methodology proposed by Granollers Granollers [2018a] and an MS© Excel template 3 specifically created for this purpose. The template has the 60 questions organized into the 15 heuristic principles of the proposal (see Table 1). A template was used for each individual assessment.
To answer each of the questions in the list of heuristic principles provided by the methodology, different answers were proposed. As can be seen in Table 2, the first four answers have an associated score, which depends on the degree to which the question is fulfilled. Each answer follows the colour metaphor of a traffic light: a reddish colour indicates low compliance and a greenish colour indicates high compliance with the criterion on the website. The last three answers (not applicable, not a problem or warning) do not intervene in the total score, since the fact that a question is not fulfilled is not considered a negative aspect. In addition, the evaluator may not be able to check the entire system, which should not be scored negatively either. In the next section, the different result tables can be seen in Figures 1 to 8. It is important to highlight that if the score of a principle is of medium value, it is represented in yellow; if the cell is green, it indicates that in general good answers have been obtained; if the colour of the cell is orange or red, it means that most of the answers were negative. Finally, if the cell shows a 0 without colour, it indicates that none of the questions in that principle have been considered a problem. Another relevant factor of the methodology is that each question has a cell for the evaluator's comments, which is really important for understanding the values given in every answer.

3 Heuristic evaluation MS© Excel template: http://mpiua.invid.udl.cat/wp-content/uploads/2018/04/Evaluaci%C3%B3n-Heuristica-v2018-OK.xlsx
The heuristic evaluation results in a percentage over the evaluated principles, the Usability Percentage (UP), which corresponds to the level of usability of the website. UP is the result of adding the positive values of each heuristic question and transforming the sum into a percentage. The more green cells, the higher the UP value.
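As an illustration of the traffic-light metaphor, a principle's mean score could be mapped to a cell colour as follows. The numeric thresholds here are our assumption: the template only specifies the qualitative bands (green = good, yellow = medium, orange/red = mostly negative):

```python
def principle_colour(mean_score: float) -> str:
    """Map a principle's mean answer value (0..1) to a traffic-light colour.

    The thresholds are illustrative assumptions, not taken from the template.
    """
    if mean_score >= 0.75:
        return "green"   # in general, good answers obtained
    if mean_score >= 0.5:
        return "yellow"  # medium value
    if mean_score >= 0.25:
        return "orange"  # mostly negative answers
    return "red"         # clearly negative answers

print(principle_colour(0.8))   # green
print(principle_colour(0.55))  # yellow
print(principle_colour(0.1))   # red
```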

Survey Evaluating the Heuristic Methodology
One of the aims of this research was to observe whether the heuristic principles and methodology were suitable for carrying out the proposed heuristic evaluation in a business context. Accordingly, in order to obtain proposals for improving the methodology, each evaluator answered a survey at the end of every heuristic evaluation he/she performed. The survey was launched in Google Forms, seeking the most honest possible opinion of the usability experts about the methodology and which aspects should be improved. (See all the form questions in Annex 9.)

Results
The results are organized by quantitative data and are presented as individual, group and survey results.
The UP, which stands for Usability Percentage, is a quantitative measure obtained from synthesising the detailed data of evaluations conducted using the heuristic evaluation methodology.

Individual results
The supermarket webpages HEB and Walmart and, partially, Chedraui, and the bank webpages Santander, Banorte and, partially, BBVA were evaluated individually.

Supermarket WebPages -HEB
Several aspects of the table shown in Fig 1 are discussed below.
• LabX: As we can see, apart from one evaluation (eval 3), which has a higher percentage, the rest of the evaluations obtained a Usability Percentage of around 50%, and similar results were obtained in most of the principles. The heuristic principle #15 (Latency reduction) had the lowest score, and the heuristic principles #3 (User control and freedom) and #4 (Consistency and standards) were the best rated. The mean Usability Percentage of LabX was 53.04%.
• LabY: In general, the results of each HE principle differed considerably among evaluators. The two evaluations that obtained the highest results were those with several questions that could not be evaluated (not applicable or not a problem). And although most of the results were different, they coincide in the result.
• LabZ: The worst rated heuristic principles were 9 (Aesthetic and minimalist design) and 13 (Autonomy). The mean Usability Percentage of LabZ was 37.04%, a very low value.

Table 3: Type of website and list of tasks.

Supermarket:
• 1. Create a shopping list and add products (a litre of Apurna lactose-free milk and 18 rolls of Cottonelle toilet paper);
• 2. Search for products, identify their nutritional information and add them to the shopping cart (Selecta wheat flour and Gloria unsalted butter, 90 grams);
• 3. Search for a discounted TV screen and add it to the cart (Samsung 65" UltraHD Smart LED TV);
• 4. Review the products added to the cart and check the shipping price and available delivery times.

Bank:
• 1. Explore the website to search for a credit card with particular characteristics (minimum income of pesos to apply for it, earning points in supermarkets and cost of card cancellation);
• 2. Search for a nearby ATM and find the route and distance to the ATM, hours and services available;
• 3. Search for information to apply for a personal loan with the following characteristics (minimum amount of $100,000.00 Mexican pesos and a maximum repayment period of 36 months).

Supermarket WebPages -Walmart
Several aspects of the table shown in Fig 2 are discussed below.
• LabX: In this evaluation, the results obtained were very similar. Only the third evaluation stands out slightly with respect to the others because a higher score was obtained. The highest rated heuristic principles were 4 (Consistency and standards) and 11 (Save the state and protect the work). There is no principle with an outstandingly low score, but the principle with the lowest score was 7 (Help users recognize, diagnose, and recover from errors). A mean Usability Percentage of 73.8% was obtained for LabX.
• LabY: The results were similar and a Usability Percentage of around 75-85% was obtained, which is a very good result for a first evaluation of the website. We can highlight that none of the evaluators considered any heuristic question of principle 11 (Save the state and protect the work) applicable, and that there was no principle indicating a serious problem on the evaluated website. A mean Usability Percentage of 80.98% was obtained for LabY.
• LabZ: There were two clearly differentiated ranges of values: while three of the five evaluators who participated in the heuristic evaluation obtained a Usability Percentage of between 50-60%, the remaining two obtained a Usability Percentage of around 70%. The best rated heuristic principles were 4 (Consistency and standards) and 8 (Preventing errors), and the worst rated were 1 (Visibility and system state) and 2 (Connection between the system and the real world, metaphor usage and human objects). A mean Usability Percentage of 62.24% was obtained.

Bank WebPages -Santander
Several aspects of the table shown in Fig 3 are discussed below.
• LabX: The highest scoring heuristic principles were 4 (Consistency and standards) and 10 (Help and documentation). The lowest scoring heuristic principles were 6 (Flexibility and efficiency of use) and 15 (Latency reduction). The mean Usability Percentage was 53.90%.
• LabY: The highest scoring heuristic principles were 1 (Visibility and system state) and 4 (Consistency and standards). The lowest scoring heuristic principles were 7 (Help users recognize, diagnose, and recover from errors) and 9 (Aesthetic and minimalist design). The mean Usability Percentage of LabY was 60.25%.
• LabZ: In these individual evaluations, it could be observed that the first three evaluators marked many answers as "warning". Specifically, the first two evaluators left more than half of the questions unchecked, and the third left a third unanswered. For this reason, the written comments on each question were taken into consideration, but the quantitative results were not, as they were not comparable to the rest. Regarding the three remaining evaluators, two of them obtained very similar Usability Percentages, around 40%, and the last one obtained a much higher Usability Percentage, around 80%. The heuristic principles with the highest scores were 3 (User control and freedom) and 6 (Flexibility and efficiency of use). The lowest scoring heuristic principles were 7 (Help users recognize, diagnose, and recover from errors), 11 (Save the state and protect the work) and 15 (Latency reduction). The mean Usability Percentage was 58.27%.

Bank WebPages -Banorte
Several aspects of the table shown in Fig 4 are discussed below.
• LabX: The highest scoring heuristic principles were 1 (Visibility and system state) and 12 (Color and readability). The lowest scoring heuristic principles were 10 (Help and documentation) and 11 (Save the state and protect the work). The mean Usability Percentage was 43.87%.
• LabY: The highest scoring heuristic principles were 12 (Color and readability) and 13 (Autonomy). The lowest scoring heuristic principles were 5 (Recognition rather than memory, learning and anticipation) and 9 (Aesthetic and minimalist design). The mean Usability Percentage was 57.03%.
• LabZ: The highest scoring heuristic principles were 3 (User control and freedom) and 4 (Consistency and standards). The lowest scoring heuristic principles were 5 (Recognition rather than memory, learning and anticipation) and 9 (Aesthetic and minimalist design), the same as for LabY. The mean Usability Percentage was 41.77%.

Group results
The supermarket webpages LaComer and, partially, Chedraui, and the bank webpages HSBC and, partially, BBVA were evaluated in groups.

Supermarket WebPages -LaComer
Several aspects of the table shown in Fig 5 are discussed below.

• LabX: The highest scoring heuristic principles were 2 (Connection between the system and the real world, metaphor usage and human objects), 4 (Consistency and standards) and 6 (Flexibility and efficiency of use). The lowest scoring heuristic principles were 5 (Recognition rather than memory, learning and anticipation) and 10 (Help and documentation). The Usability Percentage of the group was 59.6%.

Supermarket WebPages -Chedraui
Several aspects of the table shown in Fig 6 are discussed below.

• LabX: The highest scoring heuristic principles were 7 (Help users recognize, diagnose, and recover from errors) and 11 (Save the state and protect the work). The lowest scoring heuristic principles were 1 (Visibility and system state), 12 (Color and readability) and 13 (Autonomy). The Usability Percentage of the group was 41.4%.
• LabY: This HE was individual. There were principles where most evaluators agreed on a more positive or negative assessment. The heuristic principles with the highest scores were 4 (Consistency and standards) and 9 (Aesthetic and minimalist design).

Bank WebPages - HSBC

• LabX: The highest scoring heuristic principles were 1 (Visibility and system state), 6 (Flexibility and efficiency of use) and 13 (Autonomy). The lowest scoring heuristic principles were 10 (Help and documentation) and 11 (Save the state and protect the work). The mean Usability Percentage of the group was 33.3%.
• LabY: The highest scoring heuristic principles were 3 (User control and freedom), 5 (Recognition rather than memory, learning and anticipation), 6 (Flexibility and efficiency of use), 7 (Help users recognize, diagnose and recover from errors) and 13 (Autonomy). No heuristic principle could be highlighted with a low score. The mean Usability Percentage of the group was 83.2%.
• LabZ: The heuristic principles with the best scores were 6 (Flexibility and efficiency of use) and 7 (Help users recognize, diagnose and recover from errors). The lowest scoring heuristic principle was 11 (Save the state and protect the work). The mean Usability Percentage of the group was 66.5%.

Bank WebPages - BBVA

As we can see in the results, the mean Usability Percentage was 52.83% (LabZ) in the individual evaluation, while the group evaluations obtained 59.3% and 55.5% respectively (LabY, LabZ). Very similar values were observed between them.

Survey results
A total of fifteen evaluators participated in the survey (see Annex 9). The quantitative and qualitative data obtained from the satisfaction surveys completed by the participants at the end of all the heuristic evaluations are presented below.

Quantitative data
The results of the survey allowed us to see where the heuristic evaluation methodology could be improved to make it easier to understand. See the charts in Annex 10: Graphic Results of the final survey. Regarding Question 1 (Difficulty in understanding the functioning of the heuristic evaluation methodology) (see Fig 11), one third of the evaluators responded that it was of medium difficulty, and the rest of the responses were positive (a score of 6 or more). Regarding Question 2 (Adequacy of the value scale of the heuristic evaluation) (see Fig 12), the results showed that some evaluators were confused about the meaning of the answers, to the point of not knowing which answer to select (see Table 4). The results obtained in Question 3 (Are the heuristic principles evaluated sufficient?) (see Fig 13) showed that more than half of the evaluators (66.7%) considered them sufficient and adequate for a complete usability evaluation of a website. The results obtained in Question 4 (Are the questions asked for each principle sufficient and adequate?) (see Fig 14) showed that more than half of the evaluators (53.3%) considered that the questions included in each principle were not sufficient or adequate for a complete evaluation of a website.
According to Question 5 (Comments are important and add value to the final result) (see Fig 15), 60% of the evaluators considered comments to be essential in the evaluation. The remaining experts also considered the comments important, but not essential. Only 6.7% of the experts did not consider the comments important and believed the same result could be reached without them. Furthermore, this question shows that, without a group discussion after an individual heuristic evaluation, it would be very difficult to understand the responses of the other evaluators. Finally, the evaluators believed that the comments help to identify an error when one wants to resolve it. The results of Question 6 (How much better is this methodology than the one used so far?) (see Fig 16) were mixed. In their comments, the evaluators indicated that one of the positive points of the methodology is the numerical result (Usability Percentage), as it could be interesting for customers. According to the results obtained in Question 7 (Would I use the heuristic evaluation methodology in future evaluations?) (see Fig 17), there is a diversity of opinions (33% consider it better, 33% find no difference from the methodology they usually use and 33% would prefer not to use it). Regarding Question 8 (Leave a comment or opinion on aspects to improve the methodology), in general, all the opinions received were positive and indicated that the methodology can be very useful in the business world, although it needs some improvements: for example, adding some principles of Gestalt law [Graham, 2007] or principles of psychology [Yablonski, 2020] that help to be clearer about the aspects to be improved on each website. It would also be interesting to improve the wording of some questions and to clarify the meaning of the answers at the beginning of the evaluation.

Qualitative data
Although the evaluators rated the methodology positively and considered that it can be useful in the business environment due to the ease and speed of obtaining a quantitative result of the usability of a website (the Usability Percentage, UP), there are several improvements that can be applied to the methodology to make it a better system. The results have been subdivided into three groups: 1. Improvements related to general aspects of the method; 2. Improvements about specific questions; 3. Improvements about specific answers. Each of the problems observed is explained below, together with the best solution considered in each case.

Improvements related to general aspects of the method.
• It is necessary to have a section that explains in more detail what each possible question/answer consists of. Improvement: add a comment to the question.
• Indicate the evaluation process to observe progress in the total evaluation. Improvement: add a progress bar.

Improvement about specific questions
• There are spelling mistakes in some questions. Improvement: correct them.
• Some questions may confuse the evaluator depending on the way they are worded: a) some questions are worded in a very similar way and the evaluator does not know what to answer; b) some questions are written in positive form and others in negative form, making the evaluator doubt which answer applies; c) some questions are complex to answer because the evaluator does not understand what is to be evaluated. Improvement: in all these cases the question should be worded more appropriately, or comments should be added to help the evaluator so that he/she does not get confused about the answer.
• There is difficulty in evaluating principle 12 (Color and legibility) because some evaluators do not know how to evaluate it. Improvement: indicate specific instructions or tools in the comments of the question so that the evaluators are clear about the evaluation process.

Improvements about specific answers
• The answers are only in one language. Improvement: include both languages.
• The answer "NO" contrasts with "Yes, but some cases are missing". Improvement: for consistency in the answers, change "NO" to "No, in any case" (see Table 2).
• The evaluators had doubts when choosing between the answers "Not applicable" and "Not a problem", because they were not sure whether they had correctly evaluated the guideline. Improvement: it is suggested to use an ASQ (After-Scenario Questionnaire) response system [Lewis, 1991] or a Likert scale [Clark and Watson, 2019], as these are standard ratings, or to use the current scale supplemented with a glossary specifying what each of the answers means.

Discussion
The results of the study were summarised and the highest and lowest rated principles were analysed. For the supermarket websites, the highest rated heuristic principles were: 4. Consistency and standards (7 times), 11. Save the state and protect the work (5 times), 7. Help users recognize, diagnose, and recover from errors (3 times), 2. Connection between the system and the real world (3 times) and 6. Flexibility and efficiency of use (3 times). The lowest rated principles were: 1. Visibility and system state, 12. Color and readability and 13. Autonomy (all of them 3 times). The data indicate that the supermarket websites analyzed are consistent and store the user's workspace to ensure better service; the user perceives errors adequately and understands the actions performed by icons and graphic elements; and the website interfaces adapt to different screen resolutions. On the contrary, according to the data collected, there are interactive elements that users do not perceive (links without underlining, for example); the size and color of the website text makes it difficult to read properly (for example, small, grey text on a white background); and in some cases the system status is not visible or updated, so the user does not know what to do. Regarding the bank evaluations, the highest rated heuristic principles were: 4. Consistency and standards (6 times), 6. Flexibility and efficiency of use (5 times), 1. Visibility and system state (4 times), 3. User control and freedom (3 times), 12. Color and readability (3 times) and 13. Autonomy (3 times). The lowest rated principles were: 11. Save the state and protect the work (5 times), 15. Latency reduction (3 times) and 9. Aesthetic and minimalist design (3 times). Like the supermarket websites, the bank websites analyzed are consistent and their interfaces are well adapted to small screens. In contrast to the supermarket websites, the bank websites have interactive elements that the user can easily navigate, the size and color of the text is optimal and the user understands the status of the system. As shown in the table in Fig 9, the best rated supermarket was Walmart, with a mean Usability Percentage (UP) of 72.34% in individual evaluations, and the worst rated was HEB, with a mean UP of 49.25% in group evaluations. The best rated bank was HSBC, with a group mean UP of 61.00%, and the worst rated was Banorte, with an individual mean UP of 47.56%. The mean value of the evaluations carried out on the 4 supermarket websites was 59.06% UP, and on the 4 bank websites it was 55.48% UP. This indicates that the supermarket websites have better usability than the bank websites. Regarding the analysis of the data obtained individually vs. in groups, it can be seen that the results did not vary excessively. We obtained a mean UP of 57.88% from the group evaluations (HEB: 49.25%, Walmart: 72.34%, BBVA: 55.88% and HSBC: 61.00%) and a mean UP of 56.66% from the individual evaluations (LaComer: 63.93%, Chedraui: 50.71%, Santander: 57.47%, Banorte: 47.56%). According to these results, evaluating in groups or individually is considered equally adequate. However, it is important to keep in mind that, while evaluating individually may reveal more usability problems, it also requires a larger budget due to the additional time needed.
The survey results enabled a qualitative assessment of the methodology. The evaluators analysed all proposals and comments to identify those that could substantially improve the heuristic evaluation methodology. Solutions were proposed for all identified problems, resulting in a new version of the methodology (version 2021).
One conclusion drawn from the survey results is that the heuristic evaluation methodology has both strengths and areas for improvement. While a significant portion of evaluators found the heuristic principles and questions sufficient for conducting a usability evaluation, notable challenges were identified regarding the understanding and adequacy of certain aspects of the methodology. For instance, the survey revealed that some evaluators struggled to understand how the methodology works and questioned the adequacy of the value scale used. Additionally, there were mixed opinions regarding the sufficiency and adequacy of the questions asked for each principle, with a majority indicating that they were not entirely sufficient for a complete evaluation. Furthermore, the importance of comments in the evaluation process was highlighted, with a majority of evaluators considering them essential for understanding and adding value to the final result. This suggests that group discussions after individual evaluations play a crucial role in clarifying responses and identifying errors. In conclusion, the survey results underscore the importance of continuously refining and adapting heuristic evaluation methodologies to address usability challenges effectively in the dynamic digital landscape. Incorporating feedback from evaluators and integrating additional principles from related fields could enhance the clarity, effectiveness, and applicability of the methodology in real-world contexts.

Potential threats to validity of research
Potential threats or limitations in this research could include:
• Limited Generalizability: The experiment was conducted with a specific group of usability experts from a single reputable company. This may limit the generalizability of the findings to broader contexts or different types of evaluators.
Addressing these potential threats through rigorous methodology, transparency in reporting, and careful interpretation of findings can help strengthen the validity and reliability of the research outcomes.

Conclusions and future work
The aim of this research was to analyse the effectiveness of the heuristic evaluation technique in a professional business context. The development and refinement of the New Proposal Heuristic Evaluation methodology mark a significant advancement in usability evaluation within professional business contexts. The experiment carried out provided sufficient information to propose an improved version of the HE methodology, particularly for real-life business environments. After conducting heuristic evaluations of eight websites, including supermarket and banking websites, in a business context, comments were collected from a total of fifteen expert evaluators who used the methodology over a few days.
The results showed that the methodology is useful due to its well-defined heuristic principles and corresponding questions, as well as its ease of use. In conclusion, the New Proposal Heuristic Evaluation methodology represents a dynamic and adaptable approach to heuristic evaluation that holds promise for enhancing usability assessment in professional business contexts. By combining individual expertise with collaborative insights, and by embracing iterative refinement and innovation, the New Proposal Heuristic Evaluation framework stands poised to contribute to the ongoing pursuit of user-centered design excellence in the digital age.
Looking ahead, the New Proposal Heuristic Evaluation methodology opens avenues for further refinement and innovation in usability evaluation methodologies.Future research endeavors could explore the integration of complementary evaluation techniques, such as user testing and cognitive walkthroughs, to provide comprehensive insights into website usability.Additionally, the adaptation of the New Proposal Heuristic Evaluation methodology to emerging technologies and digital platforms beyond traditional websites could extend its applicability to diverse contexts, such as mobile applications and e-commerce platforms.

Annex 1: List of questions of survey
These questions were answered by the evaluators in a final survey. The final survey document can be found at the URL below: https://forms.gle/YT5vttQPdFiLPBwd6
1. Rate the difficulty in understanding the functioning of the methodology provided for carrying out the heuristic evaluations (1. Very difficult - 10. Very easy).

Figure 2. Results of individual HE of Walmart: LabX, LabY, LabZ, with mean of Usability Percentage.

Figure 3. Results of individual HE of Santander: LabX, LabY, LabZ, with mean of Usability Percentage.

Figure 4. Results of individual HE of Banorte: LabX, LabY, LabZ, with mean of Usability Percentage.
Figure 5.
On the table in Fig 8, we will comment on several aspects.

Figure 6. Results of group and individual HE of Chedraui: LabX, LabY, LabZ, with mean of Usability Percentage.

Figure 9. Results of question 1 (Difficulty in understanding the functioning of the heuristic evaluation methodology).
2. Rate the adequacy of the scoring scale for each question (1. Very inadequate - 10. Very adequate). If you do not find it adequate, please indicate why. What system/scale would you propose? (text)
3. Do you think that the evaluated principles are sufficient/adequate for a complete usability evaluation of a user interface? (Yes - No) If the answer was No, what would you change? (text)
4. Do you think that the questions asked in each principle are sufficient/adequate for a complete evaluation of that principle? (Yes - No) If the answer was No, what would you change? (text)
5. Do you consider that the comments are an important part and contribute something positive to the result of the evaluation? (Yes, they contribute a lot; Yes, although not always; They are good, but I think they do not contribute much to the final result of the evaluation; No, I think they are not important and you could reach the same result without them) Why? (text)
6. Based on your first experience, how much better do you consider this methodology than the one used so far? (1. Very little - 10. A lot) Why? (text)
7. Do you consider using this methodology in future evaluations? (Yes, I find it better than the one used so far; I don't find much difference between this methodology and the one used so far; No, I prefer to use the usual one)
8. Optionally, you can leave a comment about your opinion and/or aspects that you think should be improved about the methodology. (text)

Annex 2: Graphic Results of final survey
Due to lack of space, the graphic results of the survey are shown in this section.

Figure 10. Results of question 1 (Difficulty in understanding the functioning of the heuristic evaluation methodology).

Figure 11. Results of question 2 (Adequacy of the value scale of the heuristic evaluation).

Figure 12. Results of question 3 (Are the heuristic principles evaluated sufficient?).

Figure 13. Results of question 4 (Are the questions asked for each principle sufficient and adequate?).

Figure 14. Results of question 5 (Comments are important and add value to the final result).

Figure 15. Results of question 6 (How much better is this methodology than the one used so far).

Figure 16. Results of question 7 (Would I use the heuristic evaluation methodology in future evaluations).

Table 1. List of 15 heuristic principles evaluated in the proposed methodology.

Table 2. Score associated with each answer.

Table 3. List of tasks by type of website.

Table 4. List of websites evaluated.
The highest scoring heuristic principles were 6 (Flexibility and efficiency of use) and 7 (Help users recognize, diagnose, and recover from errors). The lowest scoring heuristic principle was 11 (Save the state and protect the work). The Usability Percentage of the group was 58.9%.
Based on the table in Fig 7, we will comment on several aspects.
• Bias in Evaluators: The expertise and background of the evaluators could introduce bias into the evaluation process, potentially impacting the reliability and validity of the results.
• Influence of Familiarity: Since the evaluators were using the HE proposal for the first time in real-world scenarios, their lack of familiarity with the technique may have influenced their assessments and the outcomes of the evaluation.