Heuristic evaluation was used by Jakob Nielsen in 1990 to describe a type of inspection method [5], for identifying usability problems of computer software

University Degree Mathematical and Computer Sciences

Heuristic Evaluation

Stella Mills

Department of Computing & Multimedia

University of Gloucestershire

The Park

Cheltenham

GL50 2QF, UK

e-mail: [email protected]

Heuristic evaluation was used by Jakob Nielsen in 1990 to describe a type of inspection method [5], for identifying usability problems of computer software and, in particular, the user interface [4]. Heuristic evaluation is the most informal method of usability inspection methods and ‘involves having usability specialists judge whether each dialogue element conforms to established usability principles. These principles are normally referred to as the heuristics …’ [1, p.5]. Alternatively, heuristic evaluation is a usability engineering method ‘for finding the usability problems in a user interface design so that they can be attended to as part of an iterative design process. Heuristic evaluation involves having a small set of evaluators examine the interface and judge its compliance with recognized usability principles (the ”heuristics”)’ [4]. Usability engineering applies the principles of engineering to user interface design [6].

There is, therefore, no commonly accepted formal definition of an heuristic evaluation but essentially, a number of heuristics or principles are derived, usually from the literature, and then applied to the artifact to be evaluated, generally as a checklist. Potential problems for users are identified and suggestions made for their solution. The method does not involve the system’s users and is generally completed by at least one, but preferably up to five, human factors’ experts who should not have been involved in the development of the software. The need for experts, but not system users per se, places heuristic evaluation within the ‘expert method’ category of general methods of research in the social sciences. The results from an heuristic evaluation can be tabulated or incorporated into a report for a client. It is fairly cheap to perform and has been branded a Discount Usability Engineering method – a method that is cheap, quick and easy to use [4]. Although the method is generally associated with evaluating software, and, in particular, software interfaces, it can be applied to the evaluation of any artifact. In a general way, it has been used for many centuries as any criteria-based evaluation can be seen to be a form of heuristic evaluation.

Applying heuristic evaluation

The method depends on two factors: first, the derivation of the heuristics and, secondly, the application of those heuristics as part of the evaluation process of the software. If either of these is shallow then the evaluation will be weak and may return false data.

It is important that the heuristics are relevant, sound and applicable to the software in question as otherwise, time will be wasted generating useless data. The original heuristics (given in Table 1), derived by Nielsen and Molich in 1990 [4], are of a general nature and thus may generate errors in interpretation and application as they can be interpreted in different ways. Indeed, Nielsen refined this list, adding ‘help and documentation’ in 1991, and eventually ending up with those given in Table 2. Each of these heuristics in Table 2 was given explanatory notes by Nielsen in an attempt to focus the evaluation and add clarity [4]. This illustrates the problem of using loosely defined heuristics and emphasises the inherent understanding of meaning in these heuristics which may be misinterpreted especially if the evaluator is inexperienced. It can be argued [2] that it is better to use even more precisely defined heuristics than those in Table 2, which relate directly to the software under evaluation as these can be focused to the requirements of the evaluation and the software. A first pass through the interface before deriving the heuristics [4] can help to identify areas for focused evaluation. It is possible, with experience, to define a generic set of heuristics for a specific type of software interface such as a website which can be suitably fine tuned for each website under evaluation; in such a case, money is saved by not having to define the heuristics for every evaluation while the evaluation remains pertinent to the design of the software. Generally though, different types of software interface require a different set of heuristics although there may well be overlap within each set.

Table 1 and Table 2

This is a preview of the whole essay

Table 1 and Table 2

Having derived the heuristics, it is essential that they are applied consistently. Generally, they are used as a checklist, with the evaluator deciding upon compliance or otherwise as s/he goes through typical tasks that typical users may carry out. The results are often presented as a table with three columns [2], the first listing the heuristic, the second column indicating the interface’s compliance or otherwise, while the third column is devoted to comments, including a note of any potential problems and possible solutions that the heuristic raises. A fourth column may be added to rate the severity of the problem (vide infra). Immediately, it can be seen that the expert knowledge of the evaluator influences the problems found and their possible solution. Thus, the evaluator needs at least two areas of knowledge: first, that of the purpose of the software and its functionality together with some idea of the characteristics of typical users and, secondly, knowledge of interface design from a user’s perspective. These ‘double experts’ [4] consistently find more usability problems than evaluators with only one specialism, which indicates that knowledge of human-computer interaction is essential for identifying problems as s/he role-plays the various characteristics of the domain users. It can be argued that the evaluation fares better with ‘triple experts’ (vide infra) who also have expert application domain knowledge.

The number of evaluators can vary from one to as many as are available but it has been found that an upper limit of about five [4] brings an increase in the identification of problems when compared to costs. If the evaluators are all double experts, then this number can be reduced and successful commercial evaluations have been achieved with only one evaluator. It can be argued [4], that if the heuristics are derived by a double expert then they can be applied by any user. However, this is fraught with problems of misinterpretation of the heuristics and the inability of the users to role-play other users; indeed, novice users usually cannot role-play expert users, although some experts can achieve some success with the method.

When the evaluators have finished the evaluating session and the first three columns of the checklist table are completed, it is usual to give some indication of the severity, from the corresponding user’s perspective, of the problems found. If there is more than one evaluator, then the complete list of problems is circulated to all the evaluators who rank, from a usability perspective, all the problems found. It is not always possible for the evaluator to indicate the costs involved or other factors since they have not been privy to the formal design process. Nielsen [4] suggested a rating of 0 to 4 where 0 indicates there is no problem and 4 signals a usability catastrophe (Table 3). These ratings can be used to indicate where the software needs better usability input and which specific problems need fixing as a matter of priority.

Table 3

Timing of use of heuristic evaluation in the lifecycle

Heuristic evaluation can be used at any time in the lifecycle but, as with all human factors/ergonomics evaluation methods, the sooner it is used the more likely it is that problems will be identified and so will be cheaper to remove. Providing sufficient focus in terms of the characteristics of the users and the functionality of the system can be given, heuristic evaluation can be used on the first interface designs, even if these are in paper. If these evaluations are done by double experts then basic interface design errors, such as misuse of colour, too small font size of text etc., can be avoided. In systems which use templates for the interface design, such as is the case with an increasing number of websites, this can be most advantageous in terms of cost and time savings. However, it should be noted that even if heuristic evaluation is used at the beginning of the lifecycle, it should be used again, especially before the α-release (the first version to be tried by actual users in the field of domain) since problems can creep into the interfaces design as it is developed. One of the advantages of heuristic evaluation is that the same heuristics can be used as the interface develops but it is wise to vary the evaluators since familiarity with an interface may lead to errors and omissions.

The commercial use of heuristic evaluation

Heuristic evaluation is quick to use since it can be accomplished in a few hours, especially if interface templates are used and there is, therefore, much commonality between the various screens of the system. However, the deriving of the heuristics can take time, especially if these are derived from the beginning each time. For this reason, many commercial evaluators use a basic list of heuristics such as those in Table 2, which can be quickly focused for use with a particular application. In this case, an evaluation of a small system, together with the resulting report, can be produced in a working day and so the evaluation is seen to give minimal disruption to the project schedule.

This quickness makes the method cheap, which is mirrored by the fact that only expert evaluators are needed, thus keeping personnel costs low too. However, experts are only role-playing other users and while this can be useful for the system’s development, it should not replace testing and evaluating the system with users before it is finally released.

The relatively short time needed and the ensuing small costs of heuristic evaluation make it an efficient and extremely profitable evaluation method for both commerce and industry since the return in customer satisfaction can be increased with such a small outlay of time and money. This makes the method particularly suitable for small companies who may not be able to afford full user testing and certainly would have problems in financing major changes to the system after the α-stage of release. Thus, the use of heuristic evaluation early on in the lifecycle can prevent unnecessary costs, while providing efficient and useful feedback.

Issues arising from the use of heuristic evaluation

While heuristic evaluation can be a useful and efficient tool for interface evaluation, there are inevitably some concerns which must be recognized when using the method.

First of all, it is a method for interface evaluation only; it does not evaluate the efficiency of the system’s functional design and neither does it touch the functionality of the system. In addition, coding efficiency and other technical issues are not evaluated, although where such aspects of the system, such as modem speeds, impinge on the interface, heuristic evaluation may be able to inform the design. Even other interface design aspects, such as task design, are not usually evaluated in a thorough way through heuristic evaluation, although some characteristics of these matters can be included as heuristics. This leads to the necessity of suitably focused heuristics and to the quality of the evaluation which inevitably reflects the experience of the evaluators.

The need for ‘double’ experts has already been identified [4] and it can be argued that these are essential for the deriving of the heuristics. One of the problems with heuristic evaluation is the level at which the heuristics are stated. If they are too precise then this can limit the evaluation whereas if they are too general, different evaluators may interpret them in different ways. ‘Double’ experts, with both computing and human-computer interaction knowledge, should be able to help in achieving a balanced set of heuristics but a third area, the application domain of the software, also need consideration. It has been shown that domain knowledge helps in the designing of software functionality and task structure [3] and this knowledge is also useful when deriving heuristics for interface evaluation, particularly in specialist applications such as safety critical software. Thus, the persons deriving the heuristics should ideally have professional knowledge of computing, human-computer interaction and the application domain and, as such, be ‘triple’ experts. Such persons should also have the knowledge to ensure that the heuristics are sufficiently focused while not being too constraining.

There is general consensus that the number of heuristics for evaluation should be sufficient for an in-depth evaluation but not so many as to make the evaluation unwieldy. While not wanting to be exact about the number of heuristics, somewhere between six and 10 are usual with more than 12 causing unnecessary complexity. Some heuristics evaluators like to have fewer, say around six, overall categories and then derive more focused heuristics within these categories. This is a natural extension to the ‘explanatory notes’ given by Nielsen in Table 2 above. Whatever style of presentation of the heuristics is used, the important point is that they should be easily understood and applicable to the interface under evaluation.

One of the drawbacks of using heuristic evaluation is that it does not indicate any omissions from the heuristics, which again emphasises the need for ‘triple’ experts being involved in the derivation of the heuristics since these experts can focus the heuristics on the users’ needs with respect to the application domain. It is impossible to check for completeness in terms of the heuristics and it is likely that in most heuristic evaluations, completeness is not achieved. However, the use of a first pass through the system by an expert (preferably ‘triple’ but at least ‘double’) will help to ensure that those areas of the interface likely to cause problems for user are covered in the evaluation. Even so, aspects of the system can be missed, even when the heuristics have been well defined [5]; the only way to help to diminish this problem is to use multiple evaluators with both human-computer interaction and application domain knowledge. Indeed, other methods, such as task analysis, may be better at identifying omissions from areas such as navigation.

In applying the heuristics, it is inevitable that the personal bias of each evaluator will influence the results. This is not a problem affecting heuristic evaluation alone, but it can be more visible within heuristic evaluation simply because of the small number of evaluators. If only one evaluator is used, some usability problems may be missed but other may be exaggerated in importance due to the evaluator’s previous experience. Other evaluators may not place such emphasis on these problems; this may be particularly evident when severity ratings are used. In practice, the evaluators often meet after rating the problems in order to reach a consensus about the ratings so that the system’s developers can have clear guidance about the importance of the problems cited. Such a group discussion can help to reduce the bias of each individual and is useful in informing the results of the evaluation.

Group discussion and group working can be difficult owing to the different personalities in the group. It is important, therefore, that any group post-evaluation is formally chaired allowing each evaluator to express their views within the group. A chairperson should also sum up the findings formally so that a consensus of opinion is expressed in the report presented to the client. A full discussion of the problems of group working is beyond this article but such problems can arise within heuristic evaluation and can cause a slant upon the evaluation well beyond that of ordinary personal bias.

Conclusion

Heuristic evaluation is an inexpensive but efficient method of evaluating the interface of a software system and while it may not exhibit every usability problem of the system, it can be enlightening in terms of potential difficulties for different categories of user. Commercially, it is cost effective and consequently is used in industry to provide feedback, often well before the software is released for testing. While it is best achieved using ‘triple’ experts, it can be done by others since any reliable feedback is thought to be better than none.

References

1 Mack, R.L. and Nielsen, J., Executive summary, in Usability Inspection Methods, Nielsen, J. and Mack, R.L., Eds., John Wiley & Sons Inc., New York, 1994, chap 1.

2 Mills, S., Usability problems of acoustical fishing displays, Displays, 16, 115, 1995.

3 Mills, S., Integrating information - a task-orientated approach, Interacting with Computers, 9(3), 1998, 225-240.

4 Nielsen, J., Heuristic evaluation, in Usability Inspection Methods, Nielsen, J. and Mack, R.L., Eds., John Wiley & Sons Inc., New York, 1994, chap 2.

5 Preece, J., Rogers, Y. and Sharp, H, Interaction Design beyond human-computer interaction, John Wiley & Sons, Inc., New York, 2002.

6 Wixon, D., Jones, S., Tse, L. and Casaday, G., Inspections and design reviews: framework, history and reflection, in Usability Inspection Methods, Nielsen, J. and Mack, R.L., Eds., John Wiley & Sons Inc., New York, 1994, chap 4.