Bug severity explained

Recently I got an email asking how I classify bugs and issues, and how the different categories for a bug’s severity, like Critical and Major, can be explained clearly.

I’m a software engineer, and for the longest time I approached everything in my work as a software engineering problem. Bug reports were no exception: to me, any bug report was a report on how software does not behave like it should.

The overview page does not load in under 2 seconds

Fix the calculation of the thing to reflect real world exchange rates

Bug reports are not about your software. Bug reports are about users and stakeholders using your software to perform a task. And a bug report tells you that a user cannot perform that task in an acceptable manner.

How do real engineers do it?

Computer programmers have a tendency to reinvent the wheel every few months. It’s surprising to see new ecosystems (looking at you, JavaScript) reinvent a lot of Unix principles that have been around for over forty years.

So, let’s take a look at how non-computer engineers tackle the categorization of bugs and issues. I found that ProQC has a nice write-up on the topic. Note that their guidelines are aimed at physical products, but there’s a sensible parallel to software products.

Critical

Any condition found which poses the possibility of causing injury or harm to, or otherwise endangering the life or safety of, the end user of the product or others in the immediate vicinity of its use.

Luckily for most software engineers, nobody dies or gets injured from your software not working correctly (although many stakeholders want you to think otherwise). This is probably true for 99% of all web applications. That does not mean that bugs cannot cause harm in other ways.

Critical issues (most likely) result in financial damage for you or your customer:

Missed business opportunities
Waste due to product unavailability (imagine an office of 100 people twiddling their thumbs because the website is down)
Fines because of failure to comply with rules and regulations
Customers leaving or seeking compensation for their damages

Basically, any issue that results in real-world damage should be classified as critical. Of course, this is related to your terms of service or service level agreement.

Major

Any condition found adversely affecting the product’s marketability and sale-ability or adversely affecting its required form, fit or function and which is likely to result in the end user returning it to the source from which it was purchased for replacement or refund.

Imagine you buy a new iPhone and when you unpack it, it has a scratch across the screen. You don’t want that, although the phone probably works just fine.

In software I would categorize this as any issue that prevents a user from correctly or efficiently performing a task. It’s closely related to critical issues, but the financial impact is limited.

Minor

Any condition found which while possibly less than desirable to the end user of the product, does not adversely affect its required marketability, sale-ability, form, fit or function and is unlikely to result in its return to the source from which it was purchased.

This will include most software issues that you would call annoyances or improvements.

Trivial

Trivial is not mentioned in the ProQC list, which kind of makes sense. A physical product is impossible to update without substantial cost. Software, however, can be updated frequently and strategies like continuous deployment make rolling out these changes almost painless.

I’m not a fan of the trivial category. It’s just too confusing for everyone involved. Trivial has too many different meanings to different people.

this is a huge issue and should be fixed asap
this fix is so easy
this issue is not a big deal, but I had to report it anyway

Any bug you’d classify as trivial has either a critical, major or minor impact on the user, so classify it as such.
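As a rough sketch of what that looks like in practice, here is how the three remaining categories could be modeled in Python. The Impact fields and the classify function are names I made up for illustration, not part of any bug tracker; the rules simply follow the descriptions above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"


@dataclass
class Impact:
    # Hypothetical fields, named for this sketch only.
    causes_real_world_damage: bool = False  # lost money, fines, customers leaving
    blocks_task: bool = False  # user cannot perform a task correctly or efficiently


def classify(impact: Impact) -> Severity:
    """Map the impact on the user to one of the three severities."""
    if impact.causes_real_world_damage:
        return Severity.CRITICAL
    if impact.blocks_task:
        return Severity.MAJOR
    # Everything else is an annoyance or an improvement.
    return Severity.MINOR


# A bug that blocks a task without causing real-world damage.
print(classify(Impact(blocks_task=True)).value)
```

Note that there is no TRIVIAL member: whoever files the bug has to decide what it does to the user, not how easy the fix looks.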

Quantitative measurements

It’s tempting to add metrics to each category’s description. Of course you need to specify what you expect your application to do, but a description like, “It’s a major issue if the overview page takes longer than 10 seconds to load,” will not hold up. Aside from the question of who measures this and how, what if the load time is a consistent 9.5 seconds?
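To make that edge case concrete, here is a minimal Python sketch of such a metric-based rule. The function name and the labels it returns are made up for this illustration; only the 10-second limit comes from the example above.

```python
def severity_from_load_time(seconds: float, limit: float = 10.0) -> str:
    """Naive metric-based rule: only the threshold matters."""
    return "major" if seconds > limit else "not a bug"


# Two load times that feel equally slow to the person waiting.
for seconds in (9.5, 10.1):
    print(seconds, severity_from_load_time(seconds))
```

The 9.5-second page passes the rule and the 10.1-second page is a major issue, even though the user is about equally unhappy in both cases.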

The question should always be: in what way is the user impacted by this defect? Does it cost them money, will they stop paying for your SaaS subscription, or is it an annoyance that can be optimized later?

I hope this post helped you get a grip on how to categorize bugs and issues. This is by no means a definitive list, and I believe it’s good practice to regularly evaluate which categories you use and how you use them.
