Why Engineers Should Try to Reproduce Production Issues Locally
23 Jan 2023
As engineers, one of our primary responsibilities is to ensure that the systems we build are stable and reliable.
However, despite our best efforts, issues will inevitably arise in production environments. When this happens, it can be tempting to patch the problem and move on quickly.
Recreating production issues locally, though, is a critical step in the debugging and resolution process.
In this post, I'll explain the benefits of reproducing production issues locally.
One of the main benefits of recreating production issues locally is that it lets us study the issue in a controlled environment.
By doing so, we can more easily isolate and identify the root cause of the problem. Trying to do this is much more challenging when the issue is only present in a production environment where many other factors are at play.
Recreating production issues locally allows us to test and validate any proposed fixes before deploying them. This is essential for preventing regressions and ensuring the fix resolves the issue. Recreating the issue locally can also help identify and resolve other problems that may have been overlooked.
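As a rough illustration of validating a fix locally, the sketch below replays an input captured from production against the patched code. Everything in it is hypothetical: the payload, the `order_total` function and the bug (a missing price crashing the calculation) are assumptions made up for the example, not taken from a real incident.

```python
import json

# Hypothetical payload copied from production logs (an assumption for
# illustration). The second item has no price, which is the input that
# triggered the imagined failure.
CAPTURED_PAYLOAD = (
    '{"items": [{"price_cents": 1999, "qty": 2},'
    ' {"price_cents": null, "qty": 1}]}'
)


def order_total(payload: str) -> int:
    """Hypothetical handler with the proposed fix applied.

    Before the fix, a missing price raised a TypeError during multiplication.
    The fix treats a missing price as zero.
    """
    order = json.loads(payload)
    total = 0
    for item in order["items"]:
        price = item["price_cents"] if item["price_cents"] is not None else 0
        total += price * item["qty"]
    return total


def test_missing_price_is_treated_as_zero():
    # Replaying the captured input reproduces the original conditions and
    # guards against the bug coming back later.
    assert order_total(CAPTURED_PAYLOAD) == 3998
```

Keeping the captured input alongside the test means the same check can be run again whenever the surrounding code changes.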
Insights into the system
Another benefit of recreating production issues locally is that it can help improve our understanding of the system. By closely examining the issue, we can gain valuable insights into how the various components of the system interact, which can help us prevent similar problems from arising.
Improve communication and collaboration
I've also found that recreating production bugs locally improves the engineering team's communication and collaboration. By working together to reproduce and resolve an issue, we shared knowledge and learnt from one another, which helped improve the team's morale and motivation.
Is it necessary to reproduce every defect before identifying and fixing it?
Of course, not every issue will be reproducible locally, because some depend on complex conditions and environments, but engineers should at least try. Examples include the hardware the software is running on, the extensions users have installed in their browsers or, as I once encountered, users within China experiencing different network conditions from those outside when connecting to an AWS instance within a Chinese region.
When an issue cannot be reproduced locally, teams and businesses have to decide whether a blind patch is a good idea or whether it risks making the situation worse. One position you don't want to find yourself in is fixing something similar that has the same consequence but a different root cause, leaving the original problem in place. Adding more instrumentation can help to prevent this.
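What extra instrumentation looks like depends heavily on the system. As one minimal sketch, the decorator below uses Python's standard `logging` module to record the arguments of any call that fails, so the failing input can later be replayed locally. The names `log_failures` and `apply_discount` are hypothetical and exist only for this example.

```python
import functools
import logging

logger = logging.getLogger("diagnostics")


def log_failures(func):
    """Log the arguments of a failing call, then re-raise the exception."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception records the message plus the full traceback.
            logger.exception(
                "call to %s failed: args=%r kwargs=%r",
                func.__name__, args, kwargs,
            )
            raise

    return wrapper


@log_failures
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical business function used only to demonstrate the decorator."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent out of range: {percent}")
    return price_cents * (100 - percent) // 100
```

In a real system, take care that logged arguments do not contain personal or otherwise sensitive data.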
Then, of course, there are the pressures on engineering teams to produce a fix.
These pressures could be internal, from management wanting to see action, or external, to calm down an angry customer.
Engineers should attempt to recreate production bugs to improve the reliability and stability of their software and to avoid damaging the company's credibility or reputation.
One of the best ways of recreating production issues locally is with tests, which I'll discuss in the next post.