Thoughts about Refactoring or Rewriting when dealing with Legacy Code

Developers often have to deal with this issue when inheriting code that was written in what seems like the dark ages. The design pattern may be poorly chosen or non-existent. The code may have been written before newer features of the language were released, or before modern design methods were known.

When I was starting my career as a developer, I would never have expected to be dealing with code that was written 10, 15 or even 25 years ago. The first time I dealt with old-school code was when I was adapting an existing backup product for use as a microservice in a SaaS product.

Government systems are also full of legacy code, which may have been written with less-than-ideal practices. When I’m adapting current components to a modern architecture, I face the same question daily: refactor, or rewrite from scratch?

Before I go further, I want to nail down some definitions. Refactoring involves minor changes to the code while keeping the same functionality. It usually occurs after unit tests and regression tests are developed. Refactoring should occur regularly in the process of maintaining existing code. Rewriting involves starting from scratch and recoding an existing application with the same functionality. Reworking involves redoing an application with significant functional and design changes.
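To make the "tests first, then refactor" order concrete, here is a minimal sketch of a regression test that pins down what a piece of legacy code currently does. The function legacy_shipping_cost and its pricing rules are hypothetical, invented only for illustration.

```python
def legacy_shipping_cost(weight_kg):
    # Hypothetical legacy function whose rules nobody documented.
    if weight_kg <= 0:
        return 0
    if weight_kg < 5:
        return 4
    return 4 + (weight_kg - 5) * 2


def test_current_behavior_is_preserved():
    # Record what the code does today, including odd edge cases,
    # so that any refactoring can be checked against it.
    observed = {-1: 0, 0: 0, 1: 4, 5: 4, 7: 8}
    for weight, expected in observed.items():
        assert legacy_shipping_cost(weight) == expected


test_current_behavior_is_preserved()
```

Once a test like this passes, the internals can be reshaped in small steps, rerunning the test after each one.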

Here are the major considerations I use when deciding whether to refactor or rewrite legacy code.

High coupling and low cohesion that become a significant obstacle to writing clean unit tests

Coupling refers to the degree of interdependence between software modules. For example, if functions or even whole modules share a significant number of global variables, that is a case of high coupling. Cohesion refers to how closely related the responsibilities within a single module are to one another.

There have been instances where I’ve dealt with 30-year-old code that contained methods with no parameters, and the application state consisted entirely of global variables. The dependencies between the separate pieces of functionality may not be clear, and the methods might as well be called doStuff1(), doStuff2() and doStuff3().

This code may have been in place for years, but the unit tests may be non-existent. In a case such as this, I usually end up rewriting the code with the appropriate unit tests. If I have to add functionality but the design prevents me from writing unit tests, then a rewrite is in order.
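As a rough sketch of the difference, the first function below follows the global-state style described above, and the second is a rewrite with explicit inputs and a return value. The names and the interest calculation are hypothetical; the point is only that the second version can be tested without setting up any shared state.

```python
# Legacy style: no parameters, all state lives in module-level globals.
balance = 100.0
rate = 0.5


def do_stuff1():
    # Hidden dependency: reads and overwrites globals.
    global balance
    balance = balance + balance * rate


# Rewritten: the same calculation with its dependencies made explicit.
def apply_interest(balance: float, rate: float) -> float:
    """Return the balance after one period of interest."""
    return balance + balance * rate


def test_apply_interest():
    # No global setup or teardown is needed.
    assert apply_interest(200.0, 0.5) == 300.0
    assert apply_interest(0.0, 0.5) == 0.0


test_apply_interest()
```

Testing do_stuff1() would mean resetting balance and rate before every test, which is the obstacle this section describes.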

Hard coded settings, magic numbers

Magic numbers are unexplained, hard-coded numeric literals in code. Hard-coded characters were often found in languages that didn’t implement an enumerated type. For cases such as this, I may refactor by externalizing the strings and magic numbers into a properties module or text file.
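A minimal sketch of that kind of refactoring, assuming a hypothetical job scheduler where "A" means an active job and 3 is the retry limit. Both the status code and the limit are invented for illustration.

```python
from enum import Enum


# Legacy style: the reader has to guess what "A" and 3 mean.
def can_retry_legacy(status, attempts):
    return status == "A" and attempts < 3


# Refactored: the hard-coded character becomes an enumerated type
# and the magic number becomes a named constant.
class JobStatus(Enum):
    ACTIVE = "A"
    FAILED = "F"


MAX_RETRY_ATTEMPTS = 3


def can_retry(status: JobStatus, attempts: int) -> bool:
    return status is JobStatus.ACTIVE and attempts < MAX_RETRY_ATTEMPTS
```

Because JobStatus("A") still maps the old character to the new member, existing data that stores the raw codes can keep working while callers move over to the enum.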

Extensive Violations of the DRY principle

You are reading a piece of legacy code and find the same statements repeated over and over with little or no variation. Assuming the code still has low coupling, you may be able to avoid a rewrite by placing the repeated code in its own method.
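Here is a small sketch of that extraction. The field-cleaning logic and the function names are hypothetical; the repeated statements in the legacy version are moved into one helper without changing the result.

```python
# Legacy style: the same cleanup statements are repeated per field.
def format_contact_legacy(name, email):
    name = name.strip()
    if name == "":
        name = "unknown"
    email = email.strip()
    if email == "":
        email = "unknown"
    return name + " <" + email + ">"


# Refactored: the repeated statements live in a single helper.
def clean_field(value: str, default: str = "unknown") -> str:
    value = value.strip()
    return value if value else default


def format_contact(name: str, email: str) -> str:
    return f"{clean_field(name)} <{clean_field(email)}>"
```

Keeping the legacy version around briefly makes it easy to assert that both produce identical output before the old one is deleted.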

Final Thoughts

Finally, deciding to rewrite code is a business decision. However, adding new functionality to legacy code without regression tests is dangerous, and therefore a rewrite may be necessary. So you have to weigh the value of the new functionality against the cost of rewriting the module.

However, remember that legacy code is there because it’s been working. It’s been proven reliable through its years of use.

Therefore, if the legacy code is being imported as a separate API, you may be able to use the adapter design pattern and build a wrapper around the legacy component to separate the legacy code from the new functionality.
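A minimal sketch of such a wrapper, assuming a hypothetical legacy backup component. LegacyBackup, run_bkp and its return codes are stand-ins invented for illustration, not a real API.

```python
from typing import Protocol


class LegacyBackup:
    """Stand-in for the untouched legacy component (hypothetical)."""

    def run_bkp(self, src_path, flags):
        # Old-style convention: 0 means success, non-zero means failure.
        return 0 if src_path else 1


class BackupService(Protocol):
    """The interface the new code is written against."""

    def backup(self, path: str) -> bool: ...


class LegacyBackupAdapter:
    """Translates the new interface into calls on the legacy component."""

    def __init__(self, legacy: LegacyBackup) -> None:
        self._legacy = legacy

    def backup(self, path: str) -> bool:
        # Hide the legacy flags and return codes from new callers.
        return self._legacy.run_bkp(path, flags=0) == 0


def nightly_job(service: BackupService, path: str) -> str:
    # New functionality depends only on BackupService.
    return "ok" if service.backup(path) else "failed"
```

With this shape, the legacy code stays as it is, and a later rewrite only has to provide another class with the same backup method.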

For further reading check out Mitch Pronschinske’s blog where he provides 17 essential reads for developers on this issue.

My own favorite book on structuring code at the lower level is Code Complete by Steve McConnell. I try to reread this book every few years.