Detecting Shoved Code
I’m constantly amazed at how biological code seems. It grows and it sprawls. It becomes easier to patch than to change deeply. I have this theory that any system where the cost of changing old structure is high has a strong biological feel. Code, in the worst cases, is like a decaying city. It's easy to add a new sign or clean up a single storefront but hard to replace the sewer or the streets.
The fact of the matter is that the industry has to start taking refactoring much more seriously. Not the big “let’s spend the afternoon restructuring” refactorings but rather the simple “let’s clean up and extract” refactorings.
A particular way that code can go bad is through the addition of large blocks of code to existing methods. Imagine a method of ten or twelve lines that is well-factored and easy to read. It’s hard to imagine a case where a commit that simply adds five or more contiguous lines of code to that method is ok. Developers who take the work seriously often do something more involved than that. They may add that code but they also assess the intent of the method and attempt to clarify the work that happens within it - likely deleting nearly as many lines of code as they add and often creating new methods to delegate to. It seems that there ought to be a way to detect whether we are using the former mode of change or the latter.
I haven’t done any investigation yet, but I think I’m going to try to find detectable patterns for shoved code - code that was added into a code base without sufficient refactoring. At a minimum, it seems that doing a diff on changed methods in a commit and using a threshold for added contiguous lines should provide a good start. When I’ve written other detectors I’ve started simple and tuned them. The ideal for me would be to have a pre-commit hook that just asks a question - are you sure you wanted to shove that code?
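To make the starting point concrete, here is a minimal sketch of the threshold idea in Python. It is only an illustration under some loud assumptions: it reads a unified diff as text, it works per hunk rather than per method (mapping hunks to methods would need a language-aware parser), and the names `ShovedRun` and `find_shoved_runs` and the default threshold of five are hypothetical choices made for this example.

```python
import re
from typing import List, NamedTuple

# Matches a unified-diff hunk header such as "@@ -1,4 +1,10 @@".
# A missing count means 1, per the unified diff format.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class ShovedRun(NamedTuple):
    path: str        # file path exactly as written in the "+++ " header
    start_line: int  # line number in the new file where the run begins
    length: int      # number of contiguous added lines


def find_shoved_runs(diff_text: str, threshold: int = 5) -> List[ShovedRun]:
    """Return runs of at least `threshold` contiguous added lines.

    Hypothetical helper: hunk-level only, it knows nothing about methods.
    """
    runs: List[ShovedRun] = []
    path = ""
    old_left = new_left = 0  # lines still expected in the current hunk body
    new_line = 0             # current line number in the new file
    run_start = run_len = 0

    def flush() -> None:
        # Close the current run and record it if it is long enough.
        nonlocal run_len
        if run_len >= threshold:
            runs.append(ShovedRun(path, run_start, run_len))
        run_len = 0

    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:      # inside a hunk body
            if line.startswith("+"):          # added line extends the run
                if run_len == 0:
                    run_start = new_line
                run_len += 1
                new_line += 1
                new_left -= 1
                continue
            flush()                           # anything else ends the run
            if line.startswith("-"):          # removed line
                old_left -= 1
            elif line.startswith("\\"):       # "\ No newline at end of file"
                pass
            else:                             # context line
                old_left -= 1
                new_left -= 1
                new_line += 1
            continue

        flush()                               # hunk ended; close any open run
        if line.startswith("+++ "):
            path = line[4:].strip()
        else:
            header = HUNK_HEADER.match(line)
            if header:
                old_left = int(header.group(1) or 1)
                new_line = int(header.group(2))
                new_left = int(header.group(3) or 1)

    flush()
    return runs
```

A pre-commit hook could feed this the output of `git diff --cached` and ask its question whenever the returned list is not empty. A tuned version would probably also weigh how many lines the same hunk deletes, since the more careful mode of change described above removes nearly as much as it adds.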