Unary Call Sites and Architecture
Using Static Call Counts to Explore Internal Structure
In an ideal world we’d be able to see more in our editors and IDEs. We see our code, and that’s just about enough to work with, but there’s so much more that we could know.
One of the things I’d like to know is the number of call sites that each method has. In frameworks or libraries, we’ll have methods that are never called internally - other code calls them but we don’t. Inside a component or a repo, most methods are called from more than one place or call site. Other functions are called just once. What does that mean?
Over the history of programming we’ve used many guidelines to decide when we need a new function or method. I once heard one that was very peculiar. The advice was to create a method only when it will be called from more than one place. The idea was that methods existed solely to deal with code duplication and that they were useless overhead otherwise.
Most people today have a different opinion. We see functions as intentions. We give them names and our code becomes clearer when we use those names to tell a story. We might have many functions that are called just once in our code.
I wanted to investigate this the other day so I wrote a little Ruby script to count the number of call sites for each method in a Ruby repository. It has some simplifying assumptions. It assumes that methods with the same name, regardless of whether they are on different classes, share call sites - and that can be the case when we use polymorphism. The numbers the script calculates might not be completely precise, but they are a good probe for this idea.
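The kind of counting described above can be approximated in a few lines. What follows is a minimal sketch, not the script mentioned in this post: the name `call_site_counts` and both regular expressions are hypothetical, and it matches names textually rather than parsing Ruby, so it shares the same-name assumption and adds a few blind spots of its own (one-line `def` bodies are skipped, and names inside strings or used as hash keys are counted). It assumes Ruby 2.7 or later for `filter_map` and `tally`.

```ruby
# Hypothetical sketch: approximate call-site counts by method name.
# Names are matched as text, so methods with the same name on different
# classes share one count.

# Matches "def name" or "def self.name" and captures the name.
DEF_PATTERN  = /^\s*def\s+(?:self\.)?([a-z_]\w*[?!]?)/
# Matches a lowercase identifier that is not part of a longer word,
# an instance/global variable, or a symbol.
NAME_PATTERN = /(?<![\w@$:])([a-z_]\w*[?!]?)/

def call_site_counts(sources)
  lines = sources.flat_map(&:lines)

  # Every method name that is defined somewhere in the sources.
  defined = lines.filter_map { |line| line[DEF_PATTERN, 1] }.uniq

  # Every identifier mentioned outside of definition lines,
  # with trailing comments crudely stripped.
  mentions = lines
    .reject { |line| line =~ DEF_PATTERN }
    .flat_map { |line| line.sub(/#.*/, "").scan(NAME_PATTERN).flatten }
    .tally

  # Keep only identifiers that are defined methods; default to zero.
  defined.to_h { |name| [name, mentions.fetch(name, 0)] }
end
```

One way to try it on a repository, assuming the code lives under `lib`, is `call_site_counts(Dir.glob("lib/**/*.rb").map { |path| File.read(path) })`. A real parser would give better numbers, but a textual match is enough for a first probe.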
I ran the script on a few repos (rubygems, rake, and a few others) and I discovered something surprising. In most of them, about 20-23% of the methods had a single call site. There are some outliers. In a code base developed by a friend, 45% of the methods have a single call site. He and his team are very zealous about their design, so I’m not surprised, and I am going to investigate further.
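For reference, a percentage like the ones above can be derived from a hash of per-method counts. This is a small sketch under assumptions: the name `single_call_site_percentage` is hypothetical, and it assumes every defined method, including those with zero internal call sites, counts in the denominator. The post does not say which denominator the original script used.

```ruby
# Hypothetical sketch: share of methods that have exactly one call site.
# `counts` maps method name => number of call sites.
def single_call_site_percentage(counts)
  return 0.0 if counts.empty?

  singles = counts.count { |_name, sites| sites == 1 }
  # Assumption: all defined methods form the denominator.
  (100.0 * singles / counts.size).round(1)
end
```

Dropping methods with zero call sites from the denominator would raise the percentage, so it is worth fixing that choice before comparing numbers across repos.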
What I really would like to know is whether methods with single call sites are candidates for decomposition of components in high level architecture. The top level of systems is often a set of entities that are not "instantiated" multiple times. Rather, they are accessed in one place by other pieces. Maybe call site analysis is a decent probe for finding places where those relationships have developed organically.
By the way, the fact that the heart of the script uses a programming style without method definitions is purely coincidental.