The Hidden Cost of Software Reuse
I have a confession to make. I don't automatically reuse code. I've been met with incredulous stares when, for example, a co-worker finds out I wrote my own drag-and-drop handling for my web page rather than just dropping jQuery in there and being done with it. But it seems to me that those same co-workers suffer from selective memory with regard to the hidden costs of software reuse.
Back when I started coding in the late '80s, "open source" was all but unheard of outside of universities. There were reusable code libraries like RogueWave to be had, but those came with a monetary cost and it was practically impossible to demonstrate ROI well enough to get approval for a purchase request. But as the decades wore on and the internet began to pervade our daily lives and especially our work, more and more of these reusable libraries were made freely available by hobbyists or just plain fanatics. Now, there was no cost justification that needed to be made. You still had to get security sign-off to use the things, but that was it. And as you surely know, they became incredibly popular. I use them quite a bit myself. But I do think that we've lost sight of the time cost of software reuse.
Now, don't get me wrong, I love open source. I love being able to run Linux on my computer and gdb every single piece of software I have. I got started on a Commodore 64 and I played a lot of games that were written in C64 BASIC and distributed in source code form. When I wondered how the writer made a certain effect happen, I could open up the program and see how they did it. I learned quite a bit about C64 BASIC that way. I still crack open the occasional open source application to see what makes it tick. But when coding professionally, I always consider it a trade-off whether I want to reuse somebody else's code or write it myself.
For example, I was working recently with a REST-based service that uses the HTTP 'PATCH' verb to process partial updates to objects (as REST conventions say it should). However, when I tried sending a PATCH command through HttpURLConnection.setRequestMethod, I got the exception:
Invalid verb 'PATCH'
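A minimal sketch of how this failure can be reproduced without touching the network. The class name PatchVerbCheck and the example.com URL are placeholders, not anything from the service described here. The check only looks at the exception type, since the exact message wording may differ between JDK versions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URI;

// Hypothetical helper: asks the stock JDK connection class whether it accepts a verb.
final class PatchVerbCheck {

    // Returns true when HttpURLConnection refuses the given HTTP method.
    static boolean isRejected(String method) {
        try {
            // openConnection() only creates the connection object;
            // nothing is sent until connect() or a stream is requested.
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://example.com/items/1").toURL().openConnection();
            conn.setRequestMethod(method);
            return false;
        } catch (ProtocolException e) {
            // setRequestMethod throws this for verbs outside its fixed list.
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling PatchVerbCheck.isRejected("PATCH") reports the rejection, while a verb such as "PUT" is accepted.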
So here, I'm faced with a conundrum. Do I start researching how to make HttpURLConnection work with the 'PATCH' verb, or do I just write my own HTTP connection code? On the one hand, it may turn out to be a simple thing to make HttpURLConnection do what I need it to do, although a quick Google search makes it clear that it will take more than a few minutes. (Part of the problem with the "just google it" approach here is that the word "patch" also means "update software", so I get a lot of irrelevant hits when I try to search that way.)
If I just go ahead and write my own HTTP connection code, I know two things: how long it will take (within an order of magnitude) and that it will do exactly what I need it to do. If I stick with HttpURLConnection, I don't know how long it might take, and I don't know if I'll be successful. Now, if I write my own HTTP logic, my code won't handle gzip. It won't accept bare LF line endings. It won't handle multiple Unicode encodings. But in this case, that's OK. I'm not trying to write the world's most generic HTTP handler. I'm trying to write one that correctly handles the PATCH verb for a specific server for a specific use case and will likely never be used in any other context.
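To make the trade-off concrete, here is a rough sketch of what such a single-purpose PATCH sender might look like. Everything in it is an assumption for illustration: the class and method names are hypothetical, it speaks plain HTTP/1.1 over an unencrypted socket, and it deliberately skips gzip, chunked responses, redirects and anything else a general-purpose client would handle.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical, deliberately narrow client: it knows how to send PATCH and nothing else.
final class MinimalPatchClient {

    // Builds the raw request bytes: request line, four headers, blank line, UTF-8 body.
    static byte[] buildPatchRequest(String hostHeader, String path,
                                    String contentType, String body) {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        String head = "PATCH " + path + " HTTP/1.1\r\n"
                + "Host: " + hostHeader + "\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "Content-Length: " + payload.length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] request = new byte[headBytes.length + payload.length];
        System.arraycopy(headBytes, 0, request, 0, headBytes.length);
        System.arraycopy(payload, 0, request, headBytes.length, payload.length);
        return request;
    }

    // Extracts the numeric code from a status line such as "HTTP/1.1 204 No Content".
    static int parseStatusCode(String statusLine) {
        String[] parts = statusLine.trim().split(" ");
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Malformed status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Sends the request and returns the response status code; the rest of the response is ignored.
    static int patch(String host, int port, String path,
                     String contentType, String body) throws IOException {
        String hostHeader = (port == 80) ? host : host + ":" + port;
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildPatchRequest(hostHeader, path, contentType, body));
            out.flush();

            // Read bytes up to the first LF: that is the status line.
            InputStream in = socket.getInputStream();
            StringBuilder statusLine = new StringBuilder();
            int c;
            while ((c = in.read()) != -1 && c != '\n') {
                statusLine.append((char) c);
            }
            return parseStatusCode(statusLine.toString());
        }
    }
}
```

The request builder is kept separate from the socket code so that the wire format can be checked in a unit test without a server. A real service that requires HTTPS or authentication would need more than this.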
What about Jakarta Commons HttpClient? That's something of a de facto standard for low-level HTTP-related work in Java (more standard than the java.net classes that come bundled with the JDK, in fact). Well, HttpClient is great when you want to cover the whole breadth of the HTTP specification. Unfortunately... it covers the whole breadth of the HTTP specification. It relies on yet more third-party libraries. It introduces quite a few transitive dependencies, and it also makes unit testing difficult, so much so that I have to write a wrapper over HttpClient so that my JUnit tests can mock it out! It has a lot of very useful functionality, like auto-following redirects, MIME encoding for file uploads, digest authentication... if I were doing more sophisticated HTTP interaction, it'd probably be worth the time to get it up and running. In fact, if you're just doing simple HTTP, but you don't actually know (or care) how HTTP works, HttpClient is an obvious choice. But it does take time to get up and running, and a "reuse first" developer always seems to gloss over that. This is what I mean about the "hidden" cost of software reuse. Software project management philosophy appears to be full of proponents who think nothing of spending weeks or even months trying to find a pre-built solution in order to save what would probably be a few weeks of coding. "It exists, therefore it must be used." If I had a nickel for every framework that promised to "take care of the plumbing so that I could get to the business of developing my application" but required weeks of study and experimentation before I could get around to using it to develop my application, I'd have too much money to be worrying about whether I was using the right framework.
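The kind of wrapper mentioned above, written so that tests can swap out the HTTP library, might look roughly like the sketch below. All the names (HttpGateway, ItemUpdater, RecordingGateway) and the /items/ URL layout are hypothetical. The production implementation that would delegate to HttpClient is left out on purpose, so the sketch does not depend on any third-party API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam: application code talks to this interface, never to the HTTP library directly.
interface HttpGateway {
    // Sends a PATCH with a JSON body and returns the HTTP status code.
    int patch(String url, String jsonBody);
}

// Example application code that depends only on the interface.
final class ItemUpdater {
    private final HttpGateway http;
    private final String baseUrl;

    ItemUpdater(HttpGateway http, String baseUrl) {
        this.http = http;
        this.baseUrl = baseUrl;
    }

    // Returns true when the server answers with a 2xx status.
    boolean rename(int id, String newName) {
        // Naive JSON for illustration only; newName is not escaped.
        String body = "{\"name\":\"" + newName + "\"}";
        int status = http.patch(baseUrl + "/items/" + id, body);
        return status >= 200 && status < 300;
    }
}

// Test double: records each call and answers with a canned status code.
final class RecordingGateway implements HttpGateway {
    final List<String> requests = new ArrayList<>();
    private final int cannedStatus;

    RecordingGateway(int cannedStatus) {
        this.cannedStatus = cannedStatus;
    }

    @Override
    public int patch(String url, String jsonBody) {
        requests.add(url + " " + jsonBody);
        return cannedStatus;
    }
}
```

In a JUnit test, an ItemUpdater built with a RecordingGateway can be exercised without any network access. The extra interface and the extra class are part of the setup cost described here.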
Reuse orthodoxy says, "You must always use a pre-built solution when one exists." There seems to be this odd meme floating around: "A programmer... writing programs? That other people use to get work done? Preposterous!" Sort of the inverse of the "Not Invented Here" syndrome.
There are a few (very) good reasons to reuse existing code:
- You literally don't know how to implement something. SSL is a good example. It would take months to even understand the intricacies of SSL, much less write a usable instance of it.
- You know how, but you know you're going to be throwing so many different use cases at it that you know with absolute certainty that you'll be spending a lot more time debugging it than you have available. I know how to write my own web server. For production work, I'll stick with Apache, because I know that I'm going to be using a fair amount of what Apache has to offer, so it's worth the administrative overhead of getting it up and running.
But there are also good reasons not to. I always ask myself — am I using this to do something that I don't have time to do, or am I using this to do something that I don't understand how to do? Because if it's something I don't understand how to do, I'm asking for trouble if it doesn't work the way I expect it to. When somebody else's code doesn't behave the way you expected it to (likely because you didn't configure or call it correctly), you're left spending time figuring out what assumptions the other person made - often more time than you would have spent just writing the thing yourself. Now you've got the worst of both worlds — the thing's taking forever to get up and running, and you don't understand the internals of it well enough to pinpoint why or where. At least if I'm debugging my own code, I know what the original author was thinking when he wrote it (even if I occasionally curse his boneheadedness in writing it that way). If it was the exception rather than the rule that I found myself debugging code that I reused, I'd always default to that option — but I seem to spend just as much time debugging open source solutions as I do debugging my own code.
I know I'm not alone in this. The LinkedIn API website admonishes, for example: "We strongly suggest that you use one of the existing OAuth libraries rather than trying to 'roll your own'." However, the OAuth library they recommend actually rolls its own HTTP connectivity code! Google Drive does the same thing: they implement their own com.google.api.client.http.HttpTransport interface for OAuth 2.0 integration. The fear of "rolling your own" seems to be that there will be subtle gremlins lurking in the code that you won't uncover during testing and that will only break horribly when you unleash it upon real users. Here's the thing. In twenty years as a professional developer... I've never had this happen to me. I've never even seen this happen. I've never seen somebody else's subtly incorrect implementation of an open standard cause problems. Don't get me wrong: I've seen lots of subtle, difficult-to-find problems in production code, but when I do, it's usually related to transaction handling, resource sharing or thread management, things that code reuse doesn't particularly help you with.
Jeremy Kun, writing about would-be students of mathematics who don't want to do proofs, offered this analogy: "I'm really interested in learning to code, but I don't plan to write any programs and I absolutely abhor tracing program execution. I just want to use applications that others have written, like Chrome and iTunes." He meant it as a joke, but this is really the cult of "always reuse" boiled down to its essentials.
I also believe that the "always reuse" problem is starting to exacerbate itself as writers of reusable software themselves reuse other software, creating layers upon layers of hard-to-understand dependencies. If X is built on top of Y, then it's not good enough to just be an expert in X — Y has its own idiosyncrasies, too, and you have to be familiar enough with them to interpret the error messages that it returns. If you know, for example, that
$(document).ready(). Of course, this makes perfect sense once you've seen it, but it's an extra layer of documentation and functionality you have to be aware of.
My advice? Reuse when you think it makes sense, but don't let anybody give you a hard time for writing your own code. But if you do — own up to it and fix it if it breaks.