For over six months, we've been toiling to create the latest version of Netsparker. It didn't start out as a six-month timeline - our development cycles are usually two to three months. But along the way, we seduced ourselves into adding a couple of unpredictably complex features and, before we realized what was happening, the holiday season was approaching.
Cautious about our announcement going unnoticed in all the pre-Christmas noise, we decided to postpone it until January 3rd, hoping to take advantage of a few days of much-needed rest and looking forward to starting the new year with a bang.
We're fanatical about product quality, so we go to enormous lengths to avoid pushing a release that contains any embarrassing bugs. In addition to unit testing, we exercise every build against a rig of more than 2000 functional tests that simulate real-world use cases. And then, in the final week before publication, the entire company (including all the non-tech staff) has the pleasure of dogfooding Netsparker until they're ready to scream.
Unfortunately, for Netsparker v2.1, this "final week" has lasted for more than a month. Time after time, our hopes were dashed just minutes before pushing the button as we discovered yet another frustrating edge case. Every day, we awoke with the carried-forward depression and uncertainty of yesterday's latest failed attempt. And every evening, we simply confirmed our own sense of failure by carrying forward yet another set of unsolved clues.
We have to acknowledge that some of our problems were self-inflicted, mostly by trying to cram just a little too much into a development cycle and occasionally by working when what we most needed was sleep. But the real pain, and the most significant reason for our lost month, was caused by fleeting issues in our 3rd party technology stack that occurred only in the most unlikely combinations of circumstances.
Building a software product can be a joy, but when you factor in the unexpected effects of external tools, life sometimes becomes unpredictable. We use 3rd party components for licensing, obfuscation and deployment (among many others) and we usually expect these to play nicely together. But, when you have a bug that only occurs on a licensed, obfuscated, release build, and only when installed on a particular platform in a particular culture (and then only intermittently), tracking it down can become a nightmare.
There was no single moment of epiphany that ended our ordeal; it was resolved only after an arduous campaign against a seemingly endless list of unrelated issues. Here are some of the painful lessons we learned along the way:
1. A chain is only as strong as its weakest link
As developers, we are accustomed to using external libraries that solve problems outside of our domain of expertise. This is a great time-saver, but it can leave us dangerously exposed to the mistakes of others.
2. Every black box added to a project compounds its complexity
Debugging our own code is (usually) easy. Getting past bugs in other people's closed-source libraries is more of a challenge. But the real contest starts when you have several black boxes, each adding its own anomalies to the equation. In our case, we reached the point where every member of the team was able to offer unsubstantiated theories about what was going wrong, but nobody could prove a thing.
3. Due diligence goes way beyond product appraisal
When you use a 3rd party product as a core component in your application, you are at the mercy of the vendor. We know that we have made a poor decision in the choice of one of our components, not because the product is bad, but because the vendor's support sucks. Replacing this component is likely to be a top priority in the future, except for one important detail …
4. Quick and easy choices can have exorbitant switching costs
As much as we'd love to dump our nightmare component vendor, we're stuck with them for the long haul, because doing so would break compatibility across our entire user base. What started as an easy decision to buy a relatively insignificant component has turned into a major hindrance in our development process and one that adversely biases our strategy in so many ways.
5. If it ain't broke, don't upgrade it
In addition to our 3rd party component nightmare, we added to our own problems by upgrading not one, but four major platform / architecture components in the same development cycle. Needless to say, when we were knee-deep in uncertainty, it was all too easy to speculate about which of these changes might have been a contributory factor. Of course, every bad decision is so obvious with hindsight.
Despite the anguish of the last month, we eventually won through and Netsparker v2.1 earned a clean bill of health this morning.
During our interminable period of pre-release, team emotions occasionally ran high and our resolve was tested almost to breaking point, so it was inevitable that somebody would eventually throw in a wisecrack about Groundhog Day. But who could have predicted that, after more than 40 days of iterating the debug-fix-test loop, Netsparker would finally be deemed fit for purpose and released to the waiting world on February 2nd - the real Groundhog Day?
Happy Groundhog Day!