Continuous, Incremental Improvement
Part of The Inner Chapters Unbook.
Originally part of podcast episode number thirty four.
Dedicated audio available from Podiobooks.
- Others have spoken about this at length
- Pragmatic Programmers
- XP, flattened cost of change
- Even Kernighan and Pike, No Broken Windows
- Seems obvious
- Also seems like there would be valid exceptions
- This is a pitfall, tantamount to hubris
- Software complexity will rise on a geometric curve
- Our first job is keeping that curve as close to linear as possible
- Continuous, incremental improvement is the most proven, low-risk way to do this
- First example
- Let's re-write some or all of our software
- 3 jobs back, decided to do just that
- Engineering agreed because we always want a blessed opportunity to improve design and implementation
- Management didn't take customers into account
- Re-write took a lot longer than anyone imagined
- Marketing used it as an excuse to include all kinds of new features
- No one could ever admit to the customers that it was either going to be considerably later or with fewer features than originally promised
- Company essentially ended up standing still from a competitive standpoint for the better part of six months
- Second example
- We're going to change the world
- Last job, to a lesser extent during the bubble
- Founder did not already have the idea
- Wanted to disrupt existing telecommunications industry
- Knew how to do it, better compression, but didn't have a feasible scheme for the technology
- Became so fixated on the win big scenario, missed any number of real opportunities to start realizing revenue
- Everything always takes more time and effort than anyone predicts
- Predictions are inherently faulty
- Re-writes are worthwhile
- Take an incremental approach to re-writes, never attempt all at once
- Pick the best candidate component
- Highest risk yields highest reward
- Lowest risk protects other components
- Depends on situation, risk aversion
- Better than a wholesale re-write
- Shorter more predictable delivery, a la XP
- Learn more early and improve the entire re-writing effort as you go
- Never, ever lose sight of the bigger picture for one task or project
- Get the best people, give them the best equipment and put them in the best environment and they will succeed
- From Spolsky
- You do not need a moment of greatness fixation to be great
- Be open to what you already have, can already achieve and capitalize on it
- Predicting what will succeed is also tricky, diversify!
- If any startup looks too good to be true, it is
- Demos may be necessary but they are evil
- Many "innovative" startups end up doing nothing but producing demos
- Turned down an opportunity after last job because they were in the demo rut
- At some point, you need to produce results
- The opportunity is less important than the people and the place
- Everything always takes more time and effort than anyone predicts
Many other people have spoken very well about continuous, incremental improvement, among them the Pragmatic Programmers, XP, and Kent Beck when he talked about flattening the cost of change. I think a lot of the things that he originally lays out when he's talking about that phenomenon fall under the category of continuous, incremental improvement. I think even Kernighan and Pike, in The Practice of Programming when they talk about "no broken windows", that really is informed by or informs an idea of continuous, incremental improvement. And what I mean by that is to say that software systems get more complex over time. Pragmatic Programmers use the term bit rot to talk about this. And the only thing we can do is to bring continuous discipline to the table and improve our practices and get better and better and better at our job of programming, and invest, re-invest it all back into the code base to offset that bit rot.
The complexity curve that bit rot speaks to is a geometric curve, a doubling curve you can think of very simply, and there's an asymptote where it goes nearly vertical. And at that point, that's when projects fail, that's when code bases get so bloated, so overly complex, that they get scrapped and they get re-written from the ground up. I would argue that one of the priorities that we have as hackers, as professional programmers, is to use all of the tools available to us to keep the complexity curve somewhere between that geometric curve and a straight, linear, mathematical curve, somewhere in that space. The closer we can get it to a flat, linear curve, the better off the software is going to do over time. I don't think you're going to talk about a completely horizontal line when it comes to complexity because you just have to address, over time, the input from your users that want more features, more sophistication, and they never want less. And you may be able to collapse and reduce, depending on what domain you're in, but ultimately you're still going to be driven toward a more complex solution at the end of the day.
This may all seem kind of obvious to you. What I really wanted to do was share some of my experiences, some of my anecdotal experiences with what might seem like valid exceptions at the outset, where you'd say, given the definition that I just gave of incremental improvement, given the people who have endorsed that and put together great works on practices and principles for effecting incremental improvement, or, like I said, as Kent Beck phrases it in eXtreme Programming, flattening the cost of change, there may be some scenarios where you look at them and go, "Well... That's all well and good, but I think this one time I can, I'm going to do something big. I'm going to make a huge change, going to try to effect a huge change, instead of trying to do things in well considered, well defined, well controlled steps". And I think that that thinking, that "Huh, this seems like a safe thing to do" is a trap, it's a pitfall. I think it's tantamount to hubris, and hopefully you'll agree with me as I get into this, why I say that, why I think it's a sin of pride, if you will. I think what that boils down to is, we fool ourselves. We think, not necessarily consciously, that we're better at doing something than we actually are. We think something is more controllable than it actually is. We all know that schedules (whew!), if you've worked on a real schedule, you know, with business drivers behind it and stakeholders looking at those dates very closely and holding your feet to the fire, schedules tend to inflate. They don't tend to get easier, and it takes a lot of work to pull them off effectively and to be repeatedly successful.
So, the first example I'm going to talk about is probably a little bit easier to approach and follows on very easily from those points. It was about three jobs back, and I was working for a software product shop that made... Well, they were an application service provider and they had a managed-service application. It was a web application backed with Java Enterprise edition, running on an application server with a big Oracle instance. It was a shared instance, subscription model, worked pretty well. It was started three or four years before I got there by some conceptual purists. It was a sourcing application for online procurement. And sourcing, if you don't know, is really just the first step in online procurement. It's the negotiation of who you're going to buy from, what quantity you're going to buy and what price, to put it in simplest terms. And the way that this company approached that space and approached that problem was with an auction engine. Unfortunately, their original domain model for representing commodities for procurement, like I said, was informed more from an academic or a purist's view where you would source similar items, you source like things with like. You would source pens and pencils together, but you would not source those with legal pads. And just saying that out loud, you realize how ridiculous that sounds in the real world. Who's going to be constrained by that? If you have an office manager that needs to buy all manner of supplies, whether they're staples, staplers, legal pads, white-out bottles, whatever, they're going to want to put everything together into one order, and they're going to want to try to source it from one person, or they're going to want to source it in whatever way makes sense to them. And they're not necessarily going to want to be constrained artificially. They want to be more flexible.
When you get into discrete manufacturing like this company did, that model breaks down even further because you may want to package things together. You may want to package all of the parts to build a particular end product together and source them all together. You're not going to break them up and do the wheels for a lawnmower in one allotment and do the engines in another and the chassis in a third, and then the labor to manufacture them all and put them all together in a fourth. It just doesn't make sense. You want to make a package that's very natural to your business.
So, I guess the last year I was there the CTO, the head of the engineering group thought, "Well, we should rewrite the auction engine to make it a better fit for how our customers tell us they're using it". And being engineers we all thought, "Wow, this is great! We get a blessed opportunity to improve the design and implementation of the project. We get to rewrite code that we're not very happy with." Unfortunately I don't think management actually took the customers into account. This was at the outset nominally a six-month rewrite, and it took considerably longer, to my earlier point about hubris and sin of pride. It took the better part of a year to get anywhere near even close to being done with the rewrite. Marketing kept using it as an excuse to include all kinds of new features. They'd say, "Well, as long as you're in there, and you're reworking this, you know, we have this whole RFI component that we built out, and you guys are doing that to integrate that more with the actual sourcing of physical goods as well as acquiring information from suppliers and partners. Oh gee, wouldn't it be nice if we had a spreadsheet-like question we could use as a component in an RFI or a mixed document? Gee, wouldn't it be nice if we had this, that, or the other?" And it just went on and on and on like that, so while we were struggling just to rewrite the core implementation and achieve a rewrite that was parity feature-wise, they just kept heaping more features, and the excuse they used was that the customers just won't stand for sitting still. That if we take six months, which was the original duration that we projected for this, and just had parity, that that would be no good. That wouldn't fly with our customers. We'd have to have something new on top of that.
Of course they were right. The real problem was you just, you can't take six months to do anything, and it was an abortion, it was a nightmare from a project-management perspective. The customers got very angry, and then no one in the management team could admit to the customers that things were going to run considerably longer than anticipated, or was willing to make the decision to pull some of those features back out just to get the damned thing out the door. So the customers started to suspect that they were being lied to, that as an organization we were being disingenuous to them, we weren't being entirely forthcoming, we were holding some things back. And that was true. You know, people sense these things. You really can't lie on a large scale to your customers. You can tell white lies about "Oh, it must have been a network hiccup why our server went down" while you scramble in the background to fix some transient problem that you know full well has nothing to do with that. That kind of stuff is fine, to be expected. But at this level, when you're talking about the road map of the business, when you're talking about product releases for a business, they know. They can smell bullshit. So the company essentially ended up standing still for six months, longer than six months, from a competitive standpoint in a very, very competitive space. Towards the end we had the SAPs and the Oracles moving into this space, looking to either buy up or build out their own sourcing applications that were comparable to what we had to offer. We originally in that space were market leader, and it just was a horrible, crucial stumble. There were some things that followed on from that that had nothing to do with my point about continuous incremental improvement and, by contrast, trying to do these big, massive rewrites.
But I think that we wouldn't have been in such a bad spot when it came down to the very end if we hadn't made that critical misstep, that mistake in judgment, of thinking that it was okay to do such a huge piece of work all at once, to do such a systemic rewrite all at once without thinking it through. And I'll talk a little bit about lessons learned after I talk about my second example here.
The second example was actually my last job, and this was a company that, when I was hired, promised they were going to change the world. Overnight! Revolution! Big words, huge promises. I think to a lesser extent, during the dot-com bubble we saw a lot of the same kind of behavior, a mentality. I should have learned my lesson there, should have steered clear of my last employer as a consequence, given how the whole "we're going to disintermediate the supply chains so Levi's jeans is going to sell their jeans direct to the consumer, and there won't be a need for outlets like JCPenney and Sears anymore, and these businesses will go under overnight if they don't transform!" All that new economy garbage, if you remember all of that from back in the late 90s. Same kind of stuff. Unfortunately here, at least during the bubble, arguably some people had some good ideas about how to effect transformations or how it would work, and some of their ideas — yeah, Amazon has proven out in the long run, and then online shopping has become part of how we conduct our daily lives. Here in the States at least with how big holiday sales are, online sales are a huge component of that, and you hear your local news expert talking about holiday sales for the season, they talk about online sales as a component of that. So there was some lasting change, but this — my last employer before the one I'm at now didn't really have much of an idea here. He knew what effect he wanted to achieve, but really didn't have anything concrete with which to make that change. He really just I think had a chip on his shoulder and wanted to disrupt the existing telecommunications industry who, I learned very much towards the end, that he had worked with as a consultant for years and years and years. So I think definitely a chip on the shoulder or a grudge is a fair characterization, based on my experience and based on what I understand of his personal and professional history.
They had, like I said, to be fair, some notion of how to effect the change. To try to make you understand this a little bit better: I've talked about working in compression systems, so it was just to develop some radically new compression system. But they didn't have a feasible scheme that could actually implement it that would achieve the desired levels of compression. They would just say, "this is what we can do for you", and of course anybody would look at that and say, with any large n multiplier on your effective bandwidth, "Yeah, that's usually transformative". You know, we talked to a lot of telcos that were trying to stave off the cable industry, the... what are those? The cable MSOs (I'm trying to remember all the terminology I picked up there: "RBOCs, Regional Bell Operating Companies, are definitely afraid of the cable MSOs, multi-service organizations") because the cable companies have invested. All the huge margins they've been making for years, in the last decade, they've been reinvesting in a fairly sophisticated, very high-bandwidth IP backbone. And now they're able to offer voice and data services alongside video services, and they're able to offer advanced video services. And so who wouldn't be scared if you thought that that was your purview and your purview alone as a long-haul carrier, as a data service provider, as a voice service provider. So they wanted to do video to be able to compete on the same footing. And you go in and you say, "Well we can get you a (whatever multiplier) on your effective bandwidth, and you have limitations, you have X amount of dark fiber in the ground, and you've got the attenuation limits of the copper loops that you're dependent on, at least here in the US with the physical infrastructure that we have here". Who wouldn't jump at that? Who wouldn't go "Yeah, absolutely! If you can deliver that, then we'll give you scads and scads of money"? That was the big "if", though.
The founder of this company, the leader of this group, became so fixated on this win-big scenario, however, that he just lost grasp, lost touch with how hard a problem it was we were trying to solve and missed out on any — in my opinion — on any number of real opportunities to start realizing revenue along the way, where we made — I've gotten to the point of the title of this piece — where we made incremental improvements beyond what other people were doing in the industry. We made small steps beyond the things that were maybe the same compression level but were faster. Or a slightly better compression level with different kinds of media. He wanted a general solution that would wow anybody that would be sold anywhere, instead of realizing there were real opportunities to go into post-production. That was one of my favorites, so all this digital post-production [unclear, 15:37] work: King Kong, Lord of the Rings, and the huge bandwidth and storage requirements they have. So you're talking about modest improvements over what they have today at no loss of quality. And that was hugely compelling to them. I saw their eyes light up when we went to talk to these people, and it was just whsssht! went over the head of my boss, just lost on him that there was a great opportunity to capitalize on here.
You could make real money, you could be solvent as a startup, and you can continue to fund the more long-range research if you still want to at the end of the day transform things, again. So, great counter-example about, you know, incremental improvements, about incremental opportunities. The little steps are not a bad thing, even on the business side of things.
So let me get into lessons learned. I mean, it's all well and good to share these experiences, these anecdotes, under the umbrella of incremental improvement. But let me try to tease out some relevant points here based on what I've shared. I think the point of the first experience is really to point out — I mentioned this repeatedly — this hubris, this misjudgment of our own capabilities, that everything in software always takes more time and effort than anyone predicts. If you've worked with any project manager that's at least halfway competent, you'll see that he pads all of his schedules. It's because he knows this, instinctively, that you can come up to an engineer and say, "In ideal circumstances it will take three days to complete this component". And he's going to look at you, and he's going to be thinking, "Okay, add two days in for the fact that you're going to get distracted for half a day on something, you'll get pulled into a sales call another day. I'm just going to pad it out". And he's going to be right. It's going to come out in the wash in the end, and it's going to come out more realistically to his schedule than your schedule because things always take more time.
Rewrites are worthwhile, to the point of my first story, that was the big thing that we misjudged, that we underestimated. I'm not saying otherwise. We're, at my current job, undertaking essentially a systemic rewrite in the entire system, and I'm leveraging my experience from the last one to say, "You know what? We'll leave all of our existing code in place, we'll write a new framework, that much we can write in isolation from the ground up for new code, and drive it for new code, and then we can slowly move pieces over. We can start in the data layer, and we can move this package and then the next package, and the next package until all the data services are on the new framework, and we get, you know, optimized throughput to the database, better and more consistent transaction and connection handling, and then we can look at the next layer up, and then we can look up at the next layer up, and we can do this step-by-step, we can grade everything out".
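A minimal sketch of that package-by-package move, assuming a simple routing layer sits in front of the data services. Every name here (`MigrationRouter`, `legacy_fetch`, `framework_fetch`, the package names) is hypothetical and made up for illustration; nothing in it comes from the actual system being described.

```python
from typing import Callable, Dict, Optional, Set

# All names here are hypothetical stand-ins, not the actual system.
Handler = Callable[[str], str]


def legacy_fetch(key: str) -> str:
    """Existing data-layer code that is left in place."""
    return f"legacy:{key}"


def framework_fetch(key: str) -> str:
    """The same service rewritten on the new framework."""
    return f"new:{key}"


class MigrationRouter:
    """Sends each package's calls to the old or the new implementation."""

    def __init__(self) -> None:
        self._legacy: Dict[str, Handler] = {}
        self._rewritten: Dict[str, Handler] = {}
        self._migrated: Set[str] = set()

    def register(self, package: str, legacy: Handler,
                 rewritten: Optional[Handler] = None) -> None:
        """Record the legacy handler and, if it exists yet, the rewrite."""
        self._legacy[package] = legacy
        if rewritten is not None:
            self._rewritten[package] = rewritten

    def migrate(self, package: str) -> None:
        """Switch one package over; every other package is untouched."""
        if package not in self._rewritten:
            raise ValueError(f"no rewritten handler for {package!r}")
        self._migrated.add(package)

    def roll_back(self, package: str) -> None:
        """Return a package to the legacy code if the rewrite misbehaves."""
        self._migrated.discard(package)

    def call(self, package: str, key: str) -> str:
        """Dispatch to whichever implementation is active for the package."""
        if package in self._migrated:
            return self._rewritten[package](key)
        return self._legacy[package](key)
```

Registering `"orders"` with both handlers and then calling `migrate("orders")` moves only that package onto the new code, while anything not yet migrated keeps running on the legacy path, and `roll_back` puts it back if the rewrite has to be set aside.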
I think that makes a lot more sense. I think that, like I said, if you grade it out, then you can actually pick the candidates, you can pick your next moves based on where you sit, even from component to component, your risk aversion may flip-flop, it may change. Today, you may be highly risk-averse, so you're going to look at low-risk components to migrate over to a new architecture to rewrite. But you're compartmentalizing, even when you take those low-risk components, there's no risk to the things you're leaving in place. If you're less risk-averse, you look at the high-risk stuff because typically that's going to yield the highest reward. You go, "What's the flakiest, worst, you know, what's the most used component on our system that sees the most bugs?" not because it's buggiest — I'm sorry, I shouldn't have said 'worst' — but just because people beat the hell out of it constantly, day-in day-out. "Let's rewrite that, yeah!" Any bugs that we introduce in that process, any regression issues that pop up, they're going to be much more high-profile by definition, but man! If you can realize those improvements that come from a rewrite, you're going to get the most love from your users when you do that. When they see it's stabilized after that one component after the rewrite and they see how much faster it is, how much better, they see how much more quickly, you can layer in more new features. It's going to pay off!
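One way to picture that choice is a small ranking function. This is only a sketch under an assumed heuristic: the `exposure` score (usage multiplied by one plus the bug count) and the `Component` fields are hypothetical, chosen to mirror "most used, sees the most bugs", and are not a measure taken from the talk.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Component:
    name: str
    daily_calls: int  # hypothetical usage figure
    open_bugs: int    # hypothetical count of reported bugs


def exposure(component: Component) -> int:
    """Assumed heuristic: heavy use plus many bugs means high risk, high reward."""
    return component.daily_calls * (1 + component.open_bugs)


def next_candidate(components: Sequence[Component], risk_averse: bool) -> Component:
    """Pick the lowest-exposure component when risk-averse, else the highest."""
    if not components:
        raise ValueError("no components to choose from")
    chooser = min if risk_averse else max
    return chooser(components, key=exposure)
```

Because the choice is made again before each rewrite, the `risk_averse` flag can flip from one component to the next without affecting anything already left in place.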
My point is, though, you have to find ways to keep these chunks small, and you have to be realistic about, if they start to overinflate you might have to step back, you might have to put the rewrite aside and address more pressing concerns from a competitive standpoint and from a customer standpoint. I think this is overall, like I said, better than a wholesale rewrite.
The other advantages I've got here in my notes (that's why I'm stuttering a little bit) are that you get shorter, much more predictable deliveries like in eXtreme Programming. You can think of your component by component graded-out rewrites as iterations. You can treat your rewrites the same way you would treat your new features. You're not going to pick a six-month iteration for new features in XP. You're going to pick three weeks, six weeks, something of that scale, something you can deliver quickly, you can deliver concretely, that's testable and usable out of the gate. You use that same mentality: why should rewriting the core of a system be any different? I think you also have a greater opportunity to learn more earlier on in the rewrite process and improve the entire effort of rewriting things as you go, if you do break it up and you use a much shorter cycle time, you do it component by component rather than trying to do the whole thing like in my first counter-example.
Last lesson on that first story about the sourcing application and rewriting the auction engine inside of that is: never, ever lose sight of the bigger picture for just one task or one project. Never lose sight of your competitive landscape. Never lose sight of your customers. Never lose sight of your business drivers because your lunch will get eaten if you do.
Lessons from my second example. I think that one of the things that I'd like to draw out of this is actually, I have to give credit to Joel Spolsky. He had a great essay where he talked about how he founded Fog Creek Software on the contention that if you get the best people, you give them the best tools and equipment, and you put them in the best environment, they can't help but succeed, and everything else is immaterial. What you actually do, what you actually build, what you actually deliver doesn't even matter. You get the best of the best of the best, and you're going to succeed just by definition.
Now, I think that's maybe a little simplistic, but not too much. I think, you know, you have to have some good management, you have to be, to my point in the second example of the compression R&D company, opportunistic about openings and opportunities that come up to take advantage of them. But to, I think, Spolsky's point and the character and the nature of his essay about "the best of the best of the best" is that you can't be locked in and say, "We're going to do X, and we're always going to do X forever and ever and amen". If your business and your people lead you to something else, you have to be receptive to that, and you have to be — well, maybe a little mercenary to say, "Well hey, look, here's another opportunity to be successful, and it's not what I planned, but who's going to say no to success? And we have a really great shot at this because it's a natural alignment with the people that we have, with the resources that we have, the center of gravity, thought leadership". You don't have to have a Moment of Greatness fixation to be great, you know. That's another point I'm trying to make here. And that's one of the things that just drove me crazy about my last boss, was that it always had to be "win big — we're going to change, we're going to transform, nah nah nah... And we're going to make the world different overnight". You don't have to do that to be successful. You don't have to play the lottery to plan for retirement for crying out loud. You put a little money away, you leverage compound interest, you diversify your investments, right? It's little things that you do: continuous, incremental improvement. See how it keeps circling back to that? Yeah, that's the good thing about having a theme, right?
A Moment of Greatness doesn't factor into that. That's not — you know, it's nice to have dreams, don't get me wrong. It's similar to my note about wholesale rewrites: you know, it's okay to entertain big thoughts. But that's not how you win, that's not how you succeed at the end of the day. You succeed by having the best, by doing the best — best planning, best implementation, best strategy, best tactics — regardless of what it is you're actually doing. Like I said, if it's not what you set out to do, who cares. I mean, if any start-up looks too good to be true, it probably is. Because that's how my last job seemed to me when I went in. It just seemed too übercool for words. And it was.
Another good indicator when you're looking at start-ups, when you're talking about the appearance and first impressions of start-ups, because actually, after I left that job, and before I started the one I'm at now, I went to another start-up, another company, kind of an R&D, kind of a little bit out there. Maybe a little more realisable, building novel set-top systems, UIs and operating systems for set-top boxes (think TiVo, but revolutionary, graphical, intuitive, whatever whatever). Again, you know, revolutionary: their words, not mine. That kind of put my back hair up a little bit. They seemed to have good people, and I wish them luck if any of them are listening. I think they had good people; that was one thing. I have no idea how opportunistic their management was, but at the end of the day, the fact that the interviewer I was talking to kept talking about building demos, and the next demo, and that they were still early on in their funding, and they didn't have any real traction and proof of their ideas that the market would accept this so that they could compete on these kinds of innovations. I found that off-putting. And so I decided to go elsewhere.
I think the upshot is the opportunity in terms of technology or what the start-up company is looking to accomplish is really far less important than the people and the place. If the people impress you and the environment impresses you, and the work is exciting regardless of what they're trying to accomplish, that's probably worth a lot more. I think the experience I had with my last employer, those are the positives that I took away. I got to work with some genuinely bright people, as frustrated as I was with the business management at that organization. I got to work on some interesting projects. I expanded my knowledge and my learning, and you know, I have to count my blessings when it comes to that: that things could have gone far worse before I decided to leave. They went, unfortunately, a little roughly for some of the people I worked with, just as the organization got very defensive when it became apparent they really lacked focus and didn't know what they were doing, but...
Anyway, hopefully you can learn something from these examples that I put forward, these counter-examples of organizations looking to buck what I think makes more sense across the board from a business management perspective, a technical management perspective, and even a technical implementation perspective: that it's the slow and steady that wins the race, as they say. You want to look for opportunities to improve what you already have through innovation, through consolidation of existing features. These Win-Big moments, these rewrites, these huge opportunities are, I think, a siren call to be very, very careful of. If they have some genuine value, some genuine merit, you have to find a way to approach them from that kind of continuous perspective that I talked about. Break them down. Make them work with the tactical approach of your organization. Make them fit into strategic pictures so that you don't introduce more risk than you're willing to bear.