Part of The Inner Chapters Unbook.
Originally part of podcast episode number thirty-two.
Dedicated audio available from Podiobooks.
- Anecdote, talking through connection pool exhaustion
- Assumed panels could not re-enter server
- Convinced that it could happen
- Validated test results
- Led to a simple solution
- Different modes of thought
- Visual, verbal
- Visual, image
- Spoken, verbal
- Shared visual - whiteboarding
- Different modes sometimes help expose blind spots
- Value of sounding board
- Expert within earshot
- One way communication
- Can also be a pitfall
- Too easy to get an answer
- Reduce tendency to figure it out on your own
- Intangible benefits
- Team cohesion
- Enhances humility
- Validate and verify before committing to a specification
I know, conversation seems like kind of an odd topic to talk about in the Inner Chapters. This series to date has been about practices and principles of the active software developer, both hacker and professional. But what put me in mind of this was an experience two weeks ago now with a coworker: we were talking through a problem with the panel server. I've talked about this in the past. The company I work for makes physical access control systems. Those come in the form of control panels that hook up to a Wiegand, that is wireless readers and key code push button readers, to control security doors, front doors, lobby doors, that kind of thing--elevators, key card swipers, all that kind of good stuff.
What differentiates us is we have a central web server based offering where you can set up your schedules, rules, users, what have you for your control panels on the web server. That gets pushed down to your actual control panel through a variety of physical media, both wireless and wired. We started to make some changes to that. I talked about that when we talked about maintaining code, a rant about the quality of other people's code when maintaining it. That's what I was talking about, this panel server. It handles the pushing down of these rules that I talked about, the users configured on the web server, down to the actual physical panels. But it also processes events coming up from the panels, that is, the panels report when there has been a valid access to a particular door or reader, or when there has been an invalid or out of schedule access, or a door's been forced, or a request to exit has been triggered, something like that.
And we started to make some improvements on the panel server, improved its ability, improved its performance. But we've noticed in production over the last few weeks that occasionally the panel server deadlocks on itself. So myself and the head of the embedded engineering group, this is a guy who writes the firmware and puts together the Linux distribution for the actual physical control panels--that's Trithemius by the way, that's one of my listeners/readers--were talking through these deadlocks and pool exhaustions and realizing in discussing this that something that shouldn't be happening was happening: the same panel was talking to the panel server more than once simultaneously.
Now, looking through the Java code, and being the subject matter expert on not just Java but in this case servlets, because the panel server is predominantly a servlet, I looked and thought, this just shouldn't be happening. We should be getting a broken pipe, we should be getting some sort of an I/O exception. The timings for processing work, processing a batch of upstream events, just shouldn't be taking this long.
As we talked and we talked through it, we talked through the small case simulation that I had set up locally on my box. It turns out that through that conversation he convinced me that the impossible was happening, and it wasn't too hard to convince me of that. Clearly something was going on here that shouldn't have been going on here. But we also started to work through what would make sense in terms of a fix to try to block this re-entrance on a panel-by-panel basis. You know, we have a unique identifier for every single panel as part of its request coming in, and that would be something we could key off of and use something like a semaphore. Actually something very much like a semaphore.
As I've gone on to do the design work for that, I'm actually going to build a map of semaphores keyed on those unique identifiers for the panels. This will give us the ability not only to block a secondary incoming request from the same panel, so we don't start processing work for a transaction that is already in flight, but also to abandon that original request, which is invalid at that point because the second request will have everything that is in the first request and may have extra data in it. We don't want to process anything more than once. Our contract to the panel is, you send us data, we process it once, exactly once; we won't process the same event more than once. That can cause a problem. We have a security log that the end users can review to see what the panels are reporting back to the central server. And you really don't want to have the same set of entries show up more than once. It is very obvious, it's an exposure we can't live with. We can't have that happen.
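As a rough illustration of the map of semaphores described here, the following is a minimal sketch, assuming Java 8 or later. The class name `PanelSemaphores`, its method names, and the use of a `String` panel identifier are hypothetical and are not taken from the actual panel server. The fair ordering is also an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

/** Hypothetical registry: one single-permit semaphore per panel identifier. */
final class PanelSemaphores {
    private final ConcurrentMap<String, Semaphore> byPanelId = new ConcurrentHashMap<>();

    /** Returns the semaphore for a panel, creating it atomically on first use. */
    Semaphore forPanel(String panelId) {
        // One permit; fair ordering (an assumption) so waiters acquire in arrival order.
        return byPanelId.computeIfAbsent(panelId, id -> new Semaphore(1, true));
    }

    /** Runs the work while holding the panel's only permit. */
    void process(String panelId, Runnable work) throws InterruptedException {
        Semaphore semaphore = forPanel(panelId);
        semaphore.acquire();       // a second request from the same panel blocks here
        try {
            work.run();
        } finally {
            semaphore.release();   // always hand the permit back, even on failure
        }
    }
}
```

Requests from different panels get different semaphores and never block each other. Only a re-entering request from the same panel waits.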
So with this pool exhaustion and the deadlocks with these panels reentering, thankfully at least these things would start to error out; the timing was such that these things would start to error back, and coincidentally we'd get a rollback on all of the relevant transactions. The load on the server would back off to the point where it would be handling single requests from the panels at a time, or we'd have to restart the server, which would result in the same thing happening. But we wanted something that would control that. We didn't want to have that behavior by coincidence; we wanted to have it intentionally. And also, something that would give us some visibility into when this was happening, and maybe even allow us to do some manual interaction to say, "Hey, we've got waiters on the semaphore. Let's kill all the waiters on the semaphore, let's kill the active request that's at the head of that list of permits, and just clean everything up so the next request that comes in is a clean request that's going to have everything that all the prior requests had in it. We'll just let that run."
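One way the visibility and manual cleanup mentioned here could look is sketched below. It assumes a subclass of `java.util.concurrent.Semaphore`, because `getQueuedThreads()` is a protected method. The class name `InspectablePanelSemaphore` and its methods are hypothetical. Interrupting a waiter only frees it if that waiter blocked in the interruptible `acquire()`.

```java
import java.util.Collection;
import java.util.concurrent.Semaphore;

/** Hypothetical subclass that exposes the waiters for operator tooling. */
final class InspectablePanelSemaphore extends Semaphore {
    InspectablePanelSemaphore() {
        super(1, true); // single permit, fair ordering (an assumption)
    }

    /** Estimated number of requests currently blocked on this panel. */
    int waiterCount() {
        return getQueueLength();
    }

    /** Interrupts every queued thread and returns how many were interrupted. */
    int interruptWaiters() {
        Collection<Thread> waiters = getQueuedThreads(); // protected in Semaphore
        for (Thread waiter : waiters) {
            waiter.interrupt(); // acquire() in that thread throws InterruptedException
        }
        return waiters.size();
    }
}
```

Both `getQueueLength()` and `getQueuedThreads()` are documented as best-effort estimates, so this suits monitoring and cleanup rather than precise synchronization decisions.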
So we worked through that and came up with a very simple solution and realized, like I said, that a java.util.concurrent Semaphore would actually do the job quite nicely. It allows us to synchronize on the acquisition of a permit. So if we configured a single permit, then every other request that came in that tried to reenter on that original request would just block naturally because there wouldn't be enough permits for them to acquire. But it would also allow us to set up checkpoints for that original request as it's processing, as it comes up for air in between sending emails, in between each event in a large batch, to say "Hey, is there anybody that's blocking, that's waiting to acquire a permit off the same semaphore that applies for me? Yes, there is? Well okay, let me throw an IllegalStateException, let it bubble all the way up to the top of the request, where we have our nice, clean rollback handling, log that this happened, and roll back nice and cleanly."
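The checkpoint idea can be sketched roughly as follows. This is only an illustration: the class `ReentryCheckpoint`, the `processBatch` method, and the use of strings as stand-ins for events are hypothetical. The calls `hasQueuedThreads()`, `acquire()`, and `release()` are standard `java.util.concurrent.Semaphore` methods.

```java
import java.util.List;
import java.util.concurrent.Semaphore;

/** Hypothetical helper for the checkpoint idea; names are illustrative only. */
final class ReentryCheckpoint {
    private ReentryCheckpoint() {}

    /**
     * Called by the request holding the permit, between units of work.
     * Throws if another request is queued on the same panel's semaphore.
     */
    static void check(Semaphore panelSemaphore) {
        if (panelSemaphore.hasQueuedThreads()) {
            throw new IllegalStateException("newer request is waiting; abandoning this one");
        }
    }

    /** Processes a batch, checking for waiters before each event. */
    static int processBatch(Semaphore panelSemaphore, List<String> events)
            throws InterruptedException {
        panelSemaphore.acquire();
        int handled = 0;
        try {
            for (String event : events) {
                check(panelSemaphore); // come up for air between events
                handled++;             // stand-in for the real event handling
            }
            return handled;
        } finally {
            panelSemaphore.release(); // runs on success and when check() throws
        }
    }
}
```

The exception is deliberately not caught here. In this sketch it is left to propagate to whatever top-level handler owns the transaction, which is where the rollback and logging would happen.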
So the point of sharing this anecdote is, conversation had a lot of value, in talking through the original problem and looking at the simple test case that we set up, and then validating the simplest solution when I came across it, saying, "Hey, does this make sense? Based on what we had talked about, based on the behavior that we had pinned down in our conversations, does this solution make sense and seem applicable?" Now I think the value in these conversations, and conversations in general, is pretty broad and not to be understated.
Like I said at the front, it may seem this is kind of a silly topic to talk about in software programming and software development. But think about this: when you're working on code by yourself, even when you're just working on just the design or you're thinking about something yourself, you apply different modes to that, different modes of thought. When you're working on a problem by yourself, when you're working on code by yourself, you're thinking visually, maybe you're thinking verbally, you're thinking about the code visually in your head, you're thinking visually in terms of images, maybe like some sort of a UML diagram, or sort of a design diagram, "This block talks to that block", and you're thinking conceptually, I think, really at its core because there's no need for you to articulate what you're working on in any other way other than the raw concepts that you're using to craft your solution.
When you're collaborating, however, you're forced into different modes of thought. Like I said, you're forced to actually articulate what the problem is: you have to put that into words. Even if you get up to a whiteboard, you're going to put that up visually in a way that's maybe different than you would in your own head. Somebody's going to ask questions. "What does that block mean? What are its functions? What does it all represent?" You know, things that you take for granted. The point is that when you work collaboratively, that's going to come out: the implicit knowledge, the assumptions, like I said, the things you take for granted are going to come out when you're talking to somebody else, and they may not when you work alone. You're going to develop blind spots, I think, when you're working alone, and when you're using the modes of thought that you use when you're just working by yourself, you're just not going to be able to find those blind spots very well.
There's also value in that sounding board effect, that even if the other participant in the conversation isn't particularly active in the conversation--they're not asking a lot of keen questions, they're not making a lot of keen suggestions... and I'm not saying Trithemius doesn't have good questions and doesn't make good suggestions, not at all! I'm just saying that conversations differ. Some are definitely two-way, and you're getting good input from the other participant. But sometimes even when you have a passive listener, that has value just in forcing you to articulate your thoughts differently. There's a related practice, or pattern of practice, I don't remember where I first read about this, and if this is something one of you recognizes, please do write in and help me in the attribution of this notion of Expert Within Earshot, which is related to conversation, but it's one-way communication. It's ask a question and receive some knowledge, receive quite a bit more knowledge than is inherent in the question. I think that's valuable as well, and it's worthwhile to think about when we think about conversations, about knowledge sharing. Conversation is also a mechanism for knowledge sharing, but this can also be a pitfall in my experience, that sometimes when you know that there's an expert within earshot, it's just a little too easy to get an answer, and it may tend over time to reduce your drive to figure things out on your own. You just grab the local expert and say, "Hey! How do I do this?" instead of sitting down and reading the documentation, crafting a little experimental code and figuring it out. So that's one potential downside of communication and conversation as a technical tool.
There are also some, the last thing I'm going to talk about here, intangible benefits to conversation, especially to good conversation when you have a good personality fit in the team. It can improve cohesion amongst team members: when you feel that anybody you talk to on the team is going to understand what it is you're talking about and provide value in the act of talking, then that's a pretty good feeling, and that encourages you to go and collaborate and talk more with the people that you work with. I think that there's an opportunity to enhance humility, and I think that as programmers that's something that we need a lot of help with. It's very difficult, especially as you get more mature and more expert in your practice, to remain open-minded, to remain humble, to remember our limitations. You know, you master a particular technology, it's very easy to go off on a tear thinking, "I'm king of the world! I know this: backwards, forwards, frontways, sideways! I rule!" Conversations can be a good way for people to remind you very gently that there are still things that you need to learn and still things that you don't know, and that other people have good ideas as well, and you're not necessarily the source of all good in your environment.
Also, I think that in the documentation process there's an intangible benefit here. This is one where I've had good experience realizing the benefit, and I'm going to encourage you to seek it out. If you have to write specifications, if you have to capture your design in an explicit way in a document, then conversation can be a good tool to validate your design thinking before you commit to it, again, if you have a good team, if you have good collaborators, maybe even people outside of your workplace. So there are some things to think about there on the topic of conversation, on the practice of conversation. So hopefully, that makes some sense to you.