Conversation

From TheCommandLineWiki

This transcription is as of yet incomplete.

Part of The Inner Chapters Unbook.

Originally part of podcast episode number thirty-two.

Dedicated audio available from Podiobooks.


Original Notes

  • Anecdote, talking through connection pool exhaustion
    • Assumed panels could not re-enter server
    • Convinced that it could happen
    • Validated test results
    • Led to a simple solution
  • Different modes of thought
    • Alone
      • Visual, verbal
      • Visual, image
      • Conceptual
    • Collaborative
      • Spoken, verbal
      • Shared visual - white boarding
    • Different modes sometimes help expose blind spots
    • Value of sounding board
  • Expert within ear shot
    • One way communication
    • Can also be a pitfall
      • Too easy to get an answer
      • Reduce tendency to figure it out on your own
  • Intangible benefits
    • Team cohesion
    • Enhances humility
    • Validate and verify before committing to a specification

Transcript

I know, conversation seems like kind of an odd topic to talk about in the Inner Chapters. This series to date has been about practices and principles of the active software developer, both hacker and professional. But what put me in mind of this was a, an experience, well, two weeks ago now with a coworker we were talking through looking at a problem with the panel server. I've talked about this in the past. The company I work for makes physical access control systems. And those come in the form of control panels that hook up to a Wiegand, that is wireless readers and, um, key code push button readers, ah, to control security doors, front doors, lobby doors, that kind of thing--elevators, key card swipers, all that kind of good stuff.

What differentiates us is we have a central web server based offering where you can set up your schedules, rules, users, what have you for your control panels on the web server. That gets pushed down to your actual control panel through a variety of physical media, both wireless and wired. And there was a, uh, we started to make some changes to that. I talked about that when we talked about, uh, maintaining code, a rant about the quality of other people's code when maintaining it. That's what I was talking about, was this panel server. What handles the pushing down of these rules that I talked about, the users configured on the web server, down to the actual physical panels. But also processes events coming up from the panels, in terms of the panels reporting when there has been a valid access to a particular door or reader, or when there has been an invalid or out of schedule access, or a door's been forced, or a request to exit has been triggered, something like that.

And we started to make some improvements on the panel server, improved its ability, improved its performance. But we've noticed in production over the last few weeks that occasionally, occasionally the panel server deadlocks on itself. So myself and the head of the embedded engineering group, this is a guy who writes the firmware and puts together the Linux distribution for the actual physical control panels--that's Trithemius by the way, that's one of my listeners/readers--were talking through these deadlocks and pool exhaustions and realizing, in discussing this, that something that shouldn't be happening was happening: that the same panel was talking to the panel server more than once simultaneously.

03:29

Now looking through the Java code and being the subject matter expert on not just Java but in this case servlets, because the panel server is predominantly a servlet, I looked, this just shouldn't be happening. We should be getting a broken pipe, we should be getting some sort of an I/O exception. The timings for processing work, processing a batch of upstream events, just shouldn't be taking this long.

As we talked and we talked through it, we talked through the small simulation, the small case simulation that I had set up locally on my box. And he, it turns out through that conversation not only did he convince me that the impossible was happening, and it wasn't too hard to convince me of that. Clearly something was going on here that shouldn't have been going on here. But we also started to work through what would make sense in terms of a fix to try to block this re-entrance on a panel-by-panel basis. You know, we have a unique identifier for every single panel as part of its request coming in and that would be something we could key off of and use something like a semaphore. Actually something very much like a semaphore.

As I've gone on to do the design work for that, I'm actually going to build a map of semaphores keyed on those unique identifiers for the panels. And this will give us the ability to not only block a secondary incoming request from the same panel, so we don't start processing work for a transaction that is already in flight; it will allow that original request that is now invalid at that point because the second request will have everything that is in the first request and may have extra data in it. We don't want to process anything more than once. Our contract to the panel is, you send us data, we only process it once, exactly once, we won't process the same event more than once. That can cause a problem. We have a security log that the end users can review to see what the panels are reporting back to the central server. And you really don't want to have the same set of entries show up more than once. It is very obvious, it's an exposure we can't live with. We can't have that happen.
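To make the "map of semaphores keyed on those unique identifiers" idea concrete, here is a minimal sketch in Java. It is not the actual panel server code. The class name `PanelGate`, its method names, and the choice to refuse a second request outright (rather than make it wait) are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

/** Hypothetical per-panel gate: at most one in-flight request per panel ID. */
class PanelGate {
    // One single-permit semaphore per panel, created lazily on first contact.
    private final ConcurrentMap<String, Semaphore> gates = new ConcurrentHashMap<>();

    /** Returns true if the caller now holds the panel's permit, false if a request is already in flight. */
    boolean tryEnter(String panelId) {
        Semaphore gate = gates.computeIfAbsent(panelId, id -> new Semaphore(1));
        return gate.tryAcquire(); // non-blocking: an assumption, the real design may wait instead
    }

    /** Call only after a successful tryEnter; an unmatched release would add an extra permit. */
    void exit(String panelId) {
        Semaphore gate = gates.get(panelId);
        if (gate != null) {
            gate.release();
        }
    }

    /** Runs the work only if no other request for this panel is in flight; always releases the permit. */
    boolean handle(String panelId, Runnable work) {
        if (!tryEnter(panelId)) {
            return false; // caller decides how to answer the duplicate request
        }
        try {
            work.run();
            return true;
        } finally {
            exit(panelId);
        }
    }
}
```

In a servlet, the request handler would wrap its event processing in something like `gate.handle(panelId, () -> processEvents(request))`, where `processEvents` is again a hypothetical name. A semaphore, unlike a re-entrant lock, is not owned by a thread, so a second request for the same panel is refused even if it happens to land on the same worker thread.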

05:45
