Idiomatic Programming
From TheCommandLineWiki
Part of The Inner Chapters Unbook.
Originally part of episode 68.
Notes
Transcript
45:55
I have mentioned repeatedly this idea of idiomatic programming, and it's about time, maybe past due time, that I talk about this in some more depth as a practice of programming. And an idiom, if you're not familiar with the term, is just a particular convention, a way of doing or saying something. When we talk about spoken languages, there's a very strong corollary between idioms in spoken languages and idioms in programming languages. In a spoken language, for instance, in a language like Spanish that doesn't have an abbreviated possessive, you would say "the book of the brother of John". You would say that out very, very literally, you know: el libro del hermano de Juan, right? Whereas in a much more pithy language that does have an abbreviated convention for possession, like in English, you would say "John's brother's book". Now, maybe that's not quite as clear, but that aside, the more idiomatically correct way of saying that phrase in English is "John's brother's book", is to use that idiomatic convention of shortened possession, that apostrophe s.
This allows for, arguably in certain situations, more dense encoding of information, both in spoken languages and in programming languages. The reader or listener can scan through rather than literally following. They can fill in gaps in the verbiage by that implicit knowledge in the idiom. I think this is very much true of programming languages as well. If you're familiar with the way an IO file handle is constructed in a particular language, you don't have to pay as much mind to the details. You can pretty much take them as read because there's an idiom, there's a common convention for doing that. You don't have to understand it or... Well, maybe you do initially, but once you've grokked it, you can treat the whole construction just of a piece and you don't necessarily have to understand every little piece of it.
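As one illustration of an IO construction that an experienced reader takes in "of a piece", here is a minimal sketch of the conventional line-reading loop in modern Java. The class and method names (`ReadLinesIdiom`, `countLines`) are made up for this example, and the try-with-resources form assumes Java 7 or later, which is newer than the Java discussed in the episode.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example class; not from the episode.
class ReadLinesIdiom {

    // Counts the lines available from any Reader using the conventional
    // "wrap in a BufferedReader, loop until readLine() returns null" shape.
    static int countLines(Reader source) throws IOException {
        int count = 0;
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory source keeps the sketch self-contained.
        System.out.println(countLines(new StringReader("one\ntwo\nthree")));
    }
}
```

A reader who knows the idiom recognizes the `while ((line = in.readLine()) != null)` line as "for each line of input" without working through the assignment inside the condition.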
We talked in the past about implicit knowledge as carrying over as part of domain expertise, in picking apart the rants on rewriting code rather than reading it. And like I said, I think implicit knowledge is tied up a lot in programming language idioms as well. Like I said, that there's a particular convention for doing IO, for doing network, for doing multiple threading. Now, the language may allow you literally, by contrast, to do all of those things a variety of different ways. I think Java is a good example. C++ is an even better example. Perl maybe is the best, that there's... in C++ and Perl there can be, depending on what you're doing, there can literally be dozens of ways of accomplishing something with the language. Now that being said you don't see all of those varieties out there in the wild. You actually see very strong conventions or idioms for the way people do them. And in some circumstances, those are so strong that it allows you to anticipate — like I said, it allows you to fill in some of the gaps, and it allows you to ignore some of the lower-level details because that's just the way it's done in that language.
I think that from my own perspective, the language I'm most familiar with, Java, is much more strongly an idiomatic language than anything else. If you read through the core Java library APIs, there's a lot of internal consistency, and those internal consistencies then form that idiomatic expression set. When you go out and look at an open source project, you'll actually see those same idioms at play in a totally independent third party's code. So the Sun engineers in that sense have done a good job of incorporating a strong sense of idiom, of convention, into the core language design itself as expressed through the core libraries everybody has access to.
I think that idioms are part of the expressiveness of a language as well, that, as I've said, they represent the most common ways problems are solved in any given language. And then, as I said (I almost kind of — I sniped myself with this point) that those idioms, those conventions, can suggest ways to write new code that then is consistent with the existing code base, therefore much more readable, much more easy to understand, and also by extension, much easier to learn for the new programmer. A highly idiomatic language or a community, a set of users that adopt the same idioms over and over again can achieve that effect. Now, it's not to say... It occurs to me in talking about idioms that way that you might mistake idioms for design patterns. I think design patterns are much more language agnostic, and they're at a much higher level. So they address just general problems and general solutions. And then the burden is on us to translate those into the idioms of a particular language that we're working with.
50:55
And the other reason that I mentioned idioms, I talked about earlier in the news section about Luke Plant, the guy who was ranting about how, because he knew Python and Haskell, he thought he was ruined as a C# programmer, and he just couldn't transition back and forth between them. And the problem very much, the example he uses there is very much a problem of idiom: that Python and Haskell support functional programming, they support being able to have a pointer to a piece of functional code that has no side effects, no global references, it's all pure local code. Java and C# don't have that idea. C# sort of does with the delegate, but it's very clunky, and that's where he expresses some of his frustration of how hard it is to take that functional idiom and put it into a language that doesn't even have the engineering or architectural support for that idiom. So not all idioms are equal, not all things translate equally well. And I think, again, spoken languages provide a good example of that, a good metaphor for that: that there are many terms, like Schadenfreude in German, that don't translate literally into English very well. It's a lot more wordy to try to explain that conception in English than just that single word is in German. And that's very similar. You could... That's the exact pathology we see in Luke Plant's example, that a functional idiom in Python or Haskell is expressible in C#, don't get me wrong, but it's just very, very long-winded to do it that way, and it's arguably not natural to the language. That language just isn't constructed to do that.
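To make the "expressible, but long-winded" point concrete, here is a rough sketch of passing a small side-effect-free function around in Java without function literals, using an anonymous inner class. The interface and names (`IntTransform`, `mapAll`, `LongWindedFunction`) are invented for this illustration. Java 8 later added lambdas, so this shows the older style, not the only way to write it today.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not from the episode.
class LongWindedFunction {

    // A one-method interface standing in for "a function from int to int".
    interface IntTransform {
        int apply(int value);
    }

    // Applies f to every element and returns a new list; the input is not modified.
    static List<Integer> mapAll(List<Integer> input, IntTransform f) {
        List<Integer> result = new ArrayList<>();
        for (int value : input) {
            result.add(f.apply(value));
        }
        return result;
    }

    public static void main(String[] args) {
        // Six lines of ceremony to say "double each value".
        List<Integer> doubled = mapAll(Arrays.asList(1, 2, 3), new IntTransform() {
            @Override
            public int apply(int value) {
                return value * 2;
            }
        });
        System.out.println(doubled); // prints [2, 4, 6]
    }
}
```

For comparison, Python can say the same thing as `[x * 2 for x in values]`, which is roughly the gap in verbosity being described.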
There are many other examples of forcing idioms that I've seen, ones that go back quite a bit farther than any arguments about functional programming vs. OO vs. aspect vs. whatever, that C, with its very powerful pre-processor and macro language, has been very much abused over and over and over again by people probably in a very similar situation as Luke Plant, who for whatever reasons didn't have the wherewithal to bring that sort of polyglot mentality, that multilingual mentality to C, coming from something else. So they reimplement COBOL through C macros. They reimplement BPL or whatever, JPL, you know, on top of C so that they never have to see the C.
Now the problem with this is that, I think we talked about the law of leaky abstractions in the past. If not, go out to Joel Spolsky's Joel on Software. That's actually one of the recommended essays on my website, so if you go to Selected Essays in the sidebar of TheCommandLine.net, you'll find a link to "The Law of Leaky Abstractions". The problem is that that essay, "The Law of Leaky Abstractions", talks about library design, presumably with idiomatic and conventionally correct construction in the language of choice, and says that it's going to leak, that abstracting complexity and putting an umbrella over it to simplify for the 80% use is going to leak. And the burden is on us as experts in any given technology to understand the underlying implementation of a component, of a library, to deal with when those leaks occur.
Now if you go in and you force foreign language idioms on top of a native language, you're putting an abstraction on top of an already probably fairly heavy abstraction set. And that can introduce many more problems as a consequence — not to mention complexity, if you're trying to get down to the lowest level of implementation and understand an IO bug or a driver bug or a network bug. And you've got to go from somebody who has encoded their favorite language through C macros into C, and then all the way down into the lowest level. Obviously that's going to be a lot harder than if you just found some way of understanding the way of doing C in a conventional fashion, and just program directly in C. It's obviously much more confusing to the reader as a consequence. And I think any given language allows the reader to make assumptions about the idioms of that language, and if you're not taking full advantage of that when you're forcing an idiom from another language in, it just makes it that much harder on the reader. If they're not familiar with the language that you're referencing, the idioms that you've borrowed from somewhere else and transplanted in, they have to literally read through every line of code. And I think that it's hard enough to write readable code — I've talked about that fairly recently — without getting into these issues of: when someone doesn't understand multilingualism, if you will, and the forcing of idioms across languages and how to write conventional code in idiomatic code, you can just make it that much worse. 
If you have a good understanding of this, you understand what parts of code that you're reading are convention amongst the community or convention from the language developer, and idiom can be treated as written, you're just much better off if you have that understanding and you leverage that, versus trying to find some other way, trying to reinvent those very low-level abstractions, if you will, if you want to think of idioms as abstractions.
56:01
Now that being the case, how do you learn to program idiomatically? I think it really boils down to the language at hand, that you need to learn the idioms for a particular language because, trust me, they're going to be very different from language to language to language. I've had a lot of first-hand experience with this. And it can be as simple as just the way you typically write a 'for' loop. There's the way the language actually supports that, but there's also how developers most commonly do that. I think that community is probably your first bet, is that you need to start networking with people who are using that technology: mailing lists, user forums, go in to the documentation. Any code samples provided by the original authors, provided by an acceptable expert, should evince some good idiomatic usage there. Any good book on the language, some of the better books like Bruce Eckel's book on Java. It's one of the reasons that I highly recommend it, is that he teaches idiomatic Java. And he picks apart some of the idioms. He doesn't call it such, but he does pick apart some of the idioms at a lower level. But then he leverages — once you kind of understand how the idiom is constructed, like how you use decoration for various IO behaviors in Java, he doesn't belabor it. You move on to the next thing. You take it as written. That's the whole point: an idiom.
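The two examples mentioned here, the 'for' loop and IO decoration, can be sketched roughly as follows. This is an illustration only: the names (`LoopAndDecoration`, `sumIndexed`, `sumForEach`, `render`) are hypothetical, and it assumes Java 7 or later.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not from the episode.
class LoopAndDecoration {

    // What the language supports: an explicit index, as in C.
    static int sumIndexed(List<Integer> values) {
        int total = 0;
        for (int i = 0; i < values.size(); i++) {
            total += values.get(i);
        }
        return total;
    }

    // What Java developers more commonly write when the index is not needed.
    static int sumForEach(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // Decoration: each wrapper adds one behavior to the stream beneath it
    // (bytes -> characters -> buffering -> convenient print methods).
    static String render(List<Integer> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(bytes, StandardCharsets.UTF_8)))) {
            for (int value : values) {
                out.print(value);
                out.print('\n');
            }
        }
        // Closing the outermost writer flushed every layer into the byte buffer.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(sumIndexed(values) + " " + sumForEach(values)); // prints 6 6
        System.out.print(render(values));
    }
}
```

Both loops compute the same sum. The second is the form a reader expects in Java, so the first tends to make them stop and ask why the index is there.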
So documentation both in technical books and documentation on source code and projects. And then open source represents a rather unique opportunity to learn idioms: that there are many open-source projects for just about every language out there. And this may unfortunately introduce particular dialects, so you do have to be aware of that, that the Jakarta projects for Java, for instance: there's some things that they do subtly differently from Sun engineers and some of the commercial Java projects. So you have to take some of the open-source stuff, at least in my experience, with a slight grain of salt. But otherwise, you've got a lot of the advantages of community and documentation along with a live and active code base that you can actually look at and read that code first hand. So I think open source is a great, like I said, rather unique opportunity to learn idiomatic programming in your language of choice.
So that's something for you to think about. We'll talk about intentional programming some other time. I realize that it really, the two ideas don't overlap, they complement each other, but there was no natural lead-in to talk about intentional programming. But part of what I hope you take away from me talking about idiomatic programming, whether you agree with my observations entirely or not — and especially if not, I want to hear about it, I want to hear about your contrasting experiences and opinions, I always welcome those — is to think about programming as an exercise in communication. Because that's going to be very critical to any discussion of intentional programming, which, like I said, we'll save for another Inner Chapter.

