Flashbake

I just released into the wild the project on which I have been collaborating with Cory Doctorow. From the project page:

This project was inspired by Cory Doctorow asking me for suggestions on source control for his writing and personal information files. After a few rounds of correspondence, we realized that while source control was closer to what he needed than automated backup, it still wasn’t a perfect fit. A little research into seamless version control turned up some interesting work on ideas close to what he described, but either no actual code or code for much lower level projects, like file systems for viewing existing source control repositories or capturing versions without comment.

Cory wanted the version to carry prompts, snapshots of where he was at the time an automated commit occurred and what he was thinking. I quickly sketched out a Python script to pull the contextual information he wanted and started hacking together a shell script to drive git, using the Python script’s output for the commit comment when a cron job invoked the shell wrapper.
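As a rough illustration of that arrangement, here is a minimal sketch of building a commit comment from contextual information and handing it to git. It is not flashbake's actual code: the function names, the "Automated snapshot" subject line and the context labels are all assumptions made for the example.

```python
import subprocess


def build_commit_message(context):
    """Format context items (label -> text) as a commit comment.

    The labels are hypothetical; the post does not list the real ones.
    """
    lines = ["Automated snapshot", ""]
    for label, text in context.items():
        lines.append(f"{label}: {text}")
    return "\n".join(lines)


def commit_all(repo_dir, message):
    """Stage everything in repo_dir and commit it with the given message.

    Note that git exits non-zero when there is nothing to commit, so
    check=True raises CalledProcessError in that case.
    """
    subprocess.run(["git", "add", "--all"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

A cron job could then run a small script that gathers the context, calls `build_commit_message` and passes the result to `commit_all`.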

I added my own idea to the project, borrowing from continuous integration build systems the idea of a quiet period. I could easily imagine Cory actively working on a story and saving continually. A commit that fires mechanically in the midst of that writing would be less useful than one the script holds until it finds a quiet time. This enhancement prompted me to ditch my shell script wrapper and pull that logic all into Python.
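The quiet period idea can be sketched in a few lines. This is only an illustration under my own assumptions, not flashbake's implementation: it treats a file as quiet when its modification time is at least `quiet_seconds` in the past, and the function names are hypothetical.

```python
import os
import time


def is_quiet(path, quiet_seconds, now=None):
    """Return True if path was last modified at least quiet_seconds ago."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) >= quiet_seconds


def all_quiet(paths, quiet_seconds, now=None):
    """Return True only if every watched file has been left alone long enough."""
    return all(is_quiet(path, quiet_seconds, now) for path in paths)
```

A scheduled run would simply skip the commit whenever `all_quiet` returns False and try again on the next run.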

flashbake really did start with that simple question. I initially recommended git because of its distributed nature. It is far easier to clone a git repository than to do anything comparable with centralized source control. I realize there are some good arch implementations that are comparable and perhaps in some ways better than git. I know some who highly recommend bzr, which is quite comparable and may even be a bit better than git. However, git is popular, so it is well supported by tool makers and there is, of course, github, if you want to host a remote copy of your repository.

This project really is a collaboration. Cory had a very particular idea in mind. We corresponded daily to pin down how he wanted the end result to work. He was very open to suggestions on ways to accomplish his goals that would make my life as the code monkey easier. For his part, he was a patient and conscientious tester. He also was astoundingly encouraging when we hit some real puzzlers early on that were confounding us both. Cory is by no means a slouch on the technical front but his experience with source control of any kind was and is nil. This meant that any rough edges that showed through got spotted pretty quickly and sanded off, so the resulting tool comes as close to “just working” as I know how to make it.

The astonishing thing, to me, is that this is my first major undertaking in Python. As I wrote it, I was learning as much about Python tools and idioms as I was about the problem Cory was trying to solve. I have been programming in other languages for most of my life and had even tinkered a bit with Python before this, enough to know that I liked the philosophy of the language designers and that I am one of the hackers that likes Python, significant white space and all. I do, though, welcome input and suggestions from any Python veterans about how to make the code more conventional. I am already looking at git-python and feedparser to replace some of my no doubt naive code.

If you do not like Python, the sources are freely available and you are welcome to port to your language of choice. If you prefer a different source control system, you are also welcome to fork the code, though I have already had a suggestion to make that part of the code pluggable, too. Contact me on that point before you decide to fork.

I was really quite surprised at how far seamless version control is from being usable by typical users. There is some great research and code out there by academics and hackers alike. It all seems to assume a certain level of technical savviness, though.

As a hacker well versed in source control systems, I will admit that I found the idea of automatically committing a little odd. I am used to the workflow of finishing some code, then explicitly committing with a nice, clear remark on the unit of work just completed.

The more I think about it, though, the more I see that there are tons of sources of contextual information regular users are constantly updating that could help them characterize automatic commits. How many of you use microblogging tools or social networks? How often are you sending messages or updating statuses that could easily be pulled by something like flashbake over the public internet?

This is also what drives my number one feature for the next release, plugins. The initial version of flashbake is still pretty customized to what Cory wanted in his commit messages. Those four pieces of information are already coming out of prototypical plugins in flashbake’s sources, ones I am using to hammer out the protocol for new plugins by other authors. I would much rather write a new Python module, drop it into a known or configurable location and update a .control file than keep hacking on the main sources or start forking flashbake for each user (though with git, that would be entirely possible, if not so manageable).
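To make the drop-in idea concrete, here is one way such a loader might look. It is purely a sketch and not the protocol flashbake uses: it assumes each plugin is a Python file exposing a hypothetical `describe()` function that returns one line for the commit message, and it scans a directory rather than reading the .control file.

```python
import importlib.util
from pathlib import Path


def load_plugins(plugin_dir):
    """Import each .py file in plugin_dir; keep modules defining describe().

    describe() is a hypothetical hook name chosen for this example.
    """
    plugins = []
    for path in sorted(Path(plugin_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        if callable(getattr(module, "describe", None)):
            plugins.append(module)
    return plugins


def gather_context(plugins):
    """Ask every plugin for its line of commit message context."""
    return [plugin.describe() for plugin in plugins]
```

With something along these lines, adding a new source of context means writing one small module and dropping it into the plugin directory, with no change to the main sources.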

If you decide to give flashbake a try, understand it is only about three weeks old. I also have one user for whom I cannot break it. I am also currently working on unit tests (using Python’s unittest package) to help make sure that changes and enhancements don’t break existing functionality.

Stay tuned. As with my other bit of publicly available software, Laconica Tools, I will post updates and new versions here. I also intend to get flashbake set up somewhere where folks can pull their own sources from a public repository.

3 Replies to “Flashbake”

  1. One way to find out good snapshot intervals would be to record a given directory in parallel multiple ways with different parameters into separate repos. It doesn’t look like that’s currently possible.

  2. Congratulations on half-rewriting the Perl Git module which comes with git just so you could try out python.

    Does ‘Attribution-Noncommercial-Share Alike’ really require seeking permission before forking? Not that I would, non-commercial is a killer there.

    1. There is an unofficial git module for python, I will be switching to that shortly. The emphasis isn’t on driving git, it is on building up the commit message automatically. Otherwise, why even bother automating git?

      Bear in mind the original target audience, someone savvy enough to know git and other source control systems exist but have never used them. I’d welcome pointers to seamless source control usable by anyone less technically adept than a programmer or a system administrator.

      The point of asking for contact wasn’t because of any license concern. Fork away if you are okay with the boilerplate conditions of the license. The point was that I already had given some thought to pluggable back ends for different version control systems. I’d prefer to get contributors to help build that out in the original code if possible. The non-commercial restriction aside, the fact that I am doing anything other than all rights reserved should signal my willingness to allow forks and other customizations. I am open to re-considering the license choice at some future point.
