Python’s multiprocessing module is the new hotness

The new multiprocessing module in Python 2.6 and 3.0 looks pretty cool. It gets around the whole, um, design for low performance, where the dreaded Global Interpreter Lock (GIL) keeps threads from running Python code in parallel, by making it easy to spawn Python subprocesses, communicate with them, and share data. They even have a form of security on the [...]
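
To give a feel for what spawning subprocesses looks like, here is a minimal sketch using a worker pool. It is written for a current Python 3 (the `with Pool(...)` form is newer than the 2.6/3.0 releases mentioned above), and the names `square` and `parallel_squares` are made up for illustration, not taken from the post.

```python
from multiprocessing import Pool


def square(n):
    # Runs inside a worker process. It has to be a top-level function
    # so that it can be pickled and sent to the workers.
    return n * n


def parallel_squares(numbers, workers=2):
    # Pool.map hands the inputs out to the worker processes and
    # returns the results in the same order as the inputs.
    with Pool(processes=workers) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard stops child processes from re-running this part on
    # platforms that start workers by importing the main module.
    print(parallel_squares([1, 2, 3, 4]))
```

Each worker is a separate interpreter with its own GIL, which is why CPU-bound work can actually run in parallel this way.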

InfiniteSortedObjectSequence – for large data sets in Python

What do you do when you have a data set too large to fit into RAM? You could just use disk directly, so instead of a dict you use a shelve or bsddb. The problem is that you then take a performance hit, as all operations are disk-based. You could have [...]
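
As a rough illustration of the "use disk directly" option only (not of InfiniteSortedObjectSequence itself), here is a small sketch that swaps a dict for a `shelve`. The function name `store_on_disk` and the sample records are assumptions for the example; every read and write below goes through the on-disk database, which is where the performance hit comes from.

```python
import os
import shelve
import tempfile


def store_on_disk(records):
    # Write (key, value) pairs into a shelve file instead of a dict,
    # then read them back. Shelve keys must be strings.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records")
        with shelve.open(path) as db:
            for key, value in records:
                db[key] = value  # pickled and written to disk
            # Copy the contents out before the file is closed and removed.
            return {key: db[key] for key in db}


stored = store_on_disk([("a", [1, 2, 3]), ("b", {"x": 1})])
```

The interface is nearly the same as a dict, so the switch is easy, but nothing is cached in memory for you.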

MapReduce in 10 or so lines of Python

I’ve realized that I understand things best when I implement them myself. I was recently reading Trevor Strohman’s dissertation and was intrigued by TupleFlow, a kind of more elaborate and improved MapReduce. I was about to write my own toy implementation of TupleFlow when I decided to simplify and, just for fun, write MapReduce in Python. [...]
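
For readers who want the general shape of the idea, here is a toy single-process sketch of MapReduce in Python. It is not the code from the post; `map_reduce`, `word_mapper`, and `count_reducer` are hypothetical names, and word counting is just the usual example workload.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    # Map phase: each input yields (key, value) pairs, which are
    # grouped by key (the "shuffle" step).
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values to one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Emit (word, 1) for every word in a line of text.
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    # Sum the 1s emitted for this word.
    return sum(values)


counts = map_reduce(["the cat", "the dog"], word_mapper, count_reducer)
```

A real MapReduce spreads the map and reduce calls over many machines; this sketch only shows the data flow.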

