Let there be Challah

We like the Challah from Trader Joe's, and one of the advances of this century is that you can buy it almost any day of the week, whereas in the ole days it seemed to be a Friday/Saturday bread. We usually make French toast out of it on weekends or just plain toast during the week.

In a moment of inspiration I decided to dig into a recipe from Anne Willan’s Cook It Right and four or so hours later I had a loaf that turned out amazingly well.

Challah closeup

The finish wasn’t quite perfect – I didn’t know how much egg wash to put on. I put it on fairly heavily and it didn’t end up even in spots – no one complained, but next time I’ll drench it more.

Challah and cookbook

The next day we had it for French toast and it was good, and there was still half a loaf left when we finished.

Big, fun, yummy and impressive. This will be repeated.

InfiniteSortedObjectSequence – for large data sets in Python

What do you do when you have a data set too large to fit into RAM?

You could just use disk directly, swapping a dict for a shelve or bsddb, but then you take a performance hit because all operations are disk based.
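As a rough illustration of the disk-only approach, here is a minimal sketch using the standard library's shelve module. The word list and the temporary path are made up for the example.

```python
import os
import shelve
import tempfile

# Hypothetical example data: word counts kept on disk rather than in a dict.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "counts")

with shelve.open(path) as db:  # behaves like a dict, but keys must be str
    for word in ["challah", "muffin", "challah"]:
        # Every lookup and store goes through the disk-backed database.
        db[word] = db.get(word, 0) + 1

with shelve.open(path) as db:  # reopen: the data persisted between opens
    counts = dict(db)
```

The interface is nearly identical to a dict, which is the appeal. The cost is that every operation goes through the on-disk store even when the data would have fit in RAM.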

You could instead use a specialization of such a data structure that keeps data in RAM while it can and spills to disk when necessary.

I’ve been playing with implementing MapReduce in Python, and for the case when the data set doesn’t fit into RAM you need to stream a potentially large, unordered sequence of tuples and then sort them.

The basic use case is that there is any number of Append() calls, and then when you’re done you want the data sorted.

I wrote InfiniteSortedObjectSequence, which will be used in my Pythonic MapReduce code that works on large data sets (TBD). It’s also useful in isolation, thus this note.

The code is at http://code.google.com/p/tropo/source/browse/trunk/Python/tr_mapreduce/infinite.py. You’ll also need pickle_io and mergesort from that same directory.

The basic way it works is:

infinite = InfiniteSortedObjectSequence()

infinite.Append(...)  # call this many times

sorted_items = infinite.Sort()
for item in sorted_items:
    ...  # process each item, in sorted order

Behind the scenes when it spills to disk it sorts the “run”, then writes it to a pickled file. There are options to compress the data when it’s written to disk, which unfortunately takes longer but uses less space.

To sort, it does an in-memory sort if the data set is small enough; otherwise it does an N-way merge of the sorted runs on disk.
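To make the spill-and-merge idea concrete, here is a minimal sketch. It is not the real InfiniteSortedObjectSequence: the class name SpillingSorter, the max_in_memory threshold and the method names are all assumptions for illustration, and it uses plain pickle, tempfile and heapq.merge rather than the pickle_io and mergesort modules mentioned above. It also skips compression and never explicitly closes its temporary files.

```python
import heapq
import pickle
import tempfile


class SpillingSorter:
    """Hypothetical stand-in for the idea, not the real InfiniteSortedObjectSequence."""

    def __init__(self, max_in_memory=100000):
        self.max_in_memory = max_in_memory  # assumed spill threshold, in items
        self.buffer = []
        self.runs = []  # one temporary file per sorted run

    def append(self, item):
        self.buffer.append(item)
        if len(self.buffer) >= self.max_in_memory:
            self._spill()

    def _spill(self):
        # Sort the current run, then pickle it item by item to a temp file.
        run = tempfile.TemporaryFile()
        for item in sorted(self.buffer):
            pickle.dump(item, run)
        self.runs.append(run)
        self.buffer = []

    @staticmethod
    def _read_run(run):
        # Stream one run back from disk without loading it all at once.
        run.seek(0)
        while True:
            try:
                yield pickle.load(run)
            except EOFError:
                return

    def sort(self):
        if not self.runs:
            return iter(sorted(self.buffer))  # small case: plain in-memory sort
        streams = [self._read_run(run) for run in self.runs]
        streams.append(iter(sorted(self.buffer)))  # whatever is still in RAM
        return heapq.merge(*streams)  # N-way merge of already-sorted streams


# Tiny threshold so the example actually spills twice.
sorter = SpillingSorter(max_in_memory=3)
for n in [5, 1, 4, 2, 3, 9, 0]:
    sorter.append(n)
result = list(sorter.sort())
```

Because each run is already sorted when it is written, the final merge only ever holds one item per run in memory.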

This seems to work well for a data set that is larger than RAM but which fits on a hard drive. More hardcore, industrial-strength solutions would probably not be in Python :) and would try to utilize more disks, control the number of files open during a merge, and maybe use async I/O.

Note that this class is for a specialized use case that is somewhat normal in some aspects of IR and text processing: first you gather the data, then you want it sorted — there’s no random access lookup, length call, etc, just Append() and Sort().

If you shop at Williams-Sonoma in Palo Alto, don’t forget Draegers in Menlo Park

I just rediscovered Draegers in Menlo Park. Informally the prices are just as high as Williams-Sonoma at the Stanford shopping center, however Draegers is open later (10pm nightly), seems to easily have more specialty food items than WS, and upstairs at Draegers they have a good selection of cooking items.

MapReduce in 10 or so lines of Python

I’ve realized that I understand things best when I implement them myself. I was recently reading Trevor Strohman’s dissertation and was intrigued by TupleFlow, a kind of more elaborate and improved MapReduce. I was about to write my own toy implementation of TupleFlow when I decided to simplify and, just for fun, write MapReduce in Python.

The goal here is a simple and short implementation, and with comments stripped out we have:

import itertools


def MrSimple(producer, mapper, reducer, consumer):
    stage1 = []
    for n, v in producer():
        for n2, v2 in mapper(n, v):
            stage1.append((n2, v2))
    for n2, vals in itertools.groupby(sorted(stage1), lambda x: x[0]):
        seconds = (second[1] for second in vals)
        for v2 in reducer(n2, seconds):
            consumer(n2, v2)


producer is a generator that yields a series of name, value pairs – in the classic term frequency counting case it would return file,contents pairs.

mapper takes in name,value pairs and generates a series of name2,value2 pairs. In the word freq case it would emit (term,‘1’) pairs for every word in ‘value’.

reducer is called with (name, values) and emits ‘value3’ values that are associated with the name.

consumer is used to persist the results of reducer.
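Here is how the four pieces might fit together for the term frequency case. This is only a sketch: the two in-memory "files", the results dict and the use of the integer 1 (rather than the string ‘1’) are my own choices for the example, and MrSimple is repeated so the block runs on its own.

```python
import itertools


def MrSimple(producer, mapper, reducer, consumer):
    # Same function as in the post, repeated so this example is self-contained.
    stage1 = []
    for n, v in producer():
        for n2, v2 in mapper(n, v):
            stage1.append((n2, v2))
    for n2, vals in itertools.groupby(sorted(stage1), lambda x: x[0]):
        seconds = (second[1] for second in vals)
        for v2 in reducer(n2, seconds):
            consumer(n2, v2)


def producer():
    # Hypothetical in-memory "files" standing in for real file,contents pairs.
    yield "a.txt", "the cat sat"
    yield "b.txt", "the cat ran"


def mapper(name, value):
    # Emit (term, 1) for every word in the file contents.
    for term in value.split():
        yield term, 1


def reducer(name, values):
    # Collapse all the 1s for a term into a single count.
    yield sum(values)


results = {}


def consumer(name, value):
    # "Persist" the result; here that just means storing it in a dict.
    results[name] = value


MrSimple(producer, mapper, reducer, consumer)
```

After the call, results maps each term to its total count across both inputs.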

This MapReduce runs in three stages:

  1. Run producer and mapper.
  2. Sort the name,value pairs the mapper returned.
  3. Run the reducer and consumer.

A couple of implementation notes:

  • Ideally I would use the builtin map(), but I think that would complicate the code.
  • I do use itertools.groupby() which is very handy.
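One thing worth knowing about itertools.groupby() is that it only groups adjacent items with equal keys, which is why the pairs get sorted first. A small sketch with made-up pairs shows the difference:

```python
import itertools

pairs = [("b", 1), ("a", 1), ("b", 1)]  # hypothetical mapper output


def first(pair):
    return pair[0]


# Without sorting, "b" shows up as two separate groups.
unsorted_keys = [key for key, _ in itertools.groupby(pairs, first)]

# After sorting, each name appears exactly once.
sorted_keys = [key for key, _ in itertools.groupby(sorted(pairs), first)]
```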

A real implementation would use multiple threads, multiple processes, and be able to process data sets too large to fit into memory.

The core code is in mr_simple.py and a demonstration driver is mr_simple_demo.py. I’ve recently started storing any personal projects that I’m not totally embarrassed by in code.google.com BTW.

To follow – a variation that can work on larger data sets.

For a similar stab at it see this.

Congratulations to Kieran Sherlock, First in Road Mens Cat 3 Individual Time Trial Rankings in California

Kieran has made a great move from a fast runner to a faster “push bike” racer and is ranked first in his very serious division.

There’s a nice picture of him here.

WordPress xmlrpc.php considered continually dangerous

It seems that for years I’ve been upgrading WordPress, and usually a security bug in xmlrpc.php is mentioned. The latest update, 2.3.3, has a typical line:

…a flaw was found in the XML-RPC implementation…

Besides upgrading whenever I notice the problem, my attempt at an additional measure of safety is:

chmod 000 xmlrpc.php

Or maybe:

mv xmlrpc.php xmlrpc.BAD
touch xmlrpc.php
chmod 000 xmlrpc.php xmlrpc.BAD

I think ultimately the only hope is to run a blog on a hosted service. Manually performing upgrades gets old.

Funny comment in SFGate story

In this sad story about a man being beaten after a traffic accident in Oakland, I found the first comment:

In that neighborhood, unless your car can’t run anymore, do not stop for anything. Call the police after you come to a safe environment. If your car is dead, come out shooting. If you don’t have a gun, which is unlikely for someone driving in that area, call your loved ones to say goodbye.

Ominous use of “unfortunate”

This is ominous:

Software giant says rejection of $45 billion offer is ‘unfortunate.’

Morning Glory Muffins

For years I used to go to Java Man in Hermosa Beach and regularly order their Morning Glory Muffins.

I’m pretty happy with the Morning Glory Muffin recipe from Earthbound Organic Farms and it produces muffins that look like this:

I’ve played around with the recipe and made some modifications:

  • I don’t use coconut
  • The amount of sugar can probably be slightly reduced, and I plan to try date sugar in the future.
  • I use cranberries instead of raisins. Trader Joe's even has some nice orange flavored cranberries.
  • I use the indicated amount of flour, but 1/2c of it is whole wheat
  • I usually add 1/4c of wheat germ for a super food boost
  • I plan to try reducing the amount of oil slightly as it seems kind of excessive
  • I usually don’t end up with all the carrots they want
  • I add sunflower seeds as that’s how I remember Java Man doing it.
  • The 8oz of pineapple is a handy amount as you can buy convenient 8oz tins in the market.

The muffins are kind of amusing when done – they have such big, ah, heads, they’re so top heavy, that most tip over and can’t stand alone.

Don’t forget the monster bakery/Texas muffin pan – I bought mine at Williams-Sonoma:

Bobby Fischer RIP

Unfortunately Bobby Fischer passed away recently :(

I think I like Mark Crowther’s writeup the best esp when he says:

Fischer’s greatness was the clarity, precision and beauty of his chess games, the battling uncompromising nature he took to every tournament and match he ever played and the sheer drama of his chess career. His personal demands and the way he raised the profile of chess led to improved conditions for a whole generation that followed him. He took on the Soviet Chess Machine virtually alone and won, at least over the board. It probably cost him everything else in his life. When I was younger he was my absolute hero.

I have a sinking feeling in my stomach when I read “It probably cost him everything else in his life.”
