Speed improvements coming
The great perversion of trying to write high-performance code in Python is that it has almost nothing to do with being clever. It really comes down to writing your program in such as way that it touches as much C code as possible. Some things you would do for performance in another languag would be a disaster in Python. For example, it would be foolish to implement a perfect hash function in Python for better performance, because a Python implementation could never be as fast as the C implementation of the built-in dict class.
When I began implementing Whoosh, I quite shamelessly used Lucene as a template, and then elaborated on it, sometimes implementing things -- completely customizable posting formats, for example -- beyond what Lucene offers.
Lucene uses varints (variable-length integers) and deltas to compress postings lists, so instead of four bytes per number for a 32-bit integer, most numbers are stored in one byte (using deltas keeps the numbers small enough to fit in one byte). This not only saves space, but in low-level languages like C and Java, it's faster, because it helps the disk cache. My implementation of this method in Whoosh (along with the fact that I use zlib to compress stored document metadata by default) is one of the reasons Whoosh indexes are typically extremely small compared to those created by most search libraries.
The downside of using varints for Whoosh is that reading and writing varints is (obviously) not a built-in function of Python. So in some of the tightest loops in the search code, I'm using relatively slow Python constructs, like explicit for loops.
Recent tests I've run suggest using array.tofile() and array.fromfile() is about 200 times faster than writing and reading postings "manually" (that is, explicitly in Python). Which makes sense, because these array methods do both the looping and IO in C, where I'm currently doing it in the interpreter.
Of course the downside of using array is that it doesn't do any compression -- every number, no matter how small, takes up four bytes on disk. So you'd expect the on-disk size of a Whoosh index to get much larger, probably doubling. For my use-case (a medium-size online help system of about 6000 documents), I'd guess the index would probably go from approx. 4MB to somewhere around 8MB.
However, the uncompressed nature of on-disk arrays has a potential advantage: because the numbers are fixed-size, I could use offset file.seek()s to shortcut certain operations.
On balance, considering the price of disk space these days, 100-200x better searching performance probably justifies doubling the size of the index. Combined with ideas for chunked posting formats I've cribbed from the Google presentation that recently showed up on the search blogs, I think searching could potentially be much faster in a future version of Whoosh.

rss
Comments
This is really exciting, Matt. I definitely agree that that space trade off is worth it for any use case I might have.