Tradeoffs of additional speed
As a follow-up to my last post, I've been thinking about how speeding up Whoosh using array.tofile() and array.fromfile() raises a conflict between two of my goals for Whoosh.
Whoosh is supposed to be a fast (for Python) search library. But I also envisioned it as a useful bit of source code for hobbyists and maybe even serious researchers, who could take advantage of the dynamic nature of Python to do quick experiments with it.
Using the array methods will speed up Whoosh by quite a bit (at the macro level, not 200x; I meant to put a ;) after that figure, but still quite a bit). But while it gets Whoosh closer to being as fast as Python can go, it will also warp the implementation into something that doesn't make any sense outside of the Python interpreter.
That is, it would be a horrible way to write a search library except for the fact that it's also the fastest way given the nature of the Python interpreter, where increasing the percentage of your program that touches C code is more important than being clever or "doing it right".
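To make the tradeoff concrete, here is a minimal sketch of the two styles. It is not Whoosh's actual code: the function names and the idea of storing a flat list of unsigned integers are assumptions made for illustration. The first pair of functions does the work one value at a time in Python; the second pair hands the whole list to the array module in a single call.

```python
import struct
import tempfile
from array import array

# Hypothetical helpers for illustration; these names are not from Whoosh.

def write_numbers_loop(f, numbers):
    # One struct.pack() call and one write() call per value, all driven
    # by the Python interpreter.
    for n in numbers:
        f.write(struct.pack("<I", n))

def read_numbers_loop(f, count):
    size = struct.calcsize("<I")
    return [struct.unpack("<I", f.read(size))[0] for _ in range(count)]

def write_numbers_array(f, numbers):
    # The whole list is converted and written inside the array module.
    # Note that array uses the machine's native byte order and int size.
    array("I", numbers).tofile(f)

def read_numbers_array(f, count):
    a = array("I")
    a.fromfile(f, count)  # raises EOFError if fewer than count items remain
    return a.tolist()

def roundtrip(write, read, numbers):
    # Write the numbers to a temporary file, rewind, and read them back.
    with tempfile.TemporaryFile() as f:
        write(f, numbers)
        f.seek(0)
        return read(f, len(numbers))

if __name__ == "__main__":
    data = [1, 5, 9, 1000]
    print(roundtrip(write_numbers_loop, read_numbers_loop, data))
    print(roundtrip(write_numbers_array, read_numbers_array, data))
```

Both versions round-trip the same values, but the array version forces the data into fixed-width, machine-native integers because that is what the C-level call can handle, not because it is the layout you would pick for a search index.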
I still think the speed improvements are worth it, because above all else I'd like Whoosh to be of practical use, for example to projects like Trac and MoinMoin that have search functions but can't require native libraries. And the best way to do that is to be fast. But I'll be sorry to remove some of the "clever" bits.

Comments
One way around these constraints may be to embrace Cython. The code will remain meaningful for experimentation, pure Python projects would still have a working search engine, and those who can afford to compile the Cython extensions will also get big speed improvements.