The Future
Well, that was interesting. Whoosh got noticed in a few places, and a few people are playing with it. Now I need to work on three things.
- Documentation
- Finishing features
- Exploring performance
Documentation is what I'm going to be focusing on first, and right away. I think it's awesome that people are trying Whoosh out just based on docstrings and the GettingStarted page, but I really need to write a user guide, get down some design notes, and write about how to accomplish things in Whoosh. It would be cool to do a screencast or something too. But writing documentation is my day job; I need to find motivation to do it in my spare time too.
There are quite a few things that are coded and more-or-less-working-probably (e.g. highlighting search result excerpts) but not perfectly integrated, or wanting a better API. There is also some code I wrote just to get things to a releasable state that I need to revisit. That's what I'll probably work on after getting the documentation going.
Increasing performance is going to be tricky. Given the design parameters, I feel like I'm running up against the speed of the Python interpreter. Finding more performance may mean being extremely clever about doing less work, and exploring parallel processing. I can also try working with very large indexes and see what kind of tunable parameters might help with that.

Comments
Hey Matt!
Thanks for putting all this quality code out there. I had been looking for a pure-Python full-text indexing engine for a while (mostly due to reason 1 as detailed on the front page, but also a bit for reasons 2 and 3). On Friday I finally dug up Whoosh in a Google search, and I'm amazed at how well it seems to perform.
I noticed in the storage classes that there's a whole chunk of code commented out to store the index tables in a SQLite database. Are you planning to include that back into the code eventually, or is it some evolutionary dead end?
Keep up the good work!
Ranieri