Whoosh: a fast pure-Python search engine

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.

Whoosh was created and is maintained by Matt Chaput. It was originally created for use in the online help system of Side Effects Software's 3D animation software, Houdini. Side Effects Software Inc. graciously agreed to open-source the code.

Some of Whoosh's features include:

  • Pythonic API.
  • Pure-Python. No compilation or binary packages needed, no mysterious crashes.
  • Fielded indexing and search.
  • Fast indexing and retrieval: faster than any other pure-Python, scoring, full-text search solution I know of.
  • Pluggable scoring algorithm (including BM25F), text analysis, storage, posting format, etc.
  • Powerful query language parsed by pyparsing.
  • Pure-Python spell-checker (as far as I know, the only one).
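As a rough, stdlib-only illustration of two ideas from the list above, fielded indexing and a pluggable scoring function, here is a toy in-memory inverted index. It is not Whoosh's implementation or API: `FieldedIndex`, `add_document`, `search`, `tokenize`, and the `scorer` parameter are hypothetical names chosen for this sketch, and `bm25` is the standard single-field Okapi BM25 formula rather than Whoosh's BM25F.

```python
import math
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer signature for this sketch only:
# (term freq, field length, avg field length, docs with this field, doc freq) -> score
Scorer = Callable[[int, int, float, int, int], float]


def tokenize(text: str) -> List[str]:
    # Toy analyzer: lowercase, then keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def bm25(tf: int, dl: int, avgdl: float, n: int, df: int,
         k1: float = 1.2, b: float = 0.75) -> float:
    # Okapi BM25 contribution of one query term to one document's field.
    idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))


def tf_idf(tf: int, dl: int, avgdl: float, n: int, df: int) -> float:
    # A simpler alternative scorer; dl and avgdl are ignored.
    return tf * math.log(1 + n / df)


class FieldedIndex:
    """Hypothetical toy inverted index with one posting table per field."""

    def __init__(self) -> None:
        # postings[field][term] -> {doc_id: term frequency}
        self.postings: Dict[str, Dict[str, Dict[int, int]]] = defaultdict(dict)
        # lengths[field][doc_id] -> number of tokens in that field
        self.lengths: Dict[str, Dict[int, int]] = defaultdict(dict)
        self.doc_count = 0

    def add_document(self, **fields: str) -> int:
        doc_id = self.doc_count
        self.doc_count += 1
        for field, text in fields.items():
            tokens = tokenize(text)
            self.lengths[field][doc_id] = len(tokens)
            table = self.postings[field]
            for token in tokens:
                plist = table.setdefault(token, {})
                plist[doc_id] = plist.get(doc_id, 0) + 1
        return doc_id

    def search(self, field: str, query: str,
               scorer: Scorer = bm25) -> List[Tuple[int, float]]:
        lengths = self.lengths.get(field, {})
        if not lengths:
            return []
        avgdl = sum(lengths.values()) / len(lengths)
        scores: Dict[int, float] = defaultdict(float)
        for term in tokenize(query):
            plist = self.postings[field].get(term, {})
            for doc_id, tf in plist.items():
                scores[doc_id] += scorer(tf, lengths[doc_id], avgdl,
                                         len(lengths), len(plist))
        # Highest score first; ties broken by document number.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ix = FieldedIndex()
ix.add_document(title="First document", content="This is the first document we've added.")
ix.add_document(title="Second document", content="The second one is even more interesting.")
print(ix.search("content", "first"))                  # default scorer (BM25)
print(ix.search("content", "first", scorer=tf_idf))   # same query, swapped scorer
```

Swapping `scorer` changes the ranking without rebuilding the index, which is the sense in which scoring is "pluggable" in this sketch. BM25F, as usually described, goes further by combining several fields with per-field weights and length normalization; the sketch scores one field at a time.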

Whoosh might be useful in the following circumstances:

  • Anywhere a pure-Python solution is desirable to avoid having to build/compile native libraries (or force users to build/compile them).
  • As a research platform, at least for programmers who find Python easier to read and work with than Java ;)
  • When an easy-to-use Pythonic interface is more important to you than raw speed.

Whoosh takes much inspiration (and sometimes translates code) from other open-source search engines. Its fundamental design is similar to Lucene's, although it does not work exactly like Lucene and is not compatible with it. It uses KinoSearch's indexing algorithm, some scoring algorithms from Terrier, and the English morphological variation generator from Minion.

Documentation and support

See the latest documentation.

Join the Whoosh mailing list.

Installation

Source releases and any binaries can be downloaded from PyPI.

http://pypi.python.org/pypi/Whoosh

If you have setuptools installed, you can use easy_install to download and install Whoosh automatically:

$ easy_install Whoosh

You can check out the latest sources from the Subversion repository:

$ svn co http://svn.whoosh.ca/projects/whoosh/trunk/

News

Recent changes

[416] by matt on 02/05/10 11:32:32

Added unit tests.

[415] by matt on 02/03/10 18:45:12

Work on NUMERIC, DATETIME, and BOOLEAN field types. Changed instances of test_index to testindex.

[414] by matt on 02/02/10 15:57:33

Fixed docstring problem.

[413] by matt on 01/30/10 15:24:23

Added writing.AsyncWriter. Removed obsolete call to FileIndex.unlock(). Bumped version number.