Ticket #25 (closed defect: fixed)

Opened 12 months ago

Last modified 11 months ago

IndexWriter.update_document leaves files open

Reported by: dcrosta
Owned by: matt
Priority: major
Milestone:
Component: Storage
Version:
Keywords:
Cc:

Description

IndexWriter.update_document creates a Searcher (to delete the old version of the document before re-adding the new version), but fails to close the searcher. In a script I wrote to reindex many documents (approximately 400, vs. OS X's open files limit of 256), I ran into the following error:

Traceback (most recent call last):
  [redacted]
  File "build/bdist.macosx-10.5-i386/egg/whoosh/writing.py", line 202, in update_document
  File "build/bdist.macosx-10.5-i386/egg/whoosh/writing.py", line 141, in searcher
  File "build/bdist.macosx-10.5-i386/egg/whoosh/index.py", line 506, in searcher
  File "build/bdist.macosx-10.5-i386/egg/whoosh/searching.py", line 42, in __init__
  File "build/bdist.macosx-10.5-i386/egg/whoosh/index.py", line 494, in doc_reader
  File "build/bdist.macosx-10.5-i386/egg/whoosh/reading.py", line 55, in __init__
  File "build/bdist.macosx-10.5-i386/egg/whoosh/store.py", line 56, in open_table
  File "build/bdist.macosx-10.5-i386/egg/whoosh/store.py", line 125, in open_file
IOError: [Errno 24] Too many open files: '/Users/[redacted]/search/index/_MAIN_34.dcz'

Here's a patch to fix the bug:

--- writing.py.orig	2009-03-31 17:15:33.000000000 -0400
+++ writing.py	2009-03-31 17:02:44.000000000 -0400
@@ -202,6 +202,7 @@
         searcher = self.searcher()
         for name in unique_fields:
             self.delete_by_term(name, fields[name], searcher = searcher)
+        searcher.close()
         
         # Add the given fields
         self.add_document(**fields)
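
One gap the one-line patch may leave open: if delete_by_term raises, searcher.close() is never reached. A minimal sketch of the same fix wrapped in try/finally follows. StubSearcher and StubWriter are hypothetical stand-ins written for this illustration, not Whoosh's actual classes, and the unique_fields constructor argument is an assumption.

```python
class StubSearcher:
    """Hypothetical stand-in for a searcher that holds open files."""
    open_count = 0  # number of searchers currently open

    def __init__(self):
        StubSearcher.open_count += 1
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            StubSearcher.open_count -= 1


class StubWriter:
    """Hypothetical stand-in for the writer; not Whoosh's IndexWriter."""

    def __init__(self, unique_fields):
        self.unique_fields = unique_fields
        self.added = []

    def searcher(self):
        return StubSearcher()

    def delete_by_term(self, name, value, searcher=None):
        # Simulate a failure path: a missing unique field raises.
        if value is None:
            raise ValueError("no value for unique field %r" % name)

    def add_document(self, **fields):
        self.added.append(fields)

    def update_document(self, **fields):
        searcher = self.searcher()
        try:
            for name in self.unique_fields:
                self.delete_by_term(name, fields.get(name), searcher=searcher)
        finally:
            # Runs on success and on error, so the files are always released.
            searcher.close()
        self.add_document(**fields)


writer = StubWriter(unique_fields=["path"])
for i in range(400):
    writer.update_document(path="/doc/%d" % i)
```

With this shape, the number of open searchers returns to zero after every call, including calls that fail partway through the delete loop.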

Change History

Changed 12 months ago by matt

  • status changed from new to assigned

Yikes! Thanks for the catch. It's also very inefficient to keep opening and closing searchers like that (I think I must have been in a hurry when I wrote that method ;). I should allow the user to pass in a searcher, or keep a persistent searcher in the writer object.
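
A rough sketch of the first option (letting the caller pass in a searcher) might look like the following. The stub classes and the searcher= keyword on update_document are assumptions made for illustration, not the API that was eventually released. Whether a long-lived searcher sees earlier deletions is a separate question that this sketch does not address.

```python
class StubSearcher:
    """Hypothetical stand-in; counts how many searchers were opened."""
    opened = 0

    def __init__(self):
        StubSearcher.opened += 1
        self.closed = False

    def close(self):
        self.closed = True


class StubWriter:
    """Hypothetical writer; the searcher= parameter is an assumption."""
    unique_fields = ("path",)

    def __init__(self):
        self.deleted = []
        self.added = []

    def searcher(self):
        return StubSearcher()

    def delete_by_term(self, name, value, searcher=None):
        self.deleted.append((name, value))

    def add_document(self, **fields):
        self.added.append(fields)

    def update_document(self, searcher=None, **fields):
        # Only close a searcher that this call opened itself.
        owns_searcher = searcher is None
        if owns_searcher:
            searcher = self.searcher()
        try:
            for name in self.unique_fields:
                self.delete_by_term(name, fields[name], searcher=searcher)
        finally:
            if owns_searcher:
                searcher.close()
        self.add_document(**fields)


writer = StubWriter()
shared = writer.searcher()  # opened once by the caller
try:
    for i in range(400):
        writer.update_document(searcher=shared, path="/doc/%d" % i)
finally:
    shared.close()
```

Here 400 updates open a single searcher instead of 400. One cost of this signature is that a document field named "searcher" would collide with the keyword argument.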

Changed 12 months ago by dcrosta

Here's an alternative patch which will fix this file leak wherever else it occurs:

Index: src/whoosh/searching.py
===================================================================
--- src/whoosh/searching.py	(revision 139)
+++ src/whoosh/searching.py	(working copy)
@@ -54,6 +54,9 @@
     
     def __iter__(self):
         return iter(self.term_reader)
+
+    def __del__(self):
+        self.close()
     
     def __contains__(self, term):
         return term in self.term_reader
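
A caveat on the __del__ approach: it runs only when the object is finalized. CPython usually does that as soon as the last reference goes away, but reference cycles and other Python implementations can delay it, so it is probably best treated as a safety net. A small sketch of pairing it with an explicit close through contextlib.closing follows. StubSearcher is a hypothetical stand-in, not the real Searcher class.

```python
from contextlib import closing


class StubSearcher:
    """Hypothetical stand-in for a searcher that owns file handles."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        # Safety net only: runs whenever the object is finalized,
        # which may be later than the caller expects.
        self.close()


with closing(StubSearcher()) as searcher:
    assert not searcher.closed  # usable inside the block

# close() has been called by the time the block exits.
```

For this to be safe, close() has to tolerate being called twice, since __del__ will call it again after the with block has already closed the searcher.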

Changed 11 months ago by matt

  • status changed from assigned to closed
  • resolution set to fixed

Finally got around to adding this in the latest release, thanks for the info!
