-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix file-like closing bug from gensim.corpora.MmCorpus
. Fix #1869
#1911
Conversation
…into develop Conflicts: gensim/summarization/bm25.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, looks really scary at all, much complicated.
Is it possible to hide most of this stuff (maybe using @context_manager
)?
gensim/matutils.py
Outdated
with utils.file_or_filename(self.input) as lines: | ||
|
||
# 'with' statement behaviour without closing the file object | ||
mgr = (utils.file_or_filename(self.input)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
useless ()
gensim/matutils.py
Outdated
|
||
# 'with' statement behaviour without closing the file object | ||
mgr = (utils.file_or_filename(self.input)) | ||
exit = type(mgr).__exit__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exit
is built-in function, don't re-define it
gensim/matutils.py
Outdated
if not self.transposed: | ||
self.num_docs, self.num_terms = self.num_terms, self.num_docs | ||
break | ||
except RuntimeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why RuntimeError
?
gensim/matutils.py
Outdated
mgr = (utils.file_or_filename(self.input)) | ||
exit = type(mgr).__exit__ | ||
value = type(mgr).__enter__(mgr) | ||
exc = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be inverted (i.e. false by default and True if exception)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, this looks significantly more readable, good work.
gensim/matutils.py
Outdated
@@ -1328,10 +1328,23 @@ class MmReader(object): | |||
This allows us to process corpora which are larger than the available RAM. | |||
|
|||
""" | |||
@contextmanager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this function to utils.py
(this isn't related with MmReader, this can be useful in other cases)
gensim/matutils.py
Outdated
@@ -1328,10 +1328,23 @@ class MmReader(object): | |||
This allows us to process corpora which are larger than the available RAM. | |||
|
|||
""" | |||
@contextmanager | |||
def open_file(self, input): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstring in numpy-style please
gensim/matutils.py
Outdated
exc = False | ||
try: | ||
yield mgr | ||
except StandardError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add the comment, what's a reason for exception here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need an exception here to handle any unhandled exceptions in the code nested in the with statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if handle any - why not Execption
instead of StandartError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm , ya my bad . I will use except Exception.
gensim/matutils.py
Outdated
yield mgr | ||
except StandardError: | ||
exc = True | ||
if not exit(mgr, *sys.exc_info()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly worried about exit
, how it works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of the exception , 'with' statement has the ability to silence errors. Here trying to close the file with the exception information if possible , otherwise raise the exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this call __exit__
? I'm worried because we have exit()
function in python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya , it is exit() , I will correct it as in the finally block. Thanks.
@@ -225,6 +225,14 @@ def test_serialize_compressed(self): | |||
# MmCorpus needs file write with seek => doesn't support compressed output (only input) | |||
pass | |||
|
|||
def test_closed_file_object(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to add test for your context manager too (with file path (existent, non-existent), file-like object)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it fine if I test whether the context manager throws exception or not for various cases ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes
@@ -84,6 +85,48 @@ def test_decode_entities(self): | |||
expected = u'It\x92s the Year of the Horse. YES VIN DIESEL \U0001f64c \U0001f4af' | |||
self.assertEqual(utils.decode_htmlentities(body), expected) | |||
|
|||
def test_open_file_existent_file(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to check that this doesn't raise an exception if something goes wrong - unittest fail the current test and will show needed information (here and everywhere). Sorry, apparently I misunderstood you in the previous question
gensim/test/test_utils.py
Outdated
|
||
self.assertFalse(exc) | ||
|
||
def test_open_file_non_existent_file(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use something like Please use something like https://docs.python.org/2/library/unittest.html#unittest.TestCase.assertRaises (instead of manual wrapping to try/exc )
gensim/utils.py
Outdated
except Exception: | ||
# Handling any unhandled exceptions from the code nested in 'with' statement. | ||
exc = True | ||
if not type(mgr).__exit__(mgr, *sys.exc_info()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why type
is needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the so thread , " exit = type(mgr).exit is used because CPython has a history of caching special methods and this just makes it consistent ". I tried to implement based on that. mgr.exit() should also work just fine.
gensim/test/test_utils.py
Outdated
exc = False | ||
try: | ||
with utils.open_file(datapath('non_existent_file.txt')): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please iterate through this file too (not enough only open it), here and everywhere.
Simplest way looks like like
with utils.open_file("...") as infile:
self.assertEqual(sum(1 for _ in infile), numer_of_lines_in_file)
gensim/utils.py
Outdated
raise | ||
# Try to introspect and silence errors. | ||
finally: | ||
if not exc and isinstance(input, string_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if exc=True
and input is file-obj
, what happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I think I missed checking for string type there. Will do that.
gensim/utils.py
Outdated
except Exception: | ||
# Handling any unhandled exceptions from the code nested in 'with' statement. | ||
exc = True | ||
if not mgr.__exit__(*sys.exc_info()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what return __exit__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry , I didn't understand your query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X = mgr.__exit__(*sys.exc_info())
, what is X
in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x is True if the exception was handled gracefully by " exit() " , else False. In case of file object should we be raising the exception if it occurs ? because we cannot exit it.
Ping @sj29-innovate, can you resolve merge-conflict please |
584f66d
to
655fe39
Compare
655fe39
to
b5db34c
Compare
f8c26d2
to
644f97a
Compare
4b44735
to
6cdc488
Compare
@sj29-innovate you broken something in your code, tests didn't passed correctly, please have a look to Travis logs |
@menshikh-iv The test is running fine locally , can it be the cython support for in _mmreader that caused the test to break ? |
@sj29-innovate what you did in the last commit? Re-build Hm, I see new signatures, but not see updates in pyx file |
@menshikh-iv Yes i rebuild it using cython , there is an update in .pyx , replace "utils.file_or_filename" with "utils.open_file" |
@@ -65,7 +65,7 @@ cdef class MmReader(object): | |||
""" | |||
logger.info("initializing cython corpus reader from %s", input) | |||
self.input, self.transposed = input, transposed | |||
with utils.file_or_filename(self.input) as lines: | |||
with utils.open_file(self.input) as lines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@menshikh-iv Here is the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sj29-innovate yes, I see, thanks
Need to merge #1910 first, sorry for waiting @sj29-innovate |
@menshikh-iv Can you please suggest some other task ? . Thanks. |
@sj29-innovate help guys with #1901 (you can try to setup MacOSX build in distinct PR yourself) |
# Conflicts: # gensim/corpora/_mmreader.c
gensim.corpora.MmCorpus
. Fix #1869
Thank you @sj29-innovate 👍 |
No description provided.