Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix file-like closing bug from gensim.corpora.MmCorpus. Fix #1869 #1911

Merged
merged 12 commits into from
Feb 23, 2018

Conversation

sj29-innovate
Copy link
Contributor

No description provided.

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, looks really scary at all, much complicated.
Is it possible to hide most of this stuff (maybe using @context_manager)?

with utils.file_or_filename(self.input) as lines:

# 'with' statement behaviour without closing the file object
mgr = (utils.file_or_filename(self.input))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless ()


# 'with' statement behaviour without closing the file object
mgr = (utils.file_or_filename(self.input))
exit = type(mgr).__exit__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exit is built-in function, don't re-define it

if not self.transposed:
self.num_docs, self.num_terms = self.num_terms, self.num_docs
break
except RuntimeError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why RuntimeError ?

mgr = (utils.file_or_filename(self.input))
exit = type(mgr).__exit__
value = type(mgr).__enter__(mgr)
exc = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be inverted (i.e. false by default and True if exception)?

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, this looks significantly more readable, good work.

@@ -1328,10 +1328,23 @@ class MmReader(object):
This allows us to process corpora which are larger than the available RAM.

"""
@contextmanager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this function to utils.py (this isn't related with MmReader, this can be useful in other cases)

@@ -1328,10 +1328,23 @@ class MmReader(object):
This allows us to process corpora which are larger than the available RAM.

"""
@contextmanager
def open_file(self, input):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring in numpy-style please

exc = False
try:
yield mgr
except StandardError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the comment, what's a reason for exception here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need an exception here to handle any unhandled exceptions in the code nested in the with statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if handle any - why not Execption instead of StandartError ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm , ya my bad . I will use except Exception.

yield mgr
except StandardError:
exc = True
if not exit(mgr, *sys.exc_info()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly worried about exit, how it works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of the exception , 'with' statement has the ability to silence errors. Here trying to close the file with the exception information if possible , otherwise raise the exception.

Copy link
Contributor

@menshikh-iv menshikh-iv Feb 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this call __exit__ ? I'm worried because we have exit() function in python

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya , it is exit() , I will correct it as in the finally block. Thanks.

@@ -225,6 +225,14 @@ def test_serialize_compressed(self):
# MmCorpus needs file write with seek => doesn't support compressed output (only input)
pass

def test_closed_file_object(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add test for your context manager too (with file path (existent, non-existent), file-like object)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine if I test whether the context manager throws exception or not for various cases ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes

@@ -84,6 +85,48 @@ def test_decode_entities(self):
expected = u'It\x92s the Year of the Horse. YES VIN DIESEL \U0001f64c \U0001f4af'
self.assertEqual(utils.decode_htmlentities(body), expected)

def test_open_file_existent_file(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check that this doesn't raise an exception if something goes wrong - unittest fail the current test and will show needed information (here and everywhere). Sorry, apparently I misunderstood you in the previous question


self.assertFalse(exc)

def test_open_file_non_existent_file(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use something like Please use something like https://docs.python.org/2/library/unittest.html#unittest.TestCase.assertRaises (instead of manual wrapping to try/exc )

gensim/utils.py Outdated
except Exception:
# Handling any unhandled exceptions from the code nested in 'with' statement.
exc = True
if not type(mgr).__exit__(mgr, *sys.exc_info()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why type is needed here?

Copy link
Contributor Author

@sj29-innovate sj29-innovate Feb 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the so thread , " exit = type(mgr).exit is used because CPython has a history of caching special methods and this just makes it consistent ". I tried to implement based on that. mgr.exit() should also work just fine.

exc = False
try:
with utils.open_file(datapath('non_existent_file.txt')):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please iterate through this file too (not enough only open it), here and everywhere.
Simplest way looks like like

with utils.open_file("...") as infile:
    self.assertEqual(sum(1 for _ in infile), numer_of_lines_in_file)

gensim/utils.py Outdated
raise
# Try to introspect and silence errors.
finally:
if not exc and isinstance(input, string_types):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if exc=True and input is file-obj, what happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I think I missed checking for string type there. Will do that.

gensim/utils.py Outdated
except Exception:
# Handling any unhandled exceptions from the code nested in 'with' statement.
exc = True
if not mgr.__exit__(*sys.exc_info()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what return __exit__ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry , I didn't understand your query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X = mgr.__exit__(*sys.exc_info()) , what is X in this case?

Copy link
Contributor Author

@sj29-innovate sj29-innovate Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x is True if the exception was handled gracefully by " exit() " , else False. In case of file object should we be raising the exception if it occurs ? because we cannot exit it.

@menshikh-iv
Copy link
Contributor

Ping @sj29-innovate, can you resolve merge-conflict please

@sj29-innovate sj29-innovate force-pushed the new-develop branch 2 times, most recently from 584f66d to 655fe39 Compare February 21, 2018 21:25
@menshikh-iv
Copy link
Contributor

@sj29-innovate you broken something in your code, tests didn't passed correctly, please have a look to Travis logs

@sj29-innovate
Copy link
Contributor Author

@menshikh-iv The test is running fine locally , can it be the cython support for in _mmreader that caused the test to break ?

@menshikh-iv
Copy link
Contributor

menshikh-iv commented Feb 22, 2018

@sj29-innovate what you did in the last commit? Re-build _mmcorpus.c with Cython?

Hm, I see new signatures, but not see updates in pyx file

@sj29-innovate
Copy link
Contributor Author

sj29-innovate commented Feb 22, 2018

@menshikh-iv Yes i rebuild it using cython , there is an update in .pyx , replace "utils.file_or_filename" with "utils.open_file"

@@ -65,7 +65,7 @@ cdef class MmReader(object):
"""
logger.info("initializing cython corpus reader from %s", input)
self.input, self.transposed = input, transposed
with utils.file_or_filename(self.input) as lines:
with utils.open_file(self.input) as lines:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@menshikh-iv Here is the change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sj29-innovate yes, I see, thanks

@menshikh-iv
Copy link
Contributor

Need to merge #1910 first, sorry for waiting @sj29-innovate

@sj29-innovate
Copy link
Contributor Author

@menshikh-iv Can you please suggest some other task ? . Thanks.

@menshikh-iv
Copy link
Contributor

@sj29-innovate help guys with #1901 (you can try to setup MacOSX build in distinct PR yourself)

@menshikh-iv menshikh-iv changed the title Fixes #1869 , Mmcorpus file object behaviour fixed Fix file-like closing bug from gensim.corpora.MmCorpus. Fix #1869 Feb 23, 2018
@menshikh-iv
Copy link
Contributor

Thank you @sj29-innovate 👍

@menshikh-iv menshikh-iv merged commit c1e6c65 into piskvorky:develop Feb 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants