Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow indexing with np.int64 in doc2vec - #1231 #1254

Merged
merged 4 commits into from
May 2, 2017

Conversation

bogdanteleaga
Copy link
Contributor

This PR attempts to fix #1231.
It also includes a drive-by fix for a warning that was thrown during tests that asked the assertEquals to be changed into an assertEqual.

As far as the fix itself goes, I am not completely sure this is sane and am willing to discard the PR. Indexing with np.int32 works fine with the previous code and if indexing with np.int64 is going to be allowed here, it might as well be allowed everywhere else, which is a way bigger change.

However, this does fix the original issue as it has been presented.

@tmylk
Copy link
Contributor

tmylk commented Apr 3, 2017

Agree that it should be changed throughout doc2vec.
Some of the doc2vec code uses isinstance(x, integer_types) which includes int64 . So suggest to change all isinstance(x, int) in doc2vec to this check. It is more concise than proposed isinstance(x, (int, integer))

@bogdanteleaga
Copy link
Contributor Author

I meant across the codebase, but after a quick search it seems this is the only place where this actually happens.
There's two other spots that do this: one uses numbers.Integral and np.integer, the other uses int and np.integer.
Now, for me numbers.Integral actually covers all 3 cases, while just integer_types fails for numpy(see below). I think numbers.Integral should be safe considering this.

Does the CI run the tests on multiple python versions?

>>> l=[42, np.int32(42), np.int64(42)]
>>> [isinstance(x, numbers.Integral) for x in l]
[True, True, True]
>>> [isinstance(x, six.integer_types) for x in l]
[True, False, False]

@bogdanteleaga
Copy link
Contributor Author

Okay, did some more testing and the last commit makes the most sense I think. integer_types takes care of regular python stuff and integer takes care of numpy integers. It is a bit ugly but it is the only one that was consistent. numbers.Integral was still failing for some numpy types on python2.7.

@tmylk
Copy link
Contributor

tmylk commented Apr 4, 2017

Thanks for the changes across the broader codebase. New tests of int64 are needed for every changed class.

The test failure in CI in wiki_corpus is a known flake, please ignore.

@bogdanteleaga
Copy link
Contributor Author

Well, at this point I guess it's not just np.int64 anymore, it's other numpy integer types as well. I'll take a look later this week.

@tmylk
Copy link
Contributor

tmylk commented Apr 10, 2017

@bogdanteleaga Are you interested in adding tests to this PR?

@tmylk tmylk added difficulty easy Easy issue: required small fix testing Issue related with testing (code, documentation, etc) labels Apr 10, 2017
@bogdanteleaga
Copy link
Contributor Author

Sorry, it's been a really busy week and this one's not looking that great either. But I should be able to do it at the end of this one/start of next one. If anybody else wants to take it up faster than that that's ok as well.

@bogdanteleaga
Copy link
Contributor Author

Added 2 simple tests for indexed corpus and word2vec. Not really sure how to test lda/at. Also I'm not completely sure on how to test other types(if that's wanted). I could yield all the different types and iterate over them, but it could make tests take longer. Let me know what you think.

@tmylk
Copy link
Contributor

tmylk commented May 2, 2017

Thanks for the fix.

I am not sure what is the point of having and not isinstance(doc[0][0], six.integer_types + (np.integer,)): in LDAModel is either.

@tmylk tmylk merged commit 50a18b9 into piskvorky:develop May 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty easy Easy issue: required small fix testing Issue related with testing (code, documentation, etc)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DocvecsArray getitem int64 bug
2 participants