Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning when string is used as argument to infer_vector() #2347

Merged
merged 5 commits into from
Jan 23, 2019

Conversation

tobycheese
Copy link
Contributor

Another one that I just stumbled over.

infer_vector() in doc2vec.py requires a list of strings (words). If one does not pay attention and passes a string instead, this does not fail (as strings are lists of characters).

infer_vector() then calculates the vector based on the single characters in the string, which most of the time is not what one wants I think. Imho this merits a warning.

gensim/models/doc2vec.py Outdated Show resolved Hide resolved
@piskvorky
Copy link
Owner

piskvorky commented Jan 21, 2019

This is a welcome sanity-check, thanks!

@gojomo any reason the check wasn't previously included, do/did we have a reason to accept strings?

@menshikh-iv
Copy link
Contributor

@tobycheese thanks for PR, please

After, I'll merge your PR

@gojomo
Copy link
Collaborator

gojomo commented Jan 22, 2019

Only reason for no check previously was that parameter-name/doc-comment were thought clear enough, and perhaps if Python loose typing wants to let a string behave just like a list-of-one-character-strings, why should we complain? (It's not inconcievable that someone would be trying to create a model based on single-char tokens.) But certainly this bites people, and those who need single-char words can just force their strings to be lists, so some sort of warning or error is a good idea.

gensim/models/doc2vec.py Outdated Show resolved Hide resolved
@piskvorky
Copy link
Owner

Thanks @tobycheese ! A small step for code, one giant leap for user-friendliness :)

@piskvorky piskvorky merged commit 0cc0994 into piskvorky:develop Jan 23, 2019
@tobycheese
Copy link
Contributor Author

Thanks for commiting, userfriendlyness ftw!

@tobycheese tobycheese deleted the warning_for_infer_vector branch January 23, 2019 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants