
Fix LDA transform performance #1154

Merged 24 commits into develop from fix-lda-transform-prefomance on Feb 20, 2017

Conversation

menshikh-iv
Contributor

I found a terrible performance issue connected with applying an LDA model.

A typical example:

# DICT_PATH, LDA_PATH - paths to the saved dictionary and LDA model files
# tokens - list of tokens for a single document

from gensim.corpora import Dictionary
from gensim.models import LdaModel

dct_model = Dictionary.load(DICT_PATH)
lda_model = LdaModel.load(LDA_PATH)

d2b_vector = dct_model.doc2bow(tokens)

print(lda_model[d2b_vector])
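
For anyone reproducing this, a minimal timing wrapper around the call above (uses only the standard library; the dictionary, model and tokens are the same placeholders as in the snippet):

import time

start = time.time()
topics = lda_model[d2b_vector]  # goes through LdaModel.__getitem__
elapsed = time.time() - start

print(topics)
print("transform took %.3f s" % elapsed)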

This triggers the following chain of calls:

__getitem__(self, bow, eps=None)

self.get_document_topics(bow, eps, self.minimum_phi_value, self.per_word_topics)

gamma, phis = self.inference([bow], collect_sstats=True)

collect_sstats=True initiates heavy computation of sstats, but sstats (phis) cannot be used at all when per_word_topics=False (proof: the branch sketched below).
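
To make the "phis is unused" point concrete, here is a simplified, paraphrased sketch of get_document_topics (not the exact gensim source; default values are illustrative):

# Simplified sketch of LdaModel.get_document_topics (paraphrased, not the exact source)
def get_document_topics(self, bow, minimum_probability=0.01,
                        minimum_phi_value=0.01, per_word_topics=False):
    # sstats (phis) are computed unconditionally here -- this is the expensive part
    gamma, phis = self.inference([bow], collect_sstats=True)
    topic_dist = gamma[0] / sum(gamma[0])

    document_topics = [
        (topic_id, topic_value)
        for topic_id, topic_value in enumerate(topic_dist)
        if topic_value >= minimum_probability
    ]

    if not per_word_topics:
        # phis is never read on this path, so the sstats work above is wasted
        return document_topics

    # ... only the per_word_topics branch below actually uses phis ...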

I replaced the hard-coded True flag with per_word_topics, which gives a significant speedup in get_document_topics for the per_word_topics=False case (see the sketch of the change below). The effect is clearly visible with an LdaModel that has ~180 topics and a 700k+ word dictionary, i.e. a huge topic matrix.
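
The change itself is essentially a one-line substitution inside get_document_topics (sketched below; the surrounding code is unchanged):

# Before: sstats are always collected, even when they end up unused
gamma, phis = self.inference([bow], collect_sstats=True)

# After: sstats are only collected when the caller asked for per-word topics
gamma, phis = self.inference([bow], collect_sstats=per_word_topics)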

@tmylk
Contributor

tmylk commented Feb 20, 2017

Thanks for the fix. The docstrings in __init__ and get_item need to be updated to include the per_word_topics parameter.

@tmylk tmylk merged commit e56fcbc into piskvorky:develop Feb 20, 2017
@menshikh-iv menshikh-iv deleted the fix-lda-transform-prefomance branch February 19, 2018 04:48