Re-design "*2vec" implementations #1777

Merged
merged 119 commits into from
Feb 1, 2018
119 commits
31943ae
first design draft
manneshiva Dec 11, 2017
d7209f4
adds public interfaces
manneshiva Dec 13, 2017
fe19b9a
adds VocabItem and cleans BaseKeyedVectors
manneshiva Dec 13, 2017
fece94f
adds explicit parameters
manneshiva Dec 13, 2017
e310dbf
implements `train` and adds `Callback` functionality
manneshiva Dec 14, 2017
30872ac
refactors `train`, adds classes for vocabulary building and trainable…
manneshiva Dec 18, 2017
2892f37
changes function parameters
manneshiva Dec 19, 2017
4b1e7f8
fixes minor errors
manneshiva Dec 19, 2017
68ac5bc
starts refactoring `Word2Vec` based on new design
manneshiva Dec 19, 2017
7f60a47
removes `build_vocab_from_freq`, corrects `reset_from`
manneshiva Dec 19, 2017
abc5702
changes attribute names
manneshiva Dec 19, 2017
b60a9d5
adds saving/loading from word2vec format
manneshiva Dec 19, 2017
ca1eae9
refactors/renames variables based on new design
manneshiva Dec 19, 2017
dab7b99
fixes **not** storing normalized vectors and recalculable tables
manneshiva Dec 19, 2017
d249668
replaces `syn0` with `vectors`, adds `estimate_memory`
manneshiva Dec 20, 2017
99cf2ad
fixes indents
manneshiva Dec 20, 2017
267c682
starts `FastText` refactoring based on new design
manneshiva Dec 20, 2017
c2bbb20
refactors to call common methods from `word2vec_utils`, removes depre…
manneshiva Dec 21, 2017
7d774d7
refactors `FastText`
manneshiva Dec 21, 2017
9b156f5
adds common methods in `word2vec_utils`
manneshiva Dec 21, 2017
0db83f1
refactors keyedvectors for FT & W2V by creating a common base class
manneshiva Dec 21, 2017
b761dff
creates a common base class for Word2Vec and FastText
manneshiva Dec 24, 2017
817f71b
deletes word2vec_utils.py
manneshiva Dec 24, 2017
75892cc
extracts logging to separate methods
manneshiva Dec 25, 2017
61c4e5e
corrects alpha decay, modifies `_get_thread_working_mem` to support d…
manneshiva Dec 26, 2017
707aef3
refactors doc2vec initialization and training
manneshiva Dec 26, 2017
e370314
minor fixes to support doc2vec
manneshiva Dec 26, 2017
45347f3
corrects parameter setting while calling `train`
manneshiva Dec 26, 2017
ab8dd4b
deletes `callbacks`, fixes alpha setting and degradation from `train`
manneshiva Dec 26, 2017
679e82f
adds post training methods and keyedvectors for docvecs
manneshiva Dec 26, 2017
1f488a5
extracts common methods as functions, discard unnecessary function call
manneshiva Dec 27, 2017
0f666f4
shifts adding null word from trainables to vocab class
manneshiva Dec 27, 2017
6a9171d
unifies variable naming
manneshiva Dec 27, 2017
1246d13
moves corpus_count from vocabulary to model attribute
manneshiva Dec 27, 2017
4bba589
refactors test cases and corrects failing cases
manneshiva Dec 27, 2017
26b9b06
removes old import
manneshiva Dec 27, 2017
a923e7e
fixes errors
manneshiva Dec 27, 2017
51df908
creates separate class for callbacks, adds saving and loss capturing …
manneshiva Dec 27, 2017
bfae0e7
refactors poincare keyedvectors base and related changes
manneshiva Dec 28, 2017
9c261e0
extracts save/load_word2vec_format as functions to avoid code repetitio…
manneshiva Dec 29, 2017
d8d22bd
removes model initialization to None
manneshiva Dec 29, 2017
8301b03
shifts cum_tables, make_cum_table & create_binary_tree from trainable…
manneshiva Jan 3, 2018
36e6a30
adds fasttext test cases
manneshiva Jan 3, 2018
ae60bd8
adds doc strings for public APIs for D2V, W2V & FT
manneshiva Jan 4, 2018
eddd24e
adds docstrings for keyedvectors
manneshiva Jan 4, 2018
f3d76cf
resolves failing test cases
manneshiva Jan 5, 2018
9367dc6
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
manneshiva Jan 5, 2018
a721aac
updates cython generated .c files
manneshiva Jan 5, 2018
65b8821
corrects error statement when failing to import FAST VERSION
manneshiva Jan 5, 2018
9f1103e
betters logging
manneshiva Jan 5, 2018
52d1e5f
deletes fasttext wrapper
manneshiva Jan 7, 2018
6941e1e
fixes PEP8 long lines error
manneshiva Jan 7, 2018
8574055
fixes non-any2vec failing test cases
manneshiva Jan 7, 2018
173a8e9
deletes testing pure python any2vec implementations from tox
manneshiva Jan 7, 2018
be73e0b
fixes test_similarities failing test cases
manneshiva Jan 7, 2018
0fc8340
fixes PEP8 errors
manneshiva Jan 8, 2018
673086d
fixes python3 failing test cases
manneshiva Jan 8, 2018
ce0dee9
renames syn0 to vectors in keras integration test
manneshiva Jan 8, 2018
f300088
fixes annoy notebook failure
manneshiva Jan 8, 2018
211c286
adds property aliases for backward compatibility
manneshiva Jan 8, 2018
b4700ed
adds properties and methods for backward compatibility
manneshiva Jan 8, 2018
142c8a6
removes trainables save
manneshiva Jan 8, 2018
74ce823
minor changes to test cases
manneshiva Jan 8, 2018
3281a73
shifts epoch saver callback to an example in docstring
manneshiva Jan 9, 2018
b1a7390
adds deleters for syn1 & syn1neg
manneshiva Jan 9, 2018
995d1cf
deprecates old KeyedVectors in favour of Word2VecKeyedVectors
manneshiva Jan 9, 2018
fc9e77f
reverts word2vec_pre_kv_py2 saved models to original
manneshiva Jan 10, 2018
9fba59f
adds deprecated models and dependent python files
manneshiva Jan 10, 2018
c9d9ec8
adds unit tests for loading old models
manneshiva Jan 10, 2018
883cb81
imports deprecated in model.__init__
manneshiva Jan 10, 2018
7fcc3a4
removes .wv.most_similar calls
manneshiva Jan 10, 2018
9001abe
adds code to support loading old models
manneshiva Jan 10, 2018
6c42905
adds cython auto generated .c files
manneshiva Jan 10, 2018
0bea623
fixes PEP8 failures & fetching attributes from pre_kv word2vec models
manneshiva Jan 10, 2018
09f9bdf
fixes num_ngram_vectors
manneshiva Jan 10, 2018
4b142a0
fixes estimate_memory, shifts BaseKeyedVectors to keyedvectors.py
manneshiva Jan 11, 2018
710c124
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
manneshiva Jan 11, 2018
6f1c522
Merge branch 'develop' into refactor_any2vec
manneshiva Jan 12, 2018
60db35d
fixes review comments -- typos, indents, adding deprecated. No design…
manneshiva Jan 12, 2018
922ae60
Merge branch 'refactor_any2vec' of https://github.com/manneshiva/gens…
manneshiva Jan 12, 2018
f3e2259
fixes PEP8
manneshiva Jan 12, 2018
0a76c2a
shifts *KeyedVectors to keyedvectors.py
manneshiva Jan 13, 2018
06e03ef
de-duplicates data between keyedvectors, vocabulary, trainables and r…
manneshiva Jan 15, 2018
9aa9b66
fixes failing cases
manneshiva Jan 15, 2018
cbffa32
removes unused vocabulary parameter from methods
manneshiva Jan 15, 2018
4caa3f4
removes base classes for vocabulary & trainables, cleans code
manneshiva Jan 16, 2018
31f9943
removes build_vocab from BaseAny2VecModel
manneshiva Jan 16, 2018
5650bab
fixes vector size for doc2vec
manneshiva Jan 21, 2018
da539e2
Fix typo in classname
menshikh-iv Jan 23, 2018
818439d
remove docs for fasttext wrapper
menshikh-iv Jan 23, 2018
54c9b2e
update docstrings for callback
menshikh-iv Jan 23, 2018
bb54290
Merge remote-tracking branch 'upstream/develop' into refactor_any2vec
menshikh-iv Jan 23, 2018
0d1c48c
Fix documentation build
menshikh-iv Jan 23, 2018
13f5ea9
light cleanup for docstrings
menshikh-iv Jan 25, 2018
8cc2bf6
renames private util_any2vec functions
manneshiva Jan 28, 2018
ac2d01f
adds deprecated warning for attributes
manneshiva Jan 28, 2018
0fae977
adds deprecated warnings.warn for old doc2vec parameters
manneshiva Jan 28, 2018
d58dc41
shifts any2vec callback under gensim/models
manneshiva Jan 28, 2018
2422994
adds pure python implementations
manneshiva Jan 28, 2018
401d46e
fixes PEP8 errors
manneshiva Jan 28, 2018
46b0b3a
changes build_vocab method signature
manneshiva Jan 28, 2018
902aed7
fixes vocabulary trimming error
manneshiva Jan 29, 2018
3562818
fixes long line
manneshiva Jan 29, 2018
83374be
removes deprecated/utils
manneshiva Jan 30, 2018
d8455fa
adds old_saveload to deprecated
manneshiva Jan 30, 2018
1f38dc7
removes unused import
manneshiva Jan 30, 2018
cd4e22d
returns fasttext wrapper
manneshiva Feb 1, 2018
fd2e697
adds alias iter setter
manneshiva Feb 1, 2018
02072b1
fixes fasttext load error
manneshiva Feb 1, 2018
114ab5f
ignores PEP8 unused import
manneshiva Feb 1, 2018
0179835
Return fasttext wrapper rst
menshikh-iv Feb 1, 2018
0601c69
Add rst for deprecated stuff
menshikh-iv Feb 1, 2018
572c960
Add all needed deprecations, upd *.rst.
menshikh-iv Feb 1, 2018
62b0852
add description for deprecated package
menshikh-iv Feb 1, 2018
e9ebaa8
add missing import + return env var to tox config
menshikh-iv Feb 1, 2018
d7cee63
drop useless import
menshikh-iv Feb 1, 2018
79c1263
adds num_ngrams_vectors property
manneshiva Feb 1, 2018
19f2ee5
reverts to calling old attributes in all tests
manneshiva Feb 1, 2018
7a32739
fixes PEP8
manneshiva Feb 1, 2018
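Several commits above (211c286, b4700ed, b1a7390, ac2d01f) keep old attribute names such as `syn0` usable after the rename to `vectors`. A minimal sketch of that general pattern follows: a property alias with getter, setter and deleter that each emit a `DeprecationWarning`. The class `KeyedVectorsSketch`, the helper `_warn_renamed` and the warning text are hypothetical; this is not the code from the PR.

```python
import warnings


def _warn_renamed(old, new):
    # stacklevel=3 attributes the warning to the code that touched the old
    # attribute, not to this helper or to the property method itself.
    warnings.warn(
        "Attribute `{}` is deprecated; use `{}` instead.".format(old, new),
        DeprecationWarning,
        stacklevel=3,
    )


class KeyedVectorsSketch:
    """Hypothetical stand-in for a class whose `syn0` matrix was renamed to `vectors`."""

    def __init__(self, vectors):
        self.vectors = vectors  # new canonical attribute name

    @property
    def syn0(self):
        # Old name is still readable, but every read warns.
        _warn_renamed("syn0", "vectors")
        return self.vectors

    @syn0.setter
    def syn0(self, value):
        # Assigning through the old name updates the new attribute.
        _warn_renamed("syn0", "vectors")
        self.vectors = value

    @syn0.deleter
    def syn0(self):
        # Deleting through the old name removes the underlying attribute.
        _warn_renamed("syn0", "vectors")
        del self.vectors
```

Old scripts that read, assign or delete `syn0` keep working on the same data, and each access emits a `DeprecationWarning` that names the replacement.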
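The callback-related commits (e310dbf, 51df908, 3281a73, d58dc41) add hooks that the training loop calls at fixed points. The plain-Python sketch below shows that idea under assumed names (`CallbackSketch`, `EpochLogger`, `TinyModel`); it does not reproduce gensim's own callback class or its exact set of hooks.

```python
class CallbackSketch:
    """Hypothetical callback base: every hook is a no-op unless a subclass overrides it."""

    def on_train_begin(self, model):
        pass

    def on_epoch_begin(self, model):
        pass

    def on_epoch_end(self, model):
        pass

    def on_train_end(self, model):
        pass


class EpochLogger(CallbackSketch):
    """Records (epoch, running loss) after every epoch."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, model):
        self.history.append((model.epoch, model.running_loss))


class TinyModel:
    """Hypothetical model whose training loop calls the hooks at fixed points."""

    def __init__(self, epochs):
        self.epochs = epochs
        self.epoch = 0
        self.running_loss = 0.0

    def train(self, callbacks=()):
        for cb in callbacks:
            cb.on_train_begin(self)
        for epoch in range(self.epochs):
            self.epoch = epoch
            for cb in callbacks:
                cb.on_epoch_begin(self)
            self.running_loss += 1.0  # placeholder for one epoch of real training
            for cb in callbacks:
                cb.on_epoch_end(self)
        for cb in callbacks:
            cb.on_train_end(self)
```

Because the loop only calls hooks, features such as per-epoch saving or loss logging live in small subclasses instead of in `train` itself; an epoch-saving callback would put its save call in `on_epoch_end`.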
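Commit 8301b03 moves `make_cum_table` along with the cumulative tables used for negative sampling. In word2vec-style negative sampling, noise words are drawn with probability proportional to count**0.75, and a cumulative integer table turns each draw into a single binary search. The pure-Python sketch below illustrates that technique; the function names, the 0.75 default and the `2**31 - 1` integer domain are assumptions for illustration, not a copy of gensim's implementation.

```python
import bisect
import random


def make_cum_table_sketch(counts, power=0.75, domain=2**31 - 1):
    """Cumulative integer table so that P(index i) is proportional to counts[i] ** power."""
    weights = [c ** power for c in counts]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one count must be positive")
    table = []
    running = 0.0
    for w in weights:
        running += w
        # Map the running fraction of probability mass onto the integer range [0, domain].
        table.append(round(running / total * domain))
    return table


def sample_index(table, rng=None):
    """Draw one index: a uniform integer below the last entry, then a binary search."""
    if rng is None:
        rng = random.Random()
    r = rng.randrange(table[-1])
    # bisect_right returns the first position whose entry is greater than r, so
    # index i owns the integer interval [table[i - 1], table[i]).
    return bisect.bisect_right(table, r)
```

A word with count 0 adds no width to the table, so it can never be drawn, while frequent words are damped by the 0.75 exponent rather than dominating in direct proportion to their counts.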
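Commits 61c4e5e and ab8dd4b adjust how the learning rate `alpha` decays during `train`. Word2vec-style training conventionally lowers alpha linearly from a start value to a minimum as training progresses. A minimal sketch of that schedule follows; the function names are hypothetical and the values in the usage are illustrative.

```python
def linear_alpha(start_alpha, end_alpha, progress):
    """Learning rate after `progress` (0.0 to 1.0) of the planned training work."""
    # Clamp so the schedule never goes past end_alpha, even if progress overshoots.
    progress = min(max(progress, 0.0), 1.0)
    return start_alpha - (start_alpha - end_alpha) * progress


def alpha_at_epoch_starts(start_alpha, end_alpha, epochs):
    """Alpha at the start of each epoch when the decay is spread over all epochs."""
    return [linear_alpha(start_alpha, end_alpha, e / epochs) for e in range(epochs)]


# Illustrative usage: five epochs decaying from 0.025 towards 0.0001.
schedule = alpha_at_epoch_starts(0.025, 0.0001, 5)
```

Spreading the decay over the whole run (rather than restarting it every epoch) means each epoch starts at a lower alpha than the previous one, and the clamp guards against progress estimates that slightly exceed 1.0.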
5 changes: 5 additions & 0 deletions docs/src/apiref.rst
@@ -55,6 +55,11 @@ Modules:
     models/wrappers/wordrank
     models/wrappers/varembed
     models/wrappers/fasttext
+    models/deprecated/doc2vec
+    models/deprecated/fasttext
+    models/deprecated/word2vec
+    models/deprecated/keyedvectors
+    models/deprecated/fasttext_wrapper
     similarities/docsim
     similarities/index
     sklearn_api/atmodel
9 changes: 9 additions & 0 deletions docs/src/models/deprecated/doc2vec.rst
@@ -0,0 +1,9 @@
+:mod:`models.deprecated.doc2vec` -- Deep learning with paragraph2vec
+====================================================================
+
+.. automodule:: gensim.models.deprecated.doc2vec
+    :synopsis: Deep learning with doc2vec
+    :members:
+    :inherited-members:
+    :undoc-members:
+    :show-inheritance:
10 changes: 10 additions & 0 deletions docs/src/models/deprecated/fasttext.rst
@@ -0,0 +1,10 @@
+:mod:`models.deprecated.fasttext` -- FastText model
+===================================================
+
+.. automodule:: gensim.models.deprecated.fasttext
+    :synopsis: FastText model
+    :members:
+    :inherited-members:
+    :special-members: __getitem__
+    :undoc-members:
+    :show-inheritance:
10 changes: 10 additions & 0 deletions docs/src/models/deprecated/fasttext_wrapper.rst
@@ -0,0 +1,10 @@
+:mod:`models.deprecated.fasttext_wrapper` -- Wrapper for Facebook implementation of FastText model
+==================================================================================================
+
+.. automodule:: gensim.models.deprecated.fasttext_wrapper
+    :synopsis: FastText model
+    :members:
+    :inherited-members:
+    :special-members: __getitem__
+    :undoc-members:
+    :show-inheritance:
9 changes: 9 additions & 0 deletions docs/src/models/deprecated/keyedvectors.rst
@@ -0,0 +1,9 @@
+:mod:`models.deprecated.keyedvectors` -- Store and query word vectors
+=====================================================================
+
+.. automodule:: gensim.models.deprecated.keyedvectors
+    :synopsis: Store and query word vectors
+    :members:
+    :inherited-members:
+    :undoc-members:
+    :show-inheritance:
9 changes: 9 additions & 0 deletions docs/src/models/deprecated/word2vec.rst
@@ -0,0 +1,9 @@
+:mod:`models.deprecated.word2vec` -- Deep learning with word2vec
+================================================================
+
+.. automodule:: gensim.models.deprecated.word2vec
+    :synopsis: Deep learning with word2vec
+    :members:
+    :inherited-members:
+    :undoc-members:
+    :show-inheritance:
6 changes: 3 additions & 3 deletions docs/src/models/wrappers/fasttext.rst
@@ -1,8 +1,8 @@
-:mod:`models.wrappers.fasttext` -- FastText Word Embeddings
-===========================================================
+:mod:`models.wrappers.fasttext` -- Wrapper for FastText implementation from Facebook
+====================================================================================
 
 .. automodule:: gensim.models.wrappers.fasttext
-    :synopsis: FastText Embeddings
+    :synopsis: FastText
     :members:
     :inherited-members:
     :undoc-members:
1 change: 1 addition & 0 deletions gensim/models/__init__.py
@@ -23,6 +23,7 @@
 from .translation_matrix import TranslationMatrix, BackMappingTranslationMatrix # noqa:F401
 
 from . import wrappers # noqa:F401
+from . import deprecated # noqa:F401
 
 from gensim import interfaces, utils

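The one added line in `gensim/models/__init__.py` matters because importing a package does not automatically bind its subpackages as attributes; a subpackage becomes `pkg.sub` only once something imports it. The self-contained sketch below builds two throwaway packages (hypothetical names `demo_models_with` and `demo_models_without`) in a temporary directory to show the difference. It illustrates standard Python import behaviour and does not touch gensim.

```python
import importlib
import os
import sys
import tempfile


def build_package(root, name, import_subpackage):
    """Write a tiny package `name` with a `deprecated` subpackage under `root`."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(os.path.join(pkg_dir, "deprecated"))
    init_body = "from . import deprecated  # noqa:F401\n" if import_subpackage else ""
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write(init_body)
    with open(os.path.join(pkg_dir, "deprecated", "__init__.py"), "w") as fh:
        fh.write("OLD_API = True\n")


def subpackage_visible(import_subpackage):
    """Import a fresh package and report whether `pkg.deprecated` is bound after import."""
    name = "demo_models_with" if import_subpackage else "demo_models_without"
    with tempfile.TemporaryDirectory() as root:
        build_package(root, name, import_subpackage)
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            pkg = importlib.import_module(name)
            return hasattr(pkg, "deprecated")
        finally:
            sys.path.remove(root)
            # Drop the demo modules so repeated runs start from a clean state.
            for mod in [m for m in sys.modules if m == name or m.startswith(name + ".")]:
                del sys.modules[mod]
```

With the explicit `from . import deprecated` in `__init__.py`, the subpackage is reachable as an attribute right after `import` of the parent package; without it, callers would need their own `import pkg.deprecated` first.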