Author-topic model #893

Merged 103 commits on Jan 17, 2017

Commits on Sep 25, 2016

  1. 2e8f3cb
  2. Fixed some errors.
    olavurmortensen committed Sep 25, 2016
    a21059e

Commits on Sep 27, 2016

  1. a9bddaa

Commits on Sep 29, 2016

  1. Using max change instead of mean change criterion. Computing a different likelihood measure. OnlineAtVb now extends (inherits) LdaModel. Other minor changes.
    olavurmortensen committed Sep 29, 2016
    7ea76f2
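
The switch from a mean-change to a max-change convergence criterion (commit 7ea76f2) can be illustrated with a small sketch. This is not the PR's code: the function names and the threshold are hypothetical, and plain Python lists stand in for the variational parameters. It only shows why the two criteria can disagree.

```python
def mean_change(old, new):
    """Average absolute change across all parameters (hypothetical helper)."""
    return sum(abs(n - o) for o, n in zip(old, new)) / len(old)


def max_change(old, new):
    """Largest absolute change of any single parameter (hypothetical helper)."""
    return max(abs(n - o) for o, n in zip(old, new))


# One parameter is still moving a lot while the others are stable.
old = [0.0, 0.0, 0.0, 0.0]
new = [0.0, 0.0, 0.0, 0.4]
threshold = 0.2  # illustrative value only

converged_by_mean = mean_change(old, new) < threshold
converged_by_max = max_change(old, new) < threshold
```

Under these assumptions the mean criterion reports convergence while the max criterion does not, because averaging hides the single parameter that is still changing.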

Commits on Sep 30, 2016

  1. 839a8b3
  2. bd13c60

Commits on Oct 9, 2016

  1. Fixed log normalization. Also changed symmetric initialization of hyperparameters. Updated notebook accordingly.
    olavurmortensen committed Oct 9, 2016
    ebc808c
  2. 16b26f7
  3. 10d2b36

Commits on Oct 10, 2016

  1. c94f516
  2. a1d758f
  3. 46cc8bf

Commits on Oct 11, 2016

  1. 3e53655

Commits on Oct 12, 2016

  1. Changed the way the data structure is prepared and how the model accepts it. Still work to be done in that area.
    olavurmortensen committed Oct 12, 2016
    09666c4
  2. a562fca
  3. 0de43a5

Commits on Oct 13, 2016

  1. Ran some very successful experiments on 286 documents. Offline algorithm works. Updated notebook.
    olavurmortensen committed Oct 13, 2016
    a892564

Commits on Oct 14, 2016

  1. Changed the online algorithm according to all the changes that have happened to the offline lately.
    olavurmortensen committed Oct 14, 2016
    388a5e9
  2. 2b2a896
  3. Fixed lambda update, multiplication by size of corpus was missing. Removed author_prior_prob from mu update.
    olavurmortensen committed Oct 14, 2016
    3756435
  4. Added a loop for passing over entire corpus. Discarded use of log_normalize. Various other changes.
    olavurmortensen committed Oct 14, 2016
    ed3416d
  5. 994f212
  6. Updated notebook.
    olavurmortensen committed Oct 14, 2016
    956fbd5
  7. Computing rho in a different way. Added the possibility to evaluate only occasionally. Updated notebook.
    olavurmortensen committed Oct 14, 2016
    40bbabf

Commits on Oct 17, 2016

  1. Implemented hyperparam MLE for eta and alpha in offline algo. Removed use of log_normalize in offline algo. Update notebook.
    olavurmortensen committed Oct 17, 2016
    ed96b23
  2. Made it possible to sample a subset of documents in lambda update to speed up large experiments. Made it possible to initialize the model with LDA topics (lambda).
    olavurmortensen committed Oct 17, 2016
    a225399

Commits on Oct 18, 2016

  1. Now, if LDA topics are supplied lambda is not estimated at all. Added a small number to mu and phi normalization term to avoid divide by zero. Made some comments (NOTEs and TODOs) about numerical stability.
    olavurmortensen committed Oct 18, 2016
    938daff
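
Commit 938daff mentions adding a small number to a normalization term to avoid division by zero. A minimal sketch of that idea follows. The function name `normalize` and the value of `eps` are assumptions for illustration, not taken from the PR.

```python
def normalize(weights, eps=1e-100):
    """Scale weights to sum to (approximately) one.

    `eps` is a hypothetical small constant added to the denominator so that
    an all-zero input yields zeros instead of raising ZeroDivisionError.
    """
    total = sum(weights) + eps
    return [w / total for w in weights]


all_zero = normalize([0.0, 0.0])    # no exception, stays all zeros
regular = normalize([1.0, 3.0])     # eps is negligible next to the real sum
```

The trade-off is that the result no longer sums to exactly one when the true sum is tiny, which is one reason such a constant is usually chosen far below the scale of the data.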

Commits on Oct 19, 2016

  1. Updating notebook.
    olavurmortensen committed Oct 19, 2016
    b43d344

Commits on Oct 20, 2016

  1. 1dc7e6a
  2. 910c626
  3. 7dbd01f

Commits on Oct 21, 2016

  1. 9a04533
  2. b450609

Commits on Nov 4, 2016

  1. d3ca917

Commits on Nov 7, 2016

  1. Making sure that the model is evaluated after the last iteration, if eval_every is different from 0. Various comment changes. Updated notebook.
    olavurmortensen committed Nov 7, 2016
    ba5ba63

Commits on Nov 8, 2016

  1. Updated notebook.
    olavurmortensen committed Nov 8, 2016
    afa747d
  2. Fixed mistake in interpolating gamma. Moved lambda update outside of 'iterations' loop. Updated notebook.
    olavurmortensen committed Nov 8, 2016
    693b70b
  3. Working on an algorithm that tries to process each 'disjoint' set of authors independently in a mini-batch sort of way.
    olavurmortensen committed Nov 8, 2016
    7783261

Commits on Nov 9, 2016

  1. 868b174

Commits on Nov 11, 2016

  1. Only updating the necessary expected log theta. Changed the name of OnlineAtVb2 to DisjointAtVb. Updated notebook. Other minor changes.
    olavurmortensen committed Nov 11, 2016
    1cfd00f

Commits on Nov 13, 2016

  1. Implemented a new algorithm. It is 5 times faster, more memory efficient, and even gives better results.
    olavurmortensen committed Nov 13, 2016
    edd5025

Commits on Nov 15, 2016

  1. Moved all algorithms except the new online one to a 'temp' folder. Vectorization is an option, so I can test it (speed up etc.). Updated notebook.
    olavurmortensen committed Nov 15, 2016
    fafc20a
  2. Changed the name of the main algorithm (and file). Made a new notebook for old tests, removed all old tests from main notebook. Removed references to old code in __init__.py.
    olavurmortensen committed Nov 15, 2016
    32e750d

Commits on Nov 16, 2016

  1. Cleaning up code. Removed or changed a lot of comments. Removed option of computing log probabilities, although the method still exists. Added a method of computing all the terms of the bound at once.
    olavurmortensen committed Nov 16, 2016
    4286e90

Commits on Nov 21, 2016

  1. Was computing the norm of phi incorrectly, fixed that, speed-up not as large as first thought. Made a method for computing phinorm. Updating lambda in a different way. Implemented a numerically stable softmax (phi is a softmax).
    olavurmortensen committed Nov 21, 2016
    12f231c
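
Commit 12f231c refers to a numerically stable softmax. The PR's own implementation is not shown on this page. The sketch below uses the standard max-subtraction trick in plain Python, and the name `stable_softmax` is hypothetical.

```python
import math


def stable_softmax(scores):
    """Softmax that subtracts the maximum score before exponentiating.

    Shifting by the max leaves the result unchanged mathematically but keeps
    every exponent <= 0, so math.exp cannot overflow on large inputs.
    """
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# A naive math.exp(1000.0) would raise OverflowError; the shifted form does not.
probs = stable_softmax([1000.0, 1000.0])
```

Because the largest shifted score is always zero, the denominator is at least one, so this form also avoids a zero normalizer.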

Commits on Nov 22, 2016

  1. Working on numerically stable phi update and bound computation. Is not converging the same way, so it is an option for now.
    olavurmortensen committed Nov 22, 2016
    76764ff
  2. 4cb3ee9
  3. 7c14f61
  4. eade3e1
  5. Updated notebook.
    olavurmortensen committed Nov 22, 2016
    6fe4c0e

Commits on Nov 23, 2016

  1. In mini-batch algo, only the terms seen in the current chunk are updated, otherwise problems occur.
    olavurmortensen committed Nov 23, 2016
    1975321
  2. e4a0e4b

Commits on Nov 28, 2016

  1. 526a3bb
  2. Computing the bound more efficiently (much faster now). Now not passing entire expElogtheta and expElogbeta matrices to compute_phinorm.
    olavurmortensen committed Nov 28, 2016
    5ee9a95
  3. e0d7367
  4. 2f621e2

Commits on Nov 30, 2016

  1. df11bb4
  2. 054d37c
  3. 9d9da44

Commits on Dec 1, 2016

  1. The refactored code now runs, converges almost exactly as the old code, produces some decent results, but somehow the results are slightly different. Made some slight changes to LdaModel to make the author-topic model work.
    olavurmortensen committed Dec 1, 2016
    336ff92

Commits on Dec 4, 2016

  1. Refactoring. Various docstring and commenting. Made methods for constructing author2doc and doc2author so that the user may do this at will. Assuming that input to bound is seen data, as the converse may be problematic.
    olavurmortensen committed Dec 4, 2016
    e5e7722
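
Commit e5e7722 mentions methods for constructing author2doc and doc2author. As a rough illustration of how the two mappings relate, here is a sketch that inverts one into the other. It assumes both are plain dictionaries (author name to list of document IDs, and document ID to list of author names). The helper names are hypothetical and are not the PR's method signatures.

```python
def invert_author2doc(author2doc):
    """Build doc -> authors from author -> docs (hypothetical helper)."""
    doc2author = {}
    for author, doc_ids in author2doc.items():
        for doc_id in doc_ids:
            doc2author.setdefault(doc_id, []).append(author)
    return doc2author


def invert_doc2author(doc2author):
    """Build author -> docs from doc -> authors (hypothetical helper)."""
    author2doc = {}
    for doc_id, authors in doc2author.items():
        for author in authors:
            author2doc.setdefault(author, []).append(doc_id)
    return author2doc


# Made-up example data: document 1 has two authors.
author2doc = {"alice": [0, 1], "bob": [1]}
doc2author = invert_author2doc(author2doc)
round_trip = invert_doc2author(doc2author)
```

Either mapping carries the same information, so a user only needs to supply one and can derive the other.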

Commits on Dec 5, 2016

  1. New refactored code now in atmodel.py. Old code is in atmodelold.py, until I'm confident I don't need it anymore. Working on a new version in atmodel2.py, where I will be looping over authors rather than documents in the update.
    olavurmortensen committed Dec 5, 2016
    861e81a

Commits on Dec 7, 2016

  1. Implemented 'continued training' (call update multiple times) and __getitem__ in refactored code (atmodel.py).
    olavurmortensen committed Dec 7, 2016
    e911aed

Commits on Dec 8, 2016

  1. A lot of changes. Most notably, added docstrings, and made it possible to evaluate test set (held-out data).
    olavurmortensen committed Dec 8, 2016
    ff7f8e6

Commits on Dec 9, 2016

  1. Added unit tests. Basically a retrofit of LDA test; some new tests, some altered tests, some tests removed, others unchanged.
    olavurmortensen committed Dec 9, 2016
    bdac93a
  2. 9429c0a
  3. e0dc2d9

Commits on Dec 11, 2016

  1. Made it possible to use serialized corpora (MmCorpus), and made unit tests of that functionality. There are some caveats to the use of serialized corpora in the author-topic model. Updated docstring.
    olavurmortensen committed Dec 11, 2016
    aabc0f4
  2. Removed code in unit tests that silence logging (useful when doing local testing), as it broke some other tests.
    olavurmortensen committed Dec 11, 2016
    e526cbc

Commits on Dec 12, 2016

  1. 8cb404f
  2. 6cf4e75
  3. get_author_topics now takes author name instead of integer ID; changed unit tests accordingly. Overwrote get_document_topics to raise a NotImplementedError in the author-topic model.
    olavurmortensen committed Dec 12, 2016
    94956fa

Commits on Dec 13, 2016

  1. ebd9679

Commits on Dec 28, 2016

  1. bafb5ef
  2. Added a new notebook where a stackexchange dataset is used. Started writing a tutorial. Updated old notebook as well.
    olavurmortensen committed Dec 28, 2016
    8cd90cf
  3. ac9ecd4

Commits on Jan 5, 2017

  1. aa08b49
  2. 9ce1fd5
  3. Two algorithms in 'temp' used to test the difference between blocking and standard variational Bayes AT model.
    olavurmortensen committed Jan 5, 2017
    f1f9f50
  4. Added the deepcopy again. Without it, the program can fail and the system crashes (looking into it).
    olavurmortensen committed Jan 5, 2017
    cad8f26

Commits on Jan 10, 2017

  1. 6caefd7
  2. Comments and docstrings. Responding to comments from Lev, and working a bit with Sphinx documentation.
    olavurmortensen committed Jan 10, 2017
    48b6c1a
  3. Added the author-topic model to the API reference. Also slight change to author-topic model rst file.
    olavurmortensen committed Jan 10, 2017
    d03e020
  4. 7ac77b7
  5. Removed test for single author in persistency test (test is simplified with this and the previous commit 7ac77b7).
    olavurmortensen committed Jan 10, 2017
    cab716d
  6. Removed save and load methods, using LdaModel's methods directly works. Also fixed the 'endclass' comment.
    olavurmortensen committed Jan 10, 2017
    be7bddf

Commits on Jan 12, 2017

  1. ffadaf1
  2. e218883
  3. 7d2994f

Commits on Jan 13, 2017

  1. Modified the bound method; it was somewhat confusing, and there were even some mistakes. Cleaned up the code (atmodel.py and tests) w.r.t. PEP8 (disregarding E501, E731, E12 and W503) and removing vertical indent.
    olavurmortensen committed Jan 13, 2017
    616a965
  2. 7f98e3a
  3. ddfc8f7

Commits on Jan 14, 2017

  1. Updated algorithm and tests w.r.t. comments from Lev. Other changes as well: removed do_mstep method (using LdaModel's version directly), using minimum_probability in get_author_topics, removed statement (in log) that said perplexity is evaluated on held-out data.
    olavurmortensen committed Jan 14, 2017
    7d03608
  2. 661e7e5
  3. Updated notebook.
    olavurmortensen committed Jan 14, 2017
    85123c0
  4. Updated tutorial.
    olavurmortensen committed Jan 14, 2017
    91675a5
  5. 6d961a5

Commits on Jan 16, 2017

  1. 13fa9ee
  2. 018896c
  3. a0a9832
  4. 5d6944a
  5. 8e56e9e

Commits on Jan 17, 2017

  1. aecaecb