Skip to content

Commit

Permalink
Merge branch 'develop' into fix_1589
Browse files Browse the repository at this point in the history
  • Loading branch information
probinso committed Jul 18, 2018
2 parents 8e6427e + 3a83960 commit ef5fe5d
Show file tree
Hide file tree
Showing 512 changed files with 159,808 additions and 18,159 deletions.
44 changes: 44 additions & 0 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
version: 2
jobs:
build:
docker:
- image: circleci/python:2.7

working_directory: ~/gensim

steps:
- checkout

- restore_cache:
key: pip-cache

- run:
name: Apt install (for latex render)
command: |
sudo apt-get -yq update
sudo apt-get -yq remove texlive-binaries --purge
sudo apt-get -yq --no-install-suggests --no-install-recommends --force-yes install dvipng texlive-latex-base texlive-latex-extra texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended latexmk
- run:
name: Basic installation (tox)
command: |
python -m virtualenv venv
source venv/bin/activate
pip install tox
- run:
name: Build documentation
command: |
source venv/bin/activate
tox -e docs -vv
- store_artifacts:
path: docs/src/_build
destination: documentation

- save_cache:
key: pip-cache
paths:
- "~/.cache/pip"
- "~/.ccache"
- "~/.pip-cache"
5 changes: 4 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,8 @@ Thumbs.db

# Other #
#########
.tox/
.cache/
.project
.pydevproject
.ropeproject
Expand Down Expand Up @@ -70,4 +72,5 @@ data
*_out.txt
*.html
*.inv
*.js
*.js
docs/_images/
45 changes: 25 additions & 20 deletions .travis.yml
Original file line number Diff line number Diff line change
@@ -1,23 +1,28 @@
sudo: false

cache:
apt: true
directories:
- $HOME/.cache/pip
- $HOME/.ccache
- $HOME/.pip-cache
dist: trusty
language: python
python:
- "2.7"
- "3.5"
- "3.6"
before_install:
- wget 'http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh' -O miniconda.sh
- chmod +x miniconda.sh
- ./miniconda.sh -b
- export PATH=/home/travis/miniconda2/bin:$PATH
- conda update --yes conda
install:
- conda create --yes -n gensim-test python=$TRAVIS_PYTHON_VERSION pip atlas numpy scipy
- source activate gensim-test
- pip install pyemd
- pip install annoy
- pip install testfixtures
- pip install unittest2
- pip install Morfessor==2.0.2a4
- python setup.py install
script: python setup.py test


matrix:
include:
- python: '2.7'
env: TOXENV="flake8"

- python: '2.7'
env: TOXENV="py27-linux"

- python: '3.5'
env: TOXENV="py35-linux"

- python: '3.6'
env: TOXENV="py36-linux"

install: pip install tox
script: tox -vv
1,124 changes: 1,002 additions & 122 deletions CHANGELOG.md

Large diffs are not rendered by default.

37 changes: 32 additions & 5 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,36 @@
Please see [contribution-guide.org](http://www.contribution-guide.org/) for steps we expect from contributors before submitting an issue or bug report. Be as concrete as possible, include relevant logs, package versions etc.
# How to submit an issue?

Please check the [Gensim FAQ](https://github.com/RaRe-Technologies/gensim/wiki/Recipes-&-FAQ) page before posting.
First, please see [contribution-guide.org](http://www.contribution-guide.org/) for the steps we expect from contributors before submitting an issue or bug report. Be as concrete as possible, include relevant logs, package versions etc.

**The proper place for open-ended questions is the [gensim mailing list](https://groups.google.com/forum/#!forum/gensim).** Github is not the right place for research discussions or feature requests.
Also, please check the [Gensim FAQ](https://github.com/RaRe-Technologies/gensim/wiki/Recipes-&-FAQ) page before posting.

For developers: see our [Developer Page](https://github.com/piskvorky/gensim/wiki/Developer-page#code-style) for details on code style, testing and similar.
**The proper place for open-ended questions is the [Gensim mailing list](https://groups.google.com/forum/#!forum/gensim).** Github is not the right place for research discussions or feature requests.

Thanks!
# How to add a new feature or create a pull request?

1. <a href="https://github.com/RaRe-Technologies/gensim/fork">Fork the Gensim repository</a>
2. Clone your fork: `git clone https://github.com/<YOUR_GITHUB_USERNAME>/gensim.git`
3. Create a new branch based on `develop`: `git checkout -b my-feature develop`
4. Setup your Python enviroment
- Create a new [virtual environment](https://virtualenv.pypa.io/en/stable/): `pip install virtualenv; virtualenv gensim_env` and activate it:
- For linux: `source gensim_env/bin/activate`
- For windows: `gensim_env\Scripts\activate`
- Install Gensim and its test dependencies in [editable mode](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs):
- For linux: `pip install -e .[test]`
- For windows: `pip install -e .[test-win]`
5. Implement your changes
6. Check that everything's OK in your branch:
- Check it for PEP8: `tox -e flake8`
- Build its documentation (works only for MacOS/Linux): `tox -e docs` (documentation stored in `docs/src/_build`)
- Run unit tests: `tox -e py{version}-{os}`, for example `tox -e py27-linux` or `tox -e py36-win` where
- `{version}` is one of `27`, `35`, `36`
- `{os}` is either `win` or `linux`
7. Add files, commit and push: `git add ... ; git commit -m "my commit message"; git push origin my-feature`
8. [Create a PR](https://help.github.com/articles/creating-a-pull-request/) on Github. Write a **clear description** for your PR, including all the context and relevant information, such as:
- The issue that you fixed, e.g. `Fixes #123`
- Motivation: why did you create this PR? What functionality did you set out to improve? What was the problem + an overview of how you fixed it? Whom does it affect and how should people use it?
- Any other useful information: links to other related Github or mailing list issues and discussions, benchmark graphs, academic papers…

P.S. for developers: see our [Developer Page](https://github.com/piskvorky/gensim/wiki/Developer-page#code-style) for details on the Gensim code style, CI, testing and similar.

**Thanks and let's improve the open source world together!**
48 changes: 48 additions & 0 deletions ISSUE_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
<!--
If your issue is a usage or a general question, please submit it here instead:
- Mailing List: https://groups.google.com/forum/#!forum/gensim
For more information, see Recipes&FAQ: https://github.com/RaRe-Technologies/gensim/wiki/Recipes-&-FAQ
-->

<!-- Instructions For Filing a Bug: https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md -->

#### Description
TODO: change commented example
<!-- Example: Vocabulary size is not what I expected when training Word2Vec. -->

#### Steps/Code/Corpus to Reproduce
<!--
Example:
```
from gensim.models import word2vec
sentences = ['human', 'machine']
model = word2vec.Word2Vec(sentences)
print(model.syn0.shape)
```
If the code is too long, feel free to put it in a public gist and link
it in the issue: https://gist.github.com
-->

#### Expected Results
<!-- Example: Expected shape of (100,2).-->

#### Actual Results
<!-- Example: Actual shape of (100,5).
Please paste or specifically describe the actual output or traceback. -->

#### Versions
<!--
Please run the following snippet and paste the output below.
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import gensim; print("gensim", gensim.__version__)
from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
-->


<!-- Thanks for contributing! -->

12 changes: 8 additions & 4 deletions MANIFEST.in
Original file line number Diff line number Diff line change
@@ -1,8 +1,4 @@
recursive-include docs *
recursive-include gensim/test/test_data *
recursive-include . *.sh
prune docs/src*
prune docs/notebooks/datasets
include README.md
include CHANGELOG.md
include COPYING
Expand All @@ -14,3 +10,11 @@ include gensim/models/word2vec_inner.pyx
include gensim/models/word2vec_inner.pxd
include gensim/models/doc2vec_inner.c
include gensim/models/doc2vec_inner.pyx
include gensim/models/fasttext_inner.c
include gensim/models/fasttext_inner.pyx
include gensim/models/_utils_any2vec.c
include gensim/models/_utils_any2vec.pyx
include gensim/corpora/_mmreader.c
include gensim/corpora/_mmreader.pyx
include gensim/_matutils.c
include gensim/_matutils.pyx
44 changes: 26 additions & 18 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
gensim – Topic Modelling in Python
==================================

[![Build Status](https://travis-ci.org/RaRe-Technologies/gensim.svg?branch=develop)](https://travis-ci.org/RaRe-Technologies/gensim)[![GitHub release](https://img.shields.io/github/release/rare-technologies/gensim.svg?maxAge=2592000)]()[![Wheel](https://img.shields.io/pypi/wheel/gensim.svg)](https://pypi.python.org/pypi/gensim)
[![Mailing List](https://img.shields.io/badge/-Mailing%20List-lightgrey.svg)](https://groups.google.com/forum/#!forum/gensim)
[![Build Status](https://travis-ci.org/RaRe-Technologies/gensim.svg?branch=develop)](https://travis-ci.org/RaRe-Technologies/gensim)
[![GitHub release](https://img.shields.io/github/release/rare-technologies/gensim.svg?maxAge=3600)](https://github.com/RaRe-Technologies/gensim/releases)
[![Conda-forge Build](https://anaconda.org/conda-forge/gensim/badges/version.svg)](https://anaconda.org/conda-forge/gensim)
[![Wheel](https://img.shields.io/pypi/wheel/gensim.svg)](https://pypi.python.org/pypi/gensim)
[![DOI](https://zenodo.org/badge/DOI/10.13140/2.1.2393.1847.svg)](https://doi.org/10.13140/2.1.2393.1847)
[![Mailing List](https://img.shields.io/badge/-Mailing%20List-brightgreen.svg)](https://groups.google.com/forum/#!forum/gensim)
[![Gitter](https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-09a3d5.svg)](https://gitter.im/RaRe-Technologies/gensim)
[![Follow](https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow)](https://twitter.com/gensim_py)

[![Follow](https://img.shields.io/twitter/follow/gensim_py.svg?style=social&label=Follow)](https://twitter.com/gensim_py)



Expand Down Expand Up @@ -120,20 +123,25 @@ Adopters

| Name | Logo | URL | Description |
|----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| RaRe Technologies | <img src="http://rare-technologies.com/wp-content/uploads/2016/02/rare_image_only.png" width="100"> | [rare-technologies.com](http://rare-technologies.com) | Machine learning & NLP consulting and training. Creators and maintainers of Gensim. |
| Talentpair | ![Talentpair](https://avatars3.githubusercontent.com/u/8418395?v=3&s=100) | [talentpair.com](http://talentpair.com) | Data science driving high-touch recruiting |
| Tailwind | <img src="http://blog.tailwindapp.com/wp-content/uploads/2013/10/Tailwind-Square-Logo-Blue-White-300x300.png" width="100"> | [Tailwindapp.com](https://www.tailwindapp.com/)| Post interesting and relevant content to Pinterest |
| Issuu | <img src="https://static.isu.pub/fe/issuu-brandpages/s3/155/press/assets/brand_package_zip/issuu%20logos/png/issuu-logo-stacked-colour.png" width="100"> | [Issuu.com](https://issuu.com/)| Gensim’s LDA module lies at the very core of the analysis we perform on each uploaded publication to figure out what it’s all about.
| Sports Authority | <img src="https://upload.wikimedia.org/wikipedia/commons/6/6c/Sports_Authority_logo2011.jpg" width="100"> | [sportsauthority.com](https://en.wikipedia.org/wiki/Sports_Authority)| Text mining of customer surveys and social media sources |
| Search Metrics | <img src="http://www.searchmetrics.com/wp-content/uploads/Logo_searchmetrics_Webversion.png" width="100"> | [searchmetrics.com](http://www.searchmetrics.com/)| Gensim word2vec used for entity disambiguation in Search Engine Optimisation
| Cisco Security | <img src="https://supportforums.cisco.com/sites/default/files/legacy/1/6/1/2161-CiscoSystems.gif" width="100"> | [cisco.com](http://www.cisco.com/c/en/us/products/security/index.html)| Large-scale fraud detection
| 12K Research | <img src="https://static1.squarespace.com/static/548d6f40e4b0fb61d7b8f40b/t/57310800b09f95e472ba5dd1/1462831123953/12k-logo.png" width="100"> | [12k.co](https://12k.co/)| Document similarity analysis on media articles
| National Institutes of Health | <img src="https://www.nih.gov/sites/default/files/styles/featured_media_breakpoint-large/public/about-nih/2012-logo.png" width="100"> | [github/NIHOPA](https://github.com/NIHOPA/pipeline_word2vec)| Processing grants and publications with word2vec
| Codeq LLC | <img src="https://codeq.com/wp-content/themes/codeq/assets/img/logo.svg" width="100"> | [codeq.com](https://codeq.com)| Document classification with word2vec
| Mass Cognition | <img src="http://static1.squarespace.com/static/5637b16ee4b050255657c537/t/56a683bf9cadb6bf86a0ea13/1461016648294/?format=1500w" width="100"> | [masscognition.com](http://www.masscognition.com/) | Topic analysis service for consumer text data and general text data |
| Stillwater Supercomputing | <img src="http://www.stillwater-sc.com/img/stillwater-logo.png" width="100"> | [stillwater-sc.com](http://www.stillwater-sc.com/) | Document comprehension and association with word2vec |
| Channel 4 | <img src="http://www.channel4.com/static/info/images/lib/c4logo_2015_info_corporate.jpg" width="100"> | [channel4.com](http://www.channel4.com/) | Recommendation engine |
| Amazon | <img src="http://g-ec2.images-amazon.com/images/G/01/social/api-share/amazon_logo_500500._V323939215_.png" width="100"> | [amazon.com](http://www.amazon.com/) | Document similarity|
| RaRe Technologies | ![rare](docs/src/readme_images/rare.png) | [rare-technologies.com](http://rare-technologies.com) | Machine learning & NLP consulting and training. Creators and maintainers of Gensim. |
| Mindseye | ![mindseye](docs/src/readme_images/mindseye.png) | [mindseye.com](http://www.mindseyesolutions.com/) | Similarities in legal documents |
| Talentpair | ![talent-pair](docs/src/readme_images/talent-pair.png) | [talentpair.com](http://talentpair.com) | Data science driving high-touch recruiting |
| Tailwind | ![tailwind](docs/src/readme_images/tailwind.png)| [Tailwindapp.com](https://www.tailwindapp.com/)| Post interesting and relevant content to Pinterest |
| Issuu | ![issuu](docs/src/readme_images/issuu.png) | [Issuu.com](https://issuu.com/)| Gensim’s LDA module lies at the very core of the analysis we perform on each uploaded publication to figure out what it’s all about.
| Sports Authority | ![sports-authority](docs/src/readme_images/sports-authority.png) | [sportsauthority.com](https://en.wikipedia.org/wiki/Sports_Authority)| Text mining of customer surveys and social media sources |
| Search Metrics | ![search-metrics](docs/src/readme_images/search-metrics.png) | [searchmetrics.com](http://www.searchmetrics.com/)| Gensim word2vec used for entity disambiguation in Search Engine Optimisation
| Cisco Security | ![cisco](docs/src/readme_images/cisco.png) | [cisco.com](http://www.cisco.com/c/en/us/products/security/index.html)| Large-scale fraud detection
| 12K Research | ![12k](docs/src/readme_images/12k.png)| [12k.co](https://12k.co/)| Document similarity analysis on media articles
| National Institutes of Health | ![nih](docs/src/readme_images/nih.png) | [github/NIHOPA](https://github.com/NIHOPA/pipeline_word2vec)| Processing grants and publications with word2vec
| Codeq LLC | ![codeq](docs/src/readme_images/codeq.png) | [codeq.com](https://codeq.com)| Document classification with word2vec
| Mass Cognition | ![mass-cognition](docs/src/readme_images/mass-cognition.png) | [masscognition.com](http://www.masscognition.com/) | Topic analysis service for consumer text data and general text data |
| Stillwater Supercomputing | ![stillwater](docs/src/readme_images/stillwater.png) | [stillwater-sc.com](http://www.stillwater-sc.com/) | Document comprehension and association with word2vec |
| Channel 4 | ![channel4](docs/src/readme_images/channel4.png) | [channel4.com](http://www.channel4.com/) | Recommendation engine |
| Amazon | ![amazon](docs/src/readme_images/amazon.png) | [amazon.com](http://www.amazon.com/) | Document similarity|
| SiteGround Hosting | ![siteground](docs/src/readme_images/siteground.png) | [siteground.com](https://www.siteground.com/) | An ensemble search engine which uses different embeddings models and similarities, including word2vec, WMD, and LDA. |
| Juju | ![juju](docs/src/readme_images/juju.png) | [www.juju.com](http://www.juju.com/) | Provide non-obvious related job suggestions. |
| NLPub | ![nlpub](docs/src/readme_images/nlpub.png) | [nlpub.org](https://nlpub.org/) | Distributional semantic models including word2vec. |
|Capital One | ![capitalone](docs/src/readme_images/capitalone.png) | [www.capitalone.com](https://www.capitalone.com/) | Topic modeling for customer complaints exploration. |

-------

Expand Down
Loading

0 comments on commit ef5fe5d

Please sign in to comment.