Upcoming events
We are organizing a sprint before the PyconFR 2012 conference.
- Nelle Varoquaux - Isotonic regression
- Olivier Grisel (working with Gaël Varoquaux on joblib parallelism)
- Alexandre Gramfort
- Fabian Pedregosa (Things I could work on: Implement ranking algorithms (RankSVM, IntervalRank), help with the isotonic regression and group lasso pull request)
- Bertrand Thirion
- Gaël Varoquaux (working with Olivier Grisel on joblib parallelism)
- Alexandre Abraham
- Virgile Fritsch
- Nicolas Le Roux (providing machine learning expertise for RBM and DBN coding)
La Villette, Paris, the 13th & 14th of September, from 10:00 until 18:00. The sprint will take place in the 'Carrefour Numerique', floor -1 of the 'cité des sciences': http://www.pycon.fr/2012/venue/
Top priorities are: merging pull requests, fixing easyfix issues, and improving documentation consistency.
In addition to the tasks listed below, it is useful to consider any issue in this list: https://github.com/scikit-learn/scikit-learn/issues
- Improve test coverage: run 'make test-coverage' after installing the coverage module, find low-hanging fruit to improve coverage, and add tests. Try to test the logic, and not simply aim to increase the number of lines covered.
- Finish estimator summary PR: https://github.com/scikit-learn/scikit-learn/pull/804
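As a small illustration of "test the logic, not just the lines" from the coverage item above, here is a hypothetical `clip` function: a single happy-path test can touch most lines while leaving the boundary behaviour unchecked.

```python
# Hypothetical example: line coverage alone does not validate the logic.
def clip(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

# This one test executes the comparisons but only exercises the fall-through:
assert clip(5, 0, 10) == 5

# Better: hit each branch and the boundaries explicitly.
assert clip(-1, 0, 10) == 0    # lower branch
assert clip(11, 0, 10) == 10   # upper branch
assert clip(0, 0, 10) == 0     # boundary value, must not be clipped
assert clip(10, 0, 10) == 10   # boundary value, must not be clipped
```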
Improving and merging existing pull requests is the number one priority: https://github.com/scikit-learn/scikit-learn/pulls
There is a lot of very good code lying there; it often just needs a small amount of polishing.
- Affinity propagation using sparse matrices: the affinity propagation algorithm (scikits.learn.cluster.affinity_propagation_) should be able to work on sparse input affinity matrices without converting them to dense. A good implementation should make this efficient on very large data.
- Improve the documentation: if you understand some aspects of machine learning, you can help make the scikit rock without writing a line of code: http://scikit-learn.org/dev/developers/index.html#documentation. See also the documentation-related issues in the issue tracker.
- Text feature extraction (refactoring / API simplification) + hashing vectorizer: Olivier Grisel
- Nearest Neighbors Classification/Regression: allowing more flexible Bayesian priors (currently only a flat prior is used); implementing alternative distance metrics: Jake Vanderplas
- Group Lasso: Continue with pull request https://github.com/scikit-learn/scikit-learn/pull/947. Participants: @fabianp
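To make the "alternative distance metrics" item above concrete, here is a minimal brute-force k-NN classifier with a pluggable metric (Manhattan as the alternative to Euclidean). The function name and signature are illustrative, not the scikit-learn API.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3, metric="manhattan"):
    """Brute-force k-nearest-neighbors classification with a choice of metric."""
    diff = X_test[:, None, :] - X_train[None, :, :]
    if metric == "manhattan":
        d = np.abs(diff).sum(axis=-1)          # L1 distance
    else:
        d = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distance
    idx = np.argsort(d, axis=1)[:, :k]         # k nearest training points
    votes = y_train[idx]                       # their labels, one row per query
    # Majority vote per query point (labels must be non-negative ints)
    return np.array([np.bincount(v).argmax() for v in votes])

X_train = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0., 0.5], [5., 5.5]])
pred = knn_predict(X_train, y_train, X_test, k=3)
```

Swapping the metric only changes how `d` is computed; the neighbor search and voting are untouched, which is why a metric parameter is a natural extension point.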
K-means improvements. Participants: @mblondel
- Code clean up
- Speed improvements: avoid reallocating clusters, track clusters that did not change, use the triangle inequality to skip distance computations
- L1 distance: use the L1 distance in the E-step and the median (instead of the mean) in the M-step
- Fuzzy K-means: k-means with fuzzy cluster membership (not the same as GMM)
- Move argmin and average operators to pairwise module (for L1/L2)
- Support chunk size argument in argmin operator
- Merge @ogrisel's branch
- Add a score function (opposite of the kmeans objective)
- Sparse matrices
- fit_transform
- more output options in transform (hard, soft, dense)
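Several of the K-means items above (L1 distance in the E-step, median in the M-step, a chunked argmin operator) can be sketched together as a small K-medians, using only numpy. All names (`pairwise_argmin_l1`, `k_medians`, `chunk_size`) are illustrative, not the proposed scikit-learn API.

```python
import numpy as np

def pairwise_argmin_l1(X, centers, chunk_size=64):
    """Index of the nearest center under the L1 distance.

    X is processed in chunks of `chunk_size` rows, so the temporary
    (chunk, n_centers, n_features) array never covers all samples at once.
    """
    out = np.empty(len(X), dtype=int)
    for start in range(0, len(X), chunk_size):
        chunk = X[start:start + chunk_size]
        d = np.abs(chunk[:, None, :] - centers[None, :, :]).sum(axis=-1)
        out[start:start + chunk_size] = d.argmin(axis=1)
    return out

def k_medians(X, n_clusters, init, n_iter=20, chunk_size=64):
    """K-medians: L1 assignment (E-step), per-feature median update (M-step)."""
    centers = np.asarray(init, dtype=float).copy()
    labels = pairwise_argmin_l1(X, centers, chunk_size)
    for _ in range(n_iter):
        for j in range(n_clusters):
            mask = labels == j
            if mask.any():
                centers[j] = np.median(X[mask], axis=0)  # median, not mean
        labels = pairwise_argmin_l1(X, centers, chunk_size)
    return centers, labels

X = np.array([[0., 0.], [1., 0.], [0., 1.],
              [10., 10.], [9., 10.], [10., 9.]])
centers, labels = k_medians(X, 2, init=[[2., 2.], [8., 8.]], chunk_size=2)
```

The chunked argmin is exactly the kind of operator that could live in the pairwise module and be shared with plain K-means (L2) by swapping the distance computation.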
Naive Bayes. Participants: @larsmans
- EM algorithm for Naive Bayes (there is a lingering pull request)
- Fix utility code to handle partially labeled data sets
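The two items above combine naturally: EM for Naive Bayes is a way to exploit partially labeled data. Here is a rough numpy sketch of a Bernoulli variant; `nb_em`, the `y == -1` convention for unlabeled samples, and all parameter names are hypothetical, not the pending pull request's API.

```python
import numpy as np

def nb_em(X, y, n_classes=2, n_iter=10, alpha=1.0):
    """Semi-supervised Bernoulli Naive Bayes trained by EM.

    X: binary feature matrix (n_samples, n_features).
    y: class labels, with -1 marking unlabeled samples (hypothetical convention).
    """
    n, d = X.shape
    labeled = y >= 0
    R = np.zeros((n, n_classes))           # class responsibilities
    R[labeled, y[labeled]] = 1.0           # labeled points are fixed
    R[~labeled] = 1.0 / n_classes          # unlabeled start as uniform
    for _ in range(n_iter):
        # M-step: soft counts with Laplace smoothing
        Nk = R.sum(axis=0)
        prior = (Nk + alpha) / (Nk.sum() + n_classes * alpha)
        theta = (R.T @ X + alpha) / (Nk[:, None] + 2 * alpha)
        # E-step: class posteriors, updated for unlabeled points only
        log_lik = (X @ np.log(theta).T
                   + (1 - X) @ np.log(1 - theta).T
                   + np.log(prior))
        post = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        R[~labeled] = post[~labeled]
    return prior, theta, R

X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [0, 1]], dtype=float)
y = np.array([0, 0, 1, 1, -1, -1])        # last two samples are unlabeled
prior, theta, R = nb_em(X, y)
```

Keeping the labeled responsibilities clamped to their known class is what distinguishes this from fully unsupervised mixture fitting.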
- Patch liblinear to support warm restarts, enabling LogisticRegressionCV.
- Comment (by Fabian): I tried this; take a look here: liblinear fork
- Locality Sensitive Hashing, talk to Brian Holt
- Fused Lasso
- Group Lasso, talk to Alex Gramfort (by email), or Fabian Pedregosa
- Manifold learning: improve MDS (talk to Nelle Varoquaux), t-SNE (talk to DWF)
- Sparse matrix support in dictionary learning module