Upcoming events

Sprint planning

PyconFR - Paris 13th-14th Sep 2012

We are organizing a sprint before the PyconFR 2012 conference.

People & tasks

Nelle Varoquaux - Isotonic regression
Olivier Grisel (working with Gaël Varoquaux on joblib parallelism)
Alexandre Gramfort
Fabian Pedregosa (Things I could work on: Implement ranking algorithms (RankSVM, IntervalRank), help with the isotonic regression and group lasso pull request)
Bertrand Thirion
Gaël Varoquaux (working with Olivier Grisel on joblib parallelism)
Alexandre Abraham
Virgile Fritsch
Nicolas Le Roux (providing machine learning expertise for RBM and DBN coding)

Location

La Villette, Paris, the 13th & 14th of September, from 10:00 until 18:00. The sprint will take place in the 'Carrefour Numerique', floor -1 of the 'cité des sciences': http://www.pycon.fr/2012/venue/

Tasks

Top priorities are: merging pull requests, fixing easyfix issues, and improving documentation consistency.

In addition to the tasks listed below, it is useful to consider any issue in this list: https://github.com/scikit-learn/scikit-learn/issues

Easy

Improve test coverage: Run 'make test-coverage' after installing the coverage module, find low-hanging fruit to improve coverage, and add tests. Try to test the logic, and do not simply aim to increase the number of lines covered.
Finish estimator summary PR: https://github.com/scikit-learn/scikit-learn/pull/804
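
For the test-coverage task above, the difference between covering lines and testing logic can be illustrated with a minimal sketch. The helper clip_probabilities is hypothetical and exists only for this example; it is not a scikit-learn function.

```python
def clip_probabilities(probs, eps=1e-15):
    """Hypothetical helper: keep probabilities away from exactly 0 and 1."""
    return [min(max(p, eps), 1.0 - eps) for p in probs]


def test_clip_runs():
    # Executes every line of the helper, so coverage goes up,
    # but this test would still pass if the clipping were wrong.
    clip_probabilities([0.0, 0.5, 1.0])


def test_clip_logic():
    # Checks the behaviour the helper exists for: values at the
    # boundaries are moved inside, values in the middle are untouched.
    eps = 1e-3
    clipped = clip_probabilities([0.0, 0.5, 1.0], eps=eps)
    assert clipped == [eps, 0.5, 1.0 - eps]
```

Both tests yield the same coverage figure; only the second one would catch a regression.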

Branch merging

Improving and merging existing pull requests is the number one priority: https://github.com/scikit-learn/scikit-learn/pulls

There is a lot of very good code lying there; it often just needs a small amount of polishing.

Not requiring expertise in machine learning

Affinity propagation using sparse matrices: the affinity propagation algorithm (scikits.learn.cluster.affinity_propagation_) should be able to work on sparse input affinity matrices without converting them to dense. A good implementation should make this efficient on very large data.

Machine learning tasks

Improve the documentation: If you understand some aspects of machine learning, you can help make the scikit rock without writing a line of code: http://scikit-learn.org/dev/developers/index.html#documentation. See also documentation-related issues in the issue tracker.

Text feature extraction (refactoring / API simplification) + hashing vectorizer: Olivier Grisel

Nearest Neighbors Classification/Regression: allowing more flexible Bayesian priors (currently only a flat prior is used); implementing alternative distance metrics: Jake Vanderplas

Group Lasso: Continue with pull request https://github.com/scikit-learn/scikit-learn/pull/947. Participants: @fabianp

K-means improvements

Participants: @mblondel

Code clean up
Speed improvements: don't reallocate clusters, track clusters that didn't change, triangle inequality
L1 distance: use L1 distance in the E step and the median (instead of the mean) in the M step
Fuzzy K-means: k-means with fuzzy cluster membership (not the same as GMM)
Move argmin and average operators to pairwise module (for L1/L2)
Support chunk size argument in argmin operator
Merge @ogrisel's branch
Add a score function (opposite of the kmeans objective)
Sparse matrices
fit_transform
more output options in transform (hard, soft, dense)
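
To make the L1 distance and score function items above concrete, here is a minimal sketch of the idea: assign samples with the L1 distance, update centers with the per-coordinate median, and report the negative objective as a score. It is written in plain Python to stay dependency-free; all function names are hypothetical and none of this is the existing scikit-learn code or the planned API.

```python
from statistics import median


def l1_distance(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))


def assign(X, centers):
    """E step: index of the nearest center (L1 distance) for every sample."""
    return [min(range(len(centers)), key=lambda k: l1_distance(x, centers[k]))
            for x in X]


def update(X, labels, centers):
    """M step: per-coordinate median of each cluster.

    An empty cluster keeps its previous center (an assumption of this sketch).
    """
    new_centers = []
    for k, old in enumerate(centers):
        members = [x for x, label in zip(X, labels) if label == k]
        if members:
            new_centers.append([median(col) for col in zip(*members)])
        else:
            new_centers.append(list(old))
    return new_centers


def k_medians(X, init_centers, max_iter=100):
    centers = [list(c) for c in init_centers]
    labels = assign(X, centers)
    for _ in range(max_iter):
        centers = update(X, labels, centers)
        new_labels = assign(X, centers)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centers, labels


def score(X, centers):
    """Negative objective (sum of L1 distances to the nearest center)."""
    return -sum(min(l1_distance(x, c) for c in centers) for x in X)


X = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
centers, labels = k_medians(X, init_centers=[[0, 0], [10, 10]])
```

Returning the negative of the objective means that a higher score is better, which matches the "opposite of the kmeans objective" wording above.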

Semisupervised learning

Participants: @larsmans

EM algorithm for Naive Bayes (there is a pull request lingering)
Fix utility code to handle partially labeled data sets
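
The two items above can be sketched together: a multinomial Naive Bayes fitted with EM on a partially labeled data set. This is only an illustration in plain Python, not the code of the pending pull request. The function names are hypothetical, and marking unlabeled samples with -1 is an assumed convention for this example. At least one labeled sample is required.

```python
import math


def fit_nb(X, resp, n_classes, alpha=1.0):
    """M step: multinomial NB parameters from (soft) class responsibilities."""
    n_features = len(X[0])
    class_weight = [alpha] * n_classes
    feat_count = [[alpha] * n_features for _ in range(n_classes)]
    for x, r in zip(X, resp):
        for k in range(n_classes):
            class_weight[k] += r[k]
            for j, v in enumerate(x):
                feat_count[k][j] += r[k] * v
    total = sum(class_weight)
    log_prior = [math.log(w / total) for w in class_weight]
    log_lik = [[math.log(c / sum(row)) for c in row] for row in feat_count]
    return log_prior, log_lik


def posterior(x, log_prior, log_lik):
    """E step for one sample: P(class | x) under the current parameters."""
    scores = [lp + sum(v * ll for v, ll in zip(x, row))
              for lp, row in zip(log_prior, log_lik)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_naive_bayes(X, y, n_classes, n_iter=10):
    """y[i] is a class index, or -1 when sample i is unlabeled (assumed)."""
    labeled = [i for i, label in enumerate(y) if label != -1]
    one_hot = {i: [1.0 if k == y[i] else 0.0 for k in range(n_classes)]
               for i in labeled}
    # Initialise from the labeled samples only.
    params = fit_nb([X[i] for i in labeled],
                    [one_hot[i] for i in labeled], n_classes)
    for _ in range(n_iter):
        # Labeled samples keep their hard label; unlabeled ones get soft labels.
        resp = [one_hot[i] if i in one_hot else posterior(X[i], *params)
                for i in range(len(X))]
        params = fit_nb(X, resp, n_classes)
    return params


# Word counts for five tiny documents; only the first two are labeled.
X = [[3, 0], [0, 3], [2, 0], [0, 2], [1, 0]]
y = [0, 1, -1, -1, -1]
params = em_naive_bayes(X, y, n_classes=2)
```

Calling posterior([4, 0], *params) then returns class probabilities for a new document under the fitted model.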

More ambitious/long term tasks

Patch liblinear to have warm restart + LogisticRegressionCV.

Comment (by Fabian): I tried this, take a look here: liblinear fork
Locality Sensitive Hashing, talk to Brian Holt
Fused Lasso
Group Lasso, talk to Alex Gramfort (by email), or Fabian Pedregosa
Manifold learning: improve MDS (talk to Nelle Varoquaux), t-SNE (talk to DWF)
Sparse matrix support in dictionary learning module

Past sprints
