GaelVaroquaux edited this page Sep 13, 2012 · 178 revisions

Sprint planning

PyconFR - Paris 13th-14th Sep 2012

We are organizing a sprint before the PyconFR 2012 conference.

People & tasks

  • Nelle Varoquaux - Isotonic regression
  • Olivier Grisel (working with Gaël Varoquaux on joblib parallelism)
  • Alexandre Gramfort
  • Fabian Pedregosa (things I could work on: implement ranking algorithms (RankSVM, IntervalRank), help with the isotonic regression and group lasso pull requests)
  • Bertrand Thirion
  • Gaël Varoquaux (working with Olivier Grisel on joblib parallelism)
  • Alexandre Abraham
  • Virgile Fritsch
  • Nicolas Le Roux (providing machine learning expertise for RBM and DBN coding)

Location

La Villette, Paris, September 13th and 14th, from 10:00 to 18:00. The sprint will take place in the 'Carrefour Numerique', floor -1 of the 'Cité des sciences': http://www.pycon.fr/2012/venue/

Tasks

Top priorities are merging pull requests, fixing easyfix issues, and improving documentation consistency.

In addition to the tasks listed below, any issue in the issue tracker is worth considering: https://github.com/scikit-learn/scikit-learn/issues

Easy

  • Improve test coverage: install the coverage module, run 'make test-coverage', find low-hanging fruit where coverage is poor, and add tests. Try to test the logic, rather than simply aiming to increase the number of lines covered.
  • Finish estimator summary PR: https://github.com/scikit-learn/scikit-learn/pull/804

Branch merging

Improving and merging existing pull requests is the number one priority: https://github.com/scikit-learn/scikit-learn/pulls

There is a lot of very good code lying there; it often just needs a small amount of polishing.

Not requiring expertise in machine learning

  • Affinity propagation using sparse matrices: the affinity propagation algorithm (scikits.learn.cluster.affinity_propagation_) should be able to work on sparse input affinity matrices without converting them to dense. A good implementation should remain efficient on very large datasets.
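To make the sparse affinity propagation item more concrete, here is a pure-Python sketch of the standard responsibility/availability message updates restricted to stored similarity pairs. It assumes the similarities come as a dict mapping (i, k) to s(i, k) (for example built from the `todok()` form of a SciPy sparse matrix), that every point has a diagonal entry (k, k) holding its preference, and that a point can only pick k as exemplar if (i, k) is stored. `sparse_affinity_propagation` is a hypothetical name; this is not the scikit-learn implementation, and it runs a fixed number of iterations instead of checking for convergence.

```python
from collections import defaultdict


def sparse_affinity_propagation(S, damping=0.5, max_iter=200):
    """Return the sorted exemplar indices.

    S maps (i, k) -> similarity s(i, k) for stored pairs only. Every point k
    needs a diagonal entry (k, k) holding its preference. Pairs absent from S
    are treated as similarity -inf, i.e. never chosen.
    """
    r = dict.fromkeys(S, 0.0)  # responsibilities, one per stored pair
    a = dict.fromkeys(S, 0.0)  # availabilities, one per stored pair
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for i, k in S:
        out_nbrs[i].append(k)
        in_nbrs[k].append(i)

    for _ in range(max_iter):
        # r(i, k) = s(i, k) - max over stored k' != k of [a(i, k') + s(i, k')]
        for i, ks in out_nbrs.items():
            scored = sorted(((a[i, k] + S[i, k], k) for k in ks), reverse=True)
            best_val, best_k = scored[0]
            second_val = scored[1][0] if len(scored) > 1 else float("-inf")
            for k in ks:
                competitor = second_val if k == best_k else best_val
                new_r = S[i, k] - competitor
                r[i, k] = damping * r[i, k] + (1 - damping) * new_r
        # a(k, k) = sum of positive r(i', k), i' != k
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        for k, sources in in_nbrs.items():
            pos_sum = sum(max(0.0, r[i, k]) for i in sources if i != k)
            for i in sources:
                if i == k:
                    new_a = pos_sum
                else:
                    new_a = min(0.0, r[k, k] + pos_sum - max(0.0, r[i, k]))
                a[i, k] = damping * a[i, k] + (1 - damping) * new_a

    # A point is an exemplar when its self-responsibility plus
    # self-availability is positive
    return sorted(k for k in in_nbrs if r[k, k] + a[k, k] > 0)
```

Each iteration touches every stored pair a constant number of times, apart from a per-point sort of its outgoing pairs, so the cost grows with the number of stored similarities rather than with n². Ties between equally good exemplars (for example two symmetric points with equal preferences) are not broken in this sketch; adding a tiny random perturbation to the similarities is a common remedy.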

Machine learning tasks

  • Improve the documentation: if you understand some aspects of machine learning, you can help make the scikit rock without writing a line of code: http://scikit-learn.org/dev/developers/index.html#documentation. See also the documentation-related issues in the issue tracker.
  • Text feature extraction (refactoring / API simplification) + hashing vectorizer: Olivier Grisel
  • Nearest Neighbors Classification/Regression: allow more flexible Bayesian priors (currently only a flat prior is used) and implement alternative distance metrics: Jake Vanderplas
  • Group Lasso: Continue with pull request https://github.com/scikit-learn/scikit-learn/pull/947. Participants: @fabianp
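One common way to handle the group lasso penalty is proximal gradient descent, whose key step is block soft-thresholding: each group of coefficients is shrunk toward zero as a unit, and set exactly to zero when its Euclidean norm falls below the threshold. The NumPy sketch below assumes the objective 0.5 * ||y - Xw||^2 + alpha * sum_g ||w_g||_2 with non-overlapping groups given as index lists. The function names are hypothetical, and this is not necessarily the solver used in pull request #947.

```python
import numpy as np


def group_soft_threshold(w, groups, threshold):
    """Proximal operator of threshold * sum_g ||w_g||_2 (non-overlapping groups)."""
    out = np.zeros_like(w, dtype=float)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > threshold:
            # Shrink the whole group toward zero, keeping its direction
            out[g] = (1.0 - threshold / norm) * w[g]
        # Otherwise the whole group stays exactly zero
    return out


def group_lasso_ista(X, y, groups, alpha, n_iter=500):
    """Proximal gradient (ISTA) on 0.5 * ||y - X w||^2 + alpha * sum_g ||w_g||_2."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = group_soft_threshold(w - step * grad, groups, step * alpha)
    return w
```

For example, thresholding [3, 4, 0.1, 0.1] with groups [[0, 1], [2, 3]] at 1.0 scales the first group (norm 5) by 0.8 to [2.4, 3.2] and zeroes the second group (norm about 0.14) entirely, which is the group-level sparsity the penalty is meant to produce.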

K-means improvements

Participants: @mblondel

  • Code clean up
  • Speed improvements: don't reallocate clusters, track clusters that didn't change, use the triangle inequality
  • L1 distance: use L1 distance in e step and median (instead of mean) in m step
  • Fuzzy K-means: k-means with fuzzy cluster membership (not the same as GMM)
  • Move argmin and average operators to pairwise module (for L1/L2)
  • Support chunk size argument in argmin operator
  • Merge @ogrisel's branch
  • Add a score function (opposite of the kmeans objective)
  • Sparse matrices
  • fit_transform
  • more output options in transform (hard, soft, dense)
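The L1 distance, score function and chunked argmin items can be illustrated together. The sketch below is a plain NumPy k-medians loop, assuming dense input and random initialization: the E step assigns each sample to the closest center in L1 distance, computed a chunk of rows at a time to bound memory; the M step takes the coordinate-wise median, because it minimizes the sum of L1 distances within a cluster just as the mean minimizes the sum of squared L2 distances; the score is the negated objective. `l1_argmin` and `kmedians` are hypothetical helpers, not scikit-learn functions.

```python
import numpy as np


def l1_argmin(X, centers, chunk_size=256):
    """Index of, and L1 distance to, the closest center for each row of X.

    Distances are computed chunk_size rows at a time, so the temporary array
    has shape (chunk_size, n_centers, n_features) instead of (n_samples, ...).
    """
    labels = np.empty(len(X), dtype=int)
    mins = np.empty(len(X))
    for start in range(0, len(X), chunk_size):
        stop = start + chunk_size
        d = np.abs(X[start:stop, None, :] - centers[None, :, :]).sum(axis=2)
        labels[start:stop] = d.argmin(axis=1)
        mins[start:stop] = d.min(axis=1)
    return labels, mins


def kmedians(X, n_clusters, n_iter=100, chunk_size=256, seed=0):
    """K-means variant: L1 distance in the E step, median in the M step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = l1_argmin(X, centers, chunk_size)[0]  # E step
        new_centers = centers.copy()
        for j in range(n_clusters):  # M step
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = np.median(members, axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels, mins = l1_argmin(X, centers, chunk_size)
    score = -mins.sum()  # opposite of the k-medians objective: higher is better
    return centers, labels, score
```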

Semisupervised learning

Participants: @larsmans

  • EM algorithm for Naive Bayes (there is a pull request lingering)
  • Fix utility code to handle partially labeled data sets
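The EM idea for Naive Bayes can be sketched in a few lines of NumPy for a multinomial model. This assumes unlabeled samples are marked with -1 in y (a convention chosen for this sketch), additive smoothing alpha, and at least one labeled sample per class. Labeled samples keep their hard labels; each E step gives the unlabeled samples soft class memberships, and each M step refits the class priors and feature probabilities from the weighted counts. The function names are hypothetical, and this is not the code from the lingering pull request.

```python
import numpy as np


def fit_weighted_nb(X, resp, alpha=1.0):
    """M step: class log-priors and per-class feature log-probabilities
    from soft class memberships resp, shape (n_samples, n_classes)."""
    class_weight = resp.sum(axis=0)
    log_prior = np.log(class_weight / class_weight.sum())
    counts = resp.T @ X + alpha  # additive smoothing
    log_lik = np.log(counts / counts.sum(axis=1, keepdims=True))
    return log_prior, log_lik


def nb_predict_proba(X, log_prior, log_lik):
    """E step: posterior class probabilities under the multinomial model."""
    joint = X @ log_lik.T + log_prior
    joint -= joint.max(axis=1, keepdims=True)  # numerical stability
    proba = np.exp(joint)
    return proba / proba.sum(axis=1, keepdims=True)


def semisupervised_nb_em(X, y, n_classes, n_iter=10, alpha=1.0):
    """y holds class indices for labeled samples and -1 for unlabeled ones."""
    labeled = np.flatnonzero(y != -1)
    unlabeled = np.flatnonzero(y == -1)
    resp = np.zeros((len(X), n_classes))
    resp[labeled, y[labeled]] = 1.0  # labeled samples keep hard labels
    # Initialize the parameters from the labeled samples only
    params = fit_weighted_nb(X[labeled], resp[labeled], alpha)
    for _ in range(n_iter):
        resp[unlabeled] = nb_predict_proba(X[unlabeled], *params)
        params = fit_weighted_nb(X, resp, alpha)
    return params
```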

More ambitious/long term tasks

  • Patch liblinear to have warm restart + LogisticRegressionCV.
    Comment (by Fabian): I tried this; take a look here: liblinear fork
  • Locality Sensitive Hashing, talk to Brian Holt
  • Fused Lasso
  • Group Lasso, talk to Alex Gramfort (by email), or Fabian Pedregosa
  • Manifold learning: improve MDS (talk to Nelle Varoquaux), t-SNE (talk to DWF)
  • Sparse matrix support in dictionary learning module
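The page does not say which Locality Sensitive Hashing family is intended. As one illustration, random-hyperplane hashing targets cosine similarity: each bit of a key records which side of a random hyperplane a vector falls on, so vectors separated by a small angle tend to share keys. The class below is a hypothetical single-table NumPy sketch, not a proposed scikit-learn API.

```python
from collections import defaultdict

import numpy as np


class RandomHyperplaneLSH:
    """Single-table random-hyperplane LSH for cosine similarity."""

    def __init__(self, n_features, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, n_features))
        self.buckets = defaultdict(list)

    def _key(self, x):
        # One bit per hyperplane: which side of it x lies on
        return tuple((self.planes @ x > 0).tolist())

    def index(self, X):
        for i, x in enumerate(X):
            self.buckets[self._key(x)].append(i)

    def query(self, x):
        """Indices of indexed vectors sharing x's bucket (candidate neighbors)."""
        return list(self.buckets.get(self._key(x), []))
```

Candidates returned by `query` still need re-ranking by exact distance. More bits make buckets smaller and more selective, while several independent tables raise the chance that true neighbors collide at least once.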

Past sprints
