This repository has been archived by the owner on Jan 10, 2019. It is now read-only.

Statistical Analysis #9

Open
eliperelman opened this issue Dec 9, 2015 · 6 comments

Comments

@eliperelman

JavaScript lacks native functionality for statistical analysis, even for the most basic of functions:

  • Summation, mean, mode, etc.
  • Variance, standard deviation, variety of functions for probability and distribution
  • T-test, linear regression, multiple regression, etc.
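For reference, the most basic items in the list above can be sketched in a few lines of userland code. This is only an illustration: the names `sum`, `mean`, and `modes` are hypothetical and do not come from any standard or library, and returning every tied value from `modes` is just one possible convention.

```javascript
// Hypothetical helper names, for illustration only.

// Plain left-to-right summation (no compensation for rounding error).
function sum(xs) {
  let total = 0;
  for (const x of xs) total += x;
  return total;
}

// Arithmetic mean; NaN for an empty array, since the mean is undefined there.
function mean(xs) {
  if (xs.length === 0) return NaN;
  return sum(xs) / xs.length;
}

// Every value tied for the highest count, in first-seen order.
function modes(xs) {
  const counts = new Map();
  for (const x of xs) counts.set(x, (counts.get(x) || 0) + 1);
  const best = Math.max(...counts.values());
  return [...counts.keys()].filter((k) => counts.get(k) === best);
}

console.log(sum([1, 2, 3, 4]));      // 10
console.log(mean([1, 2, 3, 4]));     // 2.5
console.log(modes([1, 2, 2, 3, 3])); // [ 2, 3 ]
```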
@kgryte

kgryte commented Dec 9, 2015

While JS lacks native functionality, many npm modules already fill this gap. See distributions, compute, and other affiliated organizations.

The reality is JS will never have "native" support for everything, particularly for numeric and statistical computing, nor should we expect it to. Most Python and R functionality is provided by 3rd party libraries. We should expect something similar for JS.

Further, even native Math functionality is a bit dodgy, as the specifications often do not stipulate particular algorithms, precision, etc., leaving implementors to roll their own. What this means is that, across browsers, you can have discrepancies in results from, say, Math.sin(), or differences in the robustness of things like random number generators (which are not particularly robust).

I would probably argue that most functionality should be left to 3rd party libraries, as 3rd party libraries can iterate much faster than standards committees.

@emilbayes

I think many of these are easily solved in userland, and should be. What this depends on is #7 and a high quality set of modules or a library. I don't see the lack of these functions in JavaScript as a show stopper for doing statistical analysis. And even things as simple as variance and the median have different interpretations depending on context.

For example, in inferential statistics the sample variance is usually defined as sum((x - mean)^2)/(n - 1); it does not make sense to estimate spread from a single value, so it should be undefined for n = 1. The population variance is usually sum((x - mean)^2)/N. The median is tricky, too. What should the default behaviour be for an even number of observations? The average of the two middle values, the smaller one, or the larger one?
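The n - 1 versus N divisor applies to the variance, and the even-count question to the median; a userland function can simply make both choices explicit. A rough sketch, where the function names and the `sample` and `tie` options are made up for illustration:

```javascript
// Hypothetical API: `sample` and `tie` are illustrative option names.

// Variance with an explicit choice of divisor:
// n - 1 for a sample (default), n for a population.
function variance(xs, { sample = true } = {}) {
  const n = xs.length;
  const denom = sample ? n - 1 : n;
  if (denom <= 0) return NaN; // undefined for n = 1 (sample) or n = 0

  let m = 0;
  for (const x of xs) m += x;
  m /= n;

  let ss = 0; // sum of squared deviations from the mean
  for (const x of xs) ss += (x - m) ** 2;
  return ss / denom;
}

// Median with an explicit rule for an even number of observations:
// "average" (default), "lower", or "upper" of the two middle values.
function median(xs, tie = "average") {
  if (xs.length === 0) return NaN;
  const s = [...xs].sort((a, b) => a - b); // copy, then numeric sort
  const mid = Math.floor(s.length / 2);
  if (s.length % 2 === 1) return s[mid];
  const lo = s[mid - 1];
  const hi = s[mid];
  if (tie === "lower") return lo;
  if (tie === "upper") return hi;
  return (lo + hi) / 2;
}

const data = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(variance(data, { sample: false })); // 4
console.log(median([1, 2, 3, 4]));              // 2.5
console.log(median([1, 2, 3, 4], "lower"));     // 2
```

Making the caller pick the convention avoids baking one interpretation into the language.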

EDIT: kgryte was that bit faster

@eliperelman
Author

I am not arguing whether or not these should live in userland, rather that these are holes in the language. If someone wants to work towards improving standardization for scientific purposes, this is merely an area where it could be done. 😄

@max-mapper
Contributor

I was reading TC39 meeting notes recently and there was something to the effect of "the Math functions in JS have never really been formally standardized, we kind of just let the implementers decide what to do".

There are downsides to this approach, like Math.random. My sense is that we're not going to see the Math stuff get overhauled much by the standards bodies. However, it seems they are interested in providing new low-level calls through things like SIMD or the 64-bit arithmetic proposal.
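To illustrate the Math.random point: it cannot be seeded, so anything that needs a repeatable sequence has to bring its own generator. Below is a minimal sketch of a seedable linear congruential generator using the widely published Numerical Recipes constants. The name `createLcg` is made up, and an LCG like this is not statistically strong; it only shows that determinism is easy to get in userland.

```javascript
// Hypothetical helper: a seedable 32-bit linear congruential generator.
// Constants (1664525, 1013904223) are the Numerical Recipes LCG parameters.
// Not suitable for serious simulation or anything security related.
function createLcg(seed) {
  let state = seed >>> 0; // coerce the seed to an unsigned 32-bit integer
  return function next() {
    // Math.imul performs a 32-bit multiply; >>> 0 wraps the sum modulo 2^32.
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 4294967296; // scale to [0, 1)
  };
}

const a = createLcg(42);
const b = createLcg(42);
console.log(a() === b()); // true: same seed, same sequence
```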

I'd be interested in knowing what low level requirements are missing to implement great statistics libs in JS.

One somewhat unique aspect of JS is that it's pretty good at evented I/O, so streaming (or "on-line") stats algorithms could be worthwhile to implement in JS. In other languages, managing evented I/O gets pretty complicated, so people tend to stick with stats algorithms that buffer all the data into RAM and process it all at once.
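As a concrete example of an on-line algorithm, here is a sketch of Welford's method for a running mean and sample variance, which updates in constant memory as each value arrives. The `createRunningStats` name and its shape are invented for illustration and are not taken from any of the modules mentioned in this thread.

```javascript
// Hypothetical accumulator using Welford's online algorithm.
function createRunningStats() {
  let n = 0;    // number of values seen so far
  let mean = 0; // running mean
  let m2 = 0;   // running sum of squared deviations from the mean

  return {
    push(x) {
      n += 1;
      const delta = x - mean;
      mean += delta / n;
      m2 += delta * (x - mean); // uses the updated mean
    },
    get count() { return n; },
    get mean() { return n > 0 ? mean : NaN; },
    // Sample variance (n - 1 divisor); undefined until two values are seen.
    get variance() { return n > 1 ? m2 / (n - 1) : NaN; },
  };
}

const stats = createRunningStats();
for (const x of [2, 4, 4, 4, 5, 5, 7, 9]) stats.push(x);
console.log(stats.count, stats.mean); // 8 5
```

In a Node program, `push` could be called from a stream's `data` handler, so the full data set never has to sit in memory.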

@kgryte

kgryte commented Dec 10, 2015

Re: low level requirements. 64-bit integers. The Math functions all operate on single numbers, not arrays, and, thus, I do not believe SIMD (especially long SIMD) would really be relevant for standard functions. Built-ins could accept arrays, but that is a slippery slope. If they accept arrays, they should probably accept anything array-like, and then any new data structure which can be iterated. That the built-ins only accept a single numeric value punts the responsibility of applying those functions to other data structures to userland, which is the wise thing to do.
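As a sketch of what that userland responsibility looks like in practice, a small wrapper can apply a scalar Math function across arrays, typed arrays, or any other iterable. The `mapNumbers` name is hypothetical.

```javascript
// Hypothetical helper: apply a single-number function to every element of
// any iterable or array-like and collect the results in a plain array.
function mapNumbers(iterable, fn) {
  // The arrow wrapper ensures fn receives only the value, not the index.
  return Array.from(iterable, (x) => fn(x));
}

console.log(mapNumbers([1, 4, 9], Math.sqrt));                 // [ 1, 2, 3 ]
console.log(mapNumbers(new Set([4, 16]), Math.sqrt));          // [ 2, 4 ]
console.log(mapNumbers(new Float64Array([25, 36]), Math.sqrt)); // [ 5, 6 ]
```

The built-in stays scalar, and the decision about which containers to support lives in the library.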

I think the biggest issue with the built-ins is getting the implementors to at least agree on minimum standards for precision and robustness. The fact that sin(), cos(), etc., can vary from one environment to the next is a big obstacle to reproducible numeric computation. So much so that we will probably roll our own sin(), cos(), and other functions in userland just to guarantee that behavior is the same regardless of browser/environment. That is a waste of time. Contrast this with Golang, where these functions have only a single implementation, thus providing a measure of reliability.

Re: streams. Yes, these have already been worked on. See, e.g., here, here, and other modules within the org. These APIs are going to change, but streaming stats was one of the original reasons I started work on building computational utilities in JS. Here, improving streams performance would be great.

@znmeb

znmeb commented Jan 1, 2016

Here's a primitive example of the kind of thing people do these days - http://www.sumsar.net/blog/2015/12/bayes-js-a-small-library-for-doing-mcmc-in-the-browser/
