
N-dimensional arrays #7

Open
max-mapper opened this issue Dec 4, 2015 · 17 comments

Comments

@max-mapper
Contributor

Numpy kicked off what eventually evolved into the PyData stack. One of the core components of Numpy was the ndarray type. What's an ndarray? It's a way to efficiently store multidimensional data (n-dimensional, where you pick the n; 1, 2, 3, and 4 dimensions are common, but you can go as high as your data requires). For example, a grayscale photo is a 2-dimensional ndarray, with X and Y coordinates of the pixels. A volume is 3-dimensional, with X, Y, and Z, etc.

You can obviously do this with nested JavaScript Array objects, e.g.

var array = [
  [1, 0, 1],
  [2, 1, 0],
  [1, 3, 1]
]
var x = array[1][2] // 0: row 1, column 2

The downside of this is overhead. Each nested array is a JS object, so accessing an element requires pointer dereferencing from object to value, and storing the data costs extra memory per object. It's also hard to lay out the data to exploit memory caching, and hard to modify or slice.
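For comparison, here is a minimal sketch of the flat typed-array layout that ndarray libraries build on, using the same 3x3 data as above:

// One contiguous buffer instead of four JS objects:
var ncols = 3
var data = new Float64Array([
  1, 0, 1,
  2, 1, 0,
  1, 3, 1
])

// Element (row, col) lives at index row * ncols + col:
var x = data[1 * ncols + 2] // 0, same as array[1][2]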

For more of an introduction to ndarrays, watch this talk by @mikolalysenko https://www.youtube.com/watch?v=vNCeWK_Wb5k, specifically the first 10 minutes.

Numpy ndarrays are implemented in C and exposed through a Python API. They get around the Object-overhead issue mentioned above by doing memory management in C.

ndarrays in JavaScript

There are two projects (that I'm aware of) working on ndarrays in JS:

  • scijs/ndarray
  • dstructs/ndarray

At first glance they appear quite similar in API to each other. There are surely implementation differences, but I am not knowledgeable enough about these to explain them here. Perhaps the authors will chime in below.
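To give a flavor of the API, here is a minimal sketch using scijs/ndarray (constructor, get, and transpose per its README):

var ndarray = require('ndarray')

// Wrap a flat typed array as a 3x3 row-major view:
var mat = ndarray(new Float64Array([1, 0, 1, 2, 1, 0, 1, 3, 1]), [3, 3])
var x = mat.get(1, 2) // 0, the same element as array[1][2] above

// Views are cheap: transposing swaps strides without copying data
var t = mat.transpose(1, 0)
var y = t.get(2, 1) // also 0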

Both projects are written in pure JS. Typed Objects and SIMD could improve memory layout and arithmetic throughput for these implementations.

Another approach would be to write native bindings. For example, the work-in-progress vectorious JS module has a Matrix library for 2-dimensional data that is backed by LAPACK and BLAS, both of which are well-known, time-tested native linear algebra libraries.

This native binding approach is similar to Numpy etc. The upside is being able to re-use existing native implementations, the downside is figuring out how to run it in the browser. See #6 for more discussion on this point.

In summary, there are a couple of ndarray implementations you can use today, but there are definitely things that JS as a platform could add to make writing an ndarray implementation better in the future.

Open questions

  • What are the key differences between scijs/ndarray and dstructs/ndarray?
  • Could the LAPACK and BLAS bindings from @mateogianolio be used to write a native ndarray implementation (similar in API to numpy ndarrays)?
  • What are the key missing features from JS that would improve ndarrays the most?

Have feedback or comments? Leave a comment below

@mourner

mourner commented Dec 10, 2015

Summoning @mikolalysenko to chime in about the topic. Also reported dstructs/ndarray#1.

@rreusser

I'm typing this on a phone so I'll be brief, if I can. (Update: I couldn't.) I've been at best a (questionable) contributor to the scijs/ndarray ecosystem, but I do have many thoughts and have taken some time to familiarize myself with the pain points.

The differences:

Is it alright if I call it as I see it? As far as I understand (correct me if I'm wrong; I'm going to stop being so deferential), it's in large part a difference in intended use-case. Mikola's code tends to be well thought through, remarkably effective, bare-bones optimized, and has been run through the wringer because it's been around for a while, but at the same time it can be rather dense (see: ndarray). ndarray is solid, but due to the distributed nature of the community, the contrib stuff (i.e. me) is a little fast and loose. Dstructs takes basically the opposite approach: it prioritizes clarity, robustness, and existence over optimization (though certainly not to the exclusion of it), and it features heavy enumeration of input data types.

To put it differently, I think scijs/ndarray aims to be a workhorse around which rigor and validation can be built; dstructs aims to be closer to the interface itself, if not simply the interface. I'm sure I've generalized, misinterpreted, and oversimplified. Please correct me.

My opinion:

I think the primary challenge, as much as a technical one, is the need for a community that can put together a clear vision and target (standards, architecture, etc) and then delegate responsibility.

If you want to know, I ran into trouble when I tried to write basic linear algebra stuff. I over-optimized some of it, and the result is frankly a barely-usable mess. Should it support lazy/sparse arrays even when that slows down the other stuff by 100-200%? Do I opt for dstructs-style enumeration? I really, really like dstructs' consistency and utility functions as far as types go (arrays vs. typed arrays, for one). Scijs is a little loose around the edges, and I run into that all the time as soon as I use it. And does an integration routine accept Arrays? Ndarrays? Both? Oof.

My really honest opinion:

Honestly, I think dstructs et al is where the effort should go. I haven't contributed—for no particularly good reasons except the rigor is a somewhat higher barrier to entry than tossing some code that I'm pretty sure is more or less okay onto scijs—but I think it comes closer to biting the bullet and accepting that some of this just comes down to verbosity and doing lots of grunt work to make the API rigorous and consistent.

On native bindings:

I'm not particularly interested in native bindings since if I didn't want to write for the web I'd be using Python. Which leads to emscripten or similar. The others involved here know way more.

I do know this though: writing linear algebra routines well is really, really challenging. I spent too long this summer working on an SVD from scratch (it's close, but close doesn't count). Getting that stuff right is a big investment; otherwise it just sucks. And the standard stuff is all .f90, which doesn't transpile well, or something. Honestly, I think if it were possible to plug into transpiled BLAS/LAPACK, that'd be best. It's just too hard to reinvent that stuff, and people have high standards these days because everything else is so far ahead of JS. My intention was to write functions in the style of BLAS so that things could conceivably be swapped, but that was challenging for the reasons listed above. I could do better, but it's tough when it's not backed by a clear vision.
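To make "functions in the style of BLAS" concrete, here is a sketch of daxpy (y <- a*x + y) with BLAS-style explicit length and stride arguments; this illustrates the calling convention only, not a tuned implementation, and negative increments are omitted for brevity:

// daxpy: y <- a*x + y over n elements, stepping by incx/incy
function daxpy (n, a, x, incx, y, incy) {
  for (var i = 0, ix = 0, iy = 0; i < n; i++, ix += incx, iy += incy) {
    y[iy] += a * x[ix]
  }
}

var x = new Float64Array([1, 2, 3])
var y = new Float64Array([10, 20, 30])
daxpy(3, 2.0, x, 1, y, 1) // y is now [12, 24, 36]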

Plus, is this something vendors like Intel are working on yet? SIMD? BLAS? Seems like stuff they'd be into.

Missing features:

  • a long document where someone hashes out all the details: how we're gonna specify types, how we write addition and casting, whether it's gonna include validation, and precisely what this system looks like. The current challenges are as much architectural and organizational as they are nuts-and-bolts technical, so we might as well at least address one of the hurdles.
  • funding? If someone could work on this all day, that'd certainly help.

Bottom line:

I think the pieces are almost in place. The interest is here. I'm basically willing to give up all of my style preferences etc if I think my effort is going toward a common, coherent goal. You're all awesome. Can we solve this and build something great?

@rreusser

I feel like my totally overwrought response killed any potential discussion. Sorry for that. A moment of weakness on my part. I think all I really intended or had to say is this:

There are difficult technical issues beneath the surface (how to handle BLAS/LAPACK, in particular), but I think a lot of headway can be made just by straightening out how the status, structure, and goals of these projects are communicated. At any rate, I'm going to put my money where my mouth is and try to start making headway again.

So again, sorry to hijack the discussion so quickly. Would love to hear thoughts on what we need to do to move this stuff forward.

@znmeb

znmeb commented Dec 10, 2015

I'm not terribly familiar with scientific computing beyond the core dense and sparse matrix arena. But I can point to a (Matlab) project that boldly goes where few have gone before into practical uses of arrays beyond two dimensions:

MATLAB Tensor Toolbox Version 2.6 (released Feb. 6, 2015)

@mikolalysenko

First, a few words on ndarray in general: The core module is actually really trivial! And people shouldn't be making that big a deal about it!

When I wrote it, what I really wanted to create was cwise, which is where all of the interesting stuff happens. An ndarray by itself does nothing, you need some module to act on it in order to get neat results.
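For the curious, the hello-world of cwise from its README: compile a per-element kernel once, then apply it to any compatible ndarrays.

var cwise = require('cwise')
var ndarray = require('ndarray')

// Compile an element-wise kernel; assignments to the first
// argument write back into the first array:
var addeq = cwise({
  args: ['array', 'array'],
  body: function (a, b) {
    a += b
  }
})

var A = ndarray(new Float64Array([1, 2, 3, 4]), [2, 2])
var B = ndarray(new Float64Array([5, 6, 7, 8]), [2, 2])
addeq(A, B) // A's data is now [6, 8, 10, 12]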

The goal of ndarray is to provide some standard notation for working with higher dimensional views of 1D data stores. One important point is that users aren't supposed to directly call the get/set methods of ndarrays, especially in a loop. This will be slow, and in such a case you are better off just working directly with pointers into the underlying 1d array instead.

Another misconception: I never intended for users to call get/set all the time. Doing so is slow and is only meant for writing prototypes and debugging. Because ndarrays can be rotated and reordered in different ways, you have to consider the memory layout of the underlying store when writing such a loop in order to keep it performant.

It is pretty easy to roll your own ndarray-compatible alternative, and I've considered doing this a few times in the past. For example, I could see there being some benefit for writing a bounds-checked ndarray that makes sure you never slice past the end of an array. You could even skip using ndarray altogether and just pass tuples of {shape, stride, offset, data} around if you really want to trim down to the smallest possible set of dependencies.
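A minimal sketch of that last option: a view is just these four fields, and indexing is one multiply-add per axis.

var view = {
  data: new Float64Array(12),
  shape: [3, 4],
  stride: [4, 1], // row-major
  offset: 0
}

function get2d (v, i, j) {
  return v.data[v.offset + i * v.stride[0] + j * v.stride[1]]
}

// Transposing is pure bookkeeping on the same buffer:
var transposed = { data: view.data, shape: [4, 3], stride: [1, 4], offset: 0 }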

Originally, I had a couple of uses in mind when I created ndarray:

  • First, I wanted a flexible format for working with WebGL data. Things like packing vertex attributes and so on into flattened interleaved arrays are tedious, and one of my ideas with ndarray was to simplify this by using slices (or views) of the underlying buffers.
  • Second I wanted tools for working with images that would make it easy to select channels and subregions.
  • I also wanted to work with general volume/voxel data. This would be for 3D printing and for making minecraft-like games, for example voxel.js.

Linear algebra (and general tensor arithmetic) was and still remains a long-term goal, but there are lots of smaller and more immediate problems to solve first. At the moment, I think scijs has a pretty reasonable solution for dense matrix arithmetic, but unfortunately this isn't that important: for most interesting applications (with the one exception being dense SVDs/eigendecompositions), sparse linear algebra matters far more.

ndarray should be able to support a variety of sparse matrix data types, and we've already got stuff for working with hash maps, run-length encoding (which generalizes CSR/CSC), and user-defined data structures. While this all works right now, modules like cwise don't do any optimizations to speed up RLE data, which sort of defeats the point. Long term, we need more modules that take advantage of sparsity in order for this to be really useful.
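As a rough illustration of the user-defined hook: ndarray's "generic" data type accepts any object exposing get, set, and length, so a hash-map-backed store drops straight in.

var ndarray = require('ndarray')

// A hash-map store; missing entries read back as zero:
function HashStore (n) {
  this.length = n
  this.elements = {}
}
HashStore.prototype.get = function (i) { return this.elements[i] || 0 }
HashStore.prototype.set = function (i, v) { this.elements[i] = v }

// A "dense-looking" 1000x1000 view over a sparse store:
var sparse = ndarray(new HashStore(1e6), [1000, 1000])
sparse.set(10, 20, 3.5)
var v = sparse.get(10, 20) // 3.5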

Applications of tensor algebra remain more speculative. I've read a fair bit on higher order singular value decompositions, but they are slow and difficult to compute. For example, just finding the rank of a 3rd order tensor is NP-hard. I do think it is an interesting and fundamental problem to research, but it just isn't something that we need to worry about doing in the browser in the immediate future.

@kgryte

kgryte commented Dec 11, 2015

My understanding is that the primary intent of this issue is not so much which implementation is better or best or will be "THE" implementation. Rather, the intent is to determine what standards (JS, Node.js) need to move forward to better enable interfaces like ndarrays.

First, to echo @mikolalysenko and demystify what an ndarray is and is not, an ndarray is primarily an API to access elements which are stored contiguously (or sparsely) in memory. In JS, this means adding some sugar on top of, say, typed-arrays. Nothing special; nothing magical.

As Mikola mentioned, but just to reiterate, what matters is not so much the API (e.g., the presence of setters, getters, view creation methods, etc), but how ndarrays are used. This is the insight behind cwise and implementations in other platforms which take into account caching and access optimizations.

And further, ndarrays are useful and can be powerful, but they are also not the be-all, end-all data structure / API. Vanilla arrays, typed-arrays, sets, structs, data-frames, etc, are also immensely valuable and more powerful depending on the context.

Now, in terms of language features, my main gripe about any ndarray implementation or, more generally, any higher order data structure is verbosity stemming from a lack of operator overloading and value types. The lack of operator overloading requires functional interfaces for data access and manipulation. For instance, in MATLAB,

% Perform element-wise matrix addition:
A = B + C;

In JavaScript, using set and get methods

// Perform element-wise matrix addition in JavaScript:
let A = matrix( B.shape, 'float64' );
for ( let i = 0; i < B.shape[0]; i++ ) {
    for ( let j = 0; j < B.shape[1]; j++ ) {
        A.set( i, j, B.get( i, j ) + C.get( i, j ) );
    }
}

which can be optimized if we make assumptions about how a matrix / ndarray is implemented (i.e., that a view represents all the underlying data):

const len = B.length;
let A = matrix( B.shape, 'float64' );
for ( let i = 0; i < len; i++ ) {
    A.data[ i ] = B.data[ i ] + C.data[ i ];
}

which can be consolidated into a function

let A = add( B, C );

Of course, under-the-hood, operator overloading is effectively the equivalent of add(); however, with operator overloading, the syntax is much more compact, equating to productivity gains, greater convenience, fewer bugs, and increased expressiveness.

This is particularly the case when we take into account things like broadcasting. For instance, in MATLAB,

% Add a matrix and a scalar:
A = B + 5;

which is often solved in JS by either defining a new method tailored to input argument types (e.g., ndarray-ops) or polymorphism (e.g., compute-add); for example,

let A = addmatscalar( B, 5 );

or

let A = add( B, 5 );

function add( x, y ) {
    if ( x instanceof Matrix ) {
        if ( y instanceof Matrix ) {
            // Do matrix-matrix addition...
        } else {
            // Do matrix-number addition...
        }
    } else {
        if ( y instanceof Matrix ) {
            // Do number-matrix addition...
        } else {
            // Do number-number addition...
        }
    }
}

The former approach leads to a Cambrian explosion of methods; the latter approach can prevent runtime optimization, as the compiler cannot make assumptions about input argument types.

Once again, under-the-hood, operator overloading has to account for argument types too, but it hides this from the user. Currently, in JS, users are forced to deal with this explicitly, which, in my experience, is sub-optimal.

Additionally, operator overloading would enable, e.g., a compact syntax for slicing. For example, in Python,

# Grab the first two rows and the odd columns:
A = B[0:2, 1::2]

In MATLAB,

% Grab the first two rows and the odd columns:
A = B(1:2, 1:2:end);

In JavaScript, you achieve the same thing but have to use strings, which are then parsed and converted into subsequences of numeric values.

let A = B.sget( '0:2,1::2' );

The former syntaxes are much more compact and do not require string parsing and subsequent type conversion.

Now, provided operator overloading, the ability to define custom value types would be immensely valuable. This would allow, e.g., a complex number value type and associated operator overloading logic. For example,

let ai = complex( real1, imag1 );
let bi = complex( real2, imag2 );

let ci = ai + bi;
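For contrast, the functional spelling we are stuck with today (a sketch; complex() here is a hypothetical constructor returning a plain { re, im } object):

function complex( re, im ) {
    return { re: re, im: im };
}

function cadd( a, b ) {
    return complex( a.re + b.re, a.im + b.im );
}

let ai = complex( 1, 2 );
let bi = complex( 3, -1 );
let ci = cadd( ai, bi ); // { re: 4, im: 1 }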

A last comment is that, while Python numeric computing tools are good, we should be leery about upholding them as the gold standard. We stand to learn a lot from the many other computing and programming environments out there: R, Python, Maple, Mathematica, MATLAB, Golang, Julia, and many others. To solve many of the issues documented in this organization, we stand to benefit from synthesizing the various approaches, both in terms of interfaces and implementations, and seeing what works and what does not before declaring what we do and do not need. If we have the opportunity to design better APIs and a better way of doing things, we should, and we would be wise to do so.

@znmeb

znmeb commented Dec 11, 2015

@kgryte "A last comment is that, while python numeric computing tools are good, we should be leery about upholding it as the gold standard. We stand to learn a lot from the many other computing and programming environments out there: R, Python, Maple, Mathematica, MATLAB, Golang, Julia, and many others."

  1. Python and Golang (and C, Ruby [sorta kinda], Perl, Java, and other JVM languages) are general-purpose languages with solid numerical programming libraries and applications.
  2. Maple and Mathematica are mathematical programming languages. They are designed to do math both numerically and symbolically. They're essentially descendants of Lisp / Maxima with better numeric capabilities.
  3. That leaves R, MATLAB and its open-source descendants (Octave, Scilab, and FreeMat), Julia, and Fortran as languages dedicated to numerical/scientific/statistical computing. Julia's very new and they're still building up a library ecosystem, but really, if the task is to learn numerical computing from other environments, we need to look to the numerical languages as the gold standard.

In particular, I'd focus on MATLAB because its syntax is array-based (Matrix Laboratory) and its runtime is highly optimized for number crunching. You can multiply two dense matrices in RAM at full CPU speed in MATLAB with a single expression. Many of the number-crunching goodies in Python were built by programmers who didn't want to be locked into a closed-source package with a base price of something like US$1000 per seat.

And if you want to go down the symbolic math road as well, I'd look at Mathematica or Maxima. There's SymPy if you're more familiar with Python, but Mathematica and Maxima are full-strength symbol crunchers.

@mateogianolio

First of all, I agree with @kgryte that operator overloading would have been the best way. However, that's not the case, so we are forced to improvise.

I had this thought in my mind that something like this would work:

// Matrix is just a wrapper for TypedArray with sugar (e.g. shape, stride)
let a = new Matrix([[1, 2], [3, 4]]),
    b = new Matrix([[1, 2], [3, 4]]);

a.add(b); // in-place addition: mutates a, returning a with b added

Matrix.add(a, b); // copy addition: returns a new instance, leaving a and b untouched

But I'm not so sure anymore and I posted it here to hear what you guys have to say. You can see the current Vectorious PR to follow progress.
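For discussion, here's a rough sketch of how the two flavors could relate (flat Float64Array storage assumed; copy() is a hypothetical helper, not the actual Vectorious implementation):

class Matrix {
  constructor(data) {
    // data: flat Float64Array (shape/stride bookkeeping omitted for brevity)
    this.data = data;
  }
  copy() {
    return new Matrix(this.data.slice());
  }
  // In-place flavor: mutates this, returns this
  add(other) {
    for (let i = 0; i < this.data.length; i++) {
      this.data[i] += other.data[i];
    }
    return this;
  }
  // Copy flavor: builds a new instance, delegates to the in-place version
  static add(a, b) {
    return a.copy().add(b);
  }
}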

On a brighter note, nBLAS should be fully functional now and I'm in the process of creating benchmarks to compare its performance to other libraries.

Here's a plot I threw together comparing double precision (Float64Array) vector addition, dot product and sum (1-norm) on my retina MBP:
[plot: nBLAS vs. vanilla JS, double-precision vector addition, dot product, and 1-norm]

Furthermore, I don't understand how TypedArray.slice can be so slow. I compared three different methods for copying double precision vectors:

  1. cblas_dcopy: 5,667,520 ops/sec
  2. regular for-loop: 941,099 ops/sec (create a new typed array of the sum of the two lengths and fill it with a for-loop)
  3. TypedArray.slice: 140,371 ops/sec

My opinion is that bindings to BLAS and LAPACK are the way to go; they are industry-standard and incredibly high-performance when optimized for your architecture.

@kgryte

kgryte commented Dec 11, 2015

@mateogianolio Thanks for your comment. The bulk of it is probably more relevant to #6.

Re: TypedArray.slice. This is not surprising, given that it (a) is not widely supported and (b) has undoubtedly not been optimized. Even Array.slice() has varying performance depending on array size, etc.

Re: benchmarks. I am impressed that vanilla JS is only 4-5x slower. I would be curious to know how this varies with size. My hypothesis is that, for small data, the cost of delegating responsibility to BLAS or LAPACK outweighs benefits.

Re: bindings. Sure, bindings are good when they are available. The reality, however, is that JS does not just run on PCs with the necessary libs already installed. In fact, with recent IoT efforts by Samsung, Intel, IBM, etc., we should expect an ever larger number of devices running Node, many of them relatively primitive (à la Raspberry Pi or Tessel), in which case we cannot presume bindings will be possible, or even desired. Further, as mentioned in the issue, how relevant bindings are for running in the browser is debatable.

I believe that you also provide JS support where bindings are not possible, in which case we are back to the central issue: what language features are needed to facilitate, in this case, ndarrays (and similar data structures) and their use in JavaScript.

@rreusser

@mikolalysenko, I took a cheap shot at scijs and I feel pretty bad. One last time: sorry. The mediocre code I've put out there has been driving me crazy for a while. I feel like it's maybe actually a disservice to the community, and I don't quite know how to make it better. I feel like that's due to laziness on my part and to the pluggable, type-agnostic tendencies of the npm community, which don't quite compensate for JS's challenges listed above, but that's all I'll say about that.

To me, one feature of JS that makes it incredibly effective for scientific computing is distribution, i.e. browsers. nBLAS looks great! I'm curious though: is it remotely feasible to compile BLAS/LAPACK for the browser? In an ideal world, I think we'd have native bindings for desktop/server code that are wonderfully compatible with cross-compiled code in the browser. I see hints of efforts to interface with BLAS here and there. @Planeshifter, do you have thoughts? Or is flang a viable Fortran LLVM front-end that could be plugged into emscripten?

See also: http://comments.gmane.org/gmane.comp.compilers.llvm.devel/65872

I'd like to publicly congratulate Alex for his excellent work this summer! He worked diligently throughout the entire period, and he quite successfully tackled an ambitious project. As a result, we now have a Fortran frontend for LLVM capable of compiling real packages (BLAS, LAPACK, etc.) and correctly executing the test suites for those packages. Thanks to Alex, and to Google, we now have a solid foundation from which to build a community-developed LLVM Fortran frontend.

And just to clarify, does BLAS/LAPACK encompass all of the strides and offsets necessary to work with general ndarrays? See: dgemm

@rreusser

On the Flang GitHub page:

Longer term:

  • Fortran90/95 support

Hmm… What am I missing? Aren't BLAS/LAPACK f90?

Edit: maybe they're referring to the f77 reference implementation?

@mhart

mhart commented Dec 11, 2015

glMatrix has recently had some contributions from Mozilla bringing SIMD performance improvements that are worth checking out.

@tab58

tab58 commented Mar 15, 2016

Addressing one of @kgryte's points: one potential problem with operator overloading is confusion about the underlying operations. JavaScript already has enough issues with the "+" operator implicitly doing both addition and concatenation depending on its arguments, plus the implicit string/number coercions that come with it. Although operator overloading may improve the visual appearance of code, it can become a pain point later on. Even MATLAB, a language designed for numerical computing, has separate operators to distinguish matrix operations from element-wise operations (e.g., * for matrix multiplication vs. .* for element-wise multiplication).

Although the case could be made that JavaScript could also benefit from this, I don't think there's enough demand to update the language for it. Although functions aren't pretty, they do enforce a certain kind of operational clarity. Toji's glMatrix and THREE.js's linear algebra functions are good examples of this: when a function is called, I know exactly what will happen inside it, and I don't have to worry that there's some sort of implicit "magic" happening behind the scenes. One example of "magic" that's messed me up in numerical JS is the implicit conversion to a signed 32-bit integer when doing bitwise operations. IMO, less guesswork means better numerical code.
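Concretely, the bitwise coercion in question:

// Bitwise operators silently pass their operands through ToInt32:
var x = Math.pow(2, 31) //  2147483648, an ordinary double
var y = x | 0           // -2147483648, wrapped to a signed 32-bit integer
var z = 1.9 | 0         //  1, fractional part silently dropped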

I think the nBLAS module is a great step toward integrating LAPACK and BLAS functionality into JavaScript. Since these libraries are de facto standards in the numerical computing community, leveraging them is important if we want to collaborate on an effort. If the Node.js and io.js communities could come together and agree, this community certainly can too. From this standpoint, there doesn't seem to be too much difference in how dstructs/ndarray and scijs/ndarray are structured (I think the code review process is a separate issue). There does seem to be a lot more impetus behind scijs/ndarray, so that seems like a more natural starting point for convergence.

Feel free to totally disagree with me, though.

@mateogianolio

I just removed the nan dependency from nBLAS, so it's now completely dependency-free.

@rreusser

@tab58 Agreed. My code/API design is not great. I've learned a lot since then, but I struggle to find time to contribute. I go back and forth, thinking on one hand that it needs a major cleanup sweep, peer review, etc., and on the other that it's kind of a lost cause due to a poor start and a general lack of interest. I'm not sure I'm capable of rescuing the situation, but if there were more convergence/enthusiasm on one front or another, I'd still love to do what I can to push things forward. To me, it's a numbers game: getting as many people as we can on the same page, organizing the thinking, and dividing up the work. In other words, the technical problems are what they are (and they're a pain in the butt, and I wish some things were different), but I think it's still reasonable to hope to converge and do what we can.

Just a quick survey for anyone, either your own thoughts or thoughts of which you're aware:

  • How many people are actually interested in this?
  • Are there efforts that are pushing things forward and gaining any traction, perhaps in compile-to-js languages or some other approach? Thoughts on how that might be accomplished?
  • Thoughts on what sort of end-goal you'd like to achieve?

@mikolalysenko

@mateogianolio why remove the nan dependency out of curiosity? Isn't the whole point of nan to abstract out all the volatile parts of the node/v8 API so your plugin can maintain compatibility? Assuming that responsibility yourself (by removing the dependency) seems like it is going to create a lot of extra work and set you up for future problems when those systems inevitably change.

@mateogianolio

Most of the code wasn't nan-specific, and the build time is now about half of what it was before. It only took about an hour to switch, and I like to keep libraries as dependency-free as possible. At the moment I'd rather keep up with the changes than include another layer of abstraction, but if I change my mind in the future I can easily switch back to using nan.

Nan itself has changed its API before, so I really don't see the difference; I'd have to keep up with updates either way.
