
Re-ordering matrices in Ginkgo #346

Closed
pratikvn opened this issue Sep 10, 2019 · 8 comments

@pratikvn (Member) commented Sep 10, 2019

Reordering sparse matrices in Ginkgo.

Re-ordering a sparse matrix is an important algorithm to have. It can drastically improve performance, particularly when factorization routines such as ParILU are involved, by reducing the fill-in of the factors.

Some of the available re-ordering algorithms are:

  1. METIS fill-reducing re-ordering: as the name suggests, this algorithm was specifically designed to reduce fill-in during factorization and is available within the METIS library. It uses the multi-level nested dissection paradigm to re-order the sparse matrix.

  2. Approximate minimum degree re-ordering (AMD): an example implementation is available in SuiteSparse.

  3. Reverse Cuthill-McKee re-ordering (RCM): this re-ordering is generally used to reduce the bandwidth of a sparse matrix. See RCM at Wikipedia.

Currently, I envision the interface along these lines (with the METIS fill-reduce as an example):

template <typename ValueType, typename IndexType>
class MetisFillReduce
    : public EnablePolymorphicObject<MetisFillReduce<ValueType, IndexType>,
                                     Reordering> {
public:
    // The adjacency matrix to be given to the reordering algorithm.
    gko::LinOp get_adjacency_matrix() const { ... }

    // The permutation computed by the reordering algorithm.
    gko::Array<IndexType> get_permutation() const { ... }

    // The inverse permutation computed by the reordering algorithm.
    gko::Array<IndexType> get_inverse_permutation() const { ... }

    GKO_CREATE_FACTORY_PARAMETERS()
    {
        // The vertex weights (for more advanced permutations) as a parameter.
        Array<IndexType> GKO_FACTORY_PARAMETER(vertex_weights, nullptr);
    }

protected:
    // The operation that permutes the input matrix.
    void permute(LinOp* to_permute) const;

    // The operation that inverse-permutes the input matrix.
    void inverse_permute(LinOp* to_permute) const;

    // Generates the permutation from the matrix passed in via the factory.
    void generate() const;
};

Things to discuss:

  1. The factory can take a normal matrix (in any format), internally convert it into an adjacency matrix, and generate a permutation, which would be a vector (gko::Array) containing the required permutation mapping for each row.

    1. We could make this permutation vector a LinOp, which would apply to the different matrix formats and output their permuted variants.
      • Pros: Much cheaper than the alternative.
      • Cons: Mathematically less elegant.
    2. We could create a permutation matrix (a sparse matrix) from this vector, which could then be applied to permute the matrix via matrix-matrix multiplication. For example, if A is the matrix to be permuted, PA permutes the rows of the matrix and AP permutes the columns.
      • Pros: Mathematically elegant, extensible, and representable as a linear operator.
      • Cons: Expensive; needs an explicit sparse matrix-matrix multiplication.

    Note: I guess the first option (permutation vector), when abstracted as a LinOp, would be equivalent to the permutation matrix and would not lose any elegance, so the way to go is probably to store the permutation as a vector (gko::Array).

  2. Similar to the Preconditionable interface we have for preconditioners used within solvers, it could also make sense to add a Reorderable interface to re-order the matrices as a preprocessing step.

In fact, it is also possible to view the re-ordering as a preconditioner and use it as such, but the demarcation between the two concepts could be lost here.

  3. In general, when you re-order a sparse matrix involved in a solver, you would re-order the right-hand side as well. How this could be implemented efficiently also needs to be discussed.

  4. Another aspect that probably needs to be considered is whether there could be some reason to stack re-orderings. In that case, we would need to think about some kind of composition of re-orderings.

A simple example of the METIS fill-reduce implementation can be found on the metis-reorder branch. It is a work in progress.
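To make the trade-off in point 1 concrete, here is a small standalone C++ sketch (plain standard-library types, not Ginkgo's API; all names are hypothetical) showing that applying a permutation vector row-wise gives the same result as multiplying by the explicit permutation matrix P with P[i][perm[i]] = 1:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Option 1: apply a permutation vector directly. Row i of the result is
// row perm[i] of the input, i.e. result = P * A with P[i][perm[i]] = 1.
std::vector<std::vector<double>> permute_rows(
    const std::vector<std::vector<double>>& a, const std::vector<int>& perm)
{
    std::vector<std::vector<double>> result(a.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
        result[i] = a[perm[i]];
    }
    return result;
}

// Option 2: build the explicit permutation matrix P ...
std::vector<std::vector<double>> permutation_matrix(const std::vector<int>& perm)
{
    std::vector<std::vector<double>> p(perm.size(),
                                       std::vector<double>(perm.size(), 0.0));
    for (std::size_t i = 0; i < perm.size(); ++i) {
        p[i][perm[i]] = 1.0;
    }
    return p;
}

// ... and compute P * A with a (dense, for illustration) matrix product.
std::vector<std::vector<double>> matmul(const std::vector<std::vector<double>>& x,
                                        const std::vector<std::vector<double>>& y)
{
    std::vector<std::vector<double>> z(x.size(),
                                       std::vector<double>(y[0].size(), 0.0));
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t k = 0; k < y.size(); ++k) {
            for (std::size_t j = 0; j < y[0].size(); ++j) {
                z[i][j] += x[i][k] * y[k][j];
            }
        }
    }
    return z;
}
```

Both routes produce the same permuted matrix; option 1 is a single O(rows) gather, while option 2 additionally pays for an explicit matrix-matrix product, which is the cost argument made above.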

@pratikvn pratikvn added is:help-wanted Need ideas on how to solve this. is:new-feature A request or implementation of a feature that does not exist yet. is:proposal Maybe we should do something this way. is:idea Just a thought - if it's good, it could evolve into a proposal. labels Sep 10, 2019
@yhmtsai (Member) commented Sep 10, 2019

I prefer to use gko::Array. If someone really needs the actual permutation matrix, conversion from the array to Coo is easier than the other direction.
I am not sure that using the permutation in a preconditioner helps the solver.
AFAIK, the permutation is used to reduce the fill-in during the factorization of a sparse matrix.
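The direction of that conversion is indeed the cheap one: a permutation vector maps one-to-one onto the COO triplets of its permutation matrix. A standalone sketch (hypothetical names, plain standard-library types rather than Ginkgo's Coo class):

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Convert a permutation vector into the COO triplets of the corresponding
// permutation matrix P, where P[row][perm[row]] = 1. The reverse direction
// (matrix -> vector) would have to scan and validate the sparsity pattern,
// which is why array -> Coo is the easier conversion.
std::vector<std::tuple<int, int, double>> perm_to_coo(const std::vector<int>& perm)
{
    std::vector<std::tuple<int, int, double>> triplets;
    triplets.reserve(perm.size());
    for (int row = 0; row < static_cast<int>(perm.size()); ++row) {
        triplets.emplace_back(row, perm[row], 1.0);
    }
    return triplets;
}
```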

@pratikvn (Member, Author) commented

I think permutation can help in the solver. For example, the RCM re-ordering should reduce the bandwidth of a sparse matrix, which could be beneficial for generating good preconditioners, e.g. for ParILU or for the block detection in the block Jacobi preconditioner. Of course, whether the benefit of re-ordering outweighs its cost is not yet clear.

@thoasm (Member) commented Sep 11, 2019

I prefer to not expose any gko::Array to the user in any shape or form. If you want to give access to the values of the permutation vector, I would just return a plain pointer, so other kernels could access it:

// The permutation computed by the reordering algorithm.
IndexType *get_permutation() const { ... }

// The inverse permutation computed by the reordering algorithm.
IndexType *get_inverse_permutation() const { ... }

The way you wrote it, you would need to create a new Array and copy the data at every access (I guess you meant a pointer to an Array, but I think that would also be bad).

Currently, everything that a LinOp (Solver, preconditioner, matrix, ...) returns to the user is another LinOp, and breaking that pattern could lead to confusion, which is why I would prefer a new matrix LinOp (maybe matrix::permutation) that would model this (and store the actual permutation vector as a gko::Array).
An example on using it would be:

// A is the matrix that you want to permute
// P is the permutation LinOp, storing the permutation information
auto B = matrix::Csr<>::create(exec);
P->apply(A, B);
// Now, B stores `P A`, which is the permuted matrix A

That would also work well with your proposal of a re-ordering interface (2.), similar to the preconditioner one (all preconditioners are also LinOps). However, we should not use apply by itself there, since it is very slow compared to just using the index array.
Maybe it is a good idea to specify a non-LinOp object, or to just throw, as we usually do, if the input is not what we expect.

@pratikvn (Member, Author) commented

The other option, which is also possible and probably easier to implement, is a Permutable interface on a LinOp, similar to the Transposable interface, which mathematically makes sense to me.

You can check out how it looks, implemented for Dense, in the permute-interface branch.

This currently takes an Array, but I guess it should be possible to have it as a LinOp containing the permutation array, as you suggest.

@thoasm (Member) commented Sep 11, 2019

I would prefer a version that uses CRTP, so it would return the same matrix type as before. That way, you could directly use all of its functionality again without needing a static_cast<actual_type>. Also, I am still not a big fan of using gko::Array, so here is a more C-like approach:

template <typename ResultType, typename IndexType>
class Permutable {
public:
    virtual ~Permutable() = default;

    virtual std::unique_ptr<ResultType> row_permute(
        const IndexType *permutation_indices, size_type size) const = 0;

    virtual std::unique_ptr<ResultType> column_permute(
        const IndexType *permutation_indices, size_type size) const = 0;
};
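To illustrate the CRTP idea, here is a minimal standalone sketch (the ToyDense type and everything around it is hypothetical, not Ginkgo code): because the interface is parameterized on the concrete matrix type, row_permute hands back that type directly and the caller never needs a static_cast.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// CRTP-style interface: ConcreteType is the implementing matrix type,
// so row_permute returns the concrete type rather than a base-class pointer.
template <typename ConcreteType, typename IndexType>
class Permutable {
public:
    virtual ~Permutable() = default;

    virtual std::unique_ptr<ConcreteType> row_permute(
        const IndexType* permutation_indices, std::size_t size) const = 0;
};

// A toy dense matrix implementing the interface for itself.
class ToyDense : public Permutable<ToyDense, int> {
public:
    explicit ToyDense(std::vector<std::vector<double>> rows)
        : rows_(std::move(rows))
    {}

    std::unique_ptr<ToyDense> row_permute(const int* permutation_indices,
                                          std::size_t size) const override
    {
        std::vector<std::vector<double>> permuted(size);
        for (std::size_t i = 0; i < size; ++i) {
            permuted[i] = rows_[permutation_indices[i]];
        }
        return std::make_unique<ToyDense>(std::move(permuted));
    }

    const std::vector<std::vector<double>>& rows() const { return rows_; }

private:
    std::vector<std::vector<double>> rows_;
};
```

The call site can immediately use ToyDense members on the result, which is the ergonomic benefit the CRTP variant buys over a plain base-class return type.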

@klausbu commented Sep 3, 2020

I am not an expert on bandwidth reduction, but the GPS (Gibbs, Poole and Stockmeyer) algorithm might be a faster option:

"Initially the most widely used of the heuristics was the Reverse Cuthill-McKee algorithm (RCM), a modification by Alan George of the original algorithm developed by Cuthill and McKee in 1969. In 1976 the GPS algorithm (named after its developers Gibbs, Poole and Stockmeyer) was proposed. On average GPS and RCM find solutions of relative equal quality, but GPS is up to ten times as fast."

Source: http://ciprian-zavoianu.blogspot.com/2009/01/project-bandwidth-reduction.html

@lksriemer lksriemer mentioned this issue Sep 5, 2020
@lksriemer (Contributor) commented

@klausbu Just a quick note concerning the GPS algorithm:

  1. It also uses rooted level structures (at least two, in fact!), and generating one of these typically takes at least 1/3 of the runtime of the re-ordering part of RCM, usually more like 1/2. Depending on the strategy for finding a starting node, which is relatively independent of the RCM/GPS choice, several of these rooted level structures have to be generated. Computing them usually dominates RCM, and I would expect the same for GPS.
  2. The relevant parallel part of the GPS re-ordering should be similar to that of RCM: again, one needs to wait for a level to be partially renumbered before the next level can be started. There are some intricacies there, though.

Based on these two observations, I don't expect GPS to significantly outperform RCM in parallel implementations on large matrices. On the other hand, a spectral pseudo-peripheral node finder might yield significant speedups, but that's future talk.
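For readers unfamiliar with the term: a rooted level structure is simply the partition of a graph's vertices by BFS distance from a chosen root, and computing it is the cost being discussed. A standalone sketch (hypothetical names, adjacency lists as plain vectors):

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Compute the rooted level structure of an undirected graph: levels[k]
// holds all vertices at BFS distance k from the root. Both RCM and GPS
// build (at least) one of these per candidate starting node, which is why
// its cost dominates the overall re-ordering time.
std::vector<std::vector<int>> rooted_level_structure(
    const std::vector<std::vector<int>>& adjacency, int root)
{
    std::vector<int> level(adjacency.size(), -1);
    std::vector<std::vector<int>> levels;
    std::queue<int> frontier;
    level[root] = 0;
    frontier.push(root);
    levels.push_back({root});
    while (!frontier.empty()) {
        int u = frontier.front();
        frontier.pop();
        for (int v : adjacency[u]) {
            if (level[v] == -1) {
                level[v] = level[u] + 1;
                if (static_cast<int>(levels.size()) <= level[v]) {
                    levels.emplace_back();
                }
                levels[level[v]].push_back(v);
                frontier.push(v);
            }
        }
    }
    return levels;
}
```

The sequential dependency visible here (a level can only be filled after the previous one is discovered) is exactly the constraint on parallelism mentioned in point 2 above.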

@upsj upsj added this to the Ginkgo 1.5.0 milestone May 7, 2021
@upsj (Member) commented Jul 13, 2022

To reduce duplicate discussions, I'll close this one, but it is referenced in #940.

@upsj upsj closed this as completed Jul 13, 2022