
Perhaps we can retire linalg_ext.reverse op? #16060

Closed
hanhanW opened this issue Jan 5, 2024 · 16 comments · Fixed by #17866
Labels: good first issue 🌱 (Good for newcomers); integrations/stablehlo (StableHLO (JAX/TensorFlow/etc) import and conversion)

Comments

@hanhanW (Contributor) commented Jan 5, 2024

I found that the tosa.reverse lowering does not use linalg_ext ops; it uses linalg.index and tensor.extract ops to gather the elements. We can use the same pattern in the stablehlo -> iree input conversion. If we do so, the linalg_ext.reverse op won't have any users, and we can retire the op.
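That gather pattern can be sketched in a few lines of Python (illustrative only, not IREE code): each output element is read from the input at a flipped index along the reversed dimension, which is exactly what the emitted linalg.index + tensor.extract pair computes.

```python
def reverse_dim1(x):
    # Gather-style reverse along dimension 1, mirroring the pattern the
    # tosa.reverse lowering emits: linalg.index yields (i, j) and
    # tensor.extract reads the flipped index x[i][cols - 1 - j].
    cols = len(x[0])
    return [[row[cols - 1 - j] for j in range(cols)] for row in x]

print(reverse_dim1([[1, 2, 3, 4, 5],
                    [6, 7, 8, 9, 10],
                    [11, 12, 13, 14, 15]]))
# → [[5, 4, 3, 2, 1], [10, 9, 8, 7, 6], [15, 14, 13, 12, 11]]
```

Because the body is a plain elementwise read, it can in principle fuse with neighboring linalg ops, which is the motivation for the switch.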

@harrisonGPU:

Hello @hanhanW,
I really want to learn about this issue, so I am trying to fix it. Here is my code, but I'm not sure how to verify that it is correct. Could you please give me some suggestions? I am eager to implement them and resolve this: #16069

@MaheshRavishankar (Contributor):

> Hello @hanhanW, I really want to learn about this issue, so I am trying to fix it. […]

@TSWorld1314 sorry for the confusion. @hanhanW, what is the issue with using reverse? Is it that it doesn't get fused with things? That's valid. Can we then verify that using tensor.extract for the reverse operation still gets it vectorized? If so, then we can maybe drop the reverse operation.

@hanhanW (Contributor, Author) commented Jan 11, 2024

I mainly want to start a discussion about whether we want to use linalg_ext.reverse or not. I observed that the torch frontend and the stablehlo frontend have different lowering behavior, so I think we can revisit the need for linalg_ext.reverse. I can see that the op stops fusion and vectorization, which might be improvable. I think we can vectorize tensor.extract and use it in IREE, but we will need someone to verify that. (I could do it, but I can't find the cycles, because this is fairly low priority at the moment.) If it works better, we can consider dropping the linalg_ext.reverse op, which would let us maintain fewer LinalgExt ops.

@MaheshRavishankar (Contributor):

> I mainly want to start a discussion about whether we want to use linalg_ext.reverse or not. […]

@TSWorld1314 maybe you can help look into it

@harrisonGPU:

> I mainly want to start a discussion about whether we want to use linalg_ext.reverse or not. […]
>
> @TSWorld1314 maybe you can help look into it

@MaheshRavishankar @hanhanW, hello! I'm truly excited about tackling this verification task, but I have a few questions. Could you provide me with an example of a stablehlo frontend input? Writing stablehlo isn't my strong suit yet, but I'm eager to learn. Also, I'm unsure whether my code is correct; it might be necessary to compare the generated IR for verification. How can I use the iree tools to generate stablehlo IR? I'm thrilled to have this opportunity to work on this task and help resolve this issue. Any advice on learning IREE would be greatly appreciated!

@hanhanW (Contributor, Author) commented Jan 11, 2024

@TSWorld1314 there are good introductions on how to contribute to IREE and on testing. If you want to learn more about IREE before jumping into the task, please take a look at https://iree.dev/developers/

Thanks

@harrisonGPU:

> https://iree.dev/developers/

@hanhanW I am really thankful for your reply. I will learn it quickly and then try to write an example for verification. I am really grateful to you!

@hanhanW (Contributor, Author) commented Jan 11, 2024

If you are looking for an example input IR, here is one: https://github.com/openxla/iree/blob/e2e126ce061454ad71bc2f5c1c08b1efc982f4cd/compiler/plugins/input/StableHLO/stablehlo-iree/Conversion/test/stablehlo_to_linalg_ext.mlir#L489-L502

But it looks like you are not familiar with IREE and testing yet, so I suggest studying https://iree.dev first.

@harrisonGPU:

> If you are looking for an example input IR, here is one: https://github.com/openxla/iree/blob/e2e126ce061454ad71bc2f5c1c08b1efc982f4cd/compiler/plugins/input/StableHLO/stablehlo-iree/Conversion/test/stablehlo_to_linalg_ext.mlir#L489-L502
>
> But it looks like you are not familiar with IREE and testing, so I suggest studying https://iree.dev first.

I'm truly grateful for your help. I'll start learning it as soon as possible! Thanks a lot, hanhanW

@MaheshRavishankar (Contributor):

If we can confirm that this operation is vectorized on the CPU backend, then we can drop it. One way to confirm is to compile the above example with

iree-compile --iree-hal-target-backends=llvm-cpu -o output.vmfb --mlir-print-ir-after-all --mlir-disable-threading test.mlir 2> dump.mlir

The --mlir-print-ir-after-all flag prints the IR after each pass (above, it is redirected to dump.mlir). There, if you look at the IR after LLVMCPULowerExecutableTargetPass (or you can post it here) and it has vector.* instructions, it means the backend can vectorize using tensor.extract. Then we can drop the operation.

Actually, as I write this, linalg_ext.reverse is not vectorized anyway. So maybe we can drop the operation even if the replacement doesn't vectorize, but it is good to check first.

harrisonGPU added a commit to harrisonGPU/iree that referenced this issue Jan 22, 2024
@harrisonGPU commented Jan 22, 2024

Hello @hanhanW @MaheshRavishankar,
I'm very sorry for not completing the task on time, but I have really dedicated a lot of time to learning IREE. This commit has some issues. When I use the following command:

./iree-opt --split-input-file --iree-stablehlo-to-linalg-ext /home/Projects/IREE/iree/input/input.mlir -o /home/Projects/IREE/iree/input/output1.mlir

it results in an error log:

root@1ef1351cbf8e:/home/Projects/IREE/iree-build/tools# ./iree-opt --split-input-file --iree-stablehlo-to-linalg-ext $INPUT/input.mlir -o $INPUT/output.mlir
iree-opt: iree/third_party/llvm-project/llvm/include/llvm/Support/Casting.h:566: decltype(auto) llvm::cast(const From &) [To = mlir::TensorType, From = mlir::Type]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
Please report issues to https://github.com/openxla/iree/issues and include the crash backtrace.
Stack dump:
0.	Program arguments: ./iree-opt --split-input-file --iree-stablehlo-to-linalg-ext /home/Projects/IREE/iree/input/input.mlir -o /home/Projects/IREE/iree/input/output1.mli
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  iree-opt           0x000055765cf7b0eb __interceptor_backtrace + 91
1  libIREECompiler.so 0x00007f412d632458 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 712
2  libIREECompiler.so 0x00007f412d63383b
3  libIREECompiler.so 0x00007f412d62be81 llvm::sys::RunSignalHandlers() + 593
4  libIREECompiler.so 0x00007f412d635177
5  libc.so.6          0x00007f410fe16520
6  libc.so.6          0x00007f410fe6a9fc pthread_kill + 300
7  libc.so.6          0x00007f410fe16476 raise + 22
8  libc.so.6          0x00007f410fdfc7f3 abort + 211
9  libc.so.6          0x00007f410fdfc71b
10 libc.so.6          0x00007f410fe0de96
11 libIREECompiler.so 0x00007f412fd6760c
12 libIREECompiler.so 0x00007f414ff855ab mlir::tensor::ExtractOp::inferReturnTypes(mlir::MLIRContext*, std::optional<mlir::Location>, mlir::ValueRange, mlir::DictionaryAttr, mlir::OpaqueProperties, mlir::RegionRange, llvm::SmallVectorImpl<mlir::Type>&) + 1131
13 libIREECompiler.so 0x00007f414ff84f49 mlir::tensor::ExtractOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Value, mlir::ValueRange) + 1929
14 libIREECompiler.so 0x00007f41302c754b
15 libIREECompiler.so 0x00007f413063bb6d
16 libIREECompiler.so 0x00007f413063acc8
17 libIREECompiler.so 0x00007f4131ca375f
18 libIREECompiler.so 0x00007f414dc0febd
19 libIREECompiler.so 0x00007f414dc0e69e mlir::linalg::GenericOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::TypeRange, mlir::ValueRange, mlir::ValueRange, mlir::ArrayAttr, mlir::ArrayAttr, mlir::StringAttr, mlir::StringAttr, llvm::function_ref<void (mlir::OpBuilder&, mlir::Location, mlir::ValueRange)>, llvm::ArrayRef<mlir::NamedAttribute>) + 3134
20 libIREECompiler.so 0x00007f414dc10e0b mlir::linalg::GenericOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::TypeRange, mlir::ValueRange, mlir::ValueRange, llvm::ArrayRef<mlir::AffineMap>, llvm::ArrayRef<mlir::utils::IteratorType>, llvm::StringRef, llvm::StringRef, llvm::function_ref<void (mlir::OpBuilder&, mlir::Location, mlir::ValueRange)>, llvm::ArrayRef<mlir::NamedAttribute>) + 3595
21 libIREECompiler.so 0x00007f414dc13791 mlir::linalg::GenericOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::TypeRange, mlir::ValueRange, mlir::ValueRange, llvm::ArrayRef<mlir::AffineMap>, llvm::ArrayRef<mlir::utils::IteratorType>, llvm::function_ref<void (mlir::OpBuilder&, mlir::Location, mlir::ValueRange)>, llvm::ArrayRef<mlir::NamedAttribute>) + 2737
22 libIREECompiler.so 0x00007f413063a726
23 libIREECompiler.so 0x00007f4130639994
24 libIREECompiler.so 0x00007f4130296705
25 libIREECompiler.so 0x00007f414d1bc774 mlir::ConversionPattern::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&) const + 2228
26 libIREECompiler.so 0x00007f414d361712
27 libIREECompiler.so 0x00007f414d360d95
28 libIREECompiler.so 0x00007f412d3341a8
29 libIREECompiler.so 0x00007f414d364efe
30 libIREECompiler.so 0x00007f414d35b26d mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<mlir::LogicalResult (mlir::Pattern const&)>) + 6669
31 libIREECompiler.so 0x00007f414d1e2d86
32 libIREECompiler.so 0x00007f414d1e07f1
33 libIREECompiler.so 0x00007f414d1dd334
34 libIREECompiler.so 0x00007f414d1c69d1
35 libIREECompiler.so 0x00007f414d1c60af mlir::applyPartialConversion(llvm::ArrayRef<mlir::Operation*>, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, llvm::DenseSet<mlir::Operation*, llvm::DenseMapInfo<mlir::Operation*, void>>*) + 863
36 libIREECompiler.so 0x00007f414d1c7196 mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, llvm::DenseSet<mlir::Operation*, llvm::DenseMapInfo<mlir::Operation*, void>>*) + 486
37 libIREECompiler.so 0x00007f413061749f
38 libIREECompiler.so 0x00007f412e45d901
39 libIREECompiler.so 0x00007f412e45d755
40 libIREECompiler.so 0x00007f412d3341a8
41 libIREECompiler.so 0x00007f412e4686ce
42 libIREECompiler.so 0x00007f412e448be5 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 5701
43 libIREECompiler.so 0x00007f412e44a52b mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 1931
44 libIREECompiler.so 0x00007f412e46221e
45 libIREECompiler.so 0x00007f412e460e59
46 libIREECompiler.so 0x00007f412e44f61a
47 libIREECompiler.so 0x00007f412e44d53d mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool) + 4877
48 libIREECompiler.so 0x00007f412e44c207 mlir::detail::OpToOpPassAdaptor::runOnOperation(bool) + 71
49 libIREECompiler.so 0x00007f412e45d85d
50 libIREECompiler.so 0x00007f412e45d755
51 libIREECompiler.so 0x00007f412d3341a8
52 libIREECompiler.so 0x00007f412e4686ce
53 libIREECompiler.so 0x00007f412e448be5 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 5701
54 libIREECompiler.so 0x00007f412e44a52b mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 1931
55 libIREECompiler.so 0x00007f412e451666 mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) + 598
56 libIREECompiler.so 0x00007f412e4511d7 mlir::PassManager::run(mlir::Operation*) + 4503
57 libIREECompiler.so 0x00007f412e3f0f5f
58 libIREECompiler.so 0x00007f412e3efea4
59 libIREECompiler.so 0x00007f412e3ef6f8
60 libIREECompiler.so 0x00007f412e3ef433
61 libIREECompiler.so 0x00007f412e43168f
62 libIREECompiler.so 0x00007f412e431175
63 libIREECompiler.so 0x00007f412e430627
64 libIREECompiler.so 0x00007f412e43031b
65 libIREECompiler.so 0x00007f412e42fd1b mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool) + 5915
66 libIREECompiler.so 0x00007f412e3e5a15 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) + 1253
67 libIREECompiler.so 0x00007f412d20bcd3
68 libIREECompiler.so 0x00007f412d20a6fe ireeOptRunMain + 494
69 iree-opt           0x000055765cfff862
70 libc.so.6          0x00007f410fdfdd90
71 libc.so.6          0x00007f410fdfde40 __libc_start_main + 128
72 iree-opt           0x000055765cf3f5b5
Aborted (core dumped)

My input MLIR is:

func.func @reverse_dim1(%arg0: tensor<3x5xi32>) -> tensor<3x5xi32> {
  %0 = "stablehlo.reverse"(%arg0) {
    dimensions = array<i64: 1>
  } : (tensor<3x5xi32>) -> tensor<3x5xi32>
  return %0 : tensor<3x5xi32>
}

And I hope to replace IREE::LinalgExt::ReverseOp.
I've been putting my utmost effort into resolving and debugging this issue, yet I haven't been able to identify its underlying cause. Could you offer some guidance? I'm eager to learn and apply your suggestions. Any assistance you could provide would be greatly appreciated!

@hanhanW (Contributor, Author) commented Jan 22, 2024

I think you can compare your implementation with https://github.com/llvm/llvm-project/blob/a9c5bddc8f18926bac6dc224144a32512207bd38/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp#L1736-L1792

> 12 libIREECompiler.so 0x00007f414ff855ab mlir::tensor::ExtractOp::inferReturnTypes(mlir::MLIRContext*, std::optional<mlir::Location>, mlir::ValueRange, mlir::DictionaryAttr, mlir::OpaqueProperties, mlir::RegionRange, llvm::SmallVectorImpl<mlir::Type>&) + 1131
> 13 libIREECompiler.so 0x00007f414ff84f49 mlir::tensor::ExtractOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Value, mlir::ValueRange) + 1929

The above log indicates that it crashes in the op builder. So there is a bug at line 475:

Value extractedElement = nestedBuilder.create<tensor::ExtractOp>(nestedLoc, args[0], indices);

args are scalars, not the input. You cannot extract an element with indices from a scalar type. The extract op should take a tensor input (which is input in the other implementation). It probably should be adaptor.getOperands()[0] in your implementation.
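To paraphrase the bug: inside the linalg.generic region, args[0] is the scalar element already loaded for the current iteration, so there is nothing left to index into; the extract has to read from the tensor operand instead. A Python analogy (illustrative only, not the MLIR C++ API):

```python
def lower_reverse(input_tensor):
    # Analogy for the crash: in the loop body, scalar_arg plays the role
    # of args[0] -- it is a single element, so scalar_arg[...] would raise
    # TypeError, just as tensor.extract on a scalar type hits the cast assert.
    cols = len(input_tensor[0])
    out = []
    for i, row in enumerate(input_tensor):
        out.append([])
        for j, scalar_arg in enumerate(row):
            # Fix analog: index the whole operand (like adaptor.getOperands()[0]),
            # not the per-iteration scalar.
            out[i].append(input_tensor[i][cols - 1 - j])
    return out

print(lower_reverse([[1, 2], [3, 4]]))  # → [[2, 1], [4, 3]]
```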

harrisonGPU added a commit to harrisonGPU/iree that referenced this issue Jan 23, 2024
harrisonGPU added a commit to harrisonGPU/iree that referenced this issue Jan 23, 2024
@harrisonGPU:

Hello @hanhanW,
I am really thankful for your help. I have already implemented this task with these two commits: commit1 and commit2.
When I use the following command:

./iree-opt --split-input-file --iree-stablehlo-to-linalg-ext /home/Projects/IREE/iree/input/input.mlir -o /home/Projects/IREE/iree/input/output1.mlir

And the input MLIR is:

func.func @reverse_dim1(%arg0: tensor<3x5xi32>) -> tensor<3x5xi32> {
  %0 = "stablehlo.reverse"(%arg0) {
    dimensions = array<i64: 1>
  } : (tensor<3x5xi32>) -> tensor<3x5xi32>
  return %0 : tensor<3x5xi32>
}

This is the output:

#map = affine_map<(d0, d1) -> (d0, d1)>
module {
  func.func @reverse_dim1(%arg0: tensor<3x5xi32>) -> tensor<3x5xi32> {
    %0 = tensor.empty() : tensor<3x5xi32>
    %c0 = arith.constant 0 : index
    %c3 = arith.constant 3 : index
    %c5 = arith.constant 5 : index
    %c1 = arith.constant 1 : index
    %1 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0 : tensor<3x5xi32>) outs(%0 : tensor<3x5xi32>) {
    ^bb0(%in: i32, %out: i32):
      %2 = linalg.index 0 : index
      %3 = linalg.index 1 : index
      %c5_0 = arith.constant 5 : index
      %c1_1 = arith.constant 1 : index
      %4 = arith.subi %c5_0, %c1_1 : index
      %5 = arith.subi %4, %3 : index
      %extracted = tensor.extract %arg0[%2, %5] : tensor<3x5xi32>
      linalg.yield %extracted : i32
    } -> tensor<3x5xi32>
    return %0 : tensor<3x5xi32>
  }
}
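One thing worth double-checking in this output: the function returns %0 (the tensor.empty result) rather than %1 (the linalg.generic result), so as written the computed values are discarded. The generic body itself does implement the reverse; a direct Python transliteration (with the 3x5 bounds from the example hard-coded) behaves as expected:

```python
def generic_body_reverse(arg0):
    rows, cols = 3, 5                        # tensor<3x5xi32>
    out = [[0] * cols for _ in range(rows)]  # %0 = tensor.empty()
    for d0 in range(rows):                   # %2 = linalg.index 0
        for d1 in range(cols):               # %3 = linalg.index 1
            flipped = (cols - 1) - d1        # %4 = subi %c5, %c1; %5 = subi %4, %3
            out[d0][d1] = arg0[d0][flipped]  # tensor.extract %arg0[%2, %5]
    return out                               # corresponds to returning %1, not %0

print(generic_body_reverse([[1, 2, 3, 4, 5],
                            [6, 7, 8, 9, 10],
                            [11, 12, 13, 14, 15]])[0])
# → [5, 4, 3, 2, 1]
```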

Additionally, how can I demonstrate improvements over the previous version with some examples? I'm truly grateful for your assistance!

@hanhanW (Contributor, Author) commented May 15, 2024

@TSWorld1314 there has been no progress for months, so I assume that you don't work on it anymore.

@hanhanW (Contributor, Author) commented May 15, 2024

@lialan I think this is a good starter task. One expectation is to make sure that the op is vectorized. You can verify it with --mlir-print-ir-after-all --mlir-disable-threading and check the IR after the GenericVectorization pass.

You can use https://github.com/iree-org/iree/blob/main/tests/e2e/stablehlo_ops/reverse.mlir as the input for iree-compile.

The command would be something like iree-compile --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu-features=host ~/a.mlir.

@harrisonGPU:

> @TSWorld1314 there has been no progress for months, so I assume that you don't work on it anymore.

Okay, thanks hanhanW.

@lialan lialan linked a pull request Jul 12, 2024 that will close this issue
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this issue Jul 30, 2024
`LinalgExt::ReverseOp` is only lowered from `stablehlo::ReverseOp`. We
can expand `stablehlo::ReverseOp` to a different pattern and retire
`LinalgExt::ReverseOp`.

Fixes iree-org#16060

---------

Signed-off-by: Alan Li <me@alanli.org>
Signed-off-by: Lubo Litchev <lubol@google.com>