
Allow callers to pass in output buffers #685

Open
tnibler opened this issue Sep 10, 2024 · 8 comments

Comments

@tnibler

tnibler commented Sep 10, 2024

I'm finding myself wanting to scale/blur/whatever images into an ndarray Array3. But imageproc functions all allocate their own output, which is suboptimal in many cases, for example when:

  • the user wants to get outputs into some multidimensional array
  • the same operation is performed repeatedly and output buffers could be reused instead of de- and reallocating in a loop
  • the user needs the output in some special buffer handed to them by the kernel or a device like a GPU

I'm not sure adding do_thing_into functions for every function in the library is the best way to do this, and using in/out parameters is cumbersome compared to the current default. So some more generic way to make operations (somewhat) agnostic to where they shove their output might be appropriate.

I'm willing to put some time into this, but given that this would break literally every API in the crate I wanted to ask for some opinions before maybe investigating how this could be done.

@ripytide
Member

ripytide commented Sep 10, 2024

Have you seen the _mut() variants of many of the functions in the crate? They don't allocate, since they take a &mut Image argument. (Maybe not in the latest published version though, since we've had a lot of changes since the last release.)

As for processing data from external sources, I think you should be able to construct an ImageBuffer struct from any &[u8], though.

@tnibler
Author

tnibler commented Sep 11, 2024

Those are in place though (at least the ones I found, like rotate and flip), which is different again. So that makes three versions of every function: the current one returning a new image, _mut in place, and _into with the output buffer passed in. I'm not sure what the standard way of simulating overloaded functions is in Rust, but surely there is a nicer way than polluting the namespace and autocomplete with 3+ versions of every function. What do you think?
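To make the three-variant pattern concrete, here is a sketch with a toy Image type and a hypothetical invert op (none of this is imageproc's actual API):

```rust
struct Image {
    width: u32,
    height: u32,
    data: Vec<u8>,
}

// 3. `_into`: the caller supplies the output buffer, nothing allocates.
fn invert_into(src: &Image, out: &mut Image) {
    assert_eq!(src.data.len(), out.data.len());
    for (d, s) in out.data.iter_mut().zip(&src.data) {
        *d = 255 - *s;
    }
}

// 2. `_mut`: operates in place on a single image.
fn invert_mut(image: &mut Image) {
    for p in image.data.iter_mut() {
        *p = 255 - *p;
    }
}

// 1. The current style: allocates and returns a new image,
//    implemented on top of `_into`.
fn invert(src: &Image) -> Image {
    let mut out = Image {
        width: src.width,
        height: src.height,
        data: vec![0; src.data.len()],
    };
    invert_into(src, &mut out);
    out
}
```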

@ripytide
Member

ripytide commented Sep 11, 2024

See #622 where I had similar thoughts.
There are a few tradeoffs involved in the current method but the new doc macro has helped somewhat in preventing duplicated doc-comments and since the non-mut variant usually calls the mut variant that also prevents code duplication.

@tnibler
Author

tnibler commented Sep 11, 2024

What do you think about something like this?

You would only have to implement op_into (and in-place versions if possible) and get the rest generated automatically.
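One possible shape for that, sketched with a toy Image type (the names ImageOp and apply_into are placeholders, not an agreed design): implementors write only the into-buffer version, and the allocating version is a provided default method.

```rust
struct Image {
    width: u32,
    height: u32,
    data: Vec<u8>,
}

// Implementors only write `apply_into`; the allocating `apply`
// comes for free as a default method.
trait ImageOp {
    fn apply_into(&self, src: &Image, out: &mut Image);

    fn apply(&self, src: &Image) -> Image {
        let mut out = Image {
            width: src.width,
            height: src.height,
            data: vec![0; src.data.len()],
        };
        self.apply_into(src, &mut out);
        out
    }
}

// Example op: binary threshold with a configurable cutoff.
struct Threshold {
    cutoff: u8,
}

impl ImageOp for Threshold {
    fn apply_into(&self, src: &Image, out: &mut Image) {
        for (d, s) in out.data.iter_mut().zip(&src.data) {
            *d = if *s >= self.cutoff { 255 } else { 0 };
        }
    }
}
```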

@ripytide
Member

I think that could work, but some downsides I can think of might be:

  • What about ops like replace that take two images? Would that require another trait, or references into the op struct, which could get messy with lifetime generics (and might clash with the trait declaration's lack of generics)?
  • An extra level of abstraction is needed to build the op struct before calling the operation. That may be confusing or less ergonomic for one-off calls, though it might be more ergonomic for repeated calls (compared to wrapping a regular function call with the same args in a closure).
  • Docs might be worse, since rust-analyzer might show the trait's doc comment rather than the op struct's implementation of it. docs.rs might also suffer, since it buries trait impl docs far down a struct's page.
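On the two-image question, one option is simply a second trait for binary ops rather than storing references in the op struct; a minimal sketch (hypothetical names, toy Image type):

```rust
struct Image {
    width: u32,
    height: u32,
    data: Vec<u8>,
}

// Ops that consume two inputs get their own trait, sidestepping the
// question of holding image references inside the op struct.
trait BinaryImageOp {
    fn apply_into(&self, a: &Image, b: &Image, out: &mut Image);
}

// Example: average two images pixel-wise.
struct BlendHalf;

impl BinaryImageOp for BlendHalf {
    fn apply_into(&self, a: &Image, b: &Image, out: &mut Image) {
        for ((d, x), y) in out.data.iter_mut().zip(&a.data).zip(&b.data) {
            *d = ((*x as u16 + *y as u16) / 2) as u8;
        }
    }
}
```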

@ripytide
Member

ripytide commented Sep 11, 2024

On the other hand, standardizing image ops into traits does make more advanced generic usage possible, such as taking an image op itself as an input, the way higher-order functions do:

fn apply_op_twice<O: ImageOp>(op: &O, image: &mut Image) {
    op.call(image);
    op.call(image);
}

Maybe we could even do both methods and write a proc macro which generates our current non-trait free-standing functions, which internally construct the op struct and then call the struct's op impl. That way we get the best of both worlds and give users the flexibility to choose whichever method they prefer, at the cost of some increased beginner difficulty.
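The free-standing functions such a macro might emit could be thin wrappers like this sketch (toy Image type, hypothetical invert op):

```rust
struct Image {
    width: u32,
    height: u32,
    data: Vec<u8>,
}

trait ImageOp {
    fn apply(&self, src: &Image) -> Image;
}

struct Invert;

impl ImageOp for Invert {
    fn apply(&self, src: &Image) -> Image {
        Image {
            width: src.width,
            height: src.height,
            data: src.data.iter().map(|p| 255 - p).collect(),
        }
    }
}

// What the proc macro would generate: a plain function that builds
// the op struct and delegates, so one-off callers never see the trait.
fn invert(src: &Image) -> Image {
    Invert.apply(src)
}
```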

@tnibler
Author

tnibler commented Sep 12, 2024

Okay, since there is some interest I'll try to come up with some possible design ideas that take into account the more complex cases and the points you mentioned.

Docs might be worse...

I think you'd just put the docs for how a filter works on the struct for it, so they appear at the very top and not in the trait docs.

but might be more ergonomic for repeated calls

More than ergonomics actually, since SomeFilter::new(...) can precompute things like kernels that can then be reused for multiple operations afterwards. So depending on the application, performance could be a tiny bit better (if a kernel is particularly expensive to compute and you're operating on tiny images, I guess; it's probably negligible in most normal cases).
Or, more realistically: an operation needs some temporary buffer of known size, which can now be allocated once and then reused instead of reallocating a new one every iteration.
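A sketch of that kind of reuse (a hypothetical horizontal box blur; the kernel is computed once in new, and the scratch buffer survives across calls):

```rust
struct Image {
    width: u32,
    height: u32,
    data: Vec<f32>,
}

// Hypothetical op struct: the kernel is precomputed once, and the
// scratch buffer is allocated lazily and reused on every call.
struct BoxBlur {
    kernel: Vec<f32>,
    scratch: Vec<f32>,
}

impl BoxBlur {
    fn new(radius: usize) -> Self {
        let n = 2 * radius + 1;
        BoxBlur {
            // Stand-in for a genuinely expensive kernel computation.
            kernel: vec![1.0 / n as f32; n],
            scratch: Vec::new(),
        }
    }

    // Horizontal blur only, clamping at the row edges.
    fn apply_into(&mut self, src: &Image, out: &mut Image) {
        let w = src.width as usize;
        let r = self.kernel.len() / 2;
        // Reuse the scratch allocation instead of reallocating per call.
        self.scratch.clear();
        self.scratch.resize(src.data.len(), 0.0);
        for row in 0..src.height as usize {
            for x in 0..w {
                let mut acc = 0.0;
                for (k, weight) in self.kernel.iter().enumerate() {
                    // Clamp the tap index to the current row's bounds.
                    let xi = (x + k).saturating_sub(r).min(w - 1);
                    acc += *weight * src.data[row * w + xi];
                }
                self.scratch[row * w + x] = acc;
            }
        }
        out.data.copy_from_slice(&self.scratch);
    }
}
```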

@tnibler
Author

tnibler commented Sep 27, 2024

Ok, I have thought about some of these points and hope to flesh them out in a master's thesis, so I'll have some better insights after that hopefully.

The ndarray redesign (rust-ndarray/ndarray#1272) is also something to watch w.r.t. how they handle shapes, types and internal buffer representations (alignment, padding, padding between rows to keep them aligned, etc.). Being able to finely control memory layouts, do operations in place where possible, avoid copying, etc. while composing image processing pipelines would really improve on what OpenCV has to offer here, in terms of potential for optimizing locality, fusing kernels, offloading to accelerators and all that.

Curious what others here think, because I think basically reimplementing OpenCV in the age of projects like PyTorch is a bit of a wasted effort. Declarative approaches, where operations are completely abstracted away from the actual representation in memory so that optimizations like loop blocking and kernel fusion can be applied over the whole pipeline, are the future IMO. I definitely understand if this isn't meant to be a research project though, so excuse my ramblings in that case :D
