
Improve shape estimator performance for vehicles #1019

Closed
3 tasks done
VRichardJP opened this issue Jun 1, 2022 · 13 comments · Fixed by #1292
Labels
component:perception Advanced sensor data processing and environment understanding. (auto-assigned) type:new-feature New functionalities or additions, feature requests.

Comments

@VRichardJP
Contributor

VRichardJP commented Jun 1, 2022

Checklist

  • I've read the contribution guidelines.
  • I've searched other issues and no duplicate issues were found.
  • I've agreed with the maintainers that I can plan this task.

Description

As I have reported in another discussion, I face a significant performance issue with my setup. I have identified a couple of nodes that are unexpectedly slow on my machine. The shape_estimator is one of them.

I use the following test environment: my vehicle is parked and surrounded by 3~4 other vehicles.
[image: test_env]

With this environment, the ShapeEstimationNode::callback() takes around 120ms to process each apollo/labeled_clusters message:
[image: base_result]

For just a couple of vehicles, this seems way too slow.

The shape_estimator node implements the algorithm from the paper "Efficient L-Shape Fitting for Vehicle Detection Using Laser Scanners". Interestingly, the "Algorithm 2" reproduced from the original paper is a very naive and inefficient minimum finder:

  // Paper : Algo.2 Search-Based Rectangle Fitting
  std::vector<std::pair<float /*theta*/, float /*q*/>> Q;
  constexpr float angle_resolution = M_PI / 180.0;
  for (float theta = min_angle; theta <= max_angle + epsilon; theta += angle_resolution) {
    Eigen::Vector2f e_1;
    e_1 << std::cos(theta), std::sin(theta);  // col.3, Algo.2
    Eigen::Vector2f e_2;
    e_2 << -std::sin(theta), std::cos(theta);  // col.4, Algo.2
    std::vector<float> C_1;                    // col.5, Algo.2
    std::vector<float> C_2;                    // col.6, Algo.2
    for (const auto & point : cluster) {
      C_1.push_back(point.x * e_1.x() + point.y * e_1.y());
      C_2.push_back(point.x * e_2.x() + point.y * e_2.y());
    }
    float q = calcClosenessCriterion(C_1, C_2);  // col.7, Algo.2
    Q.push_back(std::make_pair(theta, q));       // col.8, Algo.2
  }
  float theta_star{0.0};  // col.10, Algo.2
  float max_q = 0.0;
  for (size_t i = 0; i < Q.size(); ++i) {
    if (max_q < Q.at(i).second || i == 0) {
      max_q = Q.at(i).second;
      theta_star = Q.at(i).first;
    }
  }

In order to find the maximum of the calcClosenessCriterion function, the algorithm simply tries all possible values between min_angle and max_angle (with the fixed step angle_resolution). With the current implementation, the function is evaluated 90 times (from 0 to 90 degrees with a 1-degree step). A much better approach would be to use an actual minimum finder algorithm instead, especially considering how smooth the closeness function looks (in red below):

[image: Screenshot from 2022-06-01 11-26-33]
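
For illustration, here is a minimal sketch of what such a minimum finder could look like, using a plain golden-section search over theta. This is only an illustrative sketch, not the change proposed below: it assumes the negated closeness criterion is unimodal on the search interval, and closeness_func stands for a lambda wrapping calcClosenessCriterion with a negated result (as in the modified code shown later).

  #include <functional>
  #include <utility>

  // Golden-section search: finds the minimum of a unimodal function f on [lo, hi].
  // Each iteration shrinks the bracket by ~0.618, so fewer than 20 evaluations reach
  // ~1e-3 rad precision on a 90-degree interval, versus 90 for the 1-degree scan.
  std::pair<float, float> golden_section_minimize(
    const std::function<float(float)> & f, float lo, float hi, float tol)
  {
    constexpr float inv_phi = 0.6180339887f;  // 1 / golden ratio
    float a = lo, b = hi;
    float c = b - inv_phi * (b - a);
    float d = a + inv_phi * (b - a);
    float fc = f(c), fd = f(d);
    while (b - a > tol) {
      if (fc < fd) {
        // Minimum lies in [a, d]: shrink from the right, reuse c as the new d.
        b = d; d = c; fd = fc;
        c = b - inv_phi * (b - a);
        fc = f(c);
      } else {
        // Minimum lies in [c, b]: shrink from the left, reuse d as the new c.
        a = c; c = d; fc = fd;
        d = a + inv_phi * (b - a);
        fd = f(d);
      }
    }
    const float x = 0.5f * (a + b);
    return {x, f(x)};
  }

  // Usage sketch:
  //   const auto [theta_star, neg_q] = golden_section_minimize(closeness_func, min_angle, max_angle, 1e-3f);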

Note: this would make autowarefoundation/autoware#384 obsolete

Purpose

Improve shape_estimator performance for vehicle obstacles

Possible approaches

As explained above, a first step would be to use a proper minimum finder function.

For example, I tested brent_find_minima from the boost library. There may be better options; I am no expert, I just picked what looked the simplest to use without extra dependencies.

I modified the Algo.2 section like so:

  // Paper : Algo.2 Search-Based Rectangle Fitting
  // (brent_find_minima requires #include <boost/math/tools/minima.hpp>)
  auto closeness_func = [&](float theta) {
    Eigen::Vector2f e_1;
    e_1 << std::cos(theta), std::sin(theta);  // col.3, Algo.2
    Eigen::Vector2f e_2;
    e_2 << -std::sin(theta), std::cos(theta);  // col.4, Algo.2
    std::vector<float> C_1;                    // col.5, Algo.2
    std::vector<float> C_2;                    // col.6, Algo.2
    for (const auto & point : cluster) {
      C_1.push_back(point.x * e_1.x() + point.y * e_1.y());
      C_2.push_back(point.x * e_2.x() + point.y * e_2.y());
    }
    float q = calcClosenessCriterion(C_1, C_2);
    return -q; // so that brent_find_minima actually finds the maximum 
  };

  int bits = 4; // max is std::numeric_limits<float>::digits/2 = 12
  boost::uintmax_t max_iter = 20;
  std::pair<float, float> min = boost::math::tools::brent_find_minima(closeness_func, min_angle, max_angle+epsilon, bits, max_iter);
  float theta_star = min.first;
  
  const float sin_theta_star = std::sin(theta_star);
  const float cos_theta_star = std::cos(theta_star);

With bits = 6, processing time went down to ~40ms:
[image: brent_6_digit]

And with bits = 4, processing time went down to ~30ms:
[image: brent_4_bit]

Even with bits = 4, I think the precision should be better than that of the current implementation (whose angle_resolution is 0.01745 rad), yet it is 4 times faster on my machine.
From the Rviz view, bounding boxes don't look any worse than with the default algorithm.
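
For reference, here is a self-contained toy example of the Boost call (assuming the Boost.Math headers are available); the quadratic objective is only a stand-in for the negated closeness criterion:

  #include <boost/math/tools/minima.hpp>
  #include <iostream>
  #include <utility>

  int main()
  {
    // Minimize f(x) = (x - 1)^2 on [0, 3]; the minimum is at x = 1.
    auto f = [](double x) { return (x - 1.0) * (x - 1.0); };

    const int bits = 20;             // requested precision (capped internally by the type)
    boost::uintmax_t max_iter = 50;  // in/out: updated with the iterations actually performed
    const std::pair<double, double> r =
      boost::math::tools::brent_find_minima(f, 0.0, 3.0, bits, max_iter);

    std::cout << "x* = " << r.first << ", f(x*) = " << r.second
              << ", iterations = " << max_iter << std::endl;
  }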

Definition of done

  • shape_estimator can perform in "real-time" in an environment with several vehicle obstacles around the ego vehicle.
@BonoloAWF BonoloAWF added priority:high High urgency and importance. component:perception Advanced sensor data processing and environment understanding. (auto-assigned) labels Jun 1, 2022
@yukkysaito
Contributor

@VRichardJP Thank you for the good proposal!
I'd like to measure on my PC. Could you share a rosbag including /perception/object_recognition/detection/apollo/labeled_clusters?

@VRichardJP
Contributor Author

@yukkysaito I am sorry, I cannot easily share such data.
But I think it should be possible to use the demo rosbag as a reference, since many vehicles are visible in it.

@mitsudome-r
Member

@VRichardJP This looks like a good improvement. Would you be able to create a PR for the proposal (maybe after @yukkysaito confirmed the feature)?

@VRichardJP
Contributor Author

No problem, I will wait for @yukkysaito to confirm the change does not impact detection quality

@mitsudome-r
Member

@angry-crab has volunteered to do further investigation.

@angry-crab
Contributor

After a brief investigation of Brent's method and other numerical optimization algorithms, one concern is that Brent's method is not guaranteed to converge to the global optimum, which means there might be some potential issues left undetected. I think we could add a parameter to the shape_estimation node so that users can select among different optimization algorithms. I'll also test other numerical methods for this issue.
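
As a rough illustration of what such a switch could look like (the parameter name and values below are hypothetical, not existing shape_estimation parameters):

  #include <stdexcept>
  #include <string>

  // Hypothetical selector for the theta-search strategy, driven by a node parameter.
  enum class OptimizerType { ExhaustiveSearch, Brent, GoldenSection };

  OptimizerType parseOptimizerType(const std::string & name)
  {
    if (name == "exhaustive_search") return OptimizerType::ExhaustiveSearch;
    if (name == "brent") return OptimizerType::Brent;
    if (name == "golden_section") return OptimizerType::GoldenSection;
    throw std::invalid_argument("unknown optimizer type: " + name);
  }

  // In the node constructor (sketch), the parameter could be declared with a safe default:
  //   const auto type = parseOptimizerType(
  //     declare_parameter<std::string>("optimizer_type", "exhaustive_search"));
  // and the fitting code would dispatch on `type` to the matching theta search.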

@VRichardJP
Contributor Author

I think being able to choose between multiple implementations would be great.

According to the boost documentation and Wikipedia, Brent's method should at least converge to a local minimum. It would be interesting to plot -calcClosenessCriterion, but from the paper it really looks like the function has only one minimum, except perhaps when the minimum is exactly at 0/90 degrees, in which case the function behavior may be strange (maybe?). But even if that were an issue, it should be possible to fix it by using a slightly larger search interval (e.g. [0, 100]).
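
As a quick sanity check, a debug sketch along these lines could dump the curve for offline plotting (closeness_func being the negated-criterion lambda from the proposal above):

  #include <cmath>
  #include <cstdio>
  #include <functional>

  // Debug sketch: sample -calcClosenessCriterion over [0, pi/2] and print CSV,
  // so the curve can be plotted (e.g. with gnuplot) to check whether it is unimodal.
  void dumpClosenessCurve(const std::function<float(float)> & closeness_func)
  {
    constexpr int num_samples = 360;  // 0.25-degree resolution
    for (int i = 0; i <= num_samples; ++i) {
      const float theta = static_cast<float>(M_PI_2) * i / num_samples;
      // closeness_func already returns -q, i.e. the value Brent's method minimizes.
      std::printf("%f,%f\n", theta, closeness_func(theta));
    }
  }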

@xmfcx
Contributor

xmfcx commented Jun 28, 2022

OpenCV has this: https://theailearner.com/2020/11/03/opencv-minimum-area-rectangle/

(Just for minimum area rectangle criteria)
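
For reference, a minimal sketch of how cv::minAreaRect could be applied to a cluster's 2D footprint (minimum-area criterion only, not the paper's closeness criterion; ClusterPoint is a stand-in for the actual point type):

  #include <opencv2/imgproc.hpp>
  #include <vector>

  // Stand-in for the actual cluster point type used by shape_estimation.
  struct ClusterPoint { float x; float y; float z; };

  // Fit a minimum-area rotated rectangle to the cluster's (x, y) footprint.
  cv::RotatedRect fitMinAreaRect(const std::vector<ClusterPoint> & cluster)
  {
    std::vector<cv::Point2f> footprint;
    footprint.reserve(cluster.size());
    for (const auto & p : cluster) {
      footprint.emplace_back(p.x, p.y);
    }
    // RotatedRect gives center, (width, height) and rotation angle in degrees.
    return cv::minAreaRect(footprint);
  }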

@angry-crab
Contributor

Due to a broken Docker image (CUDA), testing of this issue will be postponed.

@VRichardJP
Contributor Author

I feel a bit stupid, but I just realized the performance numbers reported in #1019 (comment) were measured on a debug build...

Using an optimized release build and the same test environment as before, I get around ~4 ms for the current implementation (or ~1 ms per box), and ~1 ms with Brent's method (with bits = 6). So the second method is still around 4 times faster than the current implementation, with similar precision.

That said, there is no real performance issue in the first place. Even in a crowded parking area the cost would still be negligible. I would understand if you don't want to introduce a convoluted algorithm just to save a few ms...

@angry-crab
Contributor

@VRichardJP Thank you very much for your efforts. I think 4 times faster is a great improvement.
@mitsudome-r Do you think we should include this improvement?

@mitsudome-r
Member

I think we can still include this improvement in Autoware.Universe to give more options to the users.

@angry-crab angry-crab linked a pull request Jul 8, 2022 that will close this issue
@angry-crab
Contributor

I've created a draft PR for this issue. Please feel free to leave any comments.

@mitsudome-r mitsudome-r added type:new-feature New functionalities or additions, feature requests. and removed priority:high High urgency and importance. labels Jul 12, 2022
satoshi-ota pushed a commit to satoshi-ota/autoware.universe that referenced this issue Nov 29, 2023