Recurrent crashes in goal_planner: segmentation fault linked to freespace_planning_algorithms::AstarSearch deallocation #5154

Open
3 tasks done
kyoichi-sugahara opened this issue Sep 27, 2023 · 23 comments
Labels
component:planning Route planning, decision-making, and navigation. (auto-assigned) status:help-wanted Assistance or contributors needed. status:stale Inactive or outdated issues. (auto-assigned) type:bug Software flaws or errors.

Comments

@kyoichi-sugahara
Contributor

kyoichi-sugahara commented Sep 27, 2023

Checklist

  • I've read the contribution guidelines.
  • I've searched other issues and no duplicate issues were found.
  • I'm convinced that this is not my fault but a bug.

Description

While executing the goal_planner, the program crashes due to a segmentation fault. Based on the stack trace, the issue seems to arise when an std::unordered_map, holding values of type freespace_planning_algorithms::AstarNode, is being deallocated.

goal_planner_issue_5154.webm

Expected behavior

The node should stay alive, and freespace_planning_algorithms::AstarSearch should generate a path that lets the vehicle reach the goal without issues.

Actual behavior

The crash does not reproduce every time a path is generated with freespace_planning_algorithms, but after several repetitions the node eventually crashes.

Here is the stack trace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  __gnu_cxx::new_allocator<std::__detail::_Hash_node<std::pair<unsigned int const, freespace_planning_algorithms::AstarNode>, false> >::deallocate (__t=1, __p=0x7f1cb8926eb0, this=0x7f1ce82e3778) at /usr/include/c++/11/ext/new_allocator.h:132
132       deallocate(_Tp* __p, size_type __t __attribute__ ((__unused__)))
[Current thread is 1 (Thread 0x7f1d32506540 (LWP 669228))]
(gdb) bt
#0  __gnu_cxx::new_allocator<std::__detail::_Hash_node<std::pair<unsigned int const, freespace_planning_algorithms::AstarNode>, false> >::deallocate (__t=1, __p=0x7f1cb8926eb0, this=0x7f1ce82e3778) at /usr/include/c++/11/ext/new_allocator.h:132
#1  std::allocator_traits<std::allocator<std::__detail::_Hash_node<std::pair<unsigned int const, freespace_planning_algorithms::AstarNode>, false> > >::deallocate (__n=1, __p=0x7f1cb8926eb0, __a=...) at /usr/include/c++/11/bits/alloc_traits.h:496
#2  std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<unsigned int const, freespace_planning_algorithms::AstarNode>, false> > >::_M_deallocate_node_ptr (__n=0x7f1cb8926eb0, this=0x7f1ce82e3778)
    at /usr/include/c++/11/bits/hashtable_policy.h:1905
#3  std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<unsigned int const, freespace_planning_algorithms::AstarNode>, false> > >::_M_deallocate_node (__n=0x7f1cb8926eb0, this=0x7f1ce82e3778)
    at /usr/include/c++/11/bits/hashtable_policy.h:1895
#4  std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<unsigned int const, freespace_planning_algorithms::AstarNode>, false> > >::_M_deallocate_nodes (__n=0x7f1cba511170, this=0x7f1ce82e3778)
    at /usr/include/c++/11/bits/hashtable_policy.h:1916
#5  std::_Hashtable<unsigned int, std::pair<unsigned int const, freespace_planning_algorithms::AstarNode>, std::allocator<std::pair<unsigned int const, freespace_planning_algorithms::AstarNode> >, std::__detail::_Select1st, std::equal_to<unsigned int>, std::hash<unsigned int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::clear (this=0x7f1ce82e3778)
    at /usr/include/c++/11/bits/hashtable.h:2320
#6  std::_Hashtable<unsigned int, std::pair<unsigned int const, freespace_planning_algorithms::AstarNode>, std::allocator<std::pair<unsigned int const, freespace_planning_algorithms::AstarNode> >, std::__detail::_Select1st, std::equal_to<unsigned int>, std::hash<unsigned int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::~_Hashtable (this=0x7f1ce82e3778, 
    __in_chrg=<optimized out>) at /usr/include/c++/11/bits/hashtable.h:1532
#7  std::unordered_map<unsigned int, freespace_planning_algorithms::AstarNode, std::hash<unsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, freespace_planning_algorithms::AstarNode> > >::~unordered_map (
    this=0x7f1ce82e3778, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/unordered_map.h:102
#8  freespace_planning_algorithms::AstarSearch::~AstarSearch (this=0x7f1ce82e3550, __in_chrg=<optimized out>)
    at /home/kyoichi-sugahara/workspace/pilot-auto.awf-latest/src/autoware/universe/planning/freespace_planning_algorithms/include/freespace_planning_algorithms/astar_search.hpp:106
#9  freespace_planning_algorithms::AstarSearch::~AstarSearch (this=0x7f1ce82e3550, __in_chrg=<optimized out>)
    at /home/kyoichi-sugahara/workspace/pilot-auto.awf-latest/src/autoware/universe/planning/freespace_planning_algorithms/include/freespace_planning_algorithms/astar_search.hpp:106
#10 0x00007f1d09e33cfb in std::default_delete<freespace_planning_algorithms::AbstractPlanningAlgorithm>::operator() (
    __ptr=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/unique_ptr.h:79
#11 std::unique_ptr<freespace_planning_algorithms::AbstractPlanningAlgorithm, std::default_delete<freespace_planning_algorithms::AbstractPlanningAlgorithm> >::~unique_ptr (this=0x7f1ce805edc8, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/unique_ptr.h:361
#12 behavior_path_planner::FreespacePullOver::~FreespacePullOver (this=0x7f1ce805e990, __in_chrg=<optimized out>)
    at /home/kyoichi-sugahara/workspace/pilot-auto.awf-latest/src/autoware/universe/planning/behavior_path_planner/include/behavior_path_planner/utils/goal_planner/freespace_pull_over.hpp:35
#13 behavior_path_planner::FreespacePullOver::~FreespacePullOver (this=0x7f1ce805e990, __in_chrg=<optimized out>)
    at /home/kyoichi-sugahara/workspace/pilot-auto.awf-latest/src/autoware/universe/planning/behavior_path_planner/include/behavior_path_planner/utils/goal_planner/freespace_pull_over.hpp:35
#14 0x00007f1d09cb5538 in std::default_delete<behavior_path_planner::PullOverPlannerBase>::operator() (
    __ptr=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/unique_ptr.h:79
#15 std::unique_ptr<behavior_path_planner::PullOverPlannerBase, std::default_delete<behavior_path_planner::PullOverPlannerBase> >::~unique_ptr (this=0x7f1ce8055950, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/unique_ptr.h:361
#16 behavior_path_planner::GoalPlannerModule::~GoalPlannerModule (this=0x7f1ce80552a0, __in_chrg=<optimized out>) at /home/kyoichi-sugahara/workspace/pilot-auto.awf-latest/src/autoware/universe/planning/behavior_path_planner/include/behavior_path_planner/scene_module/goal_planner/goal_planner_module.hpp:111
#17 0x00007f1d09cb560d in behavior_path_planner::GoalPlannerModule::~GoalPlannerModule (this=0x7f1ce80552a0, __in_chrg=<optimized out>) at /home/kyoichi-sugahara/workspace/pilot-auto.awf-latest/src/autoware/universe/planning/behavior_path_planner/include/behavior_path_planner/scene_module/goal_planner/goal_planner_module.hpp:111
#18 0x000055da144e6832 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#19 0x00007f1d09b7d605 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f1cf402d5d8, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#20 std::__shared_ptr<behavior_path_planner::SceneModuleInterface, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f1cf402d5d0, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#21 std::shared_ptr<behavior_path_planner::SceneModuleInterface>::~shared_ptr (this=0x7f1cf402d5d0, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr.h:122
#22 std::_Destroy<std::shared_ptr<behavior_path_planner::SceneModuleInterface> > (__pointer=0x7f1cf402d5d0) at /usr/include/c++/11/bits/stl_construct.h:151

Steps to reproduce

Please use the attached lanelet map:
virtual_G_dev_road_shoulder.zip

  1. Place obstacles to recreate a scene that requires a complicated trajectory.
  2. Repeat the following tests:
    a. Set the ego vehicle's initial pose.
    b. Set various goal poses.
  3. After some number of repetitions, the crash described in this issue occurs.

Possible causes

  • The mutex in goal_planner, which runs multi-threaded, may not be working correctly.
  • Memory may be released in the planner manager module while freespace_planning_algorithms is still executing.
  • There might be a standalone bug in Astar.

Additional context

This issue results in an occasional crash of the node, affecting the reliability of the path planning process, and thereby requires prompt attention and resolution.

@kosuke55
Contributor

Thanks for the issue!
Similar problems occurred with both start/goal planner. The logs are a little different, but I think it is the same kind of bug.

It seems to be caused by the fact that open_list_ sometimes contains 0x0.
image

@kyoichi-sugahara
Contributor Author

@NorahXiong
Hello, and sorry for the sudden mention.
I haven't been able to identify the cause yet, and I apologize if this turns out to be unrelated, but could this issue be influenced by this PR? If the change is unrelated, I sincerely apologize for any inconvenience.

@BonoloAWF BonoloAWF added the type:bug Software flaws or errors. label Sep 28, 2023
@kyoichi-sugahara kyoichi-sugahara changed the title Recurrent Crashes in goal_planner: Segmentation Fault Linked to freespace_planning_algorithms::AstarNode Deallocation Recurrent crashes in goal_planner: segmentation fault linked to freespace_planning_algorithms::AstarNode deallocation Sep 29, 2023
@kyoichi-sugahara kyoichi-sugahara changed the title Recurrent crashes in goal_planner: segmentation fault linked to freespace_planning_algorithms::AstarNode deallocation Recurrent crashes in goal_planner: segmentation fault linked to freespace_planning_algorithms::AstarSearch deallocation Sep 29, 2023
@NorahXiong
Contributor

@NorahXiong Hello, and sorry for sudden mention. I haven't been able to identify the cause yet, and I apologize if this turns out to be unrelated, but I was wondering if there might be a possibility that the issue is being influenced by this PR? If this change is found to be unrelated, I sincerely apologize for any inconvenience.

I tried many times, but the crash never happened and I found no clue in the related code. Is there any special step that is not mentioned in the steps to reproduce?
image

@NorahXiong
Contributor

Thanks for the issue! Similar problems occurred with both start/goal planner. The logs are a little different, but I think it is the same kind of bug.

It seems to be caused by the fact that open_list_ sometimes contains 0x0. image

I think the 0x0 elements are in the underlying wrapping data structure (vector) rather than in the queue. It would not be very likely that the empty pointers are pushed into the queue as the pointers have all been visited before being pushed.

@kyoichi-sugahara
Contributor Author

@NorahXiong
Thank you so much for the response.
I tried again and managed to reproduce the problem, although it is very difficult to reproduce.
It does not happen every time.
The situation is:

  • the goal is inside the bus
  • the ego vehicle is stuck
  • the current pose is changed
goal_planner_node_die.webm

@kosuke55
Contributor

kosuke55 commented Oct 2, 2023

@NorahXiong
Thanks for the reply and for trying!
In my case, I was able to reproduce it by setting the goal many times.

issue_5154-2023-10-02_22.37.17_mini.mp4

@kosuke55 kosuke55 added the component:planning Route planning, decision-making, and navigation. (auto-assigned) label Oct 2, 2023
@kyoichi-sugahara kyoichi-sugahara added the status:help-wanted Assistance or contributors needed. label Oct 3, 2023

stale bot commented Dec 2, 2023

This pull request has been automatically marked as stale because it has not had recent activity.

@stale stale bot added the status:stale Inactive or outdated issues. (auto-assigned) label Dec 2, 2023
@kosuke55
Contributor

@NorahXiong
We have not been able to fix this issue. We are sorry to bother you, but could you please try to reproduce it? 🙏

@stale stale bot removed the status:stale Inactive or outdated issues. (auto-assigned) label Dec 14, 2023
@kosuke55
Contributor

@VRichardJP
I think you are familiar with memory issues, so I would appreciate any advice you could give on this. 🙇

@VRichardJP
Contributor

It looks like a concurrency issue, as the program seems to crash at different places:

Did you try to run the program with valgrind? e.g. with launch-prefix="gnome-terminal -- valgrind --tool=memcheck --leak-check=yes "

@kosuke55
Contributor

kosuke55 commented Dec 14, 2023

@VRichardJP
Thanks for the advice. We have not used valgrind yet, so we will try it!

@VRichardJP
Contributor

VRichardJP commented Dec 14, 2023

I am not sure I totally understand how the modules are running, but the StartPlannerModule creates a new callback queue for the FreespacePullOut module here:

freespace_planner_timer_cb_group_ =
  node.create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
freespace_planner_timer_ = rclcpp::create_timer(
  &node, clock_, freespace_planner_period_ns,
  std::bind(&StartPlannerModule::onFreespacePlannerTimer, this),
  freespace_planner_timer_cb_group_);

Is my understanding correct?

  • The behavior path planner manager creates and destroys planning modules at runtime depending on the situation.
  • The manager calls each module's run() function one after the other.
    Obviously, these two things don't happen at the same time.

Then, what happens to the callback created by FreespacePullOut? I am not sure what the behavior is when the default callback queue is used. Maybe the module shares the same callback queue as the manager. In that case the timed callbacks are mutually exclusive, and the freespace pull out timed callback cannot run at the same time as the manager.

But here, the callback is running in a different queue, so you may have one thread running inside the FreespacePullOut timed callback while another thread is in the manager, trying to destroy the module (or modifying some data required by the module).

For instance, what happens if you put a sleep right before the planFreespacePath() line here:

if (isStuck() && is_new_costmap) {
  planFreespacePath();
}

I guess it will crash right away.
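
To make the suspected interleaving concrete, here is a minimal standalone sketch of it. It is not Autoware code: `Module` and `destroyedDuringCallback` are hypothetical names, plain `std::thread` stands in for the separate callback group, and the 10 s sleep is replaced by a handshake so that the ordering is deterministic. The callback thread never touches the module after it has been destroyed, so the sketch itself has no undefined behavior. It only shows that the destruction can land inside the window between the stuck check and the planning call.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-in for a planning module owned by the manager.
struct Module
{
  explicit Module(std::atomic<bool> & destroyed) : destroyed_(destroyed) {}
  ~Module() { destroyed_.store(true); }
  std::atomic<bool> & destroyed_;  // lives outside the module, so it outlives it
};

// Returns true if the "manager" destroyed the module while the "timer
// callback" was between its stuck check and its planning call.
bool destroyedDuringCallback()
{
  std::atomic<bool> destroyed{false};
  auto module = std::make_unique<Module>(destroyed);

  std::promise<void> callback_entered;
  std::promise<void> manager_done;
  bool observed = false;

  // Stands in for the timer callback running in its own callback group.
  std::thread timer_thread([&] {
    callback_entered.set_value();      // the stuck check has passed
    manager_done.get_future().wait();  // stands in for the sleep
    // A real callback would now call into the module. Here we only read the
    // external flag, so nothing dangling is dereferenced.
    observed = destroyed.load();
  });

  // Stands in for the manager clearing its modules on a new route or goal.
  callback_entered.get_future().wait();
  module.reset();
  assert(destroyed.load());
  manager_done.set_value();

  timer_thread.join();
  return observed;
}
```

Calling `destroyedDuringCallback()` returns `true`: by the time the callback reaches the point where it would plan, the object it belongs to is already gone. In the real node the callback would dereference `this` at that point, which would be consistent with a crash that shows up in different places.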

@kosuke55
Contributor

@VRichardJP
Thank you very much for your very detailed look!!

Is my understanding correct?
・behavior path planner manager create/destroy planning modules at runtime depending on the situation.
・the manager calls the modules run() function one after the other.
Obviously, both don't happen at the same time.

yes, your understanding is correct.

The manager deletes or std::moves modules depending on the situation.
For example, when a new route is received, the module instance is cleared.
https://github.com/autowarefoundation/autoware.universe/blob/feat/avoidance_pull_over/planning/behavior_path_planner/src/behavior_path_planner_node.cpp#L379

If FreespacePullOut is running in a separate thread at this time, the data could be overwritten, causing a crash.

So I think possible solutions include blocking the manager's clear while FreespacePullOut's callback is running, or running FreespacePullOut as a separate instance (building a server).
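
The first option (blocking the clear while the callback runs) could look roughly like the sketch below. This is only an illustration under assumptions, not the change made in any PR: `Planner`, `Manager`, `runTimerCallback`, and `runScenario` are hypothetical names, and the lock is placed in the owner of the planner because a mutex inside the planner would be destroyed together with it.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical planner that records what happens to it.
struct Planner
{
  explicit Planner(std::vector<std::string> & log) : log_(log) {}
  ~Planner() { log_.push_back("destroyed"); }
  void plan()
  {
    log_.push_back("plan_begin");
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // pretend to search
    log_.push_back("plan_end");
  }
  std::vector<std::string> & log_;
};

// Hypothetical owner: the callback and clear() take the same mutex.
class Manager
{
public:
  explicit Manager(std::vector<std::string> & log) : planner_(std::make_unique<Planner>(log)) {}

  // Called from the timer thread. Returns false if the planner is already gone.
  bool runTimerCallback(const std::function<void()> & on_locked)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!planner_) return false;
    on_locked();  // test hook: tells the caller that the lock is now held
    planner_->plan();
    return true;
  }

  // Called from the manager thread. Blocks while a callback is in progress.
  void clear()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    planner_.reset();
  }

private:
  std::mutex mutex_;
  std::unique_ptr<Planner> planner_;
};

// Starts a callback, then clears while the callback is still planning.
std::vector<std::string> runScenario()
{
  std::vector<std::string> log;
  Manager manager(log);
  std::promise<void> locked;
  std::thread timer_thread([&] { manager.runTimerCallback([&] { locked.set_value(); }); });
  locked.get_future().wait();  // the callback now holds the mutex
  manager.clear();             // waits until plan() has returned
  timer_thread.join();
  return log;
}
```

`runScenario()` returns `plan_begin`, `plan_end`, `destroyed` in that order, so the planner is never destroyed in the middle of planning. The trade-off is that clear() stalls for as long as a plan takes, which may or may not be acceptable for the manager's cycle time.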

For instance, what happens if you put a sleep right before the planFreespacePath() line here:
autoware.universe/planning/behavior_path_start_planner_module/src/start_planner_module.cpp
Lines 99 to 101 in 2252226
if (isStuck() && is_new_costmap) {
  planFreespacePath();
}
I guess it will crash right away.

I would like to confirm this as well, but if the crash happens during planFreespacePath(), why put the sleep before that call? Is the intention to add a delay so that the clear is more likely to happen while the callback is in progress? Specifically, should I follow the same reproduction steps with the following change?

 if (isStuck() && is_new_costmap) { 
   std::this_thread::sleep_for(std::chrono::seconds(10));
   planFreespacePath(); 
 } 

@VRichardJP
Contributor

@kosuke55
Yes, if the issue is what I think it is, then while you have one thread sleeping before planFreespacePath() the behavior path planner manager will continue doing its work:

 if (isStuck() && is_new_costmap) { 
   std::this_thread::sleep_for(std::chrono::seconds(10));
   planFreespacePath(); 
 } 

In particular, if you reset the goal in that 10 s window, the freespace object (or the things it refers to) is likely to be destroyed or moved, and I guess you will get some sort of segmentation fault.

@NorahXiong
Contributor

@NorahXiong We have not been able to fix this issue. We are sorry to bother you, but could you please try to reproduce it? 🙏

Have you found the cause of the segmentation fault? I'm sorry, but I will have to try again later if you still need it.

@kosuke55
Contributor

@NorahXiong
Sorry for the delay. No, we have not been able to proceed with any analysis yet.

@NorahXiong
Contributor

@kosuke55 @kyoichi-sugahara
I tried again but still no segmentation fault occurred. Here's the video link.
Any suggestions to help me reproduce the bug?

@kosuke55
Contributor

kosuke55 commented Feb 5, 2024

@NorahXiong
Thank you very much for trying again.
The ego vehicle needs to be in a parking_lot to run FreespacePullOver() (and preferably also in a lane).
The parking_lot is the light yellow area, and the red rectangle is an example of the ego position.

2024-02-05_23-01

@kosuke55
Contributor

kosuke55 commented Feb 5, 2024

@NorahXiong
Oh, sorry: currently the ego vehicle has to be close enough to the goal for the goal planner to execute, and that distance is determined by the braking distance. If the parameter below is made large enough, the goal planner is triggered regardless of the braking distance.

https://github.com/tier4/autoware_launch/blob/awf-latest/autoware_launch/config/planning/scenario_planning/lane_driving/behavior_planning/behavior_path_planner/goal_planner/goal_planner.param.yaml#L45

minimum_request_length: 100.0

Also, the goal needs to be placed in the road_shoulder.

@kosuke55
Contributor

kosuke55 commented Feb 5, 2024

@NorahXiong

As @VRichardJP indicated, adding a sleep easily produces a crash.

  if (isStuck() && is_new_costmap && needPathUpdate(path_update_duration)) {
    std::this_thread::sleep_for(std::chrono::seconds(10));
    planFreespacePath();
  }
freespace_pull_over-2024-02-05_23.40.27.mp4

@kosuke55
Contributor

kosuke55 commented Feb 5, 2024

#6322 may fix the issue; we will test more.

@NorahXiong
Contributor

NorahXiong commented Feb 6, 2024

@kosuke55
I followed your steps by

  1. setting minimum_request_length to 100.0;
  2. adding a sleep before freespace planning;
  3. setting the goal on the road shoulder.
    Test result: No crash occurred.

Information that may help you confirm the cause:

  1. My latest tests (today and yesterday) were run in a native environment (Ubuntu 22.04).
  2. My previous tests were run in a Docker environment, and the lane-driving related node died once or twice during those tests.

Env Info:

  1. Ubuntu 22.04.3 LTS
  2. glibc 2.35

image
image

test_of_issue_5154_new.mp4


stale bot commented Apr 30, 2024

This pull request has been automatically marked as stale because it has not had recent activity.

@stale stale bot added the status:stale Inactive or outdated issues. (auto-assigned) label Apr 30, 2024

5 participants