collect --clean run on two sosreports of the same system hangs (or segfaults) #3271

Closed
pmoravec opened this issue Jun 12, 2023 · 2 comments


@pmoravec
Contributor

pmoravec commented Jun 12, 2023

When running sos collect --clean --nodes node1, I can easily hit a segfault (and, sporadically, a hang after "Obfuscation completed").

Minimal reproducer:

  • have python3-magic or python3-file-magic installed (mandatory condition)
  • set up passwordless SSH access to a system node1
  • run sos collect --nodes node1 --clean --batch -o host,networking

The crash is a race condition: it happens almost every time outside gdb but only sporadically inside gdb. Backtraces of the threads involved:

Thread 0x7fffe74b0640 
throwflag=0, f=Frame 0x7fffe6b675b0, for file /usr/lib/python3.9/site-packages/magic.py, line 148, in file 
throwflag=0, f=Frame 0x7fffd822e500, for file /usr/lib/python3.9/site-packages/magic.py, line 267, in detect_from_filename 
throwflag=0, f=Frame 0x7fffe0016630, for file /root/sos-main/sos/utilities.py, line 100, in file_is_binary 
throwflag=0, f=Frame 0x7fffd822e290, for file /root/sos-main/sos/cleaner/archives/__init__.py, line 387, in should_remove_file 
throwflag=0, f=Frame 0x7fffd8007f10, for file /root/sos-main/sos/cleaner/__init__.py, line 652, in obfuscate_report 
throwflag=0, f=Frame 0x7fffe74e8800, for file /usr/lib64/python3.9/concurrent/futures/thread.py, line 58, in run 
throwflag=0, f=Frame 0x7fffe74e8610, for file /usr/lib64/python3.9/concurrent/futures/thread.py, line 83, in _worker 
throwflag=0, f=Frame 0x7fffe6bc6740, for file /usr/lib64/python3.9/threading.py, line 917, in run 
throwflag=0, f=Frame 0x7fffe6bc2640, for file /usr/lib64/python3.9/threading.py, line 980, in _bootstrap_inner 
throwflag=0, f=Frame 0x7fffe6bc6580, for file /usr/lib64/python3.9/threading.py, line 937, in _bootstrap 
Thread 0x7fffe84d8640 
throwflag=0, f=Frame 0x5555564a69a0, for file /usr/lib/python3.9/site-packages/magic.py, line 148, in file 
throwflag=0, f=Frame 0x7fffd80101f0, for file /usr/lib/python3.9/site-packages/magic.py, line 267, in detect_from_filename 
throwflag=0, f=Frame 0x7fffea25e230, for file /root/sos-main/sos/utilities.py, line 100, in file_is_binary 
throwflag=0, f=Frame 0x7fffd84fc690, for file /root/sos-main/sos/cleaner/archives/__init__.py, line 387, in should_remove_file 
throwflag=0, f=Frame 0x7fffe00008e0, for file /root/sos-main/sos/cleaner/__init__.py, line 652, in obfuscate_report 
throwflag=0, f=Frame 0x7fffe755edd0, for file /usr/lib64/python3.9/concurrent/futures/thread.py, line 58, in run 
throwflag=0, f=Frame 0x7fffe75039f0, for file /usr/lib64/python3.9/concurrent/futures/thread.py, line 83, in _worker 
throwflag=0, f=Frame 0x7fffe7502c80, for file /usr/lib64/python3.9/threading.py, line 917, in run 
throwflag=0, f=Frame 0x7fffe74f0240, for file /usr/lib64/python3.9/threading.py, line 980, in _bootstrap_inner 
throwflag=0, f=Frame 0x7fffe7502ac0, for file /usr/lib64/python3.9/threading.py, line 937, in _bootstrap 
Thread 0x7ffff796e740 
throwflag=0, f=Frame 0x7fffe75015b0, for file /usr/lib64/python3.9/threading.py, line 1080, in _wait_for_tstate_lock 
throwflag=0, f=Frame 0x7fffe74ffa60, for file /usr/lib64/python3.9/threading.py, line 1060, in join 
throwflag=0, f=Frame 0x7fffe7503230, for file /usr/lib64/python3.9/concurrent/futures/thread.py, line 235, in shutdown 
throwflag=0, f=Frame 0x7fffe6bc3d60, for file /root/sos-main/sos/cleaner/__init__.py, line 556, in obfuscate_report_paths 
throwflag=0, f=Frame 0x555555c6a600, for file /root/sos-main/sos/cleaner/__init__.py, line 368, in execute 
throwflag=0, f=Frame 0x5555558ddbb0, for file /root/sos-main/sos/collector/__init__.py, line 1327, in create_cluster_archive 
throwflag=0, f=Frame 0x555555c6f530, for file /root/sos-main/sos/collector/__init__.py, line 1262, in collect 
throwflag=0, f=Frame 0x7fffe75a41f0, for file /root/sos-main/sos/collector/__init__.py, line 1183, in execute 
throwflag=0, f=Frame 0x7fffe7598860, for file /root/sos-main/sos/__init__.py, line 193, in execute 

Different files are passed to python-magic concurrently:

#28 0x00007ffff7d63aa5 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7fffe0016630, for file /root/sos-main/sos/utilities.py, line 100, in file_is_binary (fname='/var/tmp/sos.du1e5_78/cleaner/sosreport-pmoravec-rhel9-2023-06-12-fuirrpw/sos_commands/networking/ethtool_--phy-statistics_ens192'), tstate=0x555555c6b6e0) at /usr/src/debug/python3.9-3.9.14-1.el9_1.1.x86_64/Include/internal/pycore_ceval.h:40

#31 0x00007ffff7d63aa5 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x7fffea25e230, for file /root/sos-main/sos/utilities.py, line 100, in file_is_binary (fname='/var/tmp/sos.du1e5_78/cleaner/sosreport-pmoravec-rhel9-2023-06-12-wlbauou/sos_commands/host/find_._-maxdepth_2_-type_l_-ls'), tstate=0x5555559a7890) at /usr/src/debug/python3.9-3.9.14-1.el9_1.1.x86_64/Include/internal/pycore_ceval.h:40

It looks like python3-file-magic-5.39-10.el9.noarch (also seen with python3-magic-5.33-21.el8.noarch) can't cope with concurrent calls to detect_from_filename?
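
If the suspicion above is right, one possible workaround is to serialize the detector calls behind a lock. The following is only a minimal sketch under that assumption, not code from sos or file-magic: the names _MAGIC_LOCK, serialized and classify_all are hypothetical, and the detector is passed in as a plain callable so the sketch runs without python-magic installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level lock; not part of sos or file-magic.
_MAGIC_LOCK = threading.Lock()


def serialized(detect):
    """Wrap a detector so that only one thread runs it at a time."""
    def wrapper(fname):
        with _MAGIC_LOCK:
            return detect(fname)
    return wrapper


def classify_all(detect, fnames, workers=4):
    """Call the serialized detector on each file name from a thread pool.

    The pool mirrors the concurrent.futures workers seen in the
    backtraces; the lock keeps the detector calls from overlapping.
    """
    safe_detect = serialized(detect)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(safe_detect, fnames))
```

With the bindings installed, this could be tried as classify_all(magic.detect_from_filename, paths). It trades the parallelism of the file-type checks for safety, so it is a stopgap rather than a fix.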

@pmoravec
Contributor Author

Curiously, sos on Fedora (with python3-file-magic-5.44-3.fc38.noarch) is not affected - I can't reproduce it there.

@pmoravec
Contributor Author

... and the bug is in python-file-magic, where it has since been fixed: with pip install file-magic I get no segfaults, while with pip install --force-reinstall -v file-magic==0.4.0 (the same version RHEL 9 has) I can reproduce the segfault.
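
To check which version a given system has, something like the following may help. It is a rough sketch: the helper names are hypothetical, the only version flagged is the 0.4.0 mentioned above, and it assumes the package registers Python distribution metadata under the name file-magic, which may not hold for every distro package.

```python
from importlib import metadata

# Version reported in this thread as reproducing the segfault.
AFFECTED_VERSION = "0.4.0"


def installed_version(dist_name="file-magic"):
    """Return the installed version string of dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_reported_affected(version):
    """True only for the exact version reported as affected in this thread."""
    return version == AFFECTED_VERSION


if __name__ == "__main__":
    found = installed_version()
    print("file-magic version:", found)
    print("matches reported affected version:", is_reported_affected(found))
```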
