Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net_connections is returning incorrect pid #1013

Closed
joeisca opened this issue Apr 19, 2017 · 13 comments
Closed

net_connections is returning incorrect pid #1013

joeisca opened this issue Apr 19, 2017 · 13 comments

Comments

@joeisca
Copy link

joeisca commented Apr 19, 2017

On FreeBSD, running psutil.net_connections() is returning connections with incorrect pids.

I tried to match a connection, by status='ESTABLISHED', and laddr= and was given the expected connections, but the pid on one of the processes was reported as another processes pid.

As this was tied to a process timeout program, this gave unexpected and catastrophic results, depending on which process was inserted in place of the pid that should have been reported.

@giampaolo
Copy link
Owner

Uhm, this is bad. For reference, on FreeBSD global and per-process connections are retrieved differently. The part associating the PID to the connection is here:

psutil_get_pid_from_sock(int sock_hash) {

@giampaolo
Copy link
Owner

Mmmm are you sure about this? Can you provide a script to reproduce the issue?
I've just tried to run netstat.py script and the PIDs look correct:

vagrant@freebsd/vagrant/psutil$ python scripts/netstat.py 
Proto Local address                  Remote address                 Status        PID    Program name
tcp6  :::22                          -                              LISTEN        756    sshd
udp6  :::514                         -                              NONE          557    syslogd
tcp   0.0.0.0:22                     -                              LISTEN        756    sshd
tcp   10.0.2.15:22                   10.0.2.2:54596                 ESTABLISHED   968    sshd
udp   0.0.0.0:514                    -                              NONE          557    syslogd

@giampaolo
Copy link
Owner

OK, I've been lucky and by accident I added a test case which exposes this issue:

======================================================================
FAIL: test_connections.TestConnectedSocketPairs.test_unix
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vagrant/psutil/psutil/tests/test_connections.py", line 223, in test_unix
    self.check_socket(server, conn=server_conn)
  File "/vagrant/psutil/psutil/tests/test_connections.py", line 110, in check_socket
    compare_procsys_connections(os.getpid(), cons)
  File "/vagrant/psutil/psutil/tests/test_connections.py", line 68, in compare_procsys_connections
    assert proc_cons == sys_cons, (proc_cons, sys_cons)
AssertionError: ([pconn(fd=-1, family=1, type=1, laddr='', raddr=None, status='NONE'), pconn(fd=-1, family=1, type=1, laddr='/tmp/$testfnrDzK4W', raddr=None, status='NONE')], [(-1, 1, 1, '/tmp/$testfnrDzK4W', None, 'NONE')])

So far I've only been able to reproduce it with UNIX sockets (net_connections('unix') or net_connections('all')). Did you also notice whether this occurs with UNIX connections only?

@giampaolo
Copy link
Owner

Sorry, since you mentioned status == 'ESTABLISHED' I suppose you see this also with AF_INET / TCP sockets.

@joeisca
Copy link
Author

joeisca commented Apr 29, 2017 via email

@giampaolo
Copy link
Owner

giampaolo commented May 14, 2017

@glebius is this something you may wanna look at? I'm afraid I don't know how to fix this and it's high priority for FreeBSD users.

@glebius
Copy link
Contributor

glebius commented May 14, 2017

Pull request sent (#1070).

@giampaolo
Copy link
Owner

Thanks a lot @glebius! Your PR fixed the PID mismatch issue which is a big step forward but net_connections returns less results than Process.connections() and sockstat, which is another separate issue:

======================================================================
FAIL: test_connections.TestSystemWideConnections.test_multi_sockets_procs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vagrant/psutil/psutil/tests/test_connections.py", line 446, in test_multi_sockets_procs
    expected)
AssertionError: 6 != 7

======================================================================
FAIL: test_connections.TestSystemWideConnections.test_multi_socks
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vagrant/psutil/psutil/tests/__init__.py", line 706, in wrapper
    return fun(*args, **kwargs)
  File "/vagrant/psutil/psutil/tests/test_connections.py", line 416, in test_multi_socks
    self.assertEqual(len(socks), len(cons))
AssertionError: 7 != 6

In this case the socket not listed is the UNIX socket but I'm not sure if it's relevant.
The reason this happens is because certain PIDs cannot be determined (-1) and as such they are skipped:

sockstat does not have this problem so we're doing something wrong in psutil.

@giampaolo
Copy link
Owner

Also (yet another separate issue) I see sockstat is able to determine connections' fd whereas psutil is not, so that's another thing we should take from sockstat.

@glebius
Copy link
Contributor

glebius commented May 16, 2017

Hey, that was a puzzle for me! Spent 2 hours, while expected 15 minutes issue. So, what is going on here.

In psutil/tests/__init__.py there is create_sockets(), which calls unix_socketpair(). The problem is
that unix_socketpair() has nothing to do with UNIX socketpair(2) syscall, and this is very confusing.
Real socketpair(2) syscall creates two sockets, that are connected. Neither is a server and neither
is client. They are just connected. What unix_socketpair() does: it creates listening socket, it
creates other socket (name it client), it connects client to the server socket, but it doesn't accept
the connection on the server side. So what happens in the kernel is that 3 sockets exist: server
listening socket, client socket and a newborn socket hanging on the accept queue of the server.
The newborn socket isn't yet known to the process. The process must call accept(2) to get a file
descriptor of that socket. And, yes, the file descriptor for it doesn't yet exist. So, we have a valid
UNIX socket without file descriptor. So, when we grab 'kern.files' sysctl, and then we grab
'net.local.stream.pcblist', in the latter we have a socket that shall not have a match in the
array obtained in 'kern.files', and this is correct.

What can be done here? Either you need to remove the unix_socketpair() function and use
socket.socketpair() Python function. That may not work on Windows. Either you need to accept
the socket in the unix_socketpair(), and make it return 3 sockets: (server, client, accepted).
And in this case better rename it from unix_socketpair() to something else.

@giampaolo
Copy link
Owner

giampaolo commented May 16, 2017

Oh I see. Yes, I agree the function name is confusing as it collides with socketpair(2).

I don't understand why this is an issue though as both sockstats utility and psutil.Process.connections() (which uses a different routine than psutil.net_connections()) are able to fetch that socket.

I think the problem is that that specific socket is skipped because we can't determine the PID (-1), hence why the test fails (proc connections are 7, sys connections are 6).

If I put a debugger in test_multi_socks and call sockstat I see it's able to see it:

(Pdb) os.system('sockstat | grep %s' % os.getpid())
vagrant  python2.7  10279 3  tcp4   *:56890               *:*
vagrant  python2.7  10279 6  udp4   *:56155               *:*
vagrant  python2.7  10279 7  tcp6   *:56892               *:*
vagrant  python2.7  10279 8  udp6   *:54982               *:*
vagrant  python2.7  10279 9  stream /tmp/$testfnQSBFCX
vagrant  python2.7  10279 10 stream -> /tmp/$testfnQSBFCX
vagrant  python2.7  10279 11 dgram  /tmp/$testfnCXlcW3
(Pdb) os.system('sockstat | grep %s | wc -l' % os.getpid())
 7

@glebius
Copy link
Contributor

glebius commented May 18, 2017

Yes, you were right, the non-accepted socket lead me to wrong path. The problem appeared to be set truncation. Pull request sent.

@giampaolo
Copy link
Owner

Thanks. Fixed in #1079. Closing this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants