thiell edited this page Mar 4, 2012 · 2 revisions

This page compares ClusterShell with the well-known pdsh, which clush aims to replace while providing more extended features.

Compatible

First of all, ClusterShell was designed to be easy to adopt for people who previously used pdsh. As a consequence, its command line tools, clush and clubak, support very similar behaviour and options.

clush

  • The standard clush command line is the same as the pdsh one:
$ pdsh -w foo[1-5] echo "Hello World"

becomes

$ clush -w foo[1-5] echo "Hello World"
  • Host selection options are supported (-w -x -g -X)
  • ssh-related options are supported (-f -t -u -l)
  • File copies are supported. Equivalents of pdcp and rpdcp are available through clush options

Other options are supported as well. Any simple pdsh command can be adapted by simply changing the command name to clush.

clubak

clubak is a replacement tool for dshbak, which is commonly used with pdsh to regroup similar outputs. The clubak feature is directly available in clush, so you do not have to call a separate external tool. If you need it anyway, both
clubak
and
clubak -c
are supported.
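
To make "regrouping similar outputs" concrete, here is a minimal Python sketch of the idea. It is not clubak's implementation: the function name group_by_output and the sample lines are made up for illustration, and it only assumes the "node: text" line prefix that pdsh prints.

```python
from collections import defaultdict


def group_by_output(lines):
    """Group nodes whose collected output is identical.

    `lines` is an iterable of "node: text" strings (assumed input format).
    """
    # First pass: rebuild the full output of each node, line by line.
    per_node = defaultdict(list)
    for line in lines:
        node, _, text = line.partition(": ")
        per_node[node].append(text)

    # Second pass: nodes sharing the exact same output land in one group.
    groups = defaultdict(list)
    for node, texts in per_node.items():
        groups["\n".join(texts)].append(node)

    # (sorted node names, output) pairs, in first-seen order of each output.
    return [(sorted(nodes), output) for output, nodes in groups.items()]


# Hypothetical sample input, for illustration only.
sample = [
    "foo2: 2.6.31.6-145.fc11",
    "foo5: 2.6.18-164.11.1.el5",
    "foo3: 2.6.31.6-145.fc11",
    "foo4: 2.6.31.6-145.fc11",
]
for nodes, output in group_by_output(sample):
    print("---------------")
    print(",".join(nodes))
    print("---------------")
    print(output)
```

The sketch sorts node names as plain strings and does not fold them into a range such as foo[2-4]; the real tools do more than this.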

But there are plugins

Pdsh can be extended with plugins, either to connect to nodes or to select them. Those plugins must be dynamic libraries using the pdsh C interface. ClusterShell provides 3 ways to extend its features, which can be simple shell commands or Python extensions.

  • NodeGroups provides an easy way to plug clush into any external node database.
  • Support for other software used to connect to nodes can easily be added by implementing a new Python class.

Most pdsh plugin features can be made available with clush.

More features

But ClusterShell does not aim to reimplement pdsh in Python. There are many more features!

Node group handling

ClusterShell introduces the nodeset command and its backend, which make it easy to manipulate ranges of nodes.

$ nodeset -c nova[0-7,32-159]
136
$ nodeset -f nova[0-7,32-159] nova[160-163]
nova[0-7,32-163]
$ nodeset -f @oss,@mds
node[2-9]

All details are available in the nodeset, NodeSet and NodeGroups wiki pages.
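
The counting and folding shown above can be sketched in a few lines of Python. This is an illustrative toy, not the NodeSet implementation: the helper names expand and fold are hypothetical, and the sketch only handles a single prefix followed by one bracketed list of ranges, with no zero padding and no @group syntax.

```python
import re
from itertools import groupby

# Matches patterns such as "nova[0-7,32-159]" (assumed, simplified syntax).
PATTERN = re.compile(r"^(?P<prefix>[^\[\]]+)\[(?P<ranges>[\d,\-]+)\]$")


def expand(pattern):
    """Return (prefix, set of integer indexes) for a bracketed pattern."""
    match = PATTERN.match(pattern)
    if match is None:
        raise ValueError("unsupported pattern: %r" % pattern)
    indexes = set()
    for part in match.group("ranges").split(","):
        start, _, end = part.partition("-")
        # A lone index such as "5" is treated as the range 5-5.
        indexes.update(range(int(start), int(end or start) + 1))
    return match.group("prefix"), indexes


def fold(prefix, indexes):
    """Fold integer indexes back into compact 'prefix[a-b,c]' notation."""
    ranges = []
    # Consecutive integers share the same (value - position) difference.
    for _, run in groupby(enumerate(sorted(indexes)), key=lambda p: p[1] - p[0]):
        run = [value for _, value in run]
        ranges.append(str(run[0]) if len(run) == 1 else "%d-%d" % (run[0], run[-1]))
    return "%s[%s]" % (prefix, ",".join(ranges))


prefix, first = expand("nova[0-7,32-159]")
print(len(first))                    # 136
_, second = expand("nova[160-163]")
print(fold(prefix, first | second))  # nova[0-7,32-163]
```

Storing the indexes in a set is what makes the union of two node sets a one-liner; folding then only has to find runs of consecutive integers.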

Integrated dshbak

It is common to have to cancel a pdsh execution because a node hangs. If you are also using dshbak, the output of all nodes is lost because of the pipe.

$ pdsh -w foo[1-5] ls /remote/nfs/ | dshbak -c

Now hit Ctrl-C. No output will be printed, even if all nodes have successfully run the command.

  • With clush, output is not lost even if you hit Ctrl-C
$ clush -b -w foo[1-5] uname -r
Warning: Caught keyboard interrupt!
---------------
foo[2-4] (3)
---------------
2.6.31.6-145.fc11
---------------
foo5
---------------
2.6.18-164.11.1.el5
Keyboard interrupt (foo1 did not complete).
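
The difference comes from where the results are gathered: when the tool that runs the commands also collects the output, it can still print what it has when the interrupt arrives. The Python sketch below illustrates that pattern only. It is not clush code; gather and fake_results are hypothetical names, and the interrupt is simulated rather than coming from a real hung node.

```python
def gather(results):
    """Collect (node, output) pairs until the source ends or Ctrl-C arrives.

    Returns (collected, interrupted).
    """
    collected = {}
    interrupted = False
    try:
        for node, output in results:
            collected[node] = output
    except KeyboardInterrupt:
        # Keep whatever was gathered so far instead of discarding it.
        interrupted = True
    return collected, interrupted


def fake_results():
    """Hypothetical result stream: two nodes answer, then Ctrl-C is hit."""
    yield "foo2", "2.6.31.6-145.fc11"
    yield "foo5", "2.6.18-164.11.1.el5"
    raise KeyboardInterrupt  # stands in for a hung node plus Ctrl-C


collected, interrupted = gather(fake_results())
if interrupted:
    print("Warning: Caught keyboard interrupt!")
for node, output in sorted(collected.items()):
    print("%s: %s" % (node, output))
```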

Nice outputs

ClusterShell improves the administrator's experience with several new features, such as:

  • Automatic merging of identical outputs
  • Stdout and stderr handling
  • Nodeset size, colors, ...

Easy modification of ssh options

$ clush '-o -X' -w foo[1-5] xterm

Supports stdin forwarding

  • Diff /etc/motd content with the same file on a group of nodes
$ cat /etc/motd | clush -b -w foo[1-5] diff - /etc/motd
  • Binary content is supported:
$ tar -cf - /tmp | clush -w foo[1-5] tar xfv -
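
Conceptually, stdin forwarding means reading the local standard input once and handing the same bytes to every command. The Python sketch below shows that idea with local child processes standing in for remote nodes. It is an assumption-laden illustration, not how clush is implemented: fan_out and ECHO are hypothetical names, and the children are run one after another rather than in parallel.

```python
import subprocess
import sys


def fan_out(data, commands):
    """Feed the same bytes to the standard input of every command.

    Returns a list of (returncode, stdout bytes), one entry per command.
    """
    results = []
    for argv in commands:
        done = subprocess.run(argv, input=data, stdout=subprocess.PIPE)
        results.append((done.returncode, done.stdout))
    return results


# Local stand-in for a remote command: copy stdin to stdout byte for byte.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

payload = bytes(range(256))  # deliberately includes non-text bytes
results = fan_out(payload, [ECHO, ECHO, ECHO])
print(all(out == payload for _, out in results))
```

Working on bytes rather than text is what keeps binary content, such as a tar stream, intact.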

Library

ClusterShell was first designed as an event-based, distributed command execution library in Python. All command line tool features are accessible through the Python API, which makes it easy to write sequential or event-based programs.

Some of the possibilities are presented in the following topics:

And it is very fast!

Some might say that, since ClusterShell is a Python library, it must be slow. A short benchmark compared clush and pdsh by measuring the time each needed to run a simple command on a large number of nodes. ClusterShell outperforms pdsh almost all the time: as soon as more than 100 nodes are involved, ClusterShell is faster and scales better, and the more nodes you add, the larger the difference becomes.

There is a very small overhead due to the Python interpreter, which becomes insignificant when you are running real commands. Moreover, the Python language makes ClusterShell much easier to develop, where raw C could be a real pain.