
BatchJobs allows the construction of a cluster of machines which are not managed by a true batch system like TORQUE, but are instead only accessible via SSH login. As configuring the package works a bit differently in this case and you might wonder how jobs are scheduled to the nodes, we will cover all necessary details on this page and also present some convenience functions to manage and query such a cluster.

Requirements

  • All machines must run Linux or a Unix-like operating system. Mac OS is fine, too.
  • All machines (head and compute nodes) need a shared file system, e.g. a NFS mount.
  • BatchJobs (including all dependencies) must be installed on all machines. The package version must be identical.
  • You must configure the machines so that passwordless SSH authentication works. Some documentation is provided in the Arch Linux Wiki.

Configuration

Assume you have three compute nodes named tukey, rao and kendall, each with 6 CPU cores. In your configuration file you would now set up your cluster functions as follows:

cluster.functions = makeClusterFunctionsSSH(
  makeSSHWorker(nodename="tukey"),
  makeSSHWorker(nodename="rao"),
  makeSSHWorker(nodename="kendall")
)
staged.queries = TRUE

Basically, that's all you have to do to get started if R is on your default PATH. You could now use BatchJobs and BatchExperiments to compute on these machines. BatchJobs will auto-detect the number of cores on each machine and start your jobs via R CMD BATCH, issued through an ssh command.
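For example, a minimal test run with this configuration could look like the following sketch (the registry id and the toy function are made up for illustration):

library(BatchJobs)
reg = makeRegistry(id="ssh_test")        # create a registry on the shared file system
batchMap(reg, function(i) i^2, 1:10)     # define ten tiny jobs
submitJobs(reg)                          # jobs are distributed to tukey, rao and kendall
showStatus(reg)                          # check progress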

But let's look at a few more details.

RHOME and R options on the nodes

Let's make it a bit more complicated and assume that on tukey and rao R is installed under "/opt/R/R-current" but on kendall R is installed under "/usr/local/R/". To cope with such an installation you would use the rhome argument of makeSSHWorker:

cluster.functions = makeClusterFunctionsSSH(
  makeSSHWorker(nodename="tukey", rhome="/opt/R/R-current"),
  makeSSHWorker(nodename="rao", rhome="/opt/R/R-current"),
  makeSSHWorker(nodename="kendall", rhome="/usr/local/R")
)
staged.queries = TRUE

If you do not configure rhome, as in the first example, the R installation found on your PATH is used.

In some cases you might also want to control how R is invoked on the nodes when jobs are run. The default flags are

 --no-save --no-restore --no-init-file --no-site-file

If you want to change this, use the r.options argument of makeSSHWorker.
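For example, to add a flag while keeping the defaults, something like the following sketch should work (we assume here that r.options takes a character vector of flags which replaces the default set):

makeSSHWorker(nodename="tukey", rhome="/opt/R/R-current",
  r.options=c("--no-save", "--no-restore", "--no-init-file", "--no-site-file", "--no-environ"))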

Number of cores on the nodes

The number of cores is auto-detected via /proc/cpuinfo. This takes a little bit of time each time the configuration is sourced (e.g., at package startup), so you might want to set it manually, since you usually know the number of cores in advance:

  makeSSHWorker(nodename="tukey", ncpus=6)

This also allows you to restrict the number of cores used for parallel computation, but a few more sophisticated resource management options are available as well.

Resource management

In most environments, you are not the only user who is allowed to compute on the configured compute nodes. Competing users might start lengthy R processes or other jobs at any point in time. We should make it clear that BatchJobs in SSH mode will, in such scenarios, not give you the full feature set that a true batch system would provide, and we will not re-implement those features for the SSH mode. If you need fine-grained control over resource allocation and job scheduling, consider installing a job scheduler such as SLURM.

Not all is lost, however: there are still plenty of options that give you some control over your SSH workers. These ensure that you do not over-allocate your nodes with jobs, and you can also conservatively leave a few computational resources for your competing colleagues.

Here are the resource management options in detail that you can set individually for each SSH worker:

  • You can set a max.load. If the load of the node during the last 5 minutes is at this value or higher, we consider it occupied and will not schedule a job on it. The default value is ncpus-1. If you have the machine to yourself and want to be a bit more aggressive, you could set it to ncpus-0.5.
  • You can set a maximum number of jobs per registry. If you set this to 3, no more than 3 jobs of your current registry will ever be scheduled to that worker concurrently. The default value for this is ncpus.
  • You can adjust the process priority by "nicing" your jobs invoked via R CMD BATCH. The Linux command nice is used for this. Niceness ranges from -20 (most favorable scheduling) to 19 (least favorable). The default is not to call nice; see the combined example below.
  • The running R jobs on each node are monitored. If more than 3x ncpus R jobs are running, no job is submitted to this node (currently hard-wired). We also count the number of "expensive" R jobs. These are defined as jobs which currently create more than 50% load. If the number of these jobs is ncpus or more we also consider the node completely occupied and do not schedule any further jobs on it until the load drops.

Let's assume that kendall should not be used extensively because somebody else wants to use the machine at the same time:

makeSSHWorker(nodename="kendall", ncpus=6, max.load=4, max.jobs=2)

This ensures that we never schedule a job to this node when its load is already at 4 or higher, and that no more than 2 of our jobs run on it concurrently.
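The nice option from the list above can be combined with these limits. The following is a sketch, assuming the nice argument of makeSSHWorker takes the niceness value described earlier:

# Run our jobs on kendall with the lowest scheduling priority (assumed nice argument)
makeSSHWorker(nodename="kendall", ncpus=6, max.load=4, max.jobs=2, nice=19)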

How the scheduling works

Our implemented SSH scheduler works in the following way: For each job that is submitted via submitJobs, the scheduler figures out which nodes are currently available. This means:

  • The load must be less than max.load.
  • The number of jobs from our registry on this node must be less than max.jobs.
  • The number of "expensive" R processes of any user in total on this node must be less than ncpus.

From the available workers one is then selected randomly (workers with a lower load are selected with higher probability) and the job is started. If no worker is currently available, the waiting mechanism of submitJobs is invoked and after some seconds the scheduler tries to submit again.
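In R-like pseudocode, the per-worker availability check roughly corresponds to the following sketch (the names worker and status are hypothetical; the actual implementation in BatchJobs differs in detail):

# Hypothetical sketch of the three conditions listed above
is.available = function(worker, status) {
  status$load < worker$max.load &&        # load below max.load
    status$n.jobs < worker$max.jobs &&    # our registry's jobs below max.jobs
    status$n.rprocs.50 < worker$ncpus     # "expensive" R processes below ncpus
}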

Debugging

If you run into problems on your system you have a couple of debugging options. Here are a few hints to get you started:

  • Run the function debugSSH(nodename, rhome, ...) for a node (see the example after this list). The function performs various tests on that node using the passed worker options. Note that it neither accesses nor uses the information specified for your cluster functions in your configuration. Once you have passed all tests and thereby found working arguments for your worker, transfer the required options to your configuration file. If you run into errors and cannot figure out where the problem lies yourself, you should contact us.
  • With a valid config file for SSH mode, you can set the option debug=TRUE and then run a simple test, like submitting one trivial job (see the example after this list). This will print all issued system commands and their results to your R console, so you (or we) can inspect them.
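For example, continuing with the nodes used earlier on this page:

# Test a single node directly, with explicit worker options (values from above)
debugSSH(nodename="tukey", rhome="/opt/R/R-current")

# In your configuration file, you would additionally set:
debug = TRUE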

Administration and cluster queries

We have created a couple of convenience functions in BatchJobs to make routine tasks on SSH clusters a bit simpler.

Accessing R installation information

Maybe you are unsure how R is actually set up on some nodes and want to check that. You could then simply do

getSSHWorkersInfo(c("tukey", "rao"))
Calling function on: tukey,rao.
Node: tukey
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)
R Home: /opt/R/R-2.15.2/lib64/R
First lib path: /home/bischl/R/x86_64-unknown-linux-gnu-library/2.15

Node: rao
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)
R Home: /opt/R/R-2.15.2/lib64/R
First lib path: /home/bischl/R/x86_64-unknown-linux-gnu-library/2.15

and have a look at R versions, installation paths and lib paths.

The function displays a warning if the first lib path on the worker is not writable, as this indicates potential problems in the node's configuration and means that installPackagesOnSSHWorkers will not work.

Installing R packages

Installing packages is equally simple for the whole cluster or a subset of machines:

installPackagesOnSSHWorkers(nodenames=c("tukey", "rao"), pkgs="BBmisc")

The command above will successively log into each machine and simply call install.packages on that node. You will see the console output of the package installation on the master.

You could also install the packages in parallel by doing

installPackagesOnSSHWorkers(nodenames=c("tukey", "rao"), pkgs="BBmisc", consecutive=FALSE)

but you will not see any console output during package installation in that case. Only do this if you are sure that the packages on the different nodes are installed into different directories, so that you do not write to the same location in parallel.

Querying the state of the cluster

If you would like to figure out how many computational resources are currently available on your SSH cluster, simply call showClusterStatus and have a look at the resulting table:
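For example, with reg being your current registry (the call form with a registry is described below):

showClusterStatus(reg)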

         ncpus load n.rprocs n.rprocs.50 n.jobs
kendall      6 3.00        8           3      0
rao          6 3.03        8           3      0
tukey        6 9.00        7           5      0

This displays the number of CPUs, the current load, the number of running R processes, the number of R processes with more than 50% load and - if you pass a registry via showClusterStatus(reg) - the number of running jobs from this registry for each node.

Executing arbitrary code on the workers

Can also be done.

callFunctionOnSSHWorkers(c("tukey", "rao"), fun=sessionInfo, simplify=FALSE)

This will call the given function with the specified (here: empty) arguments on each node. The execution can be performed in parallel or consecutively, and in the latter case console output can be displayed on the master. Only use this for short administrative tasks; real computation should be done with the usual registry / mapping mechanism from BatchJobs! If you find yourself using this facility for day-to-day system administration tasks, look at better options such as [Cluster SSH](http://sourceforge.net/apps/mediawiki/clusterssh/index.php?title=Main_Page) or Fabric.

Submitting jobs

There are only two minor issues specific to SSH clusters that should be mentioned:

  1. If at some point your cluster is completely occupied, the job submission procedure will go to sleep. You will sometimes see something like this in the status bar: `Status: 1, zzz=10.0s msg=Workers busy: LLJ`. This gives you a small hint as to WHY your machines are occupied. There is a single character for each node (in the same order as you specified them in the config), and the meaning of the characters is:
    • L: Load too high, more than max.load.
    • J: max.jobs from the registry already running.
    • R: Too many expensive R processes running (across all users on that node).
    • r: Too many R processes (> 3 x ncpus).
  2. If you have more jobs than you can schedule at once, it can be useful to combine this waiting mechanism with a terminal multiplexer such as screen or tmux. Open R inside a multiplexed session, submit the jobs (possibly configuring the waiting mechanism so it retries very often, see the sketch below) and detach. Jobs will then be submitted continuously over a long period as workers become available.
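A sketch of such a submission call, assuming the wait and max.retries arguments of submitJobs behave as their names suggest in your installed version:

# Retry every 5 minutes, essentially indefinitely (assumed wait/max.retries arguments)
submitJobs(reg, wait=function(retries) 300, max.retries=10000)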