Fix dask API sphinx docstrings #4507

Merged 2 commits on May 28, 2019
24 changes: 19 additions & 5 deletions doc/gpu/index.rst
@@ -67,11 +67,6 @@ The experimental parameter ``single_precision_histogram`` can be set to True to

The device ordinal can be selected using the ``gpu_id`` parameter, which defaults to 0.

Multiple GPUs can be used with the ``gpu_hist`` tree method using the ``n_gpus`` parameter, which defaults to 1. If this is set to -1, all available GPUs will be used. If ``gpu_id`` is specified as non-zero, the selected GPU devices will be from ``gpu_id`` to ``gpu_id+n_gpus``; note that ``gpu_id+n_gpus`` must be less than or equal to the number of GPUs available on your system. As with GPU vs. CPU, multi-GPU training will not always be faster than a single GPU, since PCI bus bandwidth can limit performance.

.. note:: Enabling multi-GPU training

Default installation may not enable multi-GPU training. To use multiple GPUs, make sure to read :ref:`build_gpu_support`.

The GPU algorithms currently work with CLI, Python and R packages. See :doc:`/build` for details.

@@ -82,6 +77,24 @@ The GPU algorithms currently work with CLI, Python and R packages. See :doc:`/bu
param['max_bin'] = 16
param['tree_method'] = 'gpu_hist'
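
A minimal sketch of how these parameters might be passed to ``xgboost.train``; the synthetic data below is illustrative only and not part of this documentation change.

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    # Illustrative data; any numeric feature matrix and binary label vector will do.
    X = np.random.rand(1000, 10)
    y = np.random.randint(2, size=1000)
    dtrain = xgb.DMatrix(X, label=y)

    param = {'max_bin': 16,
             'tree_method': 'gpu_hist',
             'objective': 'binary:logistic'}
    bst = xgb.train(param, dtrain, num_boost_round=10)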


Single Node Multi-GPU
=====================
Multiple GPUs can be used with the ``gpu_hist`` tree method using the ``n_gpus`` parameter, which defaults to 1. If this is set to -1, all available GPUs will be used. If ``gpu_id`` is specified as non-zero, the selected GPU devices will be from ``gpu_id`` to ``gpu_id+n_gpus``; note that ``gpu_id+n_gpus`` must be less than or equal to the number of GPUs available on your system. As with GPU vs. CPU, multi-GPU training will not always be faster than a single GPU, since PCI bus bandwidth can limit performance.

.. note:: Enabling multi-GPU training

Default installation may not enable multi-GPU training. To use multiple GPUs, make sure to read :ref:`build_gpu_support`.
XGBoost supports multi-GPU training on a single machine by specifying the ``n_gpus`` parameter.
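
A minimal sketch of selecting several devices with ``n_gpus`` and ``gpu_id``, assuming a build with multi-GPU support and at least two visible GPUs; ``dtrain`` is the DMatrix from the previous sketch.

.. code-block:: python

    # Assumes XGBoost was built with multi-GPU support and that at least
    # two GPUs are visible to this process.
    param = {'tree_method': 'gpu_hist',
             'gpu_id': 0,   # first device to use
             'n_gpus': 2}   # use two devices; -1 would use all available GPUs
    bst = xgb.train(param, dtrain, num_boost_round=10)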


Multi-node Multi-GPU Training
=============================
XGBoost supports fully distributed GPU training using `Dask
<https://dask.org/>`_. See the Python documentation (:ref:`dask_api`) and worked examples `here
<https://github.com/dmlc/xgboost/tree/master/demo/dask>`_.
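
A rough, hedged sketch of this workflow: the scheduler address, input files, and ``label`` column below are placeholders, and the way ``create_worker_dmatrix`` consumes the distributed collections is an assumption; the linked demos are the authoritative reference.

.. code-block:: python

    import xgboost as xgb
    import dask.dataframe as dd
    from dask.distributed import Client
    from xgboost.dask import run, create_worker_dmatrix

    def train(X, y):
        # Executed on every worker; builds a DMatrix from the partitions
        # local to that worker (argument handling assumed to mirror DMatrix).
        dtrain = create_worker_dmatrix(X, y)
        params = {'tree_method': 'gpu_hist'}
        return xgb.train(params, dtrain, num_boost_round=10)

    client = Client('scheduler-address:8786')   # placeholder scheduler address
    df = dd.read_csv('train-*.csv')             # placeholder input files
    X = df.drop('label', axis=1)
    y = df['label']
    results = run(client, train, X, y)          # launch ``train`` on every worker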


Objective functions
===================
Most of the objective functions implemented in XGBoost can be run on the GPU. The following table shows the current support status.
@@ -209,6 +222,7 @@ References
Contributors
============
Many thanks to the following contributors (alphabetical order):

* Andrey Adinets
* Jiaming Yuan
* Jonathan C. McKinney
3 changes: 3 additions & 0 deletions doc/python/python_api.rst
@@ -74,6 +74,8 @@ Callback API

.. autofunction:: xgboost.callback.early_stop

.. _dask_api:

Dask API
--------
.. automodule:: xgboost.dask
@@ -83,3 +85,4 @@ Dask API
.. autofunction:: xgboost.dask.create_worker_dmatrix

.. autofunction:: xgboost.dask.get_local_data

2 changes: 2 additions & 0 deletions python-package/xgboost/dask.py
@@ -43,6 +43,7 @@ def _start_tracker(n_workers):
def get_local_data(data):
"""
Unpacks a distributed data object to get the rows local to this worker

:param data: A distributed dask data object
:return: Local data partition e.g. numpy or pandas
"""
@@ -107,6 +108,7 @@ def run(client, func, *args):
dask by default, unless the user overrides the nthread parameter.

Note: Windows platforms are not officially supported. Contributions are welcome here.

:param client: Dask client representing the cluster
:param func: Python function to be executed by each worker. Typically contains xgboost
training code.