drm/ttm: Schedule delayed_delete worker closer
Try to allocate system memory on the NUMA node the device is closest to
and try to run delayed_delete workers on a CPU of this node as well.

When a TTM BO gets freed, the delayed_delete worker may have to clear
its system memory. Scheduling that worker close to the NUMA node where
the memory was originally allocated avoids the cases where it gets
randomly scheduled on CPU cores that sit across interconnect boundaries
such as xGMI or PCIe.

This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD
APU platforms such as GFXIP9.4.3.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231111130856.1168304-1-rajneesh.bhardwaj@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
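
For context, a minimal, hypothetical sketch of the pattern this commit applies:
derive the device's NUMA node with dev_to_node() and hand deferred work to
queue_work_node() so it runs on a CPU of that node. The names my_wq,
my_cleanup_work and my_driver_init are invented for illustration and are not
part of the patch; queue_work_node() expects an unbound workqueue, which is
why the patch also adds WQ_UNBOUND below.

#include <linux/device.h>
#include <linux/numa.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;	/* hypothetical example workqueue */

static void my_cleanup_work(struct work_struct *work)
{
	/* Clear/free buffer memory here; runs close to that memory. */
}

static int my_driver_init(struct device *dev, struct work_struct *work)
{
	int nid = dev_to_node(dev);	/* NUMA_NO_NODE if the node is unknown */

	/* WQ_UNBOUND lets the workqueue honor the node hint instead of
	 * running the item on the submitting CPU's per-CPU pool.
	 */
	my_wq = alloc_workqueue("my_wq", WQ_MEM_RECLAIM | WQ_UNBOUND, 16);
	if (!my_wq)
		return -ENOMEM;

	INIT_WORK(work, my_cleanup_work);
	queue_work_node(nid, my_wq, work);	/* prefer a CPU on @nid */
	return 0;
}
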
rajbhar authored and ChristianKoenigAMD committed Nov 27, 2023
1 parent 38f922a commit b0a7ce5
Showing 2 changed files with 11 additions and 3 deletions.
drivers/gpu/drm/ttm/ttm_bo.c (7 additions, 1 deletion)
@@ -370,7 +370,13 @@ static void ttm_bo_release(struct kref *kref)
 		spin_unlock(&bo->bdev->lru_lock);
 
 		INIT_WORK(&bo->delayed_delete, ttm_bo_delayed_delete);
-		queue_work(bdev->wq, &bo->delayed_delete);
+
+		/* Schedule the worker on the closest NUMA node. This
+		 * improves performance since system memory might be
+		 * cleared on free and that is best done on a CPU core
+		 * close to it.
+		 */
+		queue_work_node(bdev->pool.nid, bdev->wq, &bo->delayed_delete);
 		return;
 	}

drivers/gpu/drm/ttm/ttm_device.c (4 additions, 2 deletions)
@@ -204,7 +204,8 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
 	if (ret)
 		return ret;
 
-	bdev->wq = alloc_workqueue("ttm", WQ_MEM_RECLAIM | WQ_HIGHPRI, 16);
+	bdev->wq = alloc_workqueue("ttm",
+				   WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16);
 	if (!bdev->wq) {
 		ttm_global_release();
 		return -ENOMEM;
@@ -213,7 +214,8 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
 	bdev->funcs = funcs;
 
 	ttm_sys_man_init(bdev);
-	ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
+
+	ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);
 
 	bdev->vma_manager = vma_manager;
 	spin_lock_init(&bdev->lru_lock);
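
As an aside on the ttm_pool_init() change above: passing dev_to_node(dev)
instead of NUMA_NO_NODE gives the pool a real node id to prefer when
allocating pages. The fragment below is only an illustrative sketch of
node-preferred allocation in general (alloc_pages_node() falls back to the
current node when given NUMA_NO_NODE); it is not TTM's pool code, and
alloc_buffer_pages() is an invented name.

#include <linux/gfp.h>
#include <linux/numa.h>

/* Illustrative only: allocate zeroed pages, preferring the given NUMA node. */
static struct page *alloc_buffer_pages(int nid, unsigned int order)
{
	return alloc_pages_node(nid, GFP_KERNEL | __GFP_ZERO, order);
}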
