Container Device Interface (CDI)
The Container Device Interface (CDI) is a specification that aims to standardize how devices (such as GPUs, FPGAs, and other hardware accelerators) are exposed to and used by containers. It provides a more consistent and secure mechanism for using hardware devices in containerized environments, addressing the challenges associated with device-specific setup and configuration.
Beyond letting containers interact with device nodes, CDI also lets you specify additional configuration for a device, such as environment variables, host mounts (for example shared objects), and executable hooks.
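To give a rough idea of what this additional configuration looks like, here is a sketch of a spec fragment; the vendor, device name, paths, and hook binary are all made up for illustration:
cdiVersion: "0.6.0"
kind: "example.com/accel"
devices:
- name: accel0
  containerEdits:
    # environment variable injected into the container
    env:
    - ACCEL_VISIBLE=1
    # device node exposed from the host
    deviceNodes:
    - path: /dev/accel0
    # host mount, for example a shared object needed by the device's client library
    mounts:
    - hostPath: /usr/lib/libaccel.so.1
      containerPath: /usr/lib/libaccel.so.1
      options: [ro, nosuid, nodev, bind]
    # executable hook run when the container is created
    hooks:
    - hookName: createContainer
      path: /usr/bin/accel-hook
      args: [accel-hook, update-ldcache]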
Getting started
To get started with CDI, you need a compatible environment. This includes Docker v27+ installed and configured for CDI, as well as Buildx v0.22+.
You also need to create a device specification as a JSON or YAML file in one of the following locations:
/etc/cdi
/var/run/cdi
/etc/buildkit/cdi
Note: If you use BuildKit directly, you can change these locations by setting the specDirs option in the cdi section of the buildkitd.toml configuration file. If you build using the Docker Daemon with the docker driver, refer to the Configure CDI devices documentation.
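As a sketch, for a BuildKit daemon run directly, the corresponding buildkitd.toml fragment would look roughly like this, where the directories listed are simply the defaults shown above:
[cdi]
  # directories BuildKit scans for CDI device specifications
  specDirs = ["/etc/cdi", "/var/run/cdi", "/etc/buildkit/cdi"]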
Note: If you are creating a container builder on WSL, make sure Docker Desktop is installed with WSL 2 GPU paravirtualization enabled. You also need Buildx v0.27+ to mount the WSL libraries in the container.
Build with a simple CDI specification
Let's start with a simple CDI specification that injects an environment variable into the build environment, and write it to /etc/cdi/foo.yaml:
cdiVersion: "0.6.0"
kind: "vendor1.com/device"
devices:
- name: foo
  containerEdits:
    env:
    - FOO=injected
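If /etc/cdi does not exist yet, one way to create it and write the spec in a single step, assuming root access on the host:
$ sudo mkdir -p /etc/cdi
$ sudo tee /etc/cdi/foo.yaml > /dev/null <<'EOF'
cdiVersion: "0.6.0"
kind: "vendor1.com/device"
devices:
- name: foo
  containerEdits:
    env:
    - FOO=injected
EOF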
Inspect the default builder to verify that vendor1.com/device is detected as a device:
$ docker buildx inspect
Name: default
Driver: docker
Nodes:
Name: default
Endpoint: default
Status: running
BuildKit version: v0.23.2
Platforms: linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/386
Labels:
org.mobyproject.buildkit.worker.moby.host-gateway-ip: 172.17.0.1
Devices:
Name: vendor1.com/device=foo
Automatically allowed: false
GC Policy rule#0:
All: false
Filters: type==source.local,type==exec.cachemount,type==source.git.checkout
Keep Duration: 48h0m0s
Max Used Space: 658.9MiB
GC Policy rule#1:
All: false
Keep Duration: 1440h0m0s
Reserved Space: 4.657GiB
Max Used Space: 953.7MiB
Min Free Space: 2.794GiB
GC Policy rule#2:
All: false
Reserved Space: 4.657GiB
Max Used Space: 953.7MiB
Min Free Space: 2.794GiB
GC Policy rule#3:
All: true
Reserved Space: 4.657GiB
Max Used Space: 953.7MiB
Min Free Space: 2.794GiB
Now let's create a Dockerfile to use this device:
# syntax=docker/dockerfile:1-labs
FROM busybox
RUN --device=vendor1.com/device \
env | grep ^FOO=
Here we use the RUN --device flag set to vendor1.com/device, which requests the first device available in the specification. In this case it uses foo, the first device defined in /etc/cdi/foo.yaml.
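If a spec defines more than one device, you can presumably pin a specific one by appending its name, using the same vendor1.com/device=foo form that appears in the buildx inspect output above; a sketch equivalent to the Dockerfile above:
# syntax=docker/dockerfile:1-labs
FROM busybox
RUN --device=vendor1.com/device=foo \
    env | grep ^FOO=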
Note: The RUN --device flag is only available in the labs channel, starting with Dockerfile frontend v1.14.0-labs. It is not yet available in the stable syntax.
Now let's build this Dockerfile:
$ docker buildx build .
[+] Building 0.4s (5/5) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 155B 0.0s
=> resolve image config for docker-image://docker/dockerfile:1-labs 0.1s
=> CACHED docker-image://docker/dockerfile:1-labs@sha256:9187104f31e3a002a8a6a3209ea1f937fb7486c093cbbde1e14b0fa0d7e4f1b5 0.0s
=> [internal] load metadata for docker.io/library/busybox:latest 0.1s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
ERROR: failed to build: failed to solve: failed to load LLB: device vendor1.com/device=foo is requested by the build but not allowed
It fails because the device vendor1.com/device=foo is not automatically allowed for builds, as shown in the buildx inspect output above:
Devices:
Name: vendor1.com/device=foo
Automatically allowed: false
To allow the device, you can use the --allow flag with the docker buildx build command:
$ docker buildx build --allow device .
Alternatively, you can set the org.mobyproject.buildkit.device.autoallow annotation in the CDI specification to automatically allow the device for all builds:
cdiVersion: "0.6.0"
kind: "vendor1.com/device"
devices:
- name: foo
  containerEdits:
    env:
    - FOO=injected
annotations:
  org.mobyproject.buildkit.device.autoallow: true
Now run the build again with the --allow device flag:
$ docker buildx build --progress=plain --allow device .
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 159B done
#1 DONE 0.0s
#2 resolve image config for docker-image://docker/dockerfile:1-labs
#2 DONE 0.1s
#3 docker-image://docker/dockerfile:1-labs@sha256:9187104f31e3a002a8a6a3209ea1f937fb7486c093cbbde1e14b0fa0d7e4f1b5
#3 CACHED
#4 [internal] load metadata for docker.io/library/busybox:latest
#4 DONE 0.1s
#5 [internal] load .dockerignore
#5 transferring context: 2B done
#5 DONE 0.0s
#6 [1/2] FROM docker.io/library/busybox:latest@sha256:f85340bf132ae937d2c2a763b8335c9bab35d6e8293f70f606b9c6178d84f42b
#6 CACHED
#7 [2/2] RUN --device=vendor1.com/device env | grep ^FOO=
#7 0.155 FOO=injected
#7 DONE 0.2s
The build succeeds, and the output shows that the FOO environment variable was injected into the build environment as defined in the CDI specification.
Set up a GPU-enabled container builder
In this section we show how to set up a container builder with NVIDIA GPU support. Since Buildx v0.22, a GPU request is automatically added to a new container builder if the host has GPU drivers installed in the kernel. This is similar to using --gpus=all with the docker run command.
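For comparison, this is roughly what the analogous request looks like at container run time with the standard --gpus flag (assuming the NVIDIA Container Toolkit is set up on the host):
$ docker run --rm --gpus=all ubuntu nvidia-smi -L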
Note: We made a specially crafted BuildKit image because the current BuildKit release image is based on Alpine, which does not support NVIDIA drivers. The image below is based on Ubuntu and installs the NVIDIA client libraries, then generates the CDI specification for your GPUs inside the container builder when a device is requested during a build. This image is temporarily hosted on Docker Hub as crazymax/buildkit:v0.23.2-ubuntu-nvidia.
Now let's use Buildx to create a container builder named gpubuilder:
$ docker buildx create --name gpubuilder --driver-opt "image=crazymax/buildkit:v0.23.2-ubuntu-nvidia" --bootstrap
#1 [internal] booting buildkit
#1 pulling image crazymax/buildkit:v0.23.2-ubuntu-nvidia
#1 pulling image crazymax/buildkit:v0.23.2-ubuntu-nvidia 1.0s done
#1 creating container buildx_buildkit_gpubuilder0
#1 creating container buildx_buildkit_gpubuilder0 8.8s done
#1 DONE 9.8s
gpubuilder
Let's inspect this builder:
$ docker buildx inspect gpubuilder
Name: gpubuilder
Driver: docker-container
Last Activity: 2025-07-10 08:18:09 +0000 UTC
Nodes:
Name: gpubuilder0
Endpoint: unix:///var/run/docker.sock
Driver Options: image="crazymax/buildkit:v0.23.2-ubuntu-nvidia"
Status: running
BuildKit daemon flags: --allow-insecure-entitlement=network.host
BuildKit version: v0.23.2
Platforms: linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
Labels:
org.mobyproject.buildkit.worker.executor: oci
org.mobyproject.buildkit.worker.hostname: d6aa9cbe8462
org.mobyproject.buildkit.worker.network: host
org.mobyproject.buildkit.worker.oci.process-mode: sandbox
org.mobyproject.buildkit.worker.selinux.enabled: false
org.mobyproject.buildkit.worker.snapshotter: overlayfs
Devices:
Name: nvidia.com/gpu
On-Demand: true
GC Policy rule#0:
All: false
Filters: type==source.local,type==exec.cachemount,type==source.git.checkout
Keep Duration: 48h0m0s
Max Used Space: 488.3MiB
GC Policy rule#1:
All: false
Keep Duration: 1440h0m0s
Reserved Space: 9.313GiB
Max Used Space: 93.13GiB
Min Free Space: 188.1GiB
GC Policy rule#2:
All: false
Reserved Space: 9.313GiB
Max Used Space: 93.13GiB
Min Free Space: 188.1GiB
GC Policy rule#3:
All: true
Reserved Space: 9.313GiB
Max Used Space: 93.13GiB
Min Free Space: 188.1GiB
We can see that the nvidia.com/gpu vendor is detected as a device in the builder, which means the drivers were detected.
Optionally, you can check that NVIDIA GPU devices are available inside the container using nvidia-smi:
$ docker exec -it buildx_buildkit_gpubuilder0 nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-6cf00fa7-59ac-16f2-3e83-d24ccdc56f84)
Build with GPU support
Let's create a simple Dockerfile that uses the GPU device:
# syntax=docker/dockerfile:1-labs
FROM ubuntu
RUN --device=nvidia.com/gpu nvidia-smi -L
Now run the build using the gpubuilder builder created earlier:
$ docker buildx --builder gpubuilder build --progress=plain .
#0 building with "gpubuilder" instance using docker-container driver
...
#7 preparing device nvidia.com/gpu
#7 0.000 > apt-get update
...
#7 4.872 > apt-get install -y gpg
...
#7 10.16 Downloading NVIDIA GPG key
#7 10.21 > apt-get update
...
#7 12.15 > apt-get install -y --no-install-recommends nvidia-container-toolkit-base
...
#7 17.80 time="2025-04-15T08:58:16Z" level=info msg="Generated CDI spec with version 0.8.0"
#7 DONE 17.8s
#8 [2/2] RUN --device=nvidia.com/gpu nvidia-smi -L
#8 0.527 GPU 0: Tesla T4 (UUID: GPU-6cf00fa7-59ac-16f2-3e83-d24ccdc56f84)
#8 DONE 1.6s
You may have noticed that step #7 prepares the nvidia.com/gpu device by installing the client libraries and toolkit and then generating the CDI specification for the GPUs.
The nvidia-smi -L command is then run in the container using the GPU device, and the output shows the GPU UUID.
You can inspect the CDI specification generated inside the container builder with the following command:
$ docker exec -it buildx_buildkit_gpubuilder0 cat /etc/cdi/nvidia.yaml
For the g4dn.xlarge EC2 instance used here, it looks like this:
cdiVersion: 0.6.0
containerEdits:
  deviceNodes:
  - path: /dev/nvidia-modeset
  - path: /dev/nvidia-uvm
  - path: /dev/nvidia-uvm-tools
  - path: /dev/nvidiactl
  env:
  - NVIDIA_VISIBLE_DEVICES=void
  hooks:
  - args:
    - nvidia-cdi-hook
    - create-symlinks
    - --link
    - ../libnvidia-allocator.so.1::/usr/lib/x86_64-linux-gnu/gbm/nvidia-drm_gbm.so
    hookName: createContainer
    path: /usr/bin/nvidia-cdi-hook
  - args:
    - nvidia-cdi-hook
    - create-symlinks
    - --link
    - libcuda.so.1::/usr/lib/x86_64-linux-gnu/libcuda.so
    hookName: createContainer
    path: /usr/bin/nvidia-cdi-hook
  - args:
    - nvidia-cdi-hook
    - enable-cuda-compat
    - --host-driver-version=570.133.20
    hookName: createContainer
    path: /usr/bin/nvidia-cdi-hook
  - args:
    - nvidia-cdi-hook
    - update-ldcache
    - --folder
    - /usr/lib/x86_64-linux-gnu
    hookName: createContainer
    path: /usr/bin/nvidia-cdi-hook
  mounts:
  - containerPath: /run/nvidia-persistenced/socket
    hostPath: /run/nvidia-persistenced/socket
    options:
    - ro
    - nosuid
    - nodev
    - bind
    - noexec
  - containerPath: /usr/bin/nvidia-cuda-mps-control
    hostPath: /usr/bin/nvidia-cuda-mps-control
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-cuda-mps-server
    hostPath: /usr/bin/nvidia-cuda-mps-server
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-debugdump
    hostPath: /usr/bin/nvidia-debugdump
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-persistenced
    hostPath: /usr/bin/nvidia-persistenced
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-smi
    hostPath: /usr/bin/nvidia-smi
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libcuda.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libcuda.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libcudadebugger.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libcudadebugger.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-nscq.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-nscq.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.570.133.20
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.570.133.20
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/firmware/nvidia/570.133.20/gsp_ga10x.bin
    hostPath: /lib/firmware/nvidia/570.133.20/gsp_ga10x.bin
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/firmware/nvidia/570.133.20/gsp_tu10x.bin
    hostPath: /lib/firmware/nvidia/570.133.20/gsp_tu10x.bin
    options:
    - ro
    - nosuid
    - nodev
    - bind
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
  name: "0"
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
  name: GPU-6cf00fa7-59ac-16f2-3e83-d24ccdc56f84
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
  name: all
kind: nvidia.com/gpu
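Note that the devices section above exposes the same GPU under three names: the index 0, the GPU UUID, and all. In principle any of these names can be appended to the device request to pin a specific GPU rather than taking the first one; a hypothetical variant of the earlier Dockerfile:
# syntax=docker/dockerfile:1-labs
FROM ubuntu
RUN --device=nvidia.com/gpu=0 nvidia-smi -L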
Congratulations on completing your first build with a GPU device using BuildKit and CDI.