
[Heterogeneous Computing] Using GPUs in Docker

In the blog post GPU and CUDA Programming Primer, we gave an initial introduction to using GPUs on Linux. With the rapid growth of containers and Kubernetes, the demand for using GPUs inside containers has become increasingly strong. Building on that post, this article introduces how to use GPUs in containers, then how to schedule GPUs in Kubernetes, and finally, taking TensorFlow as an example, how to build a GPU-enabled deep-learning development environment on Docker.

NVIDIA Container Toolkit

Background

Containers were originally designed to deploy CPU-based applications seamlessly: they are agnostic to the underlying hardware and platform. This model clearly does not carry over to GPUs, since different GPUs require different hardware drivers on the machine, which severely limited GPU use in containers. The earliest workaround was to install the full NVIDIA driver inside the container and pass the GPU into the container at startup as a character device such as /dev/nvidia0. However, this required the driver version inside the container to match the host driver version exactly, so the same Docker image could not be reused across machines, which greatly limited portability.

To solve this, containers had to become agnostic to the NVIDIA driver; to that end, NVIDIA released the NVIDIA Container Toolkit.

(figure: nvidia-gpu-docker)

As shown in the figure above, NVIDIA splits the API environment that CUDA applications depend on into two parts:

  • Driver-level APIs: provided by the libcuda.so.major.minor shared library and the kernel module; shown as CUDA Driver in the figure.
    • Driver-level APIs are low-level. Whenever NVIDIA releases a new driver version and you upgrade the host driver, the kernel module and libcuda.so.major.minor must be upgraded together to the same version, or existing programs will stop working.
    • Different driver versions cannot coexist on the same host.
  • Non-driver-level APIs: user-space libraries such as libcublas.so; shown as CUDA Toolkit in the figure.
    • Non-driver-level APIs are versioned by the Toolkit's own version number, e.g. cuda-10, cuda-11.
    • Different Toolkit versions can run side by side on the same host.
    • Non-driver-level APIs are a higher-level wrapper over the driver-level APIs and ultimately call into them.

To make GPU containers portable, NVIDIA packaged everything at the non-driver level into the NVIDIA Container Toolkit. Each machine therefore only needs the NVIDIA driver installed; once the NVIDIA Container Toolkit is configured on top of it, containers can use the GPU conveniently.

Overall Architecture

At its core, NVIDIA's container toolkit provides GPU container creation through an nvidia runtime wrapped around runC: it patches a few hooks into the OCI spec created for the user so that the GPU devices are prepared before the container runs. It consists of the following components, shown top-down in the figure:

  • nvidia-docker2
  • nvidia-container-runtime
  • nvidia-container-toolkit
  • libnvidia-container

Each of these components is introduced below:

libnvidia-container

libnvidia-container provides a library and a CLI client, nvidia-container-cli, for configuring GNU/Linux containers to use NVIDIA GPUs. libnvidia-container relies on kernel primitives and is agnostic to the container runtime.

The main job of nvidia-container-cli is to inject the NVIDIA GPU into the container, including mounting device nodes such as /dev/nvidia0. The captured log excerpt below shows its main operations:

  • Load kernel modules such as nvidia, nvidia_uvm, and nvidia_modeset
  • Create character devices such as nvidia0, nvidiactl, nvidia-uvm, and nvidia-modeset
  • Mount GPU devices and the NVIDIA-related libraries into the container
I0301 09:23:38.589710 4693 nvc_mount.c:218] zz: mounting /dev/nvidiactl at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/dev/nvidiactl
I0301 09:23:38.589733 4693 nvc_mount.c:509] whitelisting device node 195:255
I0301 09:23:38.589789 4693 nvc_mount.c:218] zz: mounting /dev/nvidia-uvm at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/dev/nvidia-uvm
I0301 09:23:38.589809 4693 nvc_mount.c:509] whitelisting device node 233:0
I0301 09:23:38.589855 4693 nvc_mount.c:218] zz: mounting /dev/nvidia-uvm-tools at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/dev/nvidia-uvm-tools
I0301 09:23:38.589875 4693 nvc_mount.c:509] whitelisting device node 233:1
I0301 09:23:38.589959 4693 nvc_mount.c:218] zz: mounting /dev/nvidia0 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/dev/nvidia0

Looking at the libnvidia-container source, we can see:

if (xmount(err, src, dst, NULL, MS_BIND, NULL) < 0)
goto fail;

Concretely, the /dev/nvidia0 device is bind-mounted onto /dev/nvidia0 in the container rootfs.

The full log output follows:

-- WARNING, the following logs are for debugging purposes only --

I0301 09:23:38.570499 4693 nvc.c:374] initializing library context (version=1.3.3, build=bd9fc3f2b642345301cb2e23de07ec5386232317)
I0301 09:23:38.570591 4693 nvc.c:346] using root /
I0301 09:23:38.570600 4693 nvc.c:347] using ldcache /etc/ld.so.cache
I0301 09:23:38.570606 4693 nvc.c:348] using unprivileged user 65534:65534
I0301 09:23:38.570624 4693 nvc.c:391] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0301 09:23:38.570708 4693 nvc.c:393] dxcore initialization failed, continuing assuming a non-WSL environment
I0301 09:23:38.571856 4698 nvc.c:274] loading kernel module nvidia
I0301 09:23:38.572227 4698 nvc.c:278] running mknod for /dev/nvidiactl
I0301 09:23:38.572285 4698 nvc.c:282] running mknod for /dev/nvidia0
I0301 09:23:38.572324 4698 nvc.c:286] running mknod for all nvcaps in /dev/nvidia-caps
I0301 09:23:38.572342 4698 nvc.c:292] loading kernel module nvidia_uvm
I0301 09:23:38.572382 4698 nvc.c:296] running mknod for /dev/nvidia-uvm
I0301 09:23:38.572472 4698 nvc.c:301] loading kernel module nvidia_modeset
I0301 09:23:38.572606 4698 nvc.c:305] running mknod for /dev/nvidia-modeset
I0301 09:23:38.572891 4699 driver.c:101] starting driver service
I0301 09:23:38.576138 4693 nvc_container.c:388] configuring container with 'compute utility video supervised'
I0301 09:23:38.576499 4693 nvc_container.c:408] setting pid to 4658
I0301 09:23:38.576510 4693 nvc_container.c:409] setting rootfs to /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged
I0301 09:23:38.576516 4693 nvc_container.c:410] setting owner to 0:0
I0301 09:23:38.576522 4693 nvc_container.c:411] setting bins directory to /usr/bin
I0301 09:23:38.576528 4693 nvc_container.c:412] setting libs directory to /usr/lib/x86_64-linux-gnu
I0301 09:23:38.576534 4693 nvc_container.c:413] setting libs32 directory to /usr/lib/i386-linux-gnu
I0301 09:23:38.576541 4693 nvc_container.c:414] setting cudart directory to /usr/local/cuda
I0301 09:23:38.576547 4693 nvc_container.c:415] setting ldconfig to @/sbin/ldconfig (host relative)
I0301 09:23:38.576553 4693 nvc_container.c:416] setting mount namespace to /proc/4658/ns/mnt
I0301 09:23:38.576559 4693 nvc_container.c:418] setting devices cgroup to /sys/fs/cgroup/devices/docker/5e868a1fa27cc187630a4b41cbdc8cc50b29a1aa35984f69f00b298db75caf4d
I0301 09:23:38.576571 4693 nvc_info.c:680] requesting driver information with ''
I0301 09:23:38.578231 4693 nvc_info.c:169] selecting /usr/lib64/vdpau/libvdpau_nvidia.so.418.126.02
I0301 09:23:38.578374 4693 nvc_info.c:169] selecting /usr/lib64/libnvoptix.so.418.126.02
I0301 09:23:38.578442 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-tls.so.418.126.02
I0301 09:23:38.578478 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-rtcore.so.418.126.02
I0301 09:23:38.578515 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-ptxjitcompiler.so.418.126.02
I0301 09:23:38.578565 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-opticalflow.so.418.126.02
I0301 09:23:38.578614 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-opencl.so.418.126.02
I0301 09:23:38.578651 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-ml.so.418.126.02
I0301 09:23:38.578699 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-ifr.so.418.126.02
I0301 09:23:38.578749 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-glvkspirv.so.418.126.02
I0301 09:23:38.578782 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-glsi.so.418.126.02
I0301 09:23:38.578815 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-glcore.so.418.126.02
I0301 09:23:38.578851 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-fbc.so.418.126.02
I0301 09:23:38.578896 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-fatbinaryloader.so.418.126.02
I0301 09:23:38.578941 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-encode.so.418.126.02
I0301 09:23:38.578989 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-eglcore.so.418.126.02
I0301 09:23:38.579029 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-compiler.so.418.126.02
I0301 09:23:38.579071 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-cfg.so.418.126.02
I0301 09:23:38.579115 4693 nvc_info.c:169] selecting /usr/lib64/libnvidia-cbl.so.418.126.02
I0301 09:23:38.579150 4693 nvc_info.c:169] selecting /usr/lib64/libnvcuvid.so.418.126.02
I0301 09:23:38.579314 4693 nvc_info.c:169] selecting /usr/lib64/libcuda.so.418.126.02
I0301 09:23:38.579405 4693 nvc_info.c:169] selecting /usr/lib64/libGLX_nvidia.so.418.126.02
I0301 09:23:38.579440 4693 nvc_info.c:169] selecting /usr/lib64/libGLESv2_nvidia.so.418.126.02
I0301 09:23:38.579474 4693 nvc_info.c:169] selecting /usr/lib64/libGLESv1_CM_nvidia.so.418.126.02
I0301 09:23:38.579507 4693 nvc_info.c:169] selecting /usr/lib64/libEGL_nvidia.so.418.126.02
I0301 09:23:38.579547 4693 nvc_info.c:169] selecting /usr/lib/vdpau/libvdpau_nvidia.so.418.126.02
I0301 09:23:38.579588 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-tls.so.418.126.02
I0301 09:23:38.579626 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-ptxjitcompiler.so.418.126.02
I0301 09:23:38.579673 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-opticalflow.so.418.126.02
I0301 09:23:38.579721 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-opencl.so.418.126.02
I0301 09:23:38.579755 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-ml.so.418.126.02
I0301 09:23:38.579802 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-ifr.so.418.126.02
I0301 09:23:38.579850 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-glvkspirv.so.418.126.02
I0301 09:23:38.579883 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-glsi.so.418.126.02
I0301 09:23:38.579916 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-glcore.so.418.126.02
I0301 09:23:38.579960 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-fbc.so.418.126.02
I0301 09:23:38.580007 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-fatbinaryloader.so.418.126.02
I0301 09:23:38.580039 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-encode.so.418.126.02
I0301 09:23:38.580085 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-eglcore.so.418.126.02
I0301 09:23:38.580121 4693 nvc_info.c:169] selecting /usr/lib/libnvidia-compiler.so.418.126.02
I0301 09:23:38.580158 4693 nvc_info.c:169] selecting /usr/lib/libnvcuvid.so.418.126.02
I0301 09:23:38.580209 4693 nvc_info.c:169] selecting /usr/lib/libcuda.so.418.126.02
I0301 09:23:38.580258 4693 nvc_info.c:169] selecting /usr/lib/libGLX_nvidia.so.418.126.02
I0301 09:23:38.580291 4693 nvc_info.c:169] selecting /usr/lib/libGLESv2_nvidia.so.418.126.02
I0301 09:23:38.580324 4693 nvc_info.c:169] selecting /usr/lib/libGLESv1_CM_nvidia.so.418.126.02
I0301 09:23:38.580356 4693 nvc_info.c:169] selecting /usr/lib/libEGL_nvidia.so.418.126.02
W0301 09:23:38.580373 4693 nvc_info.c:350] missing library libnvidia-allocator.so
W0301 09:23:38.580381 4693 nvc_info.c:350] missing library libnvidia-ngx.so
W0301 09:23:38.580387 4693 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W0301 09:23:38.580393 4693 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W0301 09:23:38.580405 4693 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W0301 09:23:38.580412 4693 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W0301 09:23:38.580418 4693 nvc_info.c:354] missing compat32 library libnvoptix.so
W0301 09:23:38.580424 4693 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I0301 09:23:38.580710 4693 nvc_info.c:276] selecting /usr/bin/nvidia-smi
I0301 09:23:38.580731 4693 nvc_info.c:276] selecting /usr/bin/nvidia-debugdump
I0301 09:23:38.580750 4693 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
I0301 09:23:38.580771 4693 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-control
I0301 09:23:38.580791 4693 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-server
I0301 09:23:38.580821 4693 nvc_info.c:438] listing device /dev/nvidiactl
I0301 09:23:38.580827 4693 nvc_info.c:438] listing device /dev/nvidia-uvm
I0301 09:23:38.580833 4693 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I0301 09:23:38.580839 4693 nvc_info.c:438] listing device /dev/nvidia-modeset
W0301 09:23:38.580868 4693 nvc_info.c:321] missing ipc /var/run/nvidia-persistenced/socket
W0301 09:23:38.580884 4693 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I0301 09:23:38.580891 4693 nvc_info.c:745] requesting device information with ''
I0301 09:23:38.588416 4693 nvc_info.c:628] listing device /dev/nvidia0 (GPU-a99e5631-5bcb-b5e1-9a08-e64a25effe1e at 00000000:00:05.0)
I0301 09:23:38.588494 4693 nvc_mount.c:354] mounting tmpfs at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/proc/driver/nvidia
I0301 09:23:38.588826 4693 nvc_mount.c:112] mounting /usr/bin/nvidia-smi at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/bin/nvidia-smi
I0301 09:23:38.588866 4693 nvc_mount.c:112] mounting /usr/bin/nvidia-debugdump at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/bin/nvidia-debugdump
I0301 09:23:38.588899 4693 nvc_mount.c:112] mounting /usr/bin/nvidia-persistenced at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/bin/nvidia-persistenced
I0301 09:23:38.588945 4693 nvc_mount.c:112] mounting /usr/bin/nvidia-cuda-mps-control at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/bin/nvidia-cuda-mps-control
I0301 09:23:38.588980 4693 nvc_mount.c:112] mounting /usr/bin/nvidia-cuda-mps-server at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/bin/nvidia-cuda-mps-server
I0301 09:23:38.589177 4693 nvc_mount.c:112] mounting /usr/lib64/libnvidia-ml.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.126.02
I0301 09:23:38.589226 4693 nvc_mount.c:112] mounting /usr/lib64/libnvidia-cfg.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.126.02
I0301 09:23:38.589268 4693 nvc_mount.c:112] mounting /usr/lib64/libcuda.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libcuda.so.418.126.02
I0301 09:23:38.589308 4693 nvc_mount.c:112] mounting /usr/lib64/libnvidia-opencl.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.126.02
I0301 09:23:38.589350 4693 nvc_mount.c:112] mounting /usr/lib64/libnvidia-ptxjitcompiler.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.126.02
I0301 09:23:38.589391 4693 nvc_mount.c:112] mounting /usr/lib64/libnvidia-fatbinaryloader.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.126.02
I0301 09:23:38.589440 4693 nvc_mount.c:112] mounting /usr/lib64/libnvidia-compiler.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.126.02
I0301 09:23:38.589484 4693 nvc_mount.c:112] mounting /usr/lib64/vdpau/libvdpau_nvidia.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so.418.126.02
I0301 09:23:38.589524 4693 nvc_mount.c:112] mounting /usr/lib64/libnvidia-encode.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.418.126.02
I0301 09:23:38.589566 4693 nvc_mount.c:112] mounting /usr/lib64/libnvidia-opticalflow.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.418.126.02
I0301 09:23:38.589611 4693 nvc_mount.c:112] mounting /usr/lib64/libnvcuvid.so.418.126.02 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.126.02
I0301 09:23:38.589631 4693 nvc_mount.c:534] creating symlink /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
I0301 09:23:38.589657 4693 nvc_mount.c:534] creating symlink /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
I0301 09:23:38.589710 4693 nvc_mount.c:218] zz: mounting /dev/nvidiactl at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/dev/nvidiactl
I0301 09:23:38.589733 4693 nvc_mount.c:509] whitelisting device node 195:255
I0301 09:23:38.589789 4693 nvc_mount.c:218] zz: mounting /dev/nvidia-uvm at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/dev/nvidia-uvm
I0301 09:23:38.589809 4693 nvc_mount.c:509] whitelisting device node 233:0
I0301 09:23:38.589855 4693 nvc_mount.c:218] zz: mounting /dev/nvidia-uvm-tools at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/dev/nvidia-uvm-tools
I0301 09:23:38.589875 4693 nvc_mount.c:509] whitelisting device node 233:1
I0301 09:23:38.589959 4693 nvc_mount.c:218] zz: mounting /dev/nvidia0 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/dev/nvidia0
I0301 09:23:38.590046 4693 nvc_mount.c:422] mounting /proc/driver/nvidia/gpus/0000:00:05.0 at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged/proc/driver/nvidia/gpus/0000:00:05.0
I0301 09:23:38.590069 4693 nvc_mount.c:509] whitelisting device node 195:0
I0301 09:23:38.590097 4693 nvc_ldcache.c:360] executing /sbin/ldconfig from host at /var/lib/docker/overlay2/09af6a668c457545500c0bc7e152195750c2ccfe948daeae6d8a573a1e738ba0/merged
W0301 09:23:38.598613 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.67 is empty, not checked.
W0301 09:23:38.598725 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.418.67 is empty, not checked.
W0301 09:23:38.599063 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1 is empty, not checked.
W0301 09:23:38.599085 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1 is empty, not checked.
W0301 09:23:38.599139 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so.1 is empty, not checked.
W0301 09:23:38.599310 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.67 is empty, not checked.
W0301 09:23:38.599355 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.67 is empty, not checked.
W0301 09:23:38.599417 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libcuda.so.1 is empty, not checked.
W0301 09:23:38.599468 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libcuda.so is empty, not checked.
W0301 09:23:38.599589 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.67 is empty, not checked.
W0301 09:23:38.599609 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so is empty, not checked.
W0301 09:23:38.599628 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1 is empty, not checked.
W0301 09:23:38.599729 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libcuda.so.418.67 is empty, not checked.
W0301 09:23:38.599749 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 is empty, not checked.
W0301 09:23:38.599770 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 is empty, not checked.
W0301 09:23:38.599819 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so.418.67 is empty, not checked.
W0301 09:23:38.599839 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is empty, not checked.
W0301 09:23:38.599860 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.67 is empty, not checked.
W0301 09:23:38.599951 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.418.67 is empty, not checked.
W0301 09:23:38.600048 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.67 is empty, not checked.
W0301 09:23:38.600056 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.67 is empty, not checked.
W0301 09:23:38.600076 4693 utils.c:121] /sbin/ldconfig: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1 is empty, not checked.
I0301 09:23:38.622024 4693 nvc.c:429] shutting down library context
I0301 09:23:38.628575 4699 driver.c:156] terminating driver service
I0301 09:23:38.629013 4693 driver.c:196] driver service terminated successfully

nvidia-container-toolkit

nvidia-container-toolkit is invoked by runC as a PreStart hook; at that point the container has been created but not yet started. Its main job is to gather information (for example the container's rootfs path and the settings in config.json), assemble the argument list for nvidia-container-cli, and then invoke nvidia-container-cli with arguments such as:

/usr/bin/nvidia-container-cli
--load-kmods                                   # Load kernel modules
--debug=/var/log/nvidia-container-toolkit.log  # Log debug information
configure                                      # Subcommand: prepare a container for GPU use
--ldconfig=@/sbin/ldconfig
--device=all                                   # Device UUID(s) or index(es) to isolate
--compute                                      # Enable compute capability
--utility                                      # Enable utility capability
--video                                        # Enable video capability
--require=cuda>=9.0
--pid=3409                                     # Container PID
/var/lib/docker/overlay2/f5a884006ac0e5a1390809bf09209d9e47f2a400d305048b358d7fed735ef799/merged

nvidia-container-runtime

When docker run is executed with the --runtime=nvidia flag, Docker's runtime switches from runC to nvidia-container-runtime. nvidia-container-runtime is essentially a thin wrapper around runC: it takes the runC spec as input, registers nvidia-container-toolkit as a PreStart hook, and then invokes runC.

github.com/NVIDIA/nvidia-container-runtime/src/main.go
func addNVIDIAHook(spec *specs.Spec) error {
	path, err := exec.LookPath("nvidia-container-runtime-hook")
	if err != nil {
		return err
	}
	args := []string{path}
	spec.Hooks.Prestart = append(spec.Hooks.Prestart, specs.Hook{
		Path: path,
		Args: append(args, "prestart"),
	})

	return nil
}

func execRunc() {
	runcPath, err := exec.LookPath("docker-runc") // usually not found
	if err != nil {
		runcPath, err = exec.LookPath("runc") // fall back to plain runc
	}
	syscall.Exec(runcPath, append([]string{runcPath}, os.Args[1:]...), os.Environ())
}

func main() {
	addNVIDIAHook(&spec)
	execRunc()
}

Note that nvidia-container-runtime-hook here is in fact a symlink to /usr/bin/nvidia-container-toolkit.
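To make the hook-injection step above concrete, here is a minimal, hypothetical sketch (using python3 for the JSON editing) of the change addNVIDIAHook makes to an OCI config.json. The spec below is a toy stub, not a real runC spec:

```shell
# Toy OCI spec (heavily stripped down; a real config.json has far more fields)
cat > /tmp/config.json <<'EOF'
{"ociVersion": "1.0.0", "hooks": {}}
EOF

# Append a prestart hook the way nvidia-container-runtime does
python3 - <<'EOF'
import json
with open("/tmp/config.json") as f:
    spec = json.load(f)
spec.setdefault("hooks", {}).setdefault("prestart", []).append({
    "path": "/usr/bin/nvidia-container-runtime-hook",
    "args": ["nvidia-container-runtime-hook", "prestart"],
})
with open("/tmp/config.json", "w") as f:
    json.dump(spec, f, indent=2)
EOF

# runC would now execute this hook after create, before start
hook=$(python3 -c "import json; print(json.load(open('/tmp/config.json'))['hooks']['prestart'][0]['path'])")
echo "$hook"   # prints /usr/bin/nvidia-container-runtime-hook
```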

After installing nvidia-container-runtime, you need to modify Docker's daemon.json for it to take effect, or explicitly specify the --runtime flag.

/etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

nvidia-docker2

nvidia-docker2 is the only Docker-specific package in the NVIDIA Container Toolkit. Its role is to add the --runtime=nvidia flag when the user runs docker run/create, and then hand off to nvidia-container-runtime for the subsequent steps that inject the GPU into the container. It also supports the NV_GPU variable to select which GPUs to inject into the container.

nvidia-docker itself is essentially a shell script, shown below:

github.com/NVIDIA/nvidia-docker/nvidia-docker
#! /bin/bash
# Copyright (c) 2017-2018, NVIDIA CORPORATION. All rights reserved.

NV_DOCKER=${NV_DOCKER:-"docker"}

DOCKER_ARGS=()
NV_DOCKER_ARGS=()
while [ $# -gt 0 ]; do
    arg=$1
    shift
    DOCKER_ARGS+=("$arg")
    case $arg in
        run|create)
            NV_DOCKER_ARGS+=("--runtime=nvidia")
            if [ ! -z "${NV_GPU}" ]; then
                NV_DOCKER_ARGS+=(-e NVIDIA_VISIBLE_DEVICES="${NV_GPU// /,}")
            fi
            break
            ;;
        version)
            printf "NVIDIA Docker: @VERSION@\n"
            break
            ;;
        --)
            break
            ;;
    esac
done

if [ ! -z $NV_DEBUG ]; then
    set -x
fi

exec $NV_DOCKER "${DOCKER_ARGS[@]}" "${NV_DOCKER_ARGS[@]}" "$@"
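The only nontrivial line in the script is the NV_GPU expansion: ${NV_GPU// /,} replaces spaces with commas to build the NVIDIA_VISIBLE_DEVICES value. A quick stand-alone illustration:

```shell
# Mimic what nvidia-docker does with NV_GPU (illustration only, does not run docker)
NV_GPU="0 1 3"
visible="NVIDIA_VISIBLE_DEVICES=${NV_GPU// /,}"
echo "$visible"   # prints NVIDIA_VISIBLE_DEVICES=0,1,3
```

So NV_GPU='0 1' nvidia-docker run ... ends up as docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 ...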

Deployment and Verification

This section again uses a Tencent Cloud machine as the example for installing and configuring the NVIDIA Container Toolkit; for other platforms, refer to the official documentation. Note that the repository setup and install commands below are the apt-based ones from the official docs for Debian/Ubuntu; on a CentOS 7 machine, use the corresponding yum repository and yum install -y nvidia-docker2 instead.

Install Docker CE

$ curl https://get.docker.com | sh \
&& sudo systemctl start docker \
&& sudo systemctl enable docker

Install the NVIDIA Container Toolkit

Setup the stable repository and the GPG key:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

Install the nvidia-docker2 package (and dependencies) after updating the package listing:

$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2

Restart the Docker daemon to complete the installation after setting the default runtime:

$ sudo systemctl restart docker

At this point, a working setup can be tested by running a base CUDA container:

$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

This should result in a console output shown below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 34C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

Configure the NVIDIA Runtime

To register the nvidia runtime, use the method below that is best suited to your environment. You might need to merge the new argument with your existing configuration. Three options are available:

Systemd drop-in file

$ sudo mkdir -p /etc/systemd/system/docker.service.d
$ sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
$ sudo systemctl daemon-reload \
&& sudo systemctl restart docker

Daemon configuration file

The nvidia runtime can also be registered with Docker using the daemon.json configuration file:

$ sudo tee /etc/docker/daemon.json <<EOF
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
EOF
sudo pkill -SIGHUP dockerd

You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:

"default-runtime": "nvidia"

Command Line

Use dockerd to add the nvidia runtime:

$ sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime [...]

Managing GPUs in Kubernetes

To manage and use GPUs in Kubernetes, in addition to configuring the NVIDIA Container Toolkit we also need to install NVIDIA's NVIDIA/k8s-device-plugin; see my earlier post for installation details. All of these steps together are still somewhat tedious; if you use Tencent Cloud TKE, the NVIDIA Container Toolkit and NVIDIA/k8s-device-plugin are installed and configured automatically when you add a GPU node to the cluster, which is very convenient. Next, we take TensorFlow as an example and run GPU-enabled TensorFlow on Kubernetes.

Single-node TensorFlow

First, the single-node version: run kubectl apply -f tensorflow.yaml to start a Jupyter Notebook.

tensorflow.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow
  labels:
    k8s-app: tensorflow
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: tensorflow
  template:
    metadata:
      labels:
        k8s-app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:2.2.1-gpu-py3-jupyter
        ports:
        - containerPort: 8888
        resources:
          limits:
            cpu: 4
            memory: 2Gi
          requests:
            cpu: 2
            memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: jupyter-service
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 8888
    name: tensorflow
  selector:
    k8s-app: tensorflow
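Note that the Deployment above only sets CPU and memory limits. With the NVIDIA device plugin installed, a pod that must be scheduled onto a GPU node would additionally request the extended GPU resource, along the lines of the following fragment (the value 1 is illustrative):

```yaml
resources:
  limits:
    nvidia.com/gpu: 1   # scheduled only onto nodes advertising this resource
```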

The container comes up quickly, and the Jupyter Notebook can be reached at http://<nodeIP>:<nodePort>, but it asks for a token:

The token can be found in the TensorFlow pod's logs: aa06c9f12d80adac1a6288b97bf8030522cecc92202dbb20

[root@VM-1-14-centos single]# kubectl get pods
NAME READY STATUS RESTARTS AGE
tensorflow-6cbc85744b-c567p 1/1 Running 0 7m37s
[root@VM-1-14-centos single]# kubectl logs tensorflow-6cbc85744b-c567p

________ _______________
___ __/__________________________________ ____/__ /________ __
__ / _ _ \_ __ \_ ___/ __ \_ ___/_ /_ __ /_ __ \_ | /| / /
_ / / __/ / / /(__ )/ /_/ / / _ __/ _ / / /_/ /_ |/ |/ /
/_/ \___//_/ /_//____/ \____//_/ /_/ /_/ \____/____/|__/


WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

[I 04:47:52.083 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[I 04:47:52.315 NotebookApp] Serving notebooks from local directory: /tf
[I 04:47:52.315 NotebookApp] Jupyter Notebook 6.1.4 is running at:
[I 04:47:52.315 NotebookApp] http://tensorflow-6cbc85744b-c567p:8888/?token=aa06c9f12d80adac1a6288b97bf8030522cecc92202dbb20
[I 04:47:52.315 NotebookApp] or http://127.0.0.1:8888/?token=aa06c9f12d80adac1a6288b97bf8030522cecc92202dbb20
[I 04:47:52.315 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 04:47:52.319 NotebookApp]

To access the notebook, open this file in a browser:
file:///root/.local/share/jupyter/runtime/nbserver-1-open.html
Or copy and paste one of these URLs:
http://tensorflow-6cbc85744b-c567p:8888/?token=aa06c9f12d80adac1a6288b97bf8030522cecc92202dbb20
or http://127.0.0.1:8888/?token=aa06c9f12d80adac1a6288b97bf8030522cecc92202dbb20
[I 04:49:28.692 NotebookApp] 302 GET / (172.16.0.193) 0.57ms
[I 04:49:28.700 NotebookApp] 302 GET /tree? (172.16.0.193) 0.67ms

After logging in, the Jupyter Notebook is available.

Create a new notebook and run the following commands:

As the output shows, TensorFlow supports computation on the GPU:

  • "/device:GPU:0": shorthand notation for the first GPU visible to TensorFlow on the machine.
  • "/job:localhost/replica:0/task:0/device:GPU:0": the fully qualified name of the first GPU visible to TensorFlow on the machine.

Distributed TensorFlow

Overall architecture:

This architecture diagram shows a hands-on distributed TensorFlow deployment, consisting of:

  • two parameter servers
  • multiple worker services
  • a shuffle-and-sampling service

The shuffle service shuffles samples by label and then serves batch sampling (with or without replacement; sampling is a science of its own, see the book Sampling Techniques for details). Each batch sample is triggered by a worker: once a worker receives the sampled batch's sample IDs, it fetches that batch of sample data from a distributed database built on Kubernetes and runs the training computation. Because distributed TensorFlow supports asynchronous gradient descent, each batch is trained against the latest parameters; the parameter updates themselves are handled by the two parameter servers. The model (parameters) is stored on NFS, so the parameter servers and the workers can share parameters. Finally, note that all training data lives in the distributed database (the choice of database depends on the scenario).

Why a dedicated shuffle-and-sampling service? When the data volume is large, shuffling and sampling over the full sample data would waste significant resources, so a dedicated service extracts only the (id, label) pairs for shuffling and sampling. If even the (id, label) data is too large, the shuffling and sampling can be distributed with Spark; as of Spark 2.3, Kubernetes scheduling is natively supported.
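As a toy illustration of what the shuffle-and-sampling service does, the sketch below shuffles made-up (id,label) pairs and hands out one fixed-size batch of IDs (file names and data are fabricated for illustration):

```shell
# Fabricated (id,label) records standing in for the real sample metadata
printf '%s\n' "1,cat" "2,dog" "3,cat" "4,dog" "5,cat" "6,dog" > /tmp/id_label.csv

batch_size=2
# Shuffle all (id,label) pairs, then take one batch (sampling without replacement)
shuf /tmp/id_label.csv | head -n "$batch_size" > /tmp/batch_ids.txt

# A worker would now fetch these IDs' sample data from the distributed database
n=$(wc -l < /tmp/batch_ids.txt | tr -d ' ')
echo "sampled $n records"   # prints: sampled 2 records
```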

First, the Parameter Server:

tf-ps.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-ps
spec:
  replicas: 1
  selector:
    matchLabels:
      name: tensorflow-ps
  template:
    metadata:
      labels:
        name: tensorflow-ps
        role: ps
    spec:
      containers:
      - name: ps
        image: tensorflow/tensorflow:2.2.1-gpu-py3-jupyter
        ports:
        - containerPort: 2222
        resources:
          limits:
            cpu: 4
            memory: 2Gi
          requests:
            cpu: 2
            memory: 1Gi
        volumeMounts:
        - mountPath: /datanfs
          readOnly: false
          name: nfs
      volumes:
      - name: nfs
        nfs:
          server: <your NFS server address>
          path: "/data/nfs"
---
apiVersion: v1
kind: Service
metadata:
  name: tensorflow-ps-service
  labels:
    name: tensorflow-ps
    role: service
spec:
  ports:
  - port: 2222
    targetPort: 2222
  selector:
    name: tensorflow-ps

Then the Worker:

tf-worker.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      name: tensorflow-worker
  template:
    metadata:
      labels:
        name: tensorflow-worker
        role: worker
    spec:
      containers:
      - name: worker
        image: tensorflow/tensorflow:2.2.1-gpu-py3-jupyter
        ports:
        - containerPort: 2222
        resources:
          limits:
            cpu: 4
            memory: 2Gi
          requests:
            cpu: 2
            memory: 1Gi
        volumeMounts:
        - mountPath: /datanfs
          readOnly: false
          name: nfs
      volumes:
      - name: nfs
        nfs:
          server: <your NFS server address>
          path: "/data/nfs"
---
apiVersion: v1
kind: Service
metadata:
  name: tensorflow-wk-service
  labels:
    name: tensorflow-worker
spec:
  ports:
  - port: 2222
    targetPort: 2222
  selector:
    name: tensorflow-worker
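The PS and worker pods above still need a cluster spec to find each other. With TensorFlow's distributed runtime this is commonly supplied via the TF_CONFIG environment variable; a hedged sketch using the Service names defined above (the shape follows TensorFlow's distributed-training conventions, the task index must differ per replica, and in practice each worker replica needs its own stable address, e.g. via a headless Service):

```json
{
  "cluster": {
    "ps": ["tensorflow-ps-service:2222"],
    "worker": ["tensorflow-wk-service:2222"]
  },
  "task": {"type": "worker", "index": 0}
}
```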

References