Setting up an environment for TensorFlow GPU in Docker

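There are several ways to set up an environment for TensorFlow, but each deep learning project tends to need its own TensorFlow version and dependencies. Installing TensorFlow in Docker makes it easy to manage those versions side by side.

In this post we will look at how to set up a Docker environment for using TensorFlow.

Installing TensorFlow through Docker has the following characteristics.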

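You do not need to install the CUDA toolkit; the host only needs the NVIDIA GPU driver. In other words, the image already contains TensorFlow and the CUDA environment that matches its version.
Because you pull the image that matches the TensorFlow version you need, version-dependency problems go away.
However, Docker must be installed on the host, and the --gpus option used below also relies on the NVIDIA Container Toolkit being installed there.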

* For how to use Docker, see the post below (Docker concepts, installation, and usage).

https://swiftcam.tistory.com/404

The development environment I used is as follows.

Intel i7 CPU + NVIDIA GTX 1070 (nvidia-470 driver installed)
Ubuntu 20.04

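1. Installing the TensorFlow image

The TensorFlow image can be pulled from Docker Hub (tensorflow/tensorflow).
Several tag options can be combined when pulling; the available tags are listed on the Docker Hub page.

The example below pulls the 2.1.0 GPU build with Python 3. Options are appended to the tag with '-', and a variant that includes Jupyter notebook (-jupyter) is also available.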

swift@Eagle:~/Desktop$ docker pull tensorflow/tensorflow:2.1.0-gpu-py3

2.1.0-gpu-py3: Pulling from tensorflow/tensorflow
7ddbc47eeb70: Already exists 
c1bbdc448b72: Already exists 
8c3b70e39044: Already exists 
45d437916d57: Already exists 
d8f1569ddae6: Already exists 
85386706b020: Already exists 
ee9b457b77d0: Already exists 
bebfcc1316f7: Pull complete 
644140fd95a9: Pull complete 
d6c0f989e873: Pull complete 
7a8e64f26211: Pull complete 
c33b03e4dd22: Pull complete 
bca93af797c1: Pull complete 
47f6c197be35: Pull complete 
e5da48aa9554: Pull complete 
ca68d98a90c4: Pull complete 
Digest: sha256:1010e051dde4a9b62532a80f4a9a619013eafc78491542d5ef5da796cc2697ae
Status: Downloaded newer image for tensorflow/tensorflow:2.1.0-gpu-py3
docker.io/tensorflow/tensorflow:2.1.0-gpu-py3

swift@Eagle:~/Desktop$ docker images

REPOSITORY              TAG                     IMAGE ID       CREATED        SIZE
ros/melodic             base                    1e0c5294cec0   2 weeks ago    3.01GB
hello-world             latest                  feb5d9fea6a5   4 months ago   13.3kB
tensorflow/tensorflow   2.1.0-gpu-py3-jupyter   ce8f7398433c   2 years ago    4.26GB
tensorflow/tensorflow   2.1.0-gpu-py3           e2a4af785bdb   2 years ago    4.11GB
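
The tag pattern used in this section (a version followed by optional suffixes joined with '-') can be sketched as a small Python helper. The function name tensorflow_image and its parameters are made up for illustration, and the helper does not check whether a given combination actually exists on Docker Hub.

```python
def tensorflow_image(version, gpu=False, py3=False, jupyter=False):
    """Compose a tensorflow/tensorflow image reference from tag options.

    Hypothetical helper: it only joins the suffixes in the order seen in
    this post (gpu, py3, jupyter) and does not query Docker Hub.
    """
    parts = [version]
    if gpu:
        parts.append("gpu")
    if py3:
        parts.append("py3")
    if jupyter:
        parts.append("jupyter")
    return "tensorflow/tensorflow:" + "-".join(parts)


# The two images that appear in the `docker images` listing above.
print(tensorflow_image("2.1.0", gpu=True, py3=True))
print(tensorflow_image("2.1.0", gpu=True, py3=True, jupyter=True))
```

The resulting string can be passed to docker pull as is.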

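2. Running the container

1) Using bash

Starting a bash shell lets you do all kinds of work from the terminal. To get bash, pass the -it option and /bin/bash when running the container.

-it: run interactively with a terminal attached
/bin/bash: run a bash shell in that terminal
--gpus all: make the host GPUs available to the container
-v: share a host folder so it can also be used inside the container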

IMG=tensorflow/tensorflow:2.1.0-gpu-py3

docker run -it \
    --init \
    --gpus all \
    --ipc=host \
    --shm-size=8G \
    --privileged \
    --net=host \
    -e DISPLAY=$DISPLAY \
    -e XDG_RUNTIME_DIR=/run/user/1000 \
    -e QT_GRAPHICSSYSTEM=native \
    -e USER=$USER \
    --env=UDEV=1 \
    --env=LIBUSB_DEBUG=1 \
    --env="DISPLAY" \
    --env="QT_X11_NO_MITSHM=1" \
    ${ENV_PARAMS[@]} \
    -v /home/$USER/workspace:/home/$USER/workspace \
    -v /dev:/dev \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    $IMG \
    ${OTHER_PARAMS[@]}  \
    /bin/bash
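
Many of the flags in the command above deal with display and device passthrough. As a rough sketch, the four options explained earlier (-it, --gpus all, -v, /bin/bash) are enough for a plain GPU shell, and they can be assembled from Python as shown here. The function docker_run_args, its parameters, and the workspace path /home/swift/workspace are assumptions for illustration; the optional -u flag mirrors the `-u $(id -u):$(id -g)` hint that the container prints at startup.

```python
import os
import shlex


def docker_run_args(image, workspace, as_current_user=False):
    """Build a minimal `docker run` argument list for an interactive GPU shell.

    Hypothetical helper: it covers only -it, --gpus all, -v and /bin/bash,
    not the display/device flags of the full command.
    """
    args = ["docker", "run", "-it", "--gpus", "all"]
    if as_current_user:
        # Same idea as `-u $(id -u):$(id -g)`: files created in the mounted
        # folder are then owned by the host user instead of root.
        args += ["-u", f"{os.getuid()}:{os.getgid()}"]
    # Mount the host workspace at the same path inside the container.
    args += ["-v", f"{workspace}:{workspace}", image, "/bin/bash"]
    return args


# The workspace path is an assumed example; adjust it to your own folder.
args = docker_run_args("tensorflow/tensorflow:2.1.0-gpu-py3", "/home/swift/workspace")
print(shlex.join(args))
```

To actually start the container, the list can be handed to subprocess.run(args) from a real terminal, since -it needs a TTY.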

As the output below shows, TensorFlow loads without problems.

I also checked whether the GPU is usable, and it was detected and used correctly. The whole process is very simple and convenient.

________                               _______________                
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ / 
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

root@Eagle:/# python

Python 2.7.17 (default, Nov  7 2019, 10:07:09) 
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2022-01-28 19:00:37.107873: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2022-01-28 19:00:37.108987: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
>>> print(tf.__version__)

2.1.0
>>> gpu_available = tf.test.is_gpu_available()

WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2022-01-28 19:05:01.218089: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-01-28 19:05:01.245702: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3199980000 Hz
2022-01-28 19:05:01.246454: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555800bd4c40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-01-28 19:05:01.246471: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-01-28 19:05:01.248443: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-01-28 19:05:01.327897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-28 19:05:01.328450: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555800c07770 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-01-28 19:05:01.328464: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce GTX 1070, Compute Capability 6.1
2022-01-28 19:05:01.328562: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-28 19:05:01.328875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.683GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
2022-01-28 19:05:01.328903: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-01-28 19:05:01.328923: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-01-28 19:05:01.330020: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-01-28 19:05:01.330215: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-01-28 19:05:01.331361: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-01-28 19:05:01.332001: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-01-28 19:05:01.332024: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-01-28 19:05:01.332069: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-28 19:05:01.332394: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-28 19:05:01.332664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2022-01-28 19:05:01.332685: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-01-28 19:05:01.478527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-01-28 19:05:01.478556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2022-01-28 19:05:01.478561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2022-01-28 19:05:01.478719: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-28 19:05:01.479087: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-28 19:05:01.479473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 7057 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
>>>
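
The deprecation warning in the session above recommends `tf.config.list_physical_devices('GPU')` in place of `tf.test.is_gpu_available()`. A minimal sketch of that check follows. The helper name list_gpus is made up for illustration, and the TensorFlow module is passed in as an argument so that the function can also be exercised without TensorFlow installed.

```python
def list_gpus(tf):
    """Return the names of the GPUs visible to the given TensorFlow module.

    Uses tf.config.list_physical_devices("GPU"), the call suggested by the
    deprecation warning printed for tf.test.is_gpu_available().
    """
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def main():
    try:
        import tensorflow as tf
    except ImportError:
        # Outside the container TensorFlow may simply not be installed.
        print("TensorFlow is not installed in this environment")
        return
    gpus = list_gpus(tf)
    print("TensorFlow", tf.__version__, "sees", len(gpus), "GPU(s):", gpus)


if __name__ == "__main__":
    main()
```

An empty list means TensorFlow could not see a GPU, which usually points at the --gpus option or the host driver rather than at the image.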

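2) Using a Jupyter notebook

To use a Jupyter notebook, run the TensorFlow image whose tag ends in -jupyter. With this image you do not pass /bin/bash.

-it: run interactively with a terminal attached
-p 8000:8888: publish container port 8888 on host port 8000. 8888 is the default port of Jupyter notebook.
--gpus all: make the host GPUs available to the container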

IMG=tensorflow/tensorflow:2.1.0-gpu-py3-jupyter

docker run -it \
    --init \
    --gpus all \
    --ipc=host \
    --shm-size=8G \
    --privileged \
    -e DISPLAY=$DISPLAY \
    -e XDG_RUNTIME_DIR=/run/user/1000 \
    -e QT_GRAPHICSSYSTEM=native \
    -e USER=$USER \
    --env=UDEV=1 \
    --env=LIBUSB_DEBUG=1 \
    --env="DISPLAY" \
    --env="QT_X11_NO_MITSHM=1" \
    ${ENV_PARAMS[@]} \
    -v /home/$USER/workspace:/home/$USER/workspace \
    -v /dev:/dev \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    -p 8000:8888 \
    $IMG \
    ${OTHER_PARAMS[@]}

As shown below, the TensorFlow container starts and the Jupyter notebook server starts along with it.

________                               _______________                
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ / 
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

[I 21:36:40.418 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
jupyter_http_over_ws extension initialized. Listening on /http_over_websocket
[I 21:36:40.539 NotebookApp] Serving notebooks from local directory: /tf
[I 21:36:40.539 NotebookApp] The Jupyter Notebook is running at:
[I 21:36:40.539 NotebookApp] http://d7bbd37af124:8888/?token=aa21a1481c3faef243d57dfd2bb8b7d19a6e079c38baefb3
[I 21:36:40.539 NotebookApp]  or http://127.0.0.1:8888/?token=aa21a1481c3faef243d57dfd2bb8b7d19a6e079c38baefb3
[I 21:36:40.539 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 21:36:40.542 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/nbserver-7-open.html
    Or copy and paste one of these URLs:
        http://d7bbd37af124:8888/?token=aa21a1481c3faef243d57dfd2bb8b7d19a6e079c38baefb3
     or http://127.0.0.1:8888/?token=aa21a1481c3faef243d57dfd2bb8b7d19a6e079c38baefb3
[I 21:39:31.657 NotebookApp] 302 GET / (172.17.0.1) 0.67ms
[I 21:39:31.661 NotebookApp] 302 GET /tree? (172.17.0.1) 0.66ms
[I 21:39:37.598 NotebookApp] 302 POST /login?next=%2Ftree%3F (172.17.0.1) 1.95ms

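http://127.0.0.1:8000/ is the address and port at which the Jupyter notebook is reachable from a web browser on the host.

Paste it into the browser's address bar to open the notebook.

When asked for a token, copy the value that follows token= in the log above and paste it in.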
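
Because of the -p 8000:8888 mapping, the URLs printed in the container log (port 8888) are not the ones to open on the host. As a small sketch, the logged URL can be rewritten to the published host port while keeping the token. The helper name host_url and its parameters are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def host_url(container_url, host_port, host="127.0.0.1"):
    """Rewrite a Jupyter URL from the container log to the published host port.

    Only the host:port part is replaced; the path and the ?token=... query
    string are kept unchanged.
    """
    parts = urlsplit(container_url)
    return urlunsplit(parts._replace(netloc=f"{host}:{host_port}"))


# URL copied from the log above; 8000 is the host side of -p 8000:8888.
logged = "http://d7bbd37af124:8888/?token=aa21a1481c3faef243d57dfd2bb8b7d19a6e079c38baefb3"
print(host_url(logged, 8000))
```

Opening the rewritten URL logs in directly, so the token does not have to be pasted separately.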

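From New on the right, I created a new Python 3 notebook and checked the TensorFlow version.

It works well.

If you need another terminal inside the container in addition to the Jupyter notebook, you can open one as follows.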

docker exec -it <CONTAINER_NAME> bash
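
The container name for the command above can be looked up with docker ps. The sketch below wraps that lookup and the exec command in two small Python functions. The function names and the container name tf-jupyter are made up for illustration; the docker ps options used (--filter ancestor=... and --format) are standard Docker CLI flags.

```python
import subprocess


def running_containers(image):
    """Return the names of running containers created from the given image."""
    result = subprocess.run(
        ["docker", "ps", "--filter", f"ancestor={image}", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def exec_args(container_name):
    """Build the `docker exec` argument list that opens a bash shell."""
    return ["docker", "exec", "-it", container_name, "bash"]


# "tf-jupyter" is a placeholder; use a name returned by running_containers().
print(exec_args("tf-jupyter"))
```

running_containers("tensorflow/tensorflow:2.1.0-gpu-py3-jupyter") needs Docker on the host, so it is not called in the sketch itself.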

* References

1. Installing TensorFlow with Docker

https://www.tensorflow.org/install/docker?hl=ko
