- PyTorch/Dev-GPU
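### Container Description
The PyTorch in this container is installed from source and integrated with NVIDIA TensorRT, which makes it simpler to accelerate model inference.
For model training, the environment also integrates Uber Horovod and OpenMPI, which makes single-node or multi-node multi-GPU training easier to set up.
In addition, because GPUs compute so quickly, the speed at which the CPU prepares data can become the computational bottleneck. To let the CPU perform numerical operations quickly and to reduce the time it spends processing data, this container also installs a high-performance numerical library: Intel Math Kernel Library (Intel MKL). With this library integrated, Scikit-learn (machine learning) and NumPy/SciPy (numerical and scientific computing) can use MKL's optimized routines for better CPU performance.
This container was built in September 2020.
Before using this container, install NVIDIA driver version ```440.33``` or later on the host machine.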
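The driver requirement above can be checked on the host before pulling the image. The following is a minimal sketch, assuming `nvidia-smi` is on the host's `PATH`; the helper names (`parse_version`, `driver_is_supported`, `installed_driver_version`) are hypothetical and are not part of the container.

```python
import shutil
import subprocess

MIN_DRIVER = "440.33"  # minimum driver version stated in this document


def parse_version(text):
    """Turn a dotted version string such as '440.33.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_supported(installed, minimum=MIN_DRIVER):
    """Return True when the installed driver is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if result.returncode == 0 and lines else None


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("Could not query the NVIDIA driver (is nvidia-smi installed?)")
    else:
        status = "OK" if driver_is_supported(version) else "too old"
        print(f"Driver {version}: {status}")
```

Versions are compared as tuples of integers rather than as strings, so that a value such as `440.100` would rank above `440.33`.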
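This container supports NVIDIA GPUs with the following compute capabilities (Compute Capability, abbreviated CC below):
* ```CC3.5```, ```CC3.7``` (Kepler architecture)
* ```CC5.0```, ```CC5.2``` (Maxwell architecture)
* ```CC6.0```, ```CC6.1``` (Pascal architecture)
* ```CC7.0``` (Volta architecture), ```CC7.5``` (Turing architecture)

To find the compute capability of your GPU, see the [NVIDIA website](https://developer.nvidia.com/cuda-gpus#compute).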
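Once you know your GPU's compute capability, the list above can be treated as a simple lookup. The sketch below encodes the listed values with NVIDIA's architecture name for each; the function name `architecture_for` is hypothetical and used only for illustration.

```python
# Compute capabilities listed above, keyed by (major, minor),
# with NVIDIA's architecture name for each.
SUPPORTED_CC = {
    (3, 5): "Kepler", (3, 7): "Kepler",
    (5, 0): "Maxwell", (5, 2): "Maxwell",
    (6, 0): "Pascal", (6, 1): "Pascal",
    (7, 0): "Volta", (7, 5): "Turing",
}


def architecture_for(capability):
    """Return the architecture for a string such as '7.5', or None if unlisted."""
    major, minor = (int(part) for part in capability.split("."))
    return SUPPORTED_CC.get((major, minor))


if __name__ == "__main__":
    for cc in ("3.7", "6.1", "7.5", "8.0"):
        name = architecture_for(cc)
        print(f"CC{cc}: {name if name else 'not in the supported list'}")
```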
### Download
Run the following command in a terminal:
```bash
docker pull moeaidb/aigo:cu10.2-dnn7.6-gpu-pytorch-20.09
```
### Usage
#### Example 1: Start the Jupyterlab service in the background
Mount the current directory (```$PWD```) to the ```/workspace``` folder inside the container, and have the Jupyterlab service listen on host port ```9999```:
```bash
# Choose which host port Jupyterlab should listen on
host_port=9999
# Start the container and capture its ID
container_id=$(nvidia-docker run --rm -d -p ${host_port}:8888 -v "$PWD":/workspace moeaidb/aigo:cu10.2-dnn7.6-gpu-pytorch-20.09)
# Wait a moment for the service to start
sleep 2
# Extract the Jupyterlab token from the container logs (keep the first match only)
notebook_token=$(docker logs ${container_id} 2>&1 | grep -P "(LabApp.*token=).*" | cut -d"=" -f 2 | head -n 1)
# Print the URL for connecting to the Jupyterlab service
printf "Open a browser and connect to:\nhttp://[your_ip]:%s/?token=%s\n" "${host_port}" "${notebook_token}"
```
After you enter the commands above in a terminal, a URL should be displayed:
```
Open a browser and connect to:
http://[your_ip]:9999/?token=87f6f7ad1455b7dde323f8a570897d4bf9dace8659e0e9bd
```
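This means the Jupyterlab service is now running inside the container. Open a browser and paste in this URL to start writing Python notebooks in Jupyterlab.

Notes:
* The URL contains ```token=87f6f7ad14...```, where ```87f6f7ad14...``` is a randomly generated token. Because the token is random, the one you receive will differ from this example.
* Replace ```[your_ip]``` with the machine's IP address. If you are working on the local machine, ```[your_ip]``` should be ```127.0.0.1```.
* Jupyterlab opens in the ```/workspace``` folder. Because the host's current directory ```${PWD}``` was mounted to ```/workspace``` when the container was created, you should see the files from ```${PWD}``` inside ```/workspace```.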
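The substitution of ```[your_ip]``` described in the notes can be sketched as a small helper. This is only an illustration: the function names `lab_url` and `token_from_url` are hypothetical, and the token is the sample value from the output above, not a working credential.

```python
from urllib.parse import parse_qs, urlsplit


def lab_url(host_ip, host_port, token):
    """Build the Jupyterlab URL from the host IP, the host port, and the token."""
    return f"http://{host_ip}:{host_port}/?token={token}"


def token_from_url(url):
    """Read the token back out of a Jupyterlab URL; return None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("token", [])
    return values[0] if values else None


if __name__ == "__main__":
    sample_token = "87f6f7ad1455b7dde323f8a570897d4bf9dace8659e0e9bd"
    # For local use, [your_ip] becomes 127.0.0.1 as noted above.
    url = lab_url("127.0.0.1", 9999, sample_token)
    print(url)
    print(token_from_url(url) == sample_token)
```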
#### Example 2: Run a Python script in the container environment
```bash
# Create a test script that simply imports PyTorch and prints its current version.
printf "import torch \
\nprint('PyTorch version=', torch.__version__)\n" \
> check_pyt_version.py
# The Python script now exists in ${PWD}. Next, start a container to run it:
nvidia-docker run -it --rm -v "${PWD}":/workspace \
moeaidb/aigo:cu10.2-dnn7.6-gpu-pytorch-20.09 python3 check_pyt_version.py
```
After you enter the commands above in a terminal, the PyTorch version installed inside the Docker container should be displayed, as follows:
```
PyTorch version= 1.6.0a0+b31f58d
```
If you see this message, the Python script ran successfully.
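The same pattern extends to a slightly fuller check that also reports whether PyTorch can see the GPUs. The script below is a sketch and not part of the container; the function name `collect_info` is hypothetical. It could be saved in ```${PWD}``` and run the same way as `check_pyt_version.py`. It falls back gracefully when `torch` is not importable, so it also runs outside the container.

```python
def collect_info():
    """Gather PyTorch and GPU details; degrade gracefully if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_available": False, "gpus": []}

    gpus = []
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            # get_device_capability returns a (major, minor) tuple.
            major, minor = torch.cuda.get_device_capability(index)
            gpus.append((torch.cuda.get_device_name(index), f"{major}.{minor}"))
    return {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "gpus": gpus,
    }


if __name__ == "__main__":
    info = collect_info()
    print("PyTorch version=", info["torch"])
    print("CUDA available=", info["cuda_available"])
    for name, capability in info["gpus"]:
        print(f"GPU: {name} (compute capability {capability})")
```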
#### Example 3: GPU benchmark (two GPUs)
```bash
nvidia-docker run -it --privileged --rm moeaidb/aigo:cu10.2-dnn7.6-gpu-pytorch-20.09 horovodrun -np 2 python3 horovod_examples/pytorch_synthetic_benchmark.py
```
The command above runs the computation on two GPUs. It should produce output similar to the following:
```
...
[1,0]<stdout>:Model: resnet50
[1,0]<stdout>:Batch size: 32
[1,0]<stdout>:Number of GPUs: 2
[1,0]<stdout>:Running warmup...
[1,0]<stdout>:Running benchmark...
[1,0]<stdout>:Iter #0: 232.5 img/sec per GPU
[1,0]<stdout>:Iter #1: 232.0 img/sec per GPU
[1,0]<stdout>:Iter #2: 227.7 img/sec per GPU
[1,0]<stdout>:Iter #3: 228.9 img/sec per GPU
[1,0]<stdout>:Iter #4: 231.4 img/sec per GPU
[1,0]<stdout>:Iter #5: 233.5 img/sec per GPU
[1,0]<stdout>:Iter #6: 227.8 img/sec per GPU
[1,0]<stdout>:Iter #7: 227.7 img/sec per GPU
[1,0]<stdout>:Iter #8: 233.7 img/sec per GPU
[1,0]<stdout>:Iter #9: 226.9 img/sec per GPU
[1,0]<stdout>:Img/sec per GPU: 230.2 +-4.9
[1,0]<stdout>:Total img/sec on 2 GPU(s): 460.4 +-9.9
```
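The two summary lines can be cross-checked against the per-iteration figures: the per-GPU rate is the mean of the ten iterations, and the total is that mean multiplied by the number of GPUs. The sketch below parses the sample output shown above; the function name `summarize` is hypothetical, and the `+-` spread is deliberately not recomputed because the iteration values are rounded.

```python
import re

# Per-iteration lines copied from the sample output above.
SAMPLE_OUTPUT = """\
[1,0]<stdout>:Number of GPUs: 2
[1,0]<stdout>:Iter #0: 232.5 img/sec per GPU
[1,0]<stdout>:Iter #1: 232.0 img/sec per GPU
[1,0]<stdout>:Iter #2: 227.7 img/sec per GPU
[1,0]<stdout>:Iter #3: 228.9 img/sec per GPU
[1,0]<stdout>:Iter #4: 231.4 img/sec per GPU
[1,0]<stdout>:Iter #5: 233.5 img/sec per GPU
[1,0]<stdout>:Iter #6: 227.8 img/sec per GPU
[1,0]<stdout>:Iter #7: 227.7 img/sec per GPU
[1,0]<stdout>:Iter #8: 233.7 img/sec per GPU
[1,0]<stdout>:Iter #9: 226.9 img/sec per GPU
"""


def summarize(output):
    """Return (mean img/sec per GPU, total img/sec) from benchmark output."""
    rates = [float(v) for v in
             re.findall(r"Iter #\d+: ([\d.]+) img/sec per GPU", output)]
    gpus = int(re.search(r"Number of GPUs: (\d+)", output).group(1))
    per_gpu = sum(rates) / len(rates)
    return per_gpu, per_gpu * gpus


if __name__ == "__main__":
    per_gpu, total = summarize(SAMPLE_OUTPUT)
    print(f"Img/sec per GPU: {per_gpu:.1f}")  # prints 230.2
    print(f"Total img/sec: {total:.1f}")      # prints 460.4
```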
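#### Example 4: Show package information
The AIGO container includes a small utility, ```versions_summary```, which quickly shows which packages are installed in the container and which versions they are. Run the following command in a terminal: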
```bash
nvidia-docker run -it --privileged --rm moeaidb/aigo:cu10.2-dnn7.6-gpu-pytorch-20.09 versions_summary
```
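After it runs, you should see output similar to the following. The version numbers in this sample are older than those in the package table for this image, so your output will differ: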
```
System INFO:
Python v3.7.5
NVIDIA Driver v440.36
CUDA v10.1.243-1
cuDNN v7.6.5.32-1+cuda10.1
NCCL v2.4.8-1+cuda10.1
Installed Python3 Packages:
[Base]:
torch v1.3.0a0+ee77ccb
torchvision v0.4.2
apex v0.1
horovod v0.18.2
mpi4py v3.0.3
numba v0.46.0
[Numerical]:
numexpr v2.7.0
numpy v1.17.4
scipy v1.3.3
[Data Science]:
sklearn v0.20.4
pandas v0.25.3
matplotlib v3.1.2
seaborn v0.9.0
bokeh v1.4.0
jupyterlab v1.2.3
pyodbc v4.0.27
yacs v0.1.6
[NLP]:
[CV]:
cv2 v3.4.8
detectron2 v0.1
```
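A comparable listing can be produced from any Python 3.8+ environment with the standard library. The sketch below is not how `versions_summary` is implemented (its internals are not described here); the function name `package_versions` and the chosen distribution names are assumptions for illustration.

```python
from importlib import metadata  # in the standard library since Python 3.8


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI (assumed, e.g. scikit-learn, not sklearn).
    wanted = ["torch", "torchvision", "horovod", "numpy", "scipy", "scikit-learn"]
    for name, version in package_versions(wanted).items():
        print(f"{name:<14} {version or 'not installed'}")
```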
### Package Information
| Package / software / library | Version | Description |
|:---------|:---------:|:---------|
| [PyTorch](https://pytorch.org) | 1.6.0 | Tensors and Dynamic neural networks in Python with strong GPU acceleration. <br>An open-source framework for developing AI models, maintained by Facebook AI Research (FAIR). |
| [torchvision](https://github.com/pytorch/vision) | v0.7.0 | The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.<br>Provides basic computer-vision models and datasets; usually installed together with PyTorch. |
| [Apex](https://github.com/NVIDIA/apex) | 0.1 | A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch. <br>Besides automatic mixed-precision acceleration, it implements SyncBatchNorm, which makes effective use of multiple GPUs to enlarge the batch size. |
| [Python](https://docs.python.org/3.8/whatsnew/changelog.html#python-3-8-5-final) | 3.8.5 | Python is powerful... and fast; plays well with others; runs everywhere; is friendly & easy to learn; is Open. <br>This environment uses Python 3.8, which is considerably faster than Python 3.6 at string processing and file searching. |
| [Horovod](https://github.com/horovod/horovod) | 0.20.0 | Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. <br>Uber Horovod makes it simple to accelerate AI training with multiple GPUs. |
| [OpenMPI](https://www.open-mpi.org) | 4.0.4 | A High Performance Message Passing Library. (Required by Uber Horovod) <br>Required by Uber Horovod; supports communication across GPUs and across server nodes. |
| [NVIDIA CUDA](https://developer.nvidia.com/cuda-toolkit) (runtime) | V10.2.89 | The NVIDIA® CUDA® Toolkit provides a development environment for creating high performance GPU-accelerated applications. <br>CUDA is the development framework NVIDIA provides for its GPUs. GPU-accelerated AI frameworks such as PyTorch call the APIs it provides. |
| [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) (runtime)| 7.6.5.32-1+cuda10.2 | The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. <br>cuDNN is a library NVIDIA provides specifically for deep neural network development. |
| [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) (runtime) | 6.0.1-1+cuda10.2 | NVIDIA TensorRT® is a platform for high-performance deep learning inference. <br>At deployment time, NVIDIA TensorRT can optimize a model, or convert a single-precision model to half precision in a suitable way, so that inference runs faster. |
| [NVIDIA Collectives Communication Library 2 (NCCL2)](https://developer.nvidia.com/nccl) | v2.7.8-1+cuda10.2 | The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs. <br>For multi-GPU training, PyTorch can use NVIDIA NCCL for multi-GPU acceleration. |
| [Intel Math Kernel Library (Intel MKL)](https://software.intel.com/en-us/mkl) | 2020.0-088 | Intel® Math Kernel Library (Intel® MKL) optimizes code with minimal effort for future generations of Intel® processors. <br>Fast numerical computation on Intel CPUs. |
| [NumPy](https://www.numpy.org) (Intel-MKL-accelerated) | 1.19.1 | NumPy is the fundamental package for scientific computing with Python. <br>A widely used numerical computing package (accelerated with Intel MKL). |
| [SciPy](https://www.scipy.org) (Intel-MKL-accelerated) | 1.5.2 | SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. <br>A widely used scientific computing package that provides basic algorithms and statistical methods (accelerated with Intel MKL). |
| [Scikit-learn](https://scikit-learn.org/stable/#) (Intel-MKL-accelerated) | 0.23.2 | Machine Learning in Python. <br>A widely used machine learning package that provides basic algorithms and statistical methods (accelerated with Intel MKL). |
| [OpenCV](https://opencv.org) (Intel-MKL-accelerated) | 3.4.11 | OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. <br>Used for image processing and for building image-related machine learning models. |
| [Numba](http://numba.pydata.org) | 0.51.2 | Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. <br>Python code compiled by the JIT compiler can run up to hundreds or thousands of times faster. |
| [Numexpr](https://github.com/pydata/numexpr) | 2.7.1 | Fast numerical array expression evaluator for Python, NumPy, PyTables, pandas, bcolz and more. <br>Mathematical expressions are optimized for evaluation, giving up to a 4x speedup. |
| [pyodbc](https://github.com/mkleehammer/pyodbc) | 4.0.30 | pyodbc is an open source Python module that makes accessing ODBC databases simple. <br>Used to connect to databases. |
| [Jupyterlab](https://github.com/jupyterlab/jupyterlab) | 2.2.6 | An extensible environment for interactive and reproducible computing, based on the Jupyter Notebook and Architecture. <br>Code execution, records, and notes can all be stored and organized in notebooks. |
| [pandas](https://pandas.pydata.org) | 1.1.1 | pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. <br>Builds and organizes data tables, and offers simple ways to visualize them. |
| [Matplotlib](https://matplotlib.org) | 3.3.1 | Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. <br>A data visualization package for bar charts, histograms, scatter plots, and more. |
| [Seaborn](https://seaborn.pydata.org) | 0.11.0 | Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. <br>A high-level plotting API based on Matplotlib; it accepts data tables and plots them after grouping automatically. |
| [Bokeh](https://bokeh.pydata.org/en/latest/) | 2.2.1 | Bokeh is an interactive visualization library that targets modern web browsers for presentation. <br>Can be embedded in web pages for interactive data presentation. |