Technical Tool Details - AIGO - AI Industry Practical Application Talent Cultivation Program

### Container description

The TensorFlow and PyTorch in this container are built from source and integrated with NVIDIA TensorRT, which makes it simpler to accelerate model inference. For training, the environment also integrates Uber Horovod with OpenMPI, which makes single-node or multi-node multi-GPU training easier to set up.

Because GPUs compute so quickly, the speed at which the CPU prepares data can become the bottleneck. To speed up numerical computation on the CPU and reduce the time spent processing data, the container also installs a high-performance numerical library: Intel Math Kernel Library (Intel MKL). With it integrated, Scikit-learn (machine learning) and NumPy/SciPy (numerical/scientific computing) can run at close to their best performance.

This container is designed for natural language processing developers, so it also includes the following packages: Transformers (NLP package; usable with both TensorFlow and PyTorch), PyTorch-NLP (building and training language models), Jieba (Chinese word segmentation), PyHanLP (Chinese word segmentation and dependency parsing; the models are already included in the image), NLTK (NLP toolkit), and Gensim (NLP toolkit).

The container can also serve deep learning developers working in computer vision, because it includes OpenCV (computer vision), imgaug (image augmentation), pydicom (reading medical images), and GDCM (the Grassroots DICOM image library).

The container includes NVIDIA Apex (for PyTorch). Apex targets high-end NVIDIA GPUs that have Tensor Cores. It supports automatic mixed-precision training (AMP) on the GPU, which can speed up neural network training by roughly 1.3x to 3x without reducing the network's predictive accuracy.

To save space, this container uses the CUDA Runtime API. The container was built in December 2019. Before using it, install NVIDIA driver version 418.39 or later on the host machine.

### Download

Run the following command in a terminal:

```bash
docker pull moeaidb/aigo:cu10.1-dnn7.6-gpu-tf-and-pytorch-nlp-19.12
```

### Usage

#### Usage example 1: Start the Jupyterlab service in the background

Mount the current directory (`$PWD`) to the `/workspace` folder inside the container, and have the Jupyterlab service listen on host port `9999`:

```bash
# Choose which host port Jupyterlab should listen on
host_port=9999

# Start the container and capture its ID
container_id=$(nvidia-docker run --rm -d -p ${host_port}:8888 -v $PWD:/workspace moeaidb/aigo:cu10.1-dnn7.6-gpu-tf-and-pytorch-nlp-19.12)

# Wait a moment for the service inside the container to start
sleep 2

# Extract the Jupyterlab token from the container logs
notebook_token=$(docker logs ${container_id} 2>&1 | grep -nP "(LabApp.*token=).*" | cut -d"=" -f 2)

# Print the URL for connecting to the Jupyterlab service
printf "Open a browser and connect to: http://[your_ip]:${host_port}/?token=${notebook_token}\n"
```

After you enter the commands above in a terminal, a URL should be displayed:

```bash
Open a browser and connect to: http://[your_ip]:9999/?token=87f6f7ad1455b7dde323f8a570897d4bf9dace8659e0e9bd
```

This means the Jupyterlab service is now running inside the container. Open a browser and paste in this URL to start writing Python notebooks in Jupyterlab.

Notes:

* The URL contains `token=87f6f7ad14...`, where `87f6f7ad14...` is a randomly generated token. Because the token is random, the one you get will differ from this example.
* Replace `[your_ip]` with the IP address of the machine. If you are working locally, `[your_ip]` should be `127.0.0.1`.
* Jupyterlab opens in the `/workspace` folder.
* Because the host's current directory `${PWD}` was mounted to `/workspace` when the container was created, the files stored in `${PWD}` on the host should appear in `/workspace`.

#### Usage example 2: Run a Python script in the container environment

```bash
# Create a test script. It simply imports PyTorch and TensorFlow and prints their versions.
printf "import tensorflow as tf \
\nimport torch \
\nprint('TensorFlow version=', tf.__version__) \
\nprint('PyTorch version=', torch.__version__)" > check_pyt_tf_version.py

# The Python script now exists in ${PWD}. Next, start a container to run it:
nvidia-docker run -it --rm -v ${PWD}:/workspace \
    moeaidb/aigo:cu10.1-dnn7.6-gpu-tf-and-pytorch-nlp-19.12 python3 check_pyt_tf_version.py
```

After you enter the commands above in a terminal, the TensorFlow and PyTorch versions installed inside the Docker container should be displayed:

```bash
TensorFlow version= 2.1.0-rc1
PyTorch version= 1.3.0a0+ee77ccb
```

If you see this message, the Python script ran successfully.

Note: the PyTorch build shown here (`1.3.0a0+ee77ccb`) is in fact `v1.3.1`. This release is well worth using because it fixes some earlier bugs. For details, see the [official release notes](https://github.com/pytorch/pytorch/releases/tag/v1.3.1).

#### Usage example 3: GPU benchmark (two GPUs)

```bash
nvidia-docker run -it --privileged --rm moeaidb/aigo:cu10.1-dnn7.6-gpu-tf-and-pytorch-nlp-19.12 horovodrun -np 2 python3 horovod_examples/pytorch_synthetic_benchmark.py
```

The command above runs the benchmark on two GPUs. The output should resemble the following:

```bash
...
[1,0]<stdout>:Model: resnet50
[1,0]<stdout>:Batch size: 32
[1,0]<stdout>:Number of GPUs: 2
[1,0]<stdout>:Running warmup...
[1,0]<stdout>:Running benchmark...
[1,0]<stdout>:Iter #0: 239.3 img/sec per GPU
[1,0]<stdout>:Iter #1: 228.0 img/sec per GPU
[1,0]<stdout>:Iter #2: 223.7 img/sec per GPU
[1,0]<stdout>:Iter #3: 229.9 img/sec per GPU
[1,0]<stdout>:Iter #4: 227.8 img/sec per GPU
[1,0]<stdout>:Iter #5: 229.8 img/sec per GPU
[1,0]<stdout>:Iter #6: 230.3 img/sec per GPU
[1,0]<stdout>:Iter #7: 228.0 img/sec per GPU
[1,0]<stdout>:Iter #8: 232.2 img/sec per GPU
[1,0]<stdout>:Iter #9: 227.8 img/sec per GPU
[1,0]<stdout>:Img/sec per GPU: 229.7 +-7.5
[1,0]<stdout>:Total img/sec on 2 GPU(s): 459.4 +-15.1
```

#### Usage example 4: Show package information

The AIGO container includes a small program, `versions_summary`, that quickly shows which packages are installed in the container and in which versions. Run the following command in a terminal:

```bash
nvidia-docker run -it --privileged --rm moeaidb/aigo:cu10.1-dnn7.6-gpu-tf-and-pytorch-nlp-19.12 versions_summary
```

The output should resemble the following:

```
System INFO:
Python v3.7.5
NVIDIA Driver v440.36
CUDA v10.1.243-1
cuDNN v7.6.5.32-1+cuda10.1
NCCL v2.4.8-1+cuda10.1
Installed Python3 Packages:
[Base]:
torch v1.3.0a0+ee77ccb
torchvision v0.4.2
apex v0.1
tensorflow v2.1.0rc1
keras v2.3.1
horovod v0.18.2
mpi4py v3.0.3
numba v0.46.0
[Numerical]:
numexpr v2.7.0
numpy v1.17.4
scipy v1.3.3
[Data Science]:
sklearn v0.20.4
pandas v0.25.3
matplotlib v3.1.2
seaborn v0.9.0
bokeh v1.4.0
jupyterlab v1.2.3
pyodbc v4.0.27
yacs v0.1.6
[NLP]:
fairseq v0.9.0
torchtext v0.4.0
torchnlp v0.5.0
pyhanlp v0.1.57
jieba v0.39
nltk v3.4.5
gensim v3.8.1
transformers v2.2.2
[CV]:
cv2 v3.4.8
imgaug v0.3.0
pydicom v1.2.2
skimage v0.16.2
detectron2 v0.1
```

### Package information

| Package / software / library | Version | Description |
|:---------|:---------:|:---------|
| [TensorFlow](https://www.tensorflow.org) | 2.1.0rc1 | An open source machine learning library for research and production. An open-source AI model development framework maintained by Google. |
| [PyTorch](https://pytorch.org) | 1.3.1 | Tensors and Dynamic neural networks in Python with strong GPU acceleration. An open-source AI model development framework maintained by Facebook AI Research (FAIR). |
| [torchvision](https://github.com/pytorch/vision) | 0.4.2 | The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. Contains basic computer vision models and datasets; usually installed together with PyTorch. |
| [Apex](https://github.com/NVIDIA/apex) | 0.1 | A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch. Besides automatic mixed-precision acceleration, it implements SyncBatchNorm, which makes effective use of multiple GPUs to increase the batch size. |
| [PyTorch-NLP](https://github.com/PetrochukM/PyTorch-NLP) | 0.5.0 | Supporting Rapid Prototyping with a Toolkit (incl. Datasets and Neural Network Layers). A framework for building natural language models (based on PyTorch). |
| [Jieba](https://github.com/fxsjy/jieba) | 0.39 | "Jieba" Chinese text segmentation: built to be the best Python Chinese word segmentation module. A commonly used package for Chinese word segmentation. |
| [Pyhanlp](https://github.com/hankcs/pyhanlp) | 0.1.57 | Python interface to the HanLP natural language processing toolkit. HanLP and its models are already installed in the image. It handles Chinese word segmentation and syntactic parsing. |
| [NLTK](http://www.nltk.org) | 3.4.5 | NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. A well-known natural language processing toolkit. |
| [Gensim](https://github.com/RaRe-Technologies/gensim) | 3.8.1 | Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. A convenient tool for natural language modelling. |
| [Python](https://docs.python.org/3.7/whatsnew/changelog.html#python-3-7-3-final) | 3.7.5 | Python is powerful... and fast; plays well with others; runs everywhere; is friendly & easy to learn; is Open. This environment uses Python 3.7, which is considerably faster than Python 3.6 at string processing and file searching. |
| [Horovod](https://github.com/horovod/horovod) | 0.18.2 | Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. Uber Horovod makes it easy to accelerate AI training with multiple GPUs. |
| [OpenMPI](https://www.open-mpi.org) | 4.0.2 | A High Performance Message Passing Library. Required by Uber Horovod; supports communication across GPUs and across server nodes. |
| [NVIDIA CUDA](https://developer.nvidia.com/cuda-toolkit) (runtime) | 10.1.243-1 | The NVIDIA® CUDA® Toolkit provides a development environment for creating high performance GPU-accelerated applications. CUDA is the development framework NVIDIA provides for its GPUs. All of the AI frameworks call its APIs. |
| [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) (runtime) | 7.6.5.32-1+cuda10.1 | The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN is the library NVIDIA provides specifically for deep neural network development. |
| [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) (runtime) | 6.0.1-1+cuda10.1 | NVIDIA TensorRT® is a platform for high-performance deep learning inference. At deployment time, TensorRT can optimize a model, or convert a single-precision model to half precision in a suitable way, so that inference runs faster. |
| [NVIDIA Collective Communications Library (NCCL)](https://developer.nvidia.com/nccl) | 2.4.8-1+cuda10.1 | The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs. For multi-GPU training, TensorFlow can use NCCL for multi-GPU acceleration. |
| [Intel Math Kernel Library (Intel MKL)](https://software.intel.com/en-us/mkl) | 2019.4-070 | Intel® Math Kernel Library (Intel® MKL) optimizes code with minimal effort for future generations of Intel® processors. Fast numerical computation on Intel CPUs. |
| [NumPy](https://www.numpy.org) (Intel-MKL-accelerated) | 1.17.4 | NumPy is the fundamental package for scientific computing with Python. A commonly used numerical computing package (accelerated with Intel MKL). |
| [SciPy](https://www.scipy.org) (Intel-MKL-accelerated) | 1.3.3 | SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. A commonly used scientific package providing basic algorithms and statistical methods (accelerated with Intel MKL). |
| [Scikit-learn](https://scikit-learn.org/stable/#) (Intel-MKL-accelerated) | 0.20.4 | Machine Learning in Python. A commonly used machine learning package providing basic algorithms and statistical methods (accelerated with Intel MKL). |
| [OpenCV](https://opencv.org) (Intel-MKL-accelerated) | 3.4.8 | OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. Used for image processing and for building image-related machine learning models. |
| [imgaug](https://github.com/aleju/imgaug) | 0.3.0 | Image augmentation for machine learning experiments. Used for data augmentation. |
| [pydicom](https://pydicom.github.io/pydicom/stable/getting_started.html) | 1.2.2 | Pydicom is a pure Python package for working with DICOM files such as medical images, reports, and radiotherapy objects. Used for reading medical images. |
| [gdcm](https://sourceforge.net/projects/gdcm/) | 3.0.4 | Grassroots DiCoM is a C++ library for DICOM medical files. It is accessible from Python, C#, Java and PHP. It supports RAW, JPEG, JPEG 2000, JPEG-LS, RLE and deflated transfer syntax. pydicom needs this library in order to read compressed medical images. |
| [Numba](http://numba.pydata.org) | 0.46.0 | Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Python code compiled by the JIT compiler can run a hundred to a thousand times faster in favorable cases. |
| [Numexpr](https://github.com/pydata/numexpr) | 2.7.0 | Fast numerical array expression evaluator for Python, NumPy, PyTables, pandas, bcolz and more. Optimized evaluation of mathematical expressions can be up to 4x faster. |
| [pyodbc](https://github.com/mkleehammer/pyodbc) | 4.0.27 | pyodbc is an open source Python module that makes accessing ODBC databases simple. Used for connecting to databases. |
| [Jupyterlab](https://github.com/jupyterlab/jupyterlab) | 1.2.3 | An extensible environment for interactive and reproducible computing, based on the Jupyter Notebook and Architecture. Code execution, records, and notes can all be stored and organized in notebooks. |
| [pandas](https://pandas.pydata.org) | 0.25.3 | pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Builds and organizes data tables, and offers a simple way to visualize them. |
| [Matplotlib](https://matplotlib.org) | 3.1.2 | Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. A data visualization package that draws bar charts, histograms, scatter plots, and more. |
| [Seaborn](https://seaborn.pydata.org) | 0.9.0 | Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. A high-level plotting API built on Matplotlib; it accepts a data table and automatically groups it before plotting. |
| [Bokeh](https://bokeh.pydata.org/en/latest/) | 1.4.0 | Bokeh is an interactive visualization library that targets modern web browsers for presentation. Can be embedded in web pages for interactive data presentation. |
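
For a quick check from inside a Python session or notebook, the installed versions can also be queried directly. The following is a minimal sketch and not the implementation of `versions_summary`; the helper name `get_version` and the choice of modules in the loop are assumptions made for illustration. It imports each module by name and reads its `__version__` attribute, reporting a missing module instead of failing.

```python
import importlib


def get_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported.

    Hypothetical helper for illustration; it is not part of the AIGO image.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Not every module defines __version__, so fall back to "unknown".
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Import names as they appear in the versions_summary listing.
    for name in ["torch", "tensorflow", "numpy", "sklearn", "cv2", "jieba"]:
        version = get_version(name)
        print(f"{name:12s} {version if version else 'not installed'}")
```

Run inside the container, the printed versions should agree with the table; outside it, any package that is absent is simply reported as not installed.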


Comment by qazwsxedc850124 ‧ 1459 days ago

Excuse me, if I need to handle imbalanced data, how can I add imblearn to this Docker image? https://imbalanced-learn.readthedocs.io/en/stable/install.html

Comment by 翁啟閎 ‧ 1414 days ago

You can follow this guide to build a customized image on top of the AIGO container: https://philipzheng.gitbook.io/docker_practice/image/create

Your Dockerfile would look like this:

    FROM moeaidb/aigo:cu10.1-dnn7.6-gpu-tf-and-pytorch-nlp-19.12
    RUN pip3 install xxxxx
    RUN xxxxx
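
Once a customized image has been built this way, a short Python check run inside the container can confirm that the extra package is visible to the interpreter. This is only a minimal sketch: the helper name `is_installed` is hypothetical, and `imblearn` is used because it is the import name of the imbalanced-learn package asked about above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "imblearn" is the import name of the imbalanced-learn package.
    for name in ["imblearn", "sklearn"]:
        status = "available" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

Saved under a name of your choice in the mounted directory, the script can be run the same way as the one in usage example 2, substituting the tag of the customized image.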