{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GPU-Jupyter\n",
    "\n",
    "This JupyterLab instance is connected to the GPU via the CUDA drivers. In this notebook, we test the installation and perform some basic operations on the GPU."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test GPU connection\n",
    "\n",
    "#### The following command should list your GPU type and the installed NVIDIA driver version:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wed Mar 11 07:16:17 2020       \n",
      "+-----------------------------------------------------------------------------+\n",
      "| NVIDIA-SMI 440.48.02    Driver Version: 440.48.02    CUDA Version: 10.2     |\n",
      "|-------------------------------+----------------------+----------------------+\n",
      "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
      "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
      "|===============================+======================+======================|\n",
      "|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |\n",
      "|  0%   42C    P8     1W / 215W |   1788MiB /  7974MiB |      0%      Default |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "                                                                               \n",
      "+-----------------------------------------------------------------------------+\n",
      "| Processes:                                                       GPU Memory |\n",
      "|  GPU       PID   Type   Process name                             Usage      |\n",
      "|=============================================================================|\n",
      "+-----------------------------------------------------------------------------+\n"
     ]
    }
   ],
   "source": [
    "!nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Now, test whether PyTorch can access the GPU via CUDA:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "torch.cuda.is_available()"
   ]
  },
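  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If CUDA is available, we can also query the visible device(s) by name. This is a minimal sketch using the standard `torch.cuda.device_count` and `torch.cuda.get_device_name` calls (this cell was not part of the recorded run above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    # Number of visible CUDA devices and the name of the first one\n",
    "    print(torch.cuda.device_count())\n",
    "    print(torch.cuda.get_device_name(0))"
   ]
  },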
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:From <ipython-input-3-d1bfbb527297>:3: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Use `tf.config.list_physical_devices('GPU')` instead.\n",
      "True\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[name: \"/device:CPU:0\"\n",
       " device_type: \"CPU\"\n",
       " memory_limit: 268435456\n",
       " locality {\n",
       " }\n",
       " incarnation: 8034786465358909470,\n",
       " name: \"/device:XLA_CPU:0\"\n",
       " device_type: \"XLA_CPU\"\n",
       " memory_limit: 17179869184\n",
       " locality {\n",
       " }\n",
       " incarnation: 13772661904993777233\n",
       " physical_device_desc: \"device: XLA_CPU device\",\n",
       " name: \"/device:GPU:0\"\n",
       " device_type: \"GPU\"\n",
       " memory_limit: 5480775680\n",
       " locality {\n",
       "   bus_id: 1\n",
       "   links {\n",
       "   }\n",
       " }\n",
       " incarnation: 8336380964433791501\n",
       " physical_device_desc: \"device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5\",\n",
       " name: \"/device:XLA_GPU:0\"\n",
       " device_type: \"XLA_GPU\"\n",
       " memory_limit: 17179869184\n",
       " locality {\n",
       " }\n",
       " incarnation: 4817022749254415174\n",
       " physical_device_desc: \"device: XLA_GPU device\"]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "from tensorflow.python.client import device_lib\n",
    "print(tf.test.is_gpu_available(cuda_only=True))\n",
    "device_lib.list_local_devices()"
   ]
  },
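  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the deprecation warning above suggests, the current way to check for a GPU in TensorFlow is `tf.config.list_physical_devices` (this cell was not part of the recorded run above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "# Non-deprecated GPU check, as recommended by the warning above\n",
    "tf.config.list_physical_devices('GPU')"
   ]
  },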
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[0.1091, 0.0178, 0.2500],\n",
       "        [0.1409, 0.9612, 0.0325],\n",
       "        [0.8944, 0.3869, 0.9657],\n",
       "        [0.8131, 0.5454, 0.2587],\n",
       "        [0.6570, 0.0147, 0.1361]])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import torch\n",
    "\n",
    "a = torch.rand(5, 3)\n",
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Performance test\n",
    "\n",
    "#### Now we want to know how much faster a typical operation runs on the GPU. To find out, we perform the same operation in NumPy, in PyTorch on the CPU, and in PyTorch on the GPU via CUDA. The test operation is the calculation of the projection matrix used in linear regression, as shown below."
   ]
  },
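  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a design matrix $X \\in \\mathbb{R}^{n \\times p}$, the projection (hat) matrix is\n",
    "\n",
    "$$H = X \\, (X^\\top X)^{-1} \\, X^\\top ,$$\n",
    "\n",
    "which maps the observed targets onto the fitted values in ordinary least squares."
   ]
  },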
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1) NumPy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.random.rand(10000, 256)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "248 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "# Calculate the projection matrix of x\n",
    "H = x.dot(np.linalg.inv(x.transpose().dot(x))).dot(x.transpose())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2) PyTorch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.rand(10000, 256)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "78.2 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "# Calculate the projection matrix of x\n",
    "H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3) PyTorch on GPU via CUDA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[0.0962, 0.3125, 0.7327, 0.5982, 0.4624],\n",
      "        [0.4655, 0.4890, 0.9603, 0.4339, 0.0524],\n",
      "        [0.9294, 0.9639, 0.6312, 0.1752, 0.7721],\n",
      "        [0.5533, 0.3656, 0.9329, 0.8796, 0.9513],\n",
      "        [0.4949, 0.0972, 0.2892, 0.7570, 0.2847]], device='cuda:0')\n",
      "tensor([[0.0962, 0.3125, 0.7327, 0.5982, 0.4624],\n",
      "        [0.4655, 0.4890, 0.9603, 0.4339, 0.0524],\n",
      "        [0.9294, 0.9639, 0.6312, 0.1752, 0.7721],\n",
      "        [0.5533, 0.3656, 0.9329, 0.8796, 0.9513],\n",
      "        [0.4949, 0.0972, 0.2892, 0.7570, 0.2847]], dtype=torch.float64)\n"
     ]
    }
   ],
   "source": [
    "# Let us run this cell only if CUDA is available.\n",
    "# We use ``torch.device`` objects to move tensors in and out of the GPU.\n",
    "if torch.cuda.is_available():\n",
    "    device = torch.device(\"cuda\")               # a CUDA device object\n",
    "    x = torch.rand(10000, 256, device=device)   # directly create a tensor on the GPU\n",
    "    y = x.to(device)                            # or just use strings: ``.to(\"cuda\")``\n",
    "    print(x[0:5, 0:5])\n",
    "    print(y.to(\"cpu\", torch.double)[0:5, 0:5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "11.4 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "# The same projection-matrix computation, now on the GPU\n",
    "H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())"
   ]
  },
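  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that CUDA kernels are launched asynchronously, so GPU timings can be misleading unless the device is synchronized before the clock starts and stops. A minimal sketch of a synchronized measurement, reusing the GPU tensor `x` from above and the standard `torch.cuda.synchronize` (this cell was not part of the recorded run above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    torch.cuda.synchronize()  # wait for all pending kernels before starting the clock\n",
    "    start = time.time()\n",
    "    H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())\n",
    "    torch.cuda.synchronize()  # wait for the result before stopping the clock\n",
    "    print(f\"{time.time() - start:.4f} s\")"
   ]
  },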
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exhaustive testing on the GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let us run this cell only if CUDA is available.\n",
    "# We use ``torch.device`` objects to move tensors in and out of the GPU.\n",
    "import torch\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    device = torch.device(\"cuda\")              # a CUDA device object\n",
    "    x = torch.rand(10000, 10, device=device)   # directly create a tensor on the GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[0.4303, 0.7364, 0.1235, 0.7786, 0.7036],\n",
      "        [0.3256, 0.4515, 0.7994, 0.9814, 0.7705],\n",
      "        [0.2292, 0.5194, 0.4354, 0.3964, 0.5804],\n",
      "        [0.8855, 0.5156, 0.9321, 0.9555, 0.4150],\n",
      "        [0.0640, 0.0665, 0.1170, 0.9547, 0.2668]], device='cuda:0')\n"
     ]
    }
   ],
   "source": [
    "if torch.cuda.is_available():\n",
    "    y = x.to(device)  # or just use strings: ``.to(\"cuda\")``\n",
    "    print(x[0:5, 0:5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "if torch.cuda.is_available():\n",
    "    # Here, the GPU memory is the limiting factor:\n",
    "    # a projection matrix for 100000 rows would require about 37 GB,\n",
    "    # but only 8 GB are available.\n",
    "    H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())"
   ]
  },
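  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check of the memory estimate above: the projection matrix for $n$ rows has $n^2$ float32 entries of 4 bytes each (this cell was not part of the recorded run above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Memory needed for an n x n float32 projection matrix\n",
    "n = 100_000\n",
    "print(n * n * 4 / 2**30, \"GiB\")  # ~37.25 GiB"
   ]
  },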
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[1.0966e-03, 3.5866e-04, 4.0044e-04, 3.2466e-04, 2.3044e-04],\n",
      "        [3.5866e-04, 9.7424e-04, 2.8649e-04, 8.2904e-04, 2.0482e-04],\n",
      "        [4.0044e-04, 2.8649e-04, 5.4179e-04, 1.2729e-04, 9.4659e-05],\n",
      "        [3.2466e-04, 8.2904e-04, 1.2729e-04, 1.3005e-03, 6.6951e-06],\n",
      "        [2.3044e-04, 2.0482e-04, 9.4659e-05, 6.6950e-06, 1.3420e-03]],\n",
      "       device='cuda:0')\n"
     ]
    }
   ],
   "source": [
    "if torch.cuda.is_available():\n",
    "    print(H[0:5, 0:5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[1.0966e-03, 3.5866e-04, 4.0044e-04, 3.2466e-04, 2.3044e-04],\n",
      "        [3.5866e-04, 9.7424e-04, 2.8649e-04, 8.2904e-04, 2.0482e-04],\n",
      "        [4.0044e-04, 2.8649e-04, 5.4179e-04, 1.2729e-04, 9.4659e-05],\n",
      "        [3.2466e-04, 8.2904e-04, 1.2729e-04, 1.3005e-03, 6.6951e-06],\n",
      "        [2.3044e-04, 2.0482e-04, 9.4659e-05, 6.6950e-06, 1.3420e-03]],\n",
      "       dtype=torch.float64)\n"
     ]
    }
   ],
   "source": [
    "if torch.cuda.is_available():\n",
    "    # This operation is costly, as the full symmetric matrix is transferred\n",
    "    # back to the CPU. It is feasible for up to about 30000 rows.\n",
    "    print(H.to(\"cpu\", torch.double)[0:5, 0:5])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
|