# GPU-Jupyter

This JupyterLab instance is connected to the GPU via CUDA drivers. In this notebook, we test the installation and perform some basic operations on the GPU.

## Test GPU connection

#### The following command should list your GPU type, the NVIDIA driver version, and the supported CUDA version:

In [1]:
!nvidia-smi

Tue Mar 10 17:55:25 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.48.02    Driver Version: 440.48.02    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   41C    P8     1W / 215W |    215MiB /  7974MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
+-------

#### Now, test if PyTorch can access the GPU via CUDA:

In [2]:
import torch
torch.cuda.is_available()

True

#### Next, test if TensorFlow can access the GPU via CUDA:

In [7]:
import tensorflow as tf
from tensorflow.python.client import device_lib
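# Note: tf.test.is_gpu_available is deprecated in TensorFlow 2.x;
# tf.config.list_physical_devices('GPU') is the recommended replacement.
print(tf.test.is_gpu_available(cuda_only=True))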
device_lib.list_local_devices()

True


[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 933763008911863935,
 name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 12790964875098705008
 physical_device_desc: "device: XLA_CPU device",
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 6940531098
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 4940791198162309705
 physical_device_desc: "device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5",
 name: "/device:XLA_GPU:0"
 device_type: "XLA_GPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 6996862811697216940
 physical_device_desc: "device: XLA_GPU device"]

In [8]:
from __future__ import print_function
import numpy as np
import torch
a = torch.rand(5, 3)
a

tensor([[0.8519, 0.7682, 0.3258],
        [0.1957, 0.4073, 0.6085],
        [0.9164, 0.8401, 0.4548],
        [0.9011, 0.8838, 0.9559],
        [0.4692, 0.3993, 0.4313]])

## Performance test

#### Now we want to know how much faster a typical operation runs on the GPU. To find out, we perform the same operation in NumPy, in PyTorch on the CPU, and in PyTorch with CUDA. The test operation is the calculation of the projection matrix (also called the hat or prediction matrix) $H = X (X^T X)^{-1} X^T$ that appears in linear regression.
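
As a quick sanity check of the test operation, the following minimal sketch computes the projection matrix for a small random matrix and verifies two of its defining properties: it is symmetric and idempotent. The helper name `hat_matrix` and the small matrix size are illustrative assumptions, not part of the benchmark, and the sketch uses `np.linalg.solve` instead of an explicit inverse, which is a swapped-in (numerically more stable) variant of the formula used in the cells below.

```python
import numpy as np

def hat_matrix(x):
    """Return H = X (X^T X)^{-1} X^T without forming the inverse explicitly."""
    gram = x.T @ x                          # (p, p) Gram matrix X^T X
    return x @ np.linalg.solve(gram, x.T)   # solve gram @ Z = X^T, then H = X @ Z

# Hypothetical small example: 50 observations, 4 features
rng = np.random.default_rng(0)
x_small = rng.random((50, 4))
H_small = hat_matrix(x_small)

is_symmetric = np.allclose(H_small, H_small.T)            # H equals its transpose
is_idempotent = np.allclose(H_small @ H_small, H_small)   # applying H twice changes nothing
trace_value = float(np.trace(H_small))                    # equals the rank of x for a projection

print(is_symmetric, is_idempotent, trace_value)
```

Note that H has one row and one column per observation, so its size grows quadratically with the number of rows of x. This is why the operation is a useful stress test for both compute and memory.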

### 1) NumPy

In [9]:
x = np.random.rand(10000, 256)

In [10]:
%%timeit
# Calculate the projection matrix of x
H = x.dot(np.linalg.inv(x.transpose().dot(x))).dot(x.transpose())

362 ms ± 86.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### 2) PyTorch

In [11]:
x = torch.rand(10000, 256)

In [12]:
%%timeit
# Calculate the projection matrix of x
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

135 ms ± 3.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


### 3) PyTorch on GPU via CUDA

In [13]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    x = torch.rand(10000, 256, device=device) # directly create a tensor on GPU
    y = x.to(device)                       # or just use strings ``.to("cuda")``
    print(x[0:5, 0:5])
    print(y.to("cpu", torch.double)[0:5, 0:5])

tensor([[0.2812, 0.3255, 0.5715, 0.1665, 0.6951],
        [0.5562, 0.9592, 0.0911, 0.9672, 0.3311],
        [0.6711, 0.0422, 0.5091, 0.6653, 0.9234],
        [0.1029, 0.1447, 0.8385, 0.7580, 0.7998],
        [0.7787, 0.0114, 0.4865, 0.4171, 0.7066]], device='cuda:0')
tensor([[0.2812, 0.3255, 0.5715, 0.1665, 0.6951],
        [0.5562, 0.9592, 0.0911, 0.9672, 0.3311],
        [0.6711, 0.0422, 0.5091, 0.6653, 0.9234],
        [0.1029, 0.1447, 0.8385, 0.7580, 0.7998],
        [0.7787, 0.0114, 0.4865, 0.4171, 0.7066]], dtype=torch.float64)


    Found GPU0 GeForce RTX 2070 SUPER which requires CUDA_VERSION >= 10000 to
     work properly, but your PyTorch was compiled
     with CUDA_VERSION 9000. Please install the correct PyTorch binary
     using instructions from https://pytorch.org
    


In [14]:
%%timeit
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

12.8 ms ± 564 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
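
CUDA kernels are launched asynchronously, so a wall-clock measurement may stop before all queued GPU work has finished. The sketch below shows one way to time the same operation by hand while calling `torch.cuda.synchronize()` before reading the clock. The function name `time_hat_matrix`, its parameters, and the smaller matrix size are assumptions chosen for illustration so that the cell also runs quickly on a CPU-only machine; it is not the measurement used above.

```python
import time
import torch

def time_hat_matrix(n_rows=2000, n_cols=64, repeats=5):
    """Return (mean seconds per run, shape of H, device type) for the projection matrix."""
    # Fall back to the CPU when no CUDA device is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.rand(n_rows, n_cols, device=device)

    def sync():
        # Wait for all queued GPU work; nothing to do on the CPU
        if device.type == "cuda":
            torch.cuda.synchronize()

    # Warm-up run so one-time initialization is not included in the timing
    x.mm(x.t().mm(x).inverse()).mm(x.t())
    sync()

    start = time.perf_counter()
    for _ in range(repeats):
        H = x.mm(x.t().mm(x).inverse()).mm(x.t())
    sync()  # make sure the last run has actually finished
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed, tuple(H.shape), device.type

elapsed, shape, device_type = time_hat_matrix()
print(f"{elapsed * 1000:.2f} ms per run on {device_type}, H has shape {shape}")
```

The absolute numbers depend on the hardware and the installed PyTorch build, so they are only comparable within one machine.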


## Exhaustive Testing on GPU

In [15]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
import torch
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    x = torch.rand(10000, 10, device=device) # directly create a tensor on GPU

In [16]:
if torch.cuda.is_available():
    y = x.to(device)                       # or just use strings ``.to("cuda")``
    print(x[0:5, 0:5])

tensor([[0.6760, 0.8890, 0.7271, 0.4208, 0.1131],
        [0.4036, 0.8012, 0.3448, 0.4120, 0.2439],
        [0.6088, 0.4356, 0.9391, 0.1366, 0.4379],
        [0.4540, 0.5981, 0.3885, 0.2473, 0.5938],
        [0.2976, 0.8384, 0.6107, 0.6882, 0.9593]], device='cuda:0')


In [17]:
if torch.cuda.is_available():
    # GPU memory is the limiting factor here.
    # With 100000 rows, H would require about 37 GB, but only 8 GB are available.
    H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())
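
The memory figure in the comment above can be reproduced with simple arithmetic: H is a dense square matrix with one row and one column per row of x, and a `float32` element takes 4 bytes. The following minimal sketch assumes a dense layout and ignores temporary buffers; the helper name `square_matrix_gib` is hypothetical.

```python
def square_matrix_gib(n_rows, bytes_per_element=4):
    """Memory in GiB needed for a dense n_rows x n_rows matrix."""
    return n_rows * n_rows * bytes_per_element / 2**30

# Compare a few row counts for float32 (4 bytes) and float64 (8 bytes)
for n_rows in (10_000, 30_000, 100_000):
    as_float32 = square_matrix_gib(n_rows)
    as_float64 = square_matrix_gib(n_rows, bytes_per_element=8)
    print(f"{n_rows:>7} rows: {as_float32:6.2f} GiB float32, {as_float64:6.2f} GiB float64")
```

With 10000 rows the result stays well below 1 GiB, while 100000 rows would need roughly 37 GiB in `float32`, far more than the 8 GB available on this GPU.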

In [18]:
if torch.cuda.is_available():
    print(H[0:5, 0:5])

tensor([[ 1.1191e-03,  1.6152e-04, -2.1592e-04,  1.4253e-04, -4.0365e-04],
        [ 1.6151e-04,  5.5901e-04,  2.6872e-04, -3.1842e-06,  2.8985e-04],
        [-2.1592e-04,  2.6872e-04,  1.0728e-03, -3.5968e-05,  5.5613e-04],
        [ 1.4253e-04, -3.1840e-06, -3.5968e-05,  6.5156e-04, -3.1820e-04],
        [-4.0365e-04,  2.8985e-04,  5.5613e-04, -3.1820e-04,  1.4067e-03]],
       device='cuda:0')


In [19]:
if torch.cuda.is_available():
    # This operation is expensive because the full symmetric matrix H is
    # transferred back to the CPU. It is possible for up to 30000 rows.
    print(H.to("cpu", torch.double)[0:5, 0:5])

tensor([[ 1.1191e-03,  1.6152e-04, -2.1592e-04,  1.4253e-04, -4.0365e-04],
        [ 1.6151e-04,  5.5901e-04,  2.6872e-04, -3.1842e-06,  2.8985e-04],
        [-2.1592e-04,  2.6872e-04,  1.0728e-03, -3.5968e-05,  5.5613e-04],
        [ 1.4253e-04, -3.1840e-06, -3.5968e-05,  6.5156e-04, -3.1820e-04],
        [-4.0365e-04,  2.8985e-04,  5.5613e-04, -3.1820e-04,  1.4067e-03]],
       dtype=torch.float64)
