{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "5c60acfe",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Problem Set 7: Building ConvNets with PyTorch\n",
|
||
"\n",
|
||
"**Release Date:** 02 April 2024\n",
|
||
"\n",
|
||
"**Due Date:** 2359h, 13 April 2024\n",
|
||
"\n",
|
||
"In PS6, you looked at the basics of building learning pipelines with `PyTorch`. Computer Vision, the field that aims to make computers understand and learn visual objects like images, is a hot topic in Machine Learning. Knowledge of building these Computer Vision pipelines is pertinent given the fast-paced nature of the field.\n",
|
||
"\n",
|
||
"<img src=\"imgs/logo.png\" width=\"600\">\n",
|
||
"\n",
"In *Problem Set 7*, we will take you through the `PyTorch` API for Computer Vision. You will be building a __Convolutional Neural Network__ (CNN/ConvNet) and training it on two datasets, *MNIST* and *CIFAR-10*. You'll also learn how to build __data augmentation pipelines__ to enhance your dataset. Finally, you'll look through the eyes of your ConvNet to see __why__ it's making certain predictions."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"id": "adfd1c67",
|
||
"metadata": {
|
||
"collapsed": false,
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-27T17:03:32.690848Z",
|
||
"start_time": "2024-04-27T17:03:30.945851Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"/tmp/ipykernel_95268/3863716717.py:17: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'\n",
|
||
" device = torch.device(\"mps\") if torch.has_mps else torch.device(\"cpu\")\n",
|
||
"/tmp/ipykernel_95268/3863716717.py:18: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'\n",
|
||
" torch.has_mps\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": "False"
|
||
},
|
||
"execution_count": 1,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# do not remove this cell\n",
|
||
"# run this cell before moving on\n",
|
||
"\n",
|
||
"# DL libraries\n",
|
||
"import torch\n",
|
||
"import torch.nn as nn \n",
|
||
"import torch.nn.functional as F\n",
|
||
"from torchvision import datasets, transforms\n",
|
||
"\n",
|
||
"# Computational libraries\n",
|
||
"import math\n",
|
||
"import numpy as np\n",
|
||
"\n",
|
||
"# Visualization libraries\n",
|
||
"import matplotlib.pyplot as plt\n",
|
||
"import seaborn as sns\n",
"device = torch.device(\"mps\") if torch.backends.mps.is_available() else torch.device(\"cpu\")\n",
"torch.backends.mps.is_available()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "06e51671",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Chapter 1: Vision Layers\n",
|
||
"\n",
|
||
"Here, you'll be building the two fundamental layers that are the cornerstone of Computer Vision: the convolutional layer and the pooling layer (specifically, __max__ pooling). \n",
|
||
"\n",
|
||
"## Task 1.1: Convolution Under The Hood\n",
|
||
"\n",
|
||
"Your task is to write the `conv2d` function that performs the convolution operation on a 2-dim image, `img : torch.Tensor`, using a certain kernel, `kernel : torch.Tensor`. Assume there is no padding and the stride is 1. \n",
|
||
"\n",
|
||
"> Don't work on the channels – assume they remain the same. Your code should only work on the spatial dimensions: Height and Width.\n",
|
||
"\n",
|
||
"We've given you two images `x1` and `x2` and their convolutional outputs `c1` and `c2` respectively. Run them to verify whether your function is working as expected.\n",
|
||
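"\n",
"As a quick sanity check on shapes (a sketch in notation introduced here for illustration, assuming no padding and a stride of 1 as stated above), an $H \\times W$ image and a $k_H \\times k_W$ kernel give an output of size $(H - k_H + 1) \\times (W - k_W + 1)$, with each entry computed as\n",
"\n",
"$$\n",
"\\text{out}[i, j] = \\sum_{a=0}^{k_H - 1} \\sum_{b=0}^{k_W - 1} \\text{img}[i + a,\\, j + b] \\cdot \\text{kernel}[a, b]\n",
"$$\n",
"\n",
"For the examples below, the $5 \\times 5$ image with a $2 \\times 2$ kernel gives a $4 \\times 4$ output, and the $6 \\times 6$ image with a $4 \\times 4$ kernel gives a $3 \\times 3$ output. The kernel is not flipped here, which matches the worked sums below.\n",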
"\n",
|
||
"$$\n",
|
||
"c1 = \\texttt{conv2d}\\Bigg(\n",
|
||
"\\begin{bmatrix}\n",
|
||
" 4 & 9 & 3 & 0 & 3 \\\\\n",
|
||
" 9 & 7 & 3 & 7 & 3 \\\\\n",
|
||
" 1 & 6 & 6 & 9 & 8 \\\\\n",
|
||
" 6 & 6 & 8 & 4 & 3 \\\\\n",
|
||
" 6 & 9 & 1 & 4 & 4 \\\\\n",
|
||
"\\end{bmatrix},~\n",
|
||
"\\begin{bmatrix}\n",
|
||
" 1 & 1 \\\\\n",
|
||
" 1 & 1\n",
|
||
"\\end{bmatrix}\\Bigg) = \n",
|
||
" \\begin{bmatrix} \n",
|
||
" 4+9+9+7 & 9+3+7+3 & 3+0+3+7 & 0+3+7+3 \\\\\n",
|
||
" 9+7+1+6 & 7+3+6+6 & 3+7+6+9 & 7+3+9+8 \\\\\n",
|
||
" 1+6+6+6 & 6+6+6+8 & 6+9+8+4 & 9+8+4+3 \\\\\n",
|
||
" 6+6+6+9 & 6+8+9+1 & 8+4+1+4 & 4+3+4+4 \\\\\n",
|
||
" \\end{bmatrix} =\n",
|
||
"\\begin{bmatrix} \n",
|
||
" 29 & 22 & 13 & 13 \\\\\n",
|
||
" 23 & 22 & 25 & 27 \\\\\n",
|
||
" 19 & 26 & 27 & 24 \\\\\n",
|
||
" 27 & 24 & 17 & 15 \\\\\n",
|
||
"\\end{bmatrix}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"$$\n",
|
||
"c2 = \\texttt{conv2d}\\Bigg(\n",
|
||
"\\begin{bmatrix}\n",
|
||
" 1 & 9 & 9 & 9 & 0 & 1 \\\\\n",
|
||
" 2 & 3 & 0 & 5 & 5 & 2 \\\\\n",
|
||
" 9 & 1 & 8 & 8 & 3 & 6 \\\\\n",
|
||
" 9 & 1 & 7 & 3 & 5 & 2 \\\\\n",
|
||
" 1 & 0 & 9 & 3 & 1 & 1 \\\\\n",
|
||
" 0 & 3 & 6 & 6 & 7 & 9 \\\\\n",
|
||
"\\end{bmatrix},~\n",
|
||
"\\begin{bmatrix}\n",
|
||
" 6 & 3 & 4 & 5 \\\\\n",
|
||
" 0 & 8 & 2 & 8 \\\\\n",
|
||
" 2 & 7 & 5 & 0 \\\\\n",
|
||
" 0 & 8 & 1 & 9 \\\\\n",
|
||
"\\end{bmatrix}\\Bigg) = \\begin{bmatrix} \n",
|
||
" 285 & 369 & 286 \\\\\n",
|
||
" 230 & 317 & 257 \\\\ \n",
|
||
" 306 & 374 & 344 \\\\\n",
|
||
"\\end{bmatrix}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"__Note:__ You are not allowed to use the `torch.nn.functional.conv2d` function."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "a63a577557da6e87",
|
||
"metadata": {
|
||
"collapsed": false
|
||
},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "7873ce10",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-27T17:03:40.731264Z",
|
||
"start_time": "2024-04-27T17:03:40.726572Z"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"torch.manual_seed(0)\n",
|
||
"\n",
"def conv2d(img, kernel):\n",
"    \"\"\"\n",
"    PARAMS\n",
"        img: the 2-dim image with a specific height and width\n",
"        kernel: a 2-dim kernel (smaller than image dimensions) that convolves the given image\n",
"\n",
"    RETURNS\n",
"        the convolved 2-dim image\n",
"    \"\"\"\n",
"    iH, iW = img.shape\n",
"    kH, kW = kernel.shape\n",
"    # No padding, stride 1: one output entry per valid kernel position\n",
"    result_shape = (iH - kH + 1, iW - kW + 1)\n",
"\n",
"    result = torch.zeros(result_shape)\n",
"    for i in range(result_shape[0]):\n",
"        for j in range(result_shape[1]):\n",
"            # Elementwise product of the current window with the kernel, then sum\n",
"            result[i, j] = torch.sum(img[i:i+kH, j:j+kW] * kernel)\n",
"    return result"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"id": "38048231",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-03T08:51:32.066584Z",
|
||
"start_time": "2024-04-03T08:51:32.056798Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"tensor([[29., 22., 13., 13.],\n",
|
||
" [23., 22., 25., 27.],\n",
|
||
" [19., 26., 27., 24.],\n",
|
||
" [27., 24., 17., 15.]]) True\n",
|
||
"tensor([[285., 369., 286.],\n",
|
||
" [230., 317., 257.],\n",
|
||
" [306., 374., 344.]]) True\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"x1 = torch.tensor([\n",
|
||
" [4, 9, 3, 0, 3],\n",
|
||
" [9, 7, 3, 7, 3],\n",
|
||
" [1, 6, 6, 9, 8],\n",
|
||
" [6, 6, 8, 4, 3],\n",
|
||
" [6, 9, 1, 4, 4]\n",
|
||
"])\n",
|
||
"k1 = torch.ones((2, 2))\n",
|
||
"o1 = torch.tensor([\n",
|
||
" [29., 22., 13., 13.],\n",
|
||
" [23., 22., 25., 27.],\n",
|
||
" [19., 26., 27., 24.],\n",
|
||
" [27., 24., 17., 15.]\n",
|
||
"])\n",
|
||
"\n",
|
||
"x2 = torch.tensor([\n",
|
||
" [1, 9, 9, 9, 0, 1],\n",
|
||
" [2, 3, 0, 5, 5, 2],\n",
|
||
" [9, 1, 8, 8, 3, 6],\n",
|
||
" [9, 1, 7, 3, 5, 2],\n",
|
||
" [1, 0, 9, 3, 1, 1],\n",
|
||
" [0, 3, 6, 6, 7, 9]\n",
|
||
"])\n",
|
||
"k2 = torch.tensor([\n",
|
||
" [6, 3, 4, 5],\n",
|
||
" [0, 8, 2, 8],\n",
|
||
" [2, 7, 5, 0],\n",
|
||
" [0, 8, 1, 9]\n",
|
||
"])\n",
|
||
"o2 = torch.tensor([\n",
|
||
" [285., 369., 286.],\n",
|
||
" [230., 317., 257.],\n",
|
||
" [306., 374., 344.]\n",
|
||
"])\n",
|
||
"\n",
|
||
"# TEST YOUR conv2d FUNCTION HERE\n",
|
||
"c1 = conv2d(x1, k1)\n",
|
||
"print(c1, torch.all(torch.eq(c1, o1)).item())\n",
|
||
"c2 = conv2d(x2, k2)\n",
|
||
"print(c2, torch.all(torch.eq(c2, o2)).item())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2d6a9790",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 1.2: Max Pooling Under The Hood\n",
|
||
"\n",
|
||
"Your task is to write the `maxpool2d` function that takes in an image, `img : torch.Tensor`, and a square kernel size `size : int`. Assume stride is 1 and there's no padding.\n",
|
||
"\n",
|
||
"We've given you two images `x1` and `x2` to test your `maxpool2d` function with `size=2` and `size=3` respectively. \n",
|
||
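"\n",
"As a quick check on shapes (a sketch in notation introduced here for illustration, assuming no padding and a stride of 1 as stated above), an $H \\times W$ image pooled with window size $s$ gives an output of size $(H - s + 1) \\times (W - s + 1)$, where\n",
"\n",
"$$\n",
"\\text{out}[i, j] = \\max_{0 \\le a,\\, b < s} \\text{img}[i + a,\\, j + b]\n",
"$$\n",
"\n",
"so both test cases below produce $4 \\times 4$ outputs: $5 - 2 + 1 = 4$ for `x1` and $6 - 3 + 1 = 4$ for `x2`.\n",
"\n",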
"$$\n",
|
||
"m1 = \\texttt{maxpool2d}\\Bigg(\n",
|
||
"\\begin{bmatrix}\n",
|
||
" 4 & 9 & 3 & 0 & 3 \\\\\n",
|
||
" 9 & 7 & 3 & 7 & 3 \\\\\n",
|
||
" 1 & 6 & 6 & 9 & 8 \\\\\n",
|
||
" 6 & 6 & 8 & 4 & 3 \\\\\n",
|
||
" 6 & 9 & 1 & 4 & 4 \\\\\n",
|
||
"\\end{bmatrix},~2\\Bigg) =\n",
|
||
"\\begin{bmatrix} \n",
|
||
" max(4,9,9,7) & max(9,3,7,3) & max(3,0,3,7) & max(0,3,7,3) \\\\\n",
|
||
" max(9,7,1,6) & max(7,3,6,6) & max(3,7,6,9) & max(7,3,9,8) \\\\\n",
|
||
" max(1,6,6,6) & max(6,6,6,8) & max(6,9,8,4) & max(9,8,4,3) \\\\\n",
|
||
" max(6,6,6,9) & max(6,8,9,1) & max(8,4,1,4) & max(4,3,4,4) \\\\\n",
|
||
"\\end{bmatrix} =\n",
|
||
"\\begin{bmatrix} \n",
|
||
" 9 & 9 & 7 & 7 \\\\\n",
|
||
" 9 & 7 & 9 & 9 \\\\\n",
|
||
" 6 & 8 & 9 & 9 \\\\\n",
|
||
" 9 & 9 & 8 & 4 \\\\\n",
|
||
"\\end{bmatrix}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"$$\n",
|
||
"m2 = \\texttt{maxpool2d}\\Bigg(\n",
|
||
"\\begin{bmatrix}\n",
|
||
" 1 & 9 & 9 & 9 & 0 & 1 \\\\\n",
|
||
" 2 & 3 & 0 & 5 & 5 & 2 \\\\\n",
|
||
" 9 & 1 & 8 & 8 & 3 & 6 \\\\\n",
|
||
" 9 & 1 & 7 & 3 & 5 & 2 \\\\\n",
|
||
" 1 & 0 & 9 & 3 & 1 & 1 \\\\\n",
|
||
" 0 & 3 & 6 & 6 & 7 & 9 \\\\\n",
|
||
"\\end{bmatrix},~3\\Bigg) = \\begin{bmatrix} \n",
|
||
" 9 & 9 & 9 & 9 \\\\\n",
|
||
" 9 & 8 & 8 & 8 \\\\ \n",
|
||
" 9 & 9 & 9 & 8 \\\\\n",
|
||
" 9 & 9 & 9 & 9 \\\\\n",
|
||
"\\end{bmatrix}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"__Note:__ You are not allowed to use the `torch.nn.functional.max_pool2d` function."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"id": "cb5469f7",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-03T08:53:52.828251Z",
|
||
"start_time": "2024-04-03T08:53:52.823369Z"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"torch.manual_seed(0)\n",
|
||
"\n",
"def maxpool2d(img, size):\n",
"    \"\"\"\n",
"    PARAMS\n",
"        img: the 2-dim image with a specific height and width\n",
"        size: an integer corresponding to the window size for Max Pooling\n",
"\n",
"    RETURNS\n",
"        the 2-dim output after Max Pooling\n",
"    \"\"\"\n",
"    iH, iW = img.shape\n",
"    # No padding, stride 1: one output entry per valid window position\n",
"    result_shape = (iH - size + 1, iW - size + 1)\n",
"\n",
"    result = torch.zeros(result_shape)\n",
"    for i in range(result_shape[0]):\n",
"        for j in range(result_shape[1]):\n",
"            result[i, j] = torch.max(img[i:i+size, j:j+size])\n",
"    return result"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"id": "cb2954e1",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-03T08:53:54.157039Z",
|
||
"start_time": "2024-04-03T08:53:54.148566Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"tensor([[9., 9., 7., 7.],\n",
|
||
" [9., 7., 9., 9.],\n",
|
||
" [6., 8., 9., 9.],\n",
|
||
" [9., 9., 8., 4.]]) True\n",
|
||
"tensor([[9., 9., 9., 9.],\n",
|
||
" [9., 8., 8., 8.],\n",
|
||
" [9., 9., 9., 8.],\n",
|
||
" [9., 9., 9., 9.]]) True\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"x1 = torch.tensor([\n",
|
||
" [4, 9, 3, 0, 3],\n",
|
||
" [9, 7, 3, 7, 3],\n",
|
||
" [1, 6, 6, 9, 8],\n",
|
||
" [6, 6, 8, 4, 3],\n",
|
||
" [6, 9, 1, 4, 4]\n",
|
||
"])\n",
|
||
"k1 = 2\n",
|
||
"o1 = torch.tensor([\n",
|
||
" [9., 9., 7., 7.],\n",
|
||
" [9., 7., 9., 9.],\n",
|
||
" [6., 8., 9., 9.],\n",
|
||
" [9., 9., 8., 4.]\n",
|
||
"])\n",
|
||
"\n",
|
||
"x2 = torch.tensor([\n",
|
||
" [1, 9, 9, 9, 0, 1],\n",
|
||
" [2, 3, 0, 5, 5, 2],\n",
|
||
" [9, 1, 8, 8, 3, 6],\n",
|
||
" [9, 1, 7, 3, 5, 2],\n",
|
||
" [1, 0, 9, 3, 1, 1],\n",
|
||
" [0, 3, 6, 6, 7, 9]\n",
|
||
"])\n",
|
||
"k2 = 3\n",
|
||
"o2 = torch.tensor([\n",
|
||
" [9., 9., 9., 9.],\n",
|
||
" [9., 8., 8., 8.],\n",
|
||
" [9., 9., 9., 8.],\n",
|
||
" [9., 9., 9., 9.]\n",
|
||
"])\n",
|
||
"\n",
|
||
"# TEST YOUR maxpool2d FUNCTION HERE\n",
|
||
"m1 = maxpool2d(x1, k1)\n",
|
||
"print(m1, torch.all(torch.eq(m1, o1)).item())\n",
|
||
"m2 = maxpool2d(x2, k2)\n",
|
||
"print(m2, torch.all(torch.eq(m2, o2)).item())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "1933b2f8",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Chapter 2: MNIST Classification with CNNs\n",
|
||
"\n",
"As done in PS5, we will be working on the MNIST handwritten digits classification problem. This time, however, your images are no longer flattened to form input vectors $\\in \\mathbb{R}^{784}$. You'll be working with the images as they are, in the form of $1 \\times 28 \\times 28$ tensors, where $28$ is the image height and width, and $1$ is the number of colour channels (grayscale image in this case)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "c78ee5e1",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-27T17:04:36.032494Z",
|
||
"start_time": "2024-04-27T17:04:35.981931Z"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# do not remove this cell\n",
|
||
"# run this before moving on\n",
|
||
"\n",
|
||
"T = transforms.Compose([\n",
|
||
" transforms.ToTensor(),\n",
|
||
" transforms.Normalize([0.5], [0.5])\n",
|
||
"])\n",
|
||
"# ToTensor scales pixel values to [0, 1]; Normalize([0.5], [0.5]) then maps them to [-1, 1] via (x - 0.5) / 0.5\n",
"\n",
|
||
"\"\"\"\n",
"Note: You can update the path to point to the directory containing the `MNIST` \n",
|
||
"directory to avoid downloading the MNIST data again.\n",
|
||
"\"\"\"\n",
|
||
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
|
||
"mnist_train = datasets.MNIST(\"./\", train=True, download=True, transform=T)\n",
|
||
"mnist_test = datasets.MNIST(\"./\", train=False, download=True, transform=T)\n",
|
||
"\n",
|
||
"\n",
|
||
"\"\"\"\n",
|
||
"if you feel your computer can't handle too much data, you can reduce the batch\n",
|
||
"size to 64 or 32 accordingly, but it will make training slower. \n",
|
||
"\n",
"We recommend sticking to 256 but do choose an appropriate batch size that your\n",
|
||
"computer can manage. The training phase tends to require quite a bit of memory.\n",
|
||
"\"\"\"\n",
|
||
"train_loader = torch.utils.data.DataLoader(mnist_train, shuffle=True, batch_size=256)\n",
|
||
"test_loader = torch.utils.data.DataLoader(mnist_test, batch_size=10000)\n",
|
||
"\n",
"def get_accuracy(scores, labels):\n",
"    ''' accuracy metric '''\n",
"    # Predicted class = index of the highest score along the class dimension\n",
"    _, predicted = torch.max(scores.data, 1)\n",
"    correct = (predicted == labels).sum().item()\n",
"    return correct / scores.size(0)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "6e4c7f78",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Concept 1: DataLoaders\n",
|
||
"\n",
"PyTorch __DataLoaders__ wrap a dataset and let us iterate over it in batches, optionally shuffling the samples.\n",
"\n",
"`train_loader = torch.utils.data.DataLoader(mnist_train, shuffle=True, batch_size=256)` means that this dataloader takes in the MNIST training data and yields training features and labels in batches of 256. Because `shuffle=True`, it reshuffles the dataset at the start of every epoch, i.e. each time it has gone through all the data.\n",
|
||
"\n",
|
||
"Run the following code to get a better idea of how dataloaders work."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "4e0fdf18",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-27T17:04:39.560907Z",
|
||
"start_time": "2024-04-27T17:04:39.424591Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Feature batch shape: torch.Size([256, 1, 28, 28])\n",
|
||
"Labels batch shape: torch.Size([256])\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": "<Figure size 640x480 with 1 Axes>",
|
||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAGFCAYAAAASI+9IAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAJbklEQVR4nO3cMWieVR/G4ZM2CDHYGCrUCCo2GAex0EEq6FLo1kExDkZLzKAg1CVrwaK4BjsKSqBQiJkK3TNkqLabmkUMVixCFEEFNRmC4XUQbj6ww/t/vuRtTK9rzs15oE1/nsEz1Ov1eg0AWmuH7vYHALB/iAIAIQoAhCgAEKIAQIgCACEKAIQoABDD/f7g0NDQXn4HAHusn/9X2U0BgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgRAGAEAUAQhQACFEAIEQBgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgRAGAEAUAQhQACFEAIEQBgBi+2x/A7jp8+HB5c/Xq1fJmamqqvHn22WfLm9Za+/PPPzvtgDo3BQBCFAAIUQAgRAGAEAUAQhQACFEAIEQBgBAFAEIUAAhRACBEAYAQBQDCK6kHzPBw/Y/0yJEj5c2TTz5Z3oyMjJQ3rXklFQbJTQGAEAUAQhQACFEAIEQBgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgPIh3wJw6daq8OXHixB58CfBf5KYAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEB7EO2Dee++98mZsbKy8WVtbK282NzfLG2Cw3BQACFEAIEQBgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgRAGAEAUAwoN4+9Tp06c77Z5//vld/pI7u3TpUnmztbW1B18C7CY3BQBCFAAIUQAgRAGAEAUAQhQACFEAIEQBgBAFAEIUAAhRACBEAYDwIN4+NTk52Wl3+PDhXf6SO1tfXx/IOcBguSkAEKIAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEF5JpW1sbAxkA+x/bgoAhCgAEKIAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIA4UG8fero0aMDO+uHH34YyIZ/vPDCC512s7Ozu/wld/bTTz+VN5988kl54+/Q/uSmAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEKIAQIgCACEKAIQoABAexNunXnvttYGd9d133w3srIPm/Pnz5c3Fixc7nTXIRxKr3nzzzfJmcXGx01nvvvtupx39cVMAIEQBgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgRAGAEAUAQhQAiKFer9fr6weHhvb6W/gfX331Vafd008/Xd6cO3euvFleXi5vDqLffvutvHnggQf24Evu7PPPPy9vHnroofJmamqqvNnc3CxvWmttbGys047W+vnn3k0BgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgRAGAEAUAQhQACFEAIEQBgBi+2x9wL+jyguQjjzzS6awuL09+8cUXnc7az0ZHR8ubpaWl8ubBBx8sb7755pvyprXW3nnnnfJmZWWlvJmYmChvurzG+thjj5U3rbU2Pz9f3ly6dKnTWfciNwUAQhQACFEAIEQBgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgRAGA8CDeADz66KPlzfj4eKezfv755/Km6wNt+9nc3Fx5c/bs2fJma2urvHn//ffLm9a6PW7XxY8//jiQTZffi9Zae+ONN8obD+L1z00BgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgRAGAEAUAQhQACFEAIDyIx4H00ksvDeSchYWF8ubTTz/dgy/ZPTMzM+XNM888swdfcmdLS0sDO+te5KYAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEB7EG4DV1dXy5uuvv+501uOPP17ePPfcc+XNzZs3y5uuHn744f
Lm+PHje/Al//brr78O5JyuDh2q/3ffK6+8Ut6MjIyUNysrK+VNa619+OGHnXb0x00BgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgRAGAEAUAQhQACFEAIDyINwA7OzvlzZdfftnprKeeeqq8uXbtWnkzPT1d3ly/fr28aa210dHR8mZ8fLzTWVXHjh0byDldLSwslDcvvvhiebO+vl7eXLhwobxprbW//vqr047+uCkAEKIAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEEO9Xq/X1w8ODe31t7ALfv/99/Lm/vvvL29u3LhR3pw/f768aa21tbW18uatt94qbz766KPyZnt7u7z54IMPypvWuv05zc3NlTcTExPlze3bt8ubkydPljetdfs7zj/6+efeTQGAEAUAQhQACFEAIEQBgBAFAEIUAAhRACBEAYAQBQBCFAAIUQAgPIh3wLz66qvlzeXLl8ub4eHh8uaXX34pb1pr7YknnihvdnZ2ypsrV66UN9PT0+VNn79yd83GxkZ5c+bMmfJmfX29vOH/40E8AEpEAYAQBQBCFAAIUQAgRAGAEAUAQhQACFEAIEQBgBAFAEIUAIj6q2bsa8vLy+VNl8cOP/744/Lm6NGj5U1rrd2+fbu8mZ2dLW9u3bpV3ux333//fXkzPz9f3njc7uBwUwAgRAGAEAUAQhQACFEAIEQBgBAFAEIUAAhRACBEAYAQBQBCFACIoV6v1+vrBzs8msbBNTMzU94sLi52Ouu+++7rtBuELr8Xff7K/cvq6mp58/bbb5c33377bXnDf0M/f/fcFAAIUQAgRAGAEAUAQhQACFEAIEQBgBAFAEIUAAhRACBEAYAQBQDCg3gMzOTkZKfdxYsXy5vXX3+901lVf/zxR3nz8ssvdzrrs88+K2+2t7c7ncXB5EE8AEpEAYAQBQBCFAAIUQAgRAGAEAUAQhQACFEAIEQBgBAFAEIUAAhRACC8kgpwj/BKKgAlogBAiAIAIQoAhCgAEKIAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEKIAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEKIAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEKIAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEKIAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEKIAQIgCACEKAIQoABCiAECIAgAhCgCEKAAQogBAiAIAIQoAhCgAEKIAQAz3+4O9Xm8vvwOAfcBNAYAQBQBCFAAIUQAgRAGAEAUAQhQACFEAIEQBgPgbLvku86SCoJIAAAAASUVORK5CYII="
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
},
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Label: 6\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# no need to code\n",
|
||
"# run this before moving on\n",
|
||
"\n",
|
||
"train_features, train_labels = next(iter(train_loader))\n",
|
||
"print(f\"Feature batch shape: {train_features.size()}\")\n",
|
||
"print(f\"Labels batch shape: {train_labels.size()}\")\n",
|
||
"img = train_features[0].squeeze()\n",
|
||
"label = train_labels[0]\n",
|
||
"plt.imshow(img, cmap=\"gray\")\n",
|
||
"plt.axis(\"off\")\n",
|
||
"plt.show()\n",
|
||
"print(f\"Label: {label}\")"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "aa2edfb3",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 2.1: Building a Vanilla ConvNet\n",
|
||
"\n",
|
||
"Your task here is to build a ConvNet using PyTorch layers. You can refer to the attached command glossary to read more about the layers. Use the following architecture:\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\text{Conv(32, (3,3))} \\rightarrow \\text{MP(2,2)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\text{Conv(64, (3,3))} \\rightarrow \\text{MP(2,2)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\text{Flat} \\\\ \\rightarrow \\text{L(1600, 256)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\text{L(256, 128)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\text{L(128, 10)} \\rightarrow \\text{Softmax}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"where \n",
|
||
"- [`Conv`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) is a Convolution layer with the specified output channels and kernel size, with no padding and a stride of 1 by default.\n",
|
||
"\n",
"- [`MP`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html) is the Max Pooling layer with the specified kernel size, with no padding and, by default, a stride equal to the kernel size.\n",
|
||
"\n",
|
||
"- [`LReLU`](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html) is Leaky ReLU with the specified negative slope.\n",
|
||
"\n",
"- `Flat` is the flattening operation, which should flatten/reshape the tensor from a multi-dimensional tensor (batch_size, num_channels, height, width) into a \"flat\" tensor (batch_size, num_channels x height x width), so that each sample becomes a single 1-dimensional vector of features. This has already been implemented for you.\n",
|
||
"\n",
|
||
"- [`L`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) is a fully-connected layer with the specified input and output features.\n",
|
||
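"\n",
"To see where the $1600$ in `L(1600, 256)` comes from, here is a sketch of the shape arithmetic for one $1 \\times 28 \\times 28$ MNIST image, assuming the defaults described above (no padding, a convolution stride of 1, and a pooling stride equal to the kernel size). Each spatial side evolves as\n",
"\n",
"$$\n",
"28 \\xrightarrow{\\text{Conv } 3 \\times 3} 28 - 3 + 1 = 26 \\xrightarrow{\\text{MP } 2 \\times 2} \\lfloor 26 / 2 \\rfloor = 13 \\xrightarrow{\\text{Conv } 3 \\times 3} 13 - 3 + 1 = 11 \\xrightarrow{\\text{MP } 2 \\times 2} \\lfloor 11 / 2 \\rfloor = 5\n",
"$$\n",
"\n",
"The second convolution outputs $64$ channels, so flattening gives $64 \\times 5 \\times 5 = 1600$ features per sample. Leaky ReLU acts elementwise and does not change any shape.\n",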
"\n",
|
||
"You are highly encouraged to initialise all your layers in the `__init__` method.\n",
|
||
"\n",
|
||
"__Note:__ The only constructor argument here is `classes`. For all your networks hereon, do not add any parameters to the `__init__` method other than the ones mentioned. Remember not to hardcode for the number of classes and use the `classes` argument instead.\n",
|
||
"\n",
"__Note:__ There is no need to include a Softmax layer in your neural network: [CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html), which we are going to use as our loss function later, expects raw, unnormalised scores (logits) and applies (log-)Softmax to them internally."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "6dd33b95",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-27T17:04:45.792747Z",
|
||
"start_time": "2024-04-27T17:04:45.419030Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"input torch.Size([20, 1, 28, 28])\n",
|
||
"conv1 torch.Size([20, 32, 26, 26])\n",
|
||
"maxpo torch.Size([20, 32, 13, 13])\n",
|
||
"lrelu torch.Size([20, 32, 13, 13])\n",
|
||
"conv2 torch.Size([20, 64, 11, 11])\n",
|
||
"torch.Size([20, 10])\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"class RawCNN(nn.Module):\n",
"    def __init__(self, classes):\n",
"        super().__init__()\n",
"        \"\"\"\n",
"        classes: integer that corresponds to the number of classes for MNIST\n",
"        \"\"\"\n",
"        self.conv1 = nn.Conv2d(1, 32, 3, stride=1, padding=0)\n",
"        self.conv2 = nn.Conv2d(32, 64, 3, stride=1, padding=0)\n",
"        self.mp = nn.MaxPool2d(2)\n",
"        self.lRelu = nn.LeakyReLU(0.1)\n",
"        self.l1 = nn.Linear(1600, 256)\n",
"        self.l2 = nn.Linear(256, 128)\n",
"        self.l3 = nn.Linear(128, classes)\n",
"        # YOUR CODE HERE\n",
"\n",
"    def forward(self, x):\n",
"        # YOUR CODE HERE\n",
"        # The prints below only trace intermediate shapes; remove them before training\n",
"        print('input', x.shape)\n",
"        x = self.conv1(x)\n",
"        print('conv1', x.shape)\n",
"        x = self.mp(x)\n",
"        print('maxpo', x.shape)\n",
"        x = self.lRelu(x)\n",
"        print('lrelu', x.shape)\n",
"        x = self.conv2(x)\n",
"        print('conv2', x.shape)\n",
"        x = self.mp(x)\n",
"        x = self.lRelu(x)\n",
"        x = x.view(-1, 64*5*5) # Flattening – do not remove this line\n",
"\n",
"        x = self.l1(x)\n",
"        x = self.lRelu(x)\n",
"        x = self.l2(x)\n",
"        x = self.lRelu(x)\n",
"        x = self.l3(x)  # raw logits; no Softmax here\n",
"        return x\n",
|
||
"\n",
|
||
"# Test your network's forward pass\n",
|
||
"num_samples, num_channels, width, height = 20, 1, 28, 28\n",
|
||
"x = torch.rand(num_samples, num_channels, width, height).to(device)\n",
|
||
"net = RawCNN(10).to(device)\n",
|
||
"y = net(x)\n",
|
||
"print(y.shape) # torch.Size([20, 10])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f2f0bc12",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Concept 2: Dropout\n",
|
||
"\n",
"__Dropout__ (*Srivastava et al., 2014*) is a regularisation technique that randomly shuts off neurons in a given layer during training, meaning that the output of each dropped neuron is set to __zero__. As users, we specify a value `p`, the probability that any given neuron is shut off.\n",
|
||
"\n",
|
||
"Suppose a layer has $n$ neurons/units. Mathematically, \n",
|
||
"\n",
"$$\n",
"\\text{Prob}(d_i = 1) = p \\\\\n",
"\\text{Prob}(d_i = 0) = 1 - p\n",
"$$\n",
"\n",
"where $i \\in \\{1, \\dots, n\\}$, $d_i = 1$ means neuron $i$ is shut off and $d_i = 0$ means neuron $i$ is left untouched.\n",
|
||
"\n",
|
||
"Essentially, Dropout does this:\n",
|
||
"\n",
|
||
"<img src=\"https://production-media.paperswithcode.com/methods/Screen_Shot_2020-05-23_at_6.19.24_PM.png\" width=600>\n",
|
||
"\n",
|
||
"### Why Dropout works\n",
"By randomly dropping/zeroing out neurons in a layer, Dropout has a regularising effect on the model. Because any neuron may be missing on a given training step, the network cannot rely too heavily on any single feature, which discourages it from fitting overly complex functions between $x$ and $y$ and so reduces overfitting.\n",
|
||
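"\n",
"One practical detail, stated here from the PyTorch `nn.Dropout` documentation rather than from this problem set: during training the surviving activations are scaled up by $1/(1-p)$, and in evaluation mode (`model.eval()`) the layer does nothing. As a sketch in notation introduced here for illustration, write $x_i$ for the input to unit $i$ and $m_i \\in \\{0, 1\\}$ for a keep-mask with $\\text{Prob}(m_i = 0) = p$. Then\n",
"\n",
"$$\n",
"y_i = \\frac{m_i}{1 - p} \\, x_i, \\qquad \\mathbb{E}[y_i] = \\frac{(1 - p) \\cdot 1 + p \\cdot 0}{1 - p} \\, x_i = x_i\n",
"$$\n",
"\n",
"so the expected activation is the same with and without Dropout, and no rescaling is needed at test time.\n",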
"\n",
|
||
"#### Dropout in PyTorch\n",
|
||
"To use Dropout in a network, we can create a `Dropout` layer in our `__init__` method of the model class:\n",
|
||
"\n",
|
||
"```python\n",
"class Model(nn.Module):\n",
"    def __init__(self, ..., drop_prob):\n",
"        super().__init__()\n",
"        self.l1 = ...\n",
"        ...\n",
"        self.dropout = nn.Dropout(p=drop_prob)\n",
"        ...\n",
"        self.ln = ...\n",
"\n",
"    def forward(self, x):\n",
"        x = self.l1(x)\n",
"        ...\n",
"        x = self.dropout(x)\n",
"        ...\n",
"        out = ...\n",
"\n",
"        return out\n",
|
||
"```\n",
|
||
"\n",
"__Note:__ Other than `nn.Dropout`, there is also `nn.Dropout2d` in PyTorch. Instead of randomly zeroing out individual neurons, `Dropout2d` randomly zeroes out entire channels of the input.\n",
|
||
"\n",
|
||
"`nn.Dropout` is best used with non-spatial data or data that has been flattened, which is typical for fully connected layers. `nn.Dropout2d` is designed for spatial data, making it ideal for use right after convolutional and pooling layers in CNNs.\n",
|
||
"\n",
"For this problem set, choose one of them, but __NOT both__, for your neural network.\n",
|
||
"\n",
|
||
"---"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "d263bdc8",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 2.2: Building a ConvNet with Dropout\n",
|
||
"\n",
|
||
"Here, you must build the exact same network but with Dropout inside the architecture. You can refer to the attached command glossary to read more about the layers. Use the following architecture:\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\text{Conv(32, (3,3))} \\rightarrow \\text{MP(2,2)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\textbf{DO(prob)} \\rightarrow \\\\\n",
|
||
"\\text{Conv(64, (3,3))} \\rightarrow \\text{MP(2,2)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\textbf{DO(prob)} \\rightarrow \\\\\n",
|
||
"\\text{Flat} \\rightarrow \\text{L(1600, 256)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\textbf{DO(prob)} \\rightarrow \\\\\n",
|
||
"\\text{L(256, 128)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\text{L(128, 10)} \\rightarrow \\text{Softmax}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"where \n",
|
||
"- [`Conv`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) is a Convolution layer with the specified output channels and kernel size, with no padding and a stride of 1 by default.\n",
|
||
"\n",
"- [`MP`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html) is the Max Pooling layer with the specified kernel size, with no padding and, by default, a stride equal to the kernel size.\n",
|
||
"\n",
|
||
"- [`LReLU`](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html) is Leaky ReLU with the specified negative slope.\n",
|
||
"\n",
"- `Flat` is the flattening operation, which should flatten/reshape the tensor from a multi-dimensional tensor (batch_size, num_channels, height, width) into a \"flat\" tensor (batch_size, num_channels x height x width), so that each sample becomes a single 1-dimensional vector of features. This has already been implemented for you.\n",
|
||
"\n",
|
||
"- [`L`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) is a fully-connected layer with the specified input and output features.\n",
|
||
" \n",
|
||
"- [`DO`](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html) is Dropout with a dropping probability. Choose between `nn.Dropout` and `nn.Dropout2d` but __not both__ for your network.\n",
|
||
"\n",
|
||
"You are highly encouraged to initialise all your layers in the `__init__` method.\n",
|
||
"\n",
|
||
"__Reminder:__ Do not hardcode for the number of classes and the dropout probability. Use the `classes` and `drop_prob` constructor arguments instead.\n",
|
||
"\n",
"__Note:__ There is no need to include a Softmax layer in your neural network: [CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html), which we are going to use as our loss function later, expects raw, unnormalised scores (logits) and applies (log-)Softmax to them internally."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 89,
|
||
"id": "e9a58c35",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T03:15:50.305024Z",
|
||
"start_time": "2024-04-07T03:15:50.294454Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"torch.Size([20, 10])\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"class DropoutCNN(nn.Module):\n",
"    def __init__(self, classes, drop_prob=0.5):\n",
"        super().__init__()\n",
"        \"\"\"\n",
"        classes: integer that corresponds to the number of classes for MNIST\n",
"        drop_prob: probability of dropping a node in the neural network\n",
"        \"\"\"\n",
"        self.conv1 = nn.Conv2d(1, 32, (3, 3), stride=1, padding=0)\n",
"        self.conv2 = nn.Conv2d(32, 64, (3, 3), stride=1, padding=0)\n",
"        self.mp = nn.MaxPool2d((2, 2))\n",
"        self.lRelu = nn.LeakyReLU(0.1)\n",
"        self.l1 = nn.Linear(1600, 256)\n",
"        self.l2 = nn.Linear(256, 128)\n",
"        self.l3 = nn.Linear(128, classes)\n",
"        self.dropout = nn.Dropout(p=drop_prob)\n",
"\n",
"        # YOUR CODE HERE\n",
"\n",
"    def forward(self, x):\n",
"        x = self.conv1(x)\n",
"        x = self.mp(x)\n",
"        x = self.lRelu(x)\n",
"        x = self.dropout(x)\n",
"\n",
"        x = self.conv2(x)\n",
"        x = self.mp(x)\n",
"        x = self.lRelu(x)\n",
"        x = self.dropout(x)\n",
"\n",
"        x = x.view(-1, 64*5*5) # Flattening – do not remove this line\n",
"        x = self.l1(x)\n",
"        x = self.lRelu(x)\n",
"        x = self.dropout(x)\n",
"\n",
"        x = self.l2(x)\n",
"        x = self.lRelu(x)\n",
"        x = self.l3(x)\n",
"        return x\n",
|
||
"\n",
|
||
"# Test your network's forward pass\n",
|
||
"num_samples, num_channels, width, height = 20, 1, 28, 28\n",
|
||
"x = torch.rand(num_samples, num_channels, width, height).to(device)\n",
|
||
"net = DropoutCNN(10).to(device)\n",
|
||
"y = net(x)\n",
|
||
"print(y.shape) # torch.Size([20, 10])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c779f0ec",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 2.3: Training your Vanilla and Dropout CNNs\n",
|
||
"\n",
|
||
"Here, write down the training loop in the function `train_model` to train the CNNs you have just created. It will take in the respective NN (vanilla or dropout), as well as training and testing __data loaders__ (more on this later) that return batches of images and their respective labels to train on. \n",
|
||
"\n",
|
||
"Use the `torch.optim.Adam(...)` optimizer and Cross Entropy Loss.\n",
|
||
"\n",
|
||
"> Return the model and epoch losses.\n",
|
||
"\n",
|
||
"Remember to extract the loss value from the `loss` tensor by using `loss.item()`.\n",
|
||
"\n",
|
||
"__Tip:__ Don't be worried if your model takes a while to train. Your mileage may also vary depending on your CPU. But if you would like to speed things up, you can consider making use of your device's GPU to parallelize the matrix computations."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"id": "1573889a",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-11T04:04:14.704810Z",
|
||
"start_time": "2024-04-11T04:04:10.568283Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"input torch.Size([256, 1, 28, 28])\n",
|
||
"conv1 torch.Size([256, 32, 26, 26])\n",
|
||
"maxpo torch.Size([256, 32, 13, 13])\n",
|
||
"lrelu torch.Size([256, 32, 13, 13])\n",
|
||
"conv2 torch.Size([256, 64, 11, 11])\n",
|
||
"input torch.Size([256, 1, 28, 28])\n",
|
||
"conv1 torch.Size([256, 32, 26, 26])\n",
|
||
"maxpo torch.Size([256, 32, 13, 13])\n",
|
||
"lrelu torch.Size([256, 32, 13, 13])\n",
|
||
"conv2 torch.Size([256, 64, 11, 11])\n",
|
||
"input torch.Size([256, 1, 28, 28])\n",
|
||
"conv1 torch.Size([256, 32, 26, 26])\n",
|
||
"maxpo torch.Size([256, 32, 13, 13])\n",
|
||
"lrelu torch.Size([256, 32, 13, 13])\n",
|
||
"conv2 torch.Size([256, 64, 11, 11])\n"
|
||
]
|
||
},
|
||
{
|
||
"ename": "KeyboardInterrupt",
|
||
"evalue": "",
|
||
"output_type": "error",
|
||
"traceback": [
|
||
"\u001B[0;31m---------------------------------------------------------------------------\u001B[0m",
|
||
"\u001B[0;31mKeyboardInterrupt\u001B[0m Traceback (most recent call last)",
|
||
"File \u001B[0;32m<timed exec>:33\u001B[0m\n",
|
||
"File \u001B[0;32m<timed exec>:21\u001B[0m, in \u001B[0;36mtrain_model\u001B[0;34m(loader, model, device)\u001B[0m\n",
|
||
"File \u001B[0;32m/opt/homebrew/anaconda3/envs/cs2109s-ay2223s1/lib/python3.9/site-packages/torch/_tensor.py:396\u001B[0m, in \u001B[0;36mTensor.backward\u001B[0;34m(self, gradient, retain_graph, create_graph, inputs)\u001B[0m\n\u001B[1;32m 387\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m has_torch_function_unary(\u001B[38;5;28mself\u001B[39m):\n\u001B[1;32m 388\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m handle_torch_function(\n\u001B[1;32m 389\u001B[0m Tensor\u001B[38;5;241m.\u001B[39mbackward,\n\u001B[1;32m 390\u001B[0m (\u001B[38;5;28mself\u001B[39m,),\n\u001B[0;32m (...)\u001B[0m\n\u001B[1;32m 394\u001B[0m create_graph\u001B[38;5;241m=\u001B[39mcreate_graph,\n\u001B[1;32m 395\u001B[0m inputs\u001B[38;5;241m=\u001B[39minputs)\n\u001B[0;32m--> 396\u001B[0m \u001B[43mtorch\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mautograd\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mbackward\u001B[49m\u001B[43m(\u001B[49m\u001B[38;5;28;43mself\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mgradient\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mretain_graph\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mcreate_graph\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43minputs\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43minputs\u001B[49m\u001B[43m)\u001B[49m\n",
|
||
"File \u001B[0;32m/opt/homebrew/anaconda3/envs/cs2109s-ay2223s1/lib/python3.9/site-packages/torch/autograd/__init__.py:173\u001B[0m, in \u001B[0;36mbackward\u001B[0;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)\u001B[0m\n\u001B[1;32m 168\u001B[0m retain_graph \u001B[38;5;241m=\u001B[39m create_graph\n\u001B[1;32m 170\u001B[0m \u001B[38;5;66;03m# The reason we repeat same the comment below is that\u001B[39;00m\n\u001B[1;32m 171\u001B[0m \u001B[38;5;66;03m# some Python versions print out the first line of a multi-line function\u001B[39;00m\n\u001B[1;32m 172\u001B[0m \u001B[38;5;66;03m# calls in the traceback and some print out the last line\u001B[39;00m\n\u001B[0;32m--> 173\u001B[0m \u001B[43mVariable\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43m_execution_engine\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mrun_backward\u001B[49m\u001B[43m(\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;66;43;03m# Calls into the C++ engine to run the backward pass\u001B[39;49;00m\n\u001B[1;32m 174\u001B[0m \u001B[43m \u001B[49m\u001B[43mtensors\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mgrad_tensors_\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mretain_graph\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mcreate_graph\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43minputs\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 175\u001B[0m \u001B[43m \u001B[49m\u001B[43mallow_unreachable\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;28;43;01mTrue\u001B[39;49;00m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43maccumulate_grad\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;28;43;01mTrue\u001B[39;49;00m\u001B[43m)\u001B[49m\n",
|
||
"\u001B[0;31mKeyboardInterrupt\u001B[0m: "
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"%%time \n",
|
||
"# do not remove the above line\n",
|
||
"device = \"cuda\" if torch.has_cuda else \"cpu\"\n",
|
||
"def train_model(loader, model, device=device):\n",
|
||
" model = model.to(device)\n",
|
||
" optimiser = torch.optim.Adam(model.parameters())\n",
|
||
" loss_fn = nn.CrossEntropyLoss().to(device)\n",
|
||
" # loss_fn = nn.CrossEntropyLoss()\n",
|
||
" epoch_losses = []\n",
|
||
" for i in range(10):\n",
|
||
" epoch_loss = 0\n",
|
||
" \n",
|
||
" for idx, data in enumerate(loader):\n",
|
||
" x, y = data\n",
|
||
" x = x.to(device)\n",
|
||
" y = y.to(device)\n",
|
||
" \n",
|
||
" optimiser.zero_grad()\n",
|
||
" y_pred = model(x)\n",
|
||
"\n",
|
||
" loss = loss_fn(y_pred, y)\n",
|
||
" loss.backward()\n",
|
||
" optimiser.step()\n",
|
||
" # COMPUTE STATS\n",
|
||
" epoch_loss += loss.item()\n",
|
||
"\n",
|
||
" epoch_loss = epoch_loss / len(loader)\n",
|
||
" epoch_losses.append(epoch_loss)\n",
|
||
" print (\"Epoch: {}, Loss: {}\".format(i, epoch_loss))\n",
|
||
" return model, epoch_losses\n",
|
||
" \n",
|
||
"\n",
|
||
" # YOUR CODE HERE\n",
|
||
"vanilla_model, losses = train_model(train_loader, RawCNN(10))\n",
|
||
"# do_model, losses = train_model(train_loader, DropoutCNN(10))\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"id": "15a66fad",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T02:51:31.721190Z",
|
||
"start_time": "2024-04-07T02:51:29.073203Z"
|
||
},
|
||
"scrolled": true
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"vanilla acc: 0.9872\n",
|
||
"drop-out (0.5) acc: 0.9912\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# do not remove – nothing to code here\n",
|
||
"# run this cell before moving on\n",
|
||
"\n",
|
||
"with torch.no_grad():\n",
|
||
" vanilla_model.eval()\n",
|
||
" for i, data in enumerate(test_loader):\n",
|
||
" x, y = data\n",
|
||
" x = x.to(device)\n",
|
||
" y = y.to(device)\n",
|
||
" pred_vanilla = vanilla_model(x)\n",
|
||
" acc = get_accuracy(pred_vanilla, y)\n",
|
||
" print(f\"vanilla acc: {acc}\")\n",
|
||
" \n",
|
||
" do_model.eval()\n",
|
||
" for i, data in enumerate(test_loader):\n",
|
||
" x, y = data\n",
|
||
" x = x.to(device)\n",
|
||
" y = y.to(device)\n",
|
||
" pred_do = do_model(x)\n",
|
||
" acc = get_accuracy(pred_do, y)\n",
|
||
" print(f\"drop-out (0.5) acc: {acc}\")\n",
|
||
" \n",
|
||
"\"\"\"\n",
|
||
"The network with Dropout might under- or outperform the network without\n",
|
||
"Dropout. However, in terms of generalisation, we are assured that the Dropout\n",
|
||
"network will not overfit – that's the guarantee of Dropout.\n",
|
||
"\n",
|
||
"A very nifty trick indeed!\n",
|
||
"\"\"\";"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "5e39d607",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 2.4: Observing Effects of Dropout\n",
|
||
"\n",
|
||
"Here, train your `DropoutCNN` with your `train_model(loader, model)` from Task 2.3, but with `p=0.1` and `p=0.95` respectively. \n",
|
||
"\n",
|
||
"Explain why extreme values of Dropout don't work as well on neural networks. Look back at first principles – what does Dropout do in the first place? How does the `p` value affect how it does it? "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"id": "eaa1389b",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T02:55:00.432275Z",
|
||
"start_time": "2024-04-07T02:52:00.630718Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Epoch: 0, Loss: 0.279120754735901\n",
|
||
"Epoch: 1, Loss: 0.06485465260499969\n",
|
||
"Epoch: 2, Loss: 0.04437660489905071\n",
|
||
"Epoch: 3, Loss: 0.03242283386990745\n",
|
||
"Epoch: 4, Loss: 0.027098283476810505\n",
|
||
"Epoch: 5, Loss: 0.022629741275128214\n",
|
||
"Epoch: 6, Loss: 0.019325615923871543\n",
|
||
"Epoch: 7, Loss: 0.01702770205521758\n",
|
||
"Epoch: 8, Loss: 0.01503490183522251\n",
|
||
"Epoch: 9, Loss: 0.013751926652571939\n",
|
||
"Epoch: 0, Loss: 2.383768850691775\n",
|
||
"Epoch: 1, Loss: 2.270876559805363\n",
|
||
"Epoch: 2, Loss: 2.070882184454735\n",
|
||
"Epoch: 3, Loss: 1.7385065088880822\n",
|
||
"Epoch: 4, Loss: 1.5343923441907192\n",
|
||
"Epoch: 5, Loss: 1.4513092396107126\n",
|
||
"Epoch: 6, Loss: 1.395279371484797\n",
|
||
"Epoch: 7, Loss: 1.348512672870717\n",
|
||
"Epoch: 8, Loss: 1.3113447128458227\n",
|
||
"Epoch: 9, Loss: 1.2827049635826273\n",
|
||
"CPU times: user 2min 59s, sys: 158 ms, total: 2min 59s\n",
|
||
"Wall time: 2min 59s\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"%%time \n",
|
||
"# do not remove – nothing to code here\n",
|
||
"# run this before moving on\n",
|
||
"\n",
|
||
"do10_model, do10_losses = train_model(train_loader, DropoutCNN(10, 0.10))\n",
|
||
"do95_model, do95_losses = train_model(train_loader, DropoutCNN(10, 0.95))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"id": "e8874ce7",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T02:57:55.059022Z",
|
||
"start_time": "2024-04-07T02:57:52.392158Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"drop-out (0.10) acc: 0.9924\n",
|
||
"drop-out (0.95) acc: 0.6142\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# do not remove – nothing to code here\n",
|
||
"# run this cell before moving on\n",
|
||
"\n",
|
||
"with torch.no_grad():\n",
|
||
" do10_model.eval()\n",
|
||
" for i, data in enumerate(test_loader):\n",
|
||
" x, y = data\n",
|
||
" x = x.to(device)\n",
|
||
" y = y.to(device)\n",
|
||
" pred_do = do10_model(x)\n",
|
||
" acc = get_accuracy(pred_do, y)\n",
|
||
" print(f\"drop-out (0.10) acc: {acc}\")\n",
|
||
"\n",
|
||
" do95_model.eval()\n",
|
||
" for i, data in enumerate(test_loader):\n",
|
||
" x, y = data\n",
|
||
" x = x.to(device)\n",
|
||
" y = y.to(device)\n",
|
||
" pred_do = do95_model(x)\n",
|
||
" acc = get_accuracy(pred_do, y)\n",
|
||
" print(f\"drop-out (0.95) acc: {acc}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c11012be",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Concept 3: Confusion Matrix Analysis\n",
|
||
"\n",
|
||
"A __Confusion Matrix__ (CM) is a $k \\times k$ matrix that represents the number of correctly classified and misclassified samples from the dataset. For a binary classification problem, the CM is simply $2 \\times 2$. \n",
|
||
"\n",
|
||
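 For every non-diagonal">
"> For every off-diagonal entry ($i \\neq j$), $CM(i, j)$ represents the number of times the model classified a sample with label $j$ as $i$ (for example, calling a `cat` a `dog` or vice versa). Note that `sklearn.metrics.confusion_matrix`, which we use later, follows the transposed convention: rows are true labels and columns are predictions. \n",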
|
||
"\n",
|
||
"#### TP, FP, TN, FN\n",
|
||
"Let's start small: to understand a $2 \\times 2$ CM and its 4 quadrants, you need to first understand the following concepts:\n",
|
||
"\n",
|
||
"- __True Positive__: when the prediction is positive and label is positive\n",
|
||
"- __True Negative__: when the prediction is negative and label is negative\n",
|
||
"- __False Positive__: when the prediction is positive and label is negative (also known as Type I error)\n",
|
||
"- __False Negative__: when the prediction is negative but label is positive (also known as Type II error)\n",
|
||
"\n",
|
||
"<img src=\"imgs/confusion_matrix.png\" width=1000>\n",
|
||
"\n",
|
||
"---\n",
|
||
"\n",
|
||
"#### Types of Errors\n",
|
||
"__Type 1 Error__: When you support and make the __False Positive__ conclusion. Eg: The ART says you have COVID but you actually don't have it.\n",
|
||
"\n",
|
||
"__Type 2 Error__: When you support and make the __False Negative__ conclusion. Eg: The ART says you don't have COVID but you actually have it. \n",
|
||
"\n",
|
||
"Obviously, we want the model to score high on the True Positives and True Negatives (i.e., the diagonals). Here's a comic to better understand the above terms:\n",
|
||
"\n",
|
||
"<img src=\"imgs/doc_confusion_matrix.png\" width=1000>\n",
|
||
"\n",
|
||
"---\n",
|
||
"\n",
|
||
"This concept of FP, TP, FN, TN and the confusion matrix can be scaled to a classification problem with $k > 2$ classes as well. The diagonals represent the number of samples the model correctly classified where each column (or row) corresponds to class label (or prediction). "
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "1cce640a",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 2.5: What did the model misclassify?\n",
|
||
"\n",
|
||
"Your task is to run the cell below, and check out the CM for the vanilla model and dropout model you have previously trained in Task 2.3. On Coursemology, post a screenshot of the two CMs, identify which class (i.e., which digit) each model misclassified the most (the class with the most **False Positives + False Negatives**), and explain your reasoning on how you came to this conclusion for both models."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 93,
|
||
"id": "dd4b56d6",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T03:37:27.228083Z",
|
||
"start_time": "2024-04-07T03:37:23.893030Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"vanilla acc: 0.9919\n",
|
||
"drop-out (0.5) acc: 0.9924\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"<Axes: title={'center': 'Confusion Matrix for do_model'}>"
|
||
]
|
||
},
|
||
"execution_count": 93,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 1000x700 with 2 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 1000x700 with 2 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"from sklearn.metrics import confusion_matrix\n",
|
||
"\n",
|
||
"with torch.no_grad():\n",
|
||
" vanilla_model.eval()\n",
|
||
" for i, data in enumerate(test_loader):\n",
|
||
" x, y = data\n",
|
||
" x = x.to(device)\n",
|
||
" y = y.to(device)\n",
|
||
" pred_vanilla = vanilla_model(x)\n",
|
||
" acc = get_accuracy(pred_vanilla, y)\n",
|
||
" print(f\"vanilla acc: {acc}\")\n",
|
||
" \n",
|
||
" do_model.eval()\n",
|
||
" for i, data in enumerate(test_loader):\n",
|
||
" x, y = data\n",
|
||
" x = x.to(device)\n",
|
||
" y = y.to(device)\n",
|
||
" pred_do = do_model(x)\n",
|
||
" acc = get_accuracy(pred_do, y)\n",
|
||
" print(f\"drop-out (0.5) acc: {acc}\")\n",
|
||
"\n",
|
||
"cm = confusion_matrix(mnist_test.targets, pred_vanilla.argmax(dim=1).cpu())\n",
|
||
"plt.figure(figsize=(10,7))\n",
|
||
"plt.title('Confusion Matrix for vanilla_model')\n",
|
||
"np.fill_diagonal(cm, 0) # you can zero-out the diagonal to highlight the errors better\n",
|
||
"sns.heatmap(cm, annot=True, cmap='Blues', fmt='g')\n",
|
||
"# print(cm) # if seaborn does not work, you can always print out the array\n",
|
||
" \n",
|
||
"cm = confusion_matrix(mnist_test.targets, pred_do.argmax(dim=1).cpu())\n",
|
||
"plt.figure(figsize=(10,7))\n",
|
||
"plt.title('Confusion Matrix for do_model')\n",
|
||
"np.fill_diagonal(cm, 0) # you can zero-out the diagonal to highlight the errors better\n",
|
||
"sns.heatmap(cm, annot=True, cmap='Blues', fmt='g')\n",
|
||
"# print(cm) # if seaborn does not work, you can always print out the array"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "3e0bf87e",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Concept 4: Classification Metrics (Precision, Recall, F1-score, ROC Curve, and AUC-ROC)\n",
|
||
"\n",
|
||
"### Introduction\n",
|
||
"\n",
|
||
"In classification tasks, we often want to evaluate the performance of our predictive model beyond just the accuracy. In some cases, we would like to minimize the number of false positives, while in other cases, we would like to minimize the number of false negatives. \n",
|
||
"\n",
|
||
"For instance, in a cancer diagnosis task, we would like to minimize the number of false negatives (i.e., patients who have cancer but are diagnosed as healthy), but don't care as much for the number of false positives (i.e., patients who don't have cancer but are diagnosed as ill) because delaying cancer treatment is dangerous while suggesting a person to go through more advanced checkup poses no additional harm. On the other hand, in a spam detection task, we would like to minimize the number of false positives (i.e., emails that are not spam but are classified as spam), because users may miss out on an important email. \n",
|
||
"\n",
|
||
"In this section, we will delve deeper into the metrics that help us to understand how well our model is doing in different aspects: Precision, Recall, F1-score, ROC Curve, and AUC-ROC.\n",
|
||
"\n",
|
||
"### Definitions\n",
|
||
"\n",
|
||
"#### 1. __Precision__\n",
|
||
"\n",
|
||
"Precision helps us to understand the correctness of our model in predicting the positive class.\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\text{Precision} = \\frac{\\text{True Positives}}{\\text{True Positives} + \\text{False Positives}}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"#### 2. __Recall__\n",
|
||
"\n",
|
||
"Recall, also known as sensitivity or true positive rate, indicates how well the model identifies positive instances.\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\text{Recall} = \\frac{\\text{True Positives}}{\\text{True Positives} + \\text{False Negatives}}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"#### 3. __F1-score__\n",
|
||
"\n",
|
||
"The F1-score is the harmonic mean of precision and recall and provides a balance between the two metrics.\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\text{F1-Score} = 2 \\cdot \\frac{\\text{Precision} \\cdot \\text{Recall}}{\\text{Precision} + \\text{Recall}}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"### Graphical Metrics\n",
|
||
"\n",
|
||
"#### 4. __ROC Curve__\n",
|
||
"\n",
|
||
"The ROC (Receiver Operating Characteristic) curve is a graphical representation of the true positive rate against the false positive rate, helping to visualize the performance of the binary classification model.\n",
|
||
"\n",
|
||
"#### 5. __AUC-ROC__\n",
|
||
"\n",
|
||
"AUC (Area Under the ROC Curve) represents the model's ability to discriminate between positive and negative classes; a higher AUC value indicates a better model performance.\n",
|
||
"\n",
|
||
"\n",
|
||
"<img src=\"imgs/Roc_curve.png\" width=500>\n",
|
||
"\n",
|
||
"### Further Reading\n",
|
||
"You can read more about the metrics and how to use them in the following links:\n",
|
||
"1. [Scikit-learn Classification Metrics](https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics)\n",
|
||
"2. [Understanding Confusion Matrix](https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "e3dfa092",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Evaluate your model\n",
|
||
"\n",
|
||
"Run the cell below to evaluate the model you built previously with the metrics."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 95,
|
||
"id": "d3169d65",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T03:38:00.781825Z",
|
||
"start_time": "2024-04-07T03:38:00.765999Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"0.9923320262116722\n",
|
||
"0.9922976098863504\n",
|
||
"0.9923886815131227\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"from sklearn.metrics import f1_score, precision_score, recall_score\n",
|
||
"\n",
|
||
"print(f1_score(pred_do.argmax(dim=1).cpu(), mnist_test.targets, average='macro'))\n",
|
||
"print(precision_score(pred_do.argmax(dim=1).cpu(), mnist_test.targets, average='macro'))\n",
|
||
"print(recall_score(pred_do.argmax(dim=1).cpu(), mnist_test.targets, average='macro'))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c800da3f",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Chapter 3: Training on CIFAR-10\n",
|
||
"\n",
|
||
"## Concept 5: CIFAR-10\n",
|
||
"Using what you've learned with MNIST, apply the techniques to CIFAR-10, a dataset of 60K training and 10K testing images comprising of real-life objects corresponding to the following 10 classes:\n",
|
||
"\n",
|
||
"- airplane\t\t\t\t\t\t\t\t\t\t\n",
|
||
"- automobile\t\t\t\t\t\t\t\t\t\t\n",
|
||
"- bird\t\t\t\t\t\t\t\t\t\t\n",
|
||
"- cat\t\t\t\t\t\t\t\t\t\t\n",
|
||
"- deer\t\t\t\t\t\t\t\t\t\t\n",
|
||
"- dog\t\t\t\t\t\t\t\t\t\t\n",
|
||
"- frog\t\t\t\t\t\t\t\t\t\t\n",
|
||
"- horse\t\t\t\t\t\t\t\t\t\t\n",
|
||
"- ship\t\t\t\t\t\t\t\t\t\t\n",
|
||
"- truck\n",
|
||
"\n",
|
||
"Each image is $3 \\times 32 \\times 32$, meaning we operate on 3 color channels RGB. Some of these images look like so:\n",
|
||
"\n",
|
||
"<img src=\"imgs/cifar.jpg\" width=600>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "7e003539",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Concept 6: Data Augmentation\n",
|
||
"\n",
|
||
"In reality, however, finding a well-representative, balanced dataset is difficult. To address this issue, we use __Data Augmentation__. It refers to the process of transforming data in a training dataset in one or more ways to create more samples to expand the training dataset. \n",
|
||
"\n",
|
||
"Here, we will pick images from the original dataset `x_train`, perform some transformations $F$ on them, and append them to `x_train`. So, for example, if I have a training dataset of 200 car images, I can perform augmentations on the 200 images to get 300 more images, thereby making my new training dataset 500 images large.\n",
|
||
"\n",
|
||
"Of course, the impact of data augmentation on model training depends on the types of augmentation used. Here are some common ones Computer Vision practitioners use:\n",
|
||
"\n",
|
||
"- Normalisation\n",
|
||
"- Horizontal and Vertical Flipping\n",
|
||
"- Rotation\n",
|
||
"- Blurring\n",
|
||
"- Adding noise\n",
|
||
"- Skewing\n",
|
||
"- Cropping (zooming in or out)\n",
|
||
"- Brightness and Contrast\n",
|
||
"- Shuffling pixels\n",
|
||
"\n",
|
||
"This results in a wide variety of new samples being created that can be used for training."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "239c0069",
|
||
"metadata": {},
|
||
"source": [
|
||
"## The `transforms` module\n",
|
||
"\n",
|
||
"Here, we are going to use the `transforms` module from PyTorch to transform the images in our dataset. It contains all kinds of image transformations from `rotate` to `resize`. Check out the full list of augmentations on the PyTorch documentation: https://pytorch.org/vision/stable/transforms.html.\n",
|
||
"\n",
|
||
"Explore the following example to see how the transformations work."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"id": "9fc794d7",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:10:43.402021Z",
|
||
"start_time": "2024-04-07T04:10:42.575847Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Files already downloaded and verified\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 1000x700 with 2 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"cifar_train = datasets.CIFAR10(\"./\", train=True, download=True, transform=transforms.ToTensor())\n",
|
||
"cifar_train_loader = torch.utils.data.DataLoader(cifar_train, batch_size=128, shuffle=True)\n",
|
||
"\n",
|
||
"train_features, train_labels = next(iter(cifar_train_loader))\n",
|
||
"img = train_features[0]\n",
|
||
"\n",
|
||
"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,7))\n",
|
||
"transform = transforms.Compose([transforms.RandomHorizontalFlip(),\n",
|
||
" transforms.RandomVerticalFlip(),\n",
|
||
" transforms.ColorJitter(brightness=0.5),\n",
|
||
" # transforms.RandomResizedCrop(32),\n",
|
||
" # YOUR CODE HERE\n",
|
||
" ]) # add in your own transformations to test\n",
|
||
"tensor_img = transform(img)\n",
|
||
"ax1.imshow(img.permute(1,2,0))\n",
|
||
"ax1.axis(\"off\")\n",
|
||
"ax1.set_title(\"Before Transformation\")\n",
|
||
"ax2.imshow(tensor_img.permute(1, 2, 0))\n",
|
||
"ax2.axis(\"off\")\n",
|
||
"ax2.set_title(\"After Transformation\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"id": "bd138177d2e4877",
|
||
"metadata": {
|
||
"collapsed": false
|
||
},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "fc1e7ce8",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 3.1: Picking Data Augmentations\n",
|
||
"\n",
|
||
"Your task is to pick your favourite data augmentations and apply them to the images from the dataset (in the later cell). \n",
|
||
"\n",
|
||
"We've already started you off with the necessary one `ToTensor()` that converts the original JPEG-format image to the PyTorch `Tensor` format. Refer to the command glossary to add your custom data augmentations from the list we've provided. \n",
|
||
"\n",
|
||
"**Choose at least 2 additional augmentations.** Tell us which augmentations you chose to use _and_ why. Then tell us which augmentations you avoided _and_ why. \n",
|
||
"\n",
|
||
"__Note:__ Feel free to use any augmentations you wish from the full list of augmentations shown on the [PyTorch documentation](https://pytorch.org/vision/stable/transforms.html)! There's no need to be restricted to the list that we've provided.\n",
|
||
"\n",
|
||
"The point is to improve your model performance as much as possible! Use trial and error to get the best performing network in Task 3.2!\n",
|
||
"\n",
|
||
"Be creative :D\n",
|
||
"\n",
|
||
"__Note:__ Do ensure your augmentations retain the 3-dimensional shape of the CIFAR-10 images. The final images should still have the shape `(3, 32, 32)`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 118,
|
||
"id": "1d132349",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:52:35.817710Z",
|
||
"start_time": "2024-04-07T04:52:35.813776Z"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# pick your data augmentations here\n",
|
||
"def get_augmentations():\n",
|
||
" T = transforms.Compose([\n",
|
||
" transforms.ToTensor(),\n",
|
||
" transforms.RandomHorizontalFlip(),\n",
|
||
" transforms.RandomVerticalFlip(),\n",
|
||
" transforms.ColorJitter(brightness=0.5),\n",
|
||
" ])\n",
|
||
" return T"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "b91f7700",
|
||
"metadata": {},
|
||
"source": [
|
||
"Create your data loaders that return batches of data:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 119,
|
||
"id": "8f15171f",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:52:39.178114Z",
|
||
"start_time": "2024-04-07T04:52:37.715878Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Files already downloaded and verified\n",
|
||
"Files already downloaded and verified\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# do not remove this cell\n",
|
||
"# run this before moving on\n",
|
||
"\n",
|
||
"T = get_augmentations()\n",
|
||
"\n",
|
||
"cifar_train = datasets.CIFAR10(\"./\", train=True, download=True, transform=T)\n",
|
||
"cifar_test = datasets.CIFAR10(\"./\", train=False, download=True, transform=T)\n",
|
||
"\n",
|
||
"\"\"\"\n",
|
||
"if you feel your computer can't handle too much data, you can reduce the batch\n",
|
||
"size to 64 or 32 accordingly, but it will make training slower. \n",
|
||
"\n",
|
||
"We recommend sticking to 128 but dochoose an appropriate batch size that your\n",
|
||
"computer can manage. The training phase tends to require quite a bit of memory.\n",
|
||
"\n",
|
||
"CIFAR-10 images have dimensions 3x32x32, while MNIST is 1x28x28\n",
|
||
"\"\"\"\n",
|
||
"cifar_train_loader = torch.utils.data.DataLoader(cifar_train, batch_size=128, shuffle=True)\n",
|
||
"cifar_test_loader = torch.utils.data.DataLoader(cifar_test, batch_size=10000)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2f0c6794",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Concept 7: Sequential Model Building with PyTorch\n",
|
||
"\n",
|
||
"All this while, you've been adding layers one by one as attributes inside the `__init__` method. This is so that you can quickly debug which layer(s) is causing issues later down the road. However, for the most part, there should be no major issues when creating parts of your network or your entire network. \n",
|
||
"\n",
|
||
"This is why PyTorch lets you combine layers together using the `nn.Sequential` API. It allows you to stack layers inside and chain layers together. It allows you to build isolated modules that can exist on their own (either within a `nn.Module` class or otherwise) and be used as independent \"mini models\" on data tensors. Refer to https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html for more information about combining PyTorch modules to create your own.\n",
|
||
"\n",
|
||
"__Note:__ You should not add an array of layers inside `nn.Sequential` i.e., it's `nn.Sequential(xyz, abc, mno)`, **not** `nn.Sequential([xyz, abc, mno])`.\n",
|
||
"\n",
|
||
"#### DEMO 1: 3-layer Multilayer Perceptron for MNIST"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"id": "fd7ede5a",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:12:19.583175Z",
|
||
"start_time": "2024-04-07T04:12:19.572009Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"torch.Size([15, 10])\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"densenet = nn.Sequential(\n",
|
||
" nn.Linear(784, 512),\n",
|
||
" nn.ReLU(),\n",
|
||
" nn.Linear(512, 128),\n",
|
||
" nn.ReLU(),\n",
|
||
" nn.Linear(128, 10),\n",
|
||
" nn.Softmax(1) # softmax dimension\n",
|
||
" )\n",
|
||
"\n",
|
||
"x = torch.rand(15, 784) # a batch of 15 MNIST images\n",
|
||
"y = densenet(x) # here we simply run the sequential densenet on the `x` tensor\n",
|
||
"print(y.shape) # a batch of 15 predictions"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "035610c7",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### DEMO 2: 2-layer ConvNet for MNIST"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 41,
|
||
"id": "a68d4ba9",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:12:21.336248Z",
|
||
"start_time": "2024-04-07T04:12:21.181299Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"torch.Size([15, 10])\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"convnet = nn.Sequential(\n",
|
||
" nn.Conv2d(1, 32, (3,3)),\n",
|
||
" nn.ReLU(),\n",
|
||
" nn.Conv2d(32, 64, (3,3)),\n",
|
||
" nn.ReLU(),\n",
|
||
" nn.Flatten(),\n",
|
||
" nn.Linear(36864, 1024),\n",
|
||
" nn.ReLU(),\n",
|
||
" nn.Linear(1024, 512),\n",
|
||
" nn.ReLU(),\n",
|
||
" nn.Linear(512, 128),\n",
|
||
" nn.ReLU(),\n",
|
||
" nn.Linear(128, 10),\n",
|
||
" nn.Softmax(1) # softmax dimension\n",
|
||
" )\n",
|
||
"\n",
|
||
"x = torch.rand(15, 1, 28, 28) # a batch of 15 MNIST images\n",
|
||
"y = convnet(x) # here we simply run the sequential convnet on the `x` tensor\n",
|
||
"print (y.shape) # a batch of 15 predictions"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "3a60c57c",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 3.2: Build a ConvNet for CIFAR-10\n",
|
||
"\n",
|
||
"Your task is to build a decently-sized ConvNet (i.e., $\\geq 4$ layers). Design your ConvNet with the following architecture\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\text{Conv(32, (3,3))} \\rightarrow \\text{MP((2,2))} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\text{Conv(64, (3,3))} \\rightarrow \\text{MP((2,2))} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\text{GAP} \\\\ \\rightarrow \\text{L(64, 256)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\text{L(256, 128)} \\rightarrow \\text{LReLU(0.1)} \\rightarrow \\text{L(128, 10)}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"where \n",
|
||
"- [`Conv`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) is a Convolution layer with the specified output channels and kernel size\n",
|
||
"\n",
|
||
"- [`MP`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html) is the Max Pooling layer with the specified kernel size\n",
|
||
"\n",
|
||
"- [`LReLU`](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html) is Leaky ReLU with the specified negative slope\n",
|
||
"\n",
|
||
"- `GAP` is the Global Average Pooling operation (already implemented for you)\n",
|
||
"\n",
|
||
"- [`L`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) is a fully-connected layer with the specified input and output features\n",
|
||
"\n",
|
||
"You are highly encouraged to initialise all your layers in the `__init__` method.\n",
|
||
"\n",
|
||
"---\n",
|
||
"\n",
|
||
"You must use the [`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) API to build two parts:\n",
|
||
"1. The `self.conv` attribute must contain all the Convolutional, Pooling, and Activation layers\n",
|
||
"2. The `self.fc` attribute must contain all the fully-connected layers after the flattening\n",
|
||
"\n",
|
||
"The `self.conv` and `self.fc` attributes are already given to you. All you need to do is chain the arbitrary `nn.XYZ` layers together based on the architecture stated above.\n",
|
||
"\n",
|
||
"__Note:__ The flattening is already done for you via Global Average Pooling (GAP) in the `forward` method. Do not add the Softmax activation in the `self.fc` Sequential module.\n",
|
||
"\n",
|
||
"__Reminder:__ Do not hardcode for the number of classes. Use the `classes` argument instead.\n",
|
||
"\n",
|
||
"__Note:__ There is no need to include a Softmax layer in your neural network, as technically, [CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) which we are going to use as our loss function later, already applies Softmax implicitly."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 120,
|
||
"id": "3edf5056",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:52:44.926878Z",
|
||
"start_time": "2024-04-07T04:52:44.922955Z"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"class CIFARCNN(nn.Module):\n",
|
||
" def __init__(self, classes):\n",
|
||
" super().__init__()\n",
|
||
" \"\"\"\n",
|
||
" classes: integer that corresponds to the number of classes for CIFAR-10\n",
|
||
" \"\"\"\n",
|
||
" self.conv = nn.Sequential(\n",
|
||
" nn.Conv2d(3, 32, (3, 3)),\n",
|
||
" nn.MaxPool2d((2, 2)),\n",
|
||
" nn.LeakyReLU(0.1),\n",
|
||
" nn.Conv2d(32, 64, (3, 3)),\n",
|
||
" nn.MaxPool2d((2, 2)),\n",
|
||
" nn.LeakyReLU(0.1),\n",
|
||
" )\n",
|
||
"\n",
|
||
" self.fc = nn.Sequential(\n",
|
||
" nn.Linear(64, 256),\n",
|
||
" nn.LeakyReLU(0.1),\n",
|
||
" nn.Linear(256, 128),\n",
|
||
" nn.LeakyReLU(0.1),\n",
|
||
" nn.Linear(128, classes)\n",
|
||
" )\n",
|
||
" \n",
|
||
" def forward(self, x):\n",
|
||
" # YOUR CODE HERE\n",
|
||
" x = self.conv(x)\n",
|
||
" x = x.view(x.shape[0], 64, 6*6).mean(2) # GAP – do not remove this line\n",
|
||
" x = self.fc(x)\n",
|
||
" return x"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "22de4211",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Train your ConvNet on CIFAR-10"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 121,
|
||
"id": "c9ea9d9c",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:54:08.922882Z",
|
||
"start_time": "2024-04-07T04:52:50.295512Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Epoch: 0, Loss: 1.9733156856063687\n",
|
||
"Epoch: 1, Loss: 1.7190505208261788\n",
|
||
"Epoch: 2, Loss: 1.6375392404053828\n",
|
||
"Epoch: 3, Loss: 1.5758821574013557\n",
|
||
"Epoch: 4, Loss: 1.518964537269319\n",
|
||
"Epoch: 5, Loss: 1.4709320745199843\n",
|
||
"Epoch: 6, Loss: 1.4300793857525682\n",
|
||
"Epoch: 7, Loss: 1.3909132538549125\n",
|
||
"Epoch: 8, Loss: 1.3572429686860965\n",
|
||
"Epoch: 9, Loss: 1.3212151067031315\n",
|
||
"CPU times: user 1min 4s, sys: 15.5 s, total: 1min 19s\n",
|
||
"Wall time: 1min 18s\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"%%time\n",
|
||
"# do not remove – nothing to code here\n",
|
||
"# run this cell before moving on\n",
|
||
"\n",
|
||
"cifar10_model, losses = train_model(cifar_train_loader, CIFARCNN(10), device=\"cpu\")\n",
|
||
"cifar10_model_gpu, losses = train_model(cifar_train_loader, CIFARCNN(10), device=\"mps\")\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f1376a81",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Test the CIFAR-10 ConvNet model using the testing data loader"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 131,
|
||
"id": "20bdce79",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:56:22.594761Z",
|
||
"start_time": "2024-04-07T04:56:20.868553Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"cifar accuracy: 0.5123\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# do not remove – nothing to code here\n",
|
||
"# run this cell before moving on\n",
|
||
"\n",
|
||
"with torch.no_grad():\n",
|
||
" cifar10_model.eval()\n",
|
||
" for i, data in enumerate(cifar_test_loader):\n",
|
||
" x, y = data\n",
|
||
" # x = x.to(\"mps\")\n",
|
||
" # y = y.to(\"mps\")\n",
|
||
" pred = cifar10_model(x)\n",
|
||
" acc = get_accuracy(pred, y)\n",
|
||
" print(f\"cifar accuracy: {acc}\")\n",
|
||
" \n",
|
||
"# don't worry if the CIFAR-10 accuracy is low, it's a tough dataset to crack.\n",
|
||
"# as long as you get something shy of 50%, you should be alright!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "89a05019",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Concept 8: Class Activation Map (CAM) Analysis\n",
|
||
"\n",
|
||
"A __Class Activation Map__ (CAM) is an analysis technique that lets one see through the eyes of the model. It ultimately answers the following question: \"why is the model predicting this label for a given image\". \n",
|
||
"\n",
|
||
"CAM creates a superimposable heatmap that's placed on top of the test image. This heatmap is coloured more strongly for areas the model is focusing on more than others and coloured less strongly for areas the model chooses to ignore.\n",
|
||
"\n",
|
||
"For example, when a picture of a dog (left) is passed through a trained ConvNet, CAM generates a heatmap (center) that's embossed on top of the image (right). The darker (redder) the region of the heatmap, the more the model focuses on that part of the image. \n",
|
||
"\n",
|
||
"<img src=\"imgs/cam.png\" width=1000>\n",
|
||
"\n",
|
||
"You can think of the model looking out for the \"most interesting\" parts of an image when classifying it. In fact, __Saliency__ is the measure of \"interestingness\" in an image. ConvNets look out for the *most salient* features/regions in an image and give out their predictions based on that. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "e321bf5d",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 4.1: Building CAM for CIFAR-10\n",
|
||
"\n",
|
||
"Earlier we asked you to divide your CIFAR-10 ConvNet into two parts `self.conv` and `self.fc`. You'll now understand why we did so. \n",
|
||
"\n",
|
||
"Your task is to write the `get_CAM` method that takes in,\n",
|
||
"\n",
|
||
"- `feature_map`: the output of the final activation layer of the CNN. You can get this by running the image through the `self.conv` module i.e., `output = self.conv(x)`\n",
|
||
"\n",
|
||
"- `weight`: the weights of the immediate first Linear layer after the flattening operation\n",
|
||
"\n",
|
||
"- `class_idx`: the label index that your ConvNet outputs/predicts\n",
|
||
"\n",
|
||
"### The CAM Algorithm\n",
|
||
"\n",
|
||
"The first few steps of CAM are already written for you. We'll let you handle the minor implementation details of the rest of the algorithm.\n",
|
||
"\n",
|
||
"1. remove the first dimension of `cam` using `torch.squeeze(...)`\n",
|
||
"2. reshape `cam` to $h \\times w$\n",
|
||
"3. get the difference of `cam` and the minimum elements of `cam`\n",
|
||
"4. divide `cam` by the maximum elements of `cam`\n",
|
||
"5. clip the values of `cam` so they are within the $[0, 1]$ range\n",
|
||
"\n",
|
||
"Refer to the command glossary to find the respective methods."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 133,
|
||
"id": "4e0cde54",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:56:28.220258Z",
|
||
"start_time": "2024-04-07T04:56:28.215689Z"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def get_CAM(feature_map, weight, class_idx):\n",
|
||
" \"\"\"\n",
|
||
" PARAMS\n",
|
||
" feature_map: the output of the final pre-GAP layer in the ConvNet\n",
|
||
" weight: the parameters of the first linear layer post-GAP\n",
|
||
" class_idx: the final prediction label of the ConvNet\n",
|
||
" \n",
|
||
" RETURNS\n",
|
||
" a CAM heatmap of the areas the ConvNet is focusing on more\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" # do not remove these lines\n",
|
||
" size_upsample = (32, 32)\n",
|
||
" bz, nc, h, w = feature_map.shape\n",
|
||
"\n",
|
||
" before_dot = feature_map.reshape((nc, h*w))\n",
|
||
" cam = weight[class_idx].unsqueeze(0) @ before_dot\n",
|
||
" \n",
|
||
" \"\"\"\n",
|
||
" YOUR CODE HERE - perform the steps listed above\n",
|
||
" \"\"\"\n",
|
||
"\n",
|
||
" cam = torch.squeeze(cam)# YOUR CODE HERE ## remove the first dimension of cam using torch.squeeze(...)\n",
|
||
" cam = torch.reshape(cam, (h, w)) # YOUR CODE HERE ## reshape cam to h x w\n",
|
||
" cam = cam - torch.min(cam)# YOUR CODE HERE ## get the difference of cam and the minimum elements of cam\n",
|
||
" cam = cam / torch.max(cam)# YOUR CODE HERE ## divide cam by the maximum elements of cam\n",
|
||
" cam = torch.clip(cam, 0, 1) # YOUR CODE HERE ## clip the values of cam so they are within the [0, 1] range\n",
|
||
" \n",
|
||
" # here, `cam` is the final processed heatmap\n",
|
||
" # we upsample/resize the heatmap to the original image's dimensions\n",
|
||
" # do not remove these lines\n",
|
||
" img = transforms.Resize(size_upsample)(cam.unsqueeze(0))\n",
|
||
" \n",
|
||
" return img.detach().numpy(), cam"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "88a9cf44",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Task 4.2: Visualising CAM Heatmaps\n",
|
||
"\n",
|
||
"Once you run the `plot_cam` method in the cell below, you'll be presented two images: the original test image `x`, the raw heatmap and the image with the heatmap on it. \n",
|
||
"\n",
|
||
"Take a screenshot of all three plots and post it on Coursemology. Then, explain what you think the ConvNet was looking at that convinced the model to predict that class. Talk about this in terms of the \"saliency\" of the image. \n",
|
||
"\n",
|
||
"__Note:__ This is an open-ended question that tests your understanding of saliency and what features ConvNets rely on when predicting a class label. Also, the resolution of images from CIFAR-10 isn't fantastic, try your best to identify discerning features of the image!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 134,
|
||
"id": "8f397744",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:56:31.521488Z",
|
||
"start_time": "2024-04-07T04:56:31.517416Z"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# do not remove this cell\n",
|
||
"# run this cell before moving on\n",
|
||
"\n",
|
||
"cifar10_classes = [\n",
|
||
" \"airplane\",\n",
|
||
" \"automobile\",\n",
|
||
" \"bird\",\n",
|
||
" \"cat\",\n",
|
||
" \"deer\",\n",
|
||
" \"dog\",\n",
|
||
" \"frog\",\n",
|
||
" \"horse\",\n",
|
||
" \"ship\",\n",
|
||
" \"truck\",\n",
|
||
"]\n",
|
||
"\n",
|
||
"def plot_cam(img, cam):\n",
|
||
" ''' Visualization function '''\n",
|
||
" img = img.permute(1, 2, 0)\n",
|
||
" fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(10,7))\n",
|
||
" ax1.imshow(img)\n",
|
||
" ax1.set_title(f\"Input image\\nLabel: {cifar10_classes[y]}\")\n",
|
||
"\n",
|
||
" ax2.imshow(cam.reshape(32, 32), cmap=\"jet\")\n",
|
||
" ax2.set_title(\"Raw CAM.\")\n",
|
||
"\n",
|
||
" ax3.imshow(img)\n",
|
||
" ax3.imshow(cam.reshape(32, 32), cmap=\"jet\", alpha=0.2)\n",
|
||
" ax3.set_title(f\"Overlayed CAM.\\nPrediction: {cifar10_classes[idx[0]]}\")\n",
|
||
" plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 139,
|
||
"id": "463cced4",
|
||
"metadata": {
|
||
"ExecuteTime": {
|
||
"end_time": "2024-04-07T04:57:02.907611Z",
|
||
"start_time": "2024-04-07T04:57:02.758721Z"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"true class: horse\n",
|
||
"predicated class: horse\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 1000x700 with 3 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"# do not remove this cell\n",
|
||
"# run this cell before moving on\n",
|
||
"\n",
|
||
"rand_idx = torch.randint(0, 10000, size=[1]) # pick a random index from the test set\n",
|
||
"\n",
|
||
"x = cifar_test[rand_idx][0] # test image\n",
|
||
"y = cifar_test[rand_idx][1] # associated test label\n",
|
||
"\n",
|
||
"cifar10_model.eval()\n",
|
||
"scores = cifar10_model(x.unsqueeze(0)) # get the raw scores\n",
|
||
"probs = scores.data.squeeze()\n",
|
||
"probs, idx = probs.sort(0, True)\n",
|
||
"\n",
|
||
"print('true class: ', cifar10_classes[y])\n",
|
||
"print('predicated class: ', cifar10_classes[idx[0]])\n",
|
||
"\n",
|
||
"assert y == idx[0], \"We want to visualize what the model is focusing on for a correct prediction, run again for another random sample!\"\n",
|
||
"\n",
|
||
"# if the printed prediction and label are different, it means the model misclassified it. \n",
|
||
"# Rerun this cell until you get the same class printed for both. It will help for the visualisation later.\n",
|
||
"\n",
|
||
"# Get the first Linear layer's weights and final Feature Map\n",
|
||
"params = list(cifar10_model.fc.parameters()) # access the model layers\n",
|
||
"weight = params[0].data # grab the first layer's weights\n",
|
||
"\n",
|
||
"feature_maps = cifar10_model.conv(x.unsqueeze(0))\n",
|
||
"\n",
|
||
"# Creating the heatmap\n",
|
||
"heatmap, _ = get_CAM(feature_maps, weight, idx[0])\n",
|
||
" \n",
|
||
"plot_cam(x, heatmap)\n",
|
||
"# Red \"hot\" areas represent where the model is focusing on more\n",
|
||
"# if the shading isn't that great, rerun the cell to get another random sample"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a4c0a351",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Submission\n",
|
||
"\n",
|
||
"Once you are done, please submit your work to Coursemology, by copying the right snippets of code into the corresponding box that says __Your answer__ and click __Save__. After you save, you can make changes to your\n",
|
||
"submission.\n",
|
||
"\n",
|
||
"Once you are satisfied with what you have uploaded, click __Finalize submission__. \n",
|
||
"\n",
|
||
"\n",
|
||
"__Note:__ Once your submission is finalized, it is considered to be submitted for grading and cannot be changed. If you need to undo this action, you will have to email your assigned tutor for help. Please do not finalize your submission until you are sure that you want to submit your solutions for grading. "
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.11.8"
|
||
},
|
||
"vscode": {
|
||
"interpreter": {
|
||
"hash": "5c7b89af1651d0b8571dde13640ecdccf7d5a6204171d6ab33e7c296e100e08a"
|
||
}
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|