nus/cs2109s/labs/final-mock/scratchpad.ipynb
2024-04-28 15:58:30 +08:00

{
"cells": [
{
"cell_type": "markdown",
"id": "7d017333",
"metadata": {},
"source": [
"# Final Assessment Scratch Pad"
]
},
{
"cell_type": "markdown",
"id": "d3d00386",
"metadata": {},
"source": [
"## Instructions"
]
},
{
"cell_type": "markdown",
"id": "ea516aa7",
"metadata": {},
"source": [
"1. Please use only this Jupyter notebook to work on your model, and **do not use any extra files**. If you need to define helper classes or functions, feel free to do so in this notebook.\n",
"2. This template is intended to be general, but it may not cover every use case. The sections are given so that it will be easier for us to grade your submission. If your specific use case isn't addressed, **you may add new Markdown or code blocks to this notebook**. However, please **don't delete any existing blocks**.\n",
"3. If you don't think a particular section of this template is necessary for your work, **you may skip it**. Be sure to explain clearly why you decided to do so."
]
},
{
"cell_type": "markdown",
"id": "022cb4cd",
"metadata": {},
"source": [
"## Report"
]
},
{
"cell_type": "markdown",
"id": "9c14a2d8",
"metadata": {},
"source": [
"**[TODO]**\n",
"\n",
"Please provide a summary of the ideas and steps that led you to your final model. Someone reading this summary should understand why you chose to approach the problem in a particular way and be able to replicate your final model at a high level. Please ensure that your summary is detailed enough to provide an overview of your thought process and approach, but also concise enough to be easily understandable. Also, please follow the guidelines given in `main.ipynb`.\n",
"\n",
"This report should not be longer than **1-2 pages of A4 paper (up to around 1,000 words)**. Marks will be deducted if you do not follow these instructions or if you include too many words here.\n",
"\n",
"**[DELETE EVERYTHING FROM THE PREVIOUS TODO TO HERE BEFORE SUBMISSION]**\n",
"\n",
"##### Overview\n",
"**[TODO]**\n",
"\n",
"##### 1. Descriptive Analysis\n",
"**[TODO]**\n",
"\n",
"##### 2. Detection and Handling of Missing Values\n",
"**[TODO]**\n",
"\n",
"##### 3. Detection and Handling of Outliers\n",
"**[TODO]**\n",
"\n",
"##### 4. Detection and Handling of Class Imbalance \n",
"**[TODO]**\n",
"\n",
"##### 5. Understanding Relationship Between Variables\n",
"**[TODO]**\n",
"\n",
"##### 6. Data Visualization\n",
"**[TODO]**\n",
"\n",
"##### 7. General Preprocessing\n",
"**[TODO]**\n",
"\n",
"##### 8. Feature Selection\n",
"**[TODO]**\n",
"\n",
"##### 9. Feature Engineering\n",
"**[TODO]**\n",
"\n",
"##### 10. Creating Models\n",
"**[TODO]**\n",
"\n",
"##### 11. Model Evaluation\n",
"**[TODO]**\n",
"\n",
"##### 12. Hyperparameters Search\n",
"**[TODO]**\n",
"\n",
"##### Conclusion\n",
"**[TODO]**"
]
},
{
"cell_type": "markdown",
"id": "49dcaf29",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "27103374",
"metadata": {},
"source": [
"# Workings (Not Graded)\n",
"\n",
"You will do your working below. Note that anything below this section will not be graded, but we might counter-check what you wrote in the report above with your workings to make sure that you actually did what you claimed to have done. "
]
},
{
"cell_type": "markdown",
"id": "0f4c6cd4",
"metadata": {},
"source": [
"## Import Packages\n",
"\n",
"Here, we import some packages necessary to run this notebook. In addition, you may import other packages as well. Do note that when submitting your model, you may only use packages that are available in Coursemology (see `main.ipynb`)."
]
},
{
"cell_type": "code",
"execution_count": 95,
"id": "cded1ed6",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-27T06:36:12.309324Z",
"start_time": "2024-04-27T06:36:12.305262Z"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import os\n",
"import numpy as np\n",
"from util import show_images, dict_train_test_split\n",
"from sklearn.preprocessing import OrdinalEncoder\n",
"import matplotlib.pyplot as plt\n"
]
},
{
"cell_type": "markdown",
"id": "748c35d7",
"metadata": {},
"source": [
"## Load Dataset\n",
"\n",
"The dataset provided is multimodal and contains two components: images and tabular data. The tabular dataset `tabular.csv` contains $N$ entries and $F$ columns, including the target feature. On the other hand, the image dataset `images.npy` is of size $(N, H, W)$, where $N$, $H$, and $W$ correspond to the number of samples, image height, and image width, respectively. Each image corresponds to the row at the same index of the tabular dataset. These datasets can be found in the `data/` folder in the given file structure.\n",
"\n",
"A code snippet that loads and displays some of the data is provided below.\n",
"\n",
"### Load Tabular Data"
]
},
{
"cell_type": "code",
"execution_count": 96,
"id": "a88be725",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-27T06:36:13.752562Z",
"start_time": "2024-04-27T06:36:12.324501Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(357699, 61)\n",
"Object columns 18\n"
]
}
],
"source": [
"df = pd.read_csv(os.path.join('data', 'tabular.csv'))\n",
"print(df.shape)\n",
"\n",
"import math\n",
"\n",
"# Count the columns stored as strings (dtype 'object'); these need encoding before modelling\n",
"object_columns = df.dtypes[df.dtypes == 'object']\n",
"print('Object columns', object_columns.shape[0])"
]
},
{
"cell_type": "markdown",
"id": "c09da291",
"metadata": {},
"source": [
"### Load Image Data"
]
},
{
"cell_type": "code",
"execution_count": 114,
"id": "6297e25a",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-27T07:48:25.899880Z",
"start_time": "2024-04-27T07:48:25.398594Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape: (357699, 8, 8)\n"
]
},
{
"data": {
"text/plain": "<Figure size 1200x500 with 15 Axes>"
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": "(357699, 64)"
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
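"with open(os.path.join('data', 'images.npy'), 'rb') as f:\n",
"    images = np.load(f)\n",
"\n",
"print('Shape:', images.shape)\n",
"# The 3 x 5 grid has 15 axes, so pass exactly 15 images\n",
"show_images(images[:15], n_row=3, n_col=5, figsize=[12,5])\n",
"# Flatten each 8 x 8 image into a 64-dimensional vector (shape check only)\n",
"images.reshape(images.shape[0], -1).shape"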
]
},
{
"cell_type": "markdown",
"id": "cbe832b6",
"metadata": {},
"source": [
"## Data Exploration & Preparation"
]
},
{
"cell_type": "markdown",
"id": "2f6a464c",
"metadata": {},
"source": [
"### 1. Descriptive Analysis"
]
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Columns with high NaNs: Index(['V15', 'V38', 'V39', '0', '19', '24'], dtype='object')\n",
"Columns with high zeros Index(['55', '61', '62', '63'], dtype='object')\n",
"['42', '60', '35', 'V20', 'V35', '5', '0', '57', '63', '2', '9', '12', '40', 'V41', '53', '33', '20', '29', '50', '58', 'V40', '28', 'V39', '52', '37', '54', 'V52', '61', '56', '39', '21', 'V45', 'V7', '47', '51', '55', 'V26', '16', '41', 'V15', '19', '62', '14', '25', '49', '22', '7', '4', '18', '36', '13', 'V48', 'V49', '23', '1', '48', 'V44', '6', 'V11', 'V53', '32', 'V38', '43', '11', '44', 'V10', '3', 'V2', '15', '38', '30', '45', '26', '24', 'V43', '34', '46', 'V42']\n"
]
}
],
"source": [
"# Separate the target from the features.\n",
"Y = df['target']\n",
"X = df.drop('target', axis=1)\n",
"\n",
"def nan_columns(X, threshold=0.5):\n",
"    # Columns in which at least `threshold` of the rows are NaN.\n",
"    count = X.shape[0] * threshold\n",
"    nan_counts = X.isna().sum()\n",
"    return nan_counts[nan_counts >= count].index\n",
"\n",
"def zero_columns(X, threshold=0.5):\n",
"    # Columns in which at least `threshold` of the rows are exactly zero.\n",
"    count = X.shape[0] * threshold\n",
"    zero_counts = (X == 0).sum()\n",
"    return zero_counts[zero_counts >= count].index\n",
"\n",
"def object_columns(X):\n",
"    # Columns stored with dtype `object` (treated here as categorical).\n",
"    return X.dtypes[X.dtypes == 'object'].index\n",
"\n",
"def convert_to_ordinal(X, columns):\n",
"    # Encode each categorical column as ordinal codes.\n",
"    encoder = OrdinalEncoder()\n",
"    return encoder.fit_transform(X[columns])\n",
"\n",
"def correlated_columns(X, threshold=0.99):\n",
"    # For each pair with correlation above `threshold`, flag the later column.\n",
"    # Only positive correlations are checked; strongly negative pairs are kept.\n",
"    corr = X.corr()\n",
"    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))\n",
"    return [column for column in upper.columns if any(upper[column] > threshold)]\n",
"\n",
"# Results are stored under new names so the helper functions are not shadowed.\n",
"nan_cols = nan_columns(X, 0.5)\n",
"print('Columns with high NaNs:', nan_cols)\n",
"zero_cols = zero_columns(X, 0.9)\n",
"print('Columns with high zeros', zero_cols)\n",
"\n",
"# Ordinal-encode the categorical columns in place before computing correlations.\n",
"object_cols = object_columns(X)\n",
"X[object_cols] = convert_to_ordinal(X, object_cols)\n",
"\n",
"correlated_cols = correlated_columns(X, 0.95)\n",
"\n",
"columns_to_drop = list(set(nan_cols) | set(zero_cols) | set(correlated_cols))\n",
"print(columns_to_drop)\n"
],
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-27T07:50:40.607409Z",
"start_time": "2024-04-27T07:50:30.338617Z"
}
},
"id": "3b1f62dd",
"execution_count": 121
},
{
"cell_type": "code",
"outputs": [
{
"data": {
"text/plain": " V0 V1 V3 V4 V5 V6 V8 V9 \\\n0 8315.0 1784.0 37115.0 317.0 105.016815 296559.0 2470.0 1.0 \n1 8315.0 1272.0 18683.0 230.0 NaN 340059.0 2820.0 0.0 \n2 8315.0 3832.0 147707.0 607.0 105.018240 279159.0 2330.0 1.0 \n3 8315.0 2296.0 55547.0 404.0 NaN 313959.0 2610.0 1.0 \n4 11021.0 1784.0 37115.0 375.0 105.024985 232701.0 1490.0 0.0 \n... ... ... ... ... ... ... ... ... \n357694 8315.0 1272.0 18683.0 230.0 105.012445 270459.0 2260.0 0.0 \n357695 8315.0 2296.0 55547.0 404.0 NaN 244359.0 2050.0 0.0 \n357696 8315.0 1784.0 37115.0 375.0 NaN 348759.0 2890.0 0.0 \n357697 8315.0 1784.0 37115.0 375.0 105.016815 348759.0 2890.0 0.0 \n357698 8315.0 1784.0 37115.0 317.0 NaN 244359.0 2050.0 0.0 \n\n V12 V13 ... V56 V57 V58 V59 8 10 \\\n0 85.0 737.0 ... 1089 293 2.0 7428.249334 0.249110 0.283362 \n1 42.0 585.0 ... 9801 1085 7.0 9693.829502 -1.144696 -1.343454 \n2 335.0 1041.0 ... 1485 304 6.0 7609.258214 0.129641 -0.258910 \n3 113.0 889.0 ... -495 711 4.0 4258.532609 0.726987 0.283362 \n4 186.0 737.0 ... 1683 117 0.0 9492.484802 0.249110 0.283362 \n... ... ... ... ... ... ... ... ... ... \n357694 4.0 585.0 ... 6336 1855 2.0 4634.276235 -4.290717 -2.427998 \n357695 110.0 889.0 ... 2970 854 8.0 8379.073980 0.129641 -0.258910 \n357696 163.0 737.0 ... -4257 942 8.0 5359.986193 0.408403 0.283362 \n357697 147.0 737.0 ... 2376 1195 7.0 9095.239127 0.726987 0.283362 \n357698 46.0 737.0 ... 9108 502 3.0 9379.720939 0.129641 -0.258910 \n\n 17 27 31 59 \n0 -1.523953 -0.689523 -0.637881 1.465378 \n1 -0.425715 -1.246596 -1.090949 -0.852887 \n2 0.306444 1.538767 1.627457 -0.080132 \n3 0.672524 -0.132450 -0.184813 -0.080132 \n4 -0.425715 -0.410987 -0.637881 1.465378 \n... ... ... ... ... \n357694 -4.086510 -1.246596 -1.090949 2.238133 \n357695 0.306444 -0.132450 -0.184813 -0.080132 \n357696 0.672524 -0.410987 -0.637881 -0.080132 \n357697 0.672524 -0.410987 -0.637881 -0.080132 \n357698 0.306444 -0.689523 -0.637881 -0.080132 \n\n[357699 rows x 46 columns]",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>V0</th>\n <th>V1</th>\n <th>V3</th>\n <th>V4</th>\n <th>V5</th>\n <th>V6</th>\n <th>V8</th>\n <th>V9</th>\n <th>V12</th>\n <th>V13</th>\n <th>...</th>\n <th>V56</th>\n <th>V57</th>\n <th>V58</th>\n <th>V59</th>\n <th>8</th>\n <th>10</th>\n <th>17</th>\n <th>27</th>\n <th>31</th>\n <th>59</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>8315.0</td>\n <td>1784.0</td>\n <td>37115.0</td>\n <td>317.0</td>\n <td>105.016815</td>\n <td>296559.0</td>\n <td>2470.0</td>\n <td>1.0</td>\n <td>85.0</td>\n <td>737.0</td>\n <td>...</td>\n <td>1089</td>\n <td>293</td>\n <td>2.0</td>\n <td>7428.249334</td>\n <td>0.249110</td>\n <td>0.283362</td>\n <td>-1.523953</td>\n <td>-0.689523</td>\n <td>-0.637881</td>\n <td>1.465378</td>\n </tr>\n <tr>\n <th>1</th>\n <td>8315.0</td>\n <td>1272.0</td>\n <td>18683.0</td>\n <td>230.0</td>\n <td>NaN</td>\n <td>340059.0</td>\n <td>2820.0</td>\n <td>0.0</td>\n <td>42.0</td>\n <td>585.0</td>\n <td>...</td>\n <td>9801</td>\n <td>1085</td>\n <td>7.0</td>\n <td>9693.829502</td>\n <td>-1.144696</td>\n <td>-1.343454</td>\n <td>-0.425715</td>\n <td>-1.246596</td>\n <td>-1.090949</td>\n <td>-0.852887</td>\n </tr>\n <tr>\n <th>2</th>\n <td>8315.0</td>\n <td>3832.0</td>\n <td>147707.0</td>\n <td>607.0</td>\n <td>105.018240</td>\n <td>279159.0</td>\n <td>2330.0</td>\n <td>1.0</td>\n <td>335.0</td>\n <td>1041.0</td>\n <td>...</td>\n <td>1485</td>\n <td>304</td>\n <td>6.0</td>\n <td>7609.258214</td>\n <td>0.129641</td>\n <td>-0.258910</td>\n <td>0.306444</td>\n <td>1.538767</td>\n <td>1.627457</td>\n <td>-0.080132</td>\n </tr>\n <tr>\n <th>3</th>\n <td>8315.0</td>\n <td>2296.0</td>\n <td>55547.0</td>\n <td>404.0</td>\n 
<td>NaN</td>\n <td>313959.0</td>\n <td>2610.0</td>\n <td>1.0</td>\n <td>113.0</td>\n <td>889.0</td>\n <td>...</td>\n <td>-495</td>\n <td>711</td>\n <td>4.0</td>\n <td>4258.532609</td>\n <td>0.726987</td>\n <td>0.283362</td>\n <td>0.672524</td>\n <td>-0.132450</td>\n <td>-0.184813</td>\n <td>-0.080132</td>\n </tr>\n <tr>\n <th>4</th>\n <td>11021.0</td>\n <td>1784.0</td>\n <td>37115.0</td>\n <td>375.0</td>\n <td>105.024985</td>\n <td>232701.0</td>\n <td>1490.0</td>\n <td>0.0</td>\n <td>186.0</td>\n <td>737.0</td>\n <td>...</td>\n <td>1683</td>\n <td>117</td>\n <td>0.0</td>\n <td>9492.484802</td>\n <td>0.249110</td>\n <td>0.283362</td>\n <td>-0.425715</td>\n <td>-0.410987</td>\n <td>-0.637881</td>\n <td>1.465378</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>357694</th>\n <td>8315.0</td>\n <td>1272.0</td>\n <td>18683.0</td>\n <td>230.0</td>\n <td>105.012445</td>\n <td>270459.0</td>\n <td>2260.0</td>\n <td>0.0</td>\n <td>4.0</td>\n <td>585.0</td>\n <td>...</td>\n <td>6336</td>\n <td>1855</td>\n <td>2.0</td>\n <td>4634.276235</td>\n <td>-4.290717</td>\n <td>-2.427998</td>\n <td>-4.086510</td>\n <td>-1.246596</td>\n <td>-1.090949</td>\n <td>2.238133</td>\n </tr>\n <tr>\n <th>357695</th>\n <td>8315.0</td>\n <td>2296.0</td>\n <td>55547.0</td>\n <td>404.0</td>\n <td>NaN</td>\n <td>244359.0</td>\n <td>2050.0</td>\n <td>0.0</td>\n <td>110.0</td>\n <td>889.0</td>\n <td>...</td>\n <td>2970</td>\n <td>854</td>\n <td>8.0</td>\n <td>8379.073980</td>\n <td>0.129641</td>\n <td>-0.258910</td>\n <td>0.306444</td>\n <td>-0.132450</td>\n <td>-0.184813</td>\n <td>-0.080132</td>\n </tr>\n <tr>\n <th>357696</th>\n <td>8315.0</td>\n <td>1784.0</td>\n <td>37115.0</td>\n 
<td>375.0</td>\n <td>NaN</td>\n <td>348759.0</td>\n <td>2890.0</td>\n <td>0.0</td>\n <td>163.0</td>\n <td>737.0</td>\n <td>...</td>\n <td>-4257</td>\n <td>942</td>\n <td>8.0</td>\n <td>5359.986193</td>\n <td>0.408403</td>\n <td>0.283362</td>\n <td>0.672524</td>\n <td>-0.410987</td>\n <td>-0.637881</td>\n <td>-0.080132</td>\n </tr>\n <tr>\n <th>357697</th>\n <td>8315.0</td>\n <td>1784.0</td>\n <td>37115.0</td>\n <td>375.0</td>\n <td>105.016815</td>\n <td>348759.0</td>\n <td>2890.0</td>\n <td>0.0</td>\n <td>147.0</td>\n <td>737.0</td>\n <td>...</td>\n <td>2376</td>\n <td>1195</td>\n <td>7.0</td>\n <td>9095.239127</td>\n <td>0.726987</td>\n <td>0.283362</td>\n <td>0.672524</td>\n <td>-0.410987</td>\n <td>-0.637881</td>\n <td>-0.080132</td>\n </tr>\n <tr>\n <th>357698</th>\n <td>8315.0</td>\n <td>1784.0</td>\n <td>37115.0</td>\n <td>317.0</td>\n <td>NaN</td>\n <td>244359.0</td>\n <td>2050.0</td>\n <td>0.0</td>\n <td>46.0</td>\n <td>737.0</td>\n <td>...</td>\n <td>9108</td>\n <td>502</td>\n <td>3.0</td>\n <td>9379.720939</td>\n <td>0.129641</td>\n <td>-0.258910</td>\n <td>0.306444</td>\n <td>-0.689523</td>\n <td>-0.637881</td>\n <td>-0.080132</td>\n </tr>\n </tbody>\n</table>\n<p>357699 rows × 46 columns</p>\n</div>"
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_dropped = X.drop(columns_to_drop, axis=1)\n",
"X_dropped"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-04-27T07:50:42.584344Z",
"start_time": "2024-04-27T07:50:42.498150Z"
}
},
"id": "b8383cb1d724181c",
"execution_count": 122
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(357699, 46)\n"
]
}
],
"source": [
"print(X_dropped.shape)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-04-27T07:54:05.509713Z",
"start_time": "2024-04-27T07:54:05.505067Z"
}
},
"id": "c64798f73ec3412f",
"execution_count": 134
},
{
"cell_type": "markdown",
"source": [
"### 2. Detection and Handling of Missing Values"
],
"metadata": {},
"id": "adb61967"
},
{
"cell_type": "code",
"execution_count": 135,
"id": "4bb9cdfb",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-27T07:54:06.587195Z",
"start_time": "2024-04-27T07:54:06.478662Z"
}
},
"outputs": [],
"source": [
"# Fill the remaining NaNs with the column mean.\n",
"# The object columns were ordinal-encoded earlier, so they are mean-filled here as well.\n",
"X_missing = X_dropped.fillna(X_dropped.mean())\n",
"# TODO: Fill the (originally object) categorical columns with the mode instead"
]
},
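{
"cell_type": "markdown",
"id": "e7a1c0d4",
"metadata": {},
"source": [
"A minimal sketch of the mode-imputation step left as a TODO above, shown on a small made-up frame rather than on the real data. The column names (`num`, `cat`) and the helper `impute_mean_and_mode` are hypothetical; the idea is to fill categorical columns with their most frequent value and every other column with its mean."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b52f9e18",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def impute_mean_and_mode(frame, categorical_cols):\n",
"    # Hypothetical helper: mode for categorical columns, mean for the rest.\n",
"    filled = frame.copy()\n",
"    for col in filled.columns:\n",
"        if col in categorical_cols:\n",
"            mode = filled[col].mode(dropna=True)\n",
"            if not mode.empty:\n",
"                filled[col] = filled[col].fillna(mode.iloc[0])\n",
"        else:\n",
"            filled[col] = filled[col].fillna(filled[col].mean())\n",
"    return filled\n",
"\n",
"# Toy data, not taken from the assessment dataset.\n",
"toy = pd.DataFrame({'num': [1.0, np.nan, 3.0], 'cat': ['a', None, 'a']})\n",
"toy_filled = impute_mean_and_mode(toy, categorical_cols=['cat'])\n",
"print(toy_filled)"
]
},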
{
"cell_type": "markdown",
"id": "8adcb9cd",
"metadata": {},
"source": [
"### 3. Detection and Handling of Outliers"
]
},
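{
"cell_type": "code",
"outputs": [],
"source": [
"# Project the imputed features onto the first 30 principal components.\n",
"# Note: no scaling is applied in this cell, so columns with large values dominate the components.\n",
"from sklearn.decomposition import PCA\n",
"pca = PCA(n_components=30)\n",
"X_pca = pca.fit_transform(X_missing)\n",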
"# plt.scatter(X_pca[:, 0], X_pca[:, 1], c=Y)\n",
"# plt.colorbar()\n",
"# plt.show()"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-04-27T07:59:38.839920Z",
"start_time": "2024-04-27T07:59:34.454737Z"
}
},
"id": "878c95195942e270",
"execution_count": 151
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9999008890228839\n",
"2\n"
]
}
],
"source": [
"# Smallest number of leading components whose explained variance reaches 99%.\n",
"res = 0\n",
"variance = pca.explained_variance_ratio_\n",
"for i in range(1, len(variance) + 1):\n",
"    if np.sum(variance[:i]) >= 0.99:\n",
"        res = i\n",
"        break\n",
"print(np.sum(variance[:res]))\n",
"print(res)\n"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-04-27T07:59:42.918071Z",
"start_time": "2024-04-27T07:59:42.915297Z"
}
},
"id": "724586267e51a3c5",
"execution_count": 155
},
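{
"cell_type": "markdown",
"id": "c3d9f7a2",
"metadata": {},
"source": [
"Two components explaining over 99.99% of the variance may simply reflect that a few features have much larger scales than the rest. The sketch below, on synthetic data with assumed names (`toy_X`, `scaled_pca`), compares PCA on unscaled features with PCA after `StandardScaler`, and lets scikit-learn choose the number of components by passing a float to `n_components`. It is an illustration only, not the approach used in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f40b6d81",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Synthetic stand-in for the feature matrix: three columns on very different scales.\n",
"toy_rng = np.random.default_rng(0)\n",
"toy_X = toy_rng.normal(size=(500, 3)) * np.array([1.0, 100.0, 10000.0])\n",
"\n",
"# Without scaling, the largest-scale column dominates the first component.\n",
"raw_ratio = PCA().fit(toy_X).explained_variance_ratio_\n",
"print('Unscaled ratios:', np.round(raw_ratio, 4))\n",
"\n",
"# With scaling, a float n_components keeps just enough components for 99% variance.\n",
"scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=0.99, svd_solver='full'))\n",
"toy_reduced = scaled_pca.fit_transform(toy_X)\n",
"print('Components kept after scaling:', toy_reduced.shape[1])"
]
},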
{
"cell_type": "markdown",
"id": "d4916043",
"metadata": {},
"source": [
"### 4. Detection and Handling of Class Imbalance"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ad3ab20e",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "2552a795",
"metadata": {},
"source": [
"### 5. Understanding Relationship Between Variables"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29ddbbcf",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "757fb315",
"metadata": {},
"source": [
"### 6. Data Visualization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93f82e42",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "2a7eebcf",
"metadata": {},
"source": [
"## Data Preprocessing"
]
},
{
"cell_type": "markdown",
"id": "ae3e3383",
"metadata": {},
"source": [
"### 7. General Preprocessing"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19174365",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "fb3aa527",
"metadata": {},
"source": [
"### 8. Feature Selection"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a85808bf",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "4921e8ca",
"metadata": {},
"source": [
"### 9. Feature Engineering"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dbcde626",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "fa676c3f",
"metadata": {},
"source": [
"## Modeling & Evaluation"
]
},
{
"cell_type": "markdown",
"id": "589b37e4",
"metadata": {},
"source": [
"### 10. Creating models"
]
},
{
"cell_type": "code",
"execution_count": 158,
"id": "d8dffd7d",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-27T08:00:10.769193Z",
"start_time": "2024-04-27T08:00:10.657136Z"
}
},
"outputs": [],
"source": [
"# Split the data into train and test\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import mean_squared_error\n",
"X_train, X_test, y_train, y_test = train_test_split(X_missing, Y, test_size=0.2, random_state=42)"
]
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MSE: 5311.417393315556\n"
]
}
],
"source": [
"# Linear Regression\n",
"# Train the model\n",
"model = LinearRegression()\n",
"model.fit(X_train, y_train)\n",
"# Predict on the held-out split\n",
"y_pred = model.predict(X_test)\n",
"# Evaluate\n",
"mse = mean_squared_error(y_test, y_pred)\n",
"print('MSE:', mse)\n"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-04-27T08:00:12.375826Z",
"start_time": "2024-04-27T08:00:11.942486Z"
}
},
"id": "9864de4426d22d9b",
"execution_count": 159
},
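{
"cell_type": "markdown",
"id": "d8e2b5c6",
"metadata": {},
"source": [
"An MSE value is hard to judge without a reference point. As a hedged sketch on synthetic data (all `toy_*` names are assumptions, not part of this notebook's pipeline), the cell below compares a linear model against a `DummyRegressor` that always predicts the training mean, and also reports RMSE, which is in the same units as the target."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a96c1e37",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.dummy import DummyRegressor\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic regression data, used only to illustrate the comparison.\n",
"toy_rng = np.random.default_rng(42)\n",
"toy_X = toy_rng.normal(size=(1000, 5))\n",
"toy_y = toy_X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + toy_rng.normal(scale=0.5, size=1000)\n",
"toy_Xtr, toy_Xte, toy_ytr, toy_yte = train_test_split(toy_X, toy_y, test_size=0.2, random_state=42)\n",
"\n",
"# Baseline that always predicts the training mean of the target.\n",
"toy_baseline = DummyRegressor(strategy='mean').fit(toy_Xtr, toy_ytr)\n",
"toy_linear = LinearRegression().fit(toy_Xtr, toy_ytr)\n",
"\n",
"for name, est in [('baseline', toy_baseline), ('linear', toy_linear)]:\n",
"    toy_mse = mean_squared_error(toy_yte, est.predict(toy_Xte))\n",
"    print(f'{name}: MSE={toy_mse:.3f}, RMSE={np.sqrt(toy_mse):.3f}')"
]
},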
{
"cell_type": "code",
"outputs": [],
"source": [],
"metadata": {
"collapsed": false
},
"id": "5381f534af74f626"
},
{
"cell_type": "markdown",
"id": "495bf3c0",
"metadata": {},
"source": [
"### 11. Model Evaluation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9245ab47",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "8aa31404",
"metadata": {},
"source": [
"### 12. Hyperparameters Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81addd51",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}