#set page(paper: "a4", flipped: true, margin: 0.5cm, columns: 4)
#set text(size: 8pt)
#show heading: set block(spacing: 0.6em)

= Storage
- Parts of disk
  - Platter has 2 surfaces
  - Surface has many tracks
  - Each track is broken up into sectors
  - Cylinder is the same track across all surfaces
  - Block comprises multiple sectors
- *Disk Access Time*
  - $"Seek time" + "Rotational Delay" + "Transfer Time"$
  - *Seek Time*
    - Move arms to position disk head
  - *Rotational Delay*
    - $1/2 60/"RPM"$ (average, in seconds)
  - *Transfer time* (for $n$ sectors)
    - $n times "time for 1 revolution"/"sectors per track"$
    - $n$ is requested sectors on track
- Access Order
  + Contiguous blocks within same track (same surface)
  + Other tracks within same cylinder
  + Next cylinder

== Buffer Manager
#image("buffer-manager.png")
- Buffer pool holds data in block-sized pages called frames
- Each frame maintains a pin count (PC) and a dirty flag

=== Replacement Policies
- Decide which unpinned page to replace
- *LRU*
  - Queue of pointers to frames with PC = 0
- *Clock*
  - LRU variant
  - *Reference bit*: turns on when PC drops to 0
  - Replace a page when its ref bit is off and PC = 0
#image("clock-replacement-policy.png")

== Files
- Heap File Implementation
  - Linked List
    - 2 linked lists: 1 of free pages, 1 of data pages
  - Page Directory Implementation
    - Directory structure, 1 entry per page
    - To insert, scan directory to find a page with space to store the record

*Page Formats*
- *RID* = (page id, slot number)
- Fixed Length Records
  - Packed organization: store records in contiguous slots (deletion requires moving the last record into the freed slot)
  - Unpacked organization: use bit array to track free slots
- *Variable Length Records*: slotted page organization

*Record Formats*
- Fixed Length Records: fields stored consecutively
- Variable Length Records
  - Delimit fields with special symbols ($F_1$ \$ $F_2$ \$ $F_3$)
  - Array of field offsets ($o_1, o_2, o_3, F_1, F_2, F_3$)

*Data Entry Formats*
1. $k*$ is an actual data record (with search key value $k$)
2. $k*$ is of the form *(k, rid)*
3. $k*$ is of the form *(k, rid-list)*, the list of rids of data records with key $k$

= B+ Tree Index
- *Search key* is a sequence of $k$ data attributes, $k >= 1$
  - *Composite search key* if $k > 1$
- *Unique index* if search key contains a _candidate_ key of the table
- Index is stored as a file
- *Clustered index*
  - Order of data records is the same as order of data entries
  - Search key is known as the *clustering key*
  - Format 1 index is a clustered index (assume formats 2 and 3 are unclustered)

== Tree based Index
- *Root node* at level 0
- Height of tree = number of levels of internal nodes
- *Leaf nodes*
  - At level $h$, where $h$ is the height of the tree
- *Internal nodes* store entries in the form $(p_0, k_1, p_1, k_2, p_2, ..., p_n)$
  - $k_1 < k_2 < ... < k_n$
  - $p_i$ = disk page address
- *Order* $d$ of index tree
  - Each non-root node has $m in [d, 2d]$ entries
  - Root node has $m in [1, 2d]$ entries
- *Equality search*: at each _internal_ node $N$, find the largest key $k_i$ in $N$ such that $k_i <= k$
  - If $k_i$ exists, go to subtree $p_i$, else $p_0$
- *Range search*: find the first matching record, then traverse the doubly linked list of leaves
- *Min nodes at level* $i$ is $2 times (d + 1)^(i-1), i >= 1$
- *Max nodes at level* $i$ is $(2d + 1)^i$

=== Operations (Right sibling first, then left)
=== Insertion
+ *Leaf node overflow*
  - Try redistribution first, then split
  - *Split*
    - Create a new leaf $N$ with $d+1$ entries. Create a new index entry $(k, square.filled)$ where $k$ is the smallest key in $N$
  - *Redistribute*
    - If a sibling is not full, move an entry to it. If given to the right sibling, update the right sibling's key in the parent; else update the current node's key in the parent
+ *Internal node overflow*
  - Node has $2d+1$ keys
  - Push middle $(d+1)$-th key up to parent
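
The two overflow splits above can be sketched in Python as follows. This is a minimal illustration, not a full B+ tree: nodes are plain sorted lists of keys, and the names `split_leaf` and `split_internal` are made up for this sheet.

```python
def split_leaf(entries, d):
    """Split an overflowing leaf that holds 2d+1 sorted entries.

    Returns (old_leaf, separator, new_leaf). The separator is the smallest
    key of the new leaf and is copied up, so it stays in the new leaf too.
    """
    assert len(entries) == 2 * d + 1
    old_leaf = entries[:d]        # d entries stay behind
    new_leaf = entries[d:]        # new leaf N gets d+1 entries
    separator = new_leaf[0]       # key of the new index entry
    return old_leaf, separator, new_leaf


def split_internal(keys, children, d):
    """Split an overflowing internal node with 2d+1 keys and 2d+2 children.

    Returns (left, middle, right). The middle, (d+1)-th, key is pushed up
    to the parent and is kept in neither half.
    """
    assert len(keys) == 2 * d + 1
    assert len(children) == 2 * d + 2
    middle = keys[d]
    left = (keys[:d], children[:d + 1])
    right = (keys[d + 1:], children[d + 1:])
    return left, middle, right
```

With $d = 2$, the leaf `[1, 2, 3, 4, 5]` splits into `[1, 2]` and `[3, 4, 5]` with separator 3, while an internal node with keys `[10, 20, 30, 40, 50]` pushes 30 up and keeps two keys on each side.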
=== Deletion
+ *Leaf node underflow*
  - Try redistribution first, then merge
  - *Redistribution*
    - Sibling must have $> d$ records to borrow from
    - Update the parent's key to the right sibling's smallest key
  - *Merge*
    - If sibling has $d$ entries, then merge
    - Combine with sibling, then remove the corresponding entry from the parent node
+ *Internal node underflow*
  - Let $N'$ be an adjacent _sibling_ node of $N$ with $l$ entries, $l > d$
  - Insert $(K, N' . p_i)$ into $N$, where $K$ is the parent key separating $N$ and $N'$, and $i$ is the leftmost (0) or rightmost ($l$) entry
  - Replace $K$ in parent node with $N' . k_i$
  - Remove $(p_i, k_i)$ entry from $N'$

=== Bulk Loading
+ Sort entries by search key
+ Load leaf pages with $2d$ entries each
+ For each leaf page, insert an index entry into the rightmost parent page

= Hash based Index
== Static Hashing
- Data stored in $N$ buckets, where hash function $h(dot)$ identifies the bucket
  - Record with key $k$ is inserted into $B_i$, where $i = h(k) mod N$
  - Bucket is a primary data page with 0 or more overflow data pages

== Linear Hashing
- Grows linearly by splitting buckets
- Systematic splitting: bucket $B_i$ is split before $B_(i+1)$
- Let $N_i = 2^i N_0$ be the file size at the beginning of round $i$
- How to split bucket $B_i$
  - Add bucket $B_j$ (split image of $B_i$)
  - Redistribute entries in $B_i$ between $B_i$ and $B_j$
  - `next++; if next == NLevel: (level++; next = 0)`

=== Performance
- Average: 1.2 IO for uniform data
- Worst case: linear in number of entries

== Extensible Hashing
- Overflow is resolved by splitting the overflowed bucket
- No overflow pages, and order in which buckets are split is random
- Directory of pointers to buckets; directory has $2^d$ entries
  - $d$ is the global depth of the hashed file
- Each bucket maintains a local depth $l in [0, d]$
  - Entries in a bucket of local depth $l$ share the same last $l$ bits

=== Bucket Overflow
- Number of directory entries could be more than number of buckets
- Number of dir entries pointing to a bucket = $2^(d-l)$
- When bucket $B$ with local depth $l$ overflows
  - Increment local depth of $B$ to $l+1$
  - Allocate split image $B'$
  - Redistribute entries between $B$ and $B'$ using the $(l+1)$th bit
  - If $l+1 > "global depth" d$
    - Directory is doubled in size, global depth becomes $d+1$
    - New entries point to the same bucket as the corresponding entry
  - If $l+1 <= "global depth" d$
    - Update the directory entry of the split bucket to point to the split image

=== Bucket Deletion
- $B_i$ and $B_j$ (same local depth $l$, differing only in the $l$th bit) can be merged if their entries fit in one bucket
- $B_i$ is deallocated, $B_j$'s local depth is decremented by 1. Directory entries that pointed to $B_i$ now point to $B_j$

=== Performance
- At most 2 disk IOs for equality selection
- Collisions: two entries collide if they have the same hashed value
  - Need overflow pages if collisions exceed page capacity
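
The bucket-overflow steps for extensible hashing can be sketched in Python as below. This is a rough in-memory illustration under stated assumptions: keys are non-negative integers hashed by identity, buckets hold a small fixed number of keys, and overflow pages are not modelled (the sketch raises an error instead). `Bucket`, `ExtensibleHash` and `MAX_DEPTH` are made-up names.

```python
MAX_DEPTH = 32  # assumed cut-off; beyond this, overflow pages would be needed


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtensibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]  # always 2^global_depth entries

    def _slot(self, key):
        # Directory index = last global_depth bits of the (identity) hash.
        return key & ((1 << self.global_depth) - 1)

    def lookup(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        self._split(bucket)
        self.insert(key)  # retry; the target bucket may need to split again

    def _split(self, bucket):
        old_depth = bucket.local_depth
        if old_depth >= MAX_DEPTH:
            raise OverflowError("too many collisions for this sketch")
        if old_depth + 1 > self.global_depth:
            # Double the directory: each new entry points to the same
            # bucket as its corresponding old entry.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth = old_depth + 1
        image = Bucket(old_depth + 1)
        # Entries of the split bucket whose (l+1)-th bit is 1 now point
        # to the split image.
        bit = 1 << old_depth
        for i, target in enumerate(self.directory):
            if target is bucket and i & bit:
                self.directory[i] = image
        # Redistribute the old keys between the bucket and its image.
        pending, bucket.keys = bucket.keys, []
        for k in pending:
            self.directory[self._slot(k)].keys.append(k)
```

Inserting 0, 1, 2, 4, 3, 5 with capacity 2 doubles the directory twice (global depth 2); the last insert splits the bucket holding 1 and 3 without doubling, since its local depth was below the global depth.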