cs3223: lec3

This commit is contained in:
2025-10-05 02:40:55 +08:00
parent c30ed93ec8
commit a4b1c9b357
2 changed files with 49 additions and 8 deletions



@@ -1,5 +1,5 @@
#set page(paper: "a4", flipped: true, margin: 0.5cm, columns: 4)
#set text(size: 8pt)
#show heading: set block(spacing:0.6em)
= Storage
@@ -82,7 +82,7 @@
- *Min nodes at level* $i$ is $2 times (d + 1)^(i-1), i >= 1$
- *Max nodes at level* $i$ is $(2d + 1)^(i)$
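A quick numeric check of the two bounds above (the order $d = 2$ is an arbitrary example, not from the notes):

```python
# Min/max number of nodes at level i of a B+-tree of order d,
# per the formulas above (levels counted from the root at level 0,
# so the min bound holds for i >= 1).
def min_nodes(d, i):
    return 2 * (d + 1) ** (i - 1)    # root has >= 2 children,
                                     # internal nodes >= d+1 children

def max_nodes(d, i):
    return (2 * d + 1) ** i          # every node has <= 2d+1 children

d = 2                                # example order (assumption)
table = [(i, min_nodes(d, i), max_nodes(d, i)) for i in range(1, 4)]
```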
=== Operations (Right sibling first, then left)
=== Insertion === Insertion
+ *Leaf node Overflow* + *Leaf node Overflow*
- Redistribute and then split - Redistribute and then split
@@ -111,7 +111,48 @@
+ For each leaf page, insert index entry to rightmost parent page
= Hash based Index
== Static Hashing
- Data stored in $N$ buckets, where hash function $h(dot)$ is used to identify the bucket
- Record with key $k$ is inserted into $B_i$, where $i = h(k) mod N$
- Bucket is primary data page with 0+ overflow data pages
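A minimal sketch of static hashing with overflow chains; the bucket count, page capacity, and using Python's built-in `hash` as $h(dot)$ are all illustrative assumptions:

```python
# Static hashing: N fixed buckets; record with key k goes to bucket
# h(k) mod N. A bucket is a primary page plus a chain of overflow
# pages that grows once the primary page fills up.
N = 4          # number of buckets (fixed for the file's lifetime)
PAGE_CAP = 2   # entries per page (illustrative)

buckets = [[[]] for _ in range(N)]   # each bucket: list of pages

def insert(k):
    pages = buckets[hash(k) % N]
    if len(pages[-1]) == PAGE_CAP:   # last page full -> overflow page
        pages.append([])
    pages[-1].append(k)

for k in range(10):
    insert(k)
```

Skewed or growing data keeps lengthening the overflow chains, which is the weakness linear and extensible hashing address.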
== Linear Hashing
- Grows linearly by splitting buckets
- Systematic splitting: Bucket $B_i$ is split before $B_(i+1)$
- Let $N_i = 2^i N_0$ be file size at beginning of round $i$
- How to split bucket $B_i$
- Add bucket $B_j$ (split image of $B_i$)
- Redistribute entries in $B_i$ between $B_i$ and $B_j$
- `next++; if next == NLevel: (level++; next = 0)`
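The split schedule above can be sketched as follows; $N_0$, the demo keys, and splitting via an explicit `split()` call are assumptions (real systems trigger splits by overflow or load factor):

```python
# Linear hashing round/split schedule.
N0 = 4                       # initial number of buckets
level, nxt = 0, 0            # current round, next bucket to split
buckets = [[] for _ in range(N0)]

def addr(k):
    i = k % (N0 * 2 ** level)            # try h_level first
    if i < nxt:                          # already split this round:
        i = k % (N0 * 2 ** (level + 1))  # re-hash with h_(level+1)
    return i

def split():
    """Split bucket `nxt` between itself and its split image."""
    global level, nxt
    buckets.append([])               # add split image B_j of B_nxt
    old, buckets[nxt] = buckets[nxt], []
    nxt += 1                         # next++ (before redistribution)
    for k in old:                    # redistribute with h_(level+1)
        buckets[k % (N0 * 2 ** (level + 1))].append(k)
    if nxt == N0 * 2 ** level:       # round over: level++, next = 0
        level, nxt = level + 1, 0

for k in range(8):
    buckets[addr(k)].append(k)
split()                              # splits bucket 0 into 0 and 4
```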
=== Performance
- Average: 1.2 I/Os for uniform data
- Worst Case: Linear in number of entries
== Extensible Hashing
- Overflowed bucket is resolved by splitting overflowed bucket
- No overflow pages, and order in which buckets are split is random
- Directory of pointers to buckets, directory has $2^d$ entries
- $d$ is global depth of hashed file
- Each bucket maintains a local depth $l in [0, d]$
- Entries in a bucket of local depth $l$: same last $l$ bits
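A minimal sketch of the directory lookup; $d = 3$ and treating the key itself as its hash value are assumptions:

```python
# Extensible hashing: the directory has 2^d entries, and a key is
# located via the last d bits of its hash value.
d = 3                                 # global depth: 2^d = 8 entries

def dir_index(h, d=d):
    return h & ((1 << d) - 1)         # last d bits pick the entry

# All entries in a bucket of local depth l share their last l bits,
# so 2^(d - l) directory entries point to that bucket:
l = 2
entries_per_bucket = 2 ** (d - l)
```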
=== Bucket Overflow
- Number of directory entries could be more than number of buckets
- Number of dir entries pointing to bucket = $2^(d-l)$
- When bucket $B$ with depth $l$ overflows,
- Increment local depth of $B$ to $l+1$
- Allocate split image $B'$
- Redistribute entries between $B$ and $B'$ using $(l+1)$th bit
- if $l+1 > "global depth " d$
- Directory is doubled in size, global depth incremented to $d+1$
- New entries point to same bucket as corresponding entry
- if $l+1 <= "global depth " d$
- Redirect the directory entries whose $(l+1)$th last bit matches $B'$ to point to $B'$
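The overflow steps above can be sketched end to end: bump the local depth, double the directory when $l+1 > d$, redistribute on the $(l+1)$th last bit, then redirect half the entries. The bucket layout and demo keys are assumptions:

```python
# Extensible hashing overflow handling (split, doubling on demand).
class Bucket:
    def __init__(self, depth):
        self.depth, self.keys = depth, []

d = 1                                  # global depth
directory = [Bucket(1), Bucket(1)]     # 2^d = 2 entries

def split(i):
    global d, directory
    b = directory[i]
    b.depth += 1                       # increment local depth to l+1
    if b.depth > d:                    # l+1 > d: double the directory;
        directory = directory * 2      # new entries mirror old ones
        d += 1
    image = Bucket(b.depth)            # allocate split image B'
    old, b.keys = b.keys, []
    for k in old:                      # (l+1)th last bit decides side
        (image if (k >> (b.depth - 1)) & 1 else b).keys.append(k)
    for j, e in enumerate(directory):  # entries matching B' in that
        if e is b and (j >> (b.depth - 1)) & 1:  # bit point to B'
            directory[j] = image

directory[0].keys = [0, 2]             # both keys end in bit 0
split(0)                               # overflow -> split bucket 0
```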
=== Bucket Deletion
- $B_i$ & $B_j$ (with same local depth $l$, differing only in the $l$th bit) can be merged if their entries fit in one bucket
- $B_i$ is deallocated, $B_j$'s local depth is decremented by 1, and directory entries that pointed to $B_i$ now point to $B_j$
=== Performance
- At most 2 disk IOs for equality selection
- Collisions: distinct keys that hash to the same value
- Need overflow pages if collisions exceed page capacity