cs3223: lec3
@@ -1,5 +1,5 @@
#set page(paper: "a4", flipped: true, margin: 0.5cm, columns: 4)
#set text(size: 8pt)
#show heading: set block(spacing:0.6em)

= Storage
@@ -82,7 +82,7 @@
- *Min nodes at level* $i$ is $2 times (d + 1)^(i-1), i >= 1$
- *Max nodes at level* $i$ is $(2d + 1)^(i)$

=== Operations (Right sibling first, then left)
=== Insertion
+ *Leaf node Overflow*
  - Redistribute and then split
@@ -111,7 +111,48 @@
+ For each leaf page, insert index entry to rightmost parent page

= Hash based Index
== Static Hashing
- Data stored in $N$ buckets, where hash function $h(dot)$ identifies the bucket
- Record with key $k$ is inserted into $B_i, "where" i = h(k) mod N$
- Bucket is primary data page with 0+ overflow data pages
== Linear Hashing
- Grows linearly by splitting buckets
- Systematic splitting: bucket $B_i$ is split before $B_(i+1)$
- Let $N_i = 2^i N_0$ be file size at beginning of round $i$
- How to split bucket $B_i$:
  - Add bucket $B_j$ (split image of $B_i$)
  - Redistribute entries in $B_i$ between $B_i$ and $B_j$
  - `next++; if next == NLevel: (level++; next = 0)`
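The `level`/`next` bookkeeping above decides which hash function applies on a lookup: buckets before the split pointer have already been split this round, so keys must be re-hashed with the finer function. A minimal Python sketch (names like `bucket_index`, `next_split`, and `n0` are illustrative, not from the notes):

```python
# Linear-hashing bucket addressing: a minimal sketch.
# During round `level` there are N_level = 2^level * n0 base buckets;
# buckets in [0, next_split) have already been split into their images.

def bucket_index(h: int, level: int, next_split: int, n0: int = 4) -> int:
    """Map a hash value h to its bucket, given the current level and split pointer."""
    n_level = (2 ** level) * n0      # number of buckets at start of this round
    i = h % n_level                  # coarse function: h mod N_level
    if i < next_split:               # bucket i was already split this round,
        i = h % (2 * n_level)        # so use h mod 2*N_level to pick B_i or its image
    return i
```

For example, with `n0 = 4`, `level = 0`, `next_split = 1`, the hash value 4 maps to 0 under the coarse function but, since bucket 0 has been split, is re-hashed mod 8 and lands in bucket 4 (the split image).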
=== Performance
- Average: 1.2 I/Os for uniform data
- Worst case: linear in number of entries

== Extensible Hashing
- Overflow is resolved by splitting the overflowed bucket
- No overflow pages, and order in which buckets are split is random
- Directory of pointers to buckets; directory has $2^d$ entries
  - $d$ is global depth of hashed file
- Each bucket maintains a local depth $l in [0, d]$
  - Entries in a bucket of local depth $l$ share the same last $l$ bits of hash value
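A lookup just masks off the last $d$ bits of the hash value to index the directory; when a bucket's local depth $l < d$, several slots alias it. A hedged sketch (function name is illustrative):

```python
# Extensible-hash directory lookup: a minimal sketch.
# The last `global_depth` bits of the hash value index the 2^d-entry
# directory; a bucket with local depth l < d is aliased by 2^(d-l) slots.

def dir_index(h: int, global_depth: int) -> int:
    """Directory slot for hash value h: its last global_depth bits."""
    return h & ((1 << global_depth) - 1)
```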
=== Bucket Overflow
- Number of directory entries can exceed number of buckets
  - Number of dir entries pointing to a bucket with local depth $l$ is $2^(d-l)$
- When bucket $B$ with local depth $l$ overflows:
  - Increment local depth of $B$ to $l+1$
  - Allocate split image $B'$
  - Redistribute entries between $B$ and $B'$ using $(l+1)$th bit
  - If $l+1 > "global depth" d$:
    - Directory is doubled in size; global depth becomes $d+1$
    - New directory entries point to the same bucket as their corresponding old entries
  - If $l+1 <= "global depth" d$:
    - Redirect the directory entries matching $B'$'s last $l+1$ bits to point to $B'$
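The overflow steps above can be sketched in Python. This is a toy model, not from the notes: buckets are dicts holding raw hash values, the directory is a list of bucket references, and re-splitting when all entries land on one side is omitted.

```python
# Extensible-hash split on overflow: a hedged sketch.

def split_bucket(directory, global_depth, b):
    """Split overflowed bucket b; double the directory first if needed."""
    if b["depth"] + 1 > global_depth:       # need one more distinguishing bit
        directory = directory + directory   # double: new slots alias old buckets
        global_depth += 1
    b["depth"] += 1
    image = {"depth": b["depth"], "entries": []}
    # Redistribute using the (l+1)-th bit, i.e. bit index depth-1 of the hash.
    bit = b["depth"] - 1
    keep = [h for h in b["entries"] if not (h >> bit) & 1]
    move = [h for h in b["entries"] if (h >> bit) & 1]
    b["entries"], image["entries"] = keep, move
    # Redirect the slots that pointed to b and whose index has that bit set.
    for i in range(len(directory)):
        if directory[i] is b and (i >> bit) & 1:
            directory[i] = image
    return directory, global_depth
```

For example, with global depth 1 and bucket $B_0$ holding hash values 0, 2, 4, a split doubles the directory to 4 slots, keeps 0 and 4 in $B_0$ (second-last bit 0), and moves 2 into the split image pointed to by slot `10`.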

=== Bucket Deletion
- $B_i$ and $B_j$ (same local depth $l$, differing only in the $l$th bit) can be merged if their entries fit in one bucket
- $B_i$ is deallocated, $B_j$'s local depth is decremented by 1, and directory entries that pointed to $B_i$ now point to $B_j$
=== Performance
- At most 2 disk I/Os for equality selection
- Collisions: multiple keys with the same hash value
  - Need overflow pages if collisions exceed page capacity