- Each track is broken up into sectors
- Cylinder is the same track position across all surfaces
- Block comprises multiple sectors
- *Disk Access Time*: $"Seek time" + "Rotational Delay" + "Transfer Time"$
- *Seek Time*: Move arms to position disk head
- *Rotational Delay*: average of half a revolution, $1/2 times 60/"RPM"$ seconds
- *Transfer time* (for $n$ sectors): $n times "time for 1 revolution" / "sectors per track"$
  - $n$ is the number of requested sectors on the track
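The access-time formula can be checked with a small calculation. The drive parameters below (seek time, RPM, sectors per track) are made-up example values, not from the notes:

```python
def disk_access_time_ms(seek_ms, rpm, sectors_per_track, n):
    """Seek time + rotational delay + transfer time, in milliseconds."""
    # Average rotational delay: half a revolution = 1/2 * 60/RPM seconds.
    rotational_delay_ms = 0.5 * 60_000 / rpm
    # One full revolution passes over every sector on the track once.
    ms_per_revolution = 60_000 / rpm
    transfer_ms = n * ms_per_revolution / sectors_per_track
    return seek_ms + rotational_delay_ms + transfer_ms

# Hypothetical drive: 9 ms seek, 7200 RPM, 100 sectors/track, 10 sectors requested.
# 7200 RPM gives 8.33 ms/revolution, so ~4.17 ms delay + ~0.83 ms transfer.
print(disk_access_time_ms(9, 7200, 100, 10))
```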
- Access Order
- Each frame maintains pin count (PC) and dirty flag

=== Replacement Policies
- Decide which unpinned page to replace
- *LRU*: queue of pointers to frames with PC = 0
- *clock*: LRU variant
- *Reference bit*: turns on when PC = 0
- Replace a page when ref bit off and PC = 0
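The clock policy can be sketched as follows; the class layout and names are illustrative, not from the notes:

```python
class ClockBuffer:
    """Minimal sketch of clock replacement: each frame has a pin count (PC)
    and a reference bit; the hand skips pinned frames, and gives frames with
    the reference bit on a second chance by turning the bit off."""

    def __init__(self, n_frames):
        self.pin_count = [0] * n_frames
        self.ref_bit = [False] * n_frames
        self.hand = 0

    def choose_victim(self):
        # At most two sweeps: the first sweep may only clear reference bits.
        for _ in range(2 * len(self.pin_count)):
            i = self.hand
            self.hand = (self.hand + 1) % len(self.pin_count)
            if self.pin_count[i] > 0:
                continue                 # pinned: not replaceable
            if self.ref_bit[i]:
                self.ref_bit[i] = False  # second chance
                continue
            return i                     # ref bit off and PC = 0: replace
        return None                      # every frame is pinned

buf = ClockBuffer(3)
buf.pin_count = [1, 0, 0]
buf.ref_bit = [False, True, False]
print(buf.choose_victim())  # → 2
```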
#image("clock-replacement-policy.png")

- *Composite search key* if $k > 1$
- *unique key* if search key contains a _candidate_ key of the table
- index is stored as a file
- *Clustered index*: Ordering of data is same as data entries
- key is known as *clustering key*
- Format 1 index is clustered index (Assume format 2 and 3 to be unclustered)
=== Insertion
+ *Leaf node Overflow*
- Redistribute and then split
- *Split*: Create a new leaf $N$ with $d+1$ entries. Create a new index entry $(k, square.filled)$ where $k$ is smallest key in $N$
- *Redistribute*: If sibling is not full, take from it. If given right, update right's parent pointer, else current node's parent pointer
+ *Internal node Overflow*
- Node has $2d+1$ keys.
- Push middle $(d+1)$-th key up to parent.
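The leaf-split rule can be illustrated with a small sketch; the node is simplified to a sorted key list, and `d` is the tree order:

```python
def split_leaf(keys, d):
    """Split an overflowing leaf holding 2d+1 keys: the old leaf keeps the
    first d keys, the new leaf N gets the remaining d+1, and a copy of N's
    smallest key becomes the index entry pushed to the parent."""
    assert len(keys) == 2 * d + 1, "leaf splits on overflow with 2d+1 keys"
    old_leaf, new_leaf = keys[:d], keys[d:]
    index_key = new_leaf[0]  # smallest key in N, copied up (not removed)
    return old_leaf, new_leaf, index_key

# d = 2: a leaf overflows with 5 keys.
print(split_leaf([3, 5, 8, 11, 14], d=2))  # → ([3, 5], [8, 11, 14], 8)
```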
- Collisions: keys that have the same hashed value.
- Need overflow pages if collisions exceed page capacity

#colbreak()

= Sorting
== Notation
#table(
  columns: (auto, auto),
  $|r|$, [pages for $R$],
  $||r||$, [tuples in $R$],
  $pi_L (R)$, [project columns by list $L$ from $R$],
  $pi_L^* (R)$, [project with duplicates],
)
== External Merge Sort
- *File size*: $N$ pages
- Memory pages available: $B$
- *Pass 0*: Create sorted runs
- Read and sort $B$ pages at a time
- *Pass i*: Use $B-1$ pages for input, 1 for output, performing a $(B-1)$-way merge
- *Analysis*
- Sorted runs: $N_0 = ceil(N/B)$
- Total passes: $ceil(log_(B-1) (N_0)) + 1$
- Total I/O: $2 N (ceil(log_(B-1) (N_0)) + 1)$
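The I/O formula above can be evaluated directly; the file and buffer sizes below are made-up example values:

```python
from math import ceil, log

def ems_total_io(N, B):
    """Total I/O of external merge sort on N pages with B buffer pages,
    following the notes: 2N * (ceil(log_{B-1}(N0)) + 1)."""
    N0 = ceil(N / B)  # sorted runs after pass 0
    merge_passes = ceil(log(N0, B - 1)) if N0 > 1 else 0
    return 2 * N * (merge_passes + 1)

# 1000 pages, 11 buffers: N0 = 91 runs, ceil(log_10 91) = 2 merge passes,
# so 3 passes total and 2 * 1000 * 3 = 6000 page I/Os.
print(ems_total_io(1000, 11))  # → 6000
```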
=== Optimized Merge Sort
- Read and write in blocks of $b$ pages
- Allocate 1 block for output
- Remaining memory for input: $floor(B/b) - 1$ blocks
- *Analysis*
- Sorted runs: $N_0 = ceil(N/B)$
- Runs merged at each pass: $F = floor(B/b) - 1$
- No. of merge passes: $ceil(log_F (N_0))$ (+1 for total passes)
- Total I/O: $2 N (ceil(log_F (N_0)) + 1)$
- *Sorting with B+ Trees*: I/O cost = $h$ + scan of leaf pages + heap access (if not a covering index)

== Projection
=== Sort based approach
- Extract attributes, sort attributes, remove duplicates
- *Analysis*
+ Extract attributes: $|R| "(scan)" + |pi_L^*(R)| "(output)"$
+ Sort attributes:
  - $N_0 = ceil((|pi_L^*(R)|)/B)$
  - Merging passes: $ceil(log_(B-1) (N_0))$
  - Total I/O: $2 |pi_L^*(R)| (ceil(log_(B-1) (N_0)) + 1)$
+ Remove duplicates: $|pi_L^*(R)|$
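The three step costs can be totalled numerically; the page counts below are made-up example sizes:

```python
from math import ceil, log

def sort_projection_io(R_pages, piR_pages, B):
    """Total I/O of basic sort-based projection per the notes:
    extract + external sort of the projected pages + duplicate-removal scan."""
    extract = R_pages + piR_pages          # scan R, write projection
    N0 = ceil(piR_pages / B)
    merge_passes = ceil(log(N0, B - 1)) if N0 > 1 else 0
    sort = 2 * piR_pages * (merge_passes + 1)
    dedup = piR_pages                      # final scan removing duplicates
    return extract + sort + dedup

# |R| = 1000 pages, |pi*_L(R)| = 250 pages, B = 20 buffers:
# N0 = 13 runs, 1 merge pass, so 1250 + 1000 + 250 = 2500 page I/Os.
print(sort_projection_io(1000, 250, 20))  # → 2500
```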
=== Optimized approach
- Split step 2 into creating sorted runs and merging sorted runs, and merge these into steps 1 and 3 respectively
- *Analysis*
- *Step 1*
  - $B-1$ pages for initial sorted runs
  - Sorted runs: $N_0 = ceil((|pi^*_L (R)|) / (B-1))$
  - Cost to create sorted runs: $|R| + |pi^*_L (R)|$
- *Step 2*
  - Merging passes: $ceil(log_(B-1) (N_0))$
  - Cost of merging: $2 |pi^*_L (R)| ceil(log_(B-1) (N_0))$
  - Cost of merging excluding final output I/O: $(2 ceil(log_(B-1) (N_0)) - 1) |pi^*_L (R)|$

=== Hash based approach
- *Partitioning*
- Allocate 1 page for input, $B-1$ pages for output.
- Read 1 page at a time; for each tuple, create its projection and hash it ($h$) to distribute it among the $B-1$ buffers
- Flush a buffer to disk when full.
- *Duplicate Elimination*
- For each partition $R_i$, create a hash table; hash each tuple $t$ with hash function $h' != h$ to bucket $B_j$, inserting it if $t in.not B_j$
- *Partition Overflow*: hash table for $pi^*_L (R_i)$ is larger than the memory pages allocated for it

- *Analysis*
- I/O cost (no partition overflow): $|R| + 2 |pi^*_L (R)|$
  - Partitioning phase: $|R| + |pi^*_L (R)|$
  - Duplicate elimination: $|pi^*_L (R)|$
- To avoid partition overflows:
  - $|R_i| approx (|pi^*_L (R)|) / (B-1)$ (assuming uniform partitioning)
  - Need $B >$ size of hash table $= f times |R_i|$, where $f$ is the fudge factor
  - $=> B > sqrt(f times |pi^*_L (R)|)$

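The two phases can be sketched in a few lines; the hash functions and row data are illustrative (a Python set stands in for the per-partition hash table built with $h'$):

```python
def hash_projection(tuples, project, B):
    """Hash-based projection sketch: partition the projected tuples with h
    into B-1 partitions, then eliminate duplicates within each partition."""
    h = lambda p: hash(p) % (B - 1)
    partitions = [[] for _ in range(B - 1)]
    for t in tuples:                 # partitioning phase
        p = project(t)
        partitions[h(p)].append(p)
    result = []
    for part in partitions:          # duplicate-elimination phase
        seen = set()                 # in-memory hash table (h' != h)
        for p in part:
            if p not in seen:        # insert only if not already in a bucket
                seen.add(p)
                result.append(p)
    return result

rows = [(1, 'a'), (2, 'a'), (3, 'b'), (4, 'a')]
print(sorted(hash_projection(rows, lambda t: t[1], B=4)))  # → ['a', 'b']
```

Duplicates can only land in the same partition (same $h$ value), which is why per-partition elimination is sufficient.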
= Selection
- *Conjunct*: $>= 1$ terms connected by $or$
- *CNF predicate*: $>= 1$ conjuncts connected by $and$
- *Covered Conjunct*: predicate $p_i$ is a covered conjunct if each attribute in $p_i$ is in the key $K$ or an include column of index $I$
- $sigma_p (R), p = ("age" > 5) and ("height" = 180) and ("level" = 3), I_1 "key" = ("level", "weight", "height")$
  - $p_c = ("height" = 180) and ("level" = 3)$
- *Primary Conjunct*: covered conjuncts on a prefix of the index key, usable to bound the index scan (here $"level" = 3$)
- $sigma_p (R)$: Select rows from $R$ that satisfy predicate $p$
- Access Path: way of accessing data records / entries
- *Table Scan*: Scan all data pages (Cost: $|R|$)
- *Index Scan*: Scan index pages
- *Index Combination*: Combine results from multiple index scans
- Scan/Combination can be followed by RID lookup to retrieve data
- *Index only plan*: Query that does not need to access any data tuples in $R$
- *Covering Index*: $I$ is a covering index if every attribute of $R$ used in the query is part of the key / include columns of $I$

== B+ Trees
- For Index Scan + RID Lookup, many matching RIDs could refer to the same page
- Sort matching RIDs before performing lookup: avoids retrieving the same page multiple times
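The effect of sorting RIDs can be shown with a tiny count; the RID format (page id, slot) and the assumption of one read per contiguous run of same-page RIDs are illustrative:

```python
def pages_fetched(rids):
    """Count page reads for a RID lookup, assuming each page is re-read
    whenever it differs from the previous RID's page."""
    fetched, last_page = 0, None
    for page, _slot in rids:
        if page != last_page:
            fetched += 1
            last_page = page
    return fetched

rids = [(7, 1), (2, 0), (7, 3), (2, 5)]
# Unsorted alternates between pages 7 and 2; sorted groups them.
print(pages_fetched(rids), pages_fetched(sorted(rids)))  # → 4 2
```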