cs3223: lecture 6
Binary file not shown.
@@ -296,3 +296,72 @@ Cost of index scan = $nin + #nle + nlo$
- $N_"bucket"$: max no. of the index's primary/overflow pages accessed
- $nlo = nso + min{||sigma_p_c (R)||, |R|}$ if $I$ is not a covering index for $sigma_p (R)$
]}

= Join Algorithms
Considerations when choosing a join algorithm:
- Types of join predicates (equality / inequality)
- Sizes of join operands
- Allocated memory pages
- Available access methods
- *Notation*: $R join_(A) S$
  - $R$ is the outer relation, $S$ is the inner relation
- Nested Loop Join (NLJ) and partition-based joins
== Tuple-based NLJ
Iterate through each page of $R$; for each tuple in that page, iterate through each page of $S$; for each tuple in that page of $S$, check whether the pair matches.
- *Cost*: $|R| + ||R|| times |S|$ (read $S$ once for each tuple in $R$)
- *Optimised* (page-based): iterate through each page of $R$, then each page of $S$, then compare the tuples of the two pages
  - *Cost*: $|R| + |R| times |S|$ (read $S$ once for each page in $R$)
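
The page-based variant above can be sketched as a small simulation that counts page reads. This is only an illustration under assumptions: pages are modelled as Python lists, and the names `page_nlj`, `tuple_nlj_cost` and `page_nlj_cost` are hypothetical helpers, not part of the lecture.

```python
def page_nlj(r_pages, s_pages, match):
    """Page-based nested loop join; each page is a list of tuples."""
    reads = 0
    out = []
    for r_page in r_pages:        # one read per page of R
        reads += 1
        for s_page in s_pages:    # S is rescanned for every page of R
            reads += 1
            for r in r_page:
                for s in s_page:
                    if match(r, s):
                        out.append((r, s))
    return out, reads


def tuple_nlj_cost(pages_r, tuples_r, pages_s):
    """|R| + ||R|| * |S|: S is read once per tuple of R."""
    return pages_r + tuples_r * pages_s


def page_nlj_cost(pages_r, pages_s):
    """|R| + |R| * |S|: S is read once per page of R."""
    return pages_r + pages_r * pages_s


r_pages = [[1, 2], [3, 4]]
s_pages = [[2, 3], [5, 6], [4, 4]]
result, reads = page_nlj(r_pages, s_pages, lambda r, s: r == s)
```

With 2 pages of $R$ and 3 pages of $S$, the simulated read count agrees with the page-based formula, while the tuple-based formula for the same data is larger because $S$ would be scanned once per tuple.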

== Main Memory NLJ
Assuming $|S| < |R|$: for optimal I/O, compute $R join S$ with the smaller operand $S$ as the inner relation.
- Min pages needed: $B = |S| + 2$
  - 1 for $B_"outer"$, 1 for $B_"join"$, the rest for $B_"inner"$ to hold $S$
- *Cost*: $|R| + |S|$

== Block NLJ
- $R join S = union.big^k_(i=1) (R_i join S)$, $k = ceil((|R|)/B_"outer")$
- To minimise I/O cost, minimise $|R|$ or maximise $B_"outer"$
  - Choose the smaller table as the outer relation ($R "if" |R| < |S|$)
- I/O cost: $|R| + ceil((|R|)/(B-2)) times |S|$ (with $B_"outer" = B - 2$)
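
A minimal sketch of the block NLJ cost formula, assuming page counts are given as plain integers. The function name `block_nlj_cost` is hypothetical; it picks the smaller operand as the outer relation, as the notes recommend.

```python
import math


def block_nlj_cost(pages_r, pages_s, buffer_pages):
    """I/O cost of block NLJ with the smaller relation as outer.

    B - 2 buffer pages hold a chunk of the outer relation; the other
    two are used for reading the inner relation and for join output.
    """
    if buffer_pages < 3:
        raise ValueError("block NLJ needs at least 3 buffer pages")
    outer, inner = sorted((pages_r, pages_s))
    chunks = math.ceil(outer / (buffer_pages - 2))
    return outer + chunks * inner


cost = block_nlj_cost(1000, 500, 102)
```

Here the 500-page relation becomes the outer one, giving 5 chunks of 100 pages and a cost of $500 + 5 times 1000$. Using the 1000-page relation as outer instead would cost $1000 + 10 times 500$, which is why the smaller table is preferred.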

== Index NLJ
- Inner relation: the table with an index on the join attribute ($S$)
- Cost: $|R| + ||R|| times (N_"internal" + N_"leaf" + N_"lookup")$
  - Scan $R$ + search $S$'s index once for each tuple in $R$
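
The idea can be illustrated with an in-memory sketch, assuming a Python dictionary stands in for the index on the inner relation (a real system would use an existing B+-tree or hash index rather than building one per join). `index_nlj` and `index_nlj_cost` are hypothetical names.

```python
from collections import defaultdict


def index_nlj(outer, inner, key_outer, key_inner):
    """Join by probing an index on the inner relation once per outer tuple."""
    index = defaultdict(list)  # stand-in for a pre-existing index on inner
    for s in inner:
        index[key_inner(s)].append(s)
    return [(r, s) for r in outer for s in index.get(key_outer(r), [])]


def index_nlj_cost(pages_r, tuples_r, n_internal, n_leaf, n_lookup):
    """|R| + ||R|| * (N_internal + N_leaf + N_lookup)."""
    return pages_r + tuples_r * (n_internal + n_leaf + n_lookup)


outer = [(1, "a"), (2, "b"), (3, "c")]
inner = [(2, "x"), (3, "y"), (3, "z")]
pairs = index_nlj(outer, inner, lambda r: r[0], lambda s: s[0])
cost = index_nlj_cost(100, 1000, 2, 1, 1)
```

The cost call uses made-up numbers purely to show how the terms combine: 100 pages to scan $R$, plus 4 page accesses for each of its 1000 tuples.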

== Sort-Merge Join
- $R join S = union.big_(i in J) (R_i times S_i), "where" J = {i | R_i != emptyset, S_i != emptyset}$
  - $X_i subset.eq X$ is the partition of $X$ in which all records have join attribute value $i$
- *Cost*: Sort $R$ + Sort $S$ + merging cost
  - Sorting cost: $0$ if already sorted, otherwise the cost of internal sorting
  - To minimise merging cost, maximise $|B_"inner"|$
    - $S$ to be inner relation if $|"Max"P_S| <= |"Max"P_R|$
    - $"Max"P_X$: largest matching $X$-partition
  - If $|"Max"P_S| <= B-2$, cost: $|R| + |S|$
    - else $|S| + ceil((|S|)/(B-2)) |R|$
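
The partition view of the join, $R_i times S_i$ for every value $i$ present on both sides, can be sketched in memory as follows. This is an illustrative assumption-laden version: `sorted` stands in for the sorting phase, and `partitions` and `sort_merge_join` are hypothetical helper names.

```python
from itertools import groupby, product


def partitions(rel, key):
    """Sort rel on the join key and group it into (value, records) partitions."""
    return [(k, list(g)) for k, g in groupby(sorted(rel, key=key), key=key)]


def sort_merge_join(r, s, key_r, key_s):
    """Merge two sorted partition lists, emitting R_i x S_i for shared values i."""
    r_parts, s_parts = partitions(r, key_r), partitions(s, key_s)
    out, i, j = [], 0, 0
    while i < len(r_parts) and j < len(s_parts):
        (rk, r_i), (sk, s_i) = r_parts[i], s_parts[j]
        if rk < sk:
            i += 1                      # no S-partition for this value
        elif rk > sk:
            j += 1                      # no R-partition for this value
        else:
            out.extend(product(r_i, s_i))
            i += 1
            j += 1
    return out


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = sort_merge_join(r, s, lambda t: t[0], lambda t: t[0])
```

Value 2 contributes a $2 times 2$ cross product and value 4 a single pair, while values 1 and 3 appear on only one side and are skipped.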
== Optimized SMJ
- $S$ to be inner relation if $|"Max"P_S| <= |"Max"P_R|$
- Find $i$ and $j$ such that $B > N(R, i) + N(S, j)$
  - $N(X, 0) = ceil((|X|) / B)$ and $N(X, k) = ceil((N(X, k-1))/(B-1))$, i.e. the number of sorted runs of $X$ after $k$ merge passes
- *Cost*: $2|R|(i+1) + 2|S|(j+1) + |R| + |S|$
  - Partial sort of $R$ + partial sort of $S$ + merge and join
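
A rough sketch of how $i$ and $j$ might be chosen. The notes only state the condition $B > N(R, i) + N(S, j)$; the exhaustive search for the cheapest pair below is an assumption for illustration, as are the names `runs_after` and `optimized_smj_cost`.

```python
import math


def runs_after(pages, buffer_pages, k):
    """N(X, k): sorted runs left after creating initial runs and k merge passes."""
    n = math.ceil(pages / buffer_pages)
    for _ in range(k):
        n = math.ceil(n / (buffer_pages - 1))
    return n


def optimized_smj_cost(pages_r, pages_s, buffer_pages, max_passes=32):
    """Return (cost, i, j) for the cheapest i, j with B > N(R, i) + N(S, j)."""
    if buffer_pages < 3:
        raise ValueError("need at least 3 buffer pages")
    best = None
    for i in range(max_passes):
        for j in range(max_passes):
            runs = (runs_after(pages_r, buffer_pages, i)
                    + runs_after(pages_s, buffer_pages, j))
            if buffer_pages > runs:
                cost = (2 * pages_r * (i + 1) + 2 * pages_s * (j + 1)
                        + pages_r + pages_s)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
    if best is None:
        raise ValueError("no feasible (i, j) within max_passes")
    return best


plenty = optimized_smj_cost(1000, 500, 50)
tight = optimized_smj_cost(1000, 500, 10)
```

With 50 buffer pages the initial runs (20 and 10) already fit, so $i = j = 0$. With only 10 buffer pages, $R$ needs two merge passes and $S$ one before the remaining runs (2 and 6) can be merged and joined together.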

== Grace Hash Join
- Partition $R$: $R_1, ..., R_k$; partition $S$: $S_1, ..., S_k$
- Read $R_i$ to build a hash table (build relation)
- Read $S_i$ to probe the hash table (probe relation)
- $R_i$ overflows if its hash table is larger than the memory pages allocated
  - Recursively partition $R_i$ and $S_i$
- *To avoid overflow*
  - Pick the smaller operand $R$ as the build relation $(|R| <= |S|)$
  - Partitioning: maximise the number of build partitions to minimise their size
    - 1 page to read build $R$
    - $B-1$ pages to output $B-1$ partitions (1 page each)
  - Probing: maximise the memory allocated for the hash table
    - 1 page to read probe $S_i$
    - 1 page to output $S_i join R_i$
    - $B-2$ pages for $R_i$'s hash table
  - $B > sqrt(f times |R|)$: buffer size needed to avoid overflow ($f$ is the hash table's fudge factor)
- *Cost*: $2(|R| + |S|) + (|R| + |S|) = 3(|R| + |S|)$
  - 2 for partitioning (read and write every page), 1 for probing (read every page)
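
The partition, build and probe phases can be sketched in memory as below. This is a simplified illustration, not the lecture's implementation: Python lists replace on-disk partitions, the built-in `hash` stands in for the partitioning hash function, recursive partitioning on overflow is omitted, and the default fudge factor of 1.2 is an arbitrary assumed value. All function names are hypothetical.

```python
import math
from collections import defaultdict


def grace_hash_join(r, s, key_r, key_s, num_partitions):
    """Partition both inputs, then build on R_i and probe with S_i."""
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for t in r:                                   # partitioning phase
        r_parts[hash(key_r(t)) % num_partitions].append(t)
    for t in s:
        s_parts[hash(key_s(t)) % num_partitions].append(t)
    out = []
    for p, r_part in r_parts.items():             # probing phase
        table = defaultdict(list)                 # hash table on R_i
        for t in r_part:
            table[key_r(t)].append(t)
        for t in s_parts.get(p, []):              # probe with S_i
            for m in table.get(key_s(t), []):
                out.append((m, t))
    return out


def grace_hash_join_cost(pages_r, pages_s):
    """3(|R| + |S|): read + write to partition, then read to probe."""
    return 3 * (pages_r + pages_s)


def avoids_overflow(pages_r, buffer_pages, fudge=1.2):
    """Check the approximate condition B > sqrt(f * |R|)."""
    return buffer_pages > math.sqrt(fudge * pages_r)


r = [(1, "a"), (2, "b"), (5, "c")]
s = [(2, "x"), (5, "y"), (5, "z"), (7, "w")]
joined = sorted(grace_hash_join(r, s, lambda t: t[0], lambda t: t[0], 3))
```

The output order depends on how keys fall into partitions, so the example sorts the result before inspecting it.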