cs3223: lecture 6

2025-10-14 12:05:50 +08:00
parent d41b75d722
commit 1676745e7b
2 changed files with 69 additions and 0 deletions

Binary file not shown.

@@ -296,3 +296,72 @@ Cost of index scan = $nin + #nle + nlo$
- $N_"bucket"$: max no of index's primary/overflow pages accessed
- $nlo= nso + min{||sigma_p_c (R)||, |R|}$ if I is not covering index for $sigma_p (R)$
]}
= Join Algorithms
Considerations when choosing a join algorithm
- Types of join predicates (Equality / inequality)
- Sizes of join operands
- Allocated memory pages
- Available Access Methods
- *Notation*: $R join_(A) S$
- $R$ is outer relation, $S$ is inner relation
- Two families: Nested Loop Join (NLJ) and partition-based joins
== Tuple-based NLJ
For each page of $R$ and each tuple $r$ in that page, scan every page of $S$ and check each tuple $s$ for a match with $r$
- *Cost*: $|R| + ||R|| times |S|$ - Read $S$ for each tuple in $R$
- *Optimised* (page-based): for each page of $R$, read each page of $S$, then compare every pair of tuples from the two pages
- *Cost*: $|R| + |R| times |S|$ - Read $S$ for each page in $R$
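The two cost formulas above can be compared numerically. This is a minimal sketch; the function names and the relation sizes are made up for illustration.

```python
def tuple_nlj_cost(pages_r: int, tuples_r: int, pages_s: int) -> int:
    """|R| + ||R|| * |S|: scan S once for every tuple of R."""
    return pages_r + tuples_r * pages_s


def page_nlj_cost(pages_r: int, pages_s: int) -> int:
    """|R| + |R| * |S|: scan S once for every page of R."""
    return pages_r + pages_r * pages_s


# Hypothetical sizes: |R| = 1000 pages, ||R|| = 100000 tuples, |S| = 500 pages.
print(tuple_nlj_cost(1000, 100_000, 500))  # 50001000
print(page_nlj_cost(1000, 500))            # 501000
```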
== Main Memory NLJ
Assuming $|S| < |R|$, minimise I/O by computing $R join S$ with the smaller operand $S$ as the inner relation, held entirely in memory
- Min pages needed: $B = |S| + 2$
- 1 for $B_"outer"$, 1 for $B_"join"$, rest for $B_"inner"$ to read $S$
- *Cost*: $|R| + |S|$
== Block NLJ
- $R join S = union.big^k_(i=1) (R_i join S)$, $k = ceil((|R|)/B_"outer")$
- To minimise I/O cost, minimise $|R|$ or maximise $B_"outer"$
- Choose smaller table as outer, ($R "if" |R| < |S|$)
- IO Cost: $|R| + ceil((|R|)/(B-2)) times |S|$
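A small sketch of the Block NLJ cost formula, useful for checking that the smaller table should be the outer relation. The function name and the sizes are hypothetical; $B_"outer" = B - 2$ is assumed, as in the formula above.

```python
import math


def block_nlj_cost(pages_outer: int, pages_inner: int, buffer_pages: int) -> int:
    """|R| + ceil(|R| / (B - 2)) * |S| with R as the outer relation."""
    if buffer_pages < 3:
        raise ValueError("need at least 3 buffer pages (outer, inner, output)")
    outer_chunks = math.ceil(pages_outer / (buffer_pages - 2))
    return pages_outer + outer_chunks * pages_inner


# Hypothetical sizes: 1000-page and 500-page tables, B = 102.
print(block_nlj_cost(1000, 500, 102))  # larger table as outer: 6000
print(block_nlj_cost(500, 1000, 102))  # smaller table as outer: 5500
```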
== Index NLJ
- Inner relation $S$: the table with an index on the join attribute
- Cost: $|R| + ||R|| times (N_"internal" + N_"leaf" + N_"lookup")$
- Scan $R$ + search $S$'s index once for each tuple in $R$
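The access pattern can be sketched in memory. Here a Python `dict` stands in for the index on $S$ (an assumption for illustration; a real system would probe an existing B+-tree or hash index), and `index_nlj` is a hypothetical name.

```python
from collections import defaultdict


def index_nlj(r, s, key_r=0, key_s=0):
    """Join tuples of r with tuples of s by probing an index on s."""
    # Stand-in for a pre-built index on S's join attribute.
    index = defaultdict(list)
    for u in s:
        index[u[key_s]].append(u)
    # Scan R once; one index lookup per tuple of R.
    return [t + u for t in r for u in index.get(t[key_r], [])]


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (2, "y"), (4, "z")]
print(index_nlj(r, s))  # [(2, 'b', 2, 'x'), (2, 'b', 2, 'y')]
```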
== Sort-Merge Join
- $R join S = union.big_(i in J) (R_i times S_i), "where" J = {i | R_i != emptyset, S_i != emptyset}$
- $X_i subset.eq X$ is partition of $X$ where all records have join attribute value $i$
- *Cost*: Sort $R$ + Sort $S$ + Merging cost
- Sorting cost: $0$ if already sorted on the join attribute, otherwise the cost of an external merge sort
- To minimise merging cost, maximise $|B_"inner"|$
- $S$ to be inner relation if $|"Max"P_S|<= |"Max"P_R|$
- $"Max"P_X$: largest matching $X$-partition
- If $|"Max"P_S|<= B-2$, Cost: $|R| + |S|$
- else $|S| + ceil((|S|)/(B-2)) |R|$
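The merge step can be sketched over in-memory lists that are already sorted on the join attribute. For each matching value $i$ it emits $R_i times S_i$, as in the union above. `merge_join` and the sample data are illustrative only.

```python
def merge_join(r, s, key_r=0, key_s=0):
    """Merge-join two lists of tuples that are sorted on their join attribute."""
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the end of the matching partition on each side.
            i_end = i
            while i_end < len(r) and r[i_end][key_r] == a:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == a:
                j_end += 1
            # Emit the cross product of the two partitions.
            for t in r[i:i_end]:
                for u in s[j:j_end]:
                    out.append(t + u)
            i, j = i_end, j_end
    return out


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(r, s))
```

With this data the result has five tuples: four from the partition with value 2 and one from the partition with value 4.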
== Optimized SMJ
- $S$ to be inner relation if $|"Max"P_S|<= |"Max"P_R|$
- Find $i$ & $j$, $B > N(R, i) + N(S, j)$
- $N(X, 0) = ceil((|X|) / B)$ and $N(X, k) = ceil((N(X, k-1))/(B-1))$
- *Cost*: $2|R|(i+1) + 2|S|(j+1) + |R| + |S|$
- Partial sort R + Partial sort S + merge & join
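A sketch of how $i$ and $j$ might be chosen: try every pair of merge-pass counts, keep those satisfying $B > N(R, i) + N(S, j)$, and take the cheapest. The exhaustive search and the function names are assumptions for illustration, not a prescribed procedure.

```python
import math


def runs_after(pages: int, buffer_pages: int, k: int) -> int:
    """N(X, k): sorted runs left after pass 0 and k merge passes."""
    runs = math.ceil(pages / buffer_pages)            # N(X, 0)
    for _ in range(k):
        runs = math.ceil(runs / (buffer_pages - 1))   # N(X, k) from N(X, k-1)
    return runs


def passes_to_one_run(pages: int, buffer_pages: int) -> int:
    """Smallest k with N(X, k) <= 1."""
    k = 0
    while runs_after(pages, buffer_pages, k) > 1:
        k += 1
    return k


def optimized_smj_cost(pages_r: int, pages_s: int, buffer_pages: int):
    """Return (cost, i, j) minimising 2|R|(i+1) + 2|S|(j+1) + |R| + |S|."""
    if buffer_pages < 3:
        raise ValueError("need at least 3 buffer pages")
    best = None
    for i in range(passes_to_one_run(pages_r, buffer_pages) + 1):
        for j in range(passes_to_one_run(pages_s, buffer_pages) + 1):
            runs = runs_after(pages_r, buffer_pages, i) + runs_after(pages_s, buffer_pages, j)
            if runs < buffer_pages:  # all remaining runs can be merged at once
                cost = 2 * pages_r * (i + 1) + 2 * pages_s * (j + 1) + pages_r + pages_s
                if best is None or cost < best[0]:
                    best = (cost, i, j)
    return best


# Hypothetical sizes: |R| = 1000, |S| = 500.
print(optimized_smj_cost(1000, 500, 102))  # (4500, 0, 0)
print(optimized_smj_cost(1000, 500, 10))   # (9500, 2, 1)
```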
== Grace Hash Join
- Partition $R$: $R_1, ..., R_k$, Partition $S$: $S_1, ..., S_k$
- Read $R_i$ to build hash table (Build relation)
- Read $S_i$ to probe hash table (Probe relation)
- $R_i$ overflows if its hash table is larger than the memory pages allocated
- Recursively partition $R_i$ and $S_i$
- *To avoid overflow*
- Pick smaller operand $R$ as build relation $(|R| <= |S|)$
- Partitioning: maximise the number of build partitions to minimise their size
- 1 page to read build relation $R$
- 1 output page for each of the $B-1$ partitions
- Probing: Max memory allocated for hash table
- 1 page to read probe $S_i$
- 1 page to output $S_i join R_i$
- $B-2$ pages for $R_i$'s hash table
- $B > sqrt(f times |R|)$: approximate buffer size needed to avoid overflow, assuming uniform partitioning ($f$: fudge factor for hash table size)
- *Cost*: $2(|R| + |S|) + (|R| + |S|) = 3(|R| + |S|)$
- 2 for partitioning, 1 for probing
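The two phases can be sketched in memory. This is a minimal illustration assuming no partition overflows: Python lists stand in for on-disk partitions, `hash(...) % k` stands in for the partitioning hash function, and a `dict` stands in for the per-partition hash table. `grace_hash_join` is a hypothetical name.

```python
from collections import defaultdict


def grace_hash_join(r, s, k, key_r=0, key_s=0):
    """Join r (build) with s (probe) using k partitions."""
    # Partitioning phase: the same hash function splits both relations,
    # so matching tuples always land in partitions with the same number.
    r_parts = [[] for _ in range(k)]
    s_parts = [[] for _ in range(k)]
    for t in r:
        r_parts[hash(t[key_r]) % k].append(t)
    for u in s:
        s_parts[hash(u[key_s]) % k].append(u)

    # Probing phase: build a hash table on R_i, then probe it with S_i.
    out = []
    for r_i, s_i in zip(r_parts, s_parts):
        table = defaultdict(list)
        for t in r_i:
            table[t[key_r]].append(t)
        for u in s_i:
            for t in table.get(u[key_s], []):
                out.append(t + u)
    return out


r = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]
s = [(2, "x"), (3, "y"), (5, "z")]
print(sorted(grace_hash_join(r, s, k=2)))
# [(2, 'b', 2, 'x'), (2, 'd', 2, 'x'), (3, 'c', 3, 'y')]
```

The result is sorted before printing because output order depends on which partition each key hashes to.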