diff --git a/cs3223/cheatsheet.pdf b/cs3223/cheatsheet.pdf
index c1e0afe..fe45048 100644
Binary files a/cs3223/cheatsheet.pdf and b/cs3223/cheatsheet.pdf differ
diff --git a/cs3223/cheatsheet.typ b/cs3223/cheatsheet.typ
index 9e5d74a..71c1259 100644
--- a/cs3223/cheatsheet.typ
+++ b/cs3223/cheatsheet.typ
@@ -296,3 +296,72 @@ Cost of index scan = $nin + #nle + nlo$
 - $N_"bucket"$: max no of index's primary/overflow pages accessed
 - $nlo= nso + min{||sigma_p_c (R)||, |R|}$ if I is not covering index for $sigma_p (R)$
 ]}
+
+= Join Algorithms
+Considerations when choosing a join algorithm
+- Types of join predicates (equality / inequality)
+- Sizes of join operands
+- Allocated memory pages
+- Available access methods
+- *Notation*: $R join_(A) S$
+  - $R$ is outer relation, $S$ is inner relation
+- Nested Loop Join (NLJ) and partition-based joins
+== Tuple-based NLJ
+For each page of $R$ and each tuple $r$ in it, scan every page of $S$ and check each tuple $s$ against $r$
+- *Cost*: $|R| + ||R|| times |S|$ - Read $S$ for each tuple in $R$
+- *Optimised*: Page-based; for each page of $R$, scan every page of $S$, then compare the tuples of the two pages
+  - *Cost*: $|R| + |R| times |S|$ - Read $S$ for each page in $R$
+
+== Main Memory NLJ
+Assuming $|S| < |R|$, for optimal IO, compute $R join S$ with the smaller operand as inner relation
+- Min pages needed: $B = |S| + 2$
+- 1 for $B_"outer"$, 1 for $B_"join"$, rest for $B_"inner"$ to read $S$
+- *Cost*: $|R| + |S|$
+
+== Block NLJ
+- $R join S = union.big^k_(i=1) (R_i join S)$, $k = ceil((|R|)/B_"outer")$
+- To min IO cost, we min $|R|$ or max $B_"outer"$
+- Choose smaller table as outer ($R "if" |R| < |S|$)
+- IO Cost: $|R| + ceil((|R|)/(B-2)) times |S|$
+
+== Index NLJ
+- Inner relation: table with an index on the join column
+- Cost: $|R| + ||R|| times (N_"internal" + N_"leaf" + N_"lookup")$
+  - Scan $R$ + search $S$'s index for each tuple in $R$
+
+== Sort-Merge Join
+- $R join S = union.big_(i in J) (R_i times S_i), "where" J = {i | R_i != emptyset, S_i != emptyset}$
+  - $X_i subset.eq X$ is partition of $X$ where all records have join attribute value $i$
+- *Cost*: Sort $R$ + Sort $S$ + Merging cost
+  - Sorting cost: $0$ if sorted, or internal sorting
+  - Min merging cost: max $|B_"inner"|$
+  - $S$ to be inner relation if $|"Max"P_S|<= |"Max"P_R|$
+  - $"Max"P_X$: largest matching $X$-partition
+  - If $|"Max"P_S|<= B-2$, Cost: $|R| + |S|$
+  - else $|S| + ceil((|S|)/(B-2)) |R|$
+== Optimized SMJ
+- $S$ to be inner relation if $|"Max"P_S|<= |"Max"P_R|$
+- Find $i$ & $j$, $B > N(R, i) + N(S, j)$
+  - $N(X, 0) = ceil((|X|) / B)$ and $N(X, k) = ceil((N(X, k-1))/(B-1))$
+- *Cost*: $2|R|(i+1) + 2|S|(j+1) + |R| + |S|$
+  - Partial sort $R$ + Partial sort $S$ + merge & join
+
+== Grace Hash Join
+- Partition $R$: $R_1, ..., R_k$, Partition $S$: $S_1, ..., S_k$
+- Read $R_i$ to build hash table (Build relation)
+- Read $S_i$ to probe hash table (Probe relation)
+- $R_i$ overflows if its hash table is larger than the memory pages allocated
+  - Recursively partition $R_i$ and $S_i$
+- *To avoid overflow*
+- Pick smaller operand $R$ as build relation $(|R| <= |S|)$
+- Partitioning: Max build partitions to min size
+  - 1 page to read build $R$
+  - 1 output page for each of the $B-1$ partitions
+- Probing: Max memory allocated for hash table
+  - 1 page to read probe $S_i$
+  - 1 page to output $S_i join R_i$
+  - $B-2$ pages for $R_i$'s hash table
+- $B > sqrt(f times |R|)$: buffer size needed to avoid overflow
+- *Cost*: $2(|R| + |S|) + (|R| + |S|) = 3(|R| + |S|)$
+  - 2 for partitioning, 1 for probing
+
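
Review note: the I/O cost formulas added in this hunk can be sanity-checked with a small Python sketch. This is only an illustration, not part of the cheatsheet. The function names and the sample sizes (100 pages and 1000 tuples for R, 500 pages for S, 12 buffer pages) are made up for the example. It assumes the join output is not counted in the cost and that the Grace hash join never overflows.

```python
import math


def tuple_nlj_cost(r_pages, r_tuples, s_pages):
    # |R| + ||R|| * |S|: scan S once for every tuple of R.
    return r_pages + r_tuples * s_pages


def page_nlj_cost(r_pages, s_pages):
    # |R| + |R| * |S|: scan S once for every page of R.
    return r_pages + r_pages * s_pages


def block_nlj_cost(r_pages, s_pages, buffers):
    # |R| + ceil(|R| / (B - 2)) * |S|: scan S once per block of B - 2 outer pages.
    return r_pages + math.ceil(r_pages / (buffers - 2)) * s_pages


def sorted_runs(pages, buffers, merge_passes):
    # N(X, 0) = ceil(|X| / B); N(X, k) = ceil(N(X, k - 1) / (B - 1)).
    runs = math.ceil(pages / buffers)
    for _ in range(merge_passes):
        runs = math.ceil(runs / (buffers - 1))
    return runs


def optimized_smj_cost(r_pages, s_pages, i, j):
    # 2|R|(i + 1) + 2|S|(j + 1) for the partial sorts, plus |R| + |S| to merge and join.
    return 2 * r_pages * (i + 1) + 2 * s_pages * (j + 1) + r_pages + s_pages


def grace_hash_join_cost(r_pages, s_pages):
    # 2(|R| + |S|) for partitioning plus (|R| + |S|) for probing, assuming no overflow.
    return 3 * (r_pages + s_pages)


# Hypothetical sizes chosen only for illustration.
R_PAGES, R_TUPLES, S_PAGES, B = 100, 1000, 500, 12

print(tuple_nlj_cost(R_PAGES, R_TUPLES, S_PAGES))
print(page_nlj_cost(R_PAGES, S_PAGES))
print(block_nlj_cost(R_PAGES, S_PAGES, B))
# i = 1, j = 1 satisfies B > N(R, i) + N(S, j) for these sizes.
print(sorted_runs(R_PAGES, B, 1) + sorted_runs(S_PAGES, B, 1) < B)
print(optimized_smj_cost(R_PAGES, S_PAGES, 1, 1))
print(grace_hash_join_cost(R_PAGES, S_PAGES))
```

With these sample sizes, the page-based loop is roughly ten times cheaper than the tuple-based one, and the block variant is cheaper again by about the number of outer buffer pages, which matches the ordering the notes imply.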