Delete from B+ tree

  • Find correct leaf L
  • Remove data entry from L
  • If L has at least d entries remaining, we’re done (assuming the convention that each leaf holds between d and 2d entries)
  • Else, L has exactly d − 1 entries
    • First, check if we can borrow from adjacent siblings
    • If we can’t, merge two siblings, resulting in a node with at most 2d − 1 entries
    • Update parent node accordingly
    • This can happen recursively; if so, we continue borrowing/merging up the tree
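The steps above can be sketched in code. This is a minimal illustration, not a full B+ tree: a single root level over leaf lists, with `D` standing in for the minimum occupancy d; all names are illustrative.

```python
# Hedged sketch: deletion in a one-level B+ "tree" (a root over leaf lists).
# Assumed convention: each leaf holds between D and 2*D entries.

D = 2  # minimum entries per leaf (the "d" in the text)

def delete(leaves, keys, key):
    """leaves: list of sorted lists; keys: separator keys in the parent."""
    # Find correct leaf L (linear search stands in for a tree descent).
    i = next(j for j, k in enumerate(keys + [float("inf")]) if key < k)
    leaf = leaves[i]
    leaf.remove(key)                      # remove data entry from L
    if len(leaf) >= D:                    # still at least d entries: done
        return
    # Try to borrow from an adjacent sibling that has more than d entries.
    if i > 0 and len(leaves[i - 1]) > D:
        leaf.insert(0, leaves[i - 1].pop())
        keys[i - 1] = leaf[0]             # update parent separator
    elif i + 1 < len(leaves) and len(leaves[i + 1]) > D:
        leaf.append(leaves[i + 1].pop(0))
        keys[i] = leaves[i + 1][0]
    else:
        # Merge with a sibling: resulting node has at most 2d - 1 entries.
        j = i - 1 if i > 0 else i + 1
        lo, hi = min(i, j), max(i, j)
        leaves[lo].extend(leaves.pop(hi))
        keys.pop(lo)                      # update parent accordingly
```

For example, deleting 5 from leaves `[[1,2],[5,6],[9,10]]` (keys `[5,9]`) underflows the middle leaf; both siblings hold exactly D entries, so the leaf merges left, giving `[[1,2,6],[9,10]]` with keys `[9]`.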

Cost of using B+ tree

In the I/O model,

  • If index is in memory, search costs 0
  • Otherwise,
    • One node is typically one page, reading/writing costs 1, searching inside the node is 0 once in memory
    • If one node is not one page, we count the number of pages read/written per node
    • Nodes that cannot be fully loaded into memory are impractical

Updates

  • Best/Average case - updating only one node, so the cost is 1
  • Worst case - (read both siblings + merge with sibling) × (depth − 1)
    • There exist ‘bad’ sequences of updates where the worst-case delete and insert happen frequently, but these are rare in practice

Execution engine

flowchart TD

user(App/User)
compiler[Query compiler/optimizer]
engine[Execution engine]
index[Index/Record manager]
buffer[Buffer manager]
transaction[Transaction Manager]
storage[(Storage)]

user -- Transaction commands --> transaction
user -- Query / DML / DDL --> compiler
compiler -- Query execution plan --> engine
engine -- Record, Index requests --> index
index -- Page commands --> buffer
buffer -- Read/write pages --> storage

transaction --> engine
transaction --> buffer

Input of the execution engine is a query plan:

  • Logical tree
  • Implementation choice of each node
  • Scheduling, memory

Each operation is executed, and its output is passed on to the next operator in the tree
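The "output is passed on" idea can be sketched with the classic iterator model: each plan node pulls tuples from its child. A minimal sketch using Python generators; the operators and data are illustrative.

```python
# Hedged sketch: each plan node is a generator pulling from its child.

def scan(table):                  # leaf node: produce the stored tuples
    yield from table

def select(child, pred):          # filter node
    return (t for t in child if pred(t))

def project(child, cols):         # projection node
    return (tuple(t[c] for c in cols) for t in child)

# A small plan: project column 0 of rows whose column 1 exceeds 10.
table = [("a", 5), ("b", 20), ("c", 15)]
plan = project(select(scan(table), lambda t: t[1] > 10), [0])
print(list(plan))                 # -> [('b',), ('c',)]
```

Because each node is lazy, tuples flow through the whole plan one at a time, which is exactly what makes pipelined execution possible.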

Relational algebra

Relational algebra is a formal way of creating relations from other relations. There are two types of operators - unary and binary. We can combine operators by using a formula (defined recursively), which is in turn equivalent to a logical tree.

  • There are 5 basic operators:
    • Union: ∪
    • Difference: −
    • Selection: σ
    • Projection: π
    • Cartesian product: ×
  • And derived or auxiliary operators:
    • Intersection: ∩
    • Joins (natural, equi-join, theta join, …): ⋈
    • Renaming: ρ

In addition to that, there are two ways of treating relations formally - bag and set semantics. Bag semantics treat relations as multisets, so they preserve duplicates. Set semantics treat relations as sets, so they automatically eliminate any duplicates.

Union, difference and intersection

Selection and Projection

  • Selection σ_c(R) - return only rows of R that fulfill the condition c
    • c is the condition for selection, which is composed of atomic formulas with comparisons (=, ≠, <, ≤, >, ≥) and logical expressions over such formulas using ∧, ∨, ¬
    • e.g. σ_{price > 100}(R)
    • Bag and set semantics are the same
  • Projection π_A(R) - return only columns A of R
    • A is a subset of columns of R
    • e.g. π_{price}(R)
    • Set semantics may lead to fewer tuples in the result

Cartesian product and joins

  • Cartesian product R × S - return all pairs of tuples in relations R and S
  • Theta join R ⋈_θ S = σ_θ(R × S)
  • Equi-join - a theta join where θ is an atomic formula in the form of R.A = S.B, e.g. R ⋈_{R.id = S.id} S
    • Natural join R ⋈ S - a theta join where θ equates all attributes shared by R and S (each kept once in the output), e.g. R(A, B) ⋈ S(B, C) matches on B

Renaming

  • Renaming attributes, e.g. ρ_{A→B}(R) renames all attributes A to B
  • Renaming relations, e.g. ρ_S(R) renames relation R to S

Operations on bags also lead to another useful operator

  • Distinct (duplicate elimination) - δ(R)
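The basic operators can be sketched over lists of tuples. This is an illustrative in-memory model, not an execution strategy: lists give bag semantics, and `distinct` recovers set semantics.

```python
# Hedged sketch: relational-algebra operators over lists of tuples (bags).

def select(r, pred):          # sigma: keep rows satisfying the condition
    return [t for t in r if pred(t)]

def project(r, cols):         # pi: keep the listed columns (duplicates kept)
    return [tuple(t[c] for c in cols) for t in r]

def union(r, s):              # bag union: concatenation
    return r + s

def difference(r, s):         # bag difference: remove one copy per match
    out, pool = [], list(s)
    for t in r:
        if t in pool:
            pool.remove(t)
        else:
            out.append(t)
    return out

def product(r, s):            # cartesian product: all pairs
    return [t + u for t in r for u in s]

def distinct(r):              # delta: duplicate elimination (set semantics)
    seen, out = set(), []
    for t in r:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

R = [(1, "a"), (2, "b"), (2, "b")]
print(project(R, [0]))            # bag projection: [(1,), (2,), (2,)]
print(distinct(project(R, [0])))  # set projection: [(1,), (2,)]
```

The last two lines show the bag/set distinction for projection: only `distinct` removes the duplicate `(2,)`.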

Executing operations

General scheme

For the relational algebra operators, for each implementation:

  • We state the necessary assumptions
    • Data is saved on external memory
    • Indices
    • Memory size
    • Data ordering and/or distribution
  • Then we estimate the cost of the operation in the I/O model
    • Answers are usually accurate only up to a small constant due to data organization
  • We usually ignore the cost of writing the output

Notations

  • B(R) - size of R in pages (blocks)
  • T(R) - number of records in R
  • V(R, A) - number of distinct values in the column A
  • R, S - relations
  • O - output
  • I - index

Distinct

  • Cost of the following is B(R), the cost of the full scan
    • If memory size is large enough, hold a search structure in memory (e.g. hash table); for each tuple in the relation add it to the structure and output it only if it wasn’t seen before, writing to disk when the output page is full.
    • Otherwise, if data is sorted, read the relation and write only the first copy of each tuple.
  • Otherwise, one can always sort the data first; the cost is then the cost of sorting the data
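The sorted case above is the simplest to sketch: with sorted input, duplicate elimination needs only the previous tuple, not a whole hash table. A minimal sketch; names are illustrative.

```python
# Hedged sketch: distinct over an already-sorted stream of tuples.
# One scan, constant memory: only the previously output tuple is kept.

def distinct_sorted(tuples):
    prev = object()            # sentinel unequal to any real tuple
    for t in tuples:           # single pass: cost is one full scan
        if t != prev:          # equal runs are adjacent in sorted order
            yield t
        prev = t

print(list(distinct_sorted([1, 1, 2, 2, 2, 3])))  # -> [1, 2, 3]
```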

For some operations, we can compute DISTINCT on-the-fly

Union, difference, intersection

  • Bag union - cost of reading two relations, B(R) + B(S)
  • Set union - cost of reading two relations + distinct
  • Set difference R − S
    • If memory is large enough, keep S in a search structure, then proceed as in set union
    • Otherwise, if data is sorted, read both relations in parallel to determine which tuples should be ignored
  • Set intersection - an exercise…

Selection

Without an index, the cost of selection is clearly B(R). The cost of searching the index is

  • 0 if index is in memory
  • 1 for a hash index
  • 2-4 read for a B+ tree index

If the index is clustered, all relevant records are adjacent, so we can read them in B(O). If the index is not clustered, it usually is a B+ tree; data entries are sorted, so we can read the relevant records in up to T(O) page reads if each output tuple is in a different page. An improvement is to sort the matching record ids by page and never read the same page twice, which gives us min(T(O), B(R)).

Example

Execute σ_{price > c}(R)

  • 50% of records fulfill the selection condition

  • If there is no index on the price column and the records are unordered, the cost is B(R)

  • If there is a sorted clustered index I on price, and the size of the lowest level of I is 100, the cost is then index search + B(R)/2

  • If the same index is not clustered, the cost is index search + entry reading (50) + up to T(R)/2 record reads

    • Using such an index is not helpful in the worst case

Reduction factor

The size of the output largely affects the cost of operations; the implementation choice can greatly benefit from knowing, or at least estimating, this size. The DBMS keeps statistics about the values of columns/indices. Assuming the values of some column A are uniformly distributed, for σ_{A=c}(R) the reduction factor is RF = 1/V(R, A), and similarly for other equality conditions. What can we do about conditions of the form A = B? We will see this in equi-joins. For σ_{A>c}(R), we assume min(A), max(A) are known, then RF = (max(A) − c) / (max(A) − min(A))

The reduction factor is determined by the operation and the statistics available. The same factor is used to compute the relevant fraction of relation pages, relation tuples and data entries in the index.
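The two estimates above are simple enough to compute directly. A minimal sketch under the stated uniformity assumption; the numbers in the usage lines are illustrative.

```python
# Hedged sketch: reduction-factor estimates assuming uniformly
# distributed column values. V = V(R, A); lo/hi = known min/max of A.

def rf_equality(V):
    """sigma_{A = c}: RF = 1 / V(R, A)."""
    return 1.0 / V

def rf_range_gt(c, lo, hi):
    """sigma_{A > c}: RF = (max(A) - c) / (max(A) - min(A))."""
    return (hi - c) / (hi - lo)

# Estimated matching pages of a 1000-page relation with V(R, A) = 50:
print(1000 * rf_equality(50))           # -> 20.0
# ... and with a range predicate A > 75 on values spanning [0, 100]:
print(1000 * rf_range_gt(75, 0, 100))   # -> 250.0
```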

Projection

  • Bag projection - simply the cost of the full scan, B(R)
  • Set projection - cost of projection + distinct
    • If using a search structure, keep only the remaining fields; the cost of projection is then included in that of distinct

Using an index

  • If the index contains all projected attributes and is dense - no need to read records at all, cost is the size of the index in pages, B(I) < B(R)

Executing joins

Implementing cartesian product

A naive implementation is a nested loop: for each record in R, read all records in S and write all pairs; the cost is then B(R) + T(R) · B(S). It is clear then that the outside loop should run over the smaller relation; here we assumed R is smaller. A nested loop join is then identical, except that only pairs that match the condition are written to the output. A key observation here is that we don’t use most of the available main memory, which leads to the following idea.

Block nested loop join

Instead of reading one page at a time, we can read M − 2 pages of R at once (where M is the number of available memory pages) and then compare each page of S to these pages. The cost then becomes B(R) + ⌈B(R)/(M − 2)⌉ · B(S). A key observation here is that the size of a join is usually much smaller than that of a cartesian product; maybe we can find relevant pairs faster? How?
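A minimal sketch of the block nested loop idea: here "pages" are just tuples in lists, and `block_size` stands in for the M − 2 memory pages; the data is illustrative.

```python
# Hedged sketch: block nested loop join. For each block of R (at most
# block_size items, standing in for M - 2 pages), make one pass over S.

def block_nested_loop_join(R, S, match, block_size):
    out = []
    for i in range(0, len(R), block_size):   # read a block of R
        block = R[i:i + block_size]
        for s in S:                          # one scan of S per block
            for r in block:                  # compare s against the block
                if match(r, s):
                    out.append(r + s)
    return out

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y")]
print(block_nested_loop_join(R, S, lambda r, s: r[0] == s[0], 2))
# -> [(2, 'b', 2, 'x'), (3, 'c', 3, 'y')]
```

With `block_size = 1` this degenerates to the plain nested loop join, which is exactly why larger blocks reduce the number of passes over S.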

Index nested loop join

Use an index to fetch only relevant pairs. Assume there is an index I on a (a column in S) and we want to compute R ⋈_{R.b = S.a} S. For each tuple of R, we find the matching tuples of S using the index, read them and write the pairs. The cost is then reading R + one index search per tuple of R + reading the matches. If the index is clustered, the cost is

B(R) + T(R) · (search + B(S)/V(S, a))

If the index is not clustered, the cost is

B(R) + T(R) · (search + T(S)/V(S, a))

Possible improvements are reading pages of S per page of R instead of “per tuple”, and a block index nested loop join
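A minimal sketch of the probing pattern: an in-memory dict on S's join column stands in for the index I, so the index search cost is hidden, but the per-tuple-of-R probe structure is the same.

```python
# Hedged sketch: index nested loop equi-join, with a dict as the "index".

from collections import defaultdict

def index_nested_loop_join(R, S, r_col, s_col):
    index = defaultdict(list)        # stand-in for the index I on S[s_col]
    for s in S:
        index[s[s_col]].append(s)
    out = []
    for r in R:                      # one index probe per tuple of R
        for s in index.get(r[r_col], []):
            out.append(r + s)        # write matching pairs
    return out

R = [(1, "a"), (2, "b")]
S = [(2, "x"), (2, "y")]
print(index_nested_loop_join(R, S, 0, 0))
# -> [(2, 'b', 2, 'x'), (2, 'b', 2, 'y')]
```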

Sort-merge join

If R and S are sorted by the relevant columns, finding matches for an equi-join is easy, similar to the merge in external merge sort. Execution of R ⋈_{R.a = S.b} S:

  • Read a page from R and from S
  • Output all the matches
  • If the last R tuple is smaller than the last S tuple, read a page of R, otherwise read a page of S
  • Continue until reaching the end of R or S

Cost of a sort-merge join is then the cost of sorting + the cost of the merge. We assume the cost of the merge to be B(R) + B(S); the total cost is then (2k + 1)(B(R) + B(S)), where k is the number of passes in the sort, which is usually 2. We can pass the outputs of the last sorting pass directly to the merging, saving one read + write of R and S, depending on available memory. This is called pipelining.
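The merge step can be sketched tuple-by-tuple (pages are elided). A minimal sketch assuming both inputs are sorted on the join column, here index 0, and that each R tuple has relatively few matches.

```python
# Hedged sketch: the merge step of a sort-merge equi-join on column 0.
# Both R and S are assumed already sorted on that column.

def merge_join(R, S):
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][0] < S[j][0]:
            i += 1                   # advance the side with the smaller key
        elif R[i][0] > S[j][0]:
            j += 1
        else:
            k = j                    # output all S matches for R[i]
            while k < len(S) and S[k][0] == R[i][0]:
                out.append(R[i] + S[k])
                k += 1
            i += 1                   # next R tuple may match the same group
    return out

R = sorted([(2, "b"), (1, "a")])
S = sorted([(2, "y"), (2, "x")])
print(merge_join(R, S))  # -> [(2, 'b', 2, 'x'), (2, 'b', 2, 'y')]
```

Note that the output comes out sorted on the join column, which is one of the advantages listed for sort-merge joins below.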

Hash join

First, partition both R and S using a hash function h. Clearly, tuples in R_i can only match tuples in S_i. Here, in order to be able to do this in one pass, meaning that each page is read only once, there can be no more than M − 1 partitions. Then, for each partition i, for each page in S_i write the matching pairs. This step assumes B(R_i) ≤ M − 2, so that R_i fits in memory. If this is not fulfilled, partition larger partitions again! The cost of a hash join (assuming one partition step) is just 3(B(R) + B(S))

The only limitation is that B(R) ≤ (M − 1)(M − 2) ≈ M², which is usually the case, as just 1MB of memory (256 4KB pages) is enough to hash about 256MB of data in one partitioning pass, and far more with recursive partitioning

| Algorithm | Assumptions | Advantages |
| --- | --- | --- |
| Nested loop join | No assumptions | Naive; efficient if one relation is very small or the result is close to the Cartesian product |
| Block nested loop join | No assumptions | Better than nested loop join |
| Index nested loop join | Index on one relation | Usually better than nested loop if the index is clustered or there are few matches per tuple |
| Sort-merge join | Each tuple has relatively few matches | Very efficient if data is sorted; result is sorted; usually more efficient than loops (linear) |
| Hash join | Data can be split into small enough partitions | Usually the most efficient (linear) |