Overview of the relational algebra and SQL

Relational Algebra and SQL

Relational Algebra is an algebra on tables and consists of the following operators

  • Union $\cup$ , Intersection $\cap$ , Difference $-$ ,
  • Selection $\sigma$ , Projection $\Pi$ ,
  • (Natural) Join $\bowtie$ .

Note that the algebra is closed under the above operators, i.e. the output is always a table.

The relational algebra equipped with the following operators is called extended relational algebra.

  • Duplicate Elimination $d$ ,
  • Grouping and Aggregation $g$ ,
  • Sorting $t$ .

Union, Intersection and Difference

R1 =  
AB
a1b1
a2b1
     R2 =  
AB
a1b1
a3b4
     R1 ∪ R2 =  
AB
a1b1
a2b1
a1b1
a3b4
     R1 ∩ R2 =  
AB
a1b1
     R1 − R2 =  
AB
a2b1
  • SELECT * FROM R1 UNION ALL SELECT * FROM R2; : the union R1 $\cup$ R2
  • SELECT * FROM R1 INTERSECT SELECT * FROM R2; : the intersection R1 $\cap$ R2
  • SELECT * FROM R1 EXCEPT SELECT * FROM R2; : the difference R1 $-$ R2

Claim: R1 $\cap$ R2 = R1 $-$ (R1 $-$ R2). The RHS is obtained by

SELECT * FROM R1 EXCEPT SELECT * FROM (SELECT * FROM R1 EXCEPT SELECT * FROM R2);

Selection and Projection

S =  
idnamescore
100John70
101Smith50
102Fred90
     σscore≥60(S) =  
idnamescore
100John70
102Fred90
     ∏name,score(S) =  
namescore
John70
Smith50
Fred90

Selection

Pick rows satisfying a certain condition $c$. The result is written as $\sigma_c(R)$. Here $c$ is a map sending a row into a boolean value.

SELECT * FROM S WHERE score >= 60;

We use WHERE c for the selection $\sigma_c$ , not SELECT.

  • = is used for an equation. Not ==.
  • The condition score >= 50 AND score <= 70 can be described simpler with the BETWEEN operator.

    SELECT * FROM S WHERE score BETWEEN 50 AND 70;
    
  • For a condition $c$ we can use LIKE operator to find a pattern of a strings.

    SELECT name FROM S WHERE name LIKE '%h_';
    

    Here % means one or more characters and _ means a single character.

  • Instead of LIKE we can use the regular expression with REGEXP operator.

    SELECT name FROM S WHERE name REGEXP '^[aeiouAEIOU]';
    
  • For a condition $c$ we can use IN operator to specify multiple values in a column.

    SELECT * FROM S WHERE score IN ('50','90');
    
  • To obtain the first n rows in a table, use LIMIT.

    SELECT * FROM S LIMIT 50;
    
  • Use HAVING instead of WHERE when using an aggregate function.

    SELECT * FROM S 
    

Projection

Restrict a table to the specified columns, say A1, ... , An. The result is written as $\Pi$A1,...,An(R).

SELECT name,score from S;

We use SELECT name,score for the projection $\Pi$name,score.

Join and Cross Product

S =  
idnamescore
100John70
101Smith50
102Fred90
    S2 =  
namehight
Fred165.0
John181.0
Smith172.0
    S × S2 =  
idnamescorenamehight
100John70Fred165.0
100John70John181.0
100John70Smith172.0
101Smith50Fred165.0
101Smith50John181.0
101Smith50Smith172.0
102Fred90Fred165.0
102Fred90John181.0
102Fred90Smith172.0

The cross product R1$\times$R2 (a.k.a. the cross join) gives the all possible pair of a row of R1 and a row of R2. The cross proect is obtained by SELECT * FROM S,S2;.

The equi-join R1 $\bowtie$A=B R2 is defined by $\sigma$A=B(R1$\times$R2). Therefore S $\bowtie$ S.name=S2.name S2 is the following table

idnamescorenamehight
100John70John181.0
101Smith50Smith172.0
102Fred90Fred165.0

and we obtain it by SELECT * FROM S,S2 WHERE S.name = S2.name; by definition, or

SELECT * FROM S JOIN S2 ON S.name = S2.name;

The theta-join R1 $\bowtie_\theta$ R2 is an easy extension of the equi-join and defined by $\sigma_\theta$(R1$\times$R2).

Outer joins: left outer join, right outer join, full outer join

Duplicate Elimination and Sort

S =  
idnamescore
100John70
101Smith50
102Fred90
103Anna90
104Maria50
    d∏score(S) =  
score
70
50
90
    tscore(S) =  
idnamescore
101Smith50
104Maria50
100John70
102Fred90
103Anna90

Use DISTINCT to eliminate duplicates of rows. The second table is obtained by

SELECT DISTINCT score FROM S;

Use ORDER BY to sort the rows by one or more columns. The third table is obtained by

SELECT * FROM S ORDER BY score;

Note that the default order is the ascending order. Use the DESC keyword to sort the rows in a descending order. For example, SELECT * FROM S ORDER BY score DESC, id DESC; sorts the rows by the columns score and id in descending order .

Groupby

T =  
classnamescore
aFred55
aMaria99
bJohn60
bJulia82
cBritta50
cSmith50
    T1 =  
classaverage
a77.0
b71.0
c50
    T2 =  
class
a
b

The GROUP BY statement divides a table by a certain column. Applying it together with an aggregate function, we obtain a "summary" table.

Table T1 is the table of the averages of scores by class and obtained by

SELECT class, AVG(score) AS average FROM T GROUP BY class;

Table T2 is the table of the classes in which there is a person whose score > 60. It is obtained by

SELECT class FROM T GROUP BY class HAVING MAX(score) > 60;

Note the HAVING keyword.

Aggregate functions

  • AGV() gives the mean value.
  • COUNT() gives the number of rows. A typical usage is COUNT(*).
  • MAX()/MIN() gives the maximum/minimum value in the column.
  • SUM() gives the sum of the values in the column.
  • POWER(,) is used instead of **.
  • CEIL()/FLOOR()
  • REPLACE(col,'a','A') : replace substring with a new one

Some examples:

  • SELECT *, MAX(score) AS max FROM T GROUP BY class; finds the top score by class.
  • SELECT class, COUNT(*) AS count FROM T GROUP BY class; counts the number of rows in each class.

Comparison between RA, SQL and R (with dplyr)

RASQLR/dplyr
unionUNIONrbind()
intersectionINTERSECTintersect()
differenceEXCEPTanti_join()
selectionWHEREfilter()
projectionSELECTselect()
joinJOINmerge()
duplicate eliminationDISTINCTdistinct()
sortingORDER BYarrange()
groupingGROUP BYgroup_by()

Notes for R

  • The sqldf package allows us to work with SQL commands on R. Manual, Tutorial 1, Tutorial 2.
  • The plyr package provides join() function. Manual.
  • merge(df,dg,by=NULL) gives a cross join.

Modifying a Table

  • Add a new row

    INSERT INTO Table VALUES (val1,val2,val3,...);
    INSERT INTO Table (col1,col2,col3,...) VALUES (val1,val2,val3,...);
    
  • Modify an exsisting row

    UPDATE Table SET col1=val1,col2=val2,... WHERE coln=val;
    
  • Delete a row

    DELETE FROM Table WHERE col=val;
    

Managing Tables

  • Create a table

    CREATE TABLE Table (A TEXT, B TEXT);
    
  • Creates a virtual table.

    CREATE VIEW NewTable AS SELECT * FROM Table WHERE score>60;
    

    A virtual table always gives up-to-date data.

  • Deletes a table

    DROP TABLE Table;
    

Alter TABLE statement

  • Tutorial1, Tutorial2.
  • According to SQLite FAQ: (11) How do I add or delete columns from an existing table in SQLite.

    SQLite has limited ALTER TABLE support that you can use to add a column to the end of a table or to change the name of a table. If you want to make more complex changes in the structure of a table, you will have to recreate the table. You can save existing data to a temporary table, drop the old table, create the new table, then copy the data back in from the temporary table.

Datatype (schema)

  • SQLite: NULL, INTEGER, REAL, TEXT, BLOB.
  • MariaDB, MySQL, Overview.
    • Integer: TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT
    • Text: TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT
    • Blob: TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB
    • Floating-Point: FLOAT, DOUBLE
    • Fixed-Point Types: DECIMAL( , )
    • Bit Value Types: BIT(N)
    • Date and Time Types: DATETIME, DATE, TIMESTAMP
    • CHAR and VARCHAR Types: CHAR, VARCHAR
    • BINARY and VARBINARY Types: BINARY, VARBINARY

Show the information of a table

  • SQLite: pragma table_info(Table);

    sqlite> pragma table_info(S);
    cid         name        type        notnull     dflt_value  pk        
    ----------  ----------  ----------  ----------  ----------  ----------
    0           id          int         0                       0         
    1           name        text        0                       0         
    2           score       int         0                       0 
    

    .schema shows the commands which are used to create all tables.

  • MySQL/MariaDB: DESCRIBE Table. Manual. This is equivalent to SHOW COLUMNS FROM Table;. More details including character encoding can be obtained by SHOW FULL COLUMNS FROM Table;. About UTF-8 on MySQL (German).

Window functions

A fundamental syntax is:

<window function> OVER (
    [PARTITION BY <expression list>]
    [ORDER BY <expression [ASC|DESC] list>]
    [ROWS|RANGE <window frame>]
)
  • The usage of PARTITION BY is the same as one of GROUP BY.
  • When using ORDER BY, the window is going to be

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    

    Assume we use PARTITION BY with ORDER BY. To make the range the whole of the corresponding group, use the following range

    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    

    Such a frame could be needed when we use a "order sensitive" function such as first_value().

  • OVER () means all rows
  • We can create an alias of a window by WINDOW AS w (ORDER BY col).

    SELECT col, LAG(col,1) OVER w FROM generate_series(1,15,2) as T(col) WINDOW w AS (ORDER BY col);
    

Window functions

  • Every aggregate function (including a user defined aggregate function) can be used as a window function.
  • The build-in aggregate functions / window functions of PostgreSQL.
  • row_number() gives the row index to each row.
  • Some functions requires ORDER BY <exp> instead of an expression.
    • rank() / dense_rank() : give the rank according to <exp>
    • percent_rank() percenteil function
    • cume_dist() gives the cumulative distribution.

In the following example, a table q contains only one column x with values 4, 2, 5, 1, 9, 4, 0, 6.

select x,
    row_number() over (),
    rank() over w,
    dense_rank() over w,
    percent_rank() over w,
    cume_dist() over w
from q window w as (order by x);

The above statement creates the following table. Note that the order of rows are changed because of the window w.

 x | row_number | rank | dense_rank |   percent_rank    | cume_dist 
---+------------+------+------------+-------------------+-----------
 0 |          1 |    1 |          1 |                 0 |     0.125
 1 |          2 |    2 |          2 | 0.142857142857143 |      0.25
 2 |          3 |    3 |          3 | 0.285714285714286 |     0.375
 4 |          4 |    4 |          4 | 0.428571428571429 |     0.625
 4 |          5 |    4 |          4 | 0.428571428571429 |     0.625
 5 |          6 |    6 |          5 | 0.714285714285714 |      0.75
 6 |          7 |    7 |          6 | 0.857142857142857 |     0.875
 9 |          8 |    8 |          7 |                 1 |         1
Share this page on