Relational Algebra



Relational Algebra is the mathematical basis for the query language SQL


  • Introduction.

    • So now you have learn how to design good conceptual models to store information with the ER-model
    • And you also know how to turn an ER-model into a Relational model
    • Let us jump the next step - which is defining the Relational Database and entering the data into the database - and look at data manipulation
    • What can we do with the data ????




  • Relational Algebra: Overview
    • is a mathematical language for manipulating relations (ala mathematical definition of "relation" - which is a subset of some cartesian product)
    • contains:
      • set operators
      • relational database specific operators
      • set functions
    • is used to specify a result set that satisfies a certain "selection criteria"
    • the result of an operation in relational algebra is always a relation




  • Set operations
    • You should all know what set union is:

      • Suppose A = {i, j, k} and B = {k, x, y}
      • Then the union of A and B is {i, j, k, x, y}
      • Note that tuples in sets A and B must have the same FORMAT (same number and type of attributes) to be used in the union operation.
    • You should also have learned what set intersection is:
      • Suppose A = {i, j, k} and B = {k, x, y}
      • Then the intersection of A and B is {k}
      • Note that tuples in sets A and B must have the same FORMAT (same number and type of attributes) to be used in the intersection operation.
    • You should also have learned what set difference is:
      • Suppose A = {i, j, k} and B = {k, x, y}
      • Then the difference of A and B is {i, j}
      • Note that tuples in sets A and B must have the same FORMAT (same number and type of attributes) to be used in the set difference operation.
    • I have already discussed the cartesian productclick here




  • Set Functions
    • Operates on a set of values and produce a single value
    • To illustrate the various set function, we apply the set functions on this set of value:

      A = {1, 4, 9}

      Results:

      • sum(A) = sum of all values of A (answer = 14)
      • avg(A) = average of all values in A (answer= 4.6666)
      • max(A) = max of all values in A (answer = 9)
      • min(A) = min of all values in A (answer = 1)
      • any(A) = test if set A has at least one element (answer = TRUE) (Note: result FALSE means set A is empty)
      • count(A) = cardinality of set A (answer = 3 (= number of elements in set))


    Relational database specific operations



  • σ (Condition) (R)         (selection)
    • Selects tuples from a relation R that satisfies the condition Condition
    • The Condition expression contains constants and/or attributes of relation R (the condition expression cannot be a set !!!)
    • The result of σ (Condition) (R) is a subset of tuples of R that satisfies the boolean condition Condition
    • Examples:
      • Retrieve all employee tuples working for department number 4:

        σ (dno = 4) (employee) 
      • The following diagram shows the result graphically (hopefully, it will illustrates the concept unambiguously):

      • Retrieve all employees earning more than $30,000:
        σ (salary > 30000) (employee) 

      • Retrieve all employees working for department number 4 and eaning more than $30,000:
        σ (dno = 4 and salary > 30000) (employee) 

        OR:

        σ (salary > 30000) ( σ (dno = 4) (employee) )




  • π (attribute-list)         (projection)
    • Selects out only the attribute values attribute-list from all tuples in relation R
    • The attribute-list contains the list of attributes in relation R that will be selected.
    • The result of π (attribute-list) (R) contains the is a subset of tuples of R that satisfies the boolean condition Condition
    • Examples:
      • Retrieve all SSN from the employees

        π ssn (employee) 
      • The following diagram shows the result graphically (hopefully, it will illustrates the concept unambiguously):

      • Retrieve the sex information from all employees:
        π sex (employee) 

        The following diagram shows the result graphically (hopefully, it will illustrates the concept of projection unambiguously):

        Notes:

        • The output of a Relational Algebra query is set.

          set in Mathematics does not have duplicate elements

          Therefore, the result of πsex (employee) is the set {F, M} because duplicates are removed


        • In SQL, the user will have the option to remove duplicates

          That is because removing duplicates require longer time to process....

          (The user has the option to incur the cost or not...)





  • Combining information from 2 relations
    • Consider the following query:

      Find FName and LName of all employees in the "Research" department
    • Observation:
      • We will need to combine information in the employee and department relations to answer the query.
    • But we must be careful in combining the information:
         Employee:
      +-----------+--------+---------+-----------+-----+
      | ssn | fname | lname | bdate | dno | .... (other columns omitted)
      +-----------+--------+---------+-----------+-----+
      | 123456789 | John | Smith | 09-JAN-55 | 5 |
      | 333445555 | Frankl | Wong | 08-DEC-45 | 5 |
      | 999887777 | Alicia | Zelaya | 19-JUL-58 | 4 |
      | 987654321 | Jennif | Wallace | 20-JUN-31 | 4 |
      | 666884444 | Ramesh | Narayan | 15-SEP-52 | 5 |
      | 453453453 | Joyce | English | 31-JUL-62 | 5 |
      | 987987987 | Ahmad | Jabbar | 29-MAR-59 | 4 |
      | 888665555 | James | Borg | 10-NOV-27 | 1 |
      +-----------+--------+---------+-----------+-----+ Department:
      +----------------+---------+-----------+--------------+
      | dname | dnumber | mgrssn | mgrstartdate |
      +----------------+---------+-----------+--------------+
      | Research | 5 | 333445555 | 22-MAY-78 |
      | Administration | 4 | 987654321 | 01-JAN-85 |
      | Headquarters | 1 | 888665555 | 19-JUN-71 |
      +----------------+---------+-----------+--------------+

      How to "combine" tuples from Employee and Department:

      • Only combine a tuple from employee with a tuple from department when their department numbers are EQUAL


        (Remember we used the department number in Employee as a foreign key !!!

        The  department number in Employee "refers" to one tuple in Department


    • Solution:
      1. First find the department number of the "Research" department:

        R1 = π dnumber ( σ (dname = 'Research') ( department ) )

        Graphically:

      2. Then we combine the department and employee through a "mindless" cartesian product operation:
        R2 = R1 x employee

        The result of this operation is "tagging" an extra column "dnumber" to the employee relation:

      3. Now we can use a select operation to select out the "right tuples" :

        Because dnumber is now an attribute of the (new) relation R2 !!!

        π(FName, LName) (σ (dno = dnumber) ( R2 ) )
      4. Graphically:





  • Efficiency consideration of queries
    • Correctness is the most important property of a query

      The second most important property of a query is efficiency


    • There maybe multiple queries that retrieve the correct set of tuples (answer)

      The difference of the queries may be their execution speed


    • Consider the following solution:
      π(FName, LName) (σ (dno = dnumber and dname = 'Research') ( employee x department ) )
    • Although this query is correct , it will take longer time to process because of the larger amount of intermediate data that the query produces:
    • The cartesian product employee x department will produce huge number of tuples - many more than in the previous solution.




  • Join operations
    • The join operation (bow tie symbol) is equivalent to:


    • Example:
        Retrieve all employees from the "Research" department
      • slower (but correct) solution:

        It is slower because the join operation will result in more tuples as intermediate results


      • The faster version is:




  • The "theta join" operation
    • The condition in the join operation is of the form:

         Condition

      1

       and Condition

      2

       and ....
         
    • Each condition is of the form "x θ y", where x and y are attributes or constants
    • And θ is one of the following relational operators:
      • ==
      • !=
      • <
      • <=
      • >=
      • >
    • A join operation with a "general" join condition (where θ can be any relational operation), is called a theta join




  • The "equi-join" operation
    • Often (almost always :-)), we use the equal relational operation in the join operation:

           Condition

      1

       and Condition

      2

       and ....
      

      where each condition is of the form:

          x == y
      

      where x and y are attributes or constants


    • Equi-join:
      • join operation where the the join condition contain only equality compare operators is called a equi-join




  • The "natural join" (*) operation
    • The natural join operation is an equi-join operation follow by a projection operation that removes one of the duplicate attributes.
    • Since the equi-join will for tuples when the attribute values are equal, all tuples that produced by an equi-join will have the same attribute value under the join condition attributes.
    • The natural join operation is denoted by a star (*):
          R1 * attr1,attr2 R2
      


  • Outer-join
    • Outer-joins:

      • An outer join does not require each record in the two joined tables to have a matching values.
      • The rusulting joined table will retains records from one or both tables even if no other matching value exists.

    • The 3 flavors of outer-joins:
      • Left outer-join: A ⟕ B

        • The result of the left outer join (or simply left joinA ⟕ B always contains all records of the "left" table (A)even if the join-condition does not find any matching record in the "right" table (B).


        • unmatched tuple from A is processed "normally" (discuss in the join operation)
        • An  matched tuple from A is combined with a row of NULL value (so that the resulting tuple have the same columns as an matched tuple)

        • Example:
               A:                 B:
          
                  X     Y            U     V
          +-----+-----+ +-----+-----+
          | 1 | 7 | | 1 | 8 |
          +-----+-----+ +-----+-----+
          | 2 | 5 | | 2 | 3 |
          +-----+-----+ +-----+-----+
          | 3 | 4 | | 4 | 9 |
          +-----+-----+ +-----+-----+ A ⟕A.X=B.U B: X Y U V
          +-----+-----+-----+-----+
          | 1 | 7 | 1 | 8 |
          +-----+-----+-----+-----+
          | 2 | 5 | 2 | 3 |
          +-----+-----+-----+-----+
          | 3 | 4 | NULL| NULL|
          +-----+-----+-----+-----+

      • Right outer-join: A ⟖ B
        • The result of the right outer join (or simply right joinA ⟖ B always contains all records of the "right" table (B)even if the join-condition does not find any matching record in the "left" table (A).


        • matched tuple from B is processed "normally" (discuss in the join operation)
        • An  unmatched tuple from B is combined with a row of NULL value (so that the resulting tuple have the same columns as an matched tuple)

        • Example:
               A:                 B:
          
                  X     Y            U     V
          +-----+-----+ +-----+-----+
          | 1 | 7 | | 1 | 8 |
          +-----+-----+ +-----+-----+
          | 2 | 5 | | 2 | 3 |
          +-----+-----+ +-----+-----+
          | 3 | 4 | | 4 | 9 |
          +-----+-----+ +-----+-----+ A ⟖A.X=B.U B: X Y U V
          +-----+-----+-----+-----+
          | 1 | 7 | 1 | 8 |
          +-----+-----+-----+-----+
          | 2 | 5 | 2 | 3 |
          +-----+-----+-----+-----+
          | NULL| NULL| 4 | 9 |
          +-----+-----+-----+-----+

      • Full outer-join: A ⟗ B
        • The result of the full outer join (or simply outer joinA ⟗ B always contains all records of both tableseven if the join-condition does not find any matching record...


        • matched tuple is processed "normally" (discuss in the join operation)
        • An  unmatched tuple in either table is combined with a row of NULL value (so that the resulting tuple have the same columns as an matched tuple)

        • Example:
               A:                 B:
          
                  X     Y            U     V
          +-----+-----+ +-----+-----+
          | 1 | 7 | | 1 | 8 |
          +-----+-----+ +-----+-----+
          | 2 | 5 | | 2 | 3 |
          +-----+-----+ +-----+-----+
          | 3 | 4 | | 4 | 9 |
          +-----+-----+ +-----+-----+ A ⟗A.X=B.U B: X Y U V
          +-----+-----+-----+-----+
          | 1 | 7 | 1 | 8 |
          +-----+-----+-----+-----+
          | 2 | 5 | 2 | 3 |
          +-----+-----+-----+-----+
          | 3 | 4 | NULL| NULL|
          +-----+-----+-----+-----+
          | NULL| NULL| 4 | 9 |
          +-----+-----+-----+-----+

      Note:

      • If attribute names are the same in both relations, you can omit the join condition in outer joins
      • In that case, you will also remove one column of the same name from the resulting table

      • Example:
             A:                 B:
        
                X     Y            X     V
        +-----+-----+ +-----+-----+
        | 1 | 7 | | 1 | 8 |
        +-----+-----+ +-----+-----+
        | 2 | 5 | | 2 | 3 |
        +-----+-----+ +-----+-----+
        | 3 | 2 | | 4 | 9 |
        +-----+-----+ +-----+-----+ A ⟗ B: X Y V
        +-----+-----+-----+
        | 1 | 7 | 8 |
        +-----+-----+-----+
        | 2 | 5 | 3 |
        +-----+-----+-----+
        | 3 | 2 | NULL|
        +-----+-----+-----+
        | 4 | NULL| 9 |
        +-----+-----+-----+

    • Teaching notes:
      • I personally dislike NULL values....


      • Therefore, I will avoid using outer joins in my solutions....




  • More examples....
    1. Find fname and lname of all employees working in the "Research" department that earn more than $50,000


    2. Find fname and lname of John Smith's supervisor

    3. Find fname and lname of all employees that have dependents

    4. Find fname and lname of all employees that do not have dependents

      This solution is wrong:

      The following diagram shows how the query works on an example database, which illustrates why the query is wrong:


      The correct solution is:

      • Help1 = set of SSN of employees with dependents
      • Help2 = set of SSN of employees without any dependents

    5. Find fname and lname of employees who has more than one dependents of the same sex (i.e., 2 or more boys or 2 or more girls):
  • NOTE: so join conditions do not always have to be "equal" !!!





  • A comment...
    • Clearly, I can easily formulate a gad-sillion different queries in the next day or so, so:

      DO NOT

          prepare for tests with

      memorizing

        facts - you gotta understand what's going on and do the queries "from scratch"....
    • And for those students that have never had me before: I have a reputation to challenge student's grey cells to work overtime.... so you should expect some very interesting queries.... and plenty of opportunity to practice your skill (in the homeworks and SQL projects).
  • http://www.mathcs.emory.edu/~cheung/Courses/377/Syllabus/4-RelAlg/intro.html

Relational Algebra 关系代数的更多相关文章

  1. Advanced SQL: Relational division in jOOQ

              i   Rate This Relational algebra has its treats. One of the most academic features is the ...

  2. 转:如何学习SQL(第二部分:从关系角度理解SQL)

    转自:http://blog.163.com/mig3719@126/blog/static/285720652010950825538/ 6. 从关系角度理解SQL 6.1. 关系和表 众所周知,我 ...

  3. A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets(中英双语)

    文章标题 A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets 且谈Apache Spark的API三剑客:RDD.Dat ...

  4. 数据库系统概论(2)——Chap. 2 关系数据库基础

    数据库系统概论(2)--Chap.2 关系数据库基础 一.关系数据结构及形式化定义 1.关系 关系模型的数据结构只包含单一的数据结构--关系.在关系模型中,现实世界的实体及实体间的各种联系均用单一的结 ...

  5. SQL:百科

    ylbtech-SQL:百科 结构化查询语言(Structured Query Language)简称SQL,是一种特殊目的的编程语言,是一种数据库查询和程序设计语言,用于存取数据以及查询.更新和管理 ...

  6. [源码分析] 带你梳理 Flink SQL / Table API内部执行流程

    [源码分析] 带你梳理 Flink SQL / Table API内部执行流程 目录 [源码分析] 带你梳理 Flink SQL / Table API内部执行流程 0x00 摘要 0x01 Apac ...

  7. Python数据科学手册-Pandas:合并数据集

    将不同的数据源进行合并 , 类似数据库 join merge . 工具函数 concat / append pd.concat() 简易合并 合并高维数据 默认按行合并. axis=0 ,试试 axi ...

  8. 数据库精华知识点总结(1)—数据库的三层模式和二级映像,E-R(实体联系图)图,关系模型

    Data base: 长期存储在计算机内,有组织的,可共享的大量数据集合.基本特征:永久存储,可共享,有一定的物理和逻辑结构. Data base manage system(DBMS):用户和os之 ...

  9. 10分钟学习pandas

    10 Minutes to pandas This is a short introduction to pandas, geared mainly for new users. You can se ...

随机推荐

  1. ABAP ODATA接口开发

    1.SE37 创建好 Remote-Enabled函数. 2.SEGW,新建项目 创建项目之后,展开,右键Data Model,Import 函数. 导入函数之后,创建mapping 3./IWFND ...

  2. +Java中的native关键字浅析(Java+Native+Interface)++

    JNI是Java Native Interface的 缩写.从Java 1.1开始,Java Native Interface (JNI)标准成为java平台的一部分,它允许Java代码和其他语言写的 ...

  3. Cmake的介绍和使用 Cmake实践【转】

    本文转载自:http://www.cppblog.com/Roger/archive/2011/11/17/160368.html Cmake的介绍和使用 Cmake实践 Cmake优点: 1.    ...

  4. Ubuntu系统下实现Android工程调用独立编译的C++程序和GMP、PBC库

    目的: 实现使用C++编写代码供Android工程调用.C++代码中可以使用STL库,也可以使用常用的由源码编译生成的库,如PBC.因为PBC是基于GMP库的,所以这里只记录了GMP和PBC库的编译安 ...

  5. 配置RabbitMQ远程访问

    本文参考自:http://flashing.iteye.com/blog/1797531 1.如果远程客户端网络状况不是太好,比如adsl什么的,那么一定在客户端打开requstedHeartbeat ...

  6. 【SDOI 2011】染色

    [题目链接] 点击打开链接 [算法] 树链剖分 [代码] 本题,笔者求最近公共祖先并没有用树链剖分“往上跳”的方式,而是用倍增法.笔者认为这样比较好写,代码可读性 比较高 此外,笔者的线段树并没有用懒 ...

  7. PHP命名空间namespace使用小结

    1.介绍一下php的命名空间 什么是命名空间?从广义上来说,命名空间是一种封装事物的方法.在很多地方都可以见到这种抽象概念.例如,在操作系统中目录用来将相关文件分组,对于目录中的文件来说,它就扮演了命 ...

  8. bzoj 1296: [SCOI2009]粉刷匠【dp+背包dp】

    参考:http://hzwer.com/3099.html 神题神题 其实只要知道思路就有点都不难-- 先对每一行dp,设g[i][j]为这行前i个格子粉刷了k次最大粉刷正确数,随便n^3一下就行 设 ...

  9. 【插件开发】—— 10 JFace开发详解

    前文回顾: 1 插件学习篇 2 简单的建立插件工程以及模型文件分析 3 利用扩展点,开发透视图 4 SWT编程须知 5 SWT简单控件的使用与布局搭配 6 SWT复杂空间与布局搭配 7 SWT布局详解 ...

  10. Fastjson详解

    Fastjson是一个Java语言编写的高性能功能完善的JSON库.它采用一种"假定有序快速匹配"的算法,把JSON Parse的性能提升到极致,是目前Java语言中最快的JSON ...