From Database to Big Data

Serge Miranda

Course content: This MBDS course on big data management is self-contained and organized into seven weekly modules along with nine complementary short seminars (summarizing prerequisites or presenting some applications and extensions):

  • C1 “Spiralist innovation on Big Data systems”: This module is a strategic, multidisciplinary introduction to big data systems. It defines key concepts (data, big data, machine learning, data lake, deep learning, etc.) and the disruptive supporting technologies that will be useful during this course. We illustrate spiralist ICT innovation around three major dimensions of our data-centric future. Three complementary short seminars are given: a big data use case (MBDS Big Bridge project), Convolutional Neural Nets, and Blockchain 2.0 concepts.

  • C2 “Data paradigms and Codd’s relational data model”: There is a plethora of big data management systems. In the first part of the course, we propose a classification of these systems using data paradigms, which we illustrate with SQL standards (TIPS/ACID, RICE and WHAT properties). The second part covers a key prerequisite: Codd’s relational data model, which provides a formal unifying foundation and reference for big data management systems involving STRUCTURED data and explains the success of the SQL standard (due to Codd’s theorem). Two complementary seminars are proposed: one on the relational schema design method by Codd and Date using RM-T ENTITIES as (SURROGATE, VALUE), and the other on the major database access methods (dynamic hashing and B-trees) used in every big data system.

  • C3 “SQL2 introduction”: This prerequisite course presents the SQL2 standard (including the transaction concept with Gray’s theorem) for structured relational databases. SQL will be the Esperanto of big data management, with a Matryoshka-layered approach starting with SQL2 for structured data. One complementary seminar is proposed on data warehouses, OLAP, the CUBE operator and its associated ALL value.

  • C4 “Date’s Third Manifesto (underlying object-relational data models)”: Date’s manifesto is to hybrid object-oriented databases what Codd’s model is to SQL2: a neutral reference model. We clarify the concept of an object as an (OID, VALUE) pair, which will be useful for N.O.SQL systems based upon (KEY, VALUE) pairs. Two complementary short seminars cover the second manifesto (Stonebraker’s) and the DCOM object middleware by Microsoft.

  • C5 “Introduction to ODMG”: Object-oriented data models based upon Bancilhon’s manifesto were initially designed for object programmers who wanted database access. ODMG is a database extension of OMG (Object Management Group) proposed on top of the Java, C++ and Smalltalk languages.

  • C6 “Introduction to SQL3”: SQL3 is the fusion of Date’s and Stonebraker’s manifestos. Its salient features are presented and discussed in this module, with a focus on row pointers (ROWID) and their consequences for data definition and manipulation (REF-type attributes containing a ROWID, and the dereferencing operator on REF-type attributes).

  • C7 “Overview of N.O.SQL and NEW SQL”: In this module we introduce N.O.SQL systems built around both the (KEY, VALUE) paradigm (Hadoop, BLOB, JSON document, columns) and GRAPHS, show how they are merging into NEW SQL systems, and identify the expected functionalities of an upcoming BIG SQL standard. One complementary research-oriented seminar covers formal unifying theories underlying SQL and N.O.SQL systems, with the promising approach of category theory both for multi-model data systems and for polystore data-lake access and analysis.

Note: These short seminars are attached to some BD2 courses to define complementary concepts or prerequisites useful for the whole spectrum of big data management and to meet the objective of a self-contained graduate course.
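Illustration (not part of the course material): the C3 seminar mentions the CUBE operator and its associated ALL value. The sketch below shows one way to read that idea in plain Python: a measure is summed over every subset of the grouping dimensions, and the placeholder ALL marks a dimension that has been aggregated away. The function name `cube`, the `sales` rows and their column names are hypothetical and chosen only for this example; a real system would use the SQL `GROUP BY CUBE` construct instead.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder meaning "aggregated over every value of this dimension"


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`, in the spirit of GROUP BY CUBE."""
    totals = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                # Dimensions outside `kept` are rolled up and shown as ALL.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


# Hypothetical fact rows, invented for illustration only.
sales = [
    {"city": "Nice", "year": 2023, "amount": 10},
    {"city": "Nice", "year": 2024, "amount": 20},
    {"city": "Paris", "year": 2023, "amount": 5},
]

result = cube(sales, ["city", "year"], "amount")
for key in sorted(result, key=str):
    print(key, result[key])
```

With two dimensions the result holds four groupings: (city, year), (city, ALL), (ALL, year) and the grand total (ALL, ALL).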

