Mastering SQL for Data Science: A Comprehensive Guide Day-1
Mastering SQL for Data Science: A Comprehensive Guide Day-1
Ø Data: The piece of any information is called data. E.g. Name of any person or books or place. The data is only data until it is organized in a meaningful way at which point it will become information.
Ø Information: Collection of different types of data is called information. E.g. Details of ranker in the class.
Ø Table: Table is a collection of different number of rows and columns.
Ø Database: A database can be defined as a collection of coherent and meaningful data.
Ø DBMS: (Database Management System)
A Database Management System is essentially a collection of interrelated data and a set of programs to access this data.
Ø RDBMS: (Relational Database Management System)
RDBMS is acronym for Relational Database Management System. Dr. Edger F. Codd first introduces the Relational Database Model in 1970. It allows data to be represented in a simple row-column format. Each data field is considered as a column and each record is considered as a row of a table. RDBMS stores data in to the form of related tables. RDBMS are powerful because they require few assumptions about how data is related or how it will extracted from the
database.
Ø DBMS V/S RDBMS:
DBMS:
Ø In DBMS relationship between two tables or files are maintained programmatically.
Ø DBMS does not support Client/Server Architecture.
Ø It is also not supported to Distributed Databases.
Ø In DBMS there is not any type of security of data.
Ø Each table is given an extension in DBMS is .dbf
Ø DBMS may satisfy less than 7 to 8 rules of Dr. E. F. Codd.
RDBMS:
ü In RDBMS relationship between two tables or files can be specified at the time of table creation.
ü Most of the RDBMS supports Client/Server Architecture.
ü Most of the RDBMS supports Distributed Database.
ü In RDBMS there are multiple levels of security.
1. Logging at O/S level
2. Command Level (RDBMS level)
3. Object Level
ü Many tables are grouped in one database in RDBMS.
ü RDBMS usually satisfy more than 7 to 8 rules of Dr. E. F. Codd.
Ø Introduction of SQL:
Oracle’s Query language has structure, just as English or any other language has structure. This language allows end users to manipulate information of table. To use SQL you need not to require any of the programming experience. SQL is database language used for storing and retrieving data from the database.
Rules of SQL:
1) SQL starts with a verb. (SQL action words). E.g. select
2) Each verb is followed by different number of clauses. E.g. from, where..
3) A space separates clauses. E.g. Drop table stud;
4) A comma(,) separates parameters without a clause.
5) A semicolon (;) is used to end a statement.
6) Statements may be split across lines but keywords may not.
7) Reserved words cannot be used as an identifiers unless enclosed with double quotes.
8) Identifiers can contain upto 30 characters and must start with an alphabetic character.
9) Character and Date literals must be enclosed with a single quotes.
10) Comment of single line may be represented with – symbol and multiline comment represented with /* and */.
Ø Introduction of SQL* plus:
SQL* plus is an oracle tool (specific program) which accepts SQL commands and PL/SQL blocks and also executes them. SQL* plus enables manipulations of SQL commands and PL/SQL blocks. It also performs additional tasks such as calculations store and print query results in the form of reports, list column definitions of any table, access and copy data between SQL databases and send messages to and accept responses from the user.
Ø SQL Commands (components):
To communicate with oracle, SQL supports the following categories of languages or commands.
1) DDL (Data Definition Language):
It is a set of SQL commands used to create, modify and delete database structures but not data. These commands are normally not used by a general user, who should be accessing the database via an application. This language supports the following commands;
Create, alter, desc, drop, truncate.
2) DML (Data Manipulation Language):
It is the area of SQL that allows changing data within the database. This language supports the following commands;
3) DCL (Data Control Language):
It is the component of SQL statement that control access to data and to the database. The database occasionally, DCL statements are grouped with DML statements. This language supports the following commands;
Grant, Revoke.
4) TCL (Transaction Control Language):
It is the language of SQL transaction that allows to save any work, to identify any point or to restore the database. This language supports the following commands;
Commit, Rollback, Savepoint.
5) DQL (Data Query Language):
It is the components of SQL statement that allows getting data from the database and commanding ordering upon it. It includes the select statement; this command is the heart of the SQL. When a select is fired against a table or tables the result is also compiled into a further temporary table, which is displayed or perhaps received by the programs. i.e. front-end. This language supports following command.
Select
Ø SQL Datatypes:
When we create a table in oracle, a few items should be important. We have to specify a field or column type at the table creation time.
1) Char(size):
This data type is used to store character strings value of fixed length. The size in brackets determines the number of characters the cell can hold. The maximum number of characters this data type can hold is 255 characters. Default and minimum size is 1 byte.
2) Varchar(size)/Varchar2(size):
This data type is used to store variable length alphanumeric data. It is a more flexible form of the character data type. The storage capacity of this data type is 4000 characters.
3) Date:
This data type is used to represent date and time. The standard format is DD-MON-YY as in 02-JUL-09. By default, the time in a date field is 12:00:00 am, if no time portion is specified.
4) Number(P,S):
The number data type is used to store numeric values of virtually. This data type can hold up to 38 digits of precision. in particular scientific notations the precision (p), determines the maximum length of data, whereas the scale(s), determines the number of places to the right of the decimal.
5) Long:
This data types is used to store variable length character string containing upto 2 GB. Only one long value can be defined per table. Long values cannot be used in subqueries, functions and also expressions where clause or indexes and the normal character functions.
6) Raw/Long Raw:
This data type is used to store binary data such as digitized pictures or images. Raw data types can have a maximum length of 255 bytes. Long Raw data type can contain up to 2 GB. Values stored in columns having long raw data types cannot be indexed.