Computer Science (2210)
Topic 11 of 11Cambridge O Levels

Databases & Data Storage

File vs database, SQL queries, primary keys, data types and storage

Introduction to Databases: Your Digital Organizer


Salaam, SeekhoAsaan students! Imagine you're trying to keep track of all the delicious orders at your family's biryani shop in Karachi, or perhaps the scores and player details for your local cricket team. If you just use separate pieces of paper (or separate files on your computer) for everything, things can quickly become messy, hard to find, and full of mistakes. This is where databases come in – they are like super-organised digital filing cabinets designed to store, manage, and retrieve large amounts of information efficiently and securely.


In this lesson, we'll dive into the world of databases. We'll understand why they are so much better than simple files for storing complex information, learn about the different building blocks of a database, explore how we use a special language called SQL to talk to them, and understand the importance of primary keys and data types.


Files vs. Databases: Why Databases Win


Before databases became popular, people stored data in separate computer files. For example, a shop might have one file for customer names, another for product prices, and yet another for daily sales. Let's look at why this file-based approach has limitations compared to a database system.


#### File-Based Systems (The Old Way)


In a file-based system, data is stored in individual files, often managed by the operating system's file manager. Think of it like a collection of separate notebooks for different subjects.


Limitations of File-Based Systems:

* Data Redundancy: The same information might be stored in multiple files. For instance, a customer's address might be in their customer file and also in their order file. This wastes storage space.

* Data Inconsistency: If data is redundant, updating one copy of the information (like a customer's new address) might not update all copies. This leads to conflicting and incorrect data.

* Difficulty in Sharing Data: It's hard for different applications or users to access and use the same data simultaneously without causing conflicts or errors.

* Lack of Data Integrity: No built-in rules to ensure data is correct and valid (e.g., ensuring an age is a positive number).

* Poor Security: It's difficult to set fine-grained security permissions (e.g., allowing only certain users to see financial data, but everyone to see product names).

* Limited Querying: Finding specific information requires writing custom programs for each search, which is inefficient.


#### Database Systems (The Smart Way)


A database is an organized collection of structured information, or data, typically stored electronically in a computer system. It's usually controlled by a Database Management System (DBMS). A DBMS is a software package that allows users to create, maintain, and interact with a database. Examples of popular DBMS include MySQL, Oracle, PostgreSQL, and Microsoft SQL Server.


Advantages of Database Systems:

* Reduced Data Redundancy: Information is stored once, eliminating unnecessary duplication.

* Improved Data Consistency: Since data is stored once, updating it ensures consistency across all uses.

* Enhanced Data Sharing: Multiple users and applications can access and share data simultaneously in a controlled manner.

* Better Data Integrity: The DBMS enforces rules (like data types, primary keys, and validation rules) to ensure data is accurate and reliable.

* Improved Data Security: The DBMS provides mechanisms to control who can access what data and what operations they can perform (e.g., read, write, delete).

* Efficient Data Retrieval (Querying): Databases use powerful query languages (like SQL) to quickly find and retrieve specific information.

* Data Independence: The way data is stored physically (on disk) can be changed without affecting how users view and interact with the data (logical data independence). Also, the overall logical structure can be changed without affecting applications (to some extent) (physical data independence).


Database Structure: Tables, Records, and Fields


Most modern databases, especially those you'll encounter at O Level, are relational databases. In a relational database, data is organised into one or more tables (also called relations). Think of each table as a spreadsheet.


* Table: A collection of related data organised into rows and columns. For example, a `Customers` table, a `Products` table, or an `Orders` table.

* Record (Row / Tuple): A single entry in a table, containing all the information about one specific item or entity. In a `Customers` table, one row would represent one customer with all their details.

* Field (Column / Attribute): A single piece of information or category of data within a table. In a `Customers` table, 'Customer ID', 'Customer Name', 'Address', and 'Phone Number' would be fields.


| CustomerID | CustomerName | Address | Phone |

|------------|---------------|------------------|---------------|

| 101 | Ali Khan | F-6, Islamabad | 0300-1234567 |

| 102 | Zara Ahmed | Gulberg, Lahore | 0321-9876543 |

| 103 | Fatima Iqbal | DHA, Karachi | 0333-5551234 |


In the table above:

* The entire structure is a `Customers` table.

* Each horizontal line (e.g., `101 | Ali Khan | F-6, Islamabad | 0300-1234567`) is a record.

* Each vertical category (e.g., `CustomerID`, `CustomerName`) is a field.


Primary Keys: The Unique Identifier


Imagine you have two students in your class named 'Ahmed'. How do you uniquely identify them? You might use their roll number or admission number. In a database, we use a primary key for this exact purpose.


A primary key is a field (or a combination of fields) in a table that uniquely identifies each record in that table. It's like a unique ID card for every row.


Characteristics of a Primary Key:

  1. Unique: No two records can have the same primary key value.
  2. Not Null: A primary key field cannot be empty; it must always contain a value.
  3. Stable: The value of a primary key should ideally not change over time.

In our `Customers` table example, `CustomerID` is a perfect choice for a primary key because each customer will have a unique ID, and it will always have a value.


#### Candidate Keys


A candidate key is any field or combination of fields that could potentially serve as a primary key because it uniquely identifies each record. For example, if every customer has a unique `CNIC` number, then `CNIC` could also be a candidate key for the `Customers` table, along with `CustomerID`.


#### Compound Keys (Composite Keys)


Sometimes, a single field isn't enough to uniquely identify a record. In such cases, we use a compound key (also known as a composite key), which is a primary key made up of two or more fields combined. For example, in a `StudentCourseEnrollment` table, `StudentID` alone isn't unique (a student takes many courses) and `CourseID` alone isn't unique (many students take the same course). But the combination of `StudentID` and `CourseID` together would uniquely identify a specific student enrolled in a specific course.


#### Foreign Keys (Linking Tables)


While not strictly a primary key, understanding foreign keys is essential for relational databases. A foreign key is a field (or collection of fields) in one table that refers to the primary key in another table. It's how tables are linked together. For example, an `Orders` table might have a `CustomerID` field, which is a foreign key linking back to the `CustomerID` primary key in the `Customers` table. This allows us to know which customer placed which order.


Data Types: The Right Kind of Information


When you create a field in a database table, you must define its data type. A data type specifies the kind of data that can be stored in that field. Choosing the correct data type is crucial for:


* Data Integrity: Ensures that only valid data is entered (e.g., you can't put text into a number field).

* Storage Efficiency: Different data types require different amounts of storage space. Choosing the smallest appropriate type saves disk space.

* Performance: Databases can process and query data more efficiently when they know its type.


Here are common data types you'll encounter:


  1. Text (String/VARCHAR/CHAR):

* Used for storing letters, numbers, and special characters. Examples: names, addresses, product descriptions, phone numbers (even though they contain digits, you usually don't perform calculations on them).

* `CHAR`: Fixed-length string. If you declare `CHAR(10)` and store "Ali", it still uses 10 bytes, padding with spaces.

* `VARCHAR` (Variable Character): Variable-length string. If you declare `VARCHAR(10)` and store "Ali", it uses only 3 bytes (plus a small overhead).


  1. Number (Numeric):

* Used for storing numerical values that you might perform calculations on.

* `INTEGER` (INT): Whole numbers (e.g., `5`, `100`, `-30`). Good for quantities, IDs, counts.

* `REAL` (FLOAT/DOUBLE/DECIMAL): Numbers with decimal points (e.g., `19.99`, `3.14159`). Good for prices, measurements, percentages.


  1. Date/Time:

* Used for storing dates, times, or both. Examples: `2023-10-27`, `14:30:00`, `2023-10-27 14:30:00`.


  1. Boolean (Logical/Yes/No):

* Used for storing true/false values. Examples: `IsPaid` (True/False), `InStock` (Yes/No).


Worked Example 1: Choosing Data Types for a Pakistani Cricket Team Database


Imagine we're setting up a database for a local cricket team called "Lahore Lions". We need a `Players` table. Let's decide on fields and their appropriate data types:


| Field Name | Description | Data Type | Primary Key? |

|------------------|--------------------------|----------------|--------------|

| `PlayerID` | Unique ID for each player| `INTEGER` | Yes |

| `PlayerName` | Full name of the player | `VARCHAR(100)` | No |

| `DateOfBirth` | Player's birth date | `DATE` | No |

| `JerseyNumber` | Player's jersey number | `INTEGER` | No |

| `BattingHand` | Right-handed or Left-handed | `VARCHAR(10)` | No |

| `IsCaptain` | Is the player the captain? | `BOOLEAN` | No |

| `MatchesPlayed` | Total matches played | `INTEGER` | No |

| `TotalRuns` | Total runs scored | `INTEGER` | No |

| `BattingAverage` | Average runs per match | `REAL` | No |


*Explanation:* `PlayerID` is `INTEGER` and unique, perfect for a primary key. `PlayerName` needs to be flexible for names of various lengths, so `VARCHAR` is good. `DateOfBirth` needs the `DATE` type for date calculations. `IsCaptain` is a simple yes/no, so `BOOLEAN`. `BattingAverage` involves decimals, so `REAL` is appropriate.


Data Storage: How Information Lives on Disk


When we talk about data storage in a database context, we consider both the logical and physical views of data.


* Logical Data: This is how users and applications perceive and interact with the data (e.g., viewing data in tables, fields, and records). It's the conceptual organisation. We don't worry about how it's physically stored.

* Physical Data: This is how the data is actually stored on the storage media (like hard disk drives or Solid State Drives). This involves details like file formats, indexing, and actual byte allocation. The DBMS handles this aspect, abstracting it away from the user.


The choice of data type directly impacts physical storage. For example:

* An `INTEGER` typically uses 2 or 4 bytes, regardless of the value (e.g., `5` or `50000`).

* A `REAL` (or `FLOAT`) might use 4 or 8 bytes.

* A `CHAR(10)` field will always reserve 10 bytes.

* A `VARCHAR(10)` field will only use the number of bytes equivalent to the characters stored plus a small overhead (e.g., 'Ali' uses 3 bytes + overhead). This is why `VARCHAR` is often preferred for text where lengths vary significantly, as it saves space.


Efficient data type selection and proper database design (like normalization, which ensures data is structured to reduce redundancy) directly contribute to optimised storage use and faster performance.


SQL Queries: Talking to Your Database


SQL stands for Structured Query Language. It's the standard language used to communicate with relational databases. Think of it as the command language that lets you ask questions, add data, change data, and even create or delete tables in your database.


SQL is divided into several sub-languages:

* DDL (Data Definition Language): Used for defining the database structure (creating, altering, dropping tables).

* DML (Data Manipulation Language): Used for managing data within the tables (inserting, retrieving, updating, deleting records).


#### Data Definition Language (DDL)


##### `CREATE TABLE`


This command is used to create a new table in the database. You specify the table name and define each field (column) with its name, data type, and any constraints (like `PRIMARY KEY`, `NOT NULL`).


Syntax:

`CREATE TABLE TableName (`

` Column1Name DataType [Constraints],`

` Column2Name DataType [Constraints],`

` ...`

` PRIMARY KEY (ColumnName)`

`);`


Worked Example 2: Creating a Table for a Bazaar Inventory


Let's create a `Products` table for a shop selling goods in Anarkali Bazaar, Lahore.


`CREATE TABLE Products (`

` ProductID INT PRIMARY KEY,`

` ProductName VARCHAR(255) NOT NULL,`

` Category VARCHAR(50),`

` UnitPrice DECIMAL(10, 2) NOT NULL,`

` StockQuantity INT DEFAULT 0`

`);`


*Explanation:*

* `ProductID` is an `INTEGER` and the `PRIMARY KEY`. It must be unique and not null.

* `ProductName` is a `VARCHAR` with a maximum length of 255 characters and `NOT NULL`, meaning every product must have a name.

* `Category` is also a `VARCHAR`.

* `UnitPrice` is `DECIMAL(10, 2)`. `DECIMAL` is similar to `REAL` but offers higher precision for monetary values. `10, 2` means it can store up to 10 digits in total, with 2 digits after the decimal point (e.g., 99999999.99). It's also `NOT NULL`.

* `StockQuantity` is an `INTEGER` and has a `DEFAULT` value of `0` if not specified during insertion.


##### `ALTER TABLE` (Briefly)


Used to add, delete, or modify columns in an existing table, or to add/drop constraints. For O Level, a basic understanding is sufficient.


Example: `ALTER TABLE Products ADD COLUMN SupplierID INT;`


##### `DROP TABLE`


Used to delete an existing table from the database. Use with extreme caution! All data in the table will be lost permanently.


Syntax: `DROP TABLE TableName;`


Example: `DROP TABLE Products;`


#### Data Manipulation Language (DML)


##### `INSERT INTO`


Used to add new records (rows) into a table.


Syntax:

`INSERT INTO TableName (Column1, Column2, ...) VALUES (Value1, Value2, ...);`


Example (Adding products to our Anarkali Bazaar `Products` table):


`INSERT INTO Products (ProductID, ProductName, Category, UnitPrice, StockQuantity)`

`VALUES (1, 'Kameez Shalwar', 'Clothing', 2500.00, 50);`


`INSERT INTO Products (ProductID, ProductName, Category, UnitPrice, StockQuantity)`

`VALUES (2, 'Peshawari Chappal', 'Footwear', 1800.50, 30);`


`INSERT INTO Products (ProductID, ProductName, Category, UnitPrice)`

`VALUES (3, 'Embroidered Shawl', 'Clothing', 1200.00);` -- StockQuantity will default to 0


##### `SELECT`


The most common SQL command, used to retrieve data from a database. You can select specific columns, all columns, filter results, sort them, and much more.


Basic Syntax:

`SELECT Column1, Column2 FROM TableName;`

`SELECT * FROM TableName;` -- Selects all columns


Filtering with `WHERE` Clause:

Used to specify criteria to filter the records returned.


Syntax: `SELECT Column1, Column2 FROM TableName WHERE Condition;`


Operators used in `WHERE` clause:

* `=` (Equal to)

* `!=` or `<>` (Not equal to)

* `>` (Greater than)

* `<` (Less than)

* `>=` (Greater than or equal to)

* `<=` (Less than or equal to)

* `AND`, `OR` (Logical operators to combine conditions)

* `LIKE` (For pattern matching)

* `BETWEEN` (Range search)

* `IN` (Multiple possible values)


Sorting with `ORDER BY` Clause:

Used to sort the result set in ascending (`ASC`) or descending (`DESC`) order.


Syntax: `SELECT ... FROM TableName ORDER BY Column1 [ASC|DESC], Column2 [ASC|DESC];`


Keywords and Functions for `SELECT`:

* `DISTINCT`: Returns only unique values for a specified column.

* Aggregate Functions:

* `COUNT()`: Returns the number of rows that match a specified criterion.

* `SUM()`: Calculates the sum of a numeric column.

* `AVG()`: Calculates the average value of a numeric column.

* `MIN()`: Returns the smallest value of a selected column.

* `MAX()`: Returns the largest value of a selected column.


Worked Example 3: Querying the WAPDA Customer Database (Fictional)


Let's assume a simplified WAPDA `Customers` table with `CustomerID`, `CustomerName`, `Address`, `City`, `UnitsConsumed`, `BillAmount`, `IsPaid`.


* Retrieve all customer details:

`SELECT * FROM Customers;`


* Get names and cities of customers who haven't paid their bill:

`SELECT CustomerName, City FROM Customers WHERE IsPaid = FALSE;`

(Or `WHERE IsPaid = 0;` if using integer for boolean)


* Find customers in Lahore who consumed more than 300 units:

`SELECT CustomerName, UnitsConsumed FROM Customers WHERE City = 'Lahore' AND UnitsConsumed > 300;`


* List all distinct cities where customers reside:

`SELECT DISTINCT City FROM Customers;`


* Find the total number of unpaid bills in Karachi:

`SELECT COUNT(*) FROM Customers WHERE City = 'Karachi' AND IsPaid = FALSE;`


* Calculate the average bill amount for all customers:

`SELECT AVG(BillAmount) FROM Customers;`


* Find the customer with the highest units consumed:

`SELECT CustomerName, UnitsConsumed FROM Customers ORDER BY UnitsConsumed DESC LIMIT 1;`

(Note: `LIMIT 1` is specific to some SQL dialects like MySQL/SQLite. Other databases might use `TOP 1` or `ROWNUM <= 1`)


* Find products whose names start with 'P' (from our Anarkali Bazaar example):

`SELECT ProductName FROM Products WHERE ProductName LIKE 'P%';`


##### `UPDATE`


Used to modify existing records in a table.


Syntax:

`UPDATE TableName SET Column1 = NewValue1, Column2 = NewValue2 WHERE Condition;`


Example (Updating an unpaid WAPDA bill to paid):


`UPDATE Customers SET IsPaid = TRUE WHERE CustomerID = 12345;`


Example (Updating stock quantity in Anarkali Bazaar):


`UPDATE Products SET StockQuantity = StockQuantity - 5 WHERE ProductID = 1;` -- Reduce stock for Kameez Shalwar


##### `DELETE`


Used to delete existing records from a table.


Syntax:

`DELETE FROM TableName WHERE Condition;`


Example (Deleting a customer who moved out of the city):


`DELETE FROM Customers WHERE CustomerID = 67890;`


Example (Deleting all products with zero stock):


`DELETE FROM Products WHERE StockQuantity = 0;`


Caution: If you omit the `WHERE` clause in an `UPDATE` or `DELETE` statement, it will affect *all* records in the table. Always be careful with these commands!


Conclusion


You've covered a lot of ground today! From understanding the fundamental difference between simple files and powerful databases to learning about the building blocks like tables, records, fields, and the all-important primary keys. We also discussed why choosing the right data type is crucial and how SQL is the language you use to interact with databases – to define structures, add data, retrieve information, and make changes. Databases are at the heart of almost every digital system we use daily, from banking to social media, making these concepts incredibly valuable for your journey in Computer Science. Keep practicing your SQL queries, and you'll soon be a database wizard!

Key Points to Remember

  • 1Databases organise data efficiently, reducing redundancy and inconsistency compared to file systems.
  • 2A **DBMS** (Database Management System) is software used to manage databases, providing features for security, integrity, and access.
  • 3Data is organised into **tables** (relations), composed of **records** (rows) and **fields** (columns).
  • 4A **primary key** uniquely identifies each record in a table, must be unique and not null. A **foreign key** links tables by referencing a primary key in another table.
  • 5**Data types** (Text, Number, Date/Time, Boolean) define the kind of data a field can store, impacting storage, validation, and performance.
  • 6**SQL (Structured Query Language)** is used to interact with databases, including DDL (Data Definition Language) for defining structure and DML (Data Manipulation Language) for data manipulation.
  • 7Key SQL DDL commands include `CREATE TABLE`, `ALTER TABLE`, `DROP TABLE`.
  • 8Key SQL DML commands include `INSERT INTO` (add records), `SELECT` (retrieve records, with `WHERE`, `ORDER BY`, aggregate functions), `UPDATE` (modify records), and `DELETE` (remove records).

Pakistan Example

Managing WAPDA Electricity Bills

Imagine WAPDA (Water and Power Development Authority) needing to manage millions of customer records, including their meter readings, bill amounts, payment statuses, and addresses. A robust database system, rather than individual files, is essential to handle this vast amount of data, process payments accurately, generate reports on consumption, and ensure customer service efficiency across Pakistan.

Quick Revision Infographic

Computer Science — Quick Revision

Databases & Data Storage

Key Concepts

1Databases organise data efficiently, reducing redundancy and inconsistency compared to file systems.
2A **DBMS** (Database Management System) is software used to manage databases, providing features for security, integrity, and access.
3Data is organised into **tables** (relations), composed of **records** (rows) and **fields** (columns).
4A **primary key** uniquely identifies each record in a table, must be unique and not null. A **foreign key** links tables by referencing a primary key in another table.
5**Data types** (Text, Number, Date/Time, Boolean) define the kind of data a field can store, impacting storage, validation, and performance.
6**SQL (Structured Query Language)** is used to interact with databases, including DDL (Data Definition Language) for defining structure and DML (Data Manipulation Language) for data manipulation.
Pakistan Example

Managing WAPDA Electricity Bills

Imagine WAPDA (Water and Power Development Authority) needing to manage millions of customer records, including their meter readings, bill amounts, payment statuses, and addresses. A robust database system, rather than individual files, is essential to handle this vast amount of data, process payments accurately, generate reports on consumption, and ensure customer service efficiency across Pakistan.

SeekhoAsaan.com — Free RevisionDatabases & Data Storage Infographic

Test Your Knowledge!

8 questions to test your understanding.

Start Quiz