Data & Databases
Organising raw data into structured databases and using SQL to retrieve meaningful information.
1. From Data to Information
In science and everyday life, we encounter vast amounts of raw facts and figures. This is data. For example, a list of daily temperature readings for a patient: 38.5, 38.9, 39.1, 38.6. By itself, this data has limited meaning. When we process this data—by plotting it on a graph to show a fever trend—it becomes information. Information is data that has been processed, organised, and structured to provide context and meaning, allowing us to make decisions.
* Data: Raw, unprocessed facts (e.g., a single blood glucose reading of 180 mg/dL).
* Information: Processed data with context (e.g., a report stating the patient's blood glucose is 'High', based on the reading of 180 mg/dL, and comparing it to previous trends).
2. Fundamental Data Types
To store data efficiently in a computer system, we must define its type. Each data type has specific properties and uses a certain amount of memory.
* Integer: Whole numbers, positive or negative, with no decimal part. Used for countable items.
* *Example:* `Number_of_Chromosomes = 46`
* Real / Float: Numbers that can have a fractional part (decimal numbers). Essential for measurements.
* *Example:* `Patient_Temperature = 37.5` (SI unit: Kelvin, but Celsius is common in medical contexts).
* Character (Char): A single letter, number, or symbol enclosed in single quotes.
* *Example:* `Blood_Group_Letter = 'A'`
* String: A sequence of characters enclosed in double quotes. Used for text.
* *Example:* `Patient_Name = "Fatima Jilani"`
* Boolean: Represents one of two possible values: True or False. Ideal for yes/no questions.
* *Example:* `Is_Vaccinated = True`
* Date/Time: Stores date and time values in a specific format.
* *Example:* `Date_of_Birth = 05/08/1998`
Common Misconception: Storing data like a phone number (`03001234567`) or a CNIC number as an Integer is a mistake. Since you will never perform mathematical calculations on them and they can start with a zero, they should be stored as a **String** to preserve their format.
3. File and Database Storage Concepts
Data is stored in an organised manner. The basic building blocks are:
* Field: A single piece of data about an entity. It's like a column header in a table (e.g., `StudentName`, `Age`, `City`).
* Record: A complete set of fields for a single entity. It's like a single row in a table (e.g., all the data for one student).
* File: A collection of related records (e.g., a file containing all student records for a class).
* Primary Key (Key Field): A field that contains a unique value for each record. This ensures that every record can be identified without ambiguity. A student's Roll Number or a citizen's CNIC are perfect examples of primary keys.
While simple files can store data, they become inefficient for large, complex datasets. This is where databases excel.
A database is a structured collection of data, managed by a Database Management System (DBMS). A relational database is the most common type, organising data into tables that can be linked together.
Imagine a hospital system. We could have one table for `Patients` and another for `Appointments`.
`Patients` Table:
| PatientID (Primary Key) | PatientName | City |
|-------------------------|----------------|----------|
| P101 | Ahmed Khan | Lahore |
| P102 | Sana Ali | Karachi |
`Appointments` Table:
| ApptID (Primary Key) | ApptDate | DoctorName | PatientID (Foreign Key) |
|----------------------|------------|-----------------|-------------------------|
| A501 | 2024-05-21 | Dr. Iqbal | P101 |
| A502 | 2024-05-22 | Dr. Yasmin | P102 |
| A503 | 2024-05-23 | Dr. Iqbal | P101 |
Here, the `PatientID` in the `Appointments` table is a foreign key. It creates a link back to the `Patients` table's primary key. This link, or relationship, allows us to query the database to find all appointments for Ahmed Khan without duplicating his name and city in the appointments table. This is a classic one-to-many relationship: one patient can have many appointments.
4. Querying Databases with Basic SQL
SQL (Structured Query Language) is the standard language for communicating with relational databases. It allows us to retrieve, insert, update, and delete data. For your level, you need to master retrieving data using the `SELECT` statement.
The basic structure of a query is:
`SELECT Field1, Field2 FROM TableName WHERE Condition;`
* `SELECT`: Specifies the columns (fields) you want to see.
* `FROM`: Specifies the table you are querying.
* `WHERE`: Filters the results based on a condition (this part is optional).
Step-by-Step Example: Finding all patients from Lahore.
- Identify the goal: We need the names of patients who live in Lahore.
- Identify the fields needed: We need the `PatientName`.
- Identify the table: The data is in the `Patients` table.
- Identify the condition: The `City` field must be equal to 'Lahore'.
Construct the SQL query:
```sql
SELECT PatientName
FROM Patients
WHERE City = 'Lahore';
Result:
| PatientName |
|-------------|
| Ahmed Khan |
Example 2: Finding all appointments with Dr. Iqbal.
- Identify the goal: Find the date for every appointment with Dr. Iqbal.
- Identify the fields needed: `ApptDate`.
- Identify the table: `Appointments`.
- Identify the condition: `DoctorName` must be 'Dr. Iqbal'.
Construct the SQL query:
```sql
SELECT ApptDate
FROM Appointments
WHERE DoctorName = 'Dr. Iqbal';
Result:
| ApptDate |
|------------|
| 2024-05-21 |
| 2024-05-23 |
Databases and SQL are crucial in modern biology for managing everything from clinical trial data and patient records in hospitals like Shaukat Khanum to genomic sequences and ecological data for conservation projects in the Northern Areas of Pakistan.`SELECT * FROM students WHERE grade = 'A';` — retrieve records
- `INSERT INTO students VALUES (1, 'Fatima', 'A');` — add a record
- `UPDATE students SET grade = 'B' WHERE id = 1;` — modify a record
- `DELETE FROM students WHERE id = 1;` — remove a record
Data validation: checks data is reasonable (range check, type check, presence check, length check). **Verification:** checks data was entered correctly (double entry, proofreading).
Key Points to Remember
- 1Integer, real, string, Boolean, date/time — key data types
- 2Record = full entry; field = one item; key field = unique ID
- 3SQL: SELECT, INSERT, UPDATE, DELETE
- 4Validation checks data is reasonable; verification checks it was entered correctly
Pakistan Example
NADRA — Pakistan's National Database and Data Management
NADRA (National Database and Registration Authority) manages 220+ million Pakistanis' records. Each citizen has a unique CNIC (key field). Fields include name, date of birth, address, and fingerprint data. NADRA uses relational databases to link CNIC to passport, driving licence, and tax records — real-world database design on a national scale.
Quick Revision Infographic
Computer Studies — Quick Revision
Data & Databases
Key Concepts
Formulas to Know
key field = unique IDNADRA — Pakistan's National Database and Data Management
NADRA (National Database and Registration Authority) manages 220+ million Pakistanis' records. Each citizen has a unique CNIC (key field). Fields include name, date of birth, address, and fingerprint data. NADRA uses relational databases to link CNIC to passport, driving licence, and tax records — real-world database design on a national scale.