Information Representation
How computers store numbers, text, images, and sound using binary.
At its most fundamental level, a computer only understands two states: on and off, represented by the digits 1 and 0. Each of these digits is called a bit (binary digit), and it is the smallest unit of data. All information—from numbers and text to complex images and sounds—must be converted into a binary format to be processed and stored. Bits are typically grouped into an 8-bit unit called a byte.
### Number Systems
To represent numbers, we use different bases. Humans use denary (base-10), but computers use binary (base-2).
* Binary (Base-2): This system uses only two digits, 0 and 1. Each position in a binary number represents a power of 2, starting from the right (2⁰, 2¹, 2², etc.). For example, the binary number 1011 is converted to denary as: (1 × 2³) + (0 × 2²) + (1 × 2¹) + (1 × 2⁰) = 8 + 0 + 2 + 1 = 11.
* Hexadecimal (Base-16): Binary numbers can be long and difficult for humans to read. Hexadecimal is used as a compact, human-friendly representation of binary. It uses digits 0-9 and letters A-F (representing 10-15). Each hexadecimal digit corresponds to a 4-bit binary sequence (a nibble). For example, the binary number `1110 0101` can be grouped into `1110` (E in hex) and `0101` (5 in hex), making it E5 in hexadecimal. This system is commonly used in memory addressing and defining colours in web design (e.g., #FF0000 for red).
* Two's Complement: To represent negative integers, computers use two's complement. The Most Significant Bit (MSB) indicates the sign (1 for negative, 0 for positive). To find the two's complement representation of a negative number, write the positive value in binary at the required bit width, invert every bit, and then add 1.
For example, to represent -6 in 8 bits: Positive 6 is `00000110`. Invert it to get `11111001`. Add 1 to get `11111010`.
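The three conversions described in this section can be checked with a short Python sketch. The function names (`binary_to_denary`, `binary_to_hex`, `twos_complement`) are illustrative choices for this example, not part of any standard library, and the code is only one possible way to write the steps.

```python
def binary_to_denary(bits: str) -> int:
    """Add up each bit multiplied by its power of 2, starting from the right."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * (2 ** position)
    return total


def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hexadecimal, one 4-bit nibble at a time."""
    bits = bits.replace(" ", "")
    # Pad on the left so the length is a multiple of 4.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    digits = "0123456789ABCDEF"
    return "".join(
        digits[binary_to_denary(bits[i:i + 4])] for i in range(0, len(bits), 4)
    )


def twos_complement(value: int, width: int = 8) -> str:
    """Return the two's complement bit pattern of a negative integer."""
    if not -(2 ** (width - 1)) <= value < 0:
        raise ValueError("value must be negative and fit in the given width")
    positive = format(-value, f"0{width}b")                          # write the positive value
    inverted = "".join("1" if b == "0" else "0" for b in positive)   # invert every bit
    result = (int(inverted, 2) + 1) % (2 ** width)                   # add 1
    return format(result, f"0{width}b")


print(binary_to_denary("1011"))     # 11
print(binary_to_hex("1110 0101"))   # E5
print(twos_complement(-6))          # 11111010
```

The printed values match the worked examples in the text: `1011` is 11, `1110 0101` is E5, and -6 in 8 bits is `11111010`.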
### Text Representation
* ASCII (American Standard Code for Information Interchange): An early standard using 7 bits to represent 128 different characters (8-bit Extended ASCII variants represent 256), including English letters, digits, and common symbols. Its main limitation is that it cannot represent the characters used by most other languages.
* Unicode: A modern, universal character encoding standard that can represent almost every character from every writing system in the world. UTF-8 is the most common Unicode encoding, using a variable number of bytes per character, making it space-efficient and backward-compatible with ASCII.
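To see the variable-length behaviour of UTF-8, the following Python sketch counts the bytes needed for a few sample characters. The helper name `utf8_byte_count` and the choice of sample characters are assumptions made for illustration.

```python
def utf8_byte_count(char: str) -> int:
    """Return how many bytes UTF-8 needs to encode the given character."""
    return len(char.encode("utf-8"))


# Sample characters, written as escapes so the code is plain ASCII.
samples = {
    "A": "Latin capital letter A",
    "\u00e9": "Latin small letter e with acute",
    "\u0628": "Arabic letter beh (also used in Urdu)",
    "\u20ac": "Euro sign",
}

for char, name in samples.items():
    encoded = char.encode("utf-8")
    print(f"{name}: {utf8_byte_count(char)} byte(s), hex {encoded.hex()}")

# An ASCII character has exactly the same single byte in UTF-8.
assert "A".encode("utf-8") == "A".encode("ascii")
```

The ASCII letter takes one byte, the accented letter and the Arabic-script letter take two bytes each, and the Euro sign takes three, which is why UTF-8 stays compact for English text while still covering other scripts.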
### Image Representation
* Bitmap Images: A bitmap image is stored as a grid of tiny dots called pixels. For each pixel, a binary value is stored that represents its colour. The number of bits used per pixel is called the colour depth. A 1-bit colour depth can only represent 2 colours (e.g., black and white), while a 24-bit colour depth (True Colour) can represent over 16.7 million colours (2²⁴).
* File Size Calculation: The size of a bitmap image can be calculated with the formula:
File Size (bits) = Image Width (pixels) × Image Height (pixels) × Colour Depth (bits). This does not include metadata (additional data like creation date or camera settings) stored in the file header.
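As a worked example of the formula, the sketch below calculates the size of a hypothetical 800 × 600 image at 24-bit colour depth. The dimensions are made up for illustration, and the result excludes any metadata or compression.

```python
def bitmap_size_bits(width: int, height: int, colour_depth: int) -> int:
    """File size in bits = width x height x colour depth (pixel data only)."""
    return width * height * colour_depth


def colours_available(colour_depth: int) -> int:
    """Each extra bit doubles the number of colours: 2 ** colour depth."""
    return 2 ** colour_depth


size_bits = bitmap_size_bits(800, 600, 24)
size_bytes = size_bits // 8  # 8 bits in a byte

print(size_bits)               # 11520000 bits
print(size_bytes)              # 1440000 bytes
print(colours_available(1))    # 2
print(colours_available(24))   # 16777216
```

Dividing the result by 8 converts bits to bytes, so this image needs 1,440,000 bytes for its pixel data.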
### Sound Representation
Sound is an analogue wave. To be stored on a computer, it must be digitised through a process called sampling. An Analogue-to-Digital Converter (ADC) takes measurements (samples) of the sound wave's amplitude at regular intervals.
* Sampling Rate: The number of samples taken per second, measured in Hertz (Hz). A higher sampling rate (e.g., 44,100 Hz for CD quality) results in a more accurate digital representation and better sound quality.
* Sample Resolution (Bit Depth): The number of bits used to store each sample. Higher sample resolution allows for a more precise representation of the wave's amplitude.
* File Size Calculation: The size of a sound file is calculated by:
File Size (bits) = Sampling Rate (Hz) × Sample Resolution (bits) × Duration (seconds) × Number of Channels (e.g., 1 for mono, 2 for stereo).
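As a worked example of the formula, the sketch below estimates the size of one minute of uncompressed stereo audio at 44,100 Hz. The 16-bit sample resolution and 60-second duration are assumed values chosen for illustration.

```python
def sound_size_bits(sampling_rate: int, resolution: int,
                    duration_seconds: int, channels: int) -> int:
    """File size in bits = sampling rate x resolution x duration x channels."""
    return sampling_rate * resolution * duration_seconds * channels


size_bits = sound_size_bits(44_100, 16, 60, 2)
size_bytes = size_bits // 8  # 8 bits in a byte

print(size_bits)    # 84672000 bits
print(size_bytes)   # 10584000 bytes
```

Halving the sampling rate or switching from stereo to mono would each halve this figure, which shows the trade-off between quality and file size.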
### Data Compression and Encryption
* Compression: This is the process of reducing file size.
* Lossless Compression: Reduces file size without losing any original data. When the file is decompressed, it is an exact replica of the original. This is essential for text files and program code. A simple method is Run-Length Encoding (RLE), which replaces repeated sequences of data with a count and a single data value (e.g., `AAAAA` becomes `5A`). Common formats include PNG and ZIP.
* Lossy Compression: Reduces file size by permanently removing data that is considered non-essential to human perception. This offers much higher compression ratios but with a loss of quality. It is widely used for images (JPEG), audio (MP3), and video (MPEG).
* Encryption: This is the process of converting plaintext (readable data) into ciphertext (unreadable data) using an algorithm and a key. This ensures data confidentiality, preventing unauthorised access. Symmetric encryption uses the same key for both encryption and decryption, while asymmetric encryption uses a public key to encrypt and a private key to decrypt.
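The Run-Length Encoding idea mentioned under lossless compression can be sketched in a few lines of Python. This is a minimal illustration, assuming the input text contains no digit characters; the function names `rle_encode` and `rle_decode` are chosen for this example, and real file formats use more elaborate schemes.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of repeated characters with its count and the character."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))


def rle_decode(encoded: str) -> str:
    """Rebuild the original text from count-and-character pairs."""
    result = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # counts may have more than one digit
        else:
            result.append(ch * int(count))
            count = ""
    return "".join(result)


print(rle_encode("AAAAA"))       # 5A
print(rle_encode("AAAABBBCC"))   # 4A3B2C
print(rle_decode("4A3B2C"))      # AAAABBBCC
```

Decoding returns exactly the original text, which is what makes the method lossless. RLE only helps when the data contains long runs: `ABC` encodes to `1A1B1C`, which is longer than the input.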
### Key Points to Remember
- All computer data is fundamentally stored in binary (bits and bytes), with hexadecimal used as a human-friendly shorthand.
- Two's complement is the standard method for representing positive and negative integers in binary.
- Text is encoded using character sets like ASCII for English and the more comprehensive Unicode for global languages.
- Bitmap images are composed of pixels, with file size determined by resolution (width × height) and colour depth.
- Analogue sound is digitised through sampling, and its quality depends on the sampling rate and sample resolution.
- Lossless compression (e.g., RLE, ZIP) reduces file size with no data loss, making it suitable for text and code.
- Lossy compression (e.g., JPEG, MP3) achieves greater size reduction by permanently removing non-essential data.
- Encryption transforms plaintext into ciphertext using a key to protect data from unauthorised access.
### Pakistan Example
Urdu Language Representation with Unicode
Early computing systems in Pakistan were limited by the **ASCII** character set, which could not represent the characters of the Urdu alphabet (a Perso-Arabic script, usually written in the Nastaliq style). The global adoption of **Unicode** was a critical development, allowing software, websites (e.g., government portals, news sites like Jang), and mobile keyboards to correctly display and process Urdu text. This enabled the creation of digital services and content for millions of Pakistanis in their native language, which older, limited character sets could not support.