Data Representation
Aim of the chapter
This chapter explains how data can be represented in binary form so it can
be stored in and processed by a computer.
Bits and bytes
At the most fundamental (and slightly simplified) level, today’s computers only recognise two states: either an electric current is running through a circuit or it is not. As a consequence, all information that is represented within a computer needs to be represented with the help of these two states. Rather than referring to a state as ‘running current’ or ‘no running current’, computer science uses a 1 and a 0 for the two different possibilities. Hence the most basic unit of information is the binary digit (referred to as a bit of information). A bit of information can contain either a 1 or a 0. A collection of eight bits of information is generally referred to as one byte, and memory size is generally measured in the number of bytes of information a computer can store.
The memory size of early computers was measured in multiples of 2¹⁰ bytes. Since 2¹⁰ = 1024, which is close to 1000, 1024 bytes were referred to (not quite correctly) as one kilobyte. Accordingly, a machine with a capacity of 16 kilobytes had 16,384 bytes. We now talk about memory size in terms of:
• Megabyte: 1 megabyte = 2²⁰ bytes = 1,048,576 bytes
• Gigabyte: 1 gigabyte = 2³⁰ bytes = 1,073,741,824 bytes.
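If you would like to verify these figures yourself, the short Java sketch below (Java is only introduced in Part 2 of this guide, so treat this as an optional illustration; the class name is ours) prints the powers of two involved:

public class MemorySizes {
    public static void main(String[] args) {
        // Shifting 1 to the left by n positions computes 2 to the power n.
        System.out.println("2 to the power 10 = " + (1L << 10));  // the 'kilobyte' of early machines
        System.out.println("2 to the power 20 = " + (1L << 20));  // one megabyte
        System.out.println("2 to the power 30 = " + (1L << 30));  // one gigabyte
    }
}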
Now that we know that all information that we feed into a computer needs
to be translated into a code of 0s and 1s, i.e. into binary digits (bits), the
following sections explain how text and numbers can be encoded in this
way.
Representing text
A common way to represent text is to agree on a unique code for every symbol (e.g. for every letter, punctuation mark or number) that needs to be represented. Each code consists of a fixed-length sequence of bits (but obviously the sequence is different for each symbol). A word can then be ‘written’ by determining the code for each letter and stringing them together. The most well-known code used in this way is ASCII (the American Standard Code for Information Interchange). It uses 8-bit strings to represent the English alphabet. You can find an overview of the ASCII code in Appendix A of Brookshear (2009) or you can look it up on the internet.
You should then be able to encode simple text, as in the following example.
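For instance, using the 8-bit ASCII codes, the word ‘cat’ is encoded as the sequence 01100011 01100001 01110100 (the codes for ‘c’, ‘a’ and ‘t’ respectively). If you want to experiment, the following small Java sketch (the class and variable names are ours) prints the 8-bit pattern for each character of a word:

public class AsciiEncoding {
    public static void main(String[] args) {
        String word = "cat";
        for (char c : word.toCharArray()) {
            // Integer.toBinaryString gives the character code in binary;
            // String.format pads it on the left to a fixed width of 8 bits.
            String bits = String.format("%8s", Integer.toBinaryString(c)).replace(' ', '0');
            System.out.println(c + " -> " + bits);
        }
    }
}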
The files we refer to as text files generally contain long sequences of code (mostly ASCII) as described above. A simple text editor can translate the code and make it readable to us (and the other way around). The programming language Java, which you will be introduced to in Part 2 of this guide, uses the Unicode standard for the representation of characters. Unicode provides 16 bits for the representation of a character (rather than 8), and therefore the Unicode character set is much larger than the ASCII set. The Unicode character set therefore also includes characters from writing systems other than the English alphabet (e.g. Greek and Chinese).
Note that word-processor files, on the other hand, are more complex. Encoding schemes such as ASCII and Unicode only address the encoding of the text itself. They are not suitable for the formatting information a word processor provides.
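To see the difference between the two character sets in practice, this small Java sketch (the characters chosen are just examples) prints the numeric code of an English letter and of a Greek letter:

public class UnicodeDemo {
    public static void main(String[] args) {
        char latin = 'A';         // code 65, the same value as in ASCII
        char greek = '\u03A9';    // Greek capital omega, code 937 in Unicode
        System.out.println((int) latin);  // prints 65
        System.out.println((int) greek);  // prints 937
    }
}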
Representing numbers – the binary system
When you look at an ASCII (or Unicode) table, you will find that it also includes codes for the representation of numbers. However, this is not an ideal way of representing numbers if they are to be used within a calculation. Within ASCII we would be using 8 bits to represent a single-digit number, and the largest number we could store in one byte would be 9. However, if we represent numbers as binary numbers (rather than as decimal digits) we can actually represent the numbers 0 to 255 with 8 bits. We also have the advantage that the number is translated into a different numeric system (i.e. the binary system) where it remains a number and where mathematical calculations can therefore still be carried out (in ASCII a number is merely a symbol, in the same way as a letter is a symbol).
Binary numbers are represented using bit sequences. The representation follows the same simple principle as the decimal system. So you need to remind yourself of these underlying principles:
The decimal system uses 10 as a base, and the 10 digits available are 0, 1, 2, 3, …, 9. Depending on its position in the whole number, each digit represents the amount of 1s, 10s, 100s, 1000s, etc. (i.e. 10⁰s, 10¹s, 10²s, 10³s, etc.).
For example, the decimal number 9348 represents 9×10³ + 3×10² + 4×10¹ + 8×10⁰.
The binary system uses 2 as its base with the digits 0 and 1. Let us look at 110100 as an example of a binary number (in order to avoid confusion, the base of a number may be indicated as a subscript, using the following convention: 110100₂):
110100₂ = 0×2⁰ + 0×2¹ + 1×2² + 0×2³ + 1×2⁴ + 1×2⁵ = 0 + 0 + 4 + 0 + 16 + 32 = 52₁₀
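A quick way to check such a conversion is the following Java sketch (the class name is ours); Integer.parseInt with a radix of 2 interprets the string as a binary number:

public class BinaryToDecimal {
    public static void main(String[] args) {
        // Interpret the string of bits as a base-2 number.
        int value = Integer.parseInt("110100", 2);
        System.out.println(value);  // prints 52
    }
}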
In summary, what you need to understand is that the value represented by a digit depends on the digit’s position within the number and on the base of the numeric system used. Each move of a digit one position to the left multiplies the value it represents by the base. For the decimal system this means that digits – from right to left – need to be multiplied by 1, 10, 100, 1000, etc. For the binary system, digits – again from right (the least significant bit) to left (the most significant bit) – need to be multiplied by 1, 2, 4, 8, 16, etc. to calculate their value.
As part of the explanation above you have been shown how a binary number can be converted into a decimal number. You also need to be able to convert a decimal number into a binary number. This can be done by:
• dividing the decimal number by 2 and recording the quotient and the remainder
• continuing to do this with every resultant quotient until the quotient is 0.
The sequence of recorded remainders, read backwards (from the last remainder to the first), is then the binary representation of the original decimal number.
Let us illustrate this with our example of 52:
52 : 2 = 26 remainder 0
26 : 2 = 13 remainder 0
13 : 2 = 6 remainder 1
6 : 2 = 3 remainder 0
3 : 2 = 1 remainder 1
1 : 2 = 0 remainder 1
Hence 52₁₀ = 110100₂
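The same procedure can be written as a short Java sketch (the names are ours); it repeatedly divides by 2 and collects the remainders in reverse order (assuming the starting number is positive):

public class DecimalToBinary {
    public static void main(String[] args) {
        int n = 52;
        StringBuilder bits = new StringBuilder();
        while (n > 0) {
            bits.insert(0, n % 2);  // record the remainder, building the result backwards
            n = n / 2;              // continue with the quotient
        }
        System.out.println(bits);   // prints 110100
    }
}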
You should now have an understanding of the underlying ideas of the
binary system and how to convert from decimal into binary and vice versa.
Binary addition
The next step is to learn how simple mathematical calculations are carried
out within the binary system. We will start with the addition of two
positive binary numbers, and again it is a matter of mapping the procedure
we are familiar with from the decimal system onto the binary system. An
example is probably the best way to explain how it works. Let us add 52
and 49.
In the decimal system we would calculate by adding the 1s, noting the overflow (carry) under the 10s, then adding the 10s and noting the overflow under the 100s, and finally adding the 100s (which in this case only consist of the overflow).
              100s   10s   1s
                       5    2
+                      4    9
Overflow:        1     1
Result:          1     0    1
We will now do the same addition in the binary system. We already know that 52₁₀ = 110100₂, and you should be able to work out now (Try it!) that 49₁₀ = 110001₂. We will follow the same approach as in the decimal system above. All you need to think about is what causes an overflow when adding binary digits:
0+0 = 0
1+0 = 0+1 = 1
1+1 = 10, i.e. we write 0 and there is an overflow of 1
1+1+1 = 11, i.e. we write 1 and there is an overflow of 1
              64s  32s  16s  8s  4s  2s  1s
                    1    1   0   1   0   0
+                   1    1   0   0   0   1
Overflow:      1    1
Result:        1    1    0   0   1   0   1
If you now check our result of 1100101₂ by converting it into a decimal number, you should find that 101₁₀ = 1100101₂.
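You can confirm the whole calculation with a short Java sketch (the class name is ours): it parses both binary numbers, adds them, and prints the sum in binary and in decimal.

public class BinaryAddition {
    public static void main(String[] args) {
        int a = Integer.parseInt("110100", 2);            // 52
        int b = Integer.parseInt("110001", 2);            // 49
        int sum = a + b;
        System.out.println(Integer.toBinaryString(sum));  // prints 1100101
        System.out.println(sum);                          // prints 101
    }
}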
Binary subtraction
In order to subtract one binary number from another we can again use the
procedures we are familiar with from the decimal system. For example:
10011₂ − 1111₂
              16s  8s  4s  2s  1s
               1   0   0   1   1
−                  1   1   1   1
Borrowed:          1   1
Result:        0   0   1   0   0
So we start from the least significant bits on the right, where we subtract 1 from 1 and get a result of 0. The next column is also straightforward. In the third and fourth columns (the 4s and the 8s) we need to borrow 1 from the column to the left.
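As a check, 10011₂ = 19₁₀ and 1111₂ = 15₁₀, and indeed 19 − 15 = 4 = 100₂. The Java sketch below (the class name is ours) performs the same check:

public class BinarySubtraction {
    public static void main(String[] args) {
        int a = Integer.parseInt("10011", 2);                // 19
        int b = Integer.parseInt("1111", 2);                 // 15
        System.out.println(Integer.toBinaryString(a - b));   // prints 100
    }
}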
However, it is quite difficult for a computer to carry out subtraction in this way, familiar though it is to us. A more computer-friendly approach that has been
proposed relies on the idea of turning the number to be subtracted into its
negative counterpart (referred to as its two’s complement) and then
adding this negative using ordinary addition. (Please note that although
you should be aware of the concept of representing the two’s complement
of a binary number, a detailed knowledge of how the two’s complement
can be determined is beyond the material covered in this course.)
A further well-known method that can be used to represent a negative
binary number is the sign and magnitude method. This method
follows the concept used within mathematics where negative numbers are
represented by using the minus sign as a prefix. As the binary numbers
used with computers do not have any extra symbols, an easy way to
represent a negative number is to agree that the left-most bit is just the
equivalent of a +/− sign, and the remainder of the number indicates the
value, i.e. the magnitude of the number:
• 0 indicates that the number is positive
• 1 indicates that the number is negative.
For example, in an 8-bit representation of binary numbers, 00001100₂ would represent 12₁₀ and 10001100₂ would be −12₁₀.
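As a small illustration, the hypothetical Java helper below interprets an 8-bit sign and magnitude string (the method and class names are ours):

public class SignMagnitude {
    // Interpret an 8-bit sign-and-magnitude string: the first bit is the sign,
    // the remaining 7 bits are the magnitude.
    static int fromSignMagnitude(String bits) {
        int magnitude = Integer.parseInt(bits.substring(1), 2);
        return bits.charAt(0) == '1' ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        System.out.println(fromSignMagnitude("00001100"));  // prints 12
        System.out.println(fromSignMagnitude("10001100"));  // prints -12
    }
}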
Although the sign and magnitude method was used in early computers and may still be used within the representation of floating-point numbers, the advantage of the two’s complement is evident: the two’s complement representation allows us (or, more importantly, the computer) to carry out subtraction through a combination of negation and addition. This means that the electronic circuits of a computer that are designed to carry out addition can also be used for subtraction. You will learn more about these circuits in the next section.
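Although the details of forming a two’s complement are beyond the requirements of this course, the following small Java sketch (the names are ours) illustrates the idea for our earlier example of 52 and 49: the number to be subtracted is negated and the result is obtained by ordinary addition, keeping only 8 bits.

public class TwosComplementDemo {
    public static void main(String[] args) {
        int a = 52, b = 49;
        int negB = (~b + 1) & 0xFF;      // two's complement of b within 8 bits
        int result = (a + negB) & 0xFF;  // ordinary addition, keeping 8 bits
        System.out.println(result);      // prints 3, i.e. 52 - 49
    }
}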