The human race, and even some animals, have an inate feeling for numbers. To survive, we had to size up the number of wolves in a pack or tell our friends the width of a river. Only with the advent of language did we really develop a systematic way of representing numbers. We are all familiar with at least one system, but there are others.
The Decimal System
How many dots are there here?
Thirteen, you say? Written in our western system, that would be 13. How did you arrive at that result? Before you even got started you had zero; quite a concept in itself. Then you went through the sequence 1, 2, 3, 4, etc. When you got to 9 you had exhausted all of the symbols of our number system.
But you didn't stop there. You added one more and arrived at 10. The fact that you reused the symbol 1, but positioned it further left is, of course very significant. It means 1 times 10. The position in which any digit lies signifies its order of magnitude in terms of powers of ten1. If we had one hundred and thirty four dots to count we would represent it as 134. That's (1 x 102) + (3 x 101) + (4 x 100). This is called a decimal, or base ten system.
The Binary System
Now, as was pointed out in Numbers, the fact that we use ten symbols and use a base ten system may be because we have ten fingers. But that shouldn't limit us to just one number system. Computers can't handle the base ten system internally very well: a hole is either punched in a card or not; a ferrite ring is either magnetized or not; a transistor is either sitched on or off. Logic gates have only 2 output states, known as bits. For a computer, when it's used up the 0 and the 1 it has to move on to the next position. This would suggest a base 2 or binary system2.
A computer would count the thirteen dots, as follows.
Whenever it exhausts all of the symbols on one position (and it only has 0 and 1 to choose from), it moves on to the next. 1101 is another way of saying (1 x 23) + (1 x 22) + (0 x 21) + (1 x 20) or 8 + 4 + 0 + 1.
A computer has no problem continuing this process. It handles 8, 16 or 32 or more bits as one convenient unit. For us, this begins to get unwieldy. For example 1234 in base ten is written as 10011010010 in binary. For writing, it is preferable to use a more efficient system.
The Base 16 or Hexadecimal System
Imagine we had 16 fingers. We'd need some more symbols in addition to 0 to 9. We can use A to F. The system is the same: we keep counting until we need to move to the next power of 16. Recounting our dots above, when we get to 7, 8, 9 we continue with A for ten, B for eleven, C for twelve, and finally D for thirteen. If we had to add, say, another 5, we would go through E and F to 10, 11 and 12.
Since 16 is a power of two, base 16 is a very convenient short hand for representing binary. Four bits' worth of binary can be replaced by one base 16 (hexadecimal) symbol. Taking our ungainly 10011010010 and starting from the right, or least significant bits, breaking into groups of four we get:
0100 1101 0010
Each group of four can be replaced by one hexadecimal digit, giving us:
This is often written as 0x04D2 or sometimes H04D2. With time these become as familiar as base ten numbers. 0x0CE8 is 1000. Anything with a single '1' is instantly recognized as a power of two.
0x0100 is 256. 0x1000 is 4096. 0xAAAA is often used as a test pattern in electronics because it is 1010 1010 1010 1010 in binary. You can even spell a few words. 0xDEAD 0xBEEF is sometimes used to fill uninitialized memory. Java byte code is identified with the magic number CAFEBABE.
With just 16 bits we can count from 0x0000 to 0xFFFF or 65,535. If we sacrifice half of that positive range, we can count from -32768 to +32767. An analogy is the trip odometer in a car. It has a limited number of digits and no leading plus or minus. If you reset it to 000 and then drove backwards for one mile it would show 999. If none of your trips was ever more than about 500 miles, but you drove some trips forwards and some backwards, you could have a convention that says 'anything less than 499 was driven forwards, and anything greater than 500 must have been a backwards trip.' You have to do a little bit of simple math to find out exactly how far the backwards trips were. In our analogy, 999 is really -1: or the 1000 complement. In binary we use something called the 2's complement.
The high-order bit is referred to as the sign bit, although it isn't dedicated to just indicating plus and minus. It actually carries some weight. To continue with the odometer analogy: if you wanted to transfer your negative miles to a similar display with more digits, you'd have to be careful to fill the new, high order digits with 9s. For example 998 (-2) in the three digit display would become 999998 in a six digit display. In binary or hexadecimal, if we move a signed number from a 16 to a 32 bit register we have to be sure to propagate the sign bit.
Instead of restricting ourselves to 0-9 or even 0-9 and A-F, we enlist 0-9, a-z and A-Z and a couple more characters. That's a total of 64 different symbols. It's like having 64 fingers to count on:
|Base64 Encoding||A-Z||a-z||0-9||+ /|
It doesn't seem very exciting at first, but it does have its uses.
Computers manipulate characters as well as numbers. Actually, each character has a numeric code.
A space might be 32, an 'A' 65, a 'B' 66, etc. With eight bits (one byte) we can represent all of the characters needed for most western languages - as many as 256. Not all of these codes represent printable characters: there are tabs and backspaces and some special control codes like ESC, STX, ETX, SYNC and others. Some applications, especially Internet communications, can only handle a restricted range of printable characters3. While they can digest 0x20, 0x41, etc, they would choke on combinations like 0x01 or 0xFF. But to transmit a picture, or some executable code, we need to be able to transmit any of those 256 bit combinations in every byte.
This is where a number system like base 64 comes in. Suppose we need to transmit a block of binary that contains non-printable characters. We have to transmit them transparently.
Let's say we want to send the following sequence, where none of the bytes are printable characters:
0x02 0x07 0x7F
In binary the same sequence is:
0000-0010 0000-0111 0111-1111
We could transmit each nibble (half a byte) as a printable character. That is, we could send a literal '0', '2', '0', '7', etc. This takes 8 bits to transmit just 4 bits of data. Base 64 is used instead. One base 64 digit can represent six bits' worth of binary: 26 is 64. The resulting transmission size is 4/3 of the original.
We break the binary to be transmitted into six-bit pieces, starting from the left:
000000 100000 011101 111111
We then replace each six-bit sequence with its corresponding Base64 value4. We arrive at this four character string:
This will safely wend its way through the Internet. The receiver reverses the process.
Base64 is also used as a very primitive way of hiding passwords 5 in the 'http' protocol.
The possibilities are endless
Another common system is octal - base 8. This was popular when computers were very limited in memory and addressing capabilities. In Britain we used to have 12 pennies in a shilling. This wasn't strictly a duodecimal system, because we used the decimal system to talk about eleven and twelve pence. However, in the late 19th Century a group of reformers suggested a true duodecimal system whereby the symbols T and E would have been used. Although the number 12 really made things hard for computers, programmers and mechanical adding machines, it did have some advantages. You could take any number of shillings and divide it into a sixth, a third or a quarter with no fractions. You can't do that with decimal money. If only we had evolved with twelve fingers...