Physical Encodings
We now look at the actual encodings of bits on the wire. The simplest encoding would be to use 0 V for a zero and a positive voltage, say 1 V, for a one. This has a number of problems:
- How can we distinguish between a free network (nobody transmitting) and somebody transmitting a stream of zeros? They look the same electrically.
- The sender and receiver need to be accurately synchronized to the starts and ends of bits, since with long strings of ones or zeros there are no changes in the signal to resynchronize on and they could drift out of step. This means that, for example, the sender might send 1000 ones, but the receiver could think that it was only 999 ones.
- A long stream of ones would be encoded as a steady 1 V on the wire, namely a DC voltage. This is something that gives electrical engineers great difficulties, as it is much easier to connect together equipment that has average voltage 0, e.g., an AC voltage.
- A similar problem happens with the lasers in an optical system: it is undesirable to have a laser on continuously to represent a stream of ones.
Engineers prefer an encoding where the average voltage is about zero. Thus they use encodings of the data stream that approximate this even if the data is all ones or all zeros.
So Ethernet and 802.3 use a Manchester encoding (Figure below). This chops the time interval for a bit into two halves: a one is represented by a high voltage in the first half and a low voltage in the second. A zero is represented by the reverse: a low voltage in the first half and a high voltage in the second. The actual voltages used are +0.85 V for high and -0.85 V for low, so the average voltage is zero whatever the data. There is an extra advantage that this signal is easy to synchronize with: every bit has a transition through zero at its midpoint, so the receiver can recover the clock from the signal itself.
Using 0.85 V is also a compromise. Smaller voltages require less power, but are more prone to interference from noise from the surrounding electrical environment.
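The difference between the simple encoding and Manchester encoding can be sketched in a few lines of Python. This is only an illustration of the description above: the function names are made up for this example, and each bit is modelled as a list of voltage levels rather than a real waveform.

```python
HIGH, LOW = 0.85, -0.85  # volts, as quoted in the text


def naive_encode(bits):
    """Simple encoding: 0 V for a zero, 1 V for a one, one level per bit."""
    return [1.0 if b else 0.0 for b in bits]


def manchester_encode(bits):
    """Two half-bit levels per bit, using the convention described above:
    a one is high then low, a zero is low then high."""
    levels = []
    for b in bits:
        levels.extend((HIGH, LOW) if b else (LOW, HIGH))
    return levels


def mean(levels):
    """Average voltage over the whole sequence."""
    return sum(levels) / len(levels)


bits = [1] * 8  # a run of ones, the worst case for the simple encoding
print(mean(naive_encode(bits)))       # steady 1 V: a DC level
print(mean(manchester_encode(bits)))  # averages to zero
print(manchester_encode([1, 0]))      # [0.85, -0.85, -0.85, 0.85]
```

Note that the Manchester output has twice as many levels as there are bits, which is the source of the bandwidth cost discussed next.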
There is one drawback. The bandwidth that the Manchester encoding requires is twice that of the simple encoding, as the signal can change level twice per bit. A 10 Mb/s rate requires a 20 MHz signal. This is not so bad, but it is correspondingly worse at 100 Mb/s and higher rates.
Cat 5 cable is rated to 100 MHz, so we can't use Manchester encoding for 100 Mb/s Ethernet, which by the same reckoning would need a 200 MHz signal. The encoding used is more subtle. It uses a 4B/5B system, meaning that four data bits are encoded into five physical bits. Thus four consecutive zero bits 0000 become the five bits 11110. Physical representations are called symbols and they need not be binary valued. A couple of extra 5B patterns are used for frame start and end, and one for 'idle network' (to maintain synchronization).
So far this has only made it worse in terms of bandwidth. But now the encoding uses a three-level (ternary) electrical encoding called MLT-3. MLT-3 is like Manchester in that it encodes 1 bits by transitions, but now the transitions are cyclically from 0 to 1, then 1 to 0, then 0 to -1, then -1 to 0, then 0 to 1, and so on. A 0 bit is encoded by no transition. The 4B/5B translations are such that each five-symbol chunk has at least two transitions in the MLT-3 signal, which helps synchronization and also minimizes DC current. So 0000, which on its own would give no transitions, becomes the five symbols 11110, which has four transitions.
An example is the byte value 14, or hexadecimal 0E. This is translated nibble by nibble by the 4B/5B encoding to 11110 and 11100. Figure below shows how these might be encoded in signal transitions. The actual encoding depends on where we happen to be in the current cycle.
This runs at 31.25 MHz with a symbol rate of 125 Mbaud. The fastest electrical cycle happens with a stream of all one symbols (IDLE): we get a complete cycle every four transitions (0 to 1 to 0 to -1 to 0), giving four symbols per cycle and so 125/4 = 31.25 MHz. Typically, though, we get frequencies of around 4/5 of this (25 MHz), as each data code group contains at most four ones out of five symbols (Figure below).
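The 4B/5B translation followed by MLT-3 can be sketched as below. The table is the standard 4B/5B data code table; the function names, the use of strings of '0' and '1' for code bits, and the `phase` parameter (where we happen to be in the MLT-3 cycle) are choices made for this example only. The nibble order follows the worked example in the text (high nibble first), which is an assumption about presentation rather than a statement about the order on a real wire.

```python
# 4B/5B data code groups, indexed by nibble value 0x0..0xF.
FOUR_B_FIVE_B = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]

# MLT-3 levels in the order they are visited; each 1 bit steps to the next.
CYCLE = [0, 1, 0, -1]


def encode_byte_4b5b(value):
    """Translate one byte into ten code bits, high nibble first
    (as in the text's example)."""
    return FOUR_B_FIVE_B[value >> 4] + FOUR_B_FIVE_B[value & 0xF]


def mlt3_encode(code_bits, phase=0):
    """Return the line level after each code bit.
    A '1' moves one step round the cycle, a '0' holds the level.
    `phase` is the starting position in CYCLE."""
    levels = []
    for bit in code_bits:
        if bit == "1":
            phase = (phase + 1) % len(CYCLE)
        levels.append(CYCLE[phase])
    return levels


code = encode_byte_4b5b(0x0E)
print(code)                    # 1111011100
print(mlt3_encode(code))       # [1, 0, -1, 0, 0, 1, 0, -1, -1, -1]
print(mlt3_encode("1" * 8))    # IDLE-like run: [1, 0, -1, 0, 1, 0, -1, 0]
```

The run of ones repeats every four symbols, which is the fastest cycle the signal can produce. Starting with a different `phase` gives a different but equally valid waveform for the same bits.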
The baud rate is the number of symbols per second, where a symbol represents some chunk of information. A symbol might represent a bit, or 2 bits, or 2/3 of a bit, and so on. 100 Mb/s Ethernet requires a symbol rate of 125 Mbaud (efficiency 80%) due to the 4B/5B encoding: here one symbol encodes 0.8 bits.
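The figures quoted above follow from simple arithmetic, which the short sketch below reproduces. The function name and parameters are invented for this example.

```python
def symbol_rate_mbaud(data_rate_mbps, data_bits, code_bits):
    """Symbol rate needed when every `data_bits` data bits
    are sent as `code_bits` symbols."""
    return data_rate_mbps * code_bits / data_bits


baud = symbol_rate_mbaud(100, data_bits=4, code_bits=5)
bits_per_symbol = 100 / baud
max_frequency_mhz = baud / 4  # MLT-3: at most one full cycle per four symbols

print(baud)               # 125.0
print(bits_per_symbol)    # 0.8
print(max_frequency_mhz)  # 31.25
```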
The frequency a signal runs at has other implications, in particular the amount of electrical interference it produces. A system running at 25 MHz is friendlier to the electrical environment than one running at 31.25 MHz.
Gigabit Ethernet over fibre uses an 8B/10B encoding, invented at IBM and also used in Fibre Channel networks. Gigabit Ethernet over copper is much more sophisticated. It runs over five electrical levels (±2, ±1, 0) over all four pairs of wires in the cable in both directions simultaneously. The encoding, PAM-5, gives us 2 bits per symbol: four levels suffice to encode 00, 01, 10, and 11, and the fifth provides redundancy for error correction. This runs at 125 Mbaud, as before, so we have 2 bits x 125 Mbaud x 4 pairs = 1000 Mb/s in both directions simultaneously.
Clever signal processing is required to disentangle the outward signal from the inward signal. The processors on a Gigabit Ethernet interface are about the complexity of a 486 microprocessor.
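To make the PAM-5 arithmetic concrete, the sketch below spreads one byte across the four wire pairs, two bits per pair. The mapping from bit pairs to levels is purely hypothetical and chosen for readability: the real copper gigabit standard combines the bits with an error-correcting code before choosing levels, so this shows only the idea of "2 bits per symbol on each of 4 pairs".

```python
# Hypothetical mapping of 2-bit groups to four of the five levels;
# level 0 is left spare here, which is an assumption of this sketch.
LEVEL_FOR_BITS = {0b00: -2, 0b01: -1, 0b10: 1, 0b11: 2}

SYMBOL_RATE_MBAUD = 125
PAIRS = 4
BITS_PER_SYMBOL = 2


def byte_to_pair_levels(value):
    """Split one byte into four 2-bit groups, most significant first,
    and return one level per wire pair."""
    return [LEVEL_FOR_BITS[(value >> shift) & 0b11] for shift in (6, 4, 2, 0)]


print(byte_to_pair_levels(0xE4))  # bits 11 10 01 00 -> [2, 1, -1, -2]
print(BITS_PER_SYMBOL * SYMBOL_RATE_MBAUD * PAIRS)  # 1000 (Mb/s)
```

One byte is carried in each symbol period, so 125 million symbol periods per second give 1000 Mb/s.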
The 8B/10B encoding is also used in the serial ATA (SATA) disk interface and Digital Audio Tape (DAT).
As previously mentioned, 10 Gb/s Ethernet is primarily optical. Optical systems only use binary encodings: the laser is either on or off, so they do not have the option of multiple-level encodings.