Many people assume that they understand how mail gets delivered to their homes. It's simple, right? The person sending the letter puts it in an envelope, writes an address on the front, attaches a stamp, and drops it in a slot. Eventually, a postal service employee puts the letter into the recipient's own mailbox. The apparent simplicity of the system, however, is misleading. In reality, very few people know the details of how the postal service gets a letter from Slot A to Box B.
The same is true with data sent over the Internet. We see the start and finish as we download mail, view web pages, and send instant messages. But most of us don't understand -- and don't need to understand -- how the information actually gets from one computer to another. This discussion will cover some of the basics of this process, and explain how network address translation (NAT) systems have helped shore up the Internet's creaky address system to allow more people to use it more effectively.
As with “snail mail,” it all comes down to addressing.
A protocol is a set of procedures that explains how a person or system should respond to a specific situation. Internet protocol (IP) is such a set of procedures, providing rules for the way computers deal with information sent and received over the Internet. The current rules of IP provide two main functions: data delivery (or routing) and fragmentation. In addition to these, minor services include timeouts, prioritization, source routing, and route tracing.
Data delivery is probably the most important feature of IP. For the IP routing system to work, each computer on a network must be assigned a unique address that distinguishes it from every other machine on the network. With the current protocol, an address consists of four 8-bit values, for a total of 32 bits in each address. When represented in decimal form, an IP address appears as four numbers separated by dots. Since the range of an 8-bit value is from 0 to 255, the lowest possible address -- theoretically -- is 0.0.0.0 and the highest is 255.255.255.255.
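The relationship between the dotted form and the underlying 32-bit value can be sketched in a few lines of Python. This is only an illustration; the function names dotted_to_int and int_to_dotted are made up for this example and are not part of any standard.

```python
def dotted_to_int(address: str) -> int:
    """Pack a dotted-decimal address into a single 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated values")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError("each value must fit in 8 bits (0-255)")
        # Shift what we have so far left by 8 bits and append the next value.
        value = (value << 8) | octet
    return value


def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into dotted-decimal form."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    # Extract each 8-bit value, most significant first.
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

Under this scheme 0.0.0.0 maps to 0 and 255.255.255.255 maps to the largest 32-bit value, 4,294,967,295.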
When information is sent over a network, it is broken up into chunks of data called packets. A packet is equivalent to a fragment of a letter (containing just part of a longer message) which is put into its own envelope and shipped off to a recipient. Each IP packet contains a header which specifies the address of the machine sending the data and the address of the machine intended to receive the data.
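The envelope analogy can be sketched roughly as follows. The Packet class, its field names, and the tiny 8-byte chunk size are assumptions made for illustration; a real IP header carries many more fields, and the simple sequence number used here stands in for the more involved bookkeeping that real protocols use to put data back in order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    source: str        # address of the sending machine (header)
    destination: str   # address of the receiving machine (header)
    sequence: int      # hypothetical ordering field, for illustration only
    payload: bytes     # this packet's fragment of the message


def packetize(message: bytes, source: str, destination: str,
              size: int = 8) -> List[Packet]:
    """Split a message into packets, each with its own addressed header."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [
        Packet(source, destination, start // size, message[start:start + size])
        for start in range(0, len(message), size)
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message, even if the packets arrived out of order."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    return b"".join(p.payload for p in ordered)
```

Every packet carries both addresses, so each one can be handled independently, just as each envelope in a multi-part letter carries its own address.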
The biggest IP network in existence, the Internet, is a vast web of connected machines. Rarely does information on the Internet go directly from one computer to another. Instead, it usually makes a series of hops from one node to another, eventually getting to its intended destination. Routers at these nodes examine the addresses on the packets and choose a path for each packet to travel. Once the packets that make up a message have reached their destination, they are reassembled and the message -- in the form of a web page, file download, e-mail message, and so on -- is displayed.
Back in the mid-70s, when Vinton Cerf, Jon Postel and Danny Cohen dreamed up the Internet protocol, 32-bit addresses seemed like a pretty good idea. Four 8-bit values provide a range of almost 4.3 billion unique addresses -- 4,294,967,296 to be exact -- and this seemed tantamount to overkill to the architects of Arpanet, the precursor to today's Internet. As Vint Cerf muses:
“How the Internet Came to Be,” by Vinton Cerf, as told to Bernard Aboba
Today, with about half of the available addresses already allocated, experts anticipate that the pool of free addresses will run dry before the end of the decade. This curious predicament has been referred to as “the great IP crunch of 2010,” or even more ominously, “the next Y2K.”
The Internet Engineering Task Force has a proposed solution for the problem: a new protocol called IP version 6 (IPv6). Determined not to make the same mistake again, the engineers of IPv6 took overkill to a new level. IPv6 uses 128-bit addressing, which provides roughly 3.4 x 10^38 addresses (that's 340,000,000,000,000,000,000,000,000,000,000,000,000) -- or roughly 313 billion addresses for every cubic centimeter of the earth's volume (1.087 x 10^27 cm^3). Chances are, once IPv6 is finally implemented, we won't be running out of addresses any time soon.
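The size of the two address spaces is easy to check with a few lines of Python. This is just a back-of-the-envelope sketch; the variable names are arbitrary, and the volume of the earth is taken from the figure quoted above rather than from an independent source.

```python
# Number of distinct addresses for each address width.
ipv4_addresses = 2 ** 32     # 4,294,967,296
ipv6_addresses = 2 ** 128    # roughly 3.4 x 10^38

# Earth's volume in cubic centimeters, as quoted in the text.
EARTH_VOLUME_CM3 = 1.087e27

# IPv6 addresses available per cubic centimeter of the earth's volume.
per_cm3 = ipv6_addresses / EARTH_VOLUME_CM3   # about 3.1 x 10^11

print(f"IPv4 addresses: {ipv4_addresses:,}")
print(f"IPv6 addresses: {ipv6_addresses:.2e}")
print(f"IPv6 addresses per cubic centimeter: {per_cm3:.2e}")
```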
The problem is, IPv6 is still quite new. The Internet Assigned Numbers Authority green-lighted its regional agents to begin assigning IPv6 addresses in July of 2000, but the resulting system of machines is essentially cut off from the rest of the networked world. Full implementation of IPv6 entails a complete retrofitting of all existing networking hardware and software, which is no small task. Steve Deering, one of the principal designers of IPv6, had the following to say about it: “It's quite possible it won't happen. It's conceivable that we will just continue to do short-term hacks and band-aid whatever is required to keep living with IPv4.”
Whether IPv6 ever gets fully implemented, or whether it is eventually abandoned, the Internet will need to continue routing data. In the meantime, it is necessary to live with the current IP system, and the current hardware and software infrastructure.
It's important to underscore the point that the Internet protocol requires a “routable” IP address to send and receive information. An IP address is routable if it is visible to other machines on the network and if it is unique to that specific machine -- within the bounds of the network. The IP system itself doesn't care whether a network is public or private. So the IP address problem only applies when computers are dealing with each other within the greater world of the Internet, and not in the much smaller world of an institutional intranet.
If an organization has ten computers connected to each other over an IP network, these ten computers must each be identified by a unique identifier. If these machines aren't connected to the rest of the world -- the Internet -- it would not matter which specific ten of the 4.3 billion possible IP addresses were chosen for these computers. If, one day, the system administrator decided to give all ten machines access to the Internet (in addition to each other), it would be necessary to assign each computer an IP address that is unique to that machine and that machine alone within the entire Internet.
The following analogy makes the situation even simpler. Consider a hypothetical office with ten employees. Inexplicably, these employees don't have a need to telephone anyone but each other. Their phone system is a closed network, and they have a great degree of flexibility when assigning phone numbers. They still have to follow the convention of (###) ###-####, but within these boundaries they can get creative. One employee might prefer the easy-to-remember (012) 345-6789; another might want to be (666) 666-6666. It wouldn't matter if someone else in Honolulu or Duluth had an identical telephone number. It's inconsequential to discuss which prefixes are valid in their particular town, or which area code they're in. Since their system can't see or be seen from the outside world, these are all non-issues.
Consider, now, that the employees one day decide that they want to be able to call people in their own town, as well as Honolulu, Duluth, and the rest of the world. To do this, they have to work with their telephone company to modify their phone system so that each of them has a public, unique phone number. They are bound by area code and local prefix restrictions. What's more, they have to pay for the right to use the ten telephone numbers they end up with.
Actually, the telephone system managers in this hypothetical scenario have another option. Suppose they want to save money by purchasing only a few “outside lines” for the office. Or maybe they want incoming calls to all be handled by an operator, so those pesky folks in Duluth can't call their workers directly. Whatever the reason, they could set up the system so that the office has only a few (say, three) public telephone numbers that are shared by all of the employees. Incoming calls would be routed using an employee's private extension number. Outgoing calls would go through one of the three outside lines. Employees would be protected from outside annoyances, and the office would save money on telephone lines. It's a great deal all around.
Computer network administrators have a similar option, which produces similar results. It's called “network address translation.”
One of the “short-term hacks” referred to by Steve Deering is a tricky scheme called network address translation (NAT). Supply and demand isn't the only issue involved with obtaining IP addresses today. Like telephone lines, addresses also cost money. Also, some network administrators consider public IP addresses a security risk. If a computer is visible to everyone on the Internet, it may be vulnerable to attack.
A network address translation device allows an organization to use private IP addresses (analogous to private telephone extensions) for communication within an internal network, and to share a small pool of public IP addresses (analogous to an outside line) when communicating on an external network such as the Internet. The NAT performs the conversion transparently and in real time, so all internal users get access to external services. The users behind the NAT can see the outside world, but at the same time, the users are protected from prying eyes because all communication with the Internet seems to come directly from the machine doing the translation.
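The bookkeeping behind this kind of translation can be sketched as a small lookup table. This is a toy model under stated assumptions, not a description of how any particular NAT product works: the NatTable class and its method names are invented for this example, and it assumes the device tells connections apart by handing each one its own public port, which is one common approach. The addresses come from ranges commonly used for private networks and documentation examples.

```python
from typing import Dict, List, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class NatTable:
    """Toy translation table mapping private endpoints to shared public ones."""

    def __init__(self, public_addresses: List[str], first_port: int = 40000) -> None:
        if not public_addresses:
            raise ValueError("at least one public address is required")
        self._public_addresses = public_addresses
        self._next_port = first_port
        self._outbound: Dict[Endpoint, Endpoint] = {}
        self._inbound: Dict[Endpoint, Endpoint] = {}

    def translate_outbound(self, private: Endpoint) -> Endpoint:
        """Return the public endpoint that stands in for a private one."""
        if private not in self._outbound:
            # Spread new connections across the small pool of public addresses.
            index = len(self._outbound) % len(self._public_addresses)
            public = (self._public_addresses[index], self._next_port)
            self._next_port += 1
            self._outbound[private] = public
            self._inbound[public] = private
        return self._outbound[private]

    def translate_inbound(self, public: Endpoint) -> Optional[Endpoint]:
        """Find the private endpoint for a reply, or None if nobody asked for it."""
        return self._inbound.get(public)


nat = NatTable(["203.0.113.1", "203.0.113.2"])
outside = nat.translate_outbound(("192.168.0.10", 5000))
print(outside)                                         # what the Internet sees
print(nat.translate_inbound(outside))                  # where the reply is delivered
print(nat.translate_inbound(("203.0.113.1", 12345)))   # unsolicited traffic: None
```

The last lookup returns None because no internal machine opened that connection, which is the sense in which machines behind the translator are hidden from the outside. The sketch ignores details a real device must handle, such as expiring old entries and running out of ports.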
NAT devices are often implemented along with other network protections such as firewalls and web proxies to help keep users' data safe from potential threats. When configured correctly, these various schemes can allow for safe, economical Internet access for users on any size network.
Eventually, IPv6 will eliminate the need for network address translation by providing a virtually endless supply of routable addresses for the ever-expanding Internet. Until this happens -- and it won't be happening anytime soon -- we have network address translation to help us extend our current supply of IP addresses. For a “short-term hack,” it's not a bad deal.