22 June 2003 Archives

Recovered from the Wayback Machine.

“What do you call yourself?” the Fawn said at last. Such a soft sweet voice it had!

“I wish I knew!” thought poor Alice. She answered, rather sadly, “Nothing, just now.”

“Think again,” it said: “that won’t do.”

Alice thought, but nothing came of it. “Please, would you tell me what you call yourself?” she said timidly. “I think that might help a little.”

“I’ll tell you, if you come a little further on,” the Fawn said. “I can’t remember here.”

There is something of Alice’s adventures in the Looking Glass about the Internet. In their book, “DNS and BIND, 4th Edition”, Paul Albitz and Cricket Liu used excerpts from “Through the Looking Glass and What Alice Found There” to preface all the chapters; appropriate, because their book is about the greatest mystery of the Internet: DNS, or the Domain Name System, the system that connects the address you type into a browser to the actual pages that load.

If you think on it, it’s pretty amazing to be able to go into a browser, type an address, and the same page shows up regardless of where you are in the world, and how you’re connecting to the Internet. More so when you consider that the page itself may move between different servers, and even different parts of the world. Consider your reading this page. Most likely you typed in the address for this weblog, weblog.burningbird.net in your browser address field, or you clicked on a link embedded in another page or in your own blogroll. Hopefully in a short period of time after you hit the Enter button, or clicked the link, this page showed up. You don’t have to know the physical location of the machine.

(Heck, I don’t have to know the physical location of the machine. Come to think on it, at this very moment I don’t know the exact physical location of this weblog.)

You probably do this type of activity every day — clicking a link or typing in the address of a weblog or other web page — and you’ve come to take it for granted that the web site loads, the page opens.

Problems happen of course. Perhaps you’ll get a server error saying that the server is down, or you might get a 404 error saying that the page can’t be found. The page might load slowly and even be garbled, or the styles look off. If you get this when trying to access my weblog, you’ll probably assume that something’s wrong with my server, or my pages, or maybe I’m playing around. Such things happen and you go on to other things.

What’s not as common, though, is that you might get a message from your browser saying it can’t identify the address you requested; depending on your browser, you might also be re-directed to a search page to try and locate this weblog.

If you typed in the address in your browser, you might check it for typos, carefully typing and re-typing the address again and again. If, instead, you clicked on a link embedded in a page, you might send a note to the page owner telling them the link is incorrect. Accessing the page through a known static link, such as in a blogroll, you might get even more frustrated, because how can a link work one moment, and not the next?

At this point, you might check to see if other addresses are having problems, and if all of them return the same error message, then you know something is wrong with the connection — something is wrong with the ‘DNS server’, is usually what people will say.

However, if all the other addresses load okay but mine, and the problem continues, you might get concerned and send me an email. But there’s a problem: my email address is the same as my weblog address, and your email server returns the email with a variation on the complaint your browser gave you — the address does not exist.

Like the Cheshire Cat in Alice, I and my pages will have effectively disappeared from the Internet; only the Google cache, like the Cat’s smile, remaining to mark that I ever existed. Such is the fragile bubble on which a virtual community is based. Such is the dependency on the DNS.

DNS: The Story

For the Impatient: Show me what I need now!

Uniquely Me

Every location on the Internet is accessible through a specific network address called the IP address, IP standing in for Internet Protocol. For instance, the Burningbird Network Co-op has two unique IP addresses that map to a specific location (machine) on a specific network:

69.10.138.64
69.10.138.65

How these IP addresses break down, and the future of IP, we’ll leave for another “Internet for Poets” essay, but for now know that if you type http://69.10.138.64 into a browser, at the time this was written, you’ll get to the Co-op’s dedicated server, even though I create the contents in St. Louis, the pages are on a server in Canada, and you are wherever you are.

Without having to go through the hassle and expense of registering a domain and mapping a domain address such as burningbird.net, you and I can agree that you typing in 69.10.138.64 will bring up my weblog pages. We can effectively bypass the DNS, go our own rebel ways. Unless the infrastructure of the Internet suddenly breaks down just as you click the link, the IP address to the physical location mapping is guaranteed.

Guaranteed…except…

Except if the ISP that manages the co-op’s dedicated server decides to do some network infrastructure changes and gives me two different IP addresses, something that can happen as network folks work to ensure even load balances on networks. Or the Co-op moves to a new ISP — perhaps in Australia because we’ve heard that the laws governing Internet content are quite liberal in Australia, and we’ve all decided to become Bloggers in the Buff.

If this occurs, when next you access 69.10.138.64, instead of getting Burningbird you get something like “Sharon’s House of Delights”, and though it might take you a while to notice the difference, eventually you’ll realize that the IP address no longer maps to the physical location of this weblog. What’s worse is you have no way of finding my current IP address to change your link. I am, to all intents and purposes, lost.

Of course you might try finding me in Google, typing Burningbird into the search field, and my weblog will show up in the list — at the old IP address. So you wait and wait and wait until you think my new address should show and try again, but I’m still at the old IP address because there are no links to my new location for Google to follow because no one knows where I am.

It’s not until some webbot comes along searching for content by random IP address rather than link, or I send out notices of the IP address change, that you have a chance to discover my new location. You then have to change all of your links, and if you’re a thoughtful (or obsessive-compulsive) weblogger, you have to change the links in all the pages where you’ve referenced posts in my weblog.

There’s got to be a better way, and there is: DNS.

DNS: An Early History

The earliest users of the Internet, back when it was part of a small experiment among researchers called the ARPANet, realized that using machine addresses to access each other’s work, and each other, wasn’t going to be effective and started keeping name-to-address mappings in a file called the hosts file. Every machine had a copy of this file and most still do — the co-op’s current hosts file contains the following in addition to other mappings:

127.0.0.1 localhost.localdomain localhost
69.10.138.64 burningbird.net
69.10.138.64 yasd.com

(The unique address of 127.0.0.1 is known as the loopback address, and it’s always defined to be the local machine. It’s through the localhost address [http://localhost] that you can access pages on your own computer if it’s running a web server. If you’re using Mac OS X, or Windows 2000, or Linux to access this page, chances are the computer you’re using is also running a web server.)

The entries in the hosts file for burningbird.net and yasd.com map both domains to the same IP address: 69.10.138.64. Typing in yasd.com will bring up the pages for this domain on the new server. Yet if you were to type the burningbird.net domain name into your browser, at the time this was written, it would still show up on the old Co-op server, not the new one. The reason is that the hosts file on the new server is local to that machine; the information contained in it has not been distributed, or propagated, to the broader Internet community.
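For the curious, here is a minimal sketch in Python of what a machine does with a hosts file: read it line by line and build a table of names to addresses. The function name parse_hosts is my own invention for illustration, and keeping the first entry seen for a name is an assumption of this sketch rather than a rule from any specification.

```python
def parse_hosts(text):
    """Return a dict mapping each host name to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Assumption of this sketch: the first entry for a name wins.
            mapping.setdefault(name.lower(), ip)
    return mapping


HOSTS = """\
127.0.0.1 localhost.localdomain localhost
69.10.138.64 burningbird.net
69.10.138.64 yasd.com
"""

hosts = parse_hosts(HOSTS)
print(hosts["yasd.com"])         # 69.10.138.64
print(hosts.get("example.org"))  # None: this file has no such entry
```

The table only knows what is in this one file, which is the whole problem with relying on it.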

Again, back in the days of ARPANet, the community was so small that they would keep each other apprised of name-address mapping changes by uploading their local hosts file to a centralized HOSTS.TXT location, which contained a merged copy of all the data. The members would then download this file to their machines, usually on an average of twice a week.

As you can imagine, as the Internet grew, this situation became unworkable, for both performance and political reasons.

One potential problem with the old hosts system was name collision — for something like a name-address mapping to work, you needed some form of unique name as well as IP address. Something like burningbird.net. Or microsoft.com. Who’s going to decide the owner of one name or another? And how would the collision be resolved with the hosts information now dispersed across thousands of systems?
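To make the collision problem concrete, here is a small Python sketch of merging hosts tables uploaded by different sites. Everything in it is hypothetical: the site names, the 10.0.0.x addresses, and the merge_hosts function are mine, not a reconstruction of how the NIC actually built HOSTS.TXT.

```python
def merge_hosts(uploads):
    """Merge per-site {name: ip} tables and report conflicting claims."""
    merged = {}
    collisions = {}
    for site, table in uploads.items():
        for name, ip in table.items():
            if name in merged and merged[name][1] != ip:
                # Same name, different address: someone has to decide who wins.
                collisions.setdefault(name, [merged[name]]).append((site, ip))
            else:
                merged.setdefault(name, (site, ip))
    return merged, collisions


uploads = {
    "site-a": {"burningbird": "10.0.0.1"},
    "site-b": {"burningbird": "10.0.0.2", "alice": "10.0.0.3"},
}
merged, collisions = merge_hosts(uploads)
print(collisions)
# {'burningbird': [('site-a', '10.0.0.1'), ('site-b', '10.0.0.2')]}
```

The code can detect the collision, but it can't say who should own the name. That part is politics.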

In addition, the centralization of one HOSTS.TXT to manage name/address mapping across the entire network placed a great burden on the centralized authority, the Stanford Research Institute’s Network Information Center (known as the NIC). As the size of the Internet grew, trying to maintain consistency also became an issue. From DNS and BIND:

Maintaining consistency of the file across an expanding network became harder and harder. By the time a new HOSTS.TXT reached the farthest shores of the enlarged ARPAnet, a host across the network had changed addresses, or a new host had sprung up that users wanted to reach.

A solution was sought for these growing problems, and in the 1980s a series of changes occurred that started to define the Internet we know today. A new communication protocol called TCP/IP was adopted, making it even easier to get connected to the Internet; the infrastructure management of the Internet was taken over by NSF (National Science Foundation), and the beginnings of the InterNIC, an Internet authority, were born; and the old HOSTS.TXT system of propagating changes was replaced by the DNS.

How the DNS works

The newer system is based on the same concept of name-address mapping as the hosts file system, but with two major differences.

First, a name, or domain as they are called, had to be registered under a specific owner with the InterNIC before the owner could use the name. This prevented name collision, and also solved the political issue of who owned a name: he or she who got there first got the name. (This started its own problems as we were to learn at a later time.)

Secondly, the centralized hosts file access was replaced by an ingenious distributed database of names, the DNS.

How this distributed system would work is that there are authorities who are given name/address mapping, or name server, authority over specific high-level domains known as the dot level domains — ones such as .net, .com, .org and so on. For burningbird.net, there is a central authority that has authority to manage all .net domains, including my own. However, rather than the one organization trying to manage the .net domain for all subdomains, it delegates to others the authority to act as subdomain name server authorities — people or organizations who provide name servers that handle all name/server mappings for all domains they manage.

Each name server authority provides all the name/address mapping for specific subdomains, such as the name server that provides this for burningbird.net. As with the old system, this information is also maintained in a file, but in this case the file is called a zone file; and rather than all of these files being merged into a centralized location, the individual name servers provide the address/name mappings on demand, using specialized software.
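One way to picture the delegation is to read a name from right to left: each step is a point where one authority can hand responsibility to another. This Python sketch just lists those steps; delegation_chain is a name I made up, and the sketch says nothing about which steps really are separately administered.

```python
def delegation_chain(name):
    """List the names from the root down to the full name."""
    labels = name.rstrip(".").lower().split(".")
    chain = ["."]  # the root of the whole name space
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(delegation_chain("weblog.burningbird.net"))
# ['.', 'net.', 'burningbird.net.', 'weblog.burningbird.net.']
```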

Because of the added complexity of the system, with name/address mappings being polled rather than pushed to a central authority, there had to be additional information to make this more efficient, and today’s zone file is a bit more complex than the old hosts file. But not that complex if you just break it down into its component parts.

Time to Bust a myth:

Contrary to common expectation, a domain really isn’t a specific name such as burningbird.net. A domain is nothing more than an autonomously administered area of the overall domain space that is the Internet. For most people, this would be a specific name such as burningbird.net. However, in larger organizations, such as Stanford University, one domain could be the primary authority (stanford.edu), with additional domains given separate authority: gsb.stanford.edu, csu.stanford.edu, and so on. It’s the administrative authority, not the name, that forms a unique domain.

The Zone File

I debated whether to include a breakdown of a zone file within this discussion. After all, you don’t have to know about the internals of zone files in order to have a good understanding of the workings of DNS. Additionally, when you start listing out the contents of machine generated and consumed files, the conversation changes — moving from understanding to implementation.

What decided me to include this section is that the terminology of zone files is introduced by web hosts and ISPs, usually as a means of charging people more money for hosting services. I’ve frequently seen the case where hosting providers will tack on another charge for name server management for a domain, and if you question this, they’ll come back with an explanation that they have to manage the zone file. As you’ll see in this section, managing the zone file itself isn’t an onerous task, and for the most part is handled automatically with various tools. (Maintaining a name server can take resources, as we’ll see later in this essay.)

Each domain has a specific zone file, and the first line in it is what is known as the TTL, the Time-to-Live of the zone file, which we’ll discuss later. Following this, the first record of the file is what is known as the Start of Authority (SOA) record for the domain, providing information such as the length of time before the name server information is refreshed, who the contact for the zone is, and a unique serial number that is used to determine if the zone file has been updated. An example of an SOA from my new server is:

yasd.com. IN SOA ns1.burningbird.net. shelleyp.burningbird.net. (
1056080183
10800
3600
604800
38400 )

This reads as:

domain name: yasd.com
host name of primary name server: ns1.burningbird.net
contact person: shelleyp@burningbird.net
serial: 1056080183
refresh: 10800
retry: 3600
expire: 604800
minimum time to live: 38400

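The domain name and contact email are self-explanatory, and the serial number doesn’t have meaning by itself; it’s changed when the zone file is modified, to signify that a change has occurred. The host name of the primary name server identifies the server that holds the master copy of the zone. The refresh, retry, and expire values, all in seconds, tell a secondary name server how often to check the primary for changes, how long to wait before trying again if that check fails, and how long to keep answering from its own copy if the primary can’t be reached. The last value in this case means how long this name server record should live in a remote cache.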
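If you’d rather see the breakdown done by a program, here is a minimal Python sketch that splits the SOA record above into the same named fields. The function parse_soa and its field names are my own, and it only copes with a single record laid out like this one; real zone files allow comments and shortcuts that this ignores.

```python
import re

SOA_TEXT = """\
yasd.com. IN SOA ns1.burningbird.net. shelleyp.burningbird.net. (
1056080183
10800
3600
604800
38400 )
"""


def parse_soa(text):
    """Split a single SOA record into named fields."""
    # The parentheses only let the record span lines, so drop them.
    tokens = re.sub(r"[()]", " ", text).split()
    owner, _cls, _rtype, mname, rname = tokens[:5]
    serial, refresh, retry, expire, minimum = (int(t) for t in tokens[5:10])
    # The first dot of the contact field stands in for the @ sign.
    user, _, domain = rname.rstrip(".").partition(".")
    return {
        "domain": owner.rstrip("."),
        "primary": mname.rstrip("."),
        "contact": f"{user}@{domain}",
        "serial": serial,
        "refresh": refresh,
        "retry": retry,
        "expire": expire,
        "minimum": minimum,
    }


soa = parse_soa(SOA_TEXT)
print(soa["contact"])                   # shelleyp@burningbird.net
print(soa["refresh"] / 3600, "hours")   # 3.0 hours
print(soa["expire"] / 86400, "days")    # 7.0 days
```

The time values are all in seconds, which is why the sketch divides them to get hours and days.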

The zone file also includes other records such as a mapping to a mailserver, and the name/address pair, such as:

yasd.com. IN A 69.10.138.64
mail.yasd.com. IN A 69.10.138.64
www.yasd.com. IN CNAME yasd.com.
yasd.com. IN MX 10 mail.yasd.com.
64.138.10.69.in-addr.arpa. IN PTR yasd.com.

The syntax of these records is mainly important to those folks who have to maintain zone files, but in order what they’re saying is:

The address yasd.com maps to a specific IP network address, 69.10.138.64
The address mail.yasd.com maps to this same IP
The address www.yasd.com is an alias for yasd.com
The address yasd.com is served by an email server, with address of mail.yasd.com

The last record is a reverse lookup pointer for yasd.com: it gives you the ability to find the domain name given an IP address. You can try a lookup yourself by accessing this site, selecting Lookup from the left, typing yasd.com in the box underneath the tools, and clicking Submit. My IP address should be among the data returned.
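A reverse lookup goes from an IP address back to a name. In DNS the address is written with its four numbers reversed, under the special in-addr.arpa domain. Python’s standard ipaddress module will build that name for you; the wrapper function reverse_name below is just my own convenience for this sketch.

```python
import ipaddress


def reverse_name(ip):
    """Return the in-addr.arpa name used to look up a PTR record."""
    return ipaddress.ip_address(ip).reverse_pointer


print(reverse_name("69.10.138.64"))  # 64.138.10.69.in-addr.arpa
```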

There are shortcuts and other things you can add to a zone file, and you have to be careful with the syntax, but for all intents and purposes — this is a zone file; it’s only modified when you change the IP or add new aliases or other records.
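To show how little is going on, here is a Python sketch that loads the four forward records from the example and answers a question about them, following the www alias to the real address. It is a toy under stated assumptions: load_records and lookup_address are names I made up, and it handles only simple one-line records, not the full zone file syntax.

```python
ZONE = """\
yasd.com. IN A 69.10.138.64
mail.yasd.com. IN A 69.10.138.64
www.yasd.com. IN CNAME yasd.com.
yasd.com. IN MX 10 mail.yasd.com.
"""


def load_records(text):
    """Return {(name, type): [data, ...]} for simple one-line records."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _cls, rtype, *data = line.split()
        records.setdefault((name, rtype), []).append(" ".join(data))
    return records


def lookup_address(records, name, max_hops=8):
    """Follow CNAME aliases until an A record is found, or return None."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")][0]
        alias = records.get((name, "CNAME"))
        if not alias:
            return None
        name = alias[0]  # the alias points at another name; try that one
    return None


records = load_records(ZONE)
print(lookup_address(records, "www.yasd.com."))  # 69.10.138.64
print(records[("yasd.com.", "MX")])              # ['10 mail.yasd.com.']
```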

Maintenance of a zone file is not a heartbreaker. However, providing name server services for a domain does take resources.

Getting there from here

Okay, so we have a zone file — a text file that provides information about the IP addresses for a specific zone. Now, how does this information get out into the Internet? More importantly, what’s to stop someone else from creating a zone file and hijacking our domain?

When you register a domain with a registrar such as dotster.com or Network Solutions, in addition to providing other information about who owns the domain, you also have to provide at a minimum two name servers — one to act as primary name server, the other the secondary name server.

Because you’ve named specific name servers to act as authority for your zone, it doesn’t matter if someone else creates a zone file and says they are the authority: your domain registrar record says otherwise. You would have to change the name servers in the registrar record to change this, so someone can’t create a stealth zone file and try to steal you, or more accurately your domain, away.

So now there is a direct relationship between the name servers that are maintaining your domain’s zone file, and your registered domain. But that still doesn’t propagate the change throughout the Internet.

That’s where you and your loyal readers come in.

When you connect to the internet, through a dial-up, cable modem, DSL, or whatever, there’s a name server associated with the ISP, known as the ISP’s DNS server. Earlier I mentioned that sometimes you may not be able to resolve any domain name, not just one specifically. Whenever you can’t resolve any address from your PC, the problem is most likely because there’s something wrong with your ISP’s DNS.

If your ISP DNS Server is working, when you type a domain such as burningbird.net into your browser, the DNS Server looks in its own name server cache to see if it can find this domain. If it can’t, it then looks within the zone files it maintains, to see if it’s there. If it still can’t find the domain, the DNS server looks to one of the master DNS servers, known as the root DNS servers.

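When you associated the two name servers with your domain, these were stored with the domain at these root servers. (Strictly speaking, the root servers pass the question on to the servers for the top-level domain, such as .net or .com, and that is where the entries are held; for this essay I’ll lump them together as the root.) Additionally, the root server also knows the IP address of the name servers. When the ISP DNS makes a request on the domain to the root server, these name server addresses are returned to the ISP DNS, which then sends a request to the primary/master name server for the IP address of the domain.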

If the primary name server is working, it returns the IP; if not, the ISP DNS server queries the secondary name server, and when the IP address is returned, it caches the domain name and IP within its own cache, in order to make access quicker in the future.
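The lookup order described above can be sketched as a small Python simulation. Nothing here touches the network: the cache, the registry of name servers, and the servers themselves are plain dictionaries and functions, and every name in the code (resolve, make_server, NameServerDown) is my own invention for illustration.

```python
class NameServerDown(Exception):
    """Raised by a simulated name server that is not answering."""


def make_server(zone, up=True):
    """Build a fake authoritative server from a {name: ip} table."""
    def answer(name):
        if not up:
            raise NameServerDown(name)
        return zone.get(name)
    return answer


def resolve(name, cache, local_zone, registry, servers):
    """Resolve a name in the order the essay describes."""
    if name in cache:                   # 1. already cached
        return cache[name]
    if name in local_zone:              # 2. a zone this server maintains
        return local_zone[name]
    for ns in registry.get(name, []):   # 3. name servers on record for the domain
        try:
            ip = servers[ns](name)      # 4. primary first, then secondary
        except NameServerDown:
            continue
        if ip is not None:
            cache[name] = ip            # 5. remember the answer for next time
            return ip
    return None                         # the "name not found" case


zone = {"burningbird.net": "69.10.138.64"}
servers = {
    "ns1.burningbird.net": make_server(zone, up=False),  # primary is down
    "ns2.burningbird.net": make_server(zone),
}
registry = {"burningbird.net": ["ns1.burningbird.net", "ns2.burningbird.net"]}
cache = {}

print(resolve("burningbird.net", cache, {}, registry, servers))     # 69.10.138.64
print(cache)                            # {'burningbird.net': '69.10.138.64'}
print(resolve("nosuchname.example", cache, {}, registry, servers))  # None
```

With the primary switched off, the answer still arrives from the secondary, and the second time around it would come straight from the cache.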

Now, if your ISP DNS server is what is known as a forwarder DNS Server, rather than go directly to the root DNS Server to get the name servers for the domain, it asks the next DNS Server in a list for the address/name mappings for the domain. The forwarded-to server follows the same process: look locally, then ask a root DNS server for the name servers, and so on. When it gets the name/address mapping it caches this information locally and returns it to your ISP’s DNS, which caches it locally in turn, increasing the speed of propagation of the name/address mapping.

Now, if for some reason both of your name servers are down, or the name can’t be found in the root servers, then the person who typed the name into a browser will get the name not found error. In fact the reason for insisting on two different name servers was to prevent this problem — the assumption was that the two name servers would be on separate machines, physically separated. What’s happened more and more though is that most name servers from hosting companies, and the Co-op, are really two different IP addresses for the same machine. Acceptable, barely, for the Co-op (until we can find a secondary) — not acceptable for a commercial hosting service.

You can see how the data becomes propagated throughout the Internet. You can also see that your name server does use resources in order to serve the name/address pair requests. This is a valid expense to pass on, but one that should be commensurate with how often your page is accessed, and from how many different ISPs.

Of course, once the data is propagated, how do you go about getting it changed?

Time to Live

To every thing
turn, turn, turn
There is a season
turn, turn, turn
And a time
to every purpose under heaven
A time to be born
A time to die
A time to plant
A time to reap

A time to kill
A time to heal
A time to laugh
A time to weep

To every thing
turn, turn, turn
There is a season
turn, turn, turn
And a time
to every purpose under heaven

From the Byrds, based on Ecclesiastes 3

Associated with every name/address mapping is a value known as TTL, or Time To Live. This value tells every ISP DNS that caches the name/address mapping to maintain that cache for only the specified time — such as 3 hours, a day, or even several days. When the time expires and the name/address pair is again requested, the lookup procedure should begin all over again.

The TTL keeps the data from becoming too out of date, and allows for changes in the system, such as a move to a new IP, a new alias, and even moving authority for a domain to a new server. Unfortunately, not all ISP DNS honor the TTL.

Some ISP DNS have their own schedule for expiring name/address mappings, and will continue to return you the older data until their scheduled expiration time rather than the one associated with the zone file. Because of this, rather than the data being updated in three hours, if this is the value set in the zone file, it may take a day or even several before you see the updated DNS information. Still, except for extreme circumstances, new DNS changes usually make it from the zone file to your browser within a couple of days.
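Here is a minimal Python sketch of a cache that honors the TTL, next to one that doesn’t. The class TTLCache and its min_ttl setting are hypothetical, my own way of modeling an ISP DNS that holds on to data longer than the zone file asks; the clock is passed in so the example can pretend hours have gone by.

```python
import time


class TTLCache:
    """Cache name/address answers until their time to live runs out."""

    def __init__(self, clock=time.monotonic, min_ttl=0):
        self._clock = clock       # injectable so the example can fake time
        self._min_ttl = min_ttl   # models a server that keeps data longer than asked
        self._entries = {}

    def put(self, name, ip, ttl):
        lifetime = max(ttl, self._min_ttl)
        self._entries[name] = (ip, self._clock() + lifetime)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the lookup must start over
            return None
        return ip


now = [0]  # fake clock, in seconds
honest = TTLCache(clock=lambda: now[0])
stubborn = TTLCache(clock=lambda: now[0], min_ttl=86400)  # keeps everything a day
for dns_cache in (honest, stubborn):
    dns_cache.put("burningbird.net", "69.10.138.64", ttl=10800)  # three hours

now[0] = 4 * 3600  # four hours later
print(honest.get("burningbird.net"))    # None
print(stubborn.get("burningbird.net"))  # 69.10.138.64
```

Four hours on, the honest cache has forgotten the old address and will ask again; the stubborn one keeps handing out whatever it learned until its own day is up.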

Here’s a Fanciful Thought:

Weblogging may or may not be revolutionizing the Internet, but in my opinion, it is increasing the efficiency of the DNS. How come, you ask? Well, glad you asked.

There is a geographical distribution associated with weblogging that tends to send people out to sites that not only are not within their local network, but not even within the network served by whatever backbone (major internet architectural component) provides their area service. For each weblog reader from a new region, using a different ISP to connect to my weblog, that’s one more patch of the overall Internet that my particular domain/address mapping is occurring in.

Moreover, there is a frequency of access within weblogging, such as the hourly pings sent from RSS aggregators that are continuously asking our ISPs’ DNS to check for the address/name mapping within its cache. Because of this, a request for the new name/address mapping is likely to occur soon after it expires within the ISP DNS’s cache, kicking off the propagation process that much more quickly.

If the Internet can survive the weight of all our cat photos, in a decade or so as more webloggers from far corners of the globe join the fray, we could see DNS propagation rates double.

If the Internet can be viewed as plumbing, then Webloggers can be seen as the handle that once pushed, flushes the pipes.

In case you’re curious as to how Burningbird became a name server, this is detailed in the next section. If you’re not, you can skip to the last section in the essay: DNS: A Scenario.

Becoming a Name Server

How does one become a name server authority? In my case, it was getting a server that had Internet access and static IPs, on which I could run the appropriate name server software, called BIND. You need the static IP addresses because your servers will be queried for name/address resolution, and these IP addresses must remain constant; and you need specific software to manage the resolution, returning an IP address when queried by name, or a name when queried by IP address.

Once BIND was installed and configured, it was then a simple matter for me to go to my official InterNIC registrar, Dotster, and register the two new name servers: ns1.burningbird.net and ns2.burningbird.net, one for each of my unique IP addresses. Though name servers are usually on different machines, there is no requirement that they be on different machines.

Now my being a name server authority only applies to the generic top-level domains: .com, .net, .info, .org and the like. I’m not a name server authority for any of the country domains such as .uk and .au and would have to ask for this authority from the domain holding organizations there, something not likely to happen.

DNS: A Scenario

Consider a hypothetical Burningbird Network Co-op member called Sally.

When Sally moves her weblog from Blogspot to the Co-op, she wants her own domain. What are the steps she’ll need to take to register the domain, and ensure that the name maps to the co-op, and ultimately to her weblog?

1. Sally thinks of a couple of good domain names and the first thing she’ll have to do is check to make sure that no one has them. She’ll use what is known as the whois database to check to see if anyone has the domain.

The whois database is a database of all domain names managed by Network Solutions, in its role as business proxy for the InterNIC–the domain name central authority. You can query the whois database from Network Solutions, but you can also use whois from hundreds of other sites, just by looking up ‘whois’ from Google. As an example, this is the whois record for one of my domains, yasd.com.

2. Once Sally has found a domain that isn’t used, she has to register it. The registration process does change from country to country, but for the most part she’ll pick whichever registrar is recommended by another person or found by some other method of discovery, as long as it’s an accredited registrar. I myself use Dotster, though there are other very good registrars. Most will charge a fee, though there are some free DNS registrars out in the world.

3. When registering her domain, Sally will have to provide contacts to fill specific roles: Owner, Administrative contact, Billing contact, and Technical contact. If she’s registering with a hosting company, chances are the host will put themselves in as Technical contact and Sally as Owner, Billing, and Administrative Contact. This is something I do not agree with.

To maintain independence from a hosting company, to be able to move your domain easily and quickly, I believe it’s important for the person registering the domain to put themselves in as all four contacts.

Originally, the company that provided the name servers was the one you would put in as technical contact, so that if there was a problem with DNS, or the site, they could be contacted. This isn’t a bad idea — a secondary contact just as there is a secondary name server.

However, rarely is the secondary contact used. What happens is that you, the domain owner, are contacted for most activity. But what happens when you want to move your domain to a new hosting service? Your domain can’t move until the existing name server domain zone files are changed, or until you change the name servers that are authority for your domain. If you have a difficult hosting company, or an unresponsive one, they can literally control your name server entries because of that technical contact and the authority it gives them.

Redundant contacts might be nice, but not worth the hassle.

No, my recommendation is to pick a reputable registrar, register your domain with yourself or your organization as all four contacts, and then change the name server entries yourself as needed. It’s easy to do.

4. Sally registers her new domain, sallysnewdomain.com, and asks the Co-op admin — that’s me — for the two name servers to use. I provide her with:

ns1.burningbird.net
ns2.burningbird.net

She types these into the appropriate spot in the registration form.

(Sally could also use a commercial name server or some other set of name servers — name server entries don’t have to be maintained by the weblog or web site host. In this case, then, she’ll ask me what the IP address of her site is, and I’ll tell her. And I’ll have to also let her know if this changes.)

5. Once I, as the name server admin, am notified of the new domain, I’ll create a zone file for her domain that maps the domain name to the correct IP address. This will be a shared IP address, shared with all the other co-op members, as most hosting is managed. Settings in Apache and the other services are what allow many domains to run off the same IP, but a different physical address; that’s another topic for a future Internet for Poets essay.

6. If the domain is new, Sally’s own access of the domain triggers the process of propagating the address/name mapping. Once she goes live, other new readers take over this effort, each person triggering the events at their own ISP’s DNS to go out and get the new name/address mapping.

7. Once the domain name propagates, and Sally has set up her site, she’s in business. At that point, the only time the zone file should change is if an IP change occurs — something transparent to the weblog readers. Eventually if Sally wants to move on, then another name server manages her domain from that point on, and Sally updates her registrar record to point to these new servers.
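To give a feel for the zone file created in step 5, here is a Python sketch that writes out a minimal one for Sally’s domain. This is a guess at the shape of such a file, not the file the Co-op actually uses: the make_zone function and the serial number are hypothetical, the timer values are copied from the SOA example earlier in this essay, and the $TTL line and NS records are standard zone file entries that I’m assuming belong here.

```python
def make_zone(domain, ip, name_servers, contact, serial, ttl=38400):
    """Return minimal zone file text for a domain on a shared IP address."""
    origin = domain.rstrip(".") + "."
    # In the SOA record the contact's @ sign is written as a dot.
    user, _, mail_domain = contact.partition("@")
    lines = [
        f"$TTL {ttl}",
        f"{origin} IN SOA {name_servers[0]}. {user}.{mail_domain}. (",
        str(serial),   # changed whenever the file changes
        "10800",       # refresh
        "3600",        # retry
        "604800",      # expire
        f"{ttl} )",    # minimum time to live
    ]
    lines += [f"{origin} IN NS {ns}." for ns in name_servers]
    lines += [
        f"{origin} IN A {ip}",
        f"www.{origin} IN CNAME {origin}",
    ]
    return "\n".join(lines) + "\n"


zone_text = make_zone(
    "sallysnewdomain.com",
    "69.10.138.64",
    ["ns1.burningbird.net", "ns2.burningbird.net"],
    "shelleyp@burningbird.net",
    serial=2003062201,  # hypothetical serial number
)
print(zone_text)
```

If the Co-op’s IP address ever changes, the only line that has to change is the A record, plus a new serial number so the secondary name server notices.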

And you, the weblog reader: your role in all of this, should you choose to accept it, is to read Sally’s weblog. I know, it’s tough, but someone has to do it.