1. Introduction and History
The DNS was designed as a replacement for the older "host table"
system. Both were intended to provide names for network resources at
a more abstract level than network (IP) addresses (see, e.g.,
[RFC625], [RFC811], [RFC819], [RFC830], [RFC882]). In recent years,
the DNS has become a database of convenience for the Internet, with
many proposals to add new features. Only some of these proposals
have been successful. Often the main (or only) motivation for using
the DNS is because it exists and is widely deployed, not because its
existing structure, facilities, and content are appropriate for the
particular application of data involved. This document reviews the
history of the DNS, including examination of some of those newer
applications. It then argues that the overloading process is often
inappropriate. Instead, it suggests that the DNS should be
supplemented by systems better matched to the intended applications
and outlines a framework and rationale for one such system.
Several of the comments that follow are somewhat revisionist. Good
design and engineering often require a level of intuition by the
designers about things that will be necessary in the future; the
reasons for some of these design decisions are not made explicit at
the time because no one is able to articulate them. The discussion
below reconstructs some of the decisions about the Internet's primary
namespace (the "Class=IN" DNS) in the light of subsequent development
and experience. In addition, the historical reasons for particular
decisions about the Internet were often severely underdocumented
contemporaneously and, not surprisingly, different participants have
different recollections about what happened and what was considered
important. Consequently, the quasi-historical story below is just
one story. There may be (indeed, almost certainly are) other stories
about how the DNS evolved to its present state, but those variants do
not invalidate the inferences and conclusions.
This document presumes a general understanding of the terminology of
STD 13 [RFC1034] or of any good DNS tutorial (see, e.g., [Albitz]).
1.1. Context for DNS Development
During the entire post-startup-period life of the ARPANET and nearly
the first decade or so of operation of the Internet, the list of host
names and their mapping to and from addresses was maintained in a
frequently-updated "host table" [RFC625], [RFC811], [RFC952]. The
names themselves were restricted to a subset of ASCII [ASCII] chosen
to avoid ambiguities in printed form, to permit interoperation with
systems using other character codings (notably EBCDIC), and to avoid
the "national use" code positions of ISO 646 [IS646]. These
restrictions later became collectively known as the "LDH" rules for
"letter-digit-hyphen", the permitted characters. The table was just
a list with a common format that was eventually agreed upon; sites
were expected to frequently obtain copies of, and install, new
versions. The host tables themselves were introduced to:
o Eliminate the requirement for people to remember host numbers
(addresses). Despite apparent experience to the contrary in the
conventional telephone system, purely numeric identification systems,
including the numeric host number strategy, did not (and do not)
work well for more than a (large) handful of hosts.
o Provide stability when addresses changed. Since addresses -- to
some degree in the ARPANET and more importantly in the
contemporary Internet -- are a function of network topology and
routing, they often had to be changed when connectivity or
topology changed. The names could be kept stable even as
addresses changed.
o Provide the capability to have multiple addresses associated with
a given host to reflect different types of connectivity and
topology. Use of names, rather than explicit addresses, avoided
the requirement that would otherwise exist for users and other
hosts to track these multiple host numbers and addresses and the
topological considerations for selecting one over others.
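The three roles listed above can be illustrated with a minimal sketch.
It assumes a simple in-memory mapping; the names, the addresses (taken
from documentation ranges), and the functions "lookup" and "renumber"
are hypothetical and do not reproduce the actual host table format.

```python
from typing import Dict, List

# Hypothetical host table: each name maps to one or more addresses.
# Names and addresses are illustrative only, not real table entries.
host_table: Dict[str, List[str]] = {
    "host-a": ["192.0.2.10"],
    "gateway-b": ["192.0.2.1", "198.51.100.1"],  # multi-homed host
}


def lookup(name: str) -> List[str]:
    """Return every address recorded for a name (case is ignored)."""
    return list(host_table.get(name.lower(), []))


def renumber(name: str, new_addresses: List[str]) -> None:
    """Replace a host's addresses; the name users rely on is unchanged."""
    host_table[name.lower()] = list(new_addresses)
```

In this sketch a user only ever supplies "gateway-b"; which of its
addresses to use, and what those addresses currently are, stays behind
the name.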
After several years of using the host table approach, the community
concluded that the model did not scale adequately and that it would not
adequately support new service variations. A number of discussions
and meetings were held which drew several ideas and incomplete
proposals together. The DNS was the result of that effort. It
continued to evolve during the design and initial implementation
period, with a number of documents recording the changes (see
[RFC819], [RFC830], and [RFC1034]).
The goals for the DNS included:
o Preservation of the capabilities of the host table arrangements
(especially unique, unambiguous, host names),
o Provision for adding further services (e.g., the special
record types for electronic mail routing which quickly followed
introduction of the DNS), and
o Creation of a robust, hierarchical, distributed, name lookup
system to accomplish the other goals.
The DNS design also permitted distribution of name administration,
rather than requiring that each host be entered into a single,
central, table by a central administration.
1.2. Review of the DNS and Its Role as Designed
The DNS was designed to identify network resources. Although there
was speculation about including, e.g., personal names and email
addresses, it was not designed primarily to identify people, brands,
etc. At the same time, the system was designed with the flexibility
to accommodate new data types and structures, both through the
addition of new record types to the initial "INternet" class, and,
potentially, through the introduction of new classes. Since the
appropriate identifiers and content of those future extensions could
not be anticipated, the design provided that these fields could
contain any (binary) information, not just the restricted text forms
of the host table.
However, the DNS, as it is actually used, is intimately tied to the
applications and application protocols that utilize it, often at a
fairly low level.
In particular, despite the ability of the protocols and data
structures themselves to accommodate any binary representation, DNS
names as used were historically not even unrestricted ASCII, but a
very restricted subset of it, a subset that derives from the original
host table naming rules. Selection of that subset was driven in part
by human factors considerations, including a desire to eliminate
possible ambiguities in an international context. Hence character
codes that had international variations in interpretation were
excluded, the underscore character and case distinctions were
eliminated as being confusing (in the underscore's case, with the
hyphen character) when written or read by people, and so on. These
considerations appear to be very similar to those that resulted in
similarly restricted character sets being used as protocol elements
in many ITU and ISO protocols (cf. [X29]).
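As a rough illustration of the restricted subset described above, the
following sketch checks a name against the letter-digit-hyphen rule and
compares names without regard to case. The function names are
hypothetical, and the length limit and the ban on leading or trailing
hyphens are assumptions drawn from common host name practice rather
than from the discussion here.

```python
import string

# Characters permitted by the "LDH" (letter-digit-hyphen) rule.
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """Return True if one label contains only letters, digits, hyphens.

    Assumption: labels are 1-63 characters long and neither begin nor
    end with a hyphen.
    """
    if not 1 <= len(label) <= 63:
        return False
    if label[0] == "-" or label[-1] == "-":
        return False
    return all(ch in LDH_CHARS for ch in label)


def is_ldh_name(name: str) -> bool:
    """Check each dot-separated label; one trailing dot is tolerated."""
    if name.endswith("."):
        name = name[:-1]
    return bool(name) and all(is_ldh_label(p) for p in name.split("."))


def same_name(a: str, b: str) -> bool:
    """Compare two names ignoring case and a single trailing dot."""
    def norm(n: str) -> str:
        return (n[:-1] if n.endswith(".") else n).lower()
    return norm(a) == norm(b)
```

Under these assumptions, a name containing an underscore is rejected,
while names that differ only in case are treated as the same name.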
Another assumption was that there would be a high ratio of physical
hosts to second-level domains and, more generally, that the system
would be deeply hierarchical, with most systems (and names) at the
third level or below and a very large percentage of the total names
representing physical hosts. There are domains that follow this
model: many university and corporate domains use fairly deep
hierarchies, as do a few country-oriented top-level domains
("ccTLDs"). Historically, the "US." domain has been an excellent
example of the deeply hierarchical approach. However, by 1998,
comparison of several efforts to survey the DNS showed a count of SOA
records that approached (and may have passed) the number of distinct
hosts. Looked at differently, we appear to be moving toward a
situation in which the number of delegated domains on the Internet is
approaching or exceeding the number of hosts, or at least the number
of hosts able to provide services to others on the network. This
presumably results from synonyms or aliases that map a great many
names onto a smaller number of hosts. While experience up to this
time has shown that the DNS is robust enough -- given contemporary
machines as servers and current bandwidth norms -- to be able to
continue to operate reasonably well when those historical assumptions
are not met (e.g., with a flat structure under ".COM" containing
well over ten million delegated subdomains [COMSIZE]), it is still
useful to remember that the system could have been designed to work
optimally with a flat structure (and very large zones) rather than a
deeply hierarchical one, and was not.
Similarly, despite some early speculation about entering people's
names and email addresses into the DNS directly (e.g., see
[RFC1034]), electronic mail addresses in the Internet have preserved
the original, pre-DNS, "user (or mailbox) at location" conceptual
format rather than a flatter or strictly dot-separated one.
Location, in that instance, is a reference to a host. The sole
exception, at least in the "IN" class, has been one field of the SOA
record.
Both the DNS architecture itself and the two-level (host name and
mailbox name) provisions for email and similar functions (e.g., see
the finger protocol [FINGER]), also anticipated a relatively high
ratio of users to actual hosts. Despite the observation in STD 13
[RFC1034] that the DNS was expected to grow to be proportional to the
number of users (section 2.3), it has never been clear that the DNS
was seriously designed for, or could scale to, the order of magnitude
of the number of users (or, more recently, products or document
objects), rather than that of physical hosts.
Just as was the case for the host table before it, the DNS provided
critical uniqueness for names, and universal accessibility to them,
as part of overall "single internet" and "end to end" models (cf.
[RFC2826]). However, there are many signs that, as new uses evolved
and original assumptions were abused (if not violated outright), the
system was being stretched to, or beyond, its practical limits.
The original design effort that led to the DNS included examination
of the directory technologies available at the time. The design
group concluded that the DNS design, with its simplifying assumptions
and restricted capabilities, would be feasible to deploy and make
adequately robust, which the more comprehensive directory approaches
were not. At the same time, some of the participants feared that the
limitations might cause future problems; this document essentially
takes the position that they were probably correct. On the other
hand, directory technology and implementations have evolved
significantly in the ensuing years: it may be time to revisit the
assumptions, either in the context of the two- (or more) level
mechanism contemplated by the rest of this document or, even more
radically, as a path toward a DNS replacement.
1.3. The Web and User-visible Domain Names
From the standpoint of the integrity of the domain name system -- and
scaling of the Internet, including optimal accessibility to content
-- the web design decision to use "A record" domain names directly in
URLs, rather than some system of indirection, has proven to be a
serious mistake in several respects. Convenience of typing, and the
desire to make domain names out of easily-remembered product names,
has led to a flattening of the DNS, with many people now perceiving
that second-level names under COM (or in some countries, second- or
third-level names under the relevant ccTLD) are all that is
meaningful. This perception has been reinforced by some domain name
registrars [REGISTRAR] who have been anxious to "sell" additional
names. And, of course, the perception that one needed a second-level
(or even top-level) domain per product, rather than having names
associated with a (usually organizational) collection of network
resources, has led to a rapid acceleration in the number of names
being registered. That acceleration has, in turn, clearly benefited
registrars charging on a per-name basis, "cybersquatters", and others
in the business of "selling" names, but it has not obviously
benefited the Internet as a whole.
This emphasis on second-level domain names has also created a problem
for the trademark community. Since the Internet is international,
and names are being populated in a flat and unqualified space,
similarly-named entities are in conflict even if there would
ordinarily be no chance of confusing them in the marketplace. The
problem appears to be unsolvable except by a choice between draconian
measures. These might include significant changes to the legislation
and conventions that govern disputes over "names" and "marks". Or
they might result in a situation in which the "rights" to a name are
typically not settled using the subtle and traditional product (or
industry) type and geopolitical scope rules of the trademark system.
Instead, they would depend largely on political or economic power,
e.g., the organization with the greatest resources to invest in
defending (or attacking) names will ultimately win out. The latter
raises not only important issues of equity, but also the risk of
backlash as the numerous small players are forced to relinquish names
they find attractive and to adopt less-desirable naming conventions.
Independent of these sociopolitical problems, content distribution
issues have made it clear that it should be possible for an
organization to have copies of data it wishes to make available
distributed around the network, with a user who asks for the
information by name getting the topologically-closest copy. This is
not possible with simple, as-designed, use of the DNS: DNS names
identify target resources or, in the case of email "MX" records, a
preferentially-ordered list of resources "closest" to a target (not
to the source/user). Several technologies (and, in some cases,
corresponding business models) have arisen to work around these
problems, including intercepting and altering DNS requests so as to
point to other locations.
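The point about "MX" records can be made concrete with a small sketch.
It assumes records are given as (preference, host) pairs in which a
lower preference value is tried first; the function name "order_mx"
and the host names are hypothetical.

```python
from typing import List, Tuple


def order_mx(records: List[Tuple[int, str]]) -> List[str]:
    """Return mail hosts in the order a sender would try them.

    The ordering depends only on the preference values published for
    the target domain. Nothing about the sender's location is an input,
    so every sender obtains the same list.
    """
    return [host for _pref, host in sorted(records, key=lambda r: r[0])]
```

For example, order_mx([(20, "backup.example.net"), (10,
"mail.example.net")]) places "mail.example.net" first for every
querier, which is why this mechanism cannot by itself select the copy
topologically closest to a particular user.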
Additional implications are still being discovered and evaluated.
Approaches that involve interception of DNS queries and rewriting of
DNS names (or otherwise altering the resolution process based on the
topological location of the user) seem, however, to risk disrupting
end-to-end applications in the general case and raise many of the
issues discussed by the IAB in [IAB-OPES]. These problems occur even
if the rewriting machinery is accompanied by additional workarounds
for particular applications. For example, security associations and
applications that need to identify "the same host" often run into
problems if DNS names or other references are changed in the network
without participation of the applications that are trying to invoke
the associated services.
At the applications level, few of the protocols in active,
widespread, use on the Internet reflect either contemporary knowledge
in computer science or human factors or experience accumulated
through deployment and use. Instead, protocols tend to be deployed
at a just-past-prototype level, usually including the kinds of
expedient compromises typical of prototypes. If they prove useful,
the nature of the network permits very rapid dissemination (i.e.,
they fill a vacuum, even if a vacuum that no one previously knew
existed). But, once the vacuum is filled, the installed base
provides its own inertia: unless the design is so seriously faulty as
to prevent effective use (or there is a widely-perceived sense of
impending disaster unless the protocol is replaced), future
developments must maintain backward compatibility and workarounds for
problematic characteristics rather than benefiting from redesign in
the light of experience. Applications that are "almost good enough"
prevent development and deployment of high-quality replacements.
The DNS is both an illustration of, and an exception to, parts of
this pessimistic interpretation. It was a second-generation
development, with the host table system being seen as at the end of
its useful life. There was a serious attempt made to reflect the
computing state of the art at the time. However, deployment was much
slower than expected (and very painful for many sites) and some fixed
(although relaxed several times) deadlines from a central network
administration were necessary for deployment to occur at all.
Replacing it now, in order to add functionality, while it continues
to perform its core functions at least reasonably well, would
presumably be extremely difficult.
There are many, perhaps obvious, examples of this. Despite many
known deficiencies and weaknesses of definition, the "finger" and
"whois" [WHOIS] protocols have not been replaced (despite many
efforts to update or replace the latter [WHOIS-UPDATE]). The Telnet
protocol and its many options drove out the SUPDUP [RFC734] one,
which was arguably much better designed for a diverse collection of
network hosts. A number of efforts to replace the email or file
transfer protocols with models which their advocates considered much
better have failed. And, more recently and below the applications
level, there is some reason to believe that this resistance to change
has been one of the factors impeding IPv6 deployment.