Making the Internet Scale Through NAT

(just want to dump this here for future reference)

It’s widely acknowledged that the internet has a scaling problem ahead of it, and an even more imminent addressing problem. The core of internet routing is completely exposed to peripheral growth in mobile/multihomed leaf networks. Various groups are looking at solutions. The IRTF’s RRG has been discussing various solutions. The IETF have a LISP WG developing one particular solution, which is quite advanced.

The essential, common idea is to split an internet address into 2 distinct part, the “locator” and the “ID” of the actual host. The core routing fabric of the internet then needs only to be concerned with routing to “locator” addresses. Figuring out how to map onward to the (presumably far more numerous and/or less aggregatable) “ID” of the specific end-host to deliver to is then a question that need only concern a boundary layer between the core of the internet and the end-hosts. There are a whole bunch of details here (including the thorny question of what exactly the “ID” is in terms of scope and semantics) which we’ll try skip over as much as possible for fear of getting bogged down in them. We will note that some proposals, in order to be as transparent and invisible to end-host transport protocols as possible, use a “map and encap” approach – effectively tunneling packets over the core of the internet. Some other proposals use a NAT-like approach, like Six-One or GSE, with routers translating from inside to outside. Most proposals come with reasonable amounts of state, some proposals appear to have quite complex architectures and/or control planes. E.g. LISP in particular seems quite complex, relative to the “dumb” internet architecture we’re used to, as seems to try to solve every possible IP multi-homing and mobile-IP problem known to man. Some proposals also rely on IPv6 deployment.

Somewhat related proposals are shim6, which adds multi-homing capable “shim” layer in between IP and transport protocols like TCP (or “Upper Layer Protocols” / ULPs), and MultiPath-TCP (MPTCP), which aims to add multi-pathing extensions to TCP. Shim6 adds a small, additional state machine to whatever state machines the ULPs require, and is not backwards compatible with non-shimmed hosts (which isn’t a great problem of itself). Shim6 requires IPv6. MPTCP is still in a very early stage, and does not appear to have described any possible proposals yet.

Not too infrequently well-engineered solutions to some problem may not quite have a high enough unilateral benefit/cost ratio  for that solution to enjoy widespread adoption. Then cheap, quick hacks that address the immediate problem without causing too many problem can win out. The clear example in the IP world being NAT. It is therefore interesting to consider what will happen if none of the solutions currently being engineer, mentioned above, gain traction.

E.g. there are quite reasonable solutions which are IPv6 specific, and IPv6 is not exactly setting the world alight. There are other proposals which are v4/v6 agnostic, but require a significant amount of bilateral deployment to be useful, e.g. between ISP and customers, and do not have any obvious immediate advantages to most parties. Etc. So if, for whatever reasons, cost/benefit ratio means none of these solutions are rolled out, and the internet stays effectively v4-only for long enough, then the following will happen:

  1. Use of NAT will become ever more common, as pressure on IPv4 addresses increases
  2. As the use of NAT increases, the pressure on the transport layer port space (now press-ganged into service as an additional cross-host flow identifier) will increase, causing noticeable problems to end-user applications.
  3. As NAT flow-ID resources become ever more precious, so applications will become more careful to multiplex application protocols over available connections.
  4. However, with increased internet growth, even NAT will become more and more a luxury, and so ever more applications will be forced to rely on application layer proxies, to allow greater concentration of applications to public IP connections – at which stage you no longer have an internet.

The primary problems in this scenario are:

  • The transport protocol’s port number space has been repurposed as a cross-host flow-ID
  • That port number space is far too constraining for this purpose, as it’s just 2 bytes

The quickest hack fix is to extend the transport IDs. With TCP this is relatively easy to do, by defining a new TCP option. E.g. say we add a 2 * 4-byte host ID option, one ID for src, one for the dst. When replying to a packet that carried the option, you would set the dst to the received src, obviously. This would give NAT concentrators an additional number space to help associate packets with flows and the right NAT mappings.

This only fixes things for TCP, plus the space in the TCP header for options is becoming quite crowded and it’s not always possible to fit another 2*4+2 bytes in (properly aligned). IP also has an option mechanism, so we could define the option for IP and it would work for all IP protocols. However, low level router designers yelp in disgust at such suggestions, as they dislike having to parse out variably placed fields in HDL; indeed it is common for high-speed hardware routers to punt packets with IP options out to a slow, software data-path. The remaining possibility is a Shim style layer, i.e. in between IP and the ULPs, but that has 0 chance of graceful fallback compatibility with existing transports.

Let’s go with the IP option header as the least worst choice. It might lead to slower connectivity, but then again if your choice is between “no connectivity, because a NAT box in the middle is maxed out” and “slow connectivity, but connectivity still” then its better than nothing. Further, if such use of an IP option became wide-spread you can bet the hardware designers would eventually hold their noses and figure out a way to make common case forwarding fast even in its presence – only NAT concentrators would need to care about it.

Ok, so where are we now? We’ve got an IP option that adds 2 secondary host IDs to the IP header, thus allowing NAT concentrators to work better. Further, NAT concentrators could now even be stateless when it comes to processing packets that have this option. All it has to do is copy the dst address from the option into the IP header dst. This would allow the NATed hosts to be reachable from the outside world! The IDs don’t even have to be 4 byte each, per se.

Essentially you now have split addressing into 2. You could think of it as a split in terms of “internet” or “global network ID” and “end-host” ID, a bit like 6-to-1 or other older proposals I can’t remember the name of now, however it’s better to think of it as being 2 different addressing “scopes”:

  • The current forwarding scope, i.e. the IP header src/dst
  • The next forwarding scope, i.e. the option src/dst, when the the dst differs from the current scope dst

NAT boxes now become forwarding-scope context change points. End-host addressing in a sense can be thought of as end-host@concentrator. Note that the common, core scope of the internet can be blissfully unbothered by any details of the end-host number space. Indeed, if there’s no need for cross-scope communication then they can even use different addressing technologies, e.g. one IPv4 the other IPv6 (with some modification to the “copy address” scheme above).

Note that the end-host IDs no longer need be unique, e.g. 10.0.0.1@concentrator1 can be a different host from 10.0.0.1@concentrator2. In an IPv4 world, obviously the end-host IDs (or outside/non-core scope IDs) would not be globally unique. However, this scheme has benefits even with globally unique addresses, e.g. public-IP@concentrator still is beneficial because it would allow the core internet to not have to carry the prefixes for stub/leaf public IP prefixes.

You could take it a bit further and allow for prepending of scopes in this option, so you have a stack of scopes – like the label stack in MPLS – if desired. This would allow a multi-layered onion of forwarding: end-host@inner@outer. Probably not needed though.

What do we have now:

  • An IP option that adds a secondary addressing scope to packet
  • A 2-layer system of forwarding, extendible to n-layer
  • Decoupling of internet ID space from end-host ID space
  • Compatible with existing ULPs
  • Backwards compatible with IPv4 for at least some use cases
  • Allows NAT to scale
  • Relatively minor extension to existing, widely accepted and deployed NAT architecture
  • Allows scalable use of PI addressing
  • No per-connection signalling required
  • Minimal state

On the last point, the system clearly is not dynamic, as Shim6 tries to be. However, if the problem being solved is provider independence in the sense of being able to change providers without having to renumber internally, then this hack is adequate. Further, even if the reachability of network@concentrator is not advertised to the internet, the reachability of concentrator must be advertised at least. So there is some potential for further layering of reachability mechanisms onto this scheme.

No doubt most readers are thinking “You’re clearly on crack“. Which is what I’d have thought if I’d read the above a year or 5 ago. However, there seems to be a high-level of inertia on today’s internet for solving the addressing crunch through NAT. There seems to be little incentive to deploy IPv6. IPv6 also doesn’t solve multi-homing or routing scalability, the solutions for which all have complexity and/or compatibility issues to varying degrees. Therefore I think it is at least worth considering what happens if the internet sticks with IPv4 and NAT even as the IPv4 addressing crunch bites.

As I show above it is at least plausible to think there may be schemes that are compatible with IPv4 and allow the internet to scale up, which can likely be extended later to allow general connectivity across the NAT/scope boundaries, still with a level of backwards compatibility. Further, this scheme does not require any dynamic state or signalling protocols, beyond initial administrative setup. However, the scheme does allow for future augmentation to add dynamic behaviours.

In short, schemes can be devised which, while not solving every problem immediately, can deliver incremental benefits, by solving just a few pressing problems first and being extendible to the remaining problems over time. It’s at least worth thinking about them.

[Corrections for typos, nits and outright crack-headedness would be appreciated, as well as any comments]

4 Comments »

  1. Please see my response to your article in the RRG mailing list:
    http://www.ietf.org/mail-archive/web/rrg/current/msg06022.html

    The IRTF Routing Research Group scalable routing proposals, with critiques, are listed in the latest version of this draft, which will eventually become an RFC:
    http://tools.ietf.org/html/draft-irtf-rrg-recommendation

    – Robin

  2. pjakma said

    Thinking about the hardware processing side: I can’t see any reason why common case of IP header followed immediately by this “next-address scope” option couldn’t be processed directly in hardware (i.e. “ignore it” for the vast majority of routers). I.e. the common case would have the option in a fixed position. If it turns out there’s another option there, then the packet could be punted to software. So if the common case had this option first, there’s no reason it couldn’t be fast.

    Also, as is obvious to most readers, for the later parts of the suggestion above to work (i.e. initiating connections to hosts@concentrator addresses) it would require some kind of new address family, and application upgrades. This is the most obviously crack-full part of it all. On the application side, it’s effectively the same work as IPv6, to be done all over again (though, apps that support IPv6 APIs /well/ may potentially need no further upgrades). However, unlike IPv6, the suggestion above can deliver benefits even without applications being upgraded; it can deliver benefits even with just scattered deployment; most importantly of all, it’s compatible with IPv4.

    It’s worth thinking about it, is all I’m saying. Perhaps it’s useful as a “plan B”…

  3. […] for NAT that enables DPI, ALG, Firewall and IPS testing to happen with NAT turned on. NAT is an integral part of the network today and it has this nasty habit of rewriting IP addresses and port numbers and […]

  4. Paul Jakma said

    Kind of relevant: https://tools.ietf.org/html/draft-williams-exp-tcp-host-id-opt

RSS feed for comments on this post · TrackBack URI

Leave a comment