Dynamic Rerouting

Historically, the basic Internet technology was developed for military applications, to ensure survivability of a communication network in case of partial destruction of its lines and switching centers. Perhaps this sounds ironic given the present routing stability problems. However, as we discussed before, routing flaps are inherent in any data communication network that supports true dynamic adaptive routing.

The principal difference between fault recovery in connectionless and connection-oriented networks is that in connection-oriented networks the gateways must not only learn the new topology of the network and calculate the preferred paths, but also reroute every individual virtual circuit passing through the failed switch (and sometimes even circuits that are not directly affected by the failure!). This process is very computationally intensive and cannot be performed on-line in real-life networks.
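
The per-circuit cost described above can be illustrated with a small sketch. It is not taken from any real switch software: the four-node topology, the circuit names, and the functions shortest_path and reroute_circuits are all hypothetical, and hop-count breadth-first search stands in for whatever path computation a real network would use. The point is only that every circuit crossing the failed switch needs its own path computation.

```python
from collections import deque

def shortest_path(graph, src, dst, failed=frozenset()):
    """Hop-count shortest path by breadth-first search, avoiding failed nodes."""
    if src in failed or dst in failed:
        return None
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in failed and neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # destination unreachable

def reroute_circuits(graph, circuits, failed_switch):
    """Recompute a path for every circuit that passes through the failed switch."""
    failed = frozenset([failed_switch])
    rerouted = {}
    for circuit_id, path in circuits.items():
        if failed_switch in path:
            # One path computation per affected circuit.
            rerouted[circuit_id] = shortest_path(graph, path[0], path[-1], failed)
    return rerouted

# Hypothetical topology: two disjoint routes between A and D.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
# Hypothetical virtual circuits, each stored as its current path.
circuits = {
    "vc1": ["A", "B", "D"],
    "vc2": ["A", "C", "D"],
    "vc3": ["C", "D"],
}
rerouted = reroute_circuits(graph, circuits, failed_switch="B")
```

In this toy case only vc1 has to move. In a backbone carrying many thousands of circuits the loop body runs once per affected circuit, whereas a connectionless router recomputes its forwarding table once, regardless of how many flows cross it.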

The method currently employed by telephone companies is to have sufficient "technical" spare capacity to allow rerouting of permanent virtual circuits (usually DS-3s) in case of some predicted failures. Obviously, this method does not work for more complicated failures, and since computing the possible fault scenarios can take days, it can seriously delay modification of the infrastructure. (The good thing is that it gives engineers solid information about the behavior of a network under stress.) Pre-computation is not useful for switched virtual circuits.
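
A rough count shows why pre-computation stops at "some predicted failures". The sketch below simply counts how many distinct fault scenarios exist when up to a given number of network elements fail at once. The function name failure_scenarios and the figure of 100 elements are assumptions made for illustration, not data from any carrier.

```python
from math import comb

def failure_scenarios(elements, max_simultaneous):
    """Number of distinct sets of 1..max_simultaneous failed elements."""
    return sum(comb(elements, k) for k in range(1, max_simultaneous + 1))

# Hypothetical network with 100 links and switches.
single = failure_scenarios(100, 1)
double = failure_scenarios(100, 2)
triple = failure_scenarios(100, 3)
```

Each scenario needs its own rerouting plan, so the work grows combinatorially with the number of simultaneous failures one tries to cover.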

In the global data network, no carrier can base routing decisions within its backbone solely on the local situation. Fortunately for the Internet, there's a way to "decouple" local routing decisions from global routing decisions by employing two routing protocols at once, so changes in the interior routing do not affect computations of exterior routes as long as the backbone is not fractured. This approach is also known as an "iBGP hack", and relies on the fact that it is possible to reduce or increase the effective capacity of the communication network without causing more than transient congestion, because of the adaptive congestion control built into TCP.
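
The decoupling can be pictured as a two-step lookup, sketched below under simplifying assumptions. The exterior table maps a destination prefix to an egress router, and the interior table maps an egress router to an outgoing link. The table contents, router and link names, and the function resolve are hypothetical, and real routers match on longest prefix rather than on exact keys.

```python
def resolve(prefix, exterior_routes, interior_routes):
    """Two-step lookup: prefix -> egress router -> outgoing link."""
    egress_router = exterior_routes[prefix]
    return interior_routes[egress_router]

# Exterior routes (hypothetical): which egress router leads to each prefix.
exterior_routes = {
    "192.0.2.0/24": "egress-east",
    "198.51.100.0/24": "egress-west",
}
# Interior routes (hypothetical): which local link leads to each egress router.
interior_routes = {
    "egress-east": "link-1",
    "egress-west": "link-2",
}

exterior_before = dict(exterior_routes)
link_before = resolve("192.0.2.0/24", exterior_routes, interior_routes)

# link-1 fails; only the interior protocol reacts and picks another link.
interior_routes["egress-east"] = "link-3"
link_after = resolve("192.0.2.0/24", exterior_routes, interior_routes)
```

The exterior table is untouched by the interior change, which is the sense in which the two levels are decoupled as long as the egress router remains reachable.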

For connection-oriented networks such as ATM, there is no corresponding way of decoupling exterior and interior routing. This means that dynamic rerouting within backbones can be required even if the failure occurred in some other provider's backbone. An alternative to rerouting is to simply tear down connections and let users reestablish them (this is often what telephone companies do, although they don't like to admit it). In the data networking world this solution is unacceptable, particularly because machines (unlike humans) would not introduce delays before reestablishing the connections, thus creating a rush of new connections that overloads the control units of switches.
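
The difference between machines and humans here can be illustrated with a toy model. The sketch compares the peak number of connection setups per second when every torn-down connection is reestablished at once with the peak when the attempts are spread at random over a window. The client count, the 30-second window, and the function name are assumptions chosen for illustration only.

```python
import random
from collections import Counter

def peak_setups_per_second(reconnect_times):
    """Largest number of connection setups falling into any one-second bucket."""
    buckets = Counter(int(t) for t in reconnect_times)
    return max(buckets.values())

CLIENTS = 1000          # hypothetical number of torn-down connections
SPREAD_SECONDS = 30.0   # hypothetical window over which humans would redial

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Machines: every endpoint retries the moment the connection drops.
immediate = [0.0] * CLIENTS
# Humans: each one retries after some unpredictable delay.
delayed = [rng.uniform(0.0, SPREAD_SECONDS) for _ in range(CLIENTS)]

peak_immediate = peak_setups_per_second(immediate)
peak_delayed = peak_setups_per_second(delayed)
```

With no delays the switch control unit sees all the setup requests in the same second, while the spread-out case presents a small fraction of that load at any moment.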

The complexity of the Internet and the fact that no network operator has complete topology information make pre-computation of alternative routes quite impractical, so the backbones would have to be "decoupled" with IP routers at exchange points. Alternatively, link-level redundancy (such as dual-ring SONET) has to be used, which currently seems to be the only working solution.

Link-level redundancy is very expensive, as it usually wastes half of the actual medium capacity, and in most cases it does not handle multiple failures. This means that a native IP network relying on IP-level dynamic rerouting and working over fibers without the benefit of SONET-level link restoration would be able to deliver, at the same cost, twice as much user data as a flattened network on top of a redundant SONET facility. Modern IP backbones with link-state interior gateway protocols have been shown to have convergence times in the 200-300 ms range (in case of link failures; gateway failures usually take longer to detect because of the peculiarity of implementation of link-level keepalives by a vendor of popular routers). This is only marginally worse than the 140-160 ms restoration time of dual SONET rings and certainly can be improved.
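
The factor of two follows directly from the capacity reserved for protection, as the short calculation below shows. It assumes, as the text does, that protection reserves exactly half of the medium. The OC-3 line rate is used only as an example figure, and the function name user_capacity is hypothetical.

```python
def user_capacity(raw_mbps, protected):
    """Capacity left for user data; protection is assumed to reserve exactly half."""
    return raw_mbps / 2 if protected else raw_mbps

RAW_MBPS = 155.52  # OC-3 line rate, used here only as an example

sonet_protected = user_capacity(RAW_MBPS, protected=True)
native_ip = user_capacity(RAW_MBPS, protected=False)
ratio = native_ip / sonet_protected
```

The ratio does not depend on the line rate chosen, only on the fraction of capacity set aside for restoration.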

SONET Backbones

Current backbone Internet service providers are forced to share redundant SONET transmission facilities with telephony and connection-oriented networks because the demand does not justify allocation of separate unprotected fiber strands to the Internet backbones. This, however, will change, and we expect ISPs with native IP backbones to start removing excessive levels of redundancy in order to reduce transmission costs.