Route Flap Explained

One of the main culprits of today's poor Internet service is the so-called route flap. Route flap is the stream of routing updates that every backbone router receives, collected from the entire global Internet. In effect, every transition of a network device between operational and non-operational states that affects the connectivity of some globally visible network generates a wave of update messages rolling over all backbone networks:

[Figure: Propagation of route flap]
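
The "wave" can be illustrated with a toy flooding model. This is a sketch under a simplifying assumption, not BGP's actual best-path logic: a router that hears of a change forwards it once to every neighbour except the one it heard it from, and copies that reach routers which already know are still counted. The function name `flood_update` and the five-router topology are hypothetical.

```python
from collections import deque


def flood_update(adjacency: dict[str, list[str]], origin: str) -> tuple[int, dict[str, int]]:
    """Spread one route change from `origin`; return (messages sent, hop at which each router learned it).

    Simplified model (an assumption): each router forwards the change once, on first
    hearing it, to all neighbours except the sender. Duplicate copies are still counted.
    """
    learned_at = {origin: 0}
    queue = deque([(origin, None)])
    messages = 0
    while queue:
        router, sender = queue.popleft()
        for neighbour in adjacency[router]:
            if neighbour == sender:
                continue
            messages += 1  # one update message crosses this link
            if neighbour not in learned_at:
                learned_at[neighbour] = learned_at[router] + 1
                queue.append((neighbour, router))
    return messages, learned_at


backbone = {  # hypothetical: one edge router attached to a four-router core
    "edge": ["A"],
    "A": ["edge", "B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
messages, hops = flood_update(backbone, "edge")
print(messages, hops)  # 8 {'edge': 0, 'A': 1, 'B': 2, 'C': 2, 'D': 3}
```

A single flap is two transitions (down, then up), so even this tiny network carries 16 messages per flap, and every router in it has to process the change.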

Because the Internet is so large, the average rate of updates is relatively high (100-200 updates a second). Although this does not sound like much, processing routing updates is computationally intensive, because every update requires matching the changed routes against routing policies. Routing policies are sets of rules defining how service providers exchange routing information and which paths are valid or preferred. A typical routing policy at an edge router of a large ISP comprises several thousand rules. Without routing policies, the modern Internet would be unmanageable. Routing policies also reflect contractual relationships between service providers and are a major tool for traffic engineering (i.e., directing flows of data along paths selected by network administrators).
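
To see why each update is expensive, here is a minimal sketch of first-match policy evaluation followed by the worst-case arithmetic. The rule format (`PolicyRule` with a prefix, a maximum prefix length and an action) is hypothetical, and 3000 rules is an assumed stand-in for "several thousand". Production routers may well index or compile their policies; the linear scan simply makes the per-update cost visible.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical rule: routes inside `prefix` no longer than `max_len` get `action`."""
    prefix: ipaddress.IPv4Network
    max_len: int
    action: str  # "accept" or "reject"


def evaluate(route: ipaddress.IPv4Network, rules: list[PolicyRule]) -> tuple[str, int]:
    """First-match linear evaluation; returns (action, number of rules examined)."""
    for examined, rule in enumerate(rules, start=1):
        if route.subnet_of(rule.prefix) and route.prefixlen <= rule.max_len:
            return rule.action, examined
    return "reject", len(rules)  # implicit deny when no rule matches


rules = [
    PolicyRule(ipaddress.ip_network("10.0.0.0/8"), 32, "reject"),   # private address space
    PolicyRule(ipaddress.ip_network("192.0.2.0/24"), 24, "accept"),
    PolicyRule(ipaddress.ip_network("0.0.0.0/0"), 24, "accept"),    # anything up to /24
]
print(evaluate(ipaddress.ip_network("198.51.100.0/24"), rules))   # ('accept', 3)
print(evaluate(ipaddress.ip_network("203.0.113.128/25"), rules))  # ('reject', 3)

# Worst-case load if every update scans the whole policy:
UPDATES_PER_SECOND = 150  # midpoint of the 100-200 updates/s quoted above
POLICY_RULES = 3000       # assumed value for "several thousand rules"
print(UPDATES_PER_SECOND * POLICY_RULES)  # 450000 rule checks per second
```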

Some level of route flap is unavoidable, since routing updates carry vital information about changes in network topology. However, excessive route flap is extremely dangerous: when flap is intense enough to saturate the control processors of routers or switches, routing never gets a chance to converge. The resulting long-lived transient states create routing loops, which make the situation even worse.
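
Why saturation prevents convergence can be sketched with a simple fluid queue model. The rates below are hypothetical; the model assumes updates arrive at a steady rate and the control processor finishes at most a fixed number per second. Once arrivals exceed capacity, the backlog, and therefore the age of the information a router acts on, grows without bound.

```python
def backlog_over_time(arrival_rate: float, service_rate: float, seconds: int) -> list[float]:
    """Queued (unprocessed) updates at the end of each second in a fluid model."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history


def staleness(backlog: float, service_rate: float) -> float:
    """Seconds a newly queued update waits before the processor reaches it."""
    return backlog / service_rate


overloaded = backlog_over_time(arrival_rate=300, service_rate=250, seconds=10)
print(overloaded[-1], staleness(overloaded[-1], 250))  # 500.0 2.0
print(backlog_over_time(arrival_rate=150, service_rate=250, seconds=3))  # [0.0, 0.0, 0.0]
```

The model ignores updates that supersede one another in the queue; its only point is that under overload the delay keeps growing, so routers keep acting on a picture of the topology that is already out of date.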

Excessive route flap is often caused by misconfigured Channel Service Units (CSUs, the modem-like units that terminate digital leased lines) whose two ends derive their clocks from independent sources; in a proper configuration, one CSU recovers its clock from the signal generated by the other. The drift between the two clock sources causes the line to "bounce" between periods of normal service and periods of corrupted data, and the resulting repetitive route announcements and withdrawals are particularly destructive.
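
The drift arithmetic shows why the errors recur periodically instead of happening once. This sketch assumes a T1 line (1.544 Mbit/s, 193-bit frames) and an assumed combined offset of 32 ppm between the two free-running clocks; exactly how accumulated slip turns into visible outages depends on the CSUs' buffering, which is not modelled here.

```python
def seconds_per_slip(line_rate_bps: float, offset_ppm: float, slip_bits: int = 1) -> float:
    """Time for two free-running clocks to drift apart by `slip_bits` bit periods."""
    drift_bits_per_second = line_rate_bps * offset_ppm * 1e-6  # bits the faster clock gains per second
    return slip_bits / drift_bits_per_second


T1_RATE = 1_544_000  # bit/s
T1_FRAME_BITS = 193  # 192 payload bits + 1 framing bit
OFFSET_PPM = 32      # assumed combined frequency offset between the two CSUs

print(seconds_per_slip(T1_RATE, OFFSET_PPM))                 # about 0.02 s per bit
print(seconds_per_slip(T1_RATE, OFFSET_PPM, T1_FRAME_BITS))  # about 3.9 s per frame
```

Because the offset is constant, the slips repeat at a steady rhythm for as long as the misconfiguration lasts, which matches the repetitive announce/withdraw pattern described above.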

There are two methods of reducing damaging route flap: flap dampening, which holds off the introduction of routing updates for unstable routes, and route aggregation. Dampening allows the rapid suppression of flapping routes; however, it does little to reduce the normal route "churn" generated by "real" topology changes all around the network. Route aggregation is a more powerful tool, as it limits the visibility of topology details, so that routing updates generated by topological changes of local significance do not reach backbone networks.
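
Both techniques can be sketched in a few lines. The damper below follows the idea of RFC 2439 BGP route flap damping in simplified form: each flap adds a penalty to a per-route figure of merit, the figure decays exponentially, and the route is suppressed above one threshold and reused below another. The parameter values are assumptions (commonly quoted vendor defaults), and the maximum-suppress ceiling is omitted. The aggregation half uses Python's standard `ipaddress` module with a hypothetical customer block.

```python
import ipaddress
import math


class FlapDamper:
    """Simplified per-route flap damping in the spirit of RFC 2439 (assumed default parameters)."""

    def __init__(self, penalty=1000.0, suppress=2000.0, reuse=750.0, half_life_s=900.0):
        self.penalty = penalty
        self.suppress_limit = suppress
        self.reuse_limit = reuse
        self.decay_rate = math.log(2) / half_life_s  # figure of merit halves every half-life
        self.figure = 0.0
        self.last_time = 0.0
        self.suppressed = False

    def _decay_to(self, now: float) -> None:
        self.figure *= math.exp(-self.decay_rate * (now - self.last_time))
        self.last_time = now
        if self.suppressed and self.figure < self.reuse_limit:
            self.suppressed = False  # quiet long enough: route may be used again

    def flap(self, now: float) -> None:
        """Record one flap of the route at time `now` (seconds)."""
        self._decay_to(now)
        self.figure += self.penalty
        if self.figure > self.suppress_limit:
            self.suppressed = True

    def usable(self, now: float) -> bool:
        self._decay_to(now)
        return not self.suppressed


# Aggregation: the provider announces only the covering /24, so a flap of one
# customer /26 never changes what backbone peers see.
AGGREGATE = ipaddress.ip_network("203.0.113.0/24")
customer_routes = list(AGGREGATE.subnets(new_prefix=26))  # four /26s


def backbone_view(routes_up):
    return {AGGREGATE} if any(r.subnet_of(AGGREGATE) for r in routes_up) else set()


damper = FlapDamper()
for t in (0, 10, 20):  # three flaps within 20 seconds
    damper.flap(t)
print(damper.usable(1720), damper.usable(1820))  # False True
print(list(ipaddress.collapse_addresses(customer_routes)))  # [IPv4Network('203.0.113.0/24')]
print(backbone_view(customer_routes) == backbone_view(customer_routes[1:]))  # True
```

The contrast matches the text: the damper only reacts after a route has already flapped several times, while the aggregate hides the /26 flap from the backbone entirely.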

An empirical rule regarding route flap is that the rate of flap is proportional to the number of possible paths to all destinations, as seen by a backbone router. This means that increasing the number of alternative paths also increases flap; Internet Exchange Points (IXPs) amplify route flap by providing alternative routes, even when all those routes come up and go down simultaneously:

[Figure: IXPs amplify route flap]
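
The proportionality rule reduces to simple counting. This sketch assumes that every distinct path delivers exactly one withdrawal and one re-announcement per down/up cycle (real BGP path exploration can add more); the prefix and path counts are hypothetical.

```python
def updates_per_flap_cycle(paths_per_prefix: dict[str, int]) -> int:
    """Updates a backbone router receives when each prefix goes down and back up once."""
    return sum(2 * paths for paths in paths_per_prefix.values())


single_homed = {"198.51.100.0/24": 1}    # one path, via a single upstream
via_three_ixps = {"198.51.100.0/24": 3}  # the same prefix also heard at three exchange points
print(updates_per_flap_cycle(single_homed), updates_per_flap_cycle(via_three_ixps))  # 2 6
```

The origin flapped exactly as often in both cases; only the number of paths changed, yet the router's update load tripled.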

There is no known method of preventing flap amplification by exchange points, which means that the number of exchange points must be limited to ensure the stability of the network. Careful application of routing policies that refuse third-party routes at exchange points is necessary to avoid receiving multiple copies of identical routing updates from different peers at the same exchange point.
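
One possible reading of "refuse third-party routes" is sketched below: accept a route from an exchange-point peer only if that peer is the first AS in the path and the origin AS belongs to the set of networks the peer is responsible for (itself and its customers). The `Announcement` type, the `accept_at_ixp` function and the `peer_cone` data are all hypothetical; in practice such per-peer data might come from a routing registry. The AS numbers and prefix are from the documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    peer_as: int              # AS of the exchange-point peer that sent the route
    prefix: str
    as_path: tuple[int, ...]  # first element = neighbour, last = origin


def accept_at_ixp(ann: Announcement, peer_cone: dict[int, set[int]]) -> bool:
    """Accept a route only if the sending peer is responsible for its origin AS."""
    if not ann.as_path or ann.as_path[0] != ann.peer_as:
        return False  # peer is not the first hop in its own path
    return ann.as_path[-1] in peer_cone.get(ann.peer_as, set())


# Hypothetical exchange: AS64497 serves customer AS64496; AS64498 and AS64499
# re-advertise the same prefix they learned from AS64497.
peer_cone = {64497: {64497, 64496}, 64498: {64498}, 64499: {64499}}
received = [
    Announcement(64497, "192.0.2.0/24", (64497, 64496)),
    Announcement(64498, "192.0.2.0/24", (64498, 64497, 64496)),
    Announcement(64499, "192.0.2.0/24", (64499, 64497, 64496)),
]
accepted = [a for a in received if accept_at_ixp(a, peer_cone)]
print(len(received), len(accepted))  # 3 1
```

With the filter in place, each flap of AS64496's prefix arrives once at this exchange instead of three times.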

It must be noted that the route flap phenomenon is not unique to IP, nor to connectionless networks in general. Since flap is caused by the propagation of information about topological changes, any networking technology that incorporates dynamic adaptive routing (most telephone systems do not, and we will discuss that later) has to deal with route flap. In fact, the greater complexity of rerouting in connection-oriented networks makes route flap even more dangerous, a point that seems to have been completely lost on the developers of ATM routing techniques.