Best-Effort Delivery vs Back-Pressure Flow Control

Another difference between ATM-based networks and native IP networks is how they control congestion. Because of the self-similar nature of data traffic, periods of congestion cannot be eliminated by aggregating more traffic streams or by increasing the number of buffers in gateways.

Native IP networks offer so-called best-effort delivery: packets are not guaranteed to arrive, and gateways simply drop packets when congestion occurs on some communication line. When reliable delivery is needed, the discarded packets are retransmitted.

Since retransmissions add load to an already congested network, the communicating hosts must slow down to avoid aggravating the congestion. This is achieved with the additive-increase/multiplicative-decrease (AIMD) technique: a host halves its effective transmission rate every time the transmitter learns of a lost packet (exponential back-off of the retransmission timer plays a similar role for repeated losses). When all hosts cooperate in this voluntary reduction of transmission rates, the congestion quickly abates. After slowing down, hosts gradually increase their transmission rates until the saturation point is reached (as signaled by the next packet loss), so the network operates close to congestion; i.e., nearly optimal resource utilization is achieved. This technique was shown to control congestion effectively in a global heterogeneous network while achieving nearly 100% utilization. Note that some packet loss is necessary for the normal functioning of a best-effort network, even when it is not overloaded.
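As a rough illustration of the halve-on-loss, probe-upward behaviour described above (a minimal sketch, not any particular TCP implementation; the rates and the additive step are arbitrary):

```python
# Sketch of additive-increase/multiplicative-decrease (AIMD) rate control.
# All numbers are illustrative, not taken from a real TCP stack.

def aimd_step(rate_pps, lost_packet, increase_pps=10):
    """Return the next transmission rate given a loss indication."""
    if lost_packet:
        return rate_pps / 2          # multiplicative decrease on loss
    return rate_pps + increase_pps   # gradual (additive) increase otherwise

rate = 1000.0
history = []
for loss in [False, False, False, True, False, False]:
    rate = aimd_step(rate, loss)
    history.append(rate)
# history == [1010.0, 1020.0, 1030.0, 515.0, 525.0, 535.0]
```

The single loss halves the rate; the subsequent additive probes then creep back toward the saturation point, which is exactly the sawtooth that keeps the network near full utilization.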

Since congestion indications do not propagate instantaneously between transmitters and receivers, some time passes from the onset of congestion until transmitting hosts start slowing down. That time is close to the characteristic round-trip time (RTT) of the network. Buffers in the gateways of a best-effort network should be large enough to hold the packets that arrive after congestion begins but before the transmitting hosts slow down. Because congestion does not arise instantaneously, the maximal rate of traffic routed to an outgoing link is close to that link's bandwidth, so the buffer for the link should be at least as large as the product of RTT (delay) and bandwidth. Increasing buffer size beyond that lets the gateway absorb longer transient congestion, but also causes packets to spend more time in buffers (because the TCP congestion-control algorithm tends to "fill the pipe" and makes queues grow to the limit). Therefore, further increasing buffer size only adds delay, effectively decreasing the quality of service. In a network with properly sized buffers, the maximal RTT therefore does not exceed the minimal RTT multiplied by the number of hops (the average number of inter-POP hops in the modern Internet is about 5; the "small" hops inside POPs are artifacts of clustering technology).
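The bandwidth-delay product rule above is simple arithmetic; as an illustration (the link speed and RTT below are assumed example values, not taken from the text):

```python
# Minimum buffer size for a link in a best-effort network:
# buffer >= bandwidth x RTT, so all packets arriving during one RTT
# (before senders slow down) can be accommodated.

def min_buffer_bytes(link_bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product, converted from bits to bytes."""
    return link_bandwidth_bps / 8 * rtt_seconds

# Example: a 155 Mbit/s (OC-3) link with a 100 ms round-trip time.
buf = min_buffer_bytes(155_000_000, 0.100)
# buf == 1937500.0 bytes, i.e. roughly 1.94 MB of buffer for this one link
```

The same formula also explains why over-buffering hurts: every byte beyond this product is queue space that standing TCP flows will eventually fill, adding delay without adding throughput.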

An important refinement of cooperative congestion control is to have gateways drop packets randomly before congestion actually occurs, with a probability that depends on the queue size. This technique is known as Random Early Detection (RED, also called Random Early Discard) and has been shown to be effective in proactive prevention of congestion. RED allows buffer sizes to be reduced by as much as 60%, thus significantly decreasing maximal network latency. Another benefit of RED is that a packet source unwilling to cooperate by reducing its packet rate will see a higher probability of its packets being discarded; i.e., RED gateways enforce fairness.
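A minimal sketch of RED's drop decision, with assumed threshold values (real RED implementations track an exponentially weighted moving average of the queue length rather than the instantaneous length used here for brevity):

```python
import random

# Sketch of RED: below min_th never drop, above max_th always drop,
# and in between drop with a probability that grows linearly with
# queue length. Thresholds and max_p are illustrative assumptions.

def red_drop(queue_len, min_th=50, max_th=150, max_p=0.1):
    """Decide whether to drop an arriving packet."""
    if queue_len < min_th:
        return False                    # queue short: never drop early
    if queue_len >= max_th:
        return True                     # queue long: drop unconditionally
    p = max_p * (queue_len - min_th) / (max_th - min_th)
    return random.random() < p          # probabilistic early drop
```

Because each arriving packet faces the same drop probability, a source sending more packets absorbs proportionally more drops, which is how RED penalizes uncooperative flows.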

Virtual-circuit based networks usually employ a congestion-control method known as back-pressure flow control, which achieves loss-free transmission by having each gateway verify that the next gateway has sufficient buffer space available before sending data.

Back-pressure flow control works perfectly when buffering on different virtual circuits is independent, i.e., when every virtual circuit has its own pre-allocated buffer space. However, this solution is not practical: most virtual circuits are inactive most of the time, so the buffer space would be wasted, and it becomes very expensive when the number of virtual circuits passing through a switch reaches millions, as in a pure-ATM global network. The buffer for each virtual circuit should be at least as large as the product of the bandwidth and the round-trip time of the communication line to the next switch, to ensure that one virtual circuit can use the full capacity of the line while the other VCs are idle. This means that the maximal delay in a properly tuned network with back-pressure flow control is three times the minimal delay: for every hop, the maximal delay is the time spent in the buffer plus the propagation time of the link, and the buffer is sized to hold one link round-trip, i.e. two times the link's propagation delay.
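The three-to-one delay bound is worth checking arithmetically (a sketch with an assumed 5 ms link; the bound holds per hop regardless of the value chosen):

```python
# Per-hop delay bound under back-pressure flow control with per-VC
# buffers sized to one link round-trip (2 x propagation delay).

def per_hop_delays(link_propagation_s):
    """Return (min_delay, max_delay) for one hop."""
    min_delay = link_propagation_s                  # empty buffer: propagation only
    max_queueing = 2 * link_propagation_s           # full buffer = one link RTT
    max_delay = max_queueing + link_propagation_s   # queueing + propagation
    return min_delay, max_delay

lo, hi = per_hop_delays(0.005)   # assumed 5 ms link
# hi is 15 ms against a 5 ms minimum: max delay = 3 x min delay
```

Since the ratio is the same on every hop, the end-to-end bound is also a factor of three, matching the claim above.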

The difficulty of implementing independent buffering in practice means that switches are designed to share buffer space between virtual circuits. With shared buffering, congestion on one line depletes the buffer space available to the other lines on the same switch; in a fully loaded network this causes the congestion to propagate to neighboring gateways, and possibly to a large part of the network. This phenomenon of chain-reaction congestion collapse is not possible in best-effort networks, which simply discard the excess packets.

To summarize, best-effort networks have latency characteristics similar to those of back-pressure flow control networks (provided RED is used and backbones are built from large central-office routers), but are not prone to chain-reaction congestion collapse.

In any case, the pure ATM architecture for global networks is not feasible for the reasons outlined in previous chapters, so we will further discuss congestion control in flattened networks. The simplest flattened network would perform no statistical multiplexing at the ATM level, simply using a mesh of CBR permanent virtual circuits. Congestion control is then performed by IP routers at the edges, and is essentially the same best-effort scheme as in native IP networks. This architecture, however, does not make much sense, because the same functions can be performed by a combination of IP routers and cheap synchronous multiplexors (producing a native IP network with too many circuits). Therefore, in a realistic flattened network, the ATM switches will perform statistical multiplexing.

When back-pressure flow control is used by the ATM backbone, the edge IP routers will drop packets whenever the backbone cannot accept more data. This is equivalent to best-effort delivery, but with the added "benefit" of chain-reaction congestion collapses. The other alternative is to have the ATM switches themselves drop cells, thus performing best-effort delivery.

A peculiarity of ATM is its small cell size (53 bytes, of which 48 carry payload), which makes a sliding-window reliable transport protocol operating on individual cells infeasible (for comparison, a minimal TCP/IP header is 40 bytes). A transport-level data packet must therefore be split into several cells, and this is exactly what AAL5 does. The switches, however, operate at the cell level, so the loss of a single cell forces retransmission of the entire packet. In the typical case of a bulk file transfer using 1500-byte packets (which occupy 32 cells under AAL5), a cell loss rate of 1% produces a packet loss rate approaching 30% (1 − 0.99^32 ≈ 27.5%). In other words, a minor congestion causes a sudden loss of connectivity. ATM switch vendors have tried strategies such as packet-tail drop (i.e., if a cell is dropped, all subsequent cells of the same packet are also discarded) and variants of RED; however, these methods only reduce the effect, and do not eliminate it or render it harmless enough.
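The loss amplification follows directly from treating cell losses as independent: a packet survives only if every one of its cells survives. A quick sketch (the 32-cell figure assumes a 1500-byte packet plus the 8-byte AAL5 trailer split into 48-byte payloads):

```python
# Packet-loss amplification from independent cell losses:
# a packet is lost if any one of its cells is lost.

def packet_loss(cell_loss_rate, cells_per_packet):
    """Probability that at least one cell of the packet is lost."""
    return 1 - (1 - cell_loss_rate) ** cells_per_packet

p = packet_loss(0.01, 32)   # 1% cell loss, 32 cells per packet
# p is about 0.275: a 1% cell loss becomes a ~27.5% packet loss
```

The amplification grows with packet size, which is why strategies that merely lower the cell loss rate (tail drop, RED on cells) soften but cannot remove the effect.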

The only real solution, therefore, is to accumulate the entire packet before deciding whether to drop it. An ATM switch that does this becomes a best-effort delivery gateway with a rather inefficient line encapsulation protocol. The next logical step is to replace that inefficient encapsulation with a framing method for entire data packets (reclaiming the 15-20% of total line capacity lost to ATM framing overhead), thus returning to native IP networking.