Date: Wed 1 Oct 86 12:41:40-EDT
From: Dennis G. Perry
Subject: Congestion in the Arpanet
To: tcp-ip@SRI-NIC.ARPA
Cc: perry@VAX.DARPA.MIL
Message-Id:

There has been quite a bit of conjecture on what is happening in the Internet and what are the reasons for the performance that people are seeing. I have been trying to understand these issues myself and asked BBN to provide me with some information. Attached is some of that information. I hope it raises other questions and answers a few.

dennis

---------------
----- Forwarded message :

Received: from cc5.bbn.com by .J.BBN.COM id a021878; 30 Sep 86 23:35 EDT
To: dperry@vax.darpa.mil
Subject: arpanet congestion
Date: 30 Sep 86 23:16:58 EDT (Tue)
From: Jeff Mayersohn

For the last month, a large number of PSNs in the Arpanet have been reporting symptoms of congestion to the network monitoring center. These reports, or "traps," have been accompanied by an increasing number of user complaints. In order to deal with the problem of network congestion, we have been pursuing a number of avenues at BBNCC. This note summarizes the current state of our investigations and makes a number of specific recommendations.

First, a little background. The Arpanet topology is largely unchanged since the physical split of the Arpanet into the Arpanet and Milnet in 1984. The topology of the post-physical-split Arpanet was actually designed from data which was collected before the earlier logical split of the two networks. In the past year, the network has shown a significant increase in traffic. A five-day average of network traffic showed an internode traffic rate of 140 Kbps in June of 1985 and an internode traffic rate of 230 Kbps in March of 1986. (The traffic growth had, in fact, leveled off over the summer of 1986 but we suspect that traffic has grown even more since the start of the academic year.) The network has recently been redesigned to accommodate NSF hosts, but these new resources have not yet been added to the network.

Marianne Gardner has observed some very interesting trends in the statistics that we have collected recently. First, a very small percentage of host pairs account for a very large percentage of the network traffic. More than 80% of network traffic is contributed by 600 host pairs (out of 2596 communicating pairs). Some 60% of the traffic is contributed by 100 pairs. Second, gateway traffic dominates network traffic. 86% of Arpanet traffic has a gateway as either the source or destination. 52% of network traffic is between gateways.

Our immediate focus over the last few weeks has been to concentrate on topological modelling in order to recommend a small number of changes which would bring network resource usage to acceptable levels. This modelling was based upon the peak hour traffic in late June, the last month during which a global network statistics collection was performed. The measured June traffic was increased by 50%. This number was based upon the recent growth in network traffic and the ratio of the peak hour traffic to the peak minute traffic. The assumption is probably conservative, which is good. The modelling work was done by Peg Primak, whose report is contained in the following.

As of June, 1986, the Arpanet contained 47 nodes and 63 links. Two of these nodes have since been retired (SAC2 and USC) but were retained in the current model with all USC traffic re-routed to node 121. Our routing model shows single hour maximum link utilization of 75% (on UWisc-Roch) and maximum node utilization of 69%. Even with the UWisc-Purdue link restored, the maximum link utilization is still 72% and the maximum node utilization is 69%. (The Wisconsin to Purdue link was temporarily removed from the network a while ago.)

To alleviate the worst of these problems, we considered adding a link from MIT77 to SRI51. The addition of this link reduces maximum link utilization to 58% (on the new link), with only two other links having utilizations over 50% (53% and 51%).
Node utilization remains unchanged. The network diameter is reduced from 10 to 9 by the addition of this link. As these results show, a link between MIT77 and SRI51 would substantially improve Arpanet performance, and would become one of the most heavily utilized links in the network.

Node utilization is quite heavy on several nodes. Normal utilization over seven minute intervals seems to be between 30% and 60% for all of the following nodes: ISI27, UCLA, RCC5, and UWISC. With the MIT-SRI link added, SRI51 will join this group. Measurement data show that each of these nodes experiences times of very heavy utilization (15 minute averages of 60% to 70%, 7 minute averages of 87%). Based on the June data, either nodes should be added at these sites or the five nodes at these sites should be upgraded to C/300s.

We assume that the addition of trunk bandwidth will take a while. There are a number of other actions which we would like to take.

First, TAC 113 should be installed immediately in the Arpanet. This provides for two changes that should reduce congestion. First, the release bundles more characters into single packets, thereby reducing the number of bits and packets required to send a given unit of Telnet data. TAC 113 also modifies the TCP retransmission timers. We probably get the wrong kind of feedback when the network slows down. If data is delayed due to network congestion, we suspect that this gives rise to TCP retransmissions which exacerbate the original problem.

Bob Hinden of our gateway group tells me that, in the next two weeks, we will conduct an experiment which will make the Wideband Network look more favorable to the internet routing in the Butterfly gateways. This will cause some gateway-to-gateway traffic to move from the Arpanet to the Wideband Network.

We have observed that, when network links get heavily saturated, the network routing algorithm becomes a bit too dynamic, trying to find excess capacity which does not exist.
The effect of the resulting oscillations in network routing sometimes works to the detriment of network performance. There is a simple fix to this, i.e., we can easily make all three cross-country paths look equivalent to the routing algorithm. This results in the proper sort of load-sharing.

There is still the possibility that we are running short of end-to-end resources. We are currently measuring the utilization of these resources to see whether this is the case. If we are short of these resources, there may be easy remedies to this in the PSN software.

Our efforts over the last few weeks have concentrated on the modelling work. We have not had the opportunity to accumulate or study global network statistics in order to understand what has changed in the last month. John Wiggins and Clive Greenleaf have begun this collection today. Simple questions which should be answered are:

1) where has traffic increased?
2) are gateways using the network differently?
3) are we seeing large amounts of internet control traffic as we have in the past?
4) would the addition of mailbridges improve the situation?
5) should homings of hosts to mailbridges be changed?

In summary, we should pursue the following:

1) A link from MIT to SRI51 should be added.
2) Node capacity should be added at ISI27, UWISC, RCC5, UCLA, SRI51.
3) The planned addition of resources should be accelerated.
4) TAC 113 should be installed.
5) The network parameters should be adjusted in order to result in more even sharing of the cross-country bandwidth, should statistics confirm that routing oscillations are occurring.
6) The Wideband Network experiment should be conducted as soon as possible.
7) Additional statistics should be collected in order to shed light on the underlying causes of the congestion. We will let results be known as soon as we have them.
8) The Purdue to Wisconsin link should be restored asap.
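
As a rough arithmetic check on the traffic figures quoted in this note, the following Python sketch works out the growth between the two five-day averages and the share of communicating host pairs behind the heaviest traffic. The function and constant names are illustrative assumptions, not anything from the BBN tools; only the input numbers come from the note.

```python
# Input figures are taken from the note; all names here are hypothetical.
JUNE_1985_KBPS = 140        # five-day average internode traffic, June 1985
MARCH_1986_KBPS = 230       # five-day average internode traffic, March 1986
COMMUNICATING_PAIRS = 2596  # host pairs observed communicating


def percent_growth(old_rate, new_rate):
    """Relative growth from old_rate to new_rate, in percent."""
    return 100.0 * (new_rate - old_rate) / old_rate


def percent_of_pairs(n_pairs, total_pairs=COMMUNICATING_PAIRS):
    """Share of all communicating host pairs that n_pairs represents, in percent."""
    return 100.0 * n_pairs / total_pairs


if __name__ == "__main__":
    growth = percent_growth(JUNE_1985_KBPS, MARCH_1986_KBPS)
    print(f"Traffic growth, June 1985 to March 1986: {growth:.1f}%")
    print(f"600 pairs (over 80% of traffic): {percent_of_pairs(600):.1f}% of pairs")
    print(f"100 pairs (about 60% of traffic): {percent_of_pairs(100):.1f}% of pairs")
```

On these inputs the growth comes to roughly 64%, the 600 heaviest pairs are about 23% of all communicating pairs, and the 100 heaviest are under 4%.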