On the Self-Similar Nature of Ethernet Traffic (Extended Version)

Will E. Leland, Murad S. Taqqu, Walter Willinger, and Daniel V. Wilson

One-line summary:

Overview/Main Points

   * Self-Similarity Definition:
     Intuitively:
        o time series is bursty at all time-scales, compared with classical
          models which smooth off at large time-scales (eg. exponential or
          Poisson arrival processes, stochastic models of packet traffic,
          ...)
        o adding more sources adds to the burstiness
     Mathematically: see paper for exact definitions.
     Implications of mathematical defintion:
        o variance of sample mean var(X^(m)) decays as m^(-beta), with 0 <
          beta < 1.
        o autocorrelation function decays hyperbolically in beta as well,
          meaning autocorrelation function is non-summable (this is
          long-range dependence.
        o spectral density is unbounded at the origin - f(lambda) ~
          lambda^(-gamma), with gamma = 1 - beta.
     Hurst effect: E[R(n)/S(n)] ~ n^H, where H is Hurst parameter, and H = 1
     - (beta/2). It is known (and can be mathematically demonstrated) that
     H=0.5 corresponds to non-selfsimilar data, while 0.5 < H < 1.0
     corresponds to selfsimilar data. Larger H implies larger burstiness.

   * Detecting Self-Similarity:
        o Variance-time plots: since var(X^(m)) ~ m^(-beta), plotting
          log(var(X^(m)) vs. log(m) will yield a straight line or slope
          -beta. As H = 1 - (beta/2), can deduce H.
        o R/S analysis (Pox plots): plotting log(R(n)/S(n)) vs. log(n) will
          yield a straight line of slope H.
        o Periodogram analysis: an estimate of H (known as Whittle's
          approximate MLE) can be calculated along with a confidence
          interval, based on the power spectrum density function properties.
          Few details given in paper.
        o Proof by picture, of course.

        o Ethernet Traffic:
             + High-quality Ethernet packet logs from Bellcore. Logs have
               microsecond resolution for periods of 27 hours, taken on four
               different days across a 4 year period.
             + All three tests gave similar values for H - Ethernet traffic
               is therefore self-similar.
             + Both LAN and WAN traffic showed similar H values.
             + The busier times of day yielded more bursty traffic (seen by
               higher H values).
             + The type and load of traffic (increased by web and
               exponential growth of internet) seemed to not affect the
               self-similarity property, although slight perturbations were
               noticed in H and attributed to host-to-host vs
               router-to-router, diskless vs. diskfull workstations, and
               other such things.
             + Low-load WAN traffic was non-selfsimilar. (Inconclusive,
               really.)

        o Causative Factors: Mandelbrot showed if have a large number of
          ON/OFF source clients, with ON/OFF periods having a heavy-tailed
          distribution, then the aggration of of the clients' traffic would
          be self-similar. (Heavy tail: P[U >= u] ~ u^(-alpha), 0 < alpha <
          2.)

          In a later paper, this hypothesis is verified.

        o Modelling Self-Similarity:
             + Fractional Gaussian noise, and fractional autoregressive
               integrated moving-average (ARIMA) processes. Essentially time
               series that satisfy certain properties regarding
               autocorrelation functions, can be shown that they are
               self-similar. Expensive to compute, reaonable approximations
               can be now made.
             + Model ON/OFF sources, aggregate. Behaves like fractional
               Brownian motion (related to fractional Gaussian noise)

        o Measuring Burstiness:
             + H.
             + Index of dispersion (for counts): for time interval L, IDC(L)
               = var(arrivals during L) / (expected value of arrivals during
               L). Valid, self-similar process have monotonically increasing
               (linear) IDC.
             + peak-to-mean ratio: peak bandwidth / mean bandwidth. But over
               which timescale?? (Visual picture of whitenoise.)
             + coefficient of variation: standard deviation of interarrival
               time : mean of interarrival time ratio. Infinite variance of
               interarrival time for heavy-tailed!

     Relevance

     Is revolutionizing assumptions and understanding of networks, and
     time-series/processes in computer systems in general. Self similar
     stuff now includes networks (WAN, LAN, ..), file system traffic, HTTP
     traffic, ...

     Flaws

        o overly mathematical for a CS paper (but necessarily so)
        o self-similar definitions have limits as t goes to infinity; we
          have finite processes. How much does this affect results?
        o to date, lack of practical implications
     -----------------------------------------------------------------------

======

On the Self-Similar Nature of Ethernet Traffic
----------------------------------------------
Leland et al., SIGMETRICS '93


Self-similar phenomena (first defined by Mandelbrot in '65) display
statistically identical structure across all time scales. In the case
of Ethernet, this manifests as the lack of a "typical" burst length.

The authors wrote a custom monitoring system to collects all seen
Ethernet packets, along with accurate timestamps. One doesn't need to
make any a priori decisions regarding their experiment, except the
amount of saved data per packet. They got 4 data sets totaling on the
order of 10^8 packets. Set #1 was collected on an isolated LAN with
diskless workstations; set #2 came 2 months later, after a large
upgrade. In both these sets, 95% of the traffic was intra-LAN. 3
months later they monitored an inter-LAN that served as an external
connection as well. After 2 years, they got set #4 by monitoring a
backbone segment (thus, mostly inter-router traffic).

They found that real Ethernet traffic at Bellcore is very different
from what is predicted by formal models for packet traffic, packet
train models, and fluid flow models. Obviously, it's also very
different from isochronous traffic typical in the telephone
system. Ethernet traffic was highly self-similar, whereas all the
models would predict white noise-like distributions for aggregated
traffic.

The Hurst parameter (H) is a measure of self-similarity. When H=0.5
you get white noise [?], but for Ethertraffic it's 0.75-0.85 and H
goes
up as utilization of the network increases (i.e., traffic becomes
increasingly self-similar). An important consequence is that, as the
number of Ethernet users increases, the traffic does not become
smoother (as predicted by formal models), but rather it becomes even
burstier.

Mandelbrot showed how self-similar processes can be viewed as an
aggregation of many simple renewal reward processes. The renewal
rewards for each such process turns out to be the amount of traffic
generated by a single user during successive time intervals, described
by a heavy tailed distribution (i.e., no characteristic length of busy
period or packet train).

Common measures of burstiness include:
- index of dispersion (variance in number of arrivals / expected value
  of arrivals over a given time interval); this index increases
  monotonically for Ethertraffic, but stays constant or converges
  rapidly in the formal models.
- peak-to-mean ratio: inappropriate measure for self-similar traffic,
  because it could literally have any value, depending on the time
  interval you're looking at.
- coefficient of variation (standard deviation of interarrival times /
  expected value of interarrival times): again, any value is possible,
  depending on number of samples.

Switched Megabit Data Service (SMDS) buffers and caps the rate at
which traffic is delivered into a LAN, to reduce burstiness. As it
turns out, increasing the buffer size reduces overall packet loss only
very slowly (in contrast to the exponential decrease predicted by
Poisson models) and invariably increases packet delay (in spite of the
limit predicted by formal models). This is all explained by
self-similarity, which further explains the observation that buffering
doesn't help manage congestion (losses are concentrated and greatly
exceed the background loss rate).

The only tractable models that are reasonably accurate: self-similar
stochastic model and deterministic chaotic maps.


Discussion issues:

1. CSMA/CD <--> nature (remember classroom example) log-log plots...
   1/f noise: cities by rank, fjords by size, Etherbursts, fractals, 
   English lang by rank, WWW traffic, co. revenue as func of rank

2. Why is this relevant/irrelevant to Van Jacobson's paper?