The Challenge of Measuring High Bandwidth Connections
Our obsession with bandwidth is essentially driven by our desire for a better online experience; a poor user experience is frustrating. No matter how much bandwidth is provided, our desire always seems to be 'I want more'. This desire is encouraged by the persistent publicity from a myriad of bandwidth providers that more bandwidth is faster. In short, bandwidth is synonymous with speed, and more speed is better for the experience than less speed... is this true?
One of the most popular application services on the Internet today is bandwidth measurement. In the USA alone, several million so-called speed tests are run each and every day. Not surprising really; it is a bit like buying a new sports car, where our first urge is to test the car’s ability to deliver on its published rating of acceleration and top speed. Does more bandwidth really deliver a better user experience? This is a good question; however, a better question would be, "why is a poor user experience still the No. 1 service complaint, when the No. 1 service solution is to increase the bandwidth?"
Over the years bandwidth offerings have grown from as little as 512Kbps (considered to be the popular definition of a broadband connection) to 10Gbps or higher, especially in the business world. In fact, 1Gbps to the home is now becoming a common reality when, just 5 years ago, 10Mbps was considered fast.
The big disconnect is that bandwidth is not a speed, only a rate. If you were to ask me, "I have an 11am flight tomorrow. What time should I leave to be on time?", how would you react if I replied, "the road to the airport delivers 5,000 cars per hour"? I suspect you would be puzzled...
If you are driving to the airport, what is more important to your experience?
1. That the road is delivering cars at a rate of 5,000 cars per hour, or
2. That the speed of your car is at or close to the speed limit of the road and you'll reach the airport on time.
I would suggest the answer is obvious: clearly being on time defines a good user experience; being late does not. This simple example demonstrates that the number of vehicles on the road is not necessarily a measure of performance, although it could be argued that there are some obvious correlations. I would expect a road architect to be keenly interested in understanding the relationship between the utilization of a road and the speed attained on it. Understanding utilization versus speed will define the experience of any road.
So how should a road be measured? Cars per hour? Average speed attained? Both? Some other metric?
Time is the overriding metric that matters to the user experience, nothing else. Ultimately the success of any network application’s delivery is based on this. Being on time is the reason why car drivers plan a departure time prior to starting any journey that has a finite purpose. Knowing the "car per hour" rate of the road will contribute negligible value in the planning process.
If road signs published the percentage of cars that arrive close to on time for any destination along the road, the driver could accurately manage his driving experience. After all, this is what Satellite Navigation devices strive to do. A slower speed road may be selected because the road will deliver a quicker time to the destination.
Time is the core dynamic of the user experience. However, it should be noted that ISPs and the myriad of Internet speed test applications do not define a connection, or measure a connection, by any metric other than its maximum bit rate, namely bandwidth.
Consider this example of using a rate measure as a measurement of time. On the morning of your vacation you wake 30 minutes before your flight departs, but the airport is a minimum of 45 minutes away!! No problem, the solution is simple... persuade a neighbor to drive their car to the airport at the same time as you, because a rate of 2 persons in 45 minutes delivers 1 person in 22.5 minutes, a full 7.5 minutes before departure. Better yet, persuade 2 neighbors to participate, because the increased rate of 3 persons in 45 minutes now delivers 1 person in 15 minutes, enough time before departure for coffee and a bagel.
The core of the disconnect in this example is that three cars traveling at 60mph is not the same as one car traveling at 180mph. The user experience, as defined by time, is entirely different...
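The arithmetic behind this fallacy is easy to sketch. The short Python snippet below uses the example's own numbers (a 45-minute drive, up to three travelers) to show that inverting an aggregate rate produces a per-person "arrival time" that no traveler actually experiences:

```python
# Sketch of the rate-vs-time fallacy. All numbers come from the airport
# example above: every car takes 45 minutes, regardless of how many cars go.

def aggregate_rate(persons, minutes):
    """Persons delivered per minute -- the 'bandwidth' view of the road."""
    return persons / minutes

def naive_arrival_minutes(persons, minutes):
    """The fallacy: invert the aggregate rate to get a per-person time."""
    return 1 / aggregate_rate(persons, minutes)

DRIVE_MINUTES = 45  # actual transit time for every car, no matter the count

for cars in (1, 2, 3):
    naive = naive_arrival_minutes(cars, DRIVE_MINUTES)
    print(f"{cars} car(s): naive arrival {naive:.1f} min, "
          f"actual arrival {DRIVE_MINUTES} min")
```

However many cars join the convoy, the "naive arrival" shrinks while the actual drive stays 45 minutes, which is exactly the disconnect described above.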
So, how do connection test applications measure bandwidth?
The principle of a network speed test is to read data and divide the amount of data received by the time it took to receive it. There are several problems with this...
1. How much time makes the test representative and acceptable? 1 millisecond? 1 second? 1 minute? More time means more data.
2. Network data is not sent continuously but in batches, the size of which is dependent on available memory. If the batches of data do not fill the connection end-to-end then naturally the data blocks are separated by gaps, the same is true of a road (a simplistic explanation of TCP forced idle).
3. The presence of gaps in the data corrupts the result of the test when dividing by time, because the gaps increase the elapsed time without adding data, which falsely lowers the result.
4. You can’t accurately test bandwidth with gaps. To resolve the 'gap' dilemma the test must make sure it has enough bits to ensure no gaps. Otherwise, how can you distinguish a gap that is natural from a gap that is created by a network problem?
5. Resolving item 4 is a big dilemma! How much data is enough data, when the purpose of the test is to measure the bandwidth? While the latency of the pipe can be measured, the bandwidth unfortunately is not known in advance. Too much data will negate test accuracy; too little data creates gaps... A 10Gbps pipe needs considerably more bits than a 1Mbps pipe.
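The gap problem in items 2 and 3 can be shown with a few lines of arithmetic. The sketch below is not any particular speed-test implementation; the link rate, transfer size, and idle time are invented figures chosen to make the distortion visible:

```python
# How dividing bytes by elapsed time understates a link's bit rate when
# the transfer contains idle gaps. All figures are assumed examples.

LINK_BPS = 100_000_000       # assumed 100 Mbps link
BYTES_RECEIVED = 25_000_000  # 25 MB transferred during the test
TRANSMIT_SECONDS = BYTES_RECEIVED * 8 / LINK_BPS  # time the wire was busy
IDLE_GAP_SECONDS = 0.5       # e.g. TCP forced idle between batches of data

# The test cannot tell busy time from idle time, so it divides by both.
measured_bps = BYTES_RECEIVED * 8 / (TRANSMIT_SECONDS + IDLE_GAP_SECONDS)

print(f"wire busy for {TRANSMIT_SECONDS:.1f}s, idle for {IDLE_GAP_SECONDS}s")
print(f"actual link rate: {LINK_BPS / 1e6:.0f} Mbps")
print(f"measured 'speed': {measured_bps / 1e6:.0f} Mbps")
```

With these assumed numbers, half a second of idle time is enough to make a 100 Mbps link report as 80 Mbps, even though no network problem occurred.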
As bandwidth levels have climbed over the last 5 to 10 years, speed test applications have become more inaccurate. Why is this? The inaccuracy is fueled by four core issues.
1. Not knowing the bandwidth.
2. The premise that the speed test needs a continuous stream of data to measure the bandwidth.
3. The test’s inability to acquire and control the data needed to ensure no 'natural gaps'.
4. Latency multiplies the data requirement and varies for every test.
These four issues mean that measuring higher and higher bandwidths will always be flawed when it comes to understanding the user experience. The test's inability to distinguish 'natural' gaps from 'network delay' defeats the whole purpose of a bandwidth test: accuracy.
The real-world result for 3 persons arriving in 45 minutes is NOT one person in 15 minutes as speed tests report. The user experience is a 45-minute drive and a missed flight for all three no matter how you look at it. Three cars at 60mph is not the same as one car traveling at 180mph and never will be.
If the user experience matters, and it should, then the test method must introduce two significant variables into the test process. The first is the delivery relative to time and the second is quality because lack of quality also impacts time.
Time is what matters overall!! A network should therefore not be rated by the maximum number of bits per second but by the quality of data and its timely delivery. Should a network test report the bits-per-second rate? Yes, it should. However, the test must have the smarts to assess the variance of an on-time arrival to be of any use. What matters is whether the packets will arrive on time to fulfill the purpose of why the packets are on the network; missing the flight is not a good result no matter what bps values are reported.
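One way such an "on-time" rating could be computed is sketched below. This is a hypothetical metric, not a description of any existing test; the per-packet delays and the 50 ms deadline are invented examples:

```python
# A hypothetical on-time metric: instead of reporting only a bit rate,
# score what fraction of packets arrived within their deadline.

def on_time_percentage(arrival_ms, deadline_ms):
    """Percentage of packets whose delay was at or under the deadline."""
    on_time = sum(1 for t in arrival_ms if t <= deadline_ms)
    return 100.0 * on_time / len(arrival_ms)

# Invented per-packet delays in milliseconds for one test run.
arrivals = [12, 18, 22, 35, 49, 51, 90, 14, 30, 47]
DEADLINE_MS = 50  # assumed application deadline

print(f"{on_time_percentage(arrivals, DEADLINE_MS):.0f}% of packets on time")
```

A figure like this directly answers the question the article keeps asking: did the packets arrive in time to fulfill their purpose?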
This is where quality plays a significant role. Quality is material to the user experience because when quality fails on a connection, loss of time is the penalty, and a lot of time can be lost to quality events. Consider a road quality issue: if a pothole causes a car to have a flat, the resulting delay is not related to the performance of the road specifically; the car will not arrive on time even if the road is empty. Network quality issues are packet events that cause the data to be unusable on arrival or simply to arrive late. Out-of-order packets, lost packets, corrupted packets, and duplicate packets are just a few examples. It takes a very smart test to measure quality end to end.
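Several of the quality events named above can be detected from packet sequence numbers alone. The following is a deliberately simplified sketch (real receivers track far more state than this) of counting loss, duplication, and reordering in an observed sequence:

```python
# Simplified detection of quality events from packet sequence numbers.
# Assumes packets are numbered 0..N; real protocols are more involved.

def classify_events(seqs):
    """Count duplicate, out-of-order, and lost packets in an arrival list."""
    seen = set()
    events = {"lost": 0, "duplicate": 0, "out_of_order": 0}
    highest = -1
    for s in seqs:
        if s in seen:
            events["duplicate"] += 1      # same sequence number twice
        elif s < highest:
            events["out_of_order"] += 1   # arrived after a later packet
            seen.add(s)
        else:
            seen.add(s)
            highest = s
    # Anything never seen below the highest sequence number was lost.
    events["lost"] = len(set(range(highest + 1)) - seen)
    return events

# Invented arrival order: packet 2 is late and duplicated; 4 and 6 never arrive.
print(classify_events([0, 1, 3, 2, 2, 5, 7]))
```

Each of these events costs time on a real connection (retransmission, reassembly, or an outright gap in the data), which is why quality feeds directly into the on-time experience.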
Measuring the ability of a connection by combining data and dividing by time will provide little insight into the ability of a network to deliver a good user experience. However, the ability to assess quality and data rates within the context of 'on-time' will quickly distinguish a bad network from a good one.
Simply put, when it comes to the user experience, time matters!