TalkTalk and DSCP and 19 second latency
This page is about an interesting problem that a customer reported to us in December 2021.
TL;DR
For some reason, TalkTalk's kit is reading the IP DSCP marks from IPv4 packets inside PPPoE and putting marked packets into an odd queuing setup, which results in latency quickly increasing to over 19 seconds:
1408 bytes from x.x.x.x: icmp_seq=986 ttl=63 time=19084.589 ms
In depth:
Home network
The customer's home network is fairly typical, e.g.:
WiFi devices <-> Aruba AP22 <-> FireBrick 2900 <-> Huawei HG612 (bridge mode)
There are also a number of devices wired in via a switch and the FireBrick.
The Problem
Our customer moved house in early 2021 and we provided a VDSL line and a FireBrick FB2900. The VDSL was supplied over TalkTalk backhaul. At the same time, the customer installed a set of new Aruba access points to cover their new house with Wi-Fi.
The Wi-Fi itself works very well. Soon after the install, however, the customer noticed problems with some 'real time' applications such as WebRTC, Google Stadia, and Nest Camera video streaming. The problem was latency, which caused delays in the live streaming of video and audio.
The customer put this down to something odd on their network until they finally decided to investigate further.
The cause
Packet captures (pcaps) revealed that the Aruba access point was marking some traffic with the DSCP value CS6, and when there was enough traffic, latency would steadily build up.
Show me
Here are 100 pings. To keep the page short I've included only every 10th ping, but you get the idea:
PING 81.187.81.187 (81.187.81.187): 1400 data bytes
1408 bytes from 81.187.81.187: icmp_seq=0 ttl=63 time=10.234 ms
Request timeout for icmp_seq 10
1408 bytes from 81.187.81.187: icmp_seq=10 ttl=63 time=37.160 ms
1408 bytes from 81.187.81.187: icmp_seq=11 ttl=63 time=16.935 ms
1408 bytes from 81.187.81.187: icmp_seq=20 ttl=63 time=278.065 ms
1408 bytes from 81.187.81.187: icmp_seq=30 ttl=63 time=386.798 ms
Request timeout for icmp_seq 70
1408 bytes from 81.187.81.187: icmp_seq=40 ttl=63 time=793.062 ms
Request timeout for icmp_seq 80
1408 bytes from 81.187.81.187: icmp_seq=50 ttl=63 time=836.497 ms
1408 bytes from 81.187.81.187: icmp_seq=60 ttl=63 time=1040.796 ms
1408 bytes from 81.187.81.187: icmp_seq=70 ttl=63 time=1453.866 ms
1408 bytes from 81.187.81.187: icmp_seq=80 ttl=63 time=1461.298 ms
1408 bytes from 81.187.81.187: icmp_seq=90 ttl=63 time=1935.554 ms
1408 bytes from 81.187.81.187: icmp_seq=99 ttl=63 time=1978.546 ms
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 9.734/929.728/2047.155/628.466 ms
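To put a number on how quickly the latency builds, the reply lines can be parsed and the growth averaged between the first and last ping. This is only a rough sketch: the function name and the four sample lines (copied from the output above) are chosen for illustration, and a real capture would feed in all 100 replies.

```python
import re

# Matches reply lines such as:
#   1408 bytes from 81.187.81.187: icmp_seq=20 ttl=63 time=278.065 ms
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def parse_rtts(ping_output: str) -> list[tuple[int, float]]:
    """Return (icmp_seq, rtt_ms) pairs for every reply line in the text."""
    return [(int(seq), float(rtt)) for seq, rtt in REPLY.findall(ping_output)]


# A few lines taken from the ping run shown on this page.
sample = """\
1408 bytes from 81.187.81.187: icmp_seq=0 ttl=63 time=10.234 ms
1408 bytes from 81.187.81.187: icmp_seq=20 ttl=63 time=278.065 ms
1408 bytes from 81.187.81.187: icmp_seq=60 ttl=63 time=1040.796 ms
1408 bytes from 81.187.81.187: icmp_seq=99 ttl=63 time=1978.546 ms
"""

rtts = parse_rtts(sample)
first_seq, first_rtt = rtts[0]
last_seq, last_rtt = rtts[-1]

# Average extra round-trip time gained per ping over the run.
growth_per_ping = (last_rtt - first_rtt) / (last_seq - first_seq)
print(f"{growth_per_ping:.1f} ms of extra latency per ping")
```

On these figures the round-trip time grows by roughly 20 ms with every ping, which is the signature of a queue that fills faster than it drains rather than of random jitter.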
What are DSCP and CS6?
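DSCP (Differentiated Services Code Point) is a six-bit value carried in the top six bits of the second byte of the IPv4 header, the byte historically called ToS; the bottom two bits are used for ECN. The "Class Selector" codepoints CS0 to CS7 put the class number in the top three of those six bits, so CS6 is DSCP 48, which appears on the wire as a ToS byte of 0xC0. CS6 is conventionally meant for network control traffic such as routing protocols, not for ordinary user traffic. The sketch below just does that bit arithmetic; the function names are made up for illustration.

```python
def class_selector(n: int) -> int:
    """DSCP value of Class Selector n (CS0..CS7): n sits in the top 3 of 6 bits."""
    if not 0 <= n <= 7:
        raise ValueError("class selector must be 0..7")
    return n << 3


def dscp_to_tos(dscp: int) -> int:
    """ToS byte carrying this DSCP, with the two ECN bits left at zero."""
    return dscp << 2


def dscp_from_ipv4_header(header: bytes) -> int:
    """Extract the DSCP from raw IPv4 header bytes (byte 1, top six bits)."""
    return header[1] >> 2


cs6 = class_selector(6)
print(cs6, hex(dscp_to_tos(cs6)))  # 48 0xc0

# First two bytes of an IPv4 header: version/IHL 0x45, then ToS 0xC0.
print(dscp_from_ipv4_header(bytes([0x45, 0xC0])))  # 48
```

When looking at a packet capture, a ToS byte of 0xC0 (or a DSCP shown as 48 or CS6) is therefore the marking to look for.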
Things that were tried
...that didn't make a difference
- Disabling the "QoS" setting on the HG612 in bridge mode (still observe high latency)
- Reducing the "speed" of the PPPoE connection from the FB2900 to 85% of sync speed, hoping to avoid buffer-bloat anywhere in the me-to-A&A direction (still observe high latency)
- Using other wireless devices (I can repro the problem with the "live view" of some Nest Cameras and with web-based Stadia on a Chromebook)
- Dumping packets on the WAN interface of the FB2900 (I've confirmed that the FB2900 itself isn't introducing the extra latency)
...that did make a difference
- Connecting the phone, running Stadia, via wired ethernet (high latency goes away because the problematic QoS marking has gone, DSCP field = 0)
- Enabling a special feature on AAISP and the FireBrick, 'IP over LCP', which sends the IP traffic as control frames (high latency goes away)
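Since the wired test suggests the DSCP mark itself is the trigger, one way to check this without the access point is to send deliberately marked traffic from a wired host and compare it with unmarked traffic. The following is a minimal sketch, assuming a Linux or macOS host where the standard IP_TOS socket option is available; the destination address and port are placeholders, the function names are hypothetical, and any device on the path could still rewrite the mark before it reaches the line.

```python
import socket


def tos_for_dscp(dscp: int) -> int:
    """ToS byte for a six-bit DSCP value, leaving the ECN bits at zero."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0..63")
    return dscp << 2


def send_marked_probe(host: str, port: int, dscp: int, payload: bytes) -> None:
    """Send one UDP datagram with the given DSCP set via the IP_TOS option."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_for_dscp(dscp))
        sock.sendto(payload, (host, port))


# Example use (placeholder destination, not a real test server):
#   send_marked_probe("192.0.2.1", 9, 48, b"x" * 1400)   # CS6-marked
#   send_marked_probe("192.0.2.1", 9, 0, b"x" * 1400)    # unmarked
```

Running a ping alongside each case should show whether latency only builds up while the CS6-marked traffic is flowing.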