This is the support site for Andrews & Arnold Ltd, a UK Internet provider. Information on these pages is generally for our customers but may be useful to others, enjoy!
<indicator name="Front">[[File:Menu-document.svg|link=Category:Technical Documents|30px|Back up to the Technical Documents category]]</indicator>
'''Updates:'''
2021-12-07 Page created
2021-12-08 Updated as FireBrick has a software update to change/remove the DSCP field
2021-12-09 Updated with reply from TalkTalk
2021-12-13 Updated with reply from Aruba
Further updates from TalkTalk are expected in the new year
This page is about an interesting problem that was reported to us by a customer in December 2021. We have written it up so that this page can be found by other people seeing a similar problem.
We've masked the IP address we're pinging on this page, as it doesn't matter which IP you ping. In our tests we were mostly pinging the IP of our LNS, as that is the 'next hop' on the customer's route to the internet.
=TL;DR=
1408 bytes from x.x.x.x: icmp_seq=986 ttl=63 time='''19084.589 ms'''
===Video:===
* https://www.dropbox.com/s/rm6c5zsmkddxsq0/High%20ping%20when%20QoS%20set.mov?dl=0
=== Thanks ===
Thank you to the customers (and non-customers) who ran pings over their lines and helped identify that this is a TalkTalk-only issue.
=In depth:=
== The Problem ==
Our customer moved house in early 2021 and we provided a VDSL line and a FireBrick FB2900 Ethernet router. The VDSL was supplied over TalkTalk back-haul. At the same time the customer installed a set of new Aruba AP22 'Instant-On' access points to cover his new house in Wi-Fi.
The Wi-Fi itself works very well. However, soon after the install the customer noticed problems with some 'real time' applications such as WebRTC, Google Stadia and Nest camera video streaming:
* Nest video camera video streaming was laggy and lossy
* With Google Stadia, the client software sometimes gives up completely and says the connection isn't good enough to play
* Video conferencing applications struggled but turning down the bit rate enough helped
The customer put this down to something odd on their network until they finally decided to investigate further.
Although A&A only have a single customer who has reported this, '''we expect anyone in the UK using a TalkTalk ADSL or VDSL line with an Aruba Instant-On access point will be affected by lag on some streaming applications'''.
== The cause ==
Request timeout for icmp_seq 10
1408 bytes from x.x.x.x: icmp_seq=10 ttl=63 time=37.160 ms
1408 bytes from x.x.x.x: icmp_seq=20 ttl=63 time=278.065 ms
1408 bytes from x.x.x.x: icmp_seq=30 ttl=63 time=386.798 ms
* '''-s 1400''' Sets the packet size to 1400 bytes, which is quite large. If we reduce this, e.g. to 700 bytes, it takes longer for the latency to rise. With it set to 600, the traffic seems low enough not to be caught by TalkTalk's traffic shaping policy
* '''-c 100''' Just do 100 pings this time
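As a rough sketch, the two flags above plus a DSCP marking could be assembled as follows. This assumes the Linux iputils <code>ping</code>, whose <code>-Q</code> option sets the TOS byte (other ping implementations use a different option), and it is not necessarily the exact command used in our tests. The function names are hypothetical and <code>x.x.x.x</code> stands in for the masked target. The 8 bytes of ICMP header are why <code>-s 1400</code> appears as <code>1408 bytes</code> in the replies.

```python
ICMP_HEADER_BYTES = 8  # ICMP echo header added on top of the -s payload


def build_ping_argv(target, payload_bytes=1400, count=100, tos=None):
    """Build an argv list for Linux iputils ping (an assumption, see lead-in)."""
    argv = ["ping", "-s", str(payload_bytes), "-c", str(count)]
    if tos is not None:
        # iputils ping: -Q sets the TOS/DSCP byte on outgoing packets
        argv += ["-Q", str(tos)]
    argv.append(target)
    return argv


def reported_bytes(payload_bytes):
    """Size that ping prints per reply: payload plus the ICMP header."""
    return payload_bytes + ICMP_HEADER_BYTES


if __name__ == "__main__":
    # x.x.x.x is a placeholder, as elsewhere on this page
    print(" ".join(build_ping_argv("x.x.x.x", tos=192)))
    print(reported_bytes(1400))
```

The list returned by <code>build_ping_argv</code> can be passed to <code>subprocess.run</code> on a machine where that ping is installed.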
=== 19 second latency ===
It's amazing to see 19 second latency - this means that a device on TalkTalk's network is storing packets for this amount of time before passing them on. You would usually expect the packets to be dropped (packet loss) - but packet loss has been very low in our tests.
=== pcaps ===
'''Ping request:'''
Leaves CPE: 12:03:
Arrives LNS 12:03:
'''Ping reply:'''
Leaves LNS 12:03:38.
Arrives CPE 12:03:38.
In this example, it takes nearly 1 second for the packet to travel from the CPE to our LNS. The reply (from LNS to CPE) is quick.
'''Note:''' The classification below CS6 is described as Telephony - which may have been more appropriate, and in our tests, this traffic is unaffected by TalkTalk's network.
'''Note:''' DSCP classifications are only guidelines, and different manufacturers do seem to use them differently.
== Things that were tried ==
===...that didn't make a difference===
* Using other wireless devices (I can repro the problem with the "live view" of some Nest Cameras and with web-based Stadia on a Chromebook)
* Dumping packets on the WAN interface of the FB2900 (I've confirmed that the FB2900 itself isn't introducing the extra latency)
* Swapping the modem from an HG612 to a Technicolor in bridge mode
===...that did make a difference===
* Connecting the phone, running Stadia, via '''wired Ethernet''' (high latency goes away because the problematic QoS marking has gone, DSCP field = 0)
* Setting a special feature on AAISP and the FireBrick - ''''IP over LCP'''' - this sends the IP traffic as control frames. (high latency goes away). This IPoLCP is a niche feature and has been used in the past to help diagnose problems in back-haul networks: eg: https://www.revk.uk/2015/02/congestion-case-study.html
* '''Changing the DSCP value''' - only packets marked CS6 (192 to 195) are affected. With higher or lower values the latency goes away
*'''Getting the FireBrick 2900 to set the DSCP field to 0''' - this feature was added to the alpha release of FireBrick software on 8th December 2021
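The range 192 to 195 makes sense if those figures are read as raw TOS byte values, which is our assumption in this sketch: the upper six bits of the byte are the DSCP and the lower two are the ECN bits, so all four values carry DSCP 48 (CS6). The helper names below are hypothetical.

```python
CS6_DSCP = 48  # Class Selector 6, binary 110000


def split_tos(tos):
    """Split the 8-bit TOS byte into (dscp, ecn)."""
    if not 0 <= tos <= 255:
        raise ValueError("TOS byte must be in 0..255")
    return tos >> 2, tos & 0b11


def is_cs6(tos):
    """True when the DSCP part of the TOS byte is CS6, whatever the ECN bits."""
    return split_tos(tos)[0] == CS6_DSCP


if __name__ == "__main__":
    # Values either side of the affected range are included for contrast
    for tos in (188, 192, 193, 194, 195, 196):
        dscp, ecn = split_tos(tos)
        print(tos, dscp, ecn, is_cs6(tos))
```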
=== Things that were not tried ===
*Migrating the line to BT back-haul - this would have fixed the problem for the customer, but would not have fixed the problem in the TalkTalk network or the Aruba access point. Being engineers - we like to fix problems!
*Changing the DSCP setting on the Aruba Instant-On Access Points - there is no setting, the DSCP field is being set automatically.
== Further tests ==
With A&A having a lively IRC channel, we asked customers to try our ping test to see whether their lines were affected:
* All AAISP TalkTalk VDSL lines tested showed latency
* All AAISP TalkTalk ADSL lines tested showed latency
* No AAISP Ethernet lines showed latency
'''Another TalkTalk partner, like us:'''
rtt min/avg/max/mdev = 16.017/9689.727/'''19303.816'''/5771.600 ms, pipe 449
'''Another TalkTalk partner, like us, actually sees 48 seconds!'''
round-trip min/avg/max/std-dev = 17.830/33807.255/48113.126/15886.310 ms
So it seems this is a problem within TalkTalk's UK network, probably affecting all TalkTalk ADSL and VDSL lines in the UK.
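When collecting results from many testers, the summary line is the quickest thing to compare. The sketch below pulls the figures out of both summary formats quoted in this section (<code>rtt ... mdev</code> and <code>round-trip ... std-dev</code>). The regular expression and function name are our own illustration and only cover those two formats.

```python
import re

# Matches the two ping summary formats quoted on this page
SUMMARY_RE = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|std-dev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_summary(line):
    """Return min/avg/max/dev in milliseconds, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    lo, avg, hi, dev = (float(x) for x in match.groups())
    return {"min": lo, "avg": avg, "max": hi, "dev": dev}


if __name__ == "__main__":
    lines = [
        "rtt min/avg/max/mdev = 16.017/9689.727/19303.816/5771.600 ms, pipe 449",
        "round-trip min/avg/max/std-dev = 17.830/33807.255/48113.126/15886.310 ms",
    ]
    for line in lines:
        stats = parse_summary(line)
        print(round(stats["max"] / 1000, 1), "seconds worst case")
```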
=Deep Packet Inspection Concerns=
One concern that this issue raises is that TalkTalk are inspecting further into the packet than we'd like or need them to. This may well be by mistake (a misconfigured router within TalkTalk's network), but this is something we're keen to understand and get to the bottom of.
=Theories=
We and others have come up with a few theories as to what could be happening in TalkTalk:
* TalkTalk probably want to give CS6-tagged packets special handling for their own traffic, and probably can't differentiate between their own and customer traffic on some of the devices within their core network.
=The fix=
There are seemingly two faults here:
# Aruba adding the DSCP field - which
# (in our opinion) TalkTalk should not be looking at the DSCP field. AAISP are taking this up with TalkTalk.
==Fault raised with TalkTalk==
===December 7th===
A&A got in touch with TalkTalk directly by emailing TalkTalk's escalations department and our Service Manager. (There was no point in reporting an individual line fault via the normal channels for broadband faults.)
===December 9th===
TalkTalk are still investigating and are hoping to get back to us next week. On the point about packet inspection, they assure us that their policy remains the same: they are not inspecting traffic in any way, nor do they have the means to do so.
'''To be continued....'''
==Ticket open with Aruba==
Apparently these Aruba access points do have the ability to open a CLI on the device and disable DSCP. However, it's not so simple, and Aruba advise against it, because the CLI requires an interactively-generated token from their support staff, and changing the setting back would require another support call.
Current solution: the customer is using the FireBrick feature to set the DSCP field to 0.
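To illustrate what setting the DSCP field to 0 means at the packet level, here is a minimal sketch that zeroes the six DSCP bits of an IPv4 header, leaves the two ECN bits alone and refreshes the header checksum. It is not FireBrick's implementation or configuration syntax. Keeping the ECN bits is our assumption, and the function names are hypothetical.

```python
import struct


def ipv4_checksum(header):
    """One's-complement checksum over the 16-bit words of an IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def clear_dscp(header):
    """Return a copy of an IPv4 header with DSCP zeroed and checksum refreshed."""
    hdr = bytearray(header)
    ihl = (hdr[0] & 0x0F) * 4  # header length in bytes
    if hdr[0] >> 4 != 4 or ihl < 20 or len(hdr) < ihl:
        raise ValueError("not a complete IPv4 header")
    hdr[1] &= 0b11                 # drop the six DSCP bits, keep ECN (assumption)
    hdr[10:12] = b"\x00\x00"       # checksum field must be zero while summing
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr[:ihl])))
    return bytes(hdr)


if __name__ == "__main__":
    # Example header: TOS 0xC0 (CS6), ICMP, documentation addresses
    sample = struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0xC0, 1428, 0, 0, 64, 1, 0,
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
    )
    cleaned = clear_dscp(sample)
    print(hex(sample[1]), "->", hex(cleaned[1]))
```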