CQM Graphs: Difference between revisions
<indicator name="Diagnostics">[[File:menu-spanner.svg|link=:Category:Diagnostic Tools|30px|Back up to the Diagnostics Category]]</indicator>

The FireBrick 6000 routers we use provide us with Continuous Quality Monitoring of every broadband circuit. This allows us to track the quality of each and every connection in great detail. The router itself produces the graphs in real time, and can also provide CSV files with accurate data for each graph.

{{CPbox|#Click on the line you want to view
#Clicking on the graph again will show 7 days' worth of graphs, with options there to view 30 and 60 days}}

==How does it work?==

Our router sends an LCP echo (a bit like a ping) every second while a line is active. Your router replies. We track how long it takes for each reply to arrive, and how many are lost. These results are collated into 100 second samples and shown as a graph like the one on this page. The graph shows us lots of information about the line, and gives a history covering the last 24 hours.

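As a rough illustration of the collation step, the sketch below groups one-per-second echo results into 100 second samples and works out the minimum, average and maximum latency plus the loss for each sample. This is not the FireBrick's actual implementation: the <code>collate</code> function, the <code>Sample</code> type and the use of <code>None</code> to mark a lost echo are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

SAMPLE_SECONDS = 100  # one echo per second, collated into 100 second samples


@dataclass
class Sample:
    """Summary of one 100 second window (hypothetical structure)."""
    sent: int
    lost: int
    min_ms: Optional[float]
    avg_ms: Optional[float]
    max_ms: Optional[float]

    @property
    def loss_percent(self) -> float:
        return 100.0 * self.lost / self.sent if self.sent else 0.0


def collate(rtts_ms: List[Optional[float]]) -> List[Sample]:
    """Group per-second round trip times into samples.

    Each entry is the reply time in milliseconds, or None if the
    echo was lost.
    """
    samples = []
    for start in range(0, len(rtts_ms), SAMPLE_SECONDS):
        window = rtts_ms[start:start + SAMPLE_SECONDS]
        replies = [r for r in window if r is not None]
        lost = len(window) - len(replies)
        if replies:
            samples.append(Sample(len(window), lost, min(replies),
                                  sum(replies) / len(replies), max(replies)))
        else:
            # Every echo in the window was lost, so there is no latency to report
            samples.append(Sample(len(window), lost, None, None, None))
    return samples
```

With one echo per second, 24 hours of history is 86,400 results, which this sketch would collate into 864 samples.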
==What information is on the graph?==

[[File:Cqm-screen-shot-notes.png|1200px|CQM Graphs]]

===A note on Tx/Rx download/upload===

===Pins===

Staff are able to add 'pins' to graphs. This is a useful way of adding notes to particular times on a graph. We used to let customers add their own pins, but this feature has been removed for the time being.

[[File:Clueless-adding-pin.png|none|frame|Adding a pin to a graph]]

Pins are added on the Usage page. Simply click on the graph where you'd like to add a pin and enter the details.

=ICMP Graphs=

Optionally, staff can enable ICMP ping graphs. These will graph a normal ping to the customer WAN IP address. They tend to have a salmon coloured background:

[[File:Cqm-icmp.png|none|frame|A graph created from ICMP pings to a WAN address]]

Customers may need to set their firewall to allow pings from us; currently we ping from 90.155.53.8.

Sometimes a normal CQM graph may have a salmon background - this would usually be when the line is logged in to our test LNS.

[[File:CQM1.png|none|frame|Lots of upload at the end of the day]]

(Not an ADSL fault.) The example above shows a line with occasional short uploads causing spikes in peak latency, and then a sustained upload starting at around 6pm and causing high latency (a queue in the router). At 8pm there was more upload filling the link, causing higher latency still and some loss (normal when the link is full). This is normal. Also see: [[Packet Loss]]

Here is another: an FTTC line filling the uplink whilst doing a backup (10 Mbit/s).

[[File:CQM-FTTC-upload.png|none|frame|Lots of upload in the morning - a backup without any traffic shaping on the client end]]

This line is doing a large backup from just before 6am. The dark red horizontal line shows the traffic. During this time there is lots of packet loss (red), and the light blue at the bottom shows high latency. So, whilst the backup is happening, the line has about 50% packet loss and around 300 ms of latency. Using the line for things like web browsing at this time will be slow and sluggish. However, this is not a fault per se: it is normal for a line to appear slow when it's being filled with traffic. The traffic may be unknown, though. It may not be a backup, but could be a virus or peer-to-peer traffic. You can do a [[Traffic Capture]] to see what the traffic is, or ask Support to help.

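The high latency on a full link comes from data waiting in a queue ahead of each packet. As a back-of-the-envelope sketch, and assuming (purely for illustration) that the whole 300 ms is queueing delay in front of a 10 Mbit/s uplink, the amount of queued data can be estimated as below. The function name is hypothetical.

```python
def queued_bytes(latency_ms: float, link_mbit_s: float) -> float:
    """Estimate the data sitting in a queue for a given delay and link rate.

    Milliseconds multiplied by Mbit/s gives kilobits, so multiply by
    1000 to get bits, then divide by 8 to get bytes.
    """
    bits = latency_ms * link_mbit_s * 1000
    return bits / 8


# Figures from the FTTC backup example: around 300 ms at 10 Mbit/s
estimate = queued_bytes(300, 10)
print(f"Roughly {estimate / 1000:.0f} kB queued")
```

On those assumptions the queue holds roughly 375 kB, which is why even a small web request has to wait a noticeable time behind the backup traffic.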
===Redcare===

===Lots of Drops===

[[File:Cqm-dropping2.png]]

===Only Dropping during the day===

[[File:CQM4.png]]

(Probably not an ADSL fault.) If a line is dropping during the day, and maybe just Monday to Friday, then it's probably not going to be an upstream problem. This could be caused by interference or bad wiring on site. Check things like the phone line and extensions. Put the router into the master socket and unplug all other phones. It may also be worth changing the filter.

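If you have a list of the times a line dropped, a quick check like the sketch below can show whether the drops cluster in weekday working hours. The function name and the 08:00 to 18:00 window are assumptions for this example, not something the graphs calculate for you.

```python
from datetime import datetime
from typing import Iterable


def weekday_daytime_fraction(drops: Iterable[datetime],
                             start_hour: int = 8,
                             end_hour: int = 18) -> float:
    """Fraction of drops that fall on Monday to Friday within the given hours."""
    drops = list(drops)
    if not drops:
        return 0.0
    hits = sum(
        1 for d in drops
        if d.weekday() < 5 and start_hour <= d.hour < end_hour
    )
    return hits / len(drops)


# Made-up drop times: three on weekdays during the day, one overnight on a Saturday
example = [
    datetime(2023, 6, 26, 10, 15),
    datetime(2023, 6, 27, 14, 0),
    datetime(2023, 6, 28, 9, 30),
    datetime(2023, 7, 1, 3, 0),
]
print(weekday_daytime_fraction(example))
```

A value close to 1 would fit the pattern described above, where something on site that is only active during working hours is the more likely cause.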
===Interleaving being applied===

[[File:Interleaving.png|none|frame|This line dropped shortly after 11am, and reconnected with higher latency. This could be interleaving being applied]]

For more info see: [[Interleaving]]

===Heavy packet loss===

Also see [[Packet Loss]]

[[File:CQM-heavyloss.png|none|frame|This is showing more than 50% packet loss with no usage. A problem!]]

[[File:Cqm-packetloss-contactfault.png|none|frame|This line has loss due to a battery contact fault. A copper line test suggests that this should be reported to Openreach as a fault.]]

[[File:Cqm-VPcongestionOrfault.png|none|frame|Packet loss when downloading but not filling the line]]

This is a rather strange one. There is packet loss when downloading, even though the download is not filling the link. This is unusual. It turned out to be congestion on the virtual path within BT, or the path may have been misconfigured. It took our escalation staff three months to convince BT that the fault was within their network.

===Phone line half connected - DIS in one leg===

[[File:CQM-FTTC-DIS-in-one-leg.png|none|frame|FTTC running on just a single leg/wire]]

Here we have a perfectly good FTTC, running at 80M. However, in the evening the line goes rather crazy! Here is what happened:

*18:20 The end user was doing some tidying up of wiring and accidentally disconnected one of the wires that make up the pair used for the phone line
**The line dropped a few times and reconnected, but the sync speed dropped from 20 Mbps up and 80 Mbps down to 600 kbps up and 23 kbps down!
*The line continued working for a short while, up until:
*19:20 when a backup job started uploading data
**As the sync speed was so low, the backup filled the link and high latency (blue) ensued.
**At this point, the end user noticed something was not right.
*Looking at the times on the graph gave the end user a clue as to what had happened. They had accidentally disconnected one of the wires of the phone line and, surprisingly, the FTTC was still in sync, logged in and passing traffic - just at low speeds.
*20:00 The wiring was repaired and the FTTC came back into sync at the usual high rates.

===Faulty Switch on the LAN===

(Not an ADSL fault.) This latency and loss was caused by a faulty Netgear switch on the LAN side of a Vigor router! The switch had failed to the point where it wouldn't talk gigabit but would talk 100M unreliably. It's unknown exactly how this affected the Vigor. Perhaps it was maxing out the router's CPU, or the switch was faulty enough to upset the Vigor, causing it to delay or not reply to the LCP echoes. In this case unplugging the LAN side of the router would show a normal looking graph, indicating the fault is somehow caused by something on the LAN.

===Line dropped and speed changed===

[[File:CQM-speed-change.png|none|frame|Line dropped and came back slower and with higher latency]]

This FTTC line dropped at 01:50 and came back within a minute or two. However, the speed has dropped (horizontal black line at the top) and the latency (blue at the bottom) has increased. There could be a fault here, but in any case we can clearly see that something has happened.

===Effect of adjusting the 'rate' setting===

[[File:Cqm-ratedrop.png|none|frame|Example of changing the line rate from 100% to 95%, which reduced average latency when filling the link when downloading]]

===Running a speed test every 15 minutes===

[[File:Cqm-speedtest15mins.png|none|frame|Running an automated speed test every 15 minutes - not a fault, but the test fills the link and causes loss so will be service affecting]]

==Other Examples==

|[[File:Cqm-dropping.png]]
|Going off line is shown in purple, and this is often associated with packet loss (red). Where a line has occasional drops they are shown as purple lines. However, in some cases a line can deteriorate over a period of time, staying on line less and less until solid purple (off line). On the live graphs a line that is currently off line has a red square in the bottom right corner, whereas a line that is on line has a green square.
|-
|[[File:Cqm-congestion.png]]
|Purple on the graph is off line, and this can be short blips if a line loses sync, or longer. Notes (pins) are often added to graphs to explain why a line is off line if we know, especially when we are investigating a fault. The notes on this graph record when the BT engineer arrived and left.
|[[File:Cqm-heating-small.png]]
|'''Regular drops repeating every day''' e.g. central heating causing drops. [[:File:Cqm-heating.png |View a week of these graphs]] Here there is interference as the central heating goes on. The drops are regular - twice a day, at the time the heating goes on. This actually highlighted a fault in the central heating for this customer, as the graphs showed no drops for 2 days running!
|}

[[File:CQM-evening-drops.png|none|200px|Evening Drops]]

===Faulty Card in a TalkTalk LTS===

[[File:Cqm-faulty-lts-card.png|none|800px|Card in a TalkTalk LTS ]]

Here we have three days of graphs. You can see packet loss starting at around 2:50AM, which then gets worse two days later and is then fixed.

This was a faulty line card in a TalkTalk LTS. Only a small number of circuits were affected by packet loss because of this, though.

The time ties in with logs from TalkTalk's LTS:

<syntaxHighlight>
8 alarms currently active
Alarm time               Class  Description
2023-06-29 02:57:46 BST  Minor  FPC 9 Minor Errors
2023-06-29 02:52:29 BST  Minor  FPC 1 Minor Errors
2023-06-29 02:52:25 BST  Minor  FPC 10 Minor Errors
2023-06-29 02:52:23 BST  Minor  FPC 8 Minor Errors
2023-06-29 02:52:22 BST  Minor  FPC 11 Minor Errors
2023-06-29 02:52:22 BST  Minor  FPC 0 Minor Errors
2023-06-29 02:52:22 BST  Minor  FPC 3 Minor Errors
2023-06-29 02:52:02 BST  Major  FPC 10 Major Errors
</syntaxHighlight>

From 9AM on day 3, the interface that we have with TalkTalk started reporting input discards.

[[File:Discards.jpg|none|400px|Interface discards ]]

The faulty line card was taken out of service, and the packet loss stopped.

The incident was: https://aastatus.net/42546

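To line alarm times up against the point where loss starts on a graph, the log above can be parsed with a few lines of Python. This is only a sketch: the <code>parse_alarms</code> helper is hypothetical, and it simply ignores the timezone label and assumes all entries are in the same zone.

```python
from datetime import datetime

# The alarm lines from the TalkTalk LTS log quoted above
ALARM_LOG = """\
8 alarms currently active
Alarm time Class Description
2023-06-29 02:57:46 BST Minor FPC 9 Minor Errors
2023-06-29 02:52:29 BST Minor FPC 1 Minor Errors
2023-06-29 02:52:25 BST Minor FPC 10 Minor Errors
2023-06-29 02:52:23 BST Minor FPC 8 Minor Errors
2023-06-29 02:52:22 BST Minor FPC 11 Minor Errors
2023-06-29 02:52:22 BST Minor FPC 0 Minor Errors
2023-06-29 02:52:22 BST Minor FPC 3 Minor Errors
2023-06-29 02:52:02 BST Major FPC 10 Major Errors
"""


def parse_alarms(text):
    """Return (time, severity, description) tuples, skipping header lines."""
    alarms = []
    for line in text.splitlines():
        parts = line.split(maxsplit=4)
        if len(parts) < 5:
            continue  # header or blank line
        date, time, _tz, severity, description = parts
        try:
            when = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # not an alarm entry
        alarms.append((when, severity, description))
    return alarms


alarms = parse_alarms(ALARM_LOG)
first = min(alarms, key=lambda a: a[0])
print(first[0].time(), first[1], first[2])
```

The earliest entry is the Major alarm on FPC 10 at 02:52:02, which matches the loss starting at around 2:50AM on the graph.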
=Other Information=