This is the support site for Andrews & Arnold Ltd, a UK Internet provider. Information on these pages is generally for our customers but may be useful to others, enjoy!

CQM Graphs: Difference between revisions

Content deleted Content added
AA-Andrew (talk | contribs)
m Fix dead link
 
(13 intermediate revisions by 2 users not shown)
Line 1: Line 1:
__NOTOC__<indicator name="Diagnostics">[[File:menu-spanner.svg|link=:Category:Diagnostic Tools|30px|Back up to the Diagnostics Category]]</indicator>
<indicator name="Diagnostics">[[File:menu-spanner.svg|link=:Category:Diagnostic Tools|30px|Back up to the Diagnostics Category]]</indicator>


The FireBrick 6000 routers we use provide us with Continuous Quality Monitoring of every broadband circuit. This allows us to track the quality of each and every connection in great detail. The router itself produces the graphs in real time, and can also provide csv files with accurate data for each graph.
The FireBrick 6000 routers we use provide us with Continuous Quality Monitoring of every broadband circuit. This allows us to track the quality of each and every connection in great detail. The router itself produces the graphs in real time, and can also provide csv files with accurate data for each graph.


Customers and Staff can view these graphs in near real time (updated every 100 seconds), and can view historical graphs.
Customers and staff can view these graphs in near real time (updated every 100 seconds), and can view historical graphs.


{{CPbox|#Click on the line you want to view
{{CPbox|#Click on the line you want to view
Line 13: Line 13:
==What information is on the graph?==
==What information is on the graph?==


[[File:Cqm-screen-shot-notes.png|800px|frame|CQM Graphs]]
[[File:Cqm-screen-shot-notes.png|1200px|CQM Graphs]]


Each column (pixel) represents 100 seconds of samples. The hour of day is shown at the bottom, and the day and date shown next to midnight in the graph. There is additional text superimposed on the graph such as a circuit ID.
Each column (pixel) represents 100 seconds of samples. The hour of day is shown at the bottom, and the day and date shown next to midnight in the graph. There is additional text superimposed on the graph such as a circuit ID.
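As a rough illustration of the 100-second column width, the sketch below works out how many columns a full day occupies and which column a given time of day falls in. It assumes a one-day graph whose first column starts at midnight; the function names are hypothetical and are not part of the FireBrick or the control pages.

```python
# Each pixel column covers 100 seconds of samples (stated in the text above).
SECONDS_PER_COLUMN = 100
SECONDS_PER_DAY = 24 * 60 * 60


def columns_per_day():
    """Number of 100-second columns needed to cover 24 hours."""
    return SECONDS_PER_DAY // SECONDS_PER_COLUMN


def column_for_time(hour, minute=0, second=0):
    """Zero-based column index for a time of day.

    Assumption: column 0 starts at midnight.
    """
    seconds_since_midnight = hour * 3600 + minute * 60 + second
    return seconds_since_midnight // SECONDS_PER_COLUMN


if __name__ == "__main__":
    print(columns_per_day())      # columns in a one-day graph
    print(column_for_time(18))    # column containing 6pm
```

Under that assumption a day is 864 columns wide, so an event lasting less than 100 seconds shows up as a single column at most.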
Line 24: Line 24:


===Pins===
===Pins===
Staff are able to add 'Pins' to graphs. This is a useful way of adding notes to particular times on a graph. We used to let customers add their own pins, but this feature has been removed for the time being.
Staff are able to add 'pins' to graphs. This is a useful way of adding notes to particular times on a graph. We used to let customers add their own pins, but this feature has been removed for the time being.


=ICMP Graphs=
=ICMP Graphs=
Optionally, Staff can enable ICMP ping graphs, these will graph a normal ping to the customer WAN IP address. These tend to have a salmon colour background:
Optionally, staff can enable ICMP ping graphs. These will graph a normal ping to the customer WAN IP address. They tend to have a salmon coloured background:


[[File:Cqm-icmp.png|none|frame|A graph created from ICMP pings to a WAN address]]
[[File:Cqm-icmp.png|none|frame|A graph created from ICMP pings to a WAN address]]


Customers would need to allow their firewall to allow our LNSs to send ping them, currently 90.155.53.51 - 90.155.53.54
Customers may need to set their firewall to allow pings from us; currently we ping from 90.155.53.8.


Sometimes a normal CQM graph may have a salmon background - this would usually be when the line is logged in to our test LNS.
Sometimes a normal CQM graph may have a salmon background - this would usually be when the line is logged in to our test LNS.
Line 39: Line 39:
[[File:CQM1.png|none|frame|Lots of upload at the end of the day]]
[[File:CQM1.png|none|frame|Lots of upload at the end of the day]]


(Not an ADSL Fault.) The example above show a line with occasional short uploads causing spikes in peak latency, and then a sustained upload starting at around 6pm and causing high latency (queue in the router). At 8pm there was more upload filling the link causing higher latency still and some loss (normal when the link is full). This is normal. Also see: [[Packet Loss]]
(Not an ADSL fault.) The example above shows a line with occasional short uploads causing spikes in peak latency, and then a sustained upload starting at around 6pm and causing high latency (a queue in the router). At 8pm there was more upload filling the link causing higher latency still and some loss (normal when the link is full). This is normal. Also see: [[Packet Loss]]


Here is another, an FTTC line filling the up link whilst doing a backup (10Mbit/s)
Here is another, an FTTC line filling the up link whilst doing a backup (10 Mbit/s)


[[File:CQM-FTTC-upload.png|none|frame|Lots of upload in the morning - a backup without any traffic shaping on the client end]]
[[File:CQM-FTTC-upload.png|none|frame|Lots of upload in the morning - a backup without any traffic shaping on the client end]]


This line is doing a large backup from just before 6am. The dark red horizontal line shows the traffic, during this time there is lots of packet loss (red) and the light blue at the bottom is showing high latency. So, whilst the backup is happening the line has about 50% packet loss and around 300ms of latency. Using the line for things like web browsing at this time will be slow and sluggish. However, this is not a fault per-se. It is normal for a line to appear slow when it's being filled with traffic. However, this traffic may be unknown, it may not be a backup, but could be a virus or peer-to-peer traffic. You can do a [[Traffic Capture]] to see what the traffic is, or ask Support to Help.
This line is doing a large backup from just before 6am. The dark red horizontal line shows the traffic. During this time there is lots of packet loss (red), and the light blue at the bottom shows high latency. So, whilst the backup is happening, the line has about 50% packet loss and around 300 ms of latency. Using the line for things like web browsing at this time will be slow and sluggish. However, this is not a fault per se: it is normal for a line to appear slow when it is being filled with traffic. The traffic may be unknown, though; it may not be a backup, but could be a virus or peer-to-peer traffic. You can do a [[Traffic Capture]] to see what the traffic is, or ask Support to help.
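To give a feel for why a full uplink produces this much latency, the sketch below estimates how much data would be sitting in a queue to cause a given delay. It uses the 10 Mbit/s uplink and roughly 300 ms latency from the example above, and it assumes that all of the 300 ms is queueing delay on the uplink, which is a simplification. The function name is hypothetical.

```python
def queued_bytes(link_bits_per_second, queue_delay_ms):
    """Bytes that must be queued ahead of a packet to delay it by queue_delay_ms.

    Assumption: the whole delay is time spent waiting in one queue that
    drains at the link rate.
    """
    bits_in_queue = link_bits_per_second * queue_delay_ms / 1000
    return bits_in_queue / 8


if __name__ == "__main__":
    # 10 Mbit/s uplink with about 300 ms of latency, as in the example above.
    print(queued_bytes(10_000_000, 300))
```

With those figures the queue holds roughly 375 kB, which is why shaping the backup at the client end so that it does not fill the link keeps latency down.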


===Redcare===
===Redcare===
Line 72: Line 72:
[[File:Cqm-dropping2.png]]
[[File:Cqm-dropping2.png]]


This line does have a fault. It is dropping sync throughout the day. In this type of case, go through the usual checks, and AAISP will report a fault, which will probably need a BT SFI Engineer to atend site.
This line does have a fault. It is dropping sync throughout the day. In this type of case, go through the usual checks, and AAISP will report a fault, which will probably need a BT SFI engineer to attend site.


===Only Dropping during the day===
===Only Dropping during the day===
[[File:CQM4.png]]
[[File:CQM4.png]]


(Probably not an ADSL Fault.) If a line is dropping during the day, and maybe just Monday to Friday, then it's probably not going to be an upstream problem. This could be caused by interference of bad wiring on site. Check things like the phone line and extensions. Put the router in master socket and to unplug all other phones. Maybe change filter.
(Probably not an ADSL fault.) If a line is dropping during the day, and maybe just Monday to Friday, then it's probably not going to be an upstream problem. This could be caused by interference or bad wiring on site. Check things like the phone line and extensions. Put the router into the master socket and unplug all other phones. It may also be worth changing the filter.


===Interleaving being applied===
===Interleaving being applied===
Line 83: Line 83:
For more info see: [[Interleaving]]
For more info see: [[Interleaving]]


===Heavy Packetloss===
===Heavy packet loss===
Also see [[Packet Loss]]
Also see [[Packet Loss]]
[[File:CQM-heavyloss.png|none|frame|This is showing more than 50% packet loss with no usage. A problem!]]
[[File:CQM-heavyloss.png|none|frame|This is showing more than 50% packet loss with no usage. A problem!]]


[[File:Cqm-packetloss-contactfault.png|none|frame|This line has loss due to a Battery Contact Fault. A copper line test suggest that this should be reported to Openreach as a fault.]]
[[File:Cqm-packetloss-contactfault.png|none|frame|This line has loss due to a battery contact fault. A copper line test suggests that this should be reported to Openreach as a fault.]]


[[File:Cqm-VPcongestionOrfault.png|none|frame|Packet loss when downloading but not filling the line]]
[[File:Cqm-VPcongestionOrfault.png|none|frame|Packet loss when downloading but not filling the line]]


This is a rather strange one. There is packetloss when downloading, however the download is not filling the link, but there is still loss, this is unusual. This turned out to be congestion on the Virtual Path within BT, or it may have been miss-configured. It took our escalation staff 3 months to convince BT that the fault was within their network.
This is a rather strange one. There is packet loss when downloading, even though the download is not filling the link. This is unusual. It turned out to be congestion on the virtual path within BT, or it may have been misconfigured. It took our escalation staff three months to convince BT that the fault was within their network.


===Phone line half connected - DIS in one leg===
===Phone line half connected - DIS in one leg===
[[File:CQM-FTTC-DIS-in-one-leg.png|none|frame|FTTC running on just a single leg/wire]]
[[File:CQM-FTTC-DIS-in-one-leg.png|none|frame|FTTC running on just a single leg/wire]]
Here we have a perfectly good FTTC, running at 80M. However, in the evening the line goes rather crazy! Here is what happened:
Here we have a perfectly good FTTC, running at 80M. However, in the evening the line goes rather crazy! Here is what happened:
*18:20 End user was doing some tidying up of wiring and accidental disconnected one of the wires that make up the pair used for the phone line
*18:20 The end user was doing some tidying up of wiring and accidentally disconnected one of the wires that make up the pair used for the phone line
**The line dropped a few times and reconnected, but the sync speed dropped to from 20M up and 80M down to 600K up and 23M down!
**The line dropped a few times and reconnected, but the sync speed dropped from 20 Mbps up and 80 Mbps down to 600 kbps up and 23 Mbps down!
*The line continued working for a short while, up until:
*The line continued working for a short while, up until:
*19:20 where a backup job was started which started uploading data
*19:20 where a backup job was started which started uploading data
**As the sync speed was so low, the backup filled the link and high latency (blue) ensued.
**As the sync speed was so low, the backup filled the link and high latency (blue) ensued.
**At the point the end user noticed something was not right.
**At this point, the end user noticed something was not right.
*Looking at the times on the graph gave a clue to the end user what had happened. They had accidentally disconnected one of the wires of the phone line and surprisingly the FTTC was actually still in sync, logged in and passing traffic - just at low speeds.
*Looking at the times on the graph gave a clue to the end user what had happened. They had accidentally disconnected one of the wires of the phone line and surprisingly the FTTC was actually still in sync, logged in and passing traffic - just at low speeds.
*20:00 The wiring was repaired and the FTTC came back in to sync at the usual high rates.
*20:00 The wiring was repaired and the FTTC came back in to sync at the usual high rates.
Line 166: Line 166:
|Purple on the graph is off line, and this can be short blips if a line loses sync or longer. Notes (pins) are often added to graphs to explain why a line is off line if we know, especially when we are investigating a fault. The notes on this graph told when the BT engineer arrived and left.
|Purple on the graph is off line, and this can be short blips if a line loses sync or longer. Notes (pins) are often added to graphs to explain why a line is off line if we know, especially when we are investigating a fault. The notes on this graph told when the BT engineer arrived and left.
|[[File:Cqm-heating-small.png]]
|[[File:Cqm-heating-small.png]]
|'''Regular drops repeating every day''' e.g. Central Heating causing drops. [http://wiki.aa.org.uk/File:Cqm-heating.png View a week of these graphs] Here there is interference as the central heating goes on, the drops are regular - twice a day, at the time the heating goes on. This actually highlighted a fault in the central heating for this customer as the graphs showed no drops for 2 days running!
|'''Regular drops repeating every day''' e.g. Central Heating causing drops. [[:File:Cqm-heating.png |View a week of these graphs]] Here there is interference as the central heating goes on, the drops are regular - twice a day, at the time the heating goes on. This actually highlighted a fault in the central heating for this customer as the graphs showed no drops for 2 days running!
|}
|}


Line 173: Line 173:


[[File:CQM-evening-drops.png|none|200px|Evening Drops]]
[[File:CQM-evening-drops.png|none|200px|Evening Drops]]

===Faulty Card in a TalkTalk LTS===

[[File:Cqm-faulty-lts-card.png|none|800px|Card in a TalkTalk LTS ]]

Here we have three days of graphs. You can see packet loss starting at around 2:50AM, which then gets worse two days later and is then fixed.

This was a faulty line card in a TalkTalk LTS, although only a small number of circuits were affected by packet loss as a result.

The time ties in with logs from TalkTalk's LTS:
<syntaxHighlight>
8 alarms currently active
Alarm time Class Description
2023-06-29 02:57:46 BST Minor FPC 9 Minor Errors
2023-06-29 02:52:29 BST Minor FPC 1 Minor Errors
2023-06-29 02:52:25 BST Minor FPC 10 Minor Errors
2023-06-29 02:52:23 BST Minor FPC 8 Minor Errors
2023-06-29 02:52:22 BST Minor FPC 11 Minor Errors
2023-06-29 02:52:22 BST Minor FPC 0 Minor Errors
2023-06-29 02:52:22 BST Minor FPC 3 Minor Errors
2023-06-29 02:52:02 BST Major FPC 10 Major Errors
</syntaxHighlight>
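To line the alarm times up against the graph, a log like this can be parsed and sorted. The following is only a sketch: it assumes each alarm line has the layout seen in the excerpt above (date, time, timezone, class, description), and the helper name is hypothetical rather than anything provided by TalkTalk or the FireBrick.

```python
from datetime import datetime

# Alarm lines copied from the excerpt above.
ALARM_LOG = """\
2023-06-29 02:57:46 BST Minor FPC 9 Minor Errors
2023-06-29 02:52:29 BST Minor FPC 1 Minor Errors
2023-06-29 02:52:25 BST Minor FPC 10 Minor Errors
2023-06-29 02:52:23 BST Minor FPC 8 Minor Errors
2023-06-29 02:52:22 BST Minor FPC 11 Minor Errors
2023-06-29 02:52:22 BST Minor FPC 0 Minor Errors
2023-06-29 02:52:22 BST Minor FPC 3 Minor Errors
2023-06-29 02:52:02 BST Major FPC 10 Major Errors
"""


def parse_alarm(line):
    """Split one alarm line into (local timestamp, class, description).

    Assumption: fields are whitespace separated in the order
    date, time, timezone, class, description.
    """
    date, time, _tz, severity, description = line.split(maxsplit=4)
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return timestamp, severity, description


alarms = [parse_alarm(line) for line in ALARM_LOG.splitlines() if line.strip()]
first = min(alarms, key=lambda alarm: alarm[0])

if __name__ == "__main__":
    print(f"{len(alarms)} alarms, earliest at {first[0]:%H:%M:%S}: {first[2]}")
```

The earliest entry is the Major alarm on FPC 10 at 02:52:02, which matches the point at which packet loss first appears on the graph.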

From 9AM on day 3, the interface that we have with TalkTalk started reporting input discards.

[[File:Discards.jpg|none|400px|Interface discards ]]

The faulty line card was taken out of service, and the packet loss stopped.

The incident was: https://aastatus.net/42546


=Other Information=
=Other Information=