
CQM Graphs

2,104 bytes added, 14:25, 5 March 2018
__NOTOC__<indicator name="Diagnostics">[[File:menu-spanner.svg|link=:Category:Diagnostic Tools|30px|Back up to the Diagnostics Category]]</indicator>
The FireBrick 6000 routers we use provide us with Continuous Quality Monitoring (CQM) of every broadband circuit. This allows us to track the quality of each and every connection in great detail. The router itself produces the graphs in real time, and can also provide CSV files with accurate data for each graph.
 
Customers and Staff can view these graphs in near real time (updated every 100 seconds), and can view historical graphs.
 
{{CPbox|#Click on the line you want to view
#Clicking on the graph again will show 7 days' worth of graphs, with options to view 30 and 60 days}}
 
==How does it work?==
Our router sends an LCP echo (a bit like a ping) every second while a line is active. Your router replies. We track how long it takes for each reply to arrive, and how many are lost. These results are collated into 100 second samples and shown as a graph like the one on this page. The graph shows us lots of information about the line, and gives a history covering the last 24 hours.
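
The collation step described above can be illustrated with a short sketch. This is not the FireBrick's actual implementation; the names <code>Sample</code> and <code>collate</code>, and the choice of minimum, average and maximum latency plus a loss count, are assumptions made purely for illustration. It takes one round-trip time per second (or <code>None</code> for a lost echo) and groups them into 100 second samples, one per graph column.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

SAMPLE_SECONDS = 100  # one graph column per 100 seconds, as described above


@dataclass
class Sample:
    """Hypothetical summary of one 100 second sample (illustrative only)."""
    sent: int                # echoes sent in this sample
    lost: int                # echoes that received no reply
    min_ms: Optional[float]  # None if every echo in the sample was lost
    avg_ms: Optional[float]
    max_ms: Optional[float]


def collate(rtts_ms: Sequence[Optional[float]]) -> List[Sample]:
    """Group per-second echo results into 100 second samples.

    Each entry is the round-trip time in milliseconds for one echo,
    or None if no reply arrived.
    """
    samples = []
    for start in range(0, len(rtts_ms), SAMPLE_SECONDS):
        chunk = rtts_ms[start:start + SAMPLE_SECONDS]
        replies = [r for r in chunk if r is not None]
        lost = len(chunk) - len(replies)
        if replies:
            samples.append(Sample(len(chunk), lost, min(replies),
                                  sum(replies) / len(replies), max(replies)))
        else:
            # Every echo was lost, so there is no latency to report.
            samples.append(Sample(len(chunk), lost, None, None, None))
    return samples


if __name__ == "__main__":
    # 200 seconds of made-up results: one lost echo and one slow reply.
    results = [20.0] * 98 + [None, 60.0] + [25.0] * 100
    for sample in collate(results):
        print(sample)
```

With one echo per second, a full day of results (86,400 entries) collates into 864 samples, which is why a 24 hour graph is 864 columns wide at one column per sample.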
 
==What information is on the graph?==
 
[[File:Cqm-screen-shot-notes.png|none|1200px|CQM Graphs]]
 
Each column (pixel) represents 100 seconds of samples. The hour of day is shown at the bottom, and the day and date shown next to midnight in the graph. There is additional text superimposed on the graph such as a circuit ID. There are 8 pieces of data shown for each 100 second sample as follows:-
 
 
===A note on Tx/Rx download/upload===
 
===Pins===
Staff are able to add 'Pins' to graphs. This is a useful way of adding notes to particular times on a graph. We used to let customers add their own pins, but this feature has been removed for the time being.
[[File:Clueless-adding-pin.png|none|frame|Adding a pin to a graph]]
Pins are added on the Usage page. Simply click on the graph where you'd like to add a pin and enter in the details.
 
=ICMP Graphs=
(Not an ADSL Fault.) The example above shows a line with occasional short uploads causing spikes in peak latency, and then a sustained upload starting at around 6pm and causing high latency (queue in the router). At 8pm more upload traffic filled the link, causing higher latency still and some loss (normal when the link is full). This is normal. Also see: [[Packet Loss]]
 
Here is another, an FTTC line filling the uplink whilst doing a backup (10Mbit/s)
 
[[File:CQM-FTTC-upload.png|none|frame|Lots of upload in the morning - a backup without any traffic shaping on the client end]]
[[File:CQM4.png]]
 
(Probably not an ADSL Fault.) If a line is dropping during the day, and maybe just Monday to Friday, then it's probably not going to be an upstream problem. This could be caused by interference or bad wiring on site. Check things like the phone line and extensions. Put the router in the master socket and unplug all other phones. It may also be worth changing the filter.
 
===Interleaving being applied===
[[File:Interleaving.png|none|frame|This line dropped shortly after 11am, and reconnected with higher latency. This could be interleaving being applied]]
For more info see: [[Interleaving]]
 
===Heavy Packetloss===
 
This is a rather strange one. There is packet loss when downloading even though the download is not filling the link, which is unusual. This turned out to be congestion on the Virtual Path within BT, or the path may have been misconfigured. It took our escalation staff 3 months to convince BT that the fault was within their network.
 
===Phone line half connected - DIS in one leg===
[[File:CQM-FTTC-DIS-in-one-leg.png|none|frame|FTTC running on just a single leg/wire]]
Here we have a perfectly good FTTC, running at 80M. However, in the evening the line goes rather crazy! Here is what happened:
*18:20 End user was doing some tidying up of wiring and accidentally disconnected one of the wires that make up the pair used for the phone line
**The line dropped a few times and reconnected, but the sync speed dropped from 20M up and 80M down to 600K up and 23M down!
*The line continued working for a short while, up until:
*19:20 when a backup job was started and began uploading data
**As the sync speed was so low, the backup filled the link and high latency (blue) ensued.
**At this point the end user noticed something was not right.
*Looking at the times on the graph gave the end user a clue as to what had happened. They had accidentally disconnected one of the wires of the phone line and, surprisingly, the FTTC was actually still in sync, logged in and passing traffic, just at low speeds.
*20:00 The wiring was repaired and the FTTC came back into sync at the usual high rates.
 
===Faulty Switch on the LAN===
 
(Not an ADSL Fault.) This latency and loss was caused by a faulty Netgear switch on the LAN side of a Vigor router! The switch had failed to the point where it wouldn't talk gigabit but would talk 100M unreliably. It's unknown exactly how this affected the Vigor. Perhaps it was maxing out the router's CPU, or perhaps the switch was faulty enough to upset the Vigor, causing it to delay or not reply to the LCP echoes. In this case unplugging the LAN side of the router would show a normal looking graph, indicating the fault is somehow caused by something on the LAN.
 
===Line dropped and speed changed===
[[File:CQM-speed-change.png|none|frame|line dropped and came back slower and with higher latency]]
 
This FTTC line dropped at 01:50 and came back within a minute or two. However, the speed has dropped (horizontal black line at the top) and the latency (blue at bottom) has increased. There could be a fault here, but in this case we can clearly see that something has happened.
 
===Effect of adjusting the 'rate' setting===
[[File:Cqm-ratedrop.png|none|frame|Example of changing the line rate from 100% to 95%, which reduced average latency when filling the link when downloading]]
 
===Running a speed test every 15 minutes===
[[File:Cqm-speedtest15mins.png|none|frame|Running an automated speed test every 15 minutes - not a fault, but the test fills the link and causes loss so will be service affecting]]
 
==Other Examples==
 
|[[File:Cqm-dropping.png]]
|Going off line is shown in purple, and this is often associated with packet loss (red). Where a line has occasional drops they are shown as purple lines. However, in some cases a line can deteriorate over a period of time, staying on line less and less until solid purple (off line). On the live graphs a line that is currently off line has a red square in the bottom right corner, whereas a line that is on line has a green square.
|-
|[[File:Cqm-congestion.png]]
