MTU

From AAISP Support Site
Revision as of 14:40, 25 February 2015 by AA-Andrew

This page is about MTU; it is currently a work in progress.

Overview

In simple terms, when you send information from one place on the internet (e.g. a web server) to another (e.g. your computer), the data is broken up into packets. The sender breaks the overall data into small chunks and sends them over the internet; the internet itself only handles packets. The packets arrive at the other end and are put back together to make the whole of the original data, which could be a web page, email, image, or whatever. If a packet gets dropped, the sender works this out and resends it, so you don't get gaps in what you receive. This process is managed by a protocol called TCP (Transmission Control Protocol). The packets get to the other end using a protocol called IP (Internet Protocol). There are other protocols that work over the internet, but they all come down to sending packets.
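As a rough illustration of the split-and-reassemble idea, here is a minimal Python sketch. It is a toy model, not real TCP: there are no headers, acknowledgements or retransmissions. The function names and the 1,460 byte chunk size (a typical TCP payload inside a 1,500 byte IPv4 packet) are assumptions chosen for the example.

```python
# Toy model only: split data into packets, deliver them in any order,
# and reassemble them by offset. Real TCP adds headers, acknowledgements
# and retransmission, none of which is modelled here.
import random


def split_into_packets(data: bytes, max_payload: int) -> list[tuple[int, bytes]]:
    """Break data into (offset, chunk) pairs of at most max_payload bytes."""
    return [(off, data[off:off + max_payload]) for off in range(0, len(data), max_payload)]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put packets back together in offset order, whatever order they arrived in."""
    return b"".join(chunk for _, chunk in sorted(packets))


if __name__ == "__main__":
    page = b"<html>" + b"x" * 5000 + b"</html>"
    packets = split_into_packets(page, 1460)  # assumed typical TCP payload size
    random.shuffle(packets)  # the network may deliver packets out of order
    assert reassemble(packets) == page
    print(len(packets), "packets")
```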

 

How big is a packet?

The simple answer is a maximum of 1,500 bytes. But it is more complex. The internet is actually a combination of routers and links. Each router has links to other routers. Your packets go from one place to another by being passed along from one router to another over these links.

One of the main types of link is Ethernet, which is used for Local Area Networks (LANs). You have probably encountered a LAN, as they are used in offices and homes to connect things together, and you probably use Ethernet to connect your broadband router to the computers in your house. Ethernet allows 1,500 byte packets to be carried. Internet providers use much faster links, such as gigabit and 10 gigabit, and these often allow bigger packets of up to around 9,000 bytes. Some links on the internet are set up specially for certain traffic and support packet sizes such as 1,548 bytes.

The links, like Ethernet, limit the size of packets that can be sent. The size of packet that can get from one place to another without any difficulties depends on the smallest link in the chain. On the internet as a whole this is generally 1,500 bytes, but it depends on where the packet goes and what links are used. The smallest packet size that every IPv4 host must be able to accept is 576 bytes, which used to be a common MTU for modems on dial-up internet.
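The "smallest link in the chain" rule can be written down directly. This is a minimal sketch; the function name is made up for illustration and the example path is hypothetical.

```python
def path_mtu(link_mtus: list[int]) -> int:
    """Largest packet that crosses every link without fragmentation:
    the smallest MTU of any link along the path."""
    if not link_mtus:
        raise ValueError("a path needs at least one link")
    return min(link_mtus)


# Hypothetical path: home Ethernet, a PPPoE-limited broadband link, a jumbo-frame core link
print(path_mtu([1500, 1492, 9000]))  # 1492
```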

The maximum size of packet you can send on a link is called the MTU (Maximum Transmission Unit). The maximum size you can receive is the MRU (Maximum Receive Unit). The terms MRU and MTU often get used interchangeably for obvious reasons.

What happens if a packet is too big?

Consider a packet being sent that is 1,500 bytes. It is passed from router to router until it reaches one whose next link has an MTU of less than 1,500 bytes. This creates a problem, as that router cannot send the packet on to the next router via that link. There are two options:-

  • (A) Don't send the packet and send an error message back saying it could not be sent
  • (B) Break the packet up into smaller bits (called fragments) which will fit, and send these on to the next router.

The choice depends on the packet. For IPv6 packets you have to take option (A) and send an error. For IPv4 packets the packet has a flag called DF (Don't Fragment). If that is set you have to take option (A) and send an error. If not, then you take option (B) and fragment the packet.
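The choice between options (A) and (B) can be summarised as a small decision function. This is a simplified sketch of the rules described above, with hypothetical names; a real router also has to generate the error itself (an ICMP "Fragmentation Needed and DF set" message for IPv4, "Packet Too Big" for IPv6).

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    SEND_ERROR = "drop and send 'packet too big' error"  # option (A)
    FRAGMENT = "fragment"                                # option (B)


def router_decision(packet_size: int, link_mtu: int, ipv6: bool, df_set: bool) -> Action:
    """What a router does with a packet heading for a link of link_mtu bytes."""
    if packet_size <= link_mtu:
        return Action.FORWARD
    if ipv6 or df_set:
        # IPv6 routers never fragment; IPv4 packets with DF set must not be fragmented
        return Action.SEND_ERROR
    return Action.FRAGMENT


print(router_decision(1500, 1492, ipv6=False, df_set=False))  # Action.FRAGMENT
```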

If an error is sent back, then the sending computer can try again, sending smaller packets this time. If the packet is broken into fragments, then they still arrive at the destination and can be put back together by the receiving end. Either way the data gets through.

 

Is there another option?

In most cases links support 1,500 bytes, so there is no problem. However, where we know there is a specific issue, there is another option: option (C). It is a bit of a bodge, but it works. We have added this as a user-settable feature on the line, so you can control whether you want us to work around the issue or not. This is mainly for cases where the link from us to you is restricted, typically to 1,492 bytes for PPPoE.

It works by exploiting the fact that at the start of a TCP session each end tells the other the MTU it can handle (actually the MSS, or Maximum Segment Size, which is the TCP payload size; one is derived from the other, and for IPv4 the MSS is normally the MTU minus 40 bytes of IP and TCP headers). When we have the fix enabled, we check the initial TCP handshake packets and change them if the MSS specified would be too big. This means each end may say it handles 1,500 bytes, but the other end hears that it handles, say, 1,492 bytes.
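A minimal sketch of how this kind of MSS clamping could work on the options of a TCP SYN packet, assuming IPv4 with 20-byte IP and TCP headers. It is not AAISP's implementation; the function and constant names are hypothetical, and a real implementation must also recompute the TCP checksum after changing the option.

```python
import struct

TCP_OPT_EOL, TCP_OPT_NOP, TCP_OPT_MSS = 0, 1, 2
IPV4_AND_TCP_HEADERS = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def clamp_mss(options: bytes, link_mtu: int) -> bytes:
    """Return TCP options with any MSS option lowered to fit link_mtu.

    Only the options are rewritten; the TCP checksum must be fixed up separately.
    """
    max_mss = link_mtu - IPV4_AND_TCP_HEADERS
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCP_OPT_EOL:
            break  # end of option list
        if kind == TCP_OPT_NOP:
            i += 1  # single-byte padding option
            continue
        if i + 1 >= len(out):
            break  # truncated option list: leave it alone
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed option: leave the rest alone
        if kind == TCP_OPT_MSS and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)
```

For a line limited to 1,492 bytes, an advertised MSS of 1,460 (bytes `02 04 05 b4`) is lowered to 1,452 (`02 04 05 ac`), while smaller MSS values pass through untouched.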

When do you get small links? The internet as a whole generally works with 1,500 byte packets with no problem, but there are cases where smaller MTU links are used.

One is a VPN (Virtual Private Network). These create links that connect computers (routers) virtually. The link is not real; it involves taking packets and wrapping them up in another packet, which is sent over the internet to the other end, where it is unwrapped. This is much like putting a letter in an envelope and then putting that in another envelope: you need a bigger envelope on the outside. So if the outside envelope has to fit in 1,500 bytes, the inner one can't be as big, and that makes a link with a small MTU.
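Working out the inner MTU of a tunnel is simple subtraction. This is a sketch with a hypothetical function name; the overheads in the example (a 20-byte outer IPv4 header plus a 4-byte basic GRE header) are assumptions for one common kind of tunnel, and other VPNs add different amounts.

```python
def inner_mtu(outer_mtu: int, *overheads: int) -> int:
    """MTU left for the inner packet once each wrapper's header is subtracted."""
    mtu = outer_mtu - sum(overheads)
    if mtu <= 0:
        raise ValueError("tunnel overhead is larger than the outer MTU")
    return mtu


# Assumed example: GRE over IPv4 = 20-byte outer IPv4 header + 4-byte GRE header
print(inner_mtu(1500, 20, 4))  # 1476
```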

The other main example is dialup and broadband links. In practice a broadband link works much like a dialup line, as it uses a protocol called PPP (Point to Point Protocol). Part of this is each end telling the other its MRU, i.e. how big a packet it can receive. There are a number of reasons a router might decide to say it cannot handle the full 1,500 bytes:-

  • The router could be using PPPoE (PPP over Ethernet) bridging which requires an extra 8 byte header for PPP and then sends the packet over 1,500 byte Ethernet, so the packets sent in the PPP layer can be at most 1,492 bytes.
  • Some part of the link could require the use of a smaller MTU, typically because PPPoE is being used within a backhaul network (e.g. Be lines), so the router has to be set to use 1,492.
  • The router could simply have a default of 1,492 as PPPoE is common in many countries as the standard way to connect broadband lines. In the UK we usually use PPPoA (PPP over ATM) which allows the full 1,500 bytes.
  • The router could be stupid and not even allow you to change the default to the normal 1,500 bytes, even when using PPPoA.
  • The router could have been deliberately set to a lower MTU by the owner for reasons of their own.
  • The router could be fine and BT could have messed up (see below)

Having a lower MTU is not necessarily a problem; as we said, either way the data gets through. But as soon as you don't have the standard 1,500 byte MTU, you can run into issues.

Why do things go wrong? If everything worked as it should, a smaller MTU would not be a problem, but there are reasons why it does not:-

  • Some people running web servers (notably some banks) set up their network so that it blocks the error message that is sent back when a packet is too big. This would not be too bad if they did not also send 1,500 byte packets with the DF bit set. The result is that the packet gets dropped when it hits a link with an MTU below 1,500 bytes, no error gets back, and the server has to retry. Eventually it may try a smaller packet size, but this could be 20 seconds later. This is a stupid network setup on the part of the person running the web server.
  • Some people set up their firewalls to block any fragmented packets. This is because it is hard to tell what a fragmented packet is when you don't have all of the data. However, it means that fragments don't get through. Not everyone sends packets with DF set in the first place (which is what Path MTU Discovery relies on), so fragmentation still happens. If you have a broadband link with an MTU below 1,500 bytes and a firewall (or even the router) blocking fragments, you are likely to have problems with some services (notably MSN Messenger).


Fragments are bad

Fragments are bad anyway, which is why IPv6 insists you always take option (A) and send an error message. Fragments create extra overhead, as the packet's headers have to be copied into each fragment. They also work badly when a link is congested, as dropping any one fragment of a packet means the whole packet is lost. They also take up CPU time, both creating the fragments and putting them back together. All in all, it is better if the sending end creates smaller packets in the first place. This means using Path MTU Discovery and not blocking the error message! Fortunately IPv6 mandates this, so the next generation of internet protocol should not have the same issues.
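To see the overhead concretely, here is a minimal sketch of how an IPv4 packet is split into fragments, assuming a 20-byte IP header with no options. The function name is hypothetical. A 1,500 byte packet forced through a 1,492 byte link becomes two fragments of 1,492 and 28 bytes, 1,520 bytes in total, and losing either one loses the whole packet.

```python
IPV4_HEADER = 20  # assumed: IPv4 header without options


def fragment_sizes(packet_len: int, mtu: int) -> list[tuple[int, int]]:
    """Split one IPv4 packet into fragments that fit the MTU.

    Returns (offset_of_payload_in_bytes, total_fragment_length) pairs.
    Every fragment except the last carries a multiple of 8 payload bytes,
    because the IPv4 fragment offset field counts in 8-byte units.
    """
    payload = packet_len - IPV4_HEADER
    max_chunk = (mtu - IPV4_HEADER) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any fragment payload")
    fragments = []
    offset = 0
    while offset < payload:
        chunk = min(max_chunk, payload - offset)
        # each fragment repeats the IP header in front of its slice of payload
        fragments.append((offset, chunk + IPV4_HEADER))
        offset += chunk
    return fragments


print(fragment_sizes(1500, 1492))  # [(0, 1492), (1472, 28)]
```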

PPPoE problems (technical)

One of the main causes of a reduced MTU is PPPoE (PPP over Ethernet). This is because Ethernet allows 1,500 byte payloads, ideal for an IP packet, but PPPoE adds headers totalling 8 bytes (a 6-byte PPPoE header plus the 2-byte PPP protocol field). A full 1,500 byte IP packet would therefore need 1,508 bytes.
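The same arithmetic as a small Python sketch, with hypothetical names. The 8 bytes are counted against the Ethernet payload; the Ethernet header and frame check sequence are extra either way.

```python
ETHERNET_STANDARD_PAYLOAD = 1500
PPPOE_OVERHEAD = 8  # 6-byte PPPoE header + 2-byte PPP protocol field


def pppoe_ethernet_payload(ip_mtu: int) -> int:
    """Ethernet payload needed to carry an IP packet of ip_mtu bytes over PPPoE."""
    return ip_mtu + PPPOE_OVERHEAD


def max_ip_mtu_over_pppoe(ethernet_payload: int = ETHERNET_STANDARD_PAYLOAD) -> int:
    """Largest IP packet that fits once PPPoE's overhead is subtracted."""
    return ethernet_payload - PPPOE_OVERHEAD


print(max_ip_mtu_over_pppoe())       # 1492 on standard Ethernet
print(pppoe_ethernet_payload(1500))  # 1508 needed for a full 1,500 byte IP packet
```

A standard 1,500 byte Ethernet payload leaves 1,492 bytes for IP, while a "baby jumbo" payload of 1,508 bytes restores the full 1,500.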

Unfortunately the specifications are not that helpful here. RFC1661 defines PPP and states "If smaller packets are requested, an implementation MUST still be able to receive the full 1,500 octet information field." RFC2516 defines PPPoE and states "The Maximum-Receive-Unit (MRU) option MUST NOT be negotiated to a larger size than 1,492." These are not incompatible statements: negotiating 1,492 does not mean you don't have to accept 1,500 byte packets (as per RFC1661), but if you are bridging to PPPoE, for example, you can't send such a packet on as it is, so logically you would have to fragment the IP packet. Most routers do not do that! This is one of the reasons we get problems with MTU being smaller than 1,500 bytes.

There are also ways to do oversized PPPoE using baby jumbo frames on Ethernet. The Ethernet specification still says 1,500 bytes maximum, even at gigabit speeds, but it is common for gigabit equipment to support jumbo frames, i.e. larger Ethernet packets, typically up to around 9,000 bytes. This is more than you need to do PPPoE with 1,500 bytes, which only needs 1,508. However, there are other wrapping and tunnelling cases where just a bit more is useful. Baby jumbo support normally means a bit more than the usual 1,500.

As there is no real way to tell if baby jumbo frames are supported on an Ethernet, RFC4638 defines an extra option for PPPoE to negotiate this at the Ethernet level. Of course two ends could simply agree to handle slightly larger Ethernet frames by configuration as well. Sadly this is not always the same level of operation or the same equipment that does the MRU negotiation at the PPP level, and if that knows PPPoE is involved it will not negotiate more than 1,492 MRU as per RFC2516. So typically some configuration is needed.

The upshot of all this? It is possible to get BT FTTC (Fibre to the Cabinet) circuits (which use PPPoE) working with full 1,500 byte PPP by using a modified pppd on the customer end and a suitable network card that will handle 1,508 byte frames at 10/100Mbit/s. We have done this! (Thanks to TonyHoyle and Simon, customers on IRC, for tweaking pppd and testing this for us.) The new FireBrick does, of course, support PPPoE with baby jumbo frames to handle a 1,500 byte MTU, and can even bond multiple lines. Using the right modem (the DLINK 320B in bridge mode does this) you can negotiate and use 1,508 bytes over ADSL as well.

What is my MTU?

If you can ping a box, you can check the largest packet size that can get to it. AAISP customers can ping 81.187.81.187, which is the next hop from your DSL line.

For example:

ping -c1 -M do -s 1472 81.187.81.187

^^ That's the ping command to remember. The rest of this section is a bit of info about that command.

Explanation:

-c1 In these examples, we've only used a count of one ping using the -c option.

-M do Sets the Path MTU Discovery strategy to "do", which sets the DF bit and prohibits fragmentation (even locally). The option is -M; see the man page for all the options.

-s 1472 When setting the size in ping, the size is the payload size, not the full packet size. The full packet size is the payload plus 28 bytes of headers (a 20-byte IPv4 header and an 8-byte ICMP header). The option for payload size is -s.

Note: ping is slightly different on different operating systems; the above is from a Debian machine.

On OSX (Apple), to perform a ping with the same options, you can use:

ping -c1 -D -s 1472 81.187.81.187

On a Windows machine to perform a ping with the same options, you can use:

ping -n1 -f -l 1472 81.187.81.187
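Rather than trying sizes by hand, the search can be automated. This is a sketch assuming a Linux machine with the same iputils ping used in the Debian example; the function names and the 2-second reply timeout (-W 2) are choices made for this example. The search itself is kept separate from ping so it can be checked without a network.

```python
import subprocess

ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP echo header


def linux_ping_fits(host: str, packet_size: int) -> bool:
    """True if one unfragmented ping of packet_size total bytes gets a reply.

    Uses the Linux iputils options from above (-c1 -M do -s), plus -W 2
    so a silently dropped packet does not leave us waiting long.
    """
    payload = packet_size - ICMP_OVERHEAD
    result = subprocess.run(
        ["ping", "-c1", "-M", "do", "-W", "2", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0  # iputils ping exits 0 only if a reply arrived


def largest_packet(fits, low: int = 576, high: int = 1500) -> int:
    """Binary search for the largest packet size for which fits(size) is True.

    Assumes fits(low) is True and that every size above the path MTU fails.
    """
    if not fits(low):
        raise RuntimeError(f"even {low} byte packets do not get through")
    while low < high:
        mid = (low + high + 1) // 2
        if fits(mid):
            low = mid       # mid got through: the answer is mid or larger
        else:
            high = mid - 1  # mid failed: the answer is smaller than mid
    return low
```

For example, `largest_packet(lambda size: linux_ping_fits("81.187.81.187", size))` should report 1500 on a full-MTU line and a smaller figure, such as 1492, on a restricted one. A hop that silently drops oversized packets and one that returns an error both simply count as "no reply" here.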

Two quick examples

Checking if you really have 1500 MTU:

% ping -c1 -M do -s 1472 81.187.81.187
PING 81.187.81.187 (81.187.81.187) 1472(1500) bytes of data. 
1480 bytes from 81.187.81.187: icmp_req=1 ttl=59 time=12.4 ms
--- 81.187.81.187 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms 
rtt min/avg/max/mdev = 12.491/12.491/12.491/0.000 ms


Checking if you have smaller than 1500 MTU:

% ping -c1 -M do -s 1472 81.187.81.187
PING 81.187.81.187 (81.187.81.187) 1472(1500) bytes of data.
From 90.155.53.53 icmp_seq=1 Frag needed and DF set (mtu = 1492)
--- 81.187.81.187 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

That error says "Frag needed and DF set (mtu = 1492)". So, with an MTU of 1492 we would want a payload of 1464 (1464+28=1492):

% ping -c1 -M do -s 1464 81.187.81.187
PING 81.187.81.187 (81.187.81.187) 1464(1492) bytes of data.
1472 bytes from 81.187.81.187: icmp_req=1 ttl=59 time=29.5 ms 
--- 81.187.81.187 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 29.595/29.595/29.595/0.000 ms

Yay, a reply! To double check, let's try one byte larger:

% ping -c1 -M do -s 1465 81.187.81.187
PING 81.187.81.187 (81.187.81.187) 1465(1493) bytes of data.
From 90.155.42.36 icmp_seq=1 Frag needed and DF set (mtu = 1492)
--- 81.187.81.187 ping statistics ---
0 packets transmitted, 0 received, +1 errors

Yep, 1465 is too large a payload. Interestingly, this time the error comes from the sending machine itself (90.155.42.36): it must have received and remembered the ICMP error from the earlier attempt!


If you're unlucky, there may be no error message at all (these examples are from Debian and Ubuntu through a Netgear router to TTW):

$ ping -c1 -M do -s 1465 81.187.81.187
PING 81.187.81.187 (81.187.81.187) 1465(1493) bytes of data.

--- 81.187.81.187 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
$ ping -c1 -M do -s 1464 81.187.81.187
PING 81.187.81.187 (81.187.81.187) 1464(1492) bytes of data.
1472 bytes from 81.187.81.187: icmp_seq=1 ttl=63 time=55.4 ms

--- 81.187.81.187 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 55.448/55.448/55.448/0.000 ms

Backhaul networks

TalkTalk Wholesale

TTW's default MTU is 1492 bytes. It is possible to increase this to 1500 bytes by ticking 'TCPFix' with an MTU of '1500' on clueless. At the time of writing (Dec 2014) TTW does not support an MTU of 1508, so users of PPPoE cannot achieve an MTU of more than 1492 bytes.

[Above is the official position. However, there are some experimental results which indicate one might be able to do better, using a TG582n as the router running PPPoE and configuring it to support baby jumbo frames:

:eth ifconfig intf=eth_WAN mtu=1508
:ppp ifdetach intf=Internet
:ppp ifconfig intf=Internet mru=1500
:ppp ifattach intf=Internet

then PPP claims to successfully negotiate a 1500 octet MRU with TTW.

{Administrator}[ppp]=>:ppp iflist intf=Internet
Internet: dest  eth_WAN    [03:43:04]  retry : 10
    admin state = up    oper state = up    link state = connected
    flags = echo magic accomp restart mru addr route savepwd ipv4 ipv6 chap
    class = 12  echointerval = 10  echofail = 5 echototaltolerance = 50
    administrative mru = 1500  negotiation mru = 1500
    auth type = auto
    ...

Using the ping tests from above, a ping of 1468 octets (i.e. a packet size of 1496) now works, but a ping of 1469 octets (packet size of 1497) doesn't. On further testing, one can receive 1500 octet packets but can only send 1496 octet packets, i.e. MRU=1500, MTU=1496. This is despite the bridging modem used for the tests not claiming to support an MTU > 1500 octets, and despite PPPoE therefore needing to send 1504 octets and receive 1508.

]