Vyatta 5400 and interface inbound discards

Recently I was investigating alerts generated for inbound interface discards on multiple interfaces across multiple Vyatta 5400 devices. There were no noticeable performance issues for traffic passing through the devices. The discards showed up in SNMP, in show interface ethernet ethX, and in ifconfig output. An example of the show interface ethernet ethX output I was reviewing is below.

vyatta@FW01:~$ sh int ethernet eth0
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
link/ether 00:50:56:x:x:x brd ff:ff:ff:ff:ff:ff
inet 172.x.x.x/24 brd 172.x.x.x scope global eth0
inet6 fe80::250:56ff:x:x/64 scope link
valid_lft forever preferred_lft forever
Last clear: Wed Oct 29 10:55:13 GMT 2014
Description: MGMT
RX:   bytes  packets  errors  dropped  overrun       mcast
     242863     3664       0      163        0           0
TX:   bytes  packets  errors  dropped  carrier  collisions
     128065      701       0        0        0           0

I could not find any other statistics that matched the quantity of discards being reported. Here are a few of the commands I used to look for matching discard counters.

vyatta@FW01:~$ sh int ethernet eth0 queue
vyatta@FW01:~$ sh int ethernet eth0 statistics
vyatta@FW01:~$ sh queueing
vyatta@FW01:~$ sudo netstat -s

While researching where to go next, I was reminded that the Vyatta 5400 is at its heart a Linux server. I found a few references stating that, beginning with Linux kernel version 2.6.37, more error conditions were added to the interface RX dropped counter in the kernel.

The rx_dropped counter shows statistics for dropped frames because of: (Beginning with kernel 2.6.37)
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=caf586e5f23cebb2a68cbaf288d59dbbf2d74052)

  • Softnet backlog full (Measured from /proc/net/softnet_stat)
  • Bad / Unintended VLAN tags
  • Unknown / Unregistered protocols
  • IPv6 frames when the server is not configured for IPv6

If any frames meet those conditions, they are dropped before the protocol stack and the rx_dropped counter is incremented.

via http://www.novell.com/support/kb/doc.php?id=7007165
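
The first condition in the quoted list points at /proc/net/softnet_stat. Below is a minimal Python sketch for reading that file. It assumes the commonly documented layout: one row per CPU, hexadecimal values, and the first three columns being processed, dropped (backlog full), and time_squeeze. The function name parse_softnet_stat is my own, and the column layout is worth checking against the kernel on your system before relying on it.

```python
from pathlib import Path


def parse_softnet_stat(text):
    """Parse /proc/net/softnet_stat content into one dict per row.

    Assumes the first three hexadecimal columns are
    processed, dropped (backlog full) and time_squeeze.
    """
    rows = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or unexpected lines
        processed, dropped, squeezed = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": index,  # row position in the file, normally one per CPU
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": squeezed,
        })
    return rows


if __name__ == "__main__":
    path = Path("/proc/net/softnet_stat")
    if path.exists():
        for row in parse_softnet_stat(path.read_text()):
            print(row)
```

If the dropped column stays at zero while rx_dropped keeps climbing, the softnet backlog is probably not the cause, and the other conditions in the list become the more likely candidates.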

When I took a look, I found that the version of Vyatta code in use contains Linux kernel version 3.3.8, which is newer than the kernel change described above, so these additional drop conditions apply. The only way to verify whether these conditions are causing the counter to increment is to put the interface into promiscuous mode. Since this was a production system, I instead looked at neighboring Linux systems in the same subnet and found that they do not report the same level of discards. It appears I had found the reason behind this counter incrementing. The issue looked more urgent than it was because we measure this counter as a percentage of packets discarded, and this interface does not have much traffic flowing through it. That made the percentages very high, even though the discarded frames were not production-impacting. This issue was a reminder that it is good to keep the underlying operating system in mind even when it is masked by a custom CLI.
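
To illustrate how a low-traffic interface inflates the percentage, here is a small Python sketch. It assumes the alert is computed as dropped divided by received packets, which may not match how your monitoring system does it. The busy-interface figure is hypothetical and only there for contrast.

```python
def discard_percent(dropped, packets):
    """Return dropped frames as a percentage of received packets."""
    if packets == 0:
        return 0.0  # avoid dividing by zero on an idle interface
    return 100.0 * dropped / packets


# Counters from the eth0 output above: 163 dropped out of 3664 packets.
quiet = discard_percent(163, 3664)
# Hypothetical busy interface with the same number of dropped frames.
busy = discard_percent(163, 1_000_000)

print(f"quiet interface: {quiet:.2f}%")  # about 4.45%
print(f"busy interface:  {busy:.4f}%")   # about 0.0163%
```

The same 163 frames are a rounding error on a busy link and an alarming figure on a quiet management interface.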

A meditation on the interface discard counter

I find the interface discard counter a deceptively complex counter. When you ask people what the counter means, the usual answer is that you are overrunning the throughput capability of an interface, which matches the definition in the IF-MIB SNMP MIB pretty closely.

The number of inbound packets which were chosen
to be discarded even though no errors had been
detected to prevent their being deliverable to a
higher-layer protocol.  One possible reason for
discarding such a packet could be to free up
buffer space.

ifInDiscards : https://www.ietf.org/rfc/rfc1213.txt

The description from the MIB is often the cause of this counter incrementing. However, as devices get more powerful and circuits keep increasing in size, this description is becoming less applicable. Many other issues have been lumped into this counter, and all of them are vendor, platform, and configuration dependent. Some examples I have found are:

  • ASA Dispatch Unit CPU over utilization
  • ASA ACL/ASP packet drops
  • QoS on an IOS interface can cause an elevated (purposeful) number of frames dropped
  • An ASIC shared between ports on a switch is being over utilized
  • L2/L3 packet handling on some Linux kernels and some virtual network platforms

Looking at this list, the interface discard counter starts to look more like a check engine light for a device or interface. As with the check engine light, it is important to understand all of the data that your devices are presenting, and to build good baselines of the statistics for your system. Ethan Banks has some good thoughts on data baselines in a post titled The Importance of Knowing Baselines.
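
As one rough way to turn a baseline into a check, the Python sketch below flags a discard sample only when it sits well above what the interface normally reports. The three-sigma threshold, the function name exceeds_baseline, and the sample values are all assumptions for illustration, not a recommendation for any particular monitoring tool.

```python
from statistics import mean, pstdev


def exceeds_baseline(history, sample, sigmas=3.0):
    """Return True if sample is more than `sigmas` standard
    deviations above the mean of the historical samples."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    threshold = mean(history) + sigmas * pstdev(history)
    return sample > threshold


# Hypothetical discards-per-minute samples for one interface.
history = [2, 3, 2, 4, 3, 2, 3]

print(exceeds_baseline(history, 3))   # False: within the normal range
print(exceeds_baseline(history, 40))  # True: far above the baseline
```

An interface that always shows a few discards per minute stays quiet under this kind of check, while a sudden jump still raises the flag.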