Vyatta 5400 and interface inbound discards

Recently I was investigating alerts that were being generated for inbound interface discards on multiple interfaces and multiple Vyatta 5400 devices. There were not any noticeable performance issues on traffic passing through the devices. The discards would report in SNMP, show interface ethernet ethX, and ifconfig outputs. An example show interface ethernet ethX output I was reviewing is below.

vyatta@FW01:~$ sh int ethernet eth0
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
link/ether 00:50:56:x:x:x brd ff:ff:ff:ff:ff:ff
inet 172.x.x.x/24 brd 172.x.x.x scope global eth0
inet6 fe80::250:56ff:x:x/64 scope link
valid_lft forever preferred_lft forever
Last clear: Wed Oct 29 10:55:13 GMT 2014
Description: MGMT
RX: bytes packets errors dropped overrun mcast
   242863    3664      0     163       0     0
TX: bytes packets errors dropped carrier collisions
   128065     701      0       0       0          0

I could not find any other statistics that matched the quantity of discards being reported. Here are a few of the commands I used to look for matching discard counters.

vyatta@FW01:~$ sh int ethernet eth0 queue
vyatta@FW01:~$ sh int ethernet eth0 statistics
vyatta@FW01:~$ sh queueing
vyatta@FW01:~$ sudo netstat -s

While researching where to go next I was reminded that the Vyatta 5400 is at its heart a Linux server. I found a few references stating that, beginning with Linux kernel version 2.6.37, more error conditions were added to this counter in the kernel.

The rx_dropped counter shows statistics for dropped frames because of: (Beginning with kernel 2.6.37)
Softnet backlog full -- (Measured from /proc/net/softnet_stat)
Bad / Unintended VLAN tags
Unknown / Unregistered protocols
IPv6 frames when the server is not configured for IPv6
If any frames meet those conditions, they are dropped before the protocol stack and the rx_dropped counter is incremented.

via http://www.novell.com/support/kb/doc.php?id=7007165
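
As a rough way to check the first of those conditions, the per-CPU backlog drops can be read from /proc/net/softnet_stat. The sketch below assumes the usual layout of that file (one line per CPU, hexadecimal columns, the second column counting packets dropped because the backlog was full). The function name and the sample text are made up for illustration; on a real system you would pass in the contents of the file.

```python
def softnet_backlog_drops(text):
    """Return the per-CPU 'dropped' counters from softnet_stat-style text."""
    drops = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Columns are hexadecimal; the second one is the backlog drop count.
        drops.append(int(fields[1], 16))
    return drops


# Made-up sample in the same layout as /proc/net/softnet_stat (two CPUs).
sample = (
    "00000e50 00000000 00000001 00000000\n"
    "00001a2b 000000a3 00000000 00000000\n"
)
print(softnet_backlog_drops(sample))
```

If these values stay at zero while rx_dropped keeps climbing, the backlog is probably not the cause and one of the other listed conditions is more likely.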

When I took a look, I found that the version of Vyatta code in use contains Linux kernel version 3.3.8. The only way to verify whether these conditions are causing the counter to increment is to put the interface into promiscuous mode. Since this was a production system, I instead looked at neighboring Linux systems in the same subnet and found they do not report the same level of discards. It appears I had found the reason behind this counter incrementing. The issue looked more urgent than it was because we measure this counter as a percentage of packets discarded, and this interface does not have much traffic flowing through it. That made the percentages very high even though the discarded frames were not production-impacting. This issue was a reminder that it is good to remember the underlying operating system even if it is masked by a custom CLI.
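
To make the percentage problem concrete, here is a small sketch of how such an alert could be computed. The function names, the 1% threshold and the minimum packet count are my own assumptions for illustration, not settings from our monitoring system.

```python
def discard_percent(dropped, packets):
    """Percentage of received packets that were counted as dropped."""
    if packets == 0:
        return 0.0
    return 100.0 * dropped / packets


def should_alert(dropped, packets, threshold_pct=1.0, min_packets=100000):
    """Alert only when the sample is large enough for the percentage to mean something."""
    if packets < min_packets:
        return False
    return discard_percent(dropped, packets) > threshold_pct


# RX counters from the eth0 output earlier in this post.
print(round(discard_percent(163, 3664), 2))
print(should_alert(163, 3664))
```

With only 3664 packets received, 163 drops already works out to several percent. A minimum sample size is one possible way to keep a quiet management interface from paging anyone.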

A meditation on the interface discard counter

I find the interface discard counter to be deceptively complex. When you ask people what the counter means, the usual answer is that you are overrunning the throughput capability of an interface. That matches pretty closely the definition in the IF-MIB SNMP MIB.

The number of inbound packets which were chosen
to be discarded even though no errors had been
detected to prevent their being deliverable to a
higher-layer protocol.  One possible reason for
discarding such a packet could be to free up
buffer space.

ifInDiscards : https://www.ietf.org/rfc/rfc1213.txt

The description from the MIB is often the cause of this counter incrementing; however, as devices get more powerful and circuits keep increasing in size, this description is becoming less applicable. Many other issues have been lumped into this counter, and all of them are vendor, platform, and configuration dependent. Some examples I have found are:

  • ASA Dispatch Unit CPU over utilization
  • ASA ACL/ASP packet drops
  • QoS on an IOS interface can cause an elevated (purposeful) number of frames dropped
  • An ASIC shared between ports on a switch is being over utilized
  • L2/L3 packet handling on some Linux kernels and some virtual network platforms

Looking at this list, the interface discard counter starts to look more like a check engine light for a device or interface. As with the check engine light, it is important to understand all of the data your devices are presenting and to build good baselines of the statistics for your system. Ethan Banks has some good thoughts on data baselines in a post titled The Importance of Knowing Baselines.
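
One simple way to use a baseline is to compare the current reading against the mean and spread of past readings. This is only a sketch: the three-sigma rule, the function name and the sample numbers are assumptions I picked for illustration, and real counters may call for something more robust.

```python
import statistics


def is_outside_baseline(history, current, max_sigma=3.0):
    """True when `current` is more than `max_sigma` standard deviations from the mean of `history`."""
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    if spread == 0:
        return current != mean
    return abs(current - mean) > max_sigma * spread


# Made-up hourly discard counts for one interface.
history = [150, 160, 155, 170, 165, 158, 162]
print(is_outside_baseline(history, 163))
print(is_outside_baseline(history, 900))
```

A reading of 163 sits well inside what this interface normally does, while 900 would be worth a look.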

Daily work log

Like a lot of people, I regularly have a problem at the end of a workday or even a workweek answering the question "What did I do?", let alone "What did I accomplish?" To find an answer to these questions I have started to keep a daily journal using both an automated report and a manual entry. Between these two entries I tend to have a good idea of what occurred during my workday and workweek. The idea to keep a journal was inspired in part by a post over at Productivityist: Taking Journaling to Another Level. It is useful to note that I am a Mac user, so all of the tools that I use and have strung together to generate my automated report are Mac specific. That being said, I have found this practice to be very helpful in keeping track of what I am or am not accomplishing, and I use it in concert with my daily and weekly review.

I started this project by writing the automated script, an AppleScript that glues together all of the various apps I use (Mail.app, Omnifocus, Lync, and Apple Calendar) and generates an automated report of what email I sent, tasks I completed, my scheduled meetings and people I interacted with over IM. There were some limitations on what I could actually pull out of all the various apps, but in the end this is a high-level list; if I need more, I can go to the app that has the information I need. I kick off this script when my workday comes to an end, and the script will collect all the data and stuff it into a new entry for the day in Day One.

After using the script for a while I would still have a lot of those "oh yeah I did XYZ task that day" moments when doing my daily and weekly reviews. The items I was not capturing were things such as phone calls, general thoughts, gripes, or half-done tasks that may not show up in any of the apps I collect data from. This eventually led me to keep a daily journal entry in Day One alongside my script's entry. These entries may be something as simple as a few bullet points I add over the course of the day, or something more detailed alongside the basic thoughts.

Reviewing these two entries feeds into my own daily and weekly review of my to-do lists to figure out what needs to be put in the list for tomorrow or a later day. As a by-product, this also simplifies the weekly status report I write for my manager. These reports also provide me a view into items that have been touched in the many projects I have running concurrently and the random tasks that come in through all of my various inboxes.

I recommend this practice for anyone. It can help you feel more accomplished, remember those ideas you had that you couldn't do anything about, and vent about your frustrations. I also find it helps me see all that I have accomplished even when my to-do list does not get smaller or items did not get completed when I planned them to be. A copy of my daily script can be found on my GitHub.

I implore you to not laugh too hard; this script has been hacked together with a lot of twine, glue, and duct tape, and has been my first attempt at AppleScript.

Footnote: This post was inspired in part by an Engadget post about DayOne.

Nexus 7000 and a systematic Bug

I have been thinking about an old issue that a customer encountered with a pair of Nexus 7000 switches about a year and a half ago. When the issue first came onto my radar it was in a bad place: this customer had Nexus 2000 Fabric Extenders that would go offline, and eventually the Nexus 7000 would go offline, causing some single-homed devices to become unreachable and, in the process, broader reachability issues. This occurred intermittently, which always complicates data collection. After working with TAC and finally collecting all of the information, the summary of the multiple causes came down to these 6 items.

1. Fabric Extender links become error-disabled due to CSCtz01813.
2. The Nexus is reloaded to recover from the Fabric Extender issue.
3. After the reload, ports get stuck in INIT state due to CSCty8102 and fail to come online.
4. Peer Link, Peer Keep-Alive and VPCs fail to come online since ports are err-disabled from a sequence timeout.
5. The VPC goes into a split-brain state, causing SVIs to go into shutdown mode.
6. Network connectivity is lost until the module is reloaded and ports are brought online.

In summary, two bugs would get triggered at random times, causing a firestorm of confusing outages. The two temporary workarounds used to mitigate the problem before we could upgrade the code on the switches were:

  1. Connect the VPC keep-alive link to the Admin port on the Supervisor.
  2. Use an EEM script to reset a register when a module comes online.

When thinking about what occurred, it is important to remember that the Nexus 7000 platform consists of many line cards that each contain an independent "brain" (Forwarding Engine(s) and supporting systems on the line cards) that are connected and orchestrated by the Supervisor module. It is true that the previous statement is a bit of a simplification; however, I find it emblematic of some of the design challenges you can encounter on the Nexus 7000 platform. For example, there are many limitations with Layer 3 routing features and VPC. In the example above it could be said that this sort of complexity can cause safety features such as those built into VPC to do more harm than good when they encounter an unplanned failure scenario. This is different from the Catalyst platform, where (for the most part) everything is processed through a central processor.

Overall, the Nexus 7000 system design allows for tightly coupled interactions between the modules and supervisors, and more loosely coupled interactions between chassis. These interactions allow for the high speed and throughput that can be delivered; however, they add to the complexity of troubleshooting and of complex designs. In the end, what makes this issue so interesting to me, and why I keep mentally revisiting it, is that it is an example of a system failure. Each cause, had it occurred individually, would have been problematic, but their interactions together caused the observed issue to be many times worse.

Some great Nexus 7000 references

My interest in the academics of systems

Lately I've been very interested in the academic side of computers. Complex Systems, Theoretical Computing, and Control Theory are three of my focuses right now. This has come about because I'm getting more interested in how systems work and how to measure them than in how to implement them. My career has been focused much more on implementation than on how systems work and can be measured. I've never had any sort of formal Computer Science education, making a lot of this new territory for me. As I dive deeper into these topics I realize how much math I have forgotten over the years. These topics are some of the reasons for me to refresh my math skills; however, math skills are also needed to analyze sampled data such as monitoring data. A great video discussing data analysis is by Noah Kantrowitz at Monitorama PDX 2014.

Monitorama PDX 2014 - Noah Kantrowitz from Monitorama on Vimeo.

Some of the topics I'm learning about are much broader than others. The definitions of these fields of study, as given by their Wikipedia articles, are as follows:

Control theory is an interdisciplinary branch of engineering and mathematics that deals with the behavior of dynamical systems with inputs, and how their behavior is modified by feedback.
Wikipedia: Control Theory

The field of theoretical computer science is interpreted broadly so as to include algorithms, data structures, computational complexity theory, distributed computation, parallel computation, VLSI, machine learning, computational biology, computational geometry, information theory, cryptography, quantum computation, computational number theory and algebra, program semantics and verification, automata theory, and the study of randomness. Work in this field is often distinguished by its emphasis on mathematical technique and rigor.
Wikipedia: Theoretical Computer Science

Complex systems present problems both in mathematical modelling and philosophical foundations. The study of complex systems represents a new approach to science that investigates how relationships between parts give rise to the collective behaviors of a system and how the system interacts and forms relationships with its environment.
Wikipedia: Complex Systems

All of these topics I feel are important as products start to become much simpler and centrally controlled or incredibly complex in their interactions. Algorithms, controls, and data are becoming more and more important to understand.

My First OpenDaylight

Over the last few days I've started to play with the OpenDaylight Test VM Image. This image was easy to get up and running, and it provides a playground with mininet and a pre-baked OpenDaylight (ODL) controller to play with. After deploying the OVA file in VirtualBox and poking around the file system, I got down to "business" with getting a test topology in place. I made some changes to the initial mininet configuration startup file to make the topology more complex, changing the startup command to look like the following:

sudo mn --controller 'remote,ip=,port=6633' --topo tree,3

This yielded a topology of 8 hosts and 7 switches. At one point I had 63 hosts and some number of switches, and things broke pretty hard, so I dialed it back a little bit. I went over to the web UI for the controller and, after some fiddling, set Names and Tiers on the switches. My test topology in the ODL console is shown in the following screenshot.
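
The host and switch counts follow directly from the tree parameters. Here is a quick sketch of the arithmetic, assuming mininet's tree topology uses a fanout of 2 when only the depth is given (which matches the 8 hosts and 7 switches seen here). The function name is my own.

```python
def tree_topology_size(depth, fanout=2):
    """Return (hosts, switches) for a full tree of the given depth and fanout."""
    hosts = fanout ** depth
    # Switches fill every level above the hosts: 1 + fanout + ... + fanout**(depth - 1).
    switches = sum(fanout ** level for level in range(depth))
    return hosts, switches


print(tree_topology_size(3))
print(tree_topology_size(6))
```

The counts grow quickly with depth, so it is easy to ask for more than a small test VM can comfortably handle.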

ODL Home

I also had full reachability from all of the mininet hosts.

mininet> pingall
*** Ping: testing ping reachability
h1 > h2 h3 h4 h5 h6 h7 h8
h2 > h1 h3 h4 h5 h6 h7 h8
h3 > h1 h2 h4 h5 h6 h7 h8
h4 > h1 h2 h3 h5 h6 h7 h8
h5 > h1 h2 h3 h4 h6 h7 h8
h6 > h1 h2 h3 h4 h5 h7 h8
h7 > h1 h2 h3 h4 h5 h6 h8
h8 > h1 h2 h3 h4 h5 h6 h7
*** Results: 0% dropped (56/56 received)

Now that I had things working it was time to find ways to break it. Diving into the flow rules I threw together a basic Drop rule on one of the transit links.

Flow Rule Split Network

As expected the network was split into two.

mininet> pingall
*** Ping: testing ping reachability
h1 > h2 h3 h4 X X X X
h2 > h1 h3 h4 X X X X
h3 > h1 h2 h4 X X X X
h4 > h1 h2 h3 X X X X
h5 > X X X X h6 h7 h8
h6 > X X X X h5 h7 h8
h7 > X X X X h5 h6 h8
h8 > X X X X h5 h6 h7
*** Results: 57% dropped (24/56 received)

Let's see about black-holing a single host now.

Drop H1: this rule drops all traffic from the host connected to port 1 on the switch, which happens to be h1.

mininet> pingall
*** Ping: testing ping reachability
h1 > X X X X X X X
h2 > X h3 h4 h5 h6 h7 h8
h3 > X h2 h4 h5 h6 h7 h8
h4 > X h2 h3 h5 h6 h7 h8
h5 > X h2 h3 h4 h6 h7 h8
h6 > X h2 h3 h4 h5 h7 h8
h7 > X h2 h3 h4 h5 h6 h8
h8 > X h2 h3 h4 h5 h6 h7
*** Results: 25% dropped (42/56 received)
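
The two result lines above can be sanity-checked with a little counting. This sketch does not talk to mininet at all; it just models which ordered host pairs can still ping under each drop rule, and formats the summary the way pingall prints it. The function name and the reachability rules are my own simplification.

```python
from itertools import permutations


def pingall_summary(hosts, can_reach):
    """Count ordered host pairs that can ping, and format a mininet-style summary."""
    pairs = list(permutations(hosts, 2))
    received = sum(1 for src, dst in pairs if can_reach(src, dst))
    dropped_pct = 100.0 * (len(pairs) - received) / len(pairs)
    return "%d%% dropped (%d/%d received)" % (dropped_pct, received, len(pairs))


hosts = ["h%d" % i for i in range(1, 9)]
left = set(hosts[:4])

# Split network: pings succeed only when both hosts sit on the same side.
print(pingall_summary(hosts, lambda s, d: (s in left) == (d in left)))
# Black-holed h1: any ping that involves h1 fails, in either direction.
print(pingall_summary(hosts, lambda s, d: "h1" not in (s, d)))
```

Pings towards h1 fail as well because the echo replies from h1 are dropped by the same rule, which is why each of the other hosts shows a single X.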

OpenDaylight has always piqued my interest. I've been trying to follow the mailing lists and some of the discussions out there, and the Test VM is a nice way to start to get under the hood. I have a lot more to learn, and there are a ton of other plugins to start to explore, not to mention starting to think about the API and writing some code against it.


  1. If you do not set switch roles properly, end hosts may not show up on the topology.

  2. Flow rule names can not have spaces in them.

  3. The controller had the Access switches properly classified in the Tier however the transit switches were not set to either Distribution or Core.

IBM PURE systems networking

To start off I'll cut past some of the marketing and state that PURE Systems are IBM BladeCenters with some predefined hardware configurations that support both x86 and POWER work loads.

With that being said, the advantage of the PURE architecture is the software that IBM has assembled to orchestrate deployments of workloads across all of the integrated platforms. The orchestrator is named Flex System Manager (FSM). The FSM plugs into VMware for x86, HMC for Power systems and other management systems for virtualization platforms. The FSM uses these connections to automate the deployment of systems and the monitoring of the hardware, physical and virtual systems within the PURE System.

There are many details about the hardware I will not cover, but one of the details IBM discusses is the increased speeds and feeds. This is accomplished through the interconnections between the Nodes and the I/O Bays; each Node has multiple connections to the I/O Bays. The number of paths grows or shrinks with the number of licenses, or as IBM says, Pay as you Grow.

IBM Blade to IOM connectivity


Image copied from (http://www.redbooks.ibm.com/abstracts/tips0864.html)

The portfolio of IO Modules is similar to any BladeCenter you may have seen in the past, with options for network switches (BNT switches, some supporting OpenFlow 1.0), Fibre Channel switches and passthrough modules (all the options can be found here: http://www-03.ibm.com/systems/flex/networking/).

Where I see the need for great improvement is the POWER Series networking. POWER utilizes a Virtual IO Server (VIOS) to connect the LPARs to each other and the outside world. Essentially the VIOS is an AIX server that acts as a layer 2 bridge. The VIOS lacks the ability most network switches have had to do private VLAN configurations and layer 3 inspection. There is also currently no support for next-generation technologies such as OpenFlow, IBM DOVE, or VXLAN.

IBM PURE Block Diagram


This brings many complications in a multi-hypervisor environment. For example, to locate an IBM LPAR next to a VMware workload you will need to glue them together with VLANs and legacy networking. This will require networking teams to maintain network controls separately from how you may treat the rest of your virtualized workloads on the VMware platform.

Even though I have a bit of a beef with the VIOS, the PURE system is a good approach for IBM shops to consolidate their workloads into a single Private Cloud style configuration.

How does Riverbed Steelhead Auto Discovery work?

Riverbed Steelhead devices have a method they use to find each other on the network. Riverbed has named this Enhanced Auto Discovery. It is intended to reduce time to deployment and simplify the configuration on the devices. The core of this method is setting options in the TCP headers during the initial 3-way handshake. There are a few concepts to go over to fully understand the process of Steelhead Auto Discovery.

The Steelhead is a Layer 2 Bridge

Every Riverbed is a layer 2 bridge; for traffic to enter the optimization engine it must be bridged through the Steelhead. In the appliance there are 2 interfaces that make up the bridge, named the LAN and WAN interfaces. The LAN interface connects to the network where the client machines or server machines are located. The WAN interface connects to the external network where the router for the VPN, MPLS, P2P, 3G Radio, or whatever medium may be in use resides.

These interfaces together are called the IN PATH interface. In a Cisco device that is doing IRB style bridging, the IN PATH interface would be similar to a BVI interface. The IN PATH interface is required to have an IP address assigned to it; this is the IP address Steelheads use to communicate with each other. For example, the Inner Channel is negotiated between the IP addresses of IN PATH interfaces, which is one reason that IP reachability between the IN PATH IP addresses is important.

Clients and Servers

There are a couple of designations that help to clarify which devices initiate which connections. The Client Steelhead (CSH) is the Steelhead that receives the first Naked SYN from the client machine initiating the connection. A Server Steelhead (SSH) is a Steelhead that receives the SYN+ from a Client Steelhead and is the Steelhead closest to the destination server.

The Channels

Channels denote areas where specific devices communicate with each other, and  where traffic is optimized or not optimized. There are 2 separate Outer Channels and a single Inner Channel.


Outer Channel (Local) - This channel is between the Steelhead LAN interface and the client machine sourcing the Naked SYN. This may be a branch workstation, for example.

Outer Channel (Remote) - This is the channel between the Steelhead LAN interface and the server machine that the initial SYN was intended to be received by. This may be some sort of application or file server at a data center for example.

Inner Channel - This is the network between the WAN interfaces of the Steelheads. This is a TCP connection between the IN PATH IP addresses over which all optimized traffic between Steelheads passes. The default port used for this connection is TCP 7800.

The Process

Now on to the steps the Steelhead devices use to find each other. There are 7 main steps involved. In the following example you are initiating a TCP connection from a Client workstation to a Server to copy a file.



  1. The client machine sends a Naked SYN packet, which is a SYN without the PROBE TCP option set. This SYN packet is received on the LAN interface of the CSH and is intercepted by the CSH. At this stage the Steelhead checks licenses to make sure this traffic is entitled; if it is not, the traffic is just passed through unchanged. If all is well, the CSH will add TCP header Option number 76. This option is named AUTO-DISCOVERY PROBE and includes information such as the IN PATH IP address and what role this Steelhead is assuming. A packet with a TCP option is noted by a '+' in documentation, for example SYN+. After the header is set the packet is forwarded out the WAN interface.
  2. This SYN+ is received on the WAN interface by the Steelhead that will become the SSH (in this example). Again at this stage licenses on this Steelhead are checked to make sure this traffic is entitled; if not, the traffic is just passed through untouched. If the SYN does not have the option headers set it is also passed through untouched.
  3. Since this packet is a SYN+, the Steelhead then sends a SYN/ACK+ with the FWD Negotiation option back towards the Client machine. This SYN/ACK+ is then intercepted by the CSH on the return path.
  4. The Steelhead near the Server machine starts negotiating a 3-way handshake with the Server machine. If this Steelhead does not encounter another Steelhead between itself and the Server machine, it will assume the role of SSH and complete a 3-way handshake with the Server machine. If a Steelhead is encountered, this process repeats down the line towards the server.
  5. The newly declared SSH will then send a SYN/ACK+ with the PROBE RESPONSE option in its headers towards the CSH.
  6. The SYN/ACK+ is intercepted by the CSH, and the CSH initiates the Inner Channel between the CSH and the SSH.
  7. Once the Inner Channel is formed the CSH completes the 3-way handshake between itself and the Client machine.
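
To show what telling a Naked SYN from a SYN+ amounts to on the wire, here is a minimal sketch that walks the options area of a TCP header looking for option kind 76. It relies only on the standard TCP option encoding (a kind byte, a length byte, then data, with kinds 0 and 1 being single-byte End of List and No-Operation). The contents of the Riverbed probe itself are treated as opaque, and the sample bytes and function name are made up.

```python
PROBE_KIND = 76  # option number named in step 1


def find_tcp_option(options, kind):
    """Return the data bytes of the first TCP option of `kind`, or None."""
    i = 0
    while i < len(options):
        current = options[i]
        if current == 0:      # End of Option List
            break
        if current == 1:      # No-Operation, a single byte of padding
            i += 1
            continue
        if i + 1 >= len(options):
            break             # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break             # malformed length
        if current == kind:
            return bytes(options[i + 2:i + length])
        i += length
    return None


# MSS option, one NOP, then a made-up 4-byte payload under kind 76.
syn_plus = bytes([2, 4, 0x05, 0xB4, 1, 76, 6, 0xDE, 0xAD, 0xBE, 0xEF])
naked_syn = bytes([2, 4, 0x05, 0xB4, 1, 1, 0, 0])

print(find_tcp_option(syn_plus, PROBE_KIND))
print(find_tcp_option(naked_syn, PROBE_KIND))
```

The first call returns the payload bytes and the second returns None. That is the same distinction a Steelhead makes when deciding whether a SYN has already passed through a peer.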

Now that the above steps are completed, the traffic between the Client and Server machines is on its way and the Steelhead is transparently (to the client and server) doing its optimization magic. All of the traffic in this flow, after it has been optimized, is transmitted over the Inner Channel. If a new traffic flow needs to communicate between the Client and Server, the process starts again for that new flow, and this is the case for every TCP connection that occurs from site to site.

Where Trouble can Occur

There are a few basic items that can easily stop this process from occurring, therefore blocking optimization.

  • Something, such as a firewall or IDS, strips the Option headers out of TCP packets.
  • The IN PATH interfaces do not have Layer 3 reachability between each other. This will prevent the Inner Channel from forming.
  • The traffic does not flow from the LAN to WAN and WAN to LAN interfaces on both the SSH and CSH.
  • The boxes are improperly licensed or the amount of traffic/TCP sessions is exceeding the license.
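
For the reachability item, a quick first check is whether a plain TCP connection to the peer's IN PATH address on the Inner Channel port can be opened at all. This sketch is only a rough test from whatever machine runs it, not from the Steelhead itself. The function name is my own and the address in the usage line is a documentation placeholder.

```python
import socket


def tcp_port_reachable(host, port=7800, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 192.0.2.10 is a documentation address standing in for a peer's IN PATH IP.
# print(tcp_port_reachable("192.0.2.10"))
```

A False result only says the connection could not be opened from where the check ran. The path between the two IN PATH interfaces still has to be verified from the appliances.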

Overall this is a pretty simple process and not too different from how other vendors such as Cisco WAAS handle things, even though they use technologies such as WCCPv2. There are other scenarios, such as Virtual In Path and Server Side Out of Path, that are a bit different, but this is the Riverbed recommended way of doing things.

8static 34 – April 2013 Photos

So I have some catching up to do, so here are some photos from April!

8static 34
April 13th, 2013 7:00pm

Br1ght Pr1mate (BOS)
Note! (NYC)
Dauragon (DC)
Environmental Sound Collapse (CHI)

Environmental Sound Collapse (CHI)

Animal Style on soldering for modding / circuit bending


8static - Dauragon
8static - Br1ght Pr1mates
Br1ght Pr1mates
8static - Note!


Cisco Live 2013 – My (late) wrap up.

This past month I attended Cisco Live in Orlando, FL with 20,000(?) of my fellow Network/Collaboration/Service Provider/Data Center engineers from all around the world. This was my first time attending, and I had a blast! There are a few themes that were big topics at Cisco Live that I won't talk much about in this post. One is the Internet of Everything (IoE), as that is well covered and, well, is really just Market-ecture-tastic. Another is new gear like the Catalyst 6800 or Nexus 7700 and new ASICs, all of which are neat and powerful and will enable a lot of the future technologies, but Better, Faster, Stronger hardware comes all the time. In the end, SDN/Network Virtualization was for me the most discussed topic across all of the Network-centric sessions and "hallway" conversations throughout the entire week.

I was able to attend many great sessions, but even with a packed schedule I still wanted to be in two places at once most of the time. There were a few standout sessions including "BRKRST-3114 The Art of Network Architecture" and "BRKRST-3045 LISP - A Next Generation Networking Architecture". "The Art of Network Architecture" was a very business-forward discussion of network architecture, and I believe it attempted to change the discussion around designing a network. "LISP - A Next Generation Networking Architecture", on the other hand, got me excited about LISP in a way that I had not been before. All the previous information I had read about LISP left me wanting a tangible use case. This presentation at CLUS started to describe some good use cases for LISP, though I am still left wanting more widespread production implementations.

CLUS Tweetup

Another great event I attended was the Tweetup organized by Tom Hollingsworth. I met a lot of people I follow on Twitter there, and it was nice to put a face with a Twitter handle and have some good conversations about networking and well just about anything else.


When listening to the discussions and presentations a few trends and themes struck me. First, there is a trend towards the flat network; when I look at the fabric technologies or the affinity networking coming out of Plexxi or potentially Insieme, this all puts a large exclamation point at the end of the need to move to IPv6 or at least implement dual stack sooner rather than later. It will be key to the success of these technologies in the data center. Next, there was a constant argument going on about the death of the CLI and that the GUI will reign supreme. I believe both the CLI users and the GUI users can be accommodated; both types of interfaces can be used to manipulate some back-end software and logic. An example of this is tail-f NCS, which has both. It is not "SDN" by some definitions, but it is an example of the 2 UIs co-existing. The real argument that needs to be had concerns the design of the systems needed to support the applications.

This one is more of a rant and less of a theme, but I still think Cisco is missing the mark with the ASA 1000v. I think virtualized physical appliances are a transitional technology, but a needed one. Creating the ASA 1000v without the full set of features of its physical counterpart, and without a roadmap (as far as I can tell) to add them, along with the insane licensing scheme of a per-socket protected model, does not make sense to me. This is all shortchanging the IaaS provider market, and IMHO it should be licensed and operated similarly to the CSR 1000v: full features and per-appliance licensing.

Overall, I was left with two general questions from the week. First, I'm curious how the balance of systemic complexity vs configuration complexity vs structure complexity will fall as the overlay, the "underlay" and the SDN glue that holds it all together settle into place. Each new technology that is introduced seems to address one of these complexity problems but not all three in one fell swoop, but this is a larger topic for another post. Second is a recurring theme in technology: everything old is new again. I look at the data center technologies and some of the new IP routing technologies (LISP), and they look a lot like old telephony switching technologies, in the same way VDI looks like mainframe dumb terminals. This is not a critique, just an observation on how important it is to know your past, because it will come back Better, Faster, Stronger, or maybe just the same with a new box around it.