Hashcat in AWS EC2

Intro

During my OSCP studies, I realized I needed a more efficient system for cracking password hashes; the screaming fans and high CPU usage had become a problem. I first tried using hashcat with the GPU on my MacBook Pro in OS X, but bugs and problems with hashcat on OS X would make it crash in the middle of cracking a hash. I was also not interested in investing in a server with a bunch of GPUs; the high cost would outweigh the amount of time I need the system. All of this led me to do a little research, and I found the instructions in the following link for building an AWS instance for password cracking.

https://medium.com/@iraklis/running-hashcat-v4-0-0-in-amazons-aws-new-p3-16xlarge-instance-e8fab4541e9b

Since that post was created there have been some changes to the offerings in AWS EC2, leading me to write this post.

If you wish to skip ahead, I have created scripts to automate the processes in the rest of this post. They are both in my GitHub and can be downloaded at the following links.

https://github.com/suidroot/AWSScripts/blob/master/aws-ec2-create-kracker.sh
https://github.com/suidroot/AWSScripts/blob/master/configure-kracker.sh

For the rest of the article I will cover some of the instance options in EC2, installation of the needed Linux packages, the basic setup of Hashcat, running Hashcat, and finally monitoring and benchmarks of an EC2 instance.

AWS EC2 Options

There are many options for EC2 instances, with a huge range in cost and scale.

I found the g3 instances to be the most cost-effective tier. For my testing I opted to use the g3.4xlarge tier. Next, choose an AMI image with the appropriate operating system.

AMI images

There are two options I tested hashcat on; they are both Ubuntu based. I'm sure there are many other available options that will work too, but I am familiar with Ubuntu systems. The first option is a standard Ubuntu image; there is nothing special about this image, and it requires configuration to add the GPU drivers and a little more work.

Standard Ubuntu

The next option is a Deep Learning image. This image is preconfigured with the GPU drivers and was originally designed for machine learning applications. I found the pre-configuration allowed me to skip a few steps in building out a new system.

Deep learning Ubuntu GPU driver preloaded
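If you prefer the CLI to the console, the instance can also be launched from the AWS CLI, which is roughly what my create script linked above automates. A minimal sketch; the AMI ID, key pair name, and security group ID are placeholders you will need to fill in for your own region and account:

aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type g3.4xlarge \
    --key-name my-keypair \
    --security-group-ids sg-0123456789abcdef0 \
    --count 1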

Instance Build and config

Once you have the instance deployed there are a few steps to get the instance prepared for hashcat; the steps are a little bit different between a Standard and a Deep Learning Ubuntu instance.

Note: an apt cron job may already be running when the instance first boots, and you will have to wait it out before package installs will succeed.
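A quick way to check whether that automatic apt activity is still running before kicking off the scripts below:

# look for unattended apt/dpkg processes holding the package lock
ps aux | grep -i apt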

Prepare Machine (Standard Ubuntu)

This script will install all the required packages and the Nvidia GPU drivers on a vanilla Ubuntu installation.

#!/bin/bash

# mostly copied from: https://medium.com/@iraklis/running-hashcat-v4-0-0-in-amazons-aws-new-p3-16xlarge-instance-e8fab4541e9b
#
sudo apt-get update -yq
sudo apt-get install -yq build-essential linux-headers-$(uname -r) unzip p7zip-full linux-image-extra-virtual
sudo apt-get install -yq ocl-icd-libopencl1 opencl-headers clinfo
#sudo apt-get install -yq libhwloc-plugins libhwloc5 libltdl7 libpciaccess0 libpocl2 libpocl2-common ocl-icd-libopencl1 pocl-opencl-icd
sudo apt-get install -yq python3-pip 
pip3 install psutil

sudo touch /etc/modprobe.d/blacklist-nouveau.conf
sudo bash -c "echo 'blacklist nouveau' >> /etc/modprobe.d/blacklist-nouveau.conf"
sudo bash -c "echo 'blacklist lbm-nouveau' >> /etc/modprobe.d/blacklist-nouveau.conf"
sudo bash -c "echo 'options nouveau modeset=0' >> /etc/modprobe.d/blacklist-nouveau.conf"
sudo bash -c "echo 'alias nouveau off' >> /etc/modprobe.d/blacklist-nouveau.conf"
sudo bash -c "echo 'alias lbm-nouveau off' >> /etc/modprobe.d/blacklist-nouveau.conf"

sudo touch /etc/modprobe.d/nouveau-kms.conf
sudo bash -c "echo 'options nouveau modeset=0' >>  /etc/modprobe.d/nouveau-kms.conf"
sudo update-initramfs -u
sudo reboot

### Install nVidia Drivers
### (the reboot above terminates this script; run the two commands
### below manually after reconnecting to the instance)
wget http://us.download.nvidia.com/tesla/410.104/NVIDIA-Linux-x86_64-410.104.run
sudo /bin/bash NVIDIA-Linux-x86_64-410.104.run --ui=none --no-questions --silent -X

Prepare Machine (Deep Learning Ubuntu)

Compared to the previous script, the script to prepare the Deep Learning instance is much simpler. The main focus is installing the needed archive extraction tools.

#!/bin/bash

sudo apt update -yq
sudo apt upgrade -yq
sudo apt install -yq clinfo unzip p7zip-full
sudo apt install -yq build-essential linux-headers-$(uname -r) # Optional
sudo apt-get install -yq python3-pip
pip3 install psutil

Hashcat Setup

Now we need to download and extract the star of the show: hashcat. The link in the wget below points to the most recent version as of writing; however, you might want to check whether there is a newer version at the main site: https://hashcat.net/hashcat/

wget https://hashcat.net/files/hashcat-5.1.0.7z
7z x hashcat-5.1.0.7z
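Once the drivers are in place, a quick way to confirm hashcat can see the GPU is to list the detected OpenCL platforms and devices:

./hashcat-5.1.0/hashcat64.bin -I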

Download wordlists

You will need some wordlists for hashcat to use to crack passwords. The commands listed below fetch some wordlists I like to use when cracking; you should add whichever lists are your favorites.

mkdir ~/wordlists
git clone https://github.com/danielmiessler/SecLists.git ~/wordlists/seclists
wget -nH http://downloads.skullsecurity.org/passwords/rockyou.txt.bz2 -O ~/wordlists/rockyou.txt.bz2
cd ~/wordlists
bunzip2 ./rockyou.txt.bz2
cd ~

Running hashcat

Now it is time to run hashcat and crack some passwords. When running hashcat I had the best performance with the arguments -O -w 3. Below is an example command line I've used, including a rules file.

./hashcat-5.1.0/hashcat64.bin --username -m 1800 ./megashadow256.txt wordlists/rockyou.txt -r hashcat-5.1.0/rules/best64.rule -O -w 3
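Long-running jobs can be interrupted, or you may want to stop the instance to save money partway through. hashcat supports named sessions that can be resumed later; a sketch of the same job using an example session name:

# start the job under a named session
./hashcat-5.1.0/hashcat64.bin --session myjob --username -m 1800 ./megashadow256.txt wordlists/rockyou.txt -r hashcat-5.1.0/rules/best64.rule -O -w 3

# resume it later after an interruption
./hashcat-5.1.0/hashcat64.bin --session myjob --restore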

Monitoring the Nvidia GPU

The nvidia-smi utility can be used to show the GPU processor usage and which processes are utilizing the GPU(s). The first example shows an idle GPU.

ubuntu@ip-172-31-17-6:~$ sudo nvidia-smi
Fri Apr 26 14:43:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   37C    P0    42W / 150W |      0MiB /  7618MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This example shows a GPU being used by hashcat.

ubuntu@ip-172-31-17-6:~$ sudo nvidia-smi
Fri Apr 26 14:44:44 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   46C    P0   141W / 150W |    828MiB /  7618MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     11739      C   ./hashcat-5.1.0/hashcat64.bin                817MiB |
+-----------------------------------------------------------------------------+
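To monitor utilization continuously while a job is running, you can wrap nvidia-smi in watch; this refreshes the output every two seconds:

watch -n 2 nvidia-smi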

Conclusion and Benchmarks

Finally, here is a benchmark I ran on a g3.4xlarge instance. This instance type contains one GPU. These results give an idea of the performance of this AWS EC2 instance type.

ubuntu@ip-172-31-17-6:~$ ./hashcat-5.1.0/hashcat64.bin -O -w 3 -b
hashcat (v5.1.0) starting in benchmark mode...

* Device #2: Not a native Intel OpenCL runtime. Expect massive speed loss.
             You can use --force to override, but do not report related errors.
nvmlDeviceGetFanSpeed(): Not Supported

OpenCL Platform #1: NVIDIA Corporation
======================================
* Device #1: Tesla M60, 1904/7618 MB allocatable, 16MCU

OpenCL Platform #2: The pocl project
====================================
* Device #2: pthread-Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, skipped.

Benchmark relevant options:
===========================
* --optimized-kernel-enable
* --workload-profile=3

Hashmode: 0 - MD5

Speed.#1.........: 11611.6 MH/s (90.74ms) @ Accel:512 Loops:512 Thr:256 Vec:4

Hashmode: 100 - SHA1

Speed.#1.........:  4050.2 MH/s (65.01ms) @ Accel:512 Loops:128 Thr:256 Vec:2

Hashmode: 1400 - SHA2-256

Speed.#1.........:  1444.5 MH/s (91.98ms) @ Accel:256 Loops:128 Thr:256 Vec:1

Hashmode: 1700 - SHA2-512

Speed.#1.........:   499.4 MH/s (66.78ms) @ Accel:128 Loops:64 Thr:256 Vec:1

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4096)

Speed.#1.........:   189.8 kH/s (42.76ms) @ Accel:128 Loops:64 Thr:256 Vec:1

Hashmode: 1000 - NTLM

Speed.#1.........: 18678.1 MH/s (56.58ms) @ Accel:512 Loops:512 Thr:256 Vec:2

Hashmode: 3000 - LM

Speed.#1.........: 10529.6 MH/s (50.60ms) @ Accel:128 Loops:1024 Thr:256 Vec:1

Hashmode: 5500 - NetNTLMv1 / NetNTLMv1+ESS

Speed.#1.........: 10650.8 MH/s (49.60ms) @ Accel:512 Loops:256 Thr:256 Vec:1

Hashmode: 5600 - NetNTLMv2

Speed.#1.........:   829.3 MH/s (80.24ms) @ Accel:256 Loops:64 Thr:256 Vec:1

Hashmode: 1500 - descrypt, DES (Unix), Traditional DES

Speed.#1.........:   442.0 MH/s (37.81ms) @ Accel:4 Loops:1024 Thr:256 Vec:1

Hashmode: 500 - md5crypt, MD5 (Unix), Cisco-IOS $1$ (MD5) (Iterations: 1000)

Speed.#1.........:  4209.1 kH/s (51.39ms) @ Accel:1024 Loops:500 Thr:32 Vec:1

Hashmode: 3200 - bcrypt $2*$, Blowfish (Unix) (Iterations: 32)

Speed.#1.........:     7572 H/s (33.02ms) @ Accel:16 Loops:4 Thr:8 Vec:1

Hashmode: 1800 - sha512crypt $6$, SHA512 (Unix) (Iterations: 5000)

Speed.#1.........:    76958 H/s (83.99ms) @ Accel:512 Loops:128 Thr:32 Vec:1

Hashmode: 7500 - Kerberos 5 AS-REQ Pre-Auth etype 23

Speed.#1.........:   149.4 MH/s (56.00ms) @ Accel:128 Loops:64 Thr:64 Vec:1

Hashmode: 13100 - Kerberos 5 TGS-REP etype 23

Speed.#1.........:   152.1 MH/s (55.00ms) @ Accel:128 Loops:64 Thr:64 Vec:1

Hashmode: 15300 - DPAPI masterkey file v1 (Iterations: 23999)

Speed.#1.........:    32703 H/s (84.02ms) @ Accel:256 Loops:64 Thr:256 Vec:1

Hashmode: 15900 - DPAPI masterkey file v2 (Iterations: 7999)

Speed.#1.........:    21692 H/s (96.24ms) @ Accel:256 Loops:128 Thr:32 Vec:1

Hashmode: 7100 - macOS v10.8+ (PBKDF2-SHA512) (Iterations: 35000)

Speed.#1.........:     5940 H/s (40.09ms) @ Accel:64 Loops:32 Thr:256 Vec:1

Hashmode: 11600 - 7-Zip (Iterations: 524288)

Speed.#1.........:     4522 H/s (55.87ms) @ Accel:256 Loops:128 Thr:256 Vec:1

Hashmode: 12500 - RAR3-hp (Iterations: 262144)

Speed.#1.........:    18001 H/s (56.74ms) @ Accel:4 Loops:16384 Thr:256 Vec:1

Hashmode: 13000 - RAR5 (Iterations: 32767)

Speed.#1.........:    18135 H/s (55.93ms) @ Accel:128 Loops:64 Thr:256 Vec:1

Hashmode: 6211 - TrueCrypt PBKDF2-HMAC-RIPEMD160 + XTS 512 bit (Iterations: 2000)

Speed.#1.........:   121.7 kH/s (59.39ms) @ Accel:128 Loops:32 Thr:256 Vec:1

Hashmode: 13400 - KeePass 1 (AES/Twofish) and KeePass 2 (AES) (Iterations: 6000)

Speed.#1.........:    68380 H/s (158.89ms) @ Accel:512 Loops:256 Thr:32 Vec:1

Hashmode: 6800 - LastPass + LastPass sniffed (Iterations: 500)

Speed.#1.........:  1088.7 kH/s (48.51ms) @ Accel:128 Loops:62 Thr:256 Vec:1

Hashmode: 11300 - Bitcoin/Litecoin wallet.dat (Iterations: 199999)

Speed.#1.........:     2107 H/s (78.97ms) @ Accel:128 Loops:64 Thr:256 Vec:1

Started: Fri Apr 26 14:36:56 2019
Stopped: Fri Apr 26 14:42:03 2019

If you've made it this far, congratulations and happy cracking!

How to set up a Meraki API Test environment

I needed to set up a Meraki API key to test a Meraki API that was in beta. This is the process I used to get started with some of the basics of the Meraki API and get a test environment up and running. There are lots of great references covering the basics of REST APIs, like the REST API Tutorial; those resources will do a much better job than I can of explaining REST APIs. However, I found there was a lack of guides for the initial steps of building the data you need to get started with the Meraki API.

Note: The screenshots are from late 2018 and may have changed over time.

API Key Generation

First things first, you will need to log in to the Meraki Dashboard. Once there, navigate using the menu on the left to Organization -> Settings.

On the settings screen, scroll down to Dashboard API Access, check "Enable access to the Meraki Dashboard API", and click Save at the bottom. Once general access is enabled, click the "profile" link to go to the screen where you generate an API key to use when making REST API calls.

On the API access screen, click the "Generate new API Key" button. If the button is not there: I found that my account could only have a maximum of two API keys generated at any point in time; once I deleted one key, the button came back.

After clicking the button, a dialogue similar to this appears showing your new key. The key is only shown once, so make a note of it; you will use it to authenticate your API calls.

Now that you have a key what to do with it?

Meraki has an extensive API with many calls, and you will want a tool to start testing some of them. A good utility to start with is Postman, which allows you to make REST API calls using a convenient GUI. I won't go into complete detail on how to use Postman, but I will cover some highlights of getting it set up to test some Meraki API calls.

A useful feature of Postman is the ability to import collections of API calls. The collection of Meraki Dashboard calls is at https://create.meraki.io/postman. Once there, click "Run in Postman" in the upper right and it will ask to open the Postman client. Once you import the collection, there are some variables you will need to discover and fill in:

  • X-Cisco-Meraki-API-Key
  • organizationId
  • networkId
  • baseUrl

To set these variables you will need to edit the newly imported Postman collection; you can right-click on the collection and select "Edit."

Then select the Variables tab. I have populated these variables already in the screenshot; you will need to type them in.

Now you may ask: where do I find the values for these variables? I'll cover the calls that are made to collect the values you need in the next few sections.

Meraki API URL (baseUrl) and API Key

baseUrl
The first variable to set is baseUrl; this is the URL that Postman will use to send REST API calls. In general, for testing you can use the URL:

https://dashboard.meraki.com/api/v0

This will work for testing and non-production use. Once you go to production, you will want to point to the specific shard you are hosted on, such as:

https://n466.meraki.com/api/v0

X-Cisco-Meraki-API-Key
We also need to set the API key we generated earlier. This is stored in the X-Cisco-Meraki-API-Key variable, which sets the header of the same name in REST calls and is used to authenticate them.

With these two variables set you can start to discover the organizationId and networkIds.

Finding the "organizationId"

To find the organizationId, in Postman navigate to "Organizations -> List organizations this user has access to" in the sidebar on the left.

The query in Postman looks like:

The full REST URL to retrieve the Meraki Organizations you have access to is: https://dashboard.meraki.com/api/v0/organizations
The data returned shows the organizations you have access to; the "id" number is the field used to select the organization you wish to query.

[
  {
    "id": 1234567,
    "name": "Organization name"
  }
]
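If you want to sanity check the key outside of Postman, the same call can be made with curl. A minimal sketch; YOUR-API-KEY-HERE is a placeholder for your own key, and -L follows the redirect to the shard hosting your organization:

curl -s -L -H "X-Cisco-Meraki-API-Key: YOUR-API-KEY-HERE" \
    https://dashboard.meraki.com/api/v0/organizations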

Finding the "networkIds"

Many calls I have worked with use either the organizationId or a networkId. In most organizations there are multiple networks, and each network is identified by a networkId.

The full REST URL to retrieve the Meraki Networks in an Organization you have access to is: https://dashboard.meraki.com/api/v0/organizations/1234567/networks

The output below lists all of the networks in the Organization; the field labeled "id" is what you will use to query data for a specific network.

[
  {
    "id": "L_234567890",
    "organizationId": "1234567",
    "name": "Test Network",
    "timeZone": "US/Eastern",
    "tags": null,
    "type": "combined",
    "disableMyMerakiCom": false
  },
  {
    "id": "N_678901234",
    "organizationId": "1234567",
    "name": "Systems Manager",
    "timeZone": "America/Detroit",
    "tags": null,
    "type": "systems manager"
  }
]
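The same call from curl, reusing the example organization id from above; piping through python3 -m json.tool pretty-prints the response:

curl -s -L -H "X-Cisco-Meraki-API-Key: YOUR-API-KEY-HERE" \
    https://dashboard.meraki.com/api/v0/organizations/1234567/networks | python3 -m json.tool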

Done?

Now there is a test environment to play in and learn how the various API calls work and what data can be collected, set, or deleted. Postman is just the start for experimentation; for further documentation and examples, see:

https://create.meraki.io/

March 2019 NX-OS Vulnerability Dump

On March 6th, Cisco released 29 high and medium rated PSIRT notices for NX-OS based platforms. These platforms include the Cisco Nexus 3000 through 9000 series and the Nexus-adjacent FXOS and UCS Fabric Interconnect platforms. Not all advisories affect all platforms, but every platform is affected by at least one high rated vulnerability. The vulnerabilities range from command and code execution, privilege escalation, and denial of service to arbitrary file reads. This is just about everything bad that could affect core infrastructure devices.

If you haven't updated your switch in a while, this is probably the time to do so. Within some of the advisories Cisco notes that they are providing free updates:

Cisco has released free software updates that address the vulnerability described in this advisory. Customers may only install and expect support for software versions and feature sets for which they have purchased a license.

I've included a table of the fixed versions noted as of the writing of this post. I would recommend looking at the advisories to assist in selecting the best version, as there are other code versions that have integrated the fixes.

Platform                                              Fixed Version
Nexus 1000v                                           5.2(1)SM3(2.1) (Hyper-V); 5.2(1)SV3(4.1a) (VMWare)
Nexus 3000, 3500, and 3600                            9.2(2)
Nexus 5500, 5600, and 6000; Nexus 7000 and 7700       8.3(3)
Nexus 9000 and 9500                                   9.2(2)
UCS 6200, 6300, and 6400 Series Fabric Interconnects  4.0(2a)

Cisco has a bundled advisory for all of the high rated notices at the following link:

Cisco Event Response: March 2019 Cisco FXOS and NX-OS Software Security Advisory Bundled Publication

I have also included a laundry list of notices, covering both high and medium rated vulnerabilities, for your reference.

Happy patching!

Small Projects: Temperature, Humidity and Light Sensor

This post is some free-ish-form notes about a project that is either a work in progress or complete.

Description

This project is a small sensor to monitor temperature, humidity, and light levels. The project may end up in a toy Star Trek TNG tricorder case at some point in the future, but I wanted to document where it is at today. Originally I used an Adafruit Huzzah (ESP12) board, but after I determined I wasn't going to use the WiFi, I switched to the Adafruit Adalogger board. This board has many more analog channels, which could be of use for other sensors in the future.

The choice of resistors and the MINLIGHT and MAXLIGHT values will vary depending on the board in use.

Parts

Code

https://github.com/suidroot/arduino/blob/master/TempHumLumSensor.ino

Photo

This photo is of my current proof of concept (PoC) of this project on a breadboard.

Vyatta 5400 and interface inbound discards

Recently I was investigating alerts being generated for inbound interface discards on multiple interfaces across multiple Vyatta 5400 devices. There were no noticeable performance issues with traffic passing through the devices. The discards were reported in SNMP, show interface ethernet ethX, and ifconfig outputs. An example show interface ethernet ethX output I was reviewing is below.

vyatta@FW01:~$ sh int ethernet eth0
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
link/ether 00:50:56:x:x:x brd ff:ff:ff:ff:ff:ff
inet 172.x.x.x/24 brd 172.x.x.x scope global eth0
inet6 fe80::250:56ff:x:x/64 scope link
valid_lft forever preferred_lft forever
Last clear: Wed Oct 29 10:55:13 GMT 2014
Description: MGMT
RX: bytes packets errors dropped overrun mcast
   242863    3664      0     163       0     0
TX: bytes packets errors dropped carrier collisions
   128065     701      0       0       0          0

I was not finding any other statistics that matched up with the quantity of discards being reported. Here are a few of the commands I used to look for matching discard counters.

vyatta@FW01:~$ sh int ethernet eth0 queue
vyatta@FW01:~$ sh int ethernet eth0 statistics
vyatta@FW01:~$ sh queueing
vyatta@FW01:~$ sudo netstat -s

While researching where to go next, I was reminded that the Vyatta 5400 is, at its heart, a Linux server. I found a few references indicating that beginning with Linux kernel version 2.6.37 more error conditions were added to this counter in the kernel.

The rx_dropped counter shows statistics for dropped frames because of the following (beginning with kernel 2.6.37; see http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=caf586e5f23cebb2a68cbaf288d59dbbf2d74052):

  • Softnet backlog full (measured from /proc/net/softnet_stat)
  • Bad / unintended VLAN tags
  • Unknown / unregistered protocols
  • IPv6 frames when the server is not configured for IPv6

If any frames meet those conditions, they are dropped before the protocol stack and the rx_dropped counter is incremented.

via http://www.novell.com/support/kb/doc.php?id=7007165

When I took a look, I found that the version of Vyatta code in use contained Linux kernel version 3.3.8. The only way to verify whether these conditions were causing the counter to increment would be to put the interface into promiscuous mode. Since this was a production system, I instead looked at neighboring Linux systems in the same subnet and found they did not report the same level of discards. It appears I found the reason behind this counter incrementing. The issue looked more urgent than it was because we measure this counter as a percentage of packets discarded, and this interface does not have much traffic flowing through it; that made the percentages very high even though the discarded frames were not production impacting. This issue was a reminder that it is good to remember the underlying operating system, even when it is masked by a custom CLI.
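Of the conditions quoted above, the softnet backlog is one you can check directly without promiscuous mode. Each row of this file is one CPU, and the second hexadecimal column is the count of packets dropped because the backlog queue was full:

cat /proc/net/softnet_stat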

A meditation on the interface discard counter

I find the interface discard counter deceptively complex. When you ask people what the counter means, the usual answer is that you are overrunning the throughput capability of an interface, which matches pretty closely the definition in the IF-MIB SNMP MIB.

The number of inbound packets which were chosen
to be discarded even though no errors had been
detected to prevent their being deliverable to a
higher-layer protocol.  One possible reason for
discarding such a packet could be to free up
buffer space.

ifInDiscards : https://www.ietf.org/rfc/rfc1213.txt
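For reference, this is the counter you would poll to watch discards over SNMP. A minimal sketch, assuming SNMP v2c, a community string of public, and a placeholder device address:

snmpwalk -v2c -c public 192.0.2.1 IF-MIB::ifInDiscards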

The description from the MIB is often the cause of this counter incrementing; however, as devices get more powerful and circuits keep increasing in size, that description is becoming less applicable. Many other issues have been lumped into this counter, all of them vendor, platform, and configuration dependent. Some examples I have found are:

  • ASA Dispatch Unit CPU over utilization
  • ASA ACL/ASP packet drops
  • QoS on an IOS interface can cause an elevated (purposeful) number of frames dropped
  • An ASIC shared between ports on a switch is being over utilized
  • L2/L3 packet handling on some Linux kernels and some virtual network platforms

Looking at this list, the interface discard counter starts to look more like a check engine light for a device or interface. As with a check engine light, it is important to understand all of the data your devices are presenting and to build good baselines of the statistics for your systems. Ethan Banks has some good thoughts on data baselines in a post titled The Importance of Knowing Baselines.

Daily work log

Like a lot of people, I regularly have a problem at the end of a workday, or even a workweek, answering the question "What did I do?", let alone "What did I accomplish?" To find an answer to these questions I have started to keep a daily journal using both an automated report and a manual entry. Between these two entries I tend to have a good idea of what occurred during my workday and work week. The idea to keep a journal was inspired in part by a post over at Productivityist: Taking Journaling to Another Level. It is useful to note that I am a Mac user, so all of the tools I use and have strung together to generate my automated report are Mac specific. That being said, I have found this practice to be very helpful in keeping track of what I am or am not accomplishing, and I use it in concert with my daily and weekly review.

I started this project by writing an AppleScript that glues together all of the various apps I use (Mail.app, OmniFocus, Lync, and Apple Calendar) and generates an automated report of the email I sent, the tasks I completed, my scheduled meetings, and the people I interacted with over IM. There were some limitations on what I could actually pull out of the various apps, but in the end this is a high-level list; if I need more, I can go to the app that has the information. I kick off this script when my workday comes to an end, and it collects all the data and stuffs it into a new entry for the day in Day One.

After using the script for a while, I would still have a lot of those "oh yeah, I did XYZ that day" moments when doing my daily and weekly reviews. The items I was not capturing were things such as phone calls, general thoughts, gripes, or half-done tasks that may not show up in any of the apps I collect data from. This eventually led me to keep a manual daily journal entry in Day One alongside my scripted entry. These entries may be something as simple as a few bullet points I add over the course of the day, or something more detailed alongside the basic thoughts.

Reviewing these two entries feeds into my own daily and weekly review of my to-do lists to figure out what needs to be put on the list for tomorrow or a later day. As a by-product, this also simplifies the weekly status report I write for my manager. These reports also give me a view into the items that have been touched across the many projects I have running concurrently, and the random tasks that come in through all of my various inboxes.

I recommend this practice to anyone; it can help you feel more accomplished, remember those ideas you had that you couldn't act on, and vent about frustrations. I also find it helps me see what I have accomplished even when my to-do list does not get smaller or items did not get completed when I planned. You can download a copy of my daily script from my GitHub at:

https://github.com/suidroot/workprojects/blob/master/Daily%20Report%20to%20DayOne.scpt
I implore you to not laugh too hard; this script has been hacked together with a lot of twine, glue, and duct tape, and it was my first attempt at AppleScript.

Footnote: This post was inspired in part by an Engadget post about DayOne.

Nexus 7000 and a systematic Bug

I have been thinking about an old issue that a customer encountered with a pair of Nexus 7000 switches about a year and a half ago. When the issue first came onto my radar it was in a bad place: the customer had Nexus 2000 Fabric Extenders that would go offline, and eventually the Nexus 7000 would go offline, causing some single-homed devices to become unreachable and, in the process, broader reachability issues. This occurred intermittently, which always complicates data collection. After working with TAC and finally collecting all of the information, the summary of the multiple causes came down to these six items.

1. Fabric Extender links become error disabled due to CSCtz01813.
2. The Nexus is reloaded to recover from the Fabric Extender issue.
3. After the reload, ports get stuck in the INIT state due to CSCty8102 and fail to come online.
4. The Peer Link, Peer Keep-Alive, and VPCs fail to come online since ports are err-disabled from a sequence timeout.
5. VPC goes into a split-brain state, causing SVIs to go into shutdown mode.
6. Network connectivity is lost until the module is reloaded and ports are brought online.

In summary, two bugs would get triggered at random times, causing a firestorm of confusing outages. The two temporary workarounds to mitigate the problem before we could upgrade the code on the switches were to:

  1. Move the VPC keep-alive link to the Admin port on the Supervisor.
  2. Use an EEM script to reset a register when a module comes online.

When thinking about what occurred, it is important to remember that the Nexus 7000 platform consists of many line cards that each contain an independent "brain" (the Forwarding Engine(s) and supporting systems on the line card), connected and orchestrated by the Supervisor module. That statement is a bit of a simplification, but I find it emblematic of some of the design challenges you can encounter on the Nexus 7000 platform; for example, there are many limitations with Layer 3 routing features and VPC. In the example above, it could be said that this sort of complexity causes safety features, such as those built into VPC, to do more harm than good when they encounter an unplanned failure scenario. This is different from the Catalyst platform, where (for the most part) everything is processed through a central processor.

Overall, the Nexus 7000 system design allows for tightly coupled interactions between the modules and supervisors, and more loosely coupled interactions between chassis. These interactions allow for the high speed and throughput the platform can deliver, but they add to the complexity of troubleshooting and of designs. In the end, what makes this issue so interesting to me, and why I keep mentally revisiting it, is that it is an example of a system failure: every single cause, had it occurred individually, would not have been greatly problematic, but their interactions together made the observed issue many times worse.

Some great Nexus 7000 references

My interest in the academics of systems

Lately I've been very interested in the academic side of computers. Complex systems, theoretical computing, and control theory are three of my focuses right now. This has come about because I'm getting more interested in how systems work and how to measure them, more than how to implement them; my career has been very focused on implementation rather than on how systems work and can be measured. I've never had any sort of formal Computer Science education, making a lot of this new territory for me. As I dive deeper into these topics, I realize how much math I have forgotten over the years. These topics are some of my reasons to refresh my math skills; math skills also help in analyzing sampled data such as monitoring data. A great video discussing data analysis is by Noah Kantrowitz at Monitorama PDX 2014.

Monitorama PDX 2014 - Noah Kantrowitz from Monitorama on Vimeo.

Some of the topics I'm learning about are much broader than others. The definitions of these fields of study, as given by their Wikipedia articles, are as follows:

Control theory is an interdisciplinary branch of engineering and mathematics that deals with the behavior of dynamical systems with inputs, and how their behavior is modified by feedback.
Wikipedia: Control Theory

The field of theoretical computer science is interpreted broadly so as to include algorithms, data structures, computational complexity theory, distributed computation, parallel computation, VLSI, machine learning, computational biology, computational geometry, information theory, cryptography, quantum computation, computational number theory and algebra, program semantics and verification, automata theory, and the study of randomness. Work in this field is often distinguished by its emphasis on mathematical technique and rigor.
Wikipedia: Theoretical Computer Science

Complex systems present problems both in mathematical modelling and philosophical foundations. The study of complex systems represents a new approach to science that investigates how relationships between parts give rise to the collective behaviors of a system and how the system interacts and forms relationships with its environment.
Wikipedia: Complex Systems

I feel all of these topics are important as products become either much simpler and centrally controlled, or incredibly complex in their interactions. Algorithms, controls, and data are becoming more and more important to understand.

My First OpenDaylight

Over the last few days I've started to play with the OpenDaylight Test VM Image. This image was easy to get up and running and provides a playground with mininet and a pre-baked OpenDaylight (ODL) controller. After deploying the OVA file in VirtualBox and poking around the file system, I got down to "business" getting a test topology in place. I made some changes to the initial mininet startup configuration to make the topology more complex, changing the startup command to look like the following:

sudo mn --controller 'remote,ip=127.0.0.1,port=6633' --topo tree,3

This yielded a topology with 8 hosts and 7 switches. At one point I had 63 hosts and some number of switches, and things broke pretty hard, so I dialed it back a little bit. I went over to the web UI for the controller and, after some fiddling, set Names and Tiers on the switches. My test topology in the ODL console is shown in the following screenshot.

ODL Home

I also had full reachability from all of the mininet hosts.

mininet> pingall
*** Ping: testing ping reachability
h1 > h2 h3 h4 h5 h6 h7 h8
h2 > h1 h3 h4 h5 h6 h7 h8
h3 > h1 h2 h4 h5 h6 h7 h8
h4 > h1 h2 h3 h5 h6 h7 h8
h5 > h1 h2 h3 h4 h6 h7 h8
h6 > h1 h2 h3 h4 h5 h7 h8
h7 > h1 h2 h3 h4 h5 h6 h8
h8 > h1 h2 h3 h4 h5 h6 h7
*** Results: 0% dropped (56/56 received)

Now that I had things working, it was time to find ways to break it. Diving into the flow rules, I threw together a basic drop rule on one of the transit links.

Flow Rule Split Network
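I built the rule in the web UI, but flows could presumably also be pushed through the controller's northbound REST API. A rough, untested sketch based on the AD-SAL flow programmer endpoint of that era; the node ID, ingress port, rule name, and default admin:admin credentials are all assumptions:

curl -u admin:admin -H "Content-Type: application/json" -X PUT \
    -d '{"name": "DropTransit", "node": {"id": "00:00:00:00:00:00:00:02", "type": "OF"}, "ingressPort": "3", "priority": "500", "actions": ["DROP"]}' \
    http://127.0.0.1:8080/controller/nb/v2/flowprogrammer/default/node/OF/00:00:00:00:00:00:00:02/staticFlow/DropTransit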

As expected the network was split into two.

mininet> pingall
*** Ping: testing ping reachability
h1 > h2 h3 h4 X X X X
h2 > h1 h3 h4 X X X X
h3 > h1 h2 h4 X X X X
h4 > h1 h2 h3 X X X X
h5 > X X X X h6 h7 h8
h6 > X X X X h5 h7 h8
h7 > X X X X h5 h6 h8
h8 > X X X X h5 h6 h7
*** Results: 57% dropped (24/56 received)

Let's see about black-holing a single host now.

Drop H1: this rule drops all traffic from the host connected to port 1 on the switch, which happens to be h1.

mininet> pingall
*** Ping: testing ping reachability
h1 > X X X X X X X
h2 > X h3 h4 h5 h6 h7 h8
h3 > X h2 h4 h5 h6 h7 h8
h4 > X h2 h3 h5 h6 h7 h8
h5 > X h2 h3 h4 h6 h7 h8
h6 > X h2 h3 h4 h5 h7 h8
h7 > X h2 h3 h4 h5 h6 h8
h8 > X h2 h3 h4 h5 h6 h7
*** Results: 25% dropped (42/56 received)

OpenDaylight has always piqued my interest; I've been trying to follow the mailing lists and some of the discussions out there, and the Test VM is a nice way to start to get under the hood. I have a lot more to learn, and there are a ton of other plugins to explore, not to mention starting to think about the API and writing some code against it.

Notes

  1. If you do not set switch roles properly, end hosts may not show up on the topology.

  2. Flow rule names cannot have spaces in them.

  3. The controller had the access switches properly classified in the Tier field; however, the transit switches were not set to either Distribution or Core.