11 Apr

Juniper VMX Trial and Error

I have spent some time scratching my head on ESXi-based VMX and I thought I would share some experience. This isn’t meant to be a guide, or replace Juniper’s own docs, but to supplement (and help me remember stuff 2 years later).

My setup:

Dell server, 10 Core Xeon E5-2640 (20 thread), 48GB RAM, ESXi 6.5

I have deployed the OVAs from Juniper for VMX 17.4R1.16.

vCP: 1 CPU, 4GB RAM, 2x e1000 NIC (br-ext and br-int port groups)

vFPC: 14 CPU, 16GB RAM, 2x e1000 NIC (br-ext and br-int) plus 2x e1000 NIC (to be my ge-0/0/0 and ge-0/0/1)

The br-ext port group is just on an existing DHCP enabled vSwitch, and I can SSH into the VMX components fine. It seems that in Junos 17.4, the vFPC also gets a DHCP address for its ext bridge interface, which is nice.

The br-int port group is on its own dedicated vSwitch. All my vSwitches have MTU 9000 and all three security options set to Accept (promiscuous mode, MAC address changes and forged transmits).

My two ‘WAN’ interfaces, which are vNIC 3 and 4 under ESXi are there to prove things are working (I have a Linux VM attached to each, via a dedicated vSwitch/port group each). I run simple iperf tests across them, no routing protocols involved at this stage. In this lab/test I am using no physical NIC, so there is no bottleneck – nor is this a particularly realistic test for the real world deployment of VMX.

My topology is:

VM1 — VMX — VM2

Confusingly for you, my VM1 and VM2 are actually called Bird and Space Host. Don’t ask. Again, I am using a vSwitch as a cable between VM and VMX, with no physical cabling required. The br-ext link connects vCP, vFPC and an external network for management.

Lite-mode Vs Performance mode:

By default, VMX runs in performance mode. I find on ESXi (due to dpdk polling), that performance mode absolutely kills my allocated CPU threads. My ESXi reports running around 95% CPU load when a performance mode FPC is sitting idle. I find this has a major impact on TCP throughput, as well as making the ESXi box hopeless for doing other tasks. I am not a kernel expert, so I don’t really understand the implications of this CPU load.. I will leave it alone.

The real issue I had with VMX was before I even got off the ground. I was using the vFPC with 4 NICs (2 for bridges, 2 for ge- ports), and I had assigned e1000 NICs to all of them. This ended with FPC 0 stuck in ‘Present Absent’, which is what ‘show chassis fpc’ showed me. By default, you are in performance mode – and that doesn’t like e1000 NICs. Change the two “ge-” interfaces (in my case vNIC 3 and vNIC 4) to ‘VMXNET3’ and it fires up and starts passing packets. This appears to be a bug specific to Junos 17.4R1, according to a phone call I had with JTAC.

As I have 1Gbit/s licenses for the VMX, lite-mode is fine.

Detailed ESXi Setup

One of the things I find painful with VMX is the quality of the documentation, particularly for VMWare. Juniper releases OVAs for this platform, but shrinks away from documenting the nuts and bolts sufficiently.

Starting with the vCP OVA:

[Screenshot: VMware details of vCP VM]

I’ve set the machine to have 1 CPU, 4GB of RAM and I’m using two port-groups for the NICs, br-ext and br-int, as described earlier in this post.

I also upgraded the VM hardware version to 13 (the OVA comes as version 10). This was based on a blog post I read in the middle of the night. I wish I could say why this mattered (JTAC suggested this only improves things when using KVM-based VMX and SR-IOV, but hey).

[Screenshot: Summary of vCP]

Now onto the vFPC VM:

[Screenshot: VMware details of vFPC VM]

As you can see in the screenshot, I have set the 16GB of memory to be reserved. This helped with performance, particularly of my testing VMs running on the same host. I have also expanded one of my ‘WAN’ interfaces to show that it’s an E1000 NIC connecting to one of my Linux hosts.

The VM hardware version of my working vFPC is version 10.

[Screenshot: Summary of vFPC]

It’s best to set up all of this hardware in advance of switching either of the VMs on. Once you do, your vFPC should pull down a DHCP address from your br-ext bridge (mine is set up as a port group on my vSwitch0, which also carries the VMkernel management interface for the ESXi host itself). The vCP won’t get a DHCP address by default, as that’s not supported on fxp interfaces. I configure mine via the ESXi console.

Is it working?

Once you’ve booted both VMs, you will need to give them about 4-5 minutes. From my own bashing around in the log files, it seems that the vFPC pulls down some config from the vCP and then starts up RIOT, the process which is meant to emulate the MX series’ Trio chipset.

Note – under 17.4R1.16, the vFPC won’t work correctly by default (it boots in performance mode, and we set our interfaces to e1000) – so you will need to enable lite-mode from the vCP CLI (log in as root, no password, then enter ‘cli’) and commit the change.
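For reference, the lite-mode change is a single chassis statement. A minimal sketch, assuming the vFPC is FPC 0 as in this setup (prompts omitted):

```text
# From the vCP CLI. Assumes the vFPC shows up as FPC 0.
configure
set chassis fpc 0 lite-mode
commit and-quit
```

Swapping lite-mode for performance-mode in the same statement takes you back the other way.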

This (plus a reboot of the vFPC VM for good measure) will put you into lite-mode. Once this reboot (~5mins) process has finished, you can check 2 important things from the vCP CLI. First, check the chassis hardware and see if we’re in lite-mode for real:

From here, you can see FPC 0’s CPU is listed as RIOT-LITE. That’s what we wanna see.

Next, you can check the status of the FPC itself:

This garbled-by-my-wordpress-theme output shows the FPC in slot 0 is up and running. The temperature will never move on from ‘testing’ as it’s not a real probe (but it is on a real Trio-based FPC!)
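The two checks above are standard operational-mode commands, typed at the vCP CLI. Output is left out here because it varies by build:

```text
# Look for RIOT-LITE against FPC 0's CPU
show chassis hardware

# Look for slot 0 in the Online state
show chassis fpc
```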

To test the performance (another post on that one day, perhaps) – I fire some packets from VM1 to VM2. They rely on the VMX to do the routing, as they are in different subnets. I’m using some quite expensive hardware/software here to send a few packets around a pretend network – but it proves the thing works:

So there we go, a VMX in lite-mode, throwing 1Gbit/s of iperf traffic around.

Things that might be going wrong

Getting to this stage took me a while, so here are some things you might be finding are going wrong trying to use ESXi and VMX together.

1 – Can’t access vFPC

This might be caused by a fairly random problem I’ve seen in 17.4R1 where 2 of the 3 NICs that the vFPC automatically stands up don’t show. You will be left with ‘int’ only. Console into the vFPC and have a look (root/root will get you in):

That shows 3, so in my case it’s working as you’d hope

2 – Throughput sucks

Check your VMX license is applied. Even the trial license is good enough for most lab cases.
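The license check is one operational-mode command at the vCP CLI (output omitted, as it depends on which licenses you have loaded):

```text
# Lists installed licenses and the features/bandwidth they unlock
show system license
```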

You can see here I have a 1000Mbit license for bandwidth. Go me.

If you have a license applied and throughput still sucks, you might have a resource problem or some other issue. These can maybe be discussed in the comments below, but you might do better running up a thread in the Juniper official VMX support forum. Good luck!



08 Nov

Step by step guide: Preparing a Debian VM for Junos Automation

This is a bit specific, and, like most of my posts – a cheap way for me to remember something next time I need to do it 🙂

I am currently obsessed with network automation. My favourite ‘stack’ at the moment is Ansible, git and the Juniper Ansible libraries. There are a thousand ways to skin this particular cat, but for my current project (enforcing ‘golden config’ across a large number of devices) – this limited number of tools does the job.

As with most cool new tech, there are hundreds of posts and docs, most of which are similar enough to give the illusion of cohesion, but all critically different when it comes to the nitty-gritty, causing confusion and angst. At least, that’s my impression.

So – if you want a Junos Automation machine, ready to attack your network with Python and Ansible, follow along.

I’m using Debian 8, a fresh install. Splat these commands in to set up the bits you will need. I have tested these and find they work, resolving the dependencies and resulting in no errors.

You’ll end up with Ansible installed in /etc/ansible, along with the freshest Juniper library for Python (version 2.7) interaction via Ansible. You’ll also have git installed, both for installing the Junos EZNC package and for future use.

My end goal here is to use this system to completely automate my network, but for the time being – we’re good to start using Ansible to take baby steps towards that goal.

I have created a directory called lab-automation, and in it three sub-directories. One called scripts (for my playbooks and shell scripts), one called logs and the other called configs, for my configs! I have created a basic Ansible playbook, which uses the previously installed Juniper.junos role, and connects to my lab routers (mx1-mx4, as defined in the /etc/ansible/hosts file and my /etc/hosts file).
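A minimal sketch of what a playbook like that can look like. The file path, the username and the absolute paths are hypothetical, and it assumes the 1.x releases of the Juniper.junos role, where the module is called junos_get_config (later releases of the role renamed their modules, so check the version you have installed):

```yaml
---
# Hypothetical file: lab-automation/scripts/get-configs.yml
- name: Pull configs from the lab routers
  hosts: all
  roles:
    - Juniper.junos
  connection: local
  gather_facts: no

  tasks:
    - name: Fetch the active configuration
      junos_get_config:
        host: "{{ inventory_hostname }}"
        # Assumption: same name as the local Linux user, SSH key auth, so no passwd
        user: myuser
        # Hypothetical paths, adjust to wherever lab-automation lives
        dest: "/home/myuser/lab-automation/configs/{{ inventory_hostname }}.conf"
        format: text
        logfile: /home/myuser/lab-automation/logs/get-configs.log
```

It would be run with something like ansible-playbook get-configs.yml from the scripts directory.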

This will run through all my defined hosts and, using the Juniper role (installed previously, living in /etc/ansible/roles), grab my router configs and store them in the ‘configs’ directory. Note, I am using a username (same as my Linux user) and SSH key authentication, because I hate passwords and refuse to learn how to use them in Ansible 🙂

If I run this playbook, what happens?

Great. My files are pulled down from the network. I can do all kinds of fun things with the Juniper Ansible library – and so can you. Check it out here.

04 Sep

Juniper to Fortinet ISIS configuration

Hoo boy. I have been trying to configure a small mesh network for a fault-resilient office setup. In my network, I have a ‘square’ setup, two VMX routers, two Fortigate virtual firewall appliances, all running on top of ESXi 6.5 (two physical hypervisors). It looks like this:

Anyway. In order to redistribute the default route(s) received from the upstreams, I wanted to use iBGP inside the ‘square’ of devices.. iBGP relies on an IGP, so I chose the coolest one available, ISIS.

This is a very simple setup, but there was no way I could get an adjacency to form between the router and firewall (green to black in the diagram). I tried 100 things (changing hello intervals (pointless!), LSP generation times, MTU, MTU, MTU and several other desperate things like disabling hello-padding, enabling and disabling ‘adjacency checking’ on the Forti-devices).. Nothing.

Eventually, I enabled trace-options on the Junos side of things – I could see my adjacencies with the Forti-devices stuck in the ‘Initializing’ phase, implying the three-way-handshake was busted.. The traceoptions showed some guff, but nothing that pointed to an easily solvable problem (i.e. not MTU)..

Finally, using the debug features of the Fortigate box, I found:

Bearing in mind, there isn’t a single bit of IPv6 config on any of these devices (yet, it’s going to be fully dual stack, don’t worry!) – so what was up.. Turns out, the Fortigate devices were a bit sensitive, and needed the following knob in my Juniper ISIS config:
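A sketch of the sort of statement involved, assuming the complaint is about IPv6 showing up in the Junos hellos (Junos advertises IPv6 as a supported protocol in ISIS by default, even with no IPv6 configured anywhere):

```text
# Assumption: the mismatch is the IPv6 entry in the protocols-supported TLV
set protocols isis no-ipv6-routing
```

With that set, Junos stops advertising IPv6 in ISIS, so both ends agree on IPv4 only. It will need to come back out once the network goes dual stack.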

All of a sudden.. My ISIS adjacencies are up and solid.

Hopefully this will be useful to some sucker in future who chooses to use ISIS in their corporate network 🙂


01 Aug

Quick recipe for Layer2 Circuit local switching

I am always forgetting how to do l2circuits in Juniper, partially as there seem to be a zillion ways to configure encapsulation and VLAN handling, all of which commit but very few of which actually work.

This is a super quick note-to-self which describes how to locally switch (could simply be extended to LDP-signalled l2circuit over an MPLS core though) a point-to-point circuit, one end VLAN tagged and the other end untagged.

For this example, we have two interfaces – both on a single MX router called mx2.lab. Our ‘tagged’ or NNI facing interface is xe-0/0/1, and we’re using VLAN 250. Our ‘untagged’ or CPE facing interface is xe-2/2/1, not using a VLAN at all (dedicating the whole interface). This can (again) be expanded to use S/C tags, multiple encapsulations etc, but I’m not going there yet.

What we’re aiming to see is traffic coming in on a VLAN tagged interface and being locally switched to an untagged interface. To lab this, I have a VLAN-tagged interface with an IP on one side and an untagged VM on the other. Once it’s configured, they should be able to ping one another.


We need 3 chunks of config to make this work:

  • The tagged interface

  • The untagged interface

  • The l2circuit config
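A sketch of the three chunks in set format, using the interface names and VLAN from above. The encapsulation and VLAN-map choices are one combination that fits this tagged-to-untagged case, not the only one, so treat them as assumptions:

```text
# 1. Tagged (NNI) side: VLAN 250 on xe-0/0/1, tag popped on the way in, pushed on the way out
set interfaces xe-0/0/1 flexible-vlan-tagging
set interfaces xe-0/0/1 encapsulation flexible-ethernet-services
set interfaces xe-0/0/1 unit 250 encapsulation vlan-ccc
set interfaces xe-0/0/1 unit 250 vlan-id 250
set interfaces xe-0/0/1 unit 250 input-vlan-map pop
set interfaces xe-0/0/1 unit 250 output-vlan-map push

# 2. Untagged (CPE) side: the whole port is dedicated to the circuit
set interfaces xe-2/2/1 encapsulation ethernet-ccc
set interfaces xe-2/2/1 unit 0 family ccc

# 3. Local switching between the two logical interfaces
set protocols l2circuit local-switching interface xe-0/0/1.250 end-interface interface xe-2/2/1.0
```

If the connection stays down with an encapsulation mismatch, adding ignore-encapsulation-mismatch under the local-switching interface is the usual next thing to try. ‘show l2circuit connections’ shows the state either way.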

With that config loaded on mx2.lab, packets will fly between the untagged VM on xe-2/2/1 and xe-0/0/1.250


11 Jul

How to connect your Raspberry Pi to eduroam

Note – I took much of the code snippet here from ‘Sruc‘ on the RPI forums, but wanted to post a clear method that I know works. Cheers Sruc!

The eduroam network (for universities, researchers and high schools around the world) is a great thing. One login lets you connect to wifi access points all over the place, as long as you’re enrolled in or working for a participating organisation.

One thing that bugged me with the Raspberry Pi (in my case, a Raspberry Pi 3 running Pixel) was that WPA Enterprise wifi didn’t work out of the box.

Follow these simple steps to get it working:

  • Open a Terminal from your Pi’s gui (or just use the shell if you don’t have a gui!)
  • Open up the wpa_supplicant.conf file:

  • Paste in the following, changing the bits you normally use to log in to eduroam (your university/whatever email and password is normally what you use for authentication)

(Add this snippet below what’s already in the file, change the ‘identity’ and ‘password’ fields!)

  • Save and exit the editor (in nano that’s CTRL-O, Enter, CTRL-X)
  • Now we need to tell the Pi to reload the file, again, in the Terminal or shell

  • I find a reboot here is necessary, so flip the Pi and wait for it to boot. When it returns, you should be connected to eduroam (as long as your Pi can see the eduroam SSID!)
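For reference, the file in the second step normally lives at /etc/wpa_supplicant/wpa_supplicant.conf on Raspbian, and a network block for a PEAP/MSCHAPv2 eduroam setup looks roughly like the sketch below. The identity and password are placeholders, and the EAP settings are an assumption: some organisations use TTLS or insist on a CA certificate.

```text
# Assumption: PEAP with MSCHAPv2, no CA certificate check
network={
    ssid="eduroam"
    key_mgmt=WPA-EAP
    eap=PEAP
    phase2="auth=MSCHAPV2"
    # Placeholders: swap in your own eduroam login
    identity="user@example.edu"
    password="your-password"
}
```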

Note – I am not sure if this will work for all instances of eduroam, as some Universities etc handle authentication differently – please check your organisation’s help pages or get in touch with them first – they usually have a guide.

20 Jun

Very useful Ubuntu 16 Networking Note

I hate when things change for no good reason. This week, it’s the interface naming of ethernet on Ubuntu 16. No more does it default to ‘eth0’.. It uses some other ‘ens’ style.. Garbage!

First up, find your ethernet interfaces (this VM has 1 interface to start):

Bah, looks gross!

Fix it by editing your grub config:

Change the line GRUB_CMDLINE_LINUX=""

to   GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"

Regenerate your grub file:

Edit your /etc/network/interfaces file, change the names to eth0, eth1 etc

Reboot, and voila.
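If you would rather script the grub edit than open an editor, here is a small sketch. The function name is made up for this post, and it takes the file as an argument so you can try it on a copy first. It only touches the line if it is still the empty default.

```shell
# Swap the empty GRUB_CMDLINE_LINUX line for the one that turns off the new names.
# $1 is the grub defaults file to edit in place.
fix_ifnames() {
    sed -i 's/^GRUB_CMDLINE_LINUX=""$/GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"/' "$1"
}

# On the real box (as root):
#   fix_ifnames /etc/default/grub
#   update-grub
```

After that it’s the same as above: fix up /etc/network/interfaces and reboot.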

If you add a new interface, it will come on as ethX, following the already provisioned interfaces.

Now it looks better (I added a new 10G interface, and it comes in as eth1)

Awww yeeeeah

25 Mar

Openstack Newton – Provider Network Issue

When playing with an Openstack POC recently, I nearly pulled my hair out. I am running a flat provider network between my compute nodes (all Ubuntu 16.04), which connect via a Cisco 2900 with an inbuilt switch module. The Cisco has gateway addresses for the dual-stack host networks. I was using native IPv6 and NAT’d private space for IPv4.

Whenever I went to launch an instance, DHCP would work (SLAAC for v6), and the Horizon front-end would show the generated addresses assigned to the instance. Looks good. Going into the console of the instance, I’d see (with ifconfig) no IP addresses on my host NIC.. Looking in the “neutron-dhcp-agent.log” log, I would see:

2017-03-24 23:49:44.169 2476 WARNING stevedore.named [req-6710e8a6-5991-446e-b8d2-5af6c9d27625 - - - - -] Could not load neutron.agent.linux.interface.BridgeInterfaceDriver

I would see this whenever an instance was on, cycling over and over, once for each instance. When I looked at the bridge status (in my topology the eno4 physical interface of the compute/controller nodes is connected to the physical provider network), I would not see eno4 in the bridge created to connect hosts to the physical network. The way Openstack Neutron does this is to build a bridge, then add the physical and TAP interfaces. Mine was missing. Why…

Turns out – I had an IPv6 DHCP scope on my Cisco provider network interface facing the Openstack environment. As soon as I removed this piece of config (and simply left the IPv4 and IPv6 gateways on that interface) – eno4 showed up in the bridge and it all went smoothly.

What a mission.

09 Jan

Xiaomi Mi 5s – Overcoming a fake ROM


I bought a fancy pants new Xiaomi phone. My Samsung S5 was getting a bit long in the tooth. I wanted another flagship, but Samsung’s ROMs have been bloating up recently and while I’m a custom ROM fanboy, I was getting sick of Cyanogenmod and the occasional app not working with an unlocked bootloader, custom ROM etc.

So I got a Xiaomi Mi 5S for about $400 NZD from Banggood. The phone and case arrived 7 days after I ordered, so far so good. What I didn’t know in advance was Banggood (and other vendors) sneakily put their own ROM on the phone which is rambo jambo’d with Malware, Adware and can’t receive OTA (over the air) software updates – in short, it’s a shit sandwich.

The MIUI forums are a great place to get help, but if you have a Banggood Mi 5s phone with an 8.x.x.x fake ROM – you can probably follow these steps.

  1. First up, confirm you are on a fake ROM. This page is a good way to do so. I did, and it looked like bad news. I couldn’t do a software update or run Pokemon Go, so I decided I was affected and moved on.
  2. Unlock your Boot Loader. What does this mean? Well, it’s a setting on an Android phone that lets you install custom software. In our case, we’re actually trying to do the opposite, by rolling this dodgy fake ROM back to the official MIUI release. Nevertheless, we need to unlock our BL. Xiaomi makes us do this via logging in to their website and requesting an unlock on our Mi account. Simply create a Mi account and request an unlock. It might take a few days – mine took approx. 72 hours to have a verification SMS.
  3. Once we have the confirmation that the Mi account is enabled for unlocking, we need to download some software (unfortunately you need a Windows PC for this – although I did succeed using a VMWare Fusion VM on OSX). This software is the Unlocking app – we need to log in to the Mi account on the phone and inside this programme simultaneously – otherwise it will stall on 50% in the unlocking process. So – click here to get the latest Mi Flash.
  4. Open Mi Flash programme, turn your phone on in Recovery mode (Hold volume Up/Pwr button at the same time, from when the phone is off). Log in on the application and select Unlock.. This should work smoothly.
  5. Now, we have unlocked the bootloader, we should be able to use the Update application on the phone itself (usually in a folder called Tools). Best bet is to go to the official source and grab a Global Rom (assuming you don’t want a Chinese one) – over here. Once you have the rom, put it on your phone’s internal storage in a folder called downloaded_rom (create one if it doesn’t exist), open the aforementioned Update app and select the .zip file you just uploaded. It will take about 45 mins to work.

Mean buzz – you have an official Xiaomi phone/ROM – not some dodgy crippleware. Instead of asking questions here, I recommend heading to the MIUI forums, already linked. Try and only use official software from Xiaomi/MIUI.. Not everyone out there is friendly.


07 Nov

Source Specific Multicast with iperf

oooh iperf ssm

As part of my lab at work, I need to have lots of traffic flying around all over the place. Unicast IPv4 and IPv6, of course – but also traffic in L3VPNs and multicast traffic. Multicast is a big part of the day-to-day traffic load in my production network, so it’s important to be there in the lab too.

I’ve used a variety of tools to generate multicast traffic in the past, more often than not the excellent OMPing. This time, however, I wanted to really chuck some volume around, to make my stats look nice and to really show up on NFSen. iperf is the natural choice for generating lots of traffic in a hurry, I’m using iperf3 to throw about 20Gbit/s around the lab core, but for some reason the developers have removed multicast functionality from the new and improved iperf3.

ASM (or any-source multicast) is (more) traditional multicast. It relies on interested parties joining a group (a multicast address) where any sender can be considered the source (*,G). It works well when your environment is well set up, with a rendezvous point and protocols such as PIM-SM (and MSDP when you wanna go multi-domain). Iperf handles this pretty easily, simply set up a server to ‘listen’ on one end, and a client to send on the other. This can be achieved by having two hosts, connected via a router which acts as the PIM-SM rendezvous point and has IGMPv2 on the interfaces, so the listener can tell its router that it’s keen on having a listen.

On the listener (swap GROUP_ADDRESS for your multicast group address):
iperf -s -B GROUP_ADDRESS -u -f m -i 1

On the sender:
iperf -c GROUP_ADDRESS -u -b 100m -f m -i 1 -t -1 -T 32

On the listener, we’re telling iperf to be a server (-s), listening to an ASM multicast group (-B), using UDP (-u), formatting the output in megabit/s (-f) and telling us what’s up every 1 second (-i).

On the sender, we’re telling iperf to be a client (-c) sending to the same group address, using UDP, sending a 100Mbit/s stream (-b), sending forever (-t) and setting the multicast TTL to 32 (-T) – overkill, but hey.

This is a quick and easy way to check a basic ASM setup. I use it to confirm my multi-domain lab setup is working with multicast as I would expect and to generate lots of traffic to amuse myself.

Getting source specific

Source specific multicast is a bit cooler than ASM, because it should work for more people, more easily. If you have a single source (like a TV encoder, or some kind of unique data source) that you want plenty of receivers to see, and you have a nice way of telling them about your source address (like a higher-layer application – or in our case, manual config) – then SSM is the protocol for you.

To make SSM work, you really only need IGMPv3 in your network. Most *nix OSes and even Windows support IGMPv3 – usually by default. To check on your *nix host, you can run:
cat /proc/sys/net/ipv4/conf/eth0/force_igmp_version

If that returns a 0, and you’re on a modern-ish OS, you’re good to go. You can force your kernel to use IGMPv3 per interface, with:
echo "3" > /proc/sys/net/ipv4/conf/eth0/force_igmp_version
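To check every interface in one go rather than one file at a time, here is a small sketch. The function name is made up for this post, and the directory argument is only there so it can be pointed at a test directory instead of the live /proc tree.

```shell
# Print "<interface> <forced IGMP version>" for every interface under a conf directory.
# $1 defaults to the live kernel tree; 0 means "not forced, kernel default".
igmp_versions() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$conf_dir"/*/force_igmp_version; do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        echo "$iface $(cat "$f")"
    done
}

# On a real host, just run:
#   igmp_versions
```

Anything showing a 2 (or a 1) has been forced down, and won’t do SSM until it’s set back to 0 or 3.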

While most OS’s support IGMPv3 out of the box, plenty of network administrators in your friendly local LAN have probably forgotten to turn it on, or have left it with IGMPv2 and called it a day. No good.

Assuming you’re all good with IGMPv3, you then need to have an application which can listen not only for a particular multicast group address, but also a specific source. iperf, by default, doesn’t support this. Lucky for us, then, noushi has written an extension to iperf that allows for a couple of new flags, to set a specific source and interface. You can grab it here.

If you already had iperf installed, remove it. In Debian/Ubuntu, that would be:
sudo apt-get autoremove iperf -y

Unzip the file from git, then we compile:
./configure && make && sudo make install

Now we can run a client to send multicast as before (but this time, sending to a group in the SSM range, 232.0.0.0/8, shown here as SSM_GROUP):
iperf -c SSM_GROUP -u -b 10m -f m -i 1 -t -1 -T 32 -p 5002

I’m running it on a different port (-p) from default, because I want SSM and ASM at the same time.

For the listener, we need to tell our newly improved iperf to listen for a particular source (SOURCE_ADDRESS below, the regular old IP address of our sender):
iperf -s -B SSM_GROUP -u -f m -i 1 -p 5002 -O SOURCE_ADDRESS -X eth0.101

The two new flags are there, along with our SSM multicast group (and new port). The -O flag sets the SSM source and -X tells iperf to use a particular interface. I’m not sure -X is new, but I’ve never used it before – so let’s say it is.

If it’s working, we’ll see a 10Mbit/s SSM stream turn up on the listener.

27 Oct

nfsen + debian + apache = d’oh

I was re-doing one of my lab monitoring tools, a VM that hosted too many sparse and poorly maintained pieces of software. Now re-homing each bit onto its own VM (partially for sanity) – I ended up re-installing the excellent NFSen (a netflow monitoring tool/frontend for nfdump).

The software includes a directory named ‘icons’ in the web root, which doesn’t seem insane to me. What is insane, however, is Apache’s decision (by default!) to include an alias for a folder named ‘icons’ in the root. That means that without knowing it, the NFSen icons folder was being redirected to /usr/share/apache2/…/ whatever. That caused a headache.

To find this out, I ran:
cd /etc/apache2
grep -iR /usr/share *

This told me about the dang alias file, /etc/apache2/mods-available/alias.conf

I went into that file, commented out this dumb default, restarted Apache and now it’s away laughing.
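If you want to script that fix, here is a small sketch. The function name is made up for this post, and it assumes the offending line starts with ‘Alias /icons/’ (optionally indented), as it does in the stock Debian alias.conf.

```shell
# Comment out the default "Alias /icons/" line in an Apache alias.conf.
# $1 is the file to edit in place; the original indentation is kept.
disable_icons_alias() {
    sed -i 's|^\([[:space:]]*\)Alias /icons/|\1#Alias /icons/|' "$1"
}

# On the real box (as root):
#   disable_icons_alias /etc/apache2/mods-available/alias.conf
#   service apache2 restart
```

Running it twice is harmless, as an already commented line no longer matches.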