Getting started with Helm, the Kubernetes Package Manager

In this short blog post, we will look at getting started with Helm. Helm is a package manager for Kubernetes, similar to APT, RPM, or YUM for installing Linux packages. In the cloud-native space, Helm can be used to install, update, and package applications that are deployed to a Kubernetes cluster.

Installing packages with Helm is as simple as adding a repository with helm repo add <repo_name> <charts_url> and then running helm install <release_name> bitnami/nginx. The quickstart guide on the Helm site is a great resource to get started – https://helm.sh/docs/intro/quickstart/
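For example, assuming the Bitnami chart repository is still published at https://charts.bitnami.com/bitnami, installing nginx under the release name my-nginx looks like this:

# add the repository and refresh the local chart index
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# install the nginx chart under the release name my-nginx
helm install my-nginx bitnami/nginx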

In addition to installing applications on a Kubernetes platform, Helm can be used to package your own application for easy deployment.

Before we get into the details of creating a sample Helm chart, let's look at the main concepts:

  • Chart – A chart contains all of the information required to create an instance of a Kubernetes application.
  • Config – A collection of configuration files that gets packaged into the chart.
  • Repo – A place to store and share charts.
  • Release – A running instance of a chart, identified by its own release name. On a Kubernetes cluster there can be more than one instance of the same chart, each with its own release name (see the example below).
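For example, the same chart can be installed twice, with each installation tracked as its own release (the release names web-1 and web-2 are arbitrary):

helm install web-1 bitnami/nginx
helm install web-2 bitnami/nginx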

Packaging an application into a Helm Chart

With the introductory concepts out of the way, let's look at creating a sample Helm chart. At a very high level, creating a Helm chart involves defining variables in the Kubernetes manifest files and providing values for those variables as part of the Helm package. Helm then renders the manifest files through its templating engine, substituting the defined variables.

In this example, I'm using the sample PHP Guestbook application; the manifest files can be found here: PHP GuestBook. The application contains a front-end pod with a front-end service and a mongo DB pod with its own service. To get started we will run:

helm create guestbook

This will create a directory structure that looks like the output below, along with some boilerplate manifests that we can leverage. To package the guestbook application we will place the deployment and service manifests in the templates directory and get rid of the boilerplate YAML files (an example of this cleanup follows the listing).


├── Chart.yaml
├── charts
├── templates
│   ├── NOTES.txt
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── hpa.yaml
│   ├── ingress.yaml
│   ├── service.yaml
│   ├── serviceaccount.yaml
│   └── tests
│       └── test-connection.yaml
└── values.yaml
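A minimal sketch of that cleanup, assuming the guestbook deployment and service manifests have been saved locally as frontend.yaml and mongo.yaml (hypothetical filenames):

# remove the boilerplate templates generated by helm create
rm guestbook/templates/deployment.yaml guestbook/templates/hpa.yaml \
   guestbook/templates/ingress.yaml guestbook/templates/service.yaml \
   guestbook/templates/serviceaccount.yaml
rm -r guestbook/templates/tests
# the default NOTES.txt references values we will no longer define, so remove it as well
rm guestbook/templates/NOTES.txt

# copy the guestbook manifests into the templates directory
cp frontend.yaml mongo.yaml guestbook/templates/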

We will now edit the mongo deployment manifest and add variables to the YAML file. Helm provides built-in objects such as Release, Chart, and Values that we can use in our manifest files; in this case, I have defined values for the variables in the chart's values.yaml file.
The frontend pods and the mongo DB pods could also be split into two separate charts, each with its own values.yaml. That way the mongo DB pod can be updated independently from the frontend application.

The spec below contains the deployment and service manifests for the frontend pod. I've provided variables for the name of the deployment/service, the number of replicas, the image repository, and the image version. Similar variables are added to the deployment and service manifests of the mongo DB pod.


apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{.Values.fe_deploy.name}}
  labels:
    app.kubernetes.io/name: guestbook
    app.kubernetes.io/component: frontend
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: guestbook
      app.kubernetes.io/component: frontend
  replicas: {{.Values.fe_deploy.replicas}}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: guestbook
        app.kubernetes.io/component: frontend
    spec:
      containers:
      - name: {{.Values.fe_deploy.name}}
        image: {{.Values.fe_deploy.images.repository}}:{{.Values.fe_deploy.images.version}}
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GET_HOSTS_FROM
          value: dns
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: {{.Values.fe_svc.name}}
  labels:
    app.kubernetes.io/name: guestbook
    app.kubernetes.io/component: frontend
spec:
  # NodePort exposes the frontend on a static port on each node.
  # If your cluster supports it, change the type to LoadBalancer to automatically
  # create an external load-balanced IP for the frontend service.
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30007
  selector:
    app.kubernetes.io/name: guestbook
    app.kubernetes.io/component: frontend
---

Now that variables are defined in the Kubernetes manifest files, we need to provide values for them. To do so we will edit the values.yaml file in the chart directory and define values for our variables.


mongo_deploy:
 name: gb-mongo
 replicas: 1
 images:
  version: 4.2

mongo_svc:
 name: mongo


fe_deploy:
 name: gb-frontend
 replicas: 3
 images:
  repository: paulczar/gb-frontend
  version: v5

fe_svc:
 name: gb-frontend-svc

To test that our Helm package works and that the variables are substituted correctly, we can run helm install with the --dry-run option. The output should show the rendered Kubernetes manifests with the defined values.

helm install guestbook ./guestbook --dry-run

To install our application we run

helm install guestbook ./guestbook

We can validate the status of our application by running kubectl get pods to verify that all the pods are running, and helm list to see the installed releases.
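To check just the guestbook pods, we can filter on the labels defined in the frontend manifest above (a quick sketch):

kubectl get pods -l app.kubernetes.io/name=guestbook

The helm list output below shows the guestbook release alongside an earlier nginx release: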



~/ helm list
NAME             NAMESPACE REVISION  UPDATED                                STATUS   CHART              APP VERSION
guestbook        default   1         2021-06-10 02:33:16.280572 +1000 AEST deployed  guestbook-0.1.0    1.16.0
nginx-1623140150 default   1         2021-06-08 18:15:54.889611 +1000 AEST deployed  nginx-9.1.0        1.21.0

Layer 2 Bridging with VMware NSX-T

VMware NSX-T Data Center ships with native L2 bridging capability. If you have been running NSX for vSphere and have requirements for L2 bridging, you will be familiar with the software-based L2 bridging function provided by the Distributed Router: the vSphere host that runs the Distributed Router control VM acts as the bridge instance and performs the bridging.
L2 bridging has a few use cases within the datacenter, and it's not uncommon to see it configured to give applications that reside on overlay logical switches direct layer 2 access to machines that reside outside the logical space, or to physical machines residing on a VLAN.

In NSX-T Data Center there are now two ways to bridge overlay and VLAN networks:
* L2 bridging with ESXi host transport nodes
* L2 bridging with NSX Edge transport nodes

In this article, we will look at configuring L2 bridging using ESXi host transport nodes.

To configure an ESXi host as a bridge host, we first need to create a bridge cluster. The hosts in the bridge cluster need to belong to an overlay transport zone.

Creating the ESXi Bridge cluster

Navigate to Fabric –> Nodes –> ESXi Bridge Clusters, click Add, and select the hosts that will form the bridge cluster.

The next step is to attach this bridge cluster to the logical switch whose traffic we want to bridge. Here we specify the VLAN to bridge to and turn on HA if required.

And that's it: we should now have an L2 bridge between the overlay logical switch and the VLAN segment. To validate this further we can log in via SSH to the active bridge ESXi host and run the commands below.

The "nsxcli -c get bridge" command lists the bridge ID and the number of networks on this bridge instance.

We can also run "net-bridge -X -l", which lists further information about the configured bridge: whether BFD is up, the state of the bridge, the VLAN ID being bridged, the bridge nodes in this bridge cluster, and statistics on the number of packets ingressing/egressing this bridge instance.

As our bridge instance state is UP, we can validate that the bridge is learning MAC addresses from the VLAN segment using the "net-bridge --mac-address-table <UUID>" command.
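For reference, the validation commands used above, run from an SSH session on the active bridge host (replace <UUID> with the bridge ID reported by the first command):

# list the bridge ID and the number of networks on this bridge instance
nsxcli -c get bridge

# detailed bridge information: BFD status, bridge state, VLAN ID, bridge nodes, packet stats
net-bridge -X -l

# MAC addresses learned on the VLAN segment for a given bridge
net-bridge --mac-address-table <UUID>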

Summary

In summary, NSX-T Data Center provides the flexibility to bridge overlay networks with VLAN networks if you have the requirement to do so.
As the bridging is done in software, performance is the biggest concern with this solution. To mitigate performance issues, the design should spread the load across multiple bridge instances.

Promiscuous mode with NSX Distributed firewall

TL;DR – When running appliances that require promiscuous mode, it's advised to place the appliance in the DFW exclusion list, especially if the appliance is a security-hardened virtual appliance.

Last year I was working with a mid-sized customer that had rolled out NSX to implement micro-segmentation, integrating NSX into their existing production vSphere environment. The customer had decided to implement security policy based on higher-level abstractions, i.e. security groups, but also decided they needed the policy enforced everywhere, with the Applied To column set to Distributed Firewall. Not an ideal implementation, but perhaps that discussion is for another blog post.

As this was a brownfield environment, the customer wanted to apply security policy to the existing virtual appliances. They were using F5 virtual appliances for load balancing and had a specific requirement to configure promiscuous mode on the F5 port group to support VLAN groups (https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/product/relnotes_ltm_ve_10_2_1.html).

Long story short, after configuring the F5 port groups in promiscuous mode they started seeing intermittent packet drops across the environment. We quickly identified that the impacted workloads were those residing on the hosts running the F5 virtual appliances. As we could not power off the appliances, we decided to place the F5 appliance in the DFW exclusion list and, like magic, everything returned to normal: no more packet drops.

Now that we understood what was causing the issue, as with all things we needed to understand why this behaviour was seen. We started by picking a virtual machine on the F5 host and enabled logging on the rule that allowed traffic for a particular service, in this case SSH. We also added a tag to that rule so we could filter the logs.
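The tag makes it easy to pull out just the relevant entries on the ESXi host; assuming the tag used was 12345_ADMIN (as in the log excerpts below), something like this does the job:

grep 12345_ADMIN /var/log/dfwpktlog.log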

We then followed an SSH session initiated from 10.68.0.250 to 10.68.145.170. We could see the SYN from 10.68.0.250 reach 10.68.145.170, and it was allowed by the firewall as rule 1795 permits it. Interestingly, though, we were also seeing the same traffic flow on a different filter, in this case 53210.

/var/log/dfwpktlog.log
T01:29:48.363Z 28341 INET match PASS domain-c7/1795 IN 52 TCP 10.68.0.250/65448->10.68.145.170/22 S 12345_ADMIN
T01:29:48.363Z 53210 INET match PASS domain-c7/1795 IN 52 TCP 10.68.0.250/65448->10.68.145.170/22 S 12345_ADMIN

 

Note: You can get the filter hash for a virtual machine by running vsipioctl getfilters. Each vNIC gets its own filter.

[root@esx101:~] vsipioctl getfilters

Filter Name : nic-11240904-eth0-vmware-sfw.2
VM UUID : 50 1e 02 7b b0 88 05 47-c2 ed dd 92 22 93 12 1d
VNIC Index : 0
Service Profile : --NOT SET--
Filter Hash : 28341

 

Looking at the output of vsipioctl getfilters we identified that the duplicate filter (53210) belonged to the F5 virtual appliance.

We also captured packets at the switch port of the virtual machine and saw 10.68.145.170 send a SYN/ACK back to the client; this packet was also seen on both filters. The subsequent frame shows the client immediately sending a RST packet to close the connection. This was a bit strange, as the firewall rule allows this traffic and there was no other reason why the client would immediately send a RST.

 

Interestingly, the dfwpktlogs had recorded packets on the duplicate filter hitting a different rule (rule 1026), and this rule had a REJECT action, so a RST packet would be sent out for TCP connections.


/var/log/dfwpktlog.log
T01:29:48.364Z 53210 INET match REJECT domain-c7/1026 IN 52 TCP 10.68.145.170/22->10.68.0.250/65448 SA default-deny
T01:29:48.364Z 53210 INET match REJECT domain-c7/1026 IN 40 TCP 10.68.145.170/22->10.68.0.250/65448 R default-deny

 

Why would traffic be hitting a different rule when the initial SYN was evaluated by rule 1795 ??

So, when a virtual machine is connected to a promiscuous port group, its DFW filter is also in promiscuous mode and hence receives all traffic on that port group. In this particular case, the F5 appliance was placed in a security group that had its own security policy applied to it. The duplicated traffic that was sent to the F5's filter was subjected to that security policy, which had a REJECT action configured, and hence a RST packet was being sent to the client.

Having identified this, we provided the customer with a couple of options to mitigate the issue.

I hope you found this article to be as interesting as this issue was to me. Until next time!!

PowerCLI script to run commands on ESXi hosts

Recently, while working with a customer on a large-scale NSX environment, we hit a product bug that required us to increase the memory allocated to the vsfwd (vShield Stateful Firewall) process on the ESXi hosts.

There were two main reasons we hit this issue:
1. Due to high churn in the distributed firewall, there was a significantly high number of updates the vsfwd daemon had to process.
2. A non-optimized way of allocating memory for these updates caused vsfwd to consume all of its allocated memory.

Anyway, the purpose of this article is not to talk about the issue, but about a way to automate the process of updating the vsfwd memory allocation on the ESXi hosts. As this was a large-scale environment, it was not ideal to manually edit config files on every ESXi host to increase the memory. Automating the process eliminates errors arising from manual intervention and keeps the configuration consistent across all hosts.

I started writing the script in Python and was going to leverage the paramiko module to SSH in and run the commands. I quickly found it a bit difficult to manage the numerous host IP addresses with Python, so I switched to PowerCLI, where I could use the Get-VMHost cmdlet to get the list of hosts in a cluster.

I've put the script up on GitHub for anyone interested – script

Until next time and here’s to learning something new each day!!!

Capturing and decoding VXLAN-encapsulated packets

In this short post we will look at capturing packets that are encapsulated with the VXLAN protocol and decoding them with Wireshark for troubleshooting and debugging purposes. This procedure is handy when you want to analyze network traffic on a logical switch or between logical switches.

To capture packets on ESXi we will use the pktcap-uw utility. pktcap-uw is quite a versatile tool, allowing capture at multiple points in the network stack and even tracing packets through the stack. Further details on pktcap-uw can be found in the VMware product documentation – here.

The limitation with the current version of pktcap-uw is that we need to run two separate commands to capture both egress and ingress traffic. With that said, let's get to it. In this environment I will capture packets on uplink vmnic4 on the source and destination ESXi hosts.

To capture VXLAN encapsulated packets egressing uplink vmnic4 on the source host

pktcap-uw --uplink vmnic4 --dir 1 --stage 1 -o /tmp/vmnic4-tx.pcap

To capture VXLAN encapsulated packets ingressing uplink vmnic4 on the destination host

pktcap-uw --uplink vmnic4 --dir 0 --stage 0 -o /tmp/vmnic4-rx.pcap

If you have access to the ESXi host and want to look at the packet capture with the VXLAN headers still attached, you can read the pcap file on the host itself, as sketched below.
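A minimal example, assuming the ESXi host has the bundled tcpdump-uw utility and using the capture file written by the earlier command:

# read the capture taken on the destination host, with verbose output
tcpdump-uw -nn -v -r /tmp/vmnic4-rx.pcap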

This capture can also be imported into Wireshark and the frames decoded there. When the capture is first opened, Wireshark displays only the outer source and destination IPs, which are the VXLAN tunnel endpoints. We need to map destination UDP port 8472 to the VXLAN protocol to see the inner frames.

To do so, open the capture in Wireshark and go to Analyze –> Decode As, then map UDP port 8472 to VXLAN.
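If you prefer the command line, the same decode can be done with tshark, Wireshark's CLI companion; this is a sketch and assumes tshark is installed on your workstation and the capture has been copied off the host as vmnic4-rx.pcap:

# decode UDP port 8472 as VXLAN and print the inner frames
tshark -d udp.port==8472,vxlan -r vmnic4-rx.pcap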


Once decoded, Wireshark will display the inner source and destination IP addresses and the inner protocol.


I hope you find this post helpful. Until next time!!