Kubernetes Networking: Containers, Pods, Services, Nodes


One of the difficult things to learn about Kubernetes as a software analyst is how the networking works.

There are several essential things to understand about networking in Kubernetes:
  • Communication between containers in the same pod
  • Communication between pods on the same node
  • Communication between pods on different nodes
  • Communication between pods and services
  • How does DNS work? How do we discover IP addresses?
  • How to use a Volume to communicate between two Containers running in the same Pod
Kubernetes is a powerful container management tool that automates the deployment and management of containers. Kubernetes (K8s) is the next big wave in cloud computing.
When it comes to running containers in production, you can end up with dozens, even thousands, of containers over time. These containers need to be deployed, managed, connected, and updated; if you were to do this manually, you’d need an entire team dedicated to the task.

Before proceeding, let’s get a Kubernetes cluster deployed locally and recall some concepts:


Installing kubectl

Kubernetes’ command-line tool, kubectl, is used to manage a cluster and applications running inside it. Here’s how to install it on Windows, Linux, and Mac:
Windows (run this in a bash-compatible shell such as Git Bash – the $(...) command substitution below will not work in cmd.exe)
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/windows/amd64/kubectl.exe
Linux
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
Mac
curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/darwin/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
Verifying your setup
kubectl version --output=yaml

Pods
A pod is the smallest deployable unit of an application in Kubernetes. Now, before we move on, we need to get one thing straight – a pod is not equal to a container in the Docker world. A pod can be made up of multiple containers. If you have come from a pure Docker background, this can be hard to wrap your head around. If a pod can have more than one container, how does it work? There are some constraints we need to be aware of. A pod has the following:
  • A single IP address
  • A shared localhost
  • A shared IPC space
  • A shared network port range
  • Shared volumes
The containers in a pod talk to each other via localhost, whereas pod-to-pod communication is done via services.
ReplicaSets
Pods are mortal, and if they die, that is the end of them. What if you want to keep three replicas of the same pod running for availability?
Enter the ReplicaSet.
The main responsibility of a ReplicaSet (the successor to the original replication controller) is to protect against failure; it sits above the pod resource type and controls it. Let’s look at an example. If I want to deploy four copies of my pod x, I create a ReplicaSet. A ReplicaSet has a defined number of pods that must be running, in this case four. If one of the pods fails or dies, the ReplicaSet starts a new pod for me, and again I have four copies of pod x running. This functionality takes care of the issue we mentioned earlier about pods being mortal.
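As a minimal sketch (the manifest name and the app: pod-x label are hypothetical), a ReplicaSet that keeps four replicas running might look like this:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: pod-x-replicaset
spec:
  replicas: 4
  selector:
    matchLabels:
      app: pod-x
  template:
    metadata:
      labels:
        app: pod-x
    spec:
      containers:
      - name: pod-x
        image: nginx
In practice you rarely create ReplicaSets directly; deployments (covered below) manage them for you.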
Services
If we want to have connectivity to our pods, we need to create a service. In Kubernetes, a service is a network abstraction over a set of pods. It allows traffic to be load-balanced across the pods and routed around failures, and it lets Kubernetes set a single DNS record for the set. As we mentioned earlier, each pod has its own IP address.
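A minimal Service manifest, assuming pods labeled app: pod-x as in the sketch above, might look like this:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: pod-x
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
Kubernetes gives the service a single stable virtual IP and DNS name, and load-balances traffic across whichever pods currently match the selector.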
Deployments
The deployment resource type sits above a replica set and can manipulate it. Why would we want to manipulate a replica set? Replica sets are all or nothing. If you need to do an upgrade, you have to replace the replica set, and that action causes downtime for your application.
One of the main benefits of Kubernetes is high availability, and deployments give us the functionality to do upgrades without downtime. As with a replica set, you specify the number of pods you would like to run. Once you trigger an update, the deployment does a rolling upgrade of the pods, making sure each pod is upgraded successfully before moving on to the next one.
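Sketched below is a hypothetical Deployment that runs four replicas and upgrades them gradually (names and image tags are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-x-deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod down during the upgrade
      maxSurge: 1         # at most one extra pod created during the upgrade
  selector:
    matchLabels:
      app: pod-x
  template:
    metadata:
      labels:
        app: pod-x
    spec:
      containers:
      - name: pod-x
        image: nginx:1.25
A rolling upgrade can then be triggered with, for example, kubectl set image deployment/pod-x-deployment pod-x=nginx:1.26.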
While Kubernetes is opinionated about how containers are deployed and operated, it is very non-prescriptive about how the network in which pods run should be designed. Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):
    • All pods can communicate with all other pods without NAT
    • All nodes running pods can communicate with all pods (and vice-versa) without NAT
    • The IP address a pod sees for itself is the same IP address other pods see for it

Communication between containers in the same pod

First, if you've got two containers running in the same pod, how do they talk to each other?
This happens via localhost and port numbers, just as when you run multiple servers on your own laptop.
This is possible because containers in the same pod are in the same network namespace – they share networking resources.

What is a network namespace?

It’s a collection of network interfaces (connections between two pieces of equipment on a network) and routing tables (instructions for where to send network packets).
Namespaces are helpful because you can have many network namespaces on the same virtual machine without collisions or interference.
(You wouldn’t want all your pods to run containers that listen on port 3000 in the same namespace – they’d all collide!)
There’s a secret container that runs in every pod in Kubernetes. This container’s #1 job is to keep the namespace open in case all the other containers in the pod die. It’s called the pause container.
So, each pod gets its own network namespace. Containers in the same pod are in the same network namespace. This is why you can talk between containers via localhost and why you need to watch out for port conflicts when you’ve got multiple containers in the same pod.
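As a minimal sketch of this (pod and container names are hypothetical), one container can serve on a port and a sidecar in the same pod can reach it over localhost:
apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo
spec:
  containers:
  - name: web
    image: nginx              # listens on port 80
  - name: sidecar
    image: curlimages/curl    # small image that ships curl
    command: ["sleep", "3600"]
kubectl exec localhost-demo -c sidecar -- curl -s localhost:80
Because both containers share the pod’s network namespace, localhost:80 in the sidecar reaches the nginx container.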
Communication between pods on the same node
Each pod on a node has its own network namespace. Each pod has its own IP address.
And each pod thinks it has a totally normal ethernet device called eth0 to make network requests through. But Kubernetes is faking it – it’s just a virtual ethernet connection.
Each pod’s eth0 device is actually connected to a virtual ethernet device in the node.
A virtual ethernet device is a tunnel that connects the pod’s network with the node. This connection has two sides – on the pod’s side, it’s named eth0, and on the node’s side, it’s named vethX.
Why the X? There’s a vethX connection for every pod on the node. (So they’d be veth1, veth2, veth3, etc.)
When a pod makes a request to the IP address of another pod, it makes that request through its own eth0 interface, which tunnels to the node’s corresponding vethX interface.
But then how does the request get to the other pod?
The node uses a network bridge.

What is a Network Bridge?

A network bridge connects two networks together. When a request hits the bridge, the bridge asks all the connected devices (i.e. pods) if they have the right IP address to handle the original request.
(Remember that each pod has its own IP address and it knows its own IP address.)
If one of the devices does, the bridge stores this information and also forwards the data back to the original requester so that its network request is completed.
In Kubernetes, this bridge is called cbr0. Every pod on a node is part of the bridge, and the bridge connects all pods on the same node together.
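If you have shell access to a node, you can usually see this setup yourself; interface and bridge names vary by container runtime and network plugin, so treat these iproute2 commands as a sketch:
ip link show type veth      # one vethX interface per pod on this node
ip link show type bridge    # the pod bridge, e.g. cbr0 or docker0
ip route                    # routes sending pod-CIDR traffic to the bridge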

Communication between pods on different nodes

But what if pods are on different nodes?
Well, when the network bridge asks all the connected devices (i.e. pods) if they have the right IP address, none of them will say yes.
(Note that this part can vary based on the cloud provider and networking plugins.)
When none of them answers, the bridge falls back to the node’s default gateway, and the lookup moves up to the cluster level.
At the cluster level, there’s a table that maps IP address ranges to various nodes. Pods on those nodes will have been assigned IP addresses from those ranges.
For example, Kubernetes might give pods on node 1 addresses like 100.96.1.1, 100.96.1.2, etc. And Kubernetes gives pods on node 2 addresses like 100.96.2.1, 100.96.2.2, and so on.
Then this table will store the fact that IP addresses that look like 100.96.1.xxx should go to node 1, and addresses like 100.96.2.xxx need to go to node 2.
After we’ve figured out which node to send the request to, the process proceeds roughly the same as if the pods had been on the same node all along.
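As a hedged illustration of that table (the node IPs are made up), the node-level routes it produces might look like this:
ip route
100.96.1.0/24 via 10.0.0.1 dev eth0   # pod addresses on node 1
100.96.2.0/24 via 10.0.0.2 dev eth0   # pod addresses on node 2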

Communication between pods and services

One last communication pattern is important in Kubernetes.
In Kubernetes, a service lets you map a single IP address to a set of pods. You make requests to one endpoint (a domain name or IP address), and the service proxies the request to one of the pods backing it.
This happens via kube-proxy, a small process that Kubernetes runs on every node.
This process maps virtual IP addresses to a group of actual pod IP addresses.
Once kube-proxy has mapped the service virtual IP to an actual pod IP, the request proceeds as in the above sections.
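You can observe this mapping with kubectl (my-service is a hypothetical service name):
kubectl get service my-service     # the CLUSTER-IP column shows the virtual IP
kubectl get endpoints my-service   # the actual pod IPs the virtual IP maps to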

How does DNS work? How do we discover IP addresses?

DNS is the system for converting domain names to IP addresses.
Kubernetes clusters have a service responsible for DNS resolution.
Every service in a cluster is assigned a domain name like my-service.my-namespace.svc.cluster.local.
Pods are automatically given a DNS name, and can also specify their own using the hostname and subdomain properties in their YAML config.
So when a request is made to a service via its domain name, the DNS service resolves it to the IP address of the service.
Then kube-proxy converts that service's IP address into a pod IP address. After that, based on whether the pods are on the same node or on different nodes, the request follows one of the paths explained above.
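As a quick check (the service and namespace names are hypothetical), you can resolve a service name from a throwaway pod:
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup my-service.my-namespace.svc.cluster.local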

 

How to use a Volume to communicate between two Containers running in the same Pod


Create a Pod that runs two Containers

The two containers share a Volume that they can use to communicate. Here is the configuration file for the Pod:
pods/two-container-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  restartPolicy: Never
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: nginx-container
    image: nginx
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: debian-container
    image: debian
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data
    command: ["/bin/sh"]
    args: ["-c", "echo Hello from the debian container > /pod-data/index.html"]


In the configuration file, you can see that the Pod has a Volume named shared-data.
The first container listed in the configuration file runs an nginx server. The mount path for the shared Volume is /usr/share/nginx/html. The second container is based on the debian image, and has a mount path of /pod-data. The second container runs the following command and then terminates.
echo Hello from the debian container > /pod-data/index.html

Notice that the second container writes the index.html file in the root directory of the nginx server.
Create the Pod and the two Containers:
kubectl apply -f https://k8s.io/examples/pods/two-container-pod.yaml
View information about the Pod and the Containers:
kubectl get pod two-containers --output=yaml


apiVersion: v1
kind: Pod
metadata:
  ...
  name: two-containers
  namespace: default
  ...
spec:
  ...
status:
  ...
  containerStatuses:
  - containerID: docker://c1d8abd1 ...
    image: debian
    ...
    lastState:
      terminated:
        ...
    name: debian-container
    ...
  - containerID: docker://96c1ff2c5bb ...
    image: nginx
    ...
    name: nginx-container
    ...
    state:
      running:
        ...
You can see that the debian Container has terminated, and the nginx Container is still running.
Get a shell to nginx Container:
kubectl exec -it two-containers -c nginx-container -- /bin/bash
In your shell, verify that nginx is running:
root@two-containers:/# apt-get update
root@two-containers:/# apt-get install curl procps
root@two-containers:/# ps aux
The output is similar to this:
USER       PID  ...  STAT START   TIME COMMAND
root         1  ...  Ss   21:12   0:00 nginx: master process nginx -g daemon off;
Recall that the debian Container created the index.html file in the nginx root directory. Use curl to send a GET request to the nginx server:
root@two-containers:/# curl localhost
The output shows that nginx serves a web page written by the debian container:
Hello from the debian container

Pod Networking

In Kubernetes, a pod is the most basic unit of organization: a group of tightly-coupled containers that are all closely related and perform a single function or service.
Networking-wise, Kubernetes treats pods much like a traditional virtual machine or a single bare-metal host: each pod receives a single unique IP address, and all containers within the pod share that address and communicate with each other over the lo loopback interface using the localhost hostname. This is achieved by assigning all of the pod’s containers to the same network stack.
This situation should feel familiar to anybody who has deployed multiple services on a single host before the days of containerization. All the services need to use a unique port to listen on, but otherwise communication is uncomplicated and has low overhead.

Pod to Pod Networking

Most Kubernetes clusters need to deploy multiple pods per node. Pod-to-pod communication may happen between two pods on the same node, or between pods on two different nodes.
Pod to Pod Communication on One Node
On a single node you can have multiple pods that need to communicate directly with each other. Before we trace the route of a packet between pods, let’s inspect the networking setup of a node. The following diagram provides an overview, which we will walk through in detail:

[Diagram: a node’s networking setup – per-pod network namespaces joined to the root namespace by veth pairs, the br0 bridge, and the node’s eth0 interface on the cluster network]
Each node has a network interface – eth0 in this example – attached to the Kubernetes cluster network. This interface sits within the node’s root network namespace. This is the default namespace for networking devices on Linux.
Just as process namespaces enable containers to isolate running applications from each other, network namespaces isolate network devices such as interfaces and bridges. Each pod on a node is assigned its own isolated network namespace.
Pod namespaces are connected back to the root namespace with a virtual ethernet pair, essentially a pipe between the two namespaces with an interface on each end (here we’re using veth1 in the root namespace, and eth0 within the pod).
Finally, the pods are connected to each other and to the node’s eth0 interface via a bridge, br0 (your node may use something like cbr0 or docker0). A bridge essentially works like a physical ethernet switch, using either ARP (address resolution protocol) or IP-based routing to look up other local interfaces to direct traffic to.
Let’s trace a packet from pod1 to pod2 now:
  • pod1 creates a packet with pod2’s IP as its destination
  • The packet travels over the virtual ethernet pair to the root network namespace
  • The packet continues to the bridge br0
  • Because the destination pod is on the same node, the bridge sends the packet to pod2’s virtual ethernet pair
  • The packet travels through the virtual ethernet pair, into pod2’s network namespace and the pod’s eth0 network interface
Now that we’ve traced a packet from pod to pod within a node, let’s look at how pod traffic travels between nodes.
Pod to Pod Communication Between Two Nodes
Because each pod in a cluster has a unique IP, and every pod can communicate directly with all other pods, a packet moving between pods on two different nodes is very similar to the previous scenario.
Let’s trace a packet from pod1 to pod3, which is on a different node:

[Diagram: pod1 on node1 sending a packet across the cluster network to pod3 on node2, via each node’s br0 bridge and eth0 interface]
  • pod1 creates a packet with pod3’s IP as its destination
  • The packet travels over the virtual ethernet pair to the root network namespace
  • The packet continues to the bridge br0
  • The bridge finds no local interface to route to, so the packet is sent out the default route toward eth0
  • Optional: if your cluster requires a network overlay to properly route packets between nodes, the packet may be encapsulated in a VXLAN packet (or another network virtualization technique) before heading to the network. Alternatively, the network itself may be set up with the proper static routes, in which case the packet travels to eth0 and out to the network unaltered.
  • The packet enters the cluster network and is routed to the correct node.
  • The packet enters the destination node on eth0
  • Optional: if your packet was encapsulated, it will be de-encapsulated at this point
  • The packet continues to the bridge br0
  • The bridge routes the packet to the destination pod’s virtual ethernet pair
  • The packet passes through the virtual ethernet pair to the pod’s eth0 interface
Now that we are familiar with how packets are routed via pod IP addresses, let’s take a look at Kubernetes services and how they build on top of this infrastructure.

Pod to Service Networking

It would be difficult to send traffic to a particular application using just pod IPs, as the dynamic nature of a Kubernetes cluster means pods can be moved, restarted, upgraded, or scaled in and out of existence. Additionally, some services will have many replicas, so we need some way to load balance between them.
Kubernetes solves this problem with Services. A Service is an API object that maps a single virtual IP (VIP) to a set of pod IPs. Additionally, Kubernetes provides a DNS entry for each service’s name and virtual IP, so services can be easily addressed by name.
The mapping of virtual IPs to pod IPs within the cluster is coordinated by the kube-proxy process on each node. This process sets up either iptables or IPVS to automatically translate VIPs into pod IPs before sending the packet out to the cluster network. Individual connections are tracked so packets can be properly de-translated when they return. IPVS and iptables can both do load balancing of a single service virtual IP into multiple pod IPs, though IPVS has much more flexibility in the load balancing algorithms it can use.
Note: these translation and connection-tracking processes happen entirely in the Linux kernel. kube-proxy reads from the Kubernetes API and updates iptables or IPVS, but it is not in the data path for individual packets. This makes it more efficient and higher-performance than earlier versions of kube-proxy, which functioned as a user-space proxy.
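If you’re curious, you can inspect these kernel rules on a node; the KUBE-SERVICES chain is what kube-proxy maintains in iptables mode, and ipvsadm shows the equivalent in IPVS mode (ipvsadm may need to be installed first):
sudo iptables -t nat -L KUBE-SERVICES -n | head   # per-service DNAT entry points
sudo ipvsadm -Ln                                  # VIP-to-pod-IP mappings in IPVS mode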
Let’s follow the route a packet takes from a pod, pod1 again, to a service, service1:

[Diagram: pod1’s packet leaving node1 via br0 and eth0, with kube-proxy-managed iptables/IPVS rules translating the service VIP to a pod IP on another node]
  • pod1 creates a packet with service1’s IP as its destination
  • The packet travels over the virtual ethernet pair to the root network namespace
  • The packet continues to the bridge br0
  • The bridge finds no local interface to route the packet to, so the packet is sent out the default route toward eth0
  • Iptables or IPVS, set up by kube-proxy, match the packet’s destination IP and translate it from a virtual IP to one of the service’s pod IPs, using whichever load balancing algorithms are available or specified
  • Optional: your packet may be encapsulated at this point, as discussed in the previous section
  • The packet enters the cluster network and is routed to the correct node.
  • The packet enters the destination node on eth0
  • Optional: if your packet was encapsulated, it will be de-encapsulated at this point
  • The packet continues to the bridge br0
  • The packet is sent to the virtual ethernet pair via veth1
  • The packet passes through the virtual ethernet pair and enters the pod network namespace via its eth0 network interface
When the packet returns to node1, the VIP-to-pod-IP translation will be reversed, and the packet will travel back through the bridge and virtual interface to the correct pod.

Pod
Pods are to Kubernetes what cells are to a living organism: the basic building blocks of the application. A pod holds one or more containers, which run on the same host, share the same network configuration, and share common resources.


Node
Nodes are the worker machines in a Kubernetes cluster; a node can be a virtual machine or a physical machine. Every node runs the built-in services needed to host pods.


Service
A Service is an abstraction that defines a logical set of pods on which an application runs.


Kubernetes Networking Conditions
The main purpose of the Kubernetes platform is to create a flat network structure and simplify the cluster networking process for end users. To achieve this, Kubernetes requires any networking implementation to satisfy the following rules and conditions as fundamental requirements:
  • All pods should be able to communicate with one another without the need for Network Address Translation (NAT)
  • All nodes should be able to communicate with all pods without the need for NAT
  • The IP address a pod sees for itself is the same address other pods see for it
These rules reduce the challenges of porting applications from traditional VMs to containers and also reduce network complexity.
Kubernetes Networking Problems and their Solutions
To take full advantage of Kubernetes, let’s take a look at four distinct networking problems and their solutions:
  • Container-to-Container Networking
  • Pod-to-Pod Networking
  • Pod-to-Service Networking
  • Internet-to-Service Networking
Container-to-Container Networking
The highly coupled container-to-container networking happens inside a pod. As explained previously, a pod is a group of containers that share the same network namespace, and therefore the same IP address and port space. Communication between two containers happens over a localhost connection, since they reside inside the same namespace.
For example, suppose a pod’s network namespace has the IP address 216.233.22.33 and holds two containers, A and B. Both containers are reachable at that one IP address and are distinguished only by the port numbers they listen on (say, 50 and 75) within the shared network namespace.
Pod-to-Pod Networking
Next, let’s look at how two pods communicate with one another, including across two different nodes. Every node has a designated, unique range of IP addresses (a Classless Inter-Domain Routing (CIDR) block) for its pod IPs. Every pod gets a dedicated IP address that is visible to the other pods, and this IP address does not conflict with the pods on other nodes.
To understand pod-to-pod networking, let’s take a scenario in which two pods reside on the same machine and share a common node. Each pod lives inside its own network namespace, which then needs to communicate with the other namespaces on the same node.
Linux offers a mechanism to connect namespaces on a node using a Virtual Ethernet Device (VED), or veth pair. Each veth pair has two virtual network interfaces – for example, veth0 and veth1 – which can be placed in different namespaces. To connect a pod, one end of the veth pair is assigned to the root network namespace while the other is assigned to the pod’s network namespace. The veth pair acts as the intermediary that establishes the connection between the root network namespace and the pod network namespace and carries data between them.
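You can reproduce this wiring by hand with iproute2; the following sketch (run as root, with made-up names and addresses) creates a “pod” namespace and connects it to the root namespace with a veth pair:
ip netns add pod-ns                          # stand-in for a pod's network namespace
ip link add veth0 type veth peer name veth1  # create the veth pair
ip link set veth1 netns pod-ns               # move one end into the pod namespace
ip addr add 10.0.0.1/24 dev veth0
ip link set veth0 up
ip netns exec pod-ns ip addr add 10.0.0.2/24 dev veth1
ip netns exec pod-ns ip link set veth1 up
ip netns exec pod-ns ping -c 1 10.0.0.1      # the two namespaces can now talk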
Pod-to-Service Networking
The third networking challenge is pod-to-service networking. In general, pod IP addresses are not durable: when an application crashes or a machine reboots, the IP address disappears and a new one has to be assigned, so the address can change without notice. Kubernetes overcomes this problem with the concept of Services. A Service keeps track of the pod IP addresses that change over time; even as the pod IPs associated with the Service change, clients have no trouble connecting, because they connect to the Service’s static virtual IP address instead.
Internet-to-Service Networking
The previous three challenges and solutions were all about routing traffic within the Kubernetes cluster. What if you want to expose an application outside the cluster to the external internet? Two techniques address this challenge: ingress and egress.
Ingress is one of the most robust ways to expose a Service to the external world (outside the Kubernetes cluster) and allow access to it. In simple terms, an ingress is a set of rules that define which connections should reach which Services and which should be blocked. An ingress works together with a load balancer or NodePort to filter requests coming from outside the Kubernetes cluster to a service, and it lets users consolidate all the different rules in a single place.
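A minimal Ingress sketch (the host and service names are hypothetical, and an ingress controller must be running in the cluster) might look like this:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80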
Egress is the process of routing traffic from a node to outside the Kubernetes cluster. To make traffic in the cluster available outside, you can attach an Internet Gateway to your Virtual Private Cloud (VPC).
However, since the IP address of a pod differs from the IP address of the node hosting it, and the address translation at the Internet Gateway only works for VM IP addresses, the gateway’s NAT has no idea which pods are running in which VMs. Kubernetes solves this problem using iptables.
For example, say a packet is being transmitted from a pod to the external internet. If the source IP address is the pod’s, the Internet Gateway will reject it, because the gateway only understands the IP addresses attached to VMs. So iptables performs source NAT, changing the packet’s source so that, to the Internet Gateway, it appears to come from the VM rather than the pod. The Internet Gateway then performs another round of NAT (from the internal to an external IP), and the packet finally reaches the internet.
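Conceptually, the masquerading described above boils down to a rule like the following (the CIDRs are hypothetical; real rules are managed by kube-proxy or the CNI plugin, not added by hand):
# SNAT traffic from the pod CIDR that is leaving the cluster network
iptables -t nat -A POSTROUTING -s 100.96.0.0/16 ! -d 100.96.0.0/16 -j MASQUERADE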
