Red Hat OpenShift Networking

Networking in OpenShift




Summary
  • OpenShift uses a non-routable pod network to handle traffic in the cluster.
  • OpenShift SDN is a software-defined networking implementation using Open vSwitch (OVS) to create a scalable and highly configurable network for OpenShift traffic.
  • Network traffic is passed in and out of containers using multiple OVS interfaces configured together on each host.
  • Each container has a corresponding veth interface on the host that’s linked to the container interface using the Linux kernel.
  • OpenShift provides an internal DNS service to make interactions between pods easy to manage and scale.
  • OpenShift SDN’s default plugin provides a flat network topology that allows all pods to communicate with each other in the cluster.
  • You can change networking plugins to the multitenant plugin, which effectively isolates network communications at the project level for applications.
  • A running container is nothing more than a Linux process that is namespaced and constrained with regard to access (SELinux) and resource consumption (cgroups). In each namespace there is a single (virtual) network interface called eth0, which is assigned an IP address chosen by the docker daemon (and a matching, conflict-free MAC address). Docker allocates these IP addresses from the RFC1918 private 172.17.0.0/16 range. All ingress and egress network packets of the container use this interface. In the container’s file system, the docker daemon overlays the files /etc/hostname, /etc/hosts and /etc/resolv.conf to ensure that network-related services such as DNS behave as expected.
    This single virtual network interface does NOT mean that there cannot be dedicated NICs, e.g. for administrative purposes or to isolate storage traffic. Rather, those NICs exist on the host: access to NAS/SAN storage from a container is performed via Linux filesystem and device API calls against the Linux kernel. From there, network packets are sent via the appropriate NIC. The same applies to administrative networks: the administrator connects to the docker host, from where he or she applies administrative commands via the docker daemon.

Network flows on a plain docker host are therefore as follows. Most of these steps are not routing hops but rather the handing of a packet between virtual layer-2 devices on the same node, so the overhead is comparatively low:
  • Administrative traffic: admin network → Docker Host
  • SAN/NAS traffic: Container API call → Linux Kernel → storage network
  • Network traffic between Containers: ContainerA eth0 → vethXXXX → docker0 bridge → vethYYYY → ContainerB eth0
  • Outbound from Container: Container eth0 → vethXXXX → docker0 bridge → IPTables NAT → host network
  • Inbound to Container (“host port”): Host network → IPTables DNAT → docker0 bridge → vethXXXX → Container eth0
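The two NAT hops above can be inspected directly on a plain docker host. A minimal, hedged sketch (chain contents vary with the Docker version and which ports are published):

# Show the bridge and the veth interfaces attached to it
brctl show docker0

# Source NAT (MASQUERADE) rule Docker adds for outbound container traffic
iptables -t nat -S POSTROUTING

# DNAT rules created for published host ports
iptables -t nat -S DOCKER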

A practical example

The following listing shows the network interfaces of a plain docker host (1) loopback, (2) eth0 external, (3) eth1 administration and (4) the docker0 bridge:
[root@test-rhel7 ~]# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:11:0d:aa brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.22/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe11:daa/64 scope link
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:17:5d:ae brd ff:ff:ff:ff:ff:ff
    inet 192.168.103.22/24 brd 192.168.103.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe17:5dae/64 scope link
       valid_lft forever preferred_lft forever
4: docker0:  mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:a9:b8:55:0b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
[root@test-rhel7 ~]#
A container’s namespace only knows (1) loopback and (7) eth0, as shown in the following listing. Notice that eth0 has “@if8” appended, which indicates the index of its peer network device outside the container’s namespace.
[root@test-rhel7 ~]# docker run -i -t centos /bin/sh
[...truncated...]
sh-4.2# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
7: eth0@if8:  mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:2/64 scope link
       valid_lft forever preferred_lft forever
sh-4.2#
In the host’s namespace, launching the container creates an additional interface, veth268c73c@if7. The first part of the name is randomly chosen; the “@if7” suffix indicates that it connects directly to the eth0 interface in the container, which has index 7.
[root@test-rhel7 ~]# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:11:0d:aa brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.22/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe11:daa/64 scope link
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:17:5d:ae brd ff:ff:ff:ff:ff:ff
    inet 192.168.103.22/24 brd 192.168.103.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe17:5dae/64 scope link
       valid_lft forever preferred_lft forever
4: docker0:  mtu 1500 qdisc noqueue state UP
    link/ether 02:42:a9:b8:55:0b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:a9ff:feb8:550b/64 scope link
       valid_lft forever preferred_lft forever
8: veth268c73c@if7:  mtu 1500 qdisc noqueue master docker0 state UP
    link/ether ae:c9:d3:7c:55:36 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::acc9:d3ff:fe7c:5536/64 scope link
       valid_lft forever preferred_lft forever
[root@test-rhel7 ~]#
That the container is actually connected to the docker0 bridge can be seen in the following listing:
[root@test-rhel7 ~]# brctl show docker0
bridge name bridge id       STP enabled interfaces
docker0     8000.0242a9b8550b   no      veth268c73c
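To confirm which veth device belongs to which container, one hedged approach is to compare interface indexes (the numbers below are taken from this example and will differ on other hosts):

# Inside the container: print the ifindex of the host-side peer of eth0
cat /sys/class/net/eth0/iflink    # prints 8 in this example

# On the host: look up the interface with that index
ip -o link | grep '^8:'           # shows veth268c73c@if7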

Building Blocks of OpenShift


Concepts & Terminology

  • Containers
    • A container is an operating system level construct that allows for the running of isolated software systems within a single OS
  • Images
    • An image is a portable package containing all content, binaries, and configuration data that define a container instance
  • Pods
    • A pod is a wrapper around one or more containers which adds a networking & configuration layer for managing containers across hosts
  • Replication
    • Replication is the act of instantiating multiple copies of a pod definition in order to provide multiple instances of a runtime environment
    • Managed by a replication controller
    • Accomplished by re-instantiating a container image
  • Services
    • A service is a stable network endpoint (virtual IP and port) that load balances traffic across a set of replicated pods.
  • Routes
    • A route is a load balancing mechanism used to expose services externally
  • Projects
    • A project is an isolation mechanism that gives users the ability to create resources while keeping them separate and secure from other OpenShift users
    • Wraps around Kubernetes namespace
  • Labels and Selectors
    • A label is a key/value tag that can be applied to most resources in OpenShift to give extra meaning or context to that resource for later filtering or selection.
    • A selector is a parameter used by many resources in OpenShift to associate a resource with another resource by specifying a label.
    • Labels and Selectors can be used to:
      • Assign certain projects, services, pods to run on a certain set of nodes
      • Create regions, zones and other network topology constructs
      • Assign pods to services
      • Assign a router to a certain set of projects (perhaps internal/external apps)
      • Much more…​
  • Builds and Deployments
    • A build in OpenShift is the process by which application content, code, or packages are built into a container image (we call this an application image)
    • A deployment in OpenShift is the process of instantiating an application image to create running containers/pods that run the application

Overview

OpenShift Enterprise uses a software-defined networking (SDN) approach to provide a unified cluster network that enables communication between pods across the OpenShift Enterprise cluster. This pod network is established and maintained by the OpenShift Enterprise SDN, which configures an overlay network using Open vSwitch (OVS).
OpenShift Enterprise SDN provides two SDN plug-ins for configuring the pod network:
  • The ovs-subnet plug-in is the original plug-in which provides a "flat" pod network where every pod can communicate with every other pod and service.
  • The ovs-multitenant plug-in provides OpenShift Enterprise project level isolation for pods and services. Each project receives a unique Virtual Network ID (VNID) that identifies traffic from pods assigned to the project. Pods from different projects cannot send packets to or receive packets from pods and services of a different project.
    However, projects which receive VNID 0 are more privileged in that they are allowed to communicate with all other pods, and all other pods can communicate with them. In OpenShift Enterprise clusters, the default project has VNID 0. This facilitates certain services like the load balancer, etc. to communicate with all other pods in the cluster and vice versa.
Following is a detailed discussion of the design and operation of OpenShift Enterprise SDN, which may be useful for troubleshooting.

Information on configuring the SDN on masters and nodes is available in Configuring the SDN.

Design on Masters

On an OpenShift Enterprise master, OpenShift Enterprise SDN maintains a registry of nodes, stored in etcd. When the system administrator registers a node, OpenShift Enterprise SDN allocates an unused subnet from the cluster network and stores this subnet in the registry. When a node is deleted, OpenShift Enterprise SDN deletes the subnet from the registry and considers the subnet available to be allocated again.
In the default configuration, the cluster network is the 10.1.0.0/16 class B network, and nodes are allocated /24 subnets (i.e., 10.1.0.0/24, 10.1.1.0/24, 10.1.2.0/24, and so on). This means that the cluster network has 256 subnets available to assign to nodes, and a given node is allocated 254 addresses that it can assign to the containers running on it. The size and address range of the cluster network are configurable, as is the host subnet size.
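The arithmetic behind these numbers can be checked quickly in a shell:

# A /16 cluster network split into /24 host subnets yields 2^(24-16) subnets
echo $(( 2 ** (24 - 16) ))       # 256

# Each /24 host subnet has 2^(32-24) addresses, minus network and broadcast
echo $(( 2 ** (32 - 24) - 2 ))   # 254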
Note that OpenShift Enterprise SDN on a master does not configure the local (master) host to have access to any cluster network. Consequently, a master host does not have access to pods via the cluster network, unless it is also running as a node.
When using the ovs-multitenant plug-in, the OpenShift Enterprise SDN master also watches for the creation and deletion of projects, and assigns VXLAN VNIDs to them, which will be used later by the nodes to isolate traffic correctly.
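On a cluster running the ovs-multitenant plug-in, the assigned VNIDs can be inspected from a master; a hedged example (project names and NETIDs will differ per cluster):

# Each project has a corresponding NetNamespace object; NETID is its VNID
oc get netnamespaces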

Design on Nodes

On a node, OpenShift Enterprise SDN first registers the local host with the SDN master in the aforementioned registry so that the master allocates a subnet to the node.
Next, OpenShift Enterprise SDN creates and configures six network devices:
  • br0, the OVS bridge device that pod containers will be attached to. OpenShift Enterprise SDN also configures a set of non-subnet-specific flow rules on this bridge. The ovs-multitenant plug-in does this immediately.
  • lbr0, a Linux bridge device, which is configured as Docker’s bridge and given the cluster subnet gateway address (eg, 10.1.x.1/24).
  • tun0, an OVS internal port (port 2 on br0). This also gets assigned the cluster subnet gateway address, and is used for external network access. OpenShift Enterprise SDN configures netfilter and routing rules to enable access from the cluster subnet to the external network via NAT.
  • vlinuxbr and vovsbr, two Linux peer virtual Ethernet interfaces. vlinuxbr is added to lbr0 and vovsbr is added to br0 (port 9 with the ovs-subnet plug-in and port 3 with the ovs-multitenant plug-in) to provide connectivity for containers created directly with Docker outside of OpenShift Enterprise.
  • vxlan0, the OVS VXLAN device (port 1 on br0), which provides access to containers on remote nodes.
Each time a pod is started on the host, OpenShift Enterprise SDN:
  1. moves the host side of the pod’s veth interface pair from the lbr0 bridge (where Docker placed it when starting the container) to the OVS bridge br0.
  2. adds OpenFlow rules to the OVS database to route traffic addressed to the new pod to the correct OVS port.
  3. in the case of the ovs-multitenant plug-in, adds OpenFlow rules to tag traffic coming from the pod with the pod’s VNID, and to allow traffic into the pod if the traffic’s VNID matches the pod’s VNID (or is the privileged VNID 0). Non-matching traffic is filtered out by a generic rule.
The pod is allocated an IP address in the cluster subnet by Docker itself because Docker is told to use the lbr0 bridge, which OpenShift Enterprise SDN has assigned the cluster gateway address (eg. 10.1.x.1/24). Note that the tun0 is also assigned the cluster gateway IP address because it is the default gateway for all traffic destined for external networks, but these two interfaces do not conflict because the lbr0 interface is only used for IPAM and no OpenShift Enterprise SDN pods are connected to it.
OpenShift Enterprise SDN nodes also watch for subnet updates from the SDN master. When a new subnet is added, the node adds OpenFlow rules on br0 so that packets with a destination IP address in the remote subnet go to vxlan0 (port 1 on br0) and thus out onto the network. The ovs-subnet plug-in sends all packets across the VXLAN with VNID 0, but the ovs-multitenant plug-in uses the appropriate VNID for the source container.
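These flow rules can be examined directly on a node; a hedged sketch (the exact rules depend on the plug-in and the OpenShift version, and br0 may require the OpenFlow 1.3 protocol flag):

# Dump all OpenFlow rules programmed on the OVS bridge
ovs-ofctl -O OpenFlow13 dump-flows br0

# Rules for remote subnets should reference output port 1, i.e. vxlan0
ovs-ofctl -O OpenFlow13 dump-flows br0 | grep 'output:1'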

Packet Flow

Suppose you have two containers, A and B, where the peer virtual Ethernet device for container A’s eth0 is named vethA and the peer for container B’s eth0 is named vethB.

If Docker’s use of peer virtual Ethernet devices is not already familiar to you, review Docker’s advanced networking documentation.
Now suppose first that container A is on the local host and container B is also on the local host. Then the flow of packets from container A to container B is as follows:
eth0 (in A’s netns) → vethA → br0 → vethB → eth0 (in B’s netns)
Next, suppose instead that container A is on the local host and container B is on a remote host on the cluster network. Then the flow of packets from container A to container B is as follows:
eth0 (in A’s netns) → vethA → br0 → vxlan0 → network [1] → vxlan0 → br0 → vethB → eth0 (in B’s netns)
Finally, if container A connects to an external host, the traffic looks like:
eth0 (in A’s netns) → vethA → br0 → tun0 → (NAT) → eth0 (physical device) → Internet
Almost all packet delivery decisions are performed with OpenFlow rules in the OVS bridge br0, which simplifies the plug-in network architecture and provides flexible routing. In the case of the ovs-multitenant plug-in, this also provides enforceable network isolation.

Network Isolation

You can use the ovs-multitenant plug-in to achieve network isolation. When a packet exits a pod assigned to a non-default project, the OVS bridge br0 tags that packet with the project’s assigned VNID. If the packet is directed to another IP address in the node’s cluster subnet, the OVS bridge only allows the packet to be delivered to the destination pod if the VNIDs match.
If a packet is received from another node via the VXLAN tunnel, the Tunnel ID is used as the VNID, and the OVS bridge only allows the packet to be delivered to a local pod if the tunnel ID matches the destination pod’s VNID.
Packets destined for other cluster subnets are tagged with their VNID and delivered to the VXLAN tunnel with a tunnel destination address of the node owning the cluster subnet.
As described before, VNID 0 is privileged in that traffic with any VNID is allowed to enter any pod assigned VNID 0, and traffic with VNID 0 is allowed to enter any pod. Only the default OpenShift Enterprise project is assigned VNID 0; all other projects are assigned unique, isolation-enabled VNIDs. Cluster administrators can optionally control the pod network for the project using the administrator CLI.
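As an illustration of that administrator CLI, the following hedged commands (project names are placeholders) merge, globalize, or re-isolate project networks when the ovs-multitenant plug-in is active:

# Merge projectA's pod network into projectB's (they then share one VNID)
oadm pod-network join-projects --to=projectB projectA

# Give a project the global VNID 0 so all pods can reach it and vice versa
oadm pod-network make-projects-global projectA

# Restore isolation for the project
oadm pod-network isolate-projects projectA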

Network Flows on an OpenShift Node



  • From a container within a pod to another container within the same pod: Pod lo → Pod lo
  • Between pods on the same node: PodA eth0 → vethXXXX → (ovs) br0 → vethYYYY → PodB eth0
  • From a pod to a plain docker container on the same node: Pod eth0 → vethXXXX → ovs br0 → vovsbr → vlinuxbr → lbr0 → vethYYYY → Container eth0
  • From a plain docker container to a pod on the same node: Container eth0 → vethXXXX → lbr0 → vlinuxbr → vovsbr → br0 → vethYYYY → Pod eth0
  • Outbound from pod: Pod eth0 → vethXXXX → (ovs) br0 → tun0 → (IPTables NAT) → host network
  • There is still the capability to bind a container to a host port to allow inbound network traffic: host network → IPTables DNAT → tun0 → (ovs) br0 → vethXXXX → Pod eth0, but it is more common to use the OpenShift router for inbound traffic, as sketched below.
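A hedged example of the router-based approach (the service name comes from this cluster's examples; the hostname is illustrative):

# Expose an existing service through the OpenShift router
oc expose service docker-registry --hostname=registry.apps.example.com

# Verify the resulting route
oc get route docker-registry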

A practical example

The following (truncated) listing shows that an OpenShift node knows a number of additional network interfaces, among them (5) the ovs bridge device br0, (7) the docker bridge lbr0, (8) and (9) the vovsbr/vlinuxbr device pair and (10) the tun0 device. Note that (7) lbr0 and (10) tun0 share the same IP address.
[root@openshift ~]# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:12:96:98 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
       valid_lft 77432sec preferred_lft 77432sec
    inet6 fe80::a00:27ff:fe12:9698/64 scope link
       valid_lft forever preferred_lft forever
3: ovs-system:  mtu 1500 qdisc noop state DOWN
    link/ether 0e:49:a7:f2:e2:7b brd ff:ff:ff:ff:ff:ff
5: br0:  mtu 1450 qdisc noop state DOWN
    link/ether ae:5f:43:6d:37:4d brd ff:ff:ff:ff:ff:ff
7: lbr0:  mtu 1450 qdisc noqueue state UP
    link/ether 8a:8a:b7:6b:5b:50 brd ff:ff:ff:ff:ff:ff
    inet 10.1.0.1/24 scope global lbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::d8a0:74ff:fe19:d4be/64 scope link
       valid_lft forever preferred_lft forever
8: vovsbr@vlinuxbr:  mtu 1450 qdisc pfifo_fast master ovs-system state UP
    link/ether 76:49:cc:f0:08:42 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7449:ccff:fef0:842/64 scope link
       valid_lft forever preferred_lft forever
9: vlinuxbr@vovsbr:  mtu 1450 qdisc pfifo_fast master lbr0 state UP
    link/ether 8a:8a:b7:6b:5b:50 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::888a:b7ff:fe6b:5b50/64 scope link
       valid_lft forever preferred_lft forever
10: tun0:  mtu 1450 qdisc noqueue state UNKNOWN
    link/ether ca:8e:2f:26:4f:bf brd ff:ff:ff:ff:ff:ff
    inet 10.1.0.1/24 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::c88e:2fff:fe26:4fbf/64 scope link
       valid_lft forever preferred_lft forever
12: vethd6edc06@if11:  mtu 1450 qdisc noqueue master ovs-system state UP
    link/ether 0e:04:d8:cd:3c:92 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::c04:d8ff:fecd:3c92/64 scope link
       valid_lft forever preferred_lft forever
[...truncated...]
Even though a number of pods (resulting in about 20 containers) are running on the node, no vethXXXX device is connected to the lbr0:
[root@openshift ~]# docker ps | wc -l
20
[root@openshift ~]# brctl show lbr0
bridge name bridge id STP enabled interfaces
lbr0 8000.8a8ab76b5b50 no vlinuxbr
[root@openshift ~]#
They are instead connected to the ovs bridge br0. Note that the number of veth devices is much lower than 20: a pod usually comprises at least two containers (the pod infrastructure container plus one or more workload containers), which share a single network namespace and therefore a single veth interface.
[root@openshift ~]# ovs-vsctl show
7c0a94c8-63f5-4be9-bd31-fcd94a06cc47
    Bridge "br0"
        fail_mode: secure
        Port vovsbr
            Interface vovsbr
        Port "veth1e54157"
            Interface "veth1e54157"
        Port "br0"
            Interface "br0"
                type: internal
        Port "vethd43b712"
            Interface "vethd43b712"
        Port "veth175be20"
            Interface "veth175be20"
        Port "veth3516ca3"
            Interface "veth3516ca3"
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {key=flow, remote_ip=flow}
        Port "veth6d98fc1"
            Interface "veth6d98fc1"
        Port "veth9a1ff7c"
            Interface "veth9a1ff7c"
        Port "vethb03e6a9"
            Interface "vethb03e6a9"
        Port "tun0"
            Interface "tun0"
                type: internal
        Port "vethd6edc06"
            Interface "vethd6edc06"
    ovs_version: "2.4.0"
[root@openshift ~]#
Querying the interfaces from within a running pod, you can see it looks very much like the plain docker example. The service (the embedded docker registry) listens on a single port:
[root@openshift ~]# oc get pods
NAME READY STATUS RESTARTS AGE
docker-registry-1-3u208 1/1 Running 0 33d
image-registry-n2c5e 1/1 Running 0 33d
router-1-ms7i2 1/1 Running 0 33d
[root@openshift ~]# oc rsh image-registry-n2c5e
root@image-registry-n2c5e:/# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
11: eth0@if12: mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:01:00:02 brd ff:ff:ff:ff:ff:ff
    inet 10.1.0.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe01:2/64 scope link
       valid_lft forever preferred_lft forever
root@image-registry-n2c5e:/# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:5000 *:* LISTEN
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
root@image-registry-n2c5e:/#
The OpenShift pod can easily be accessed from a plain docker container running on the same host:
[root@openshift ~]# docker run -i -t centos /bin/sh
[...truncated...]
sh-4.2# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
29: eth0@if30:  mtu 1450 qdisc noqueue state UP
    link/ether 02:42:0a:01:00:0b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.1.0.11/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe01:b/64 scope link
       valid_lft forever preferred_lft forever
sh-4.2# ping -w 3 10.1.0.2
PING 10.1.0.2 (10.1.0.2) 56(84) bytes of data.
64 bytes from 10.1.0.2: icmp_seq=1 ttl=64 time=0.078 ms
64 bytes from 10.1.0.2: icmp_seq=2 ttl=64 time=0.091 ms
64 bytes from 10.1.0.2: icmp_seq=3 ttl=64 time=0.094 ms
64 bytes from 10.1.0.2: icmp_seq=4 ttl=64 time=0.091 ms
--- 10.1.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.078/0.088/0.094/0.011 ms
sh-4.2#
And vice versa:
root@image-registry-n2c5e:/# ping -w 3 10.1.0.11
PING 10.1.0.11 (10.1.0.11) 56(84) bytes of data.
64 bytes from 10.1.0.11: icmp_seq=1 ttl=64 time=0.276 ms
64 bytes from 10.1.0.11: icmp_seq=2 ttl=64 time=0.051 ms
64 bytes from 10.1.0.11: icmp_seq=3 ttl=64 time=0.073 ms
--- 10.1.0.11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.051/0.133/0.276/0.101 ms
root@image-registry-n2c5e:/#

Meanwhile, this plain docker container remains attached to the lbr0 device:
[root@openshift ~]# brctl show lbr0
bridge name bridge id       STP enabled interfaces
lbr0        8000.866a139a1945   no      veth3bb4c4b
                            vlinuxbr
[root@openshift ~]#

Inter-Node networking therefore adds the following flow:
Between pods on different nodes: PodA eth0 → vethXXXX → (ovs) br0 → vxlan0 (L3 encapsulation) → (tunnel via host network) → vxlan0 (L3 decapsulation) → br0 → vethYYYY → PodB eth0
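The tunnel leg of this flow can be checked on either node; a hedged sketch (assuming the standard VXLAN UDP port 4789):

# Inspect the vxlan0 OVS port: a flow-based tunnel (key=flow, remote_ip=flow)
ovs-vsctl list interface vxlan0

# Watch encapsulated pod-to-pod traffic leave the node
tcpdump -i eth0 -nn udp port 4789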

OpenShift services add the following flows (using a single node example for simplicity):
  • From a Pod to another Pod (on the same node) via Service, userspace mode: PodA eth0 → vethXXXX → (ovs) br0 → tun0 → IPTables NAT → kube-proxy → tun0 → (ovs) br0 → vethYYYY → PodB eth0
  • From a Pod to another Pod (on the same node) via Service, iptables mode: PodA eth0 → vethXXXX → (ovs) br0 → tun0 → IPTables NAT → tun0 → (ovs) br0 → vethYYYY → PodB eth0
From external client to OpenShift Pod: external client → network → OpenShift Node eth0 → (IPTables DNAT) → tun0 → ovs br0 → vethXXXX → Pod A eth0 → (userspace router) → Pod A eth0 → vethXXXX → ovs br0 → vethYYYY → Pod B eth0
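In iptables mode the service NAT step can be observed in the node's netfilter rules; a hedged sketch (the chain names are created by kube-proxy and the output is abbreviated here):

# Service virtual IPs are matched in the KUBE-SERVICES chain...
iptables -t nat -L KUBE-SERVICES -n | head

# ...and DNAT'ed to individual pod endpoints via KUBE-SEP-* chains
iptables -t nat -S | grep KUBE-SEP | head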

OpenShift Software Defined Network (SDN) concept

Anyone with an OpenShift background might find this post interesting.

SDN encompasses several types of technologies, including functional separation, network virtualization and automation, which focus on making network control directly programmable. OpenShift uses an SDN approach to provide an architecture that enables communication between pods across the OpenShift Container Platform cluster. SDN comprises three layers:

· Application layer

· Control layer

· Infrastructure layer

These three SDN layers communicate with each other using northbound and southbound APIs.


Control Layer:

· Centralized view of the network.

· Decides how packets should flow through the network.

Application Layer:

· The network applications or functions an organization uses (firewall, load balancer, etc.).

Infrastructure Layer:

· The physical devices in the network (switches, etc.).

OpenShift SDN plug-ins for configuring the pod Network:

· ovs-subnet: Provides a “flat” pod network for communication between pods.

· ovs-multitenant: Provides project-level isolation for pods and services. Each project receives a unique Virtual Network ID (VNID) that identifies traffic from its pods and services.

· ovs-networkpolicy: Lets administrators create and configure their own network isolation policies.
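Which plug-in a cluster uses is recorded in the master (and node) configuration; a hedged way to check on OpenShift 3.x, assuming the usual default path:

# networkPluginName selects the SDN plug-in, e.g.
#   redhat/openshift-ovs-subnet, redhat/openshift-ovs-multitenant,
#   redhat/openshift-ovs-networkpolicy
grep -A3 networkConfig /etc/origin/master/master-config.yaml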

OpenShift SDN Networks

In the default configuration, the cluster network is the 10.128.0.0/14 network and nodes are allocated /23 subnets. OpenShift needs two different network CIDRs: the pod network and the service network.

The pod network CIDR can be defined with the “osm_cluster_network_cidr” Ansible inventory variable. This variable determines the maximum number of pod IPs for the cluster; the default value (10.128.0.0/14) provides 262,142 IPs. If you need to reconfigure the pod network, follow the official documentation for more detail.

The service network CIDR can be defined with the “openshift_portal_net” Ansible inventory variable; the default value (172.30.0.0/16) provides 65,534 IPs.

Each OpenShift node has its own subnet, from which the pods running on it receive their IP addresses. A separate variable configures the number of bits to allocate to each host’s subnet.
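A hedged sketch of how these settings might appear in the [OSEv3:vars] section of the Ansible inventory (the values are the documented defaults; osm_host_subnet_length is the variable commonly used for the per-host subnet size and is an assumption here):

[OSEv3:vars]
# Pod network: a /14 leaves room for 262,142 pod IPs cluster-wide
osm_cluster_network_cidr=10.128.0.0/14

# Per-host subnet size in bits: 9 bits => a /23 per node
osm_host_subnet_length=9

# Service (portal) network: a /16 provides 65,534 service IPs
openshift_portal_net=172.30.0.0/16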

OpenShift SDN Network Devices

· Bridge Network Device (br0): The OVS bridge network interface that pod containers are attached to.

· Tunnel Interface (tun0): Used for external network access; it is port 2 on br0. OpenShift SDN uses netfilter and routing rules to enable access to the external network via NAT.

· VXLAN Device (vxlan0): Provides access to containers on remote nodes.

Basic Packet Flow for OpenShift SDN

· Containers A and B are located on the same host










· Containers A and B are in the same cluster but on different hosts






· Container A connects to an external host






