Kubernetes requires networks to follow these rules:
- All pods can communicate with each other without NAT
- All nodes can communicate with pods without NAT, in both directions
- The IP address a container sees for itself is the same IP address that other components see for it
There are two ways to set up the network:
- The default Kubernetes network
- CNI with a network plugin, the most frequently used option and the basis of our comparison.
Default K8s network
The first way to configure the network is to create a virtual bridge with an IP range on each host, then manually add routes between the hosts. Manual configuration is possible on Google or Amazon cloud platforms, but as soon as you do not stick strictly to it, the configurations become more difficult to manage.
Here is an example configuration that creates a bridge on one node with one interface:
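# create the bridge
brctl addbr kubbr1
# create a veth pair: veth1kubbr1 stays on the host, kubbr is the end to move into a container
ip link add veth1kubbr1 type veth peer name kubbr
# attach the host end of the pair to the bridge
brctl addif kubbr1 veth1kubbr1
# the bridge, not the enslaved veth, carries the gateway address of the pod range
ip addr add 10.10.1.1/24 dev kubbr1
ip link set veth1kubbr1 up
ip link set kubbr1 up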
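With this approach, each host also needs a route to the pod range of every other host. The Python sketch below prints the `ip route add` commands each host would need, assuming a hypothetical mapping of node IPs to per-node pod subnets (all addresses are illustrative):

```python
from ipaddress import ip_address, ip_network

# Hypothetical cluster: node IP -> pod subnet bridged on that node
NODES = {
    "192.168.0.11": "10.10.1.0/24",
    "192.168.0.12": "10.10.2.0/24",
    "192.168.0.13": "10.10.3.0/24",
}


def routes_for(host: str, nodes: dict[str, str]) -> list[str]:
    """Return the static routes `host` needs to reach the pods of every other node."""
    commands = []
    for node_ip, pod_cidr in sorted(nodes.items()):
        if node_ip == host:
            continue  # local pods are reached directly through the bridge
        # parse both values so that malformed entries fail early
        subnet = ip_network(pod_cidr)
        gateway = ip_address(node_ip)
        commands.append(f"ip route add {subnet} via {gateway}")
    return commands


if __name__ == "__main__":
    for host in NODES:
        print(f"# on {host}")
        for cmd in routes_for(host, NODES):
            print(cmd)
```

Every time a node joins or leaves, this list has to be regenerated and applied on all hosts, which is the manual work that CNI plugins automate.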
CNI with a network plugin
The second way to set up the Kubernetes network is to use the Container Network Interface (CNI) with a network plugin. Basic configurations are generated automatically, which makes creating and administering networks easier.
Linux networking can be defined in two ways: underlay or overlay. Kubernetes makes it possible to use both.
- Underlay is defined at the physical level (switches, routers, etc.)
- Overlay is a virtual network, composed of VLANs, veths (virtual interfaces) and VxLAN; it encapsulates the network traffic. Overlay is a bit slower than underlay: it creates tunnels between hosts, and the encapsulation headers reduce the MTU available to pods.
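To make the MTU cost concrete, this Python sketch subtracts the usual encapsulation headers from the underlay MTU. It assumes an IPv4 underlay without IP options; actual plugins may reserve a slightly different margin.

```python
# Usual header sizes in bytes, assuming an IPv4 underlay without IP options
IPV4_HEADER = 20
UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # VxLAN carries a whole Ethernet frame, header included

OVERHEAD = {
    "none": 0,
    "ipip": IPV4_HEADER,  # one extra outer IPv4 header
    "vxlan": IPV4_HEADER + UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET,
}


def overlay_mtu(underlay_mtu: int, encapsulation: str) -> int:
    """MTU left for pod IP packets once the tunnel headers are added."""
    return underlay_mtu - OVERHEAD[encapsulation]


for mode in OVERHEAD:
    print(mode, overlay_mtu(1500, mode))  # none 1500, ipip 1480, vxlan 1450
```

On a standard 1500-byte Ethernet network, this leaves 1480 bytes for IP-in-IP and 1450 bytes for VxLAN.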
About Container Network Interface
CNI is a set of specifications and libraries (written in Go) that aims to ease the integration of network plugins. A CNI plugin is executed by the container runtime. It sets up the interface (and its IP) inside a network namespace and configures its connection to the host (bridge attachment and route management).
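Per the CNI specification, the runtime runs the plugin binary with environment variables such as `CNI_COMMAND` (ADD, DEL, CHECK, VERSION), `CNI_CONTAINERID`, `CNI_NETNS` and `CNI_IFNAME`, writes the network configuration to its stdin, and reads a JSON result from its stdout. The Python skeleton below only illustrates that calling convention: it does not touch any interface, the returned address is hypothetical, and the result is simplified (its exact fields differ between CNI spec versions).

```python
import json
import os
import sys

# Environment variables the CNI specification passes to every plugin call
REQUIRED_VARS = ("CNI_COMMAND", "CNI_CONTAINERID", "CNI_NETNS", "CNI_IFNAME")


def handle(env: dict, stdin_conf: str) -> dict:
    """Dispatch one CNI call; a real plugin would configure the netns here."""
    command = env.get("CNI_COMMAND")
    missing = [var for var in REQUIRED_VARS if var not in env]
    if command != "VERSION" and missing:
        raise ValueError(f"missing CNI variables: {missing}")
    conf = json.loads(stdin_conf)  # network configuration read from stdin
    if command == "ADD":
        # Placeholder: a real plugin creates a veth pair, moves one end into
        # CNI_NETNS under the name CNI_IFNAME, and asks IPAM for an address.
        return {
            "cniVersion": conf["cniVersion"],
            "interfaces": [{"name": env["CNI_IFNAME"], "sandbox": env["CNI_NETNS"]}],
            # hypothetical address, normally returned by the IPAM plugin
            "ips": [{"address": "10.10.1.5/24", "gateway": "10.10.1.1", "interface": 0}],
        }
    if command in ("DEL", "CHECK"):
        return {}  # success is signalled by exit code 0 and no output
    if command == "VERSION":
        return {"cniVersion": conf["cniVersion"], "supportedVersions": [conf["cniVersion"]]}
    raise ValueError(f"unknown CNI command: {command}")


if __name__ == "__main__" and "CNI_COMMAND" in os.environ:
    result = handle(dict(os.environ), sys.stdin.read())
    if result:
        json.dump(result, sys.stdout)
```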
The network configuration is written in a file such as /etc/cni/net.d/xx-mynet.conf.
This file usually specifies the network name and type (bridge, vlan, ipvlan, loopback, macvlan, ptp), as well as its IPAM section (dhcp, host-local) with a type and the associated subnetwork and routes. Network plugins define their own types using the same information.
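As an illustration, the Python sketch below produces such a file for the reference `bridge` plugin with `host-local` IPAM. It assumes those standard plugins are installed; the network name, bridge name, addresses and the `10-mynet.conf` file name are hypothetical and reuse the values of the bridge example.

```python
import json


def bridge_conf(name: str, bridge: str, subnet: str, gateway: str) -> dict:
    """Network configuration for the reference `bridge` plugin with host-local IPAM."""
    return {
        "cniVersion": "0.4.0",
        "name": name,
        "type": "bridge",          # plugin binary looked up in CNI_PATH
        "bridge": bridge,          # Linux bridge created or reused on the host
        "isGateway": True,         # assign the gateway IP to the bridge
        "ipMasq": True,            # masquerade traffic leaving the pod subnet
        "ipam": {
            "type": "host-local",  # allocate IPs from a local range, no DHCP server
            "subnet": subnet,
            "gateway": gateway,
            "routes": [{"dst": "0.0.0.0/0"}],  # default route through the gateway
        },
    }


conf = bridge_conf("mynet", "kubbr1", "10.10.1.0/24", "10.10.1.1")
print(json.dumps(conf, indent=2))  # e.g. content of /etc/cni/net.d/10-mynet.conf
```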
Comparison of different CNI + plugin solutions on k8s
Numerous networking solutions compatible with Kubernetes are available. They all use or create a CNI plugin.
Three solutions are mainly used: Calico, Flannel and WeaveNet. Two others, Cilium and Contiv, provide interesting features too.
In this blog post, we present these solutions and how they operate with Kubernetes.
The deployment tests were done with Kubespray.
Calico
Calico is a network solution for Kubernetes, described as simple, scalable and secure.
It supports IPv4 and IPv6. Filtering rules are implemented with Linux iptables, which isolate containers; kube-proxy, also based on iptables, handles service traffic.
In more detail: Calico can route pod traffic directly over the underlying network, without encapsulation, or be configured to use IP-in-IP (IPinIP, L3) encapsulation. With IPinIP, an IP packet is encapsulated in another IP packet whose outer header carries a source address, the entry point of the tunnel, and a destination address, its endpoint.
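The encapsulation itself is simple: an outer IPv4 header with protocol number 4 (IP-in-IP) is prepended to the original packet. The Python sketch below builds that outer header byte by byte to show the layout; it is an illustration, not how the kernel's tunnel driver is implemented, and the node addresses are hypothetical.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum (RFC 1071): ones' complement of the 16-bit word sum."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) + header[i + 1]
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipip_encapsulate(inner_packet: bytes, tunnel_src: str, tunnel_dst: str, ttl: int = 64) -> bytes:
    """Prepend an outer IPv4 header whose protocol field is 4 (IP-in-IP)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,                  # version 4, header length 5 x 32-bit words
        0,                             # DSCP / ECN
        20 + len(inner_packet),        # total length: outer header + inner packet
        0,                             # identification
        0,                             # flags / fragment offset
        ttl,
        4,                             # protocol 4 = IPv4 encapsulated in IPv4
        0,                             # checksum placeholder, filled in below
        socket.inet_aton(tunnel_src),  # tunnel entry point (source node)
        socket.inet_aton(tunnel_dst),  # tunnel endpoint (destination node)
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:] + inner_packet
```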
Calico offers two configurations for IPinIP:
- always: all traffic is encapsulated
- crossSubnet: only traffic between nodes in different subnets is encapsulated
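The difference between the two modes comes down to one test per packet. This Python sketch is a simplified model of that decision, not Calico's code; node addresses are hypothetical and given with their prefix length.

```python
from ipaddress import ip_interface


def should_encapsulate(mode: str, src_node: str, dst_node: str) -> bool:
    """Decide whether pod traffic between two nodes goes through an IPinIP tunnel.

    Node addresses include their prefix length, e.g. "192.168.0.11/24".
    """
    if mode == "always":
        return True
    if mode == "crossSubnet":
        # nodes on the same subnet reach each other directly; others need a tunnel
        return ip_interface(src_node).network != ip_interface(dst_node).network
    raise ValueError(f"unknown IPinIP mode: {mode}")
```

For example, `should_encapsulate("crossSubnet", "192.168.0.11/24", "192.168.1.12/24")` is true, while the same call with two nodes in 192.168.0.0/24 is false.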
Calico uses several components:
- Felix: the Calico agent, installed on each node. It supplies endpoints (external interfaces) and programs the iptables rules and routes of its node.
- BIRD (BGP): BGP client (and route reflector) used with confd. The Border Gateway Protocol (BGP) is a routing protocol that exchanges routing information (tables) between routers of autonomous systems. BIRD is a daemon acting as a dynamic router, used by Calico with BGP to distribute routes; as a route reflector, it centralizes route distribution.
- confd: monitors etcd for BGP configuration changes and automatically manages BIRD configurations.
- etcd or the Kubernetes API: the datastore holding Calico's data (etcd is a key-value store).
Calico supports Kubernetes NetworkPolicies. Its configuration is written in YAML in the /etc/calico/calicoctl.cfg configuration file, and it can also be managed with the calicoctl command line, which can create subnetworks or new Calico networks.
Here is an example of network creation with Calico:
calicoctl create -f - <<EOF
- apiVersion: v1 # or projectcalico.org/v3 with Calico v3
Calico networks and subnetworks can be represented as follows. In this example, two subnetworks are created and stretched across nodes, allowing pods on different hosts to communicate on the same network.
Selecting a network can be done when configuring a pod, using annotations:
"cni.projectcalico.org/ipv4pools": "[\"10.112.12.0/24\"]" # or ipv6pools if ipv6 is used
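Note that the annotation value is itself a JSON-encoded list. The Python sketch below builds a complete pod manifest carrying it; the pod name and image are hypothetical, and the pool is assumed to already exist in Calico.

```python
import json


def pod_with_ip_pools(name: str, image: str, pools: list[str]) -> dict:
    """Pod manifest asking Calico's IPAM to allocate the pod IP from given pools."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {
                # the annotation value is a JSON-encoded list of pool CIDRs
                "cni.projectcalico.org/ipv4pools": json.dumps(pools),
            },
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


manifest = pod_with_ip_pools("web", "nginx", ["10.112.12.0/24"])
print(json.dumps(manifest, indent=2))  # JSON output can be passed to kubectl apply -f
```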
Calico is installed as a DaemonSet with a container on each node and a “calico-controller” on the master. Each cluster node has 3 Calico components installed: Felix, BIRD, and confd; the master also needs a controller.
Documentation : https://docs.projectcalico.org/v3.1/usage/
Cilium
Requirements: kernel ≥ 4.8
Cilium is a network solution for Kubernetes. It uses L3/L4 for the network part and L7 for the application part.
The L7 support allows high-level filter rules to be added for web applications. It supports IPv4 and IPv6. Among the solutions compared here, Cilium is the only one to offer BPF filtering.
BPF (Berkeley Packet Filter)
BPF is a packet filter solution which can replace iptables. The filter isn’t performed at application level, but at the kernel level: it’s more efficient and secure.
Cilium uses BPF to create and apply filter rules on packets; no iptables rules are created. Filters are more effective and flexible.
Cilium relies on the cilium-agent running on each node. The agent manages operations and the filters to share with hosts: it compiles BPF filters and loads them into the host kernel.
Cilium can work with an overlay network (picture 1, default) or with native routing (picture 2). IPv4 and IPv6 can be used in both cases. As we can see in the picture below, a subnetwork is assigned to each node. Native routing is more complex to use.
With native routing, every outgoing packet is handed to the kernel's routing system, which forwards it.
Security rules are managed for IPv6; for IPv4, CIDR rules must be used. Cilium enables IP forwarding, but routes must be managed manually or with the BGP routing protocol.
Cilium adds its own network policies (CiliumNetworkPolicy, listed with kubectl get ciliumnetworkpolicy or kubectl get cnp) to filter L7 traffic such as HTTP or Kafka.
Kubernetes NetworkPolicies are applied automatically. All these configurations can be written in YAML. The example below is an L7 filter rule written as a CiliumNetworkPolicy: it filters HTTP and only allows GET requests on the "/" path from pods labeled app=web to pods whose access label is set to true.
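apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
description: "Allow HTTP GET / from app=web to access=true"
metadata: {name: "allow-get-root"}  # hypothetical name
spec:
  endpointSelector: {matchLabels: {access: "true"}}  # pods receiving the traffic
  ingress:
  - {fromEndpoints: [{matchLabels: {app: web}}], toPorts: [{ports: [{port: "80", protocol: TCP}], rules: {http: [{method: "GET", path: "/"}]}}]}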
A Kubernetes NetworkPolicy is also visible with the cilium policy get command, which returns it in JSON format. Below, a NetworkPolicy example written in YAML and applied with kubectl, and its JSON record as returned by the cilium policy get command.
Several commands are available to inspect the Cilium data state. The full list is available here: http://cilium.readthedocs.io/en/latest/cheatsheet/. Some examples, run inside a Cilium pod (cilium-qpxvw here):
kubectl exec -n kube-system cilium-qpxvw -- ls -al /sys/fs/bpf/tc/globals/  # list BPF entries
kubectl exec -n kube-system cilium-qpxvw -- cilium bpf policy list -n 19898  # display the policy with this number (iptables-like display)
kubectl exec -n kube-system cilium-qpxvw -- cilium status  # give the node status (--all-containers for all nodes)
kubectl exec -n kube-system cilium-qpxvw -- cilium policy get  # list all policies received by Cilium (JSON format)
Cilium offers an identity system and allows priority levels to be assigned based on the labels set on pods.
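The idea behind identities can be shown with a toy model: pods sharing the same labels share one numeric identity, and policy lookups are keyed on that identity rather than on IP addresses. The Python sketch below is only that model, not Cilium's data structures; the identity numbering, labels and ports are hypothetical.

```python
from itertools import count

_next_id = count(256)  # hypothetical starting value for allocated identities
_identities: dict[frozenset, int] = {}


def identity_for(labels: dict[str, str]) -> int:
    """Pods with the same label set share one numeric identity."""
    key = frozenset(labels.items())
    if key not in _identities:
        _identities[key] = next(_next_id)
    return _identities[key]


# Ingress policy of one endpoint: allowed (source identity, port, protocol) keys
policy_map: set[tuple[int, int, str]] = set()


def allow(from_labels: dict[str, str], port: int, proto: str = "TCP") -> None:
    policy_map.add((identity_for(from_labels), port, proto))


def is_allowed(src_labels: dict[str, str], port: int, proto: str = "TCP") -> bool:
    # a single lookup, independent of the source pod's IP address
    return (identity_for(src_labels), port, proto) in policy_map
```

Because the key holds an identity, scaling a deployment up or down (new pod IPs, same labels) leaves the policy entries unchanged.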
Cilium works alongside kube-proxy. However, for ClusterIP services (load balancing of pod traffic), Cilium acts as a proxy by adding and removing BPF rules on each node. When used with Istio, it relies on Envoy as a proxy.
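What ClusterIP load balancing does per connection can be sketched as a table lookup followed by backend selection. In this Python model, the service table and addresses are hypothetical, and hashing the flow is a stand-in chosen for illustration; Cilium's datapath uses BPF maps and connection tracking instead.

```python
import hashlib

# Hypothetical service table: (ClusterIP, port) -> backend pod endpoints
SERVICES = {
    ("10.96.0.10", 80): ["10.10.1.5:8080", "10.10.2.7:8080", "10.10.3.9:8080"],
}


def pick_backend(src_ip: str, src_port: int, vip: str, vport: int) -> str:
    """Translate a ClusterIP destination into one backend, per connection."""
    backends = SERVICES[(vip, vport)]
    # hash the flow so every packet of one connection reaches the same backend
    flow = f"{src_ip}:{src_port}->{vip}:{vport}".encode()
    index = int.from_bytes(hashlib.sha256(flow).digest()[:4], "big") % len(backends)
    return backends[index]
```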
Cilium is deployed as a DaemonSet on Kubernetes. It creates a bridge on each node, and these bridges communicate with the pods' veth interfaces.
Contiv
Contiv is a network solution for Kubernetes distributed by Cisco, using the VxLAN and BGP protocols.
It supports IPv4 and IPv6. It also offers Cisco ACI (Application Centric Infrastructure) integration, although Cisco provides a dedicated ACI network solution for Kubernetes. It is based on Open vSwitch for pipelines and uses etcd for key-value storage.
Contiv runs two components:
- netplugin runs on each node. It implements the CNI plugin and also manages pod interface creation, IP allocation and so on.
- netmaster runs on the master nodes as a DaemonSet. It manages network requests and sends route definitions to the netplugin components.
The components use their host IP to communicate. A default Contiv VLAN is created to mesh all the cluster nodes.
Networks can be created with the netctl command, and the available networks are listed with the netctl net ls command. IPs are allocated from pools whose Nw Type parameter is set to data.
The network selection for pods is done in the YAML definition, using labels:
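The Python sketch below builds a pod manifest that selects its network through labels. The label keys `io.contiv.tenant` and `io.contiv.network` are an assumption based on Contiv's io.contiv.* naming and should be checked against the documentation of the Contiv version in use; the pod, image and network names are hypothetical.

```python
import json


def contiv_pod(name: str, image: str, network: str, tenant: str = "default") -> dict:
    """Pod manifest selecting a Contiv network through labels.

    Assumption: label keys follow Contiv's io.contiv.* convention.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {
                "io.contiv.tenant": tenant,    # tenant owning the network
                "io.contiv.network": network,  # network created beforehand with netctl
            },
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


print(json.dumps(contiv_pod("web", "nginx", "contiv-net"), indent=2))
```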
Contiv uses overlays to define subnetworks that are shared by all the nodes of the cluster:
Contiv does not support Kubernetes NetworkPolicies, but offers its own policies, with its own API for filtering rules. It offers two rule types: Bandwidth and Isolation.
The Bandwidth type can be used to limit bandwidth for a group. The Isolation type can be used to create whitelists or blacklists for an application in a group.
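The difference between a whitelist and a blacklist is mostly the default action applied when no rule matches. The Python sketch below is a generic model of that evaluation, not Contiv's API; group names, ports and priorities are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int    # rules with a higher priority are evaluated first
    from_group: str  # source group, "*" matches any group
    port: int
    action: str      # "allow" or "deny"


def evaluate(rules: list[Rule], src_group: str, port: int, default: str) -> str:
    """Return the action of the first matching rule, or the default action.

    default="deny" gives whitelist behavior, default="allow" blacklist behavior.
    """
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.from_group in ("*", src_group) and rule.port == port:
            return rule.action
    return default


# Whitelist: only the "web" group may reach the database port
whitelist = [Rule(10, "web", 5432, "allow")]
# Blacklist: everything is allowed except the "batch" group
blacklist = [Rule(10, "batch", 5432, "deny")]
```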
The netctl command can be used for these configurations, but rules can also be created through the web interface offered by Contiv.
This interface also manages tenants, users (with optional LDAP authentication) and networks.
Contiv is a certified OpenShift plugin.
Contiv is deployed as a DaemonSet on Kubernetes. It can be deployed following the instructions provided on https://github.com/contiv/install#kubernetes-installation.
Documentation : http://contiv.github.io/documents/gettingStarted/.
Flannel
Flannel can run using several encapsulation backends, VxLAN being the recommended one (the others are more experimental). Only IPv4 is supported.
Flannel uses etcd to store its configuration and the managed networks information. It implements subnetworks on each host using a flanneld agent.
Flannel creates only one VxLAN. Every new node is attached to this VxLAN with a veth. Flannel doesn’t support running multiple networks on a single daemon, but it is possible to run multiple daemons on each host.
Flannel creates a cni0 bridge on each node and attaches veth interfaces to it. Each node manages a subnetwork of the Flannel pool. The communication is possible using VxLAN tunnels created by flanneld on each host.
It is possible to enable direct routing (the DirectRouting option of the VxLAN backend) when hosts are on the same network: VxLAN encapsulation is then used only to reach hosts on a different network.
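Flannel reads this choice from its network configuration (net-conf.json). The Python sketch below generates that file for the VxLAN backend with the DirectRouting option; the cluster CIDR reuses the value given in the deployment instructions of this section and is otherwise illustrative.

```python
import json


def flannel_net_conf(cluster_cidr: str, direct_routing: bool) -> str:
    """Content of Flannel's net-conf.json for the VxLAN backend."""
    conf = {
        "Network": cluster_cidr,  # pool split into one subnet per node
        "Backend": {
            "Type": "vxlan",
            # route directly to nodes on the same subnet, use VxLAN otherwise
            "DirectRouting": direct_routing,
        },
    }
    return json.dumps(conf, indent=2)


print(flannel_net_conf("10.32.0.0/16", direct_routing=True))
```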
Flannel only handles IPv4 traffic between cluster nodes. It focuses on networking and does not support Kubernetes NetworkPolicies.
Flannel is deployed as a DaemonSet. Before deploying it, the following options must be added to /etc/kubernetes/manifests/kube-controller-manager.manifest:
--allocate-node-cidrs=true --cluster-cidr=10.32.0.0/16. The kubelet service must then be restarted. Installation is described here.
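With these options, the controller manager carves one pod subnet per node out of the cluster CIDR (a /24 per node by default for IPv4, through --node-cidr-mask-size). The Python sketch below shows the arithmetic for 10.32.0.0/16; the actual order in which the controller manager assigns subnets to nodes may differ.

```python
from ipaddress import ip_network
from itertools import islice


def node_subnets(cluster_cidr: str, node_mask: int = 24, count: int = 3) -> list[str]:
    """Return the first `count` per-node subnets carved from the cluster CIDR."""
    pool = ip_network(cluster_cidr)
    return [str(net) for net in islice(pool.subnets(new_prefix=node_mask), count)]


def max_nodes(cluster_cidr: str, node_mask: int = 24) -> int:
    """How many nodes can receive a subnet of size /node_mask."""
    return 2 ** (node_mask - ip_network(cluster_cidr).prefixlen)


print(node_subnets("10.32.0.0/16"))  # ['10.32.0.0/24', '10.32.1.0/24', '10.32.2.0/24']
print(max_nodes("10.32.0.0/16"))     # 256
```

A /16 cluster CIDR with /24 node subnets therefore caps the cluster at 256 nodes, each with 254 usable pod addresses.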
Documentation : https://github.com/coreos/flannel#flannel
Weave Net
Requirements: kernel ≥ 3.8, Docker ≥ 1.10.0, Kubernetes ≥ 1.4, master with at least 2 CPUs.
Weave Net provides layer 2 networking over VxLAN for Kubernetes. It works with kube-proxy and kube-dns, and supports IPv4 and IPv6.
Unlike other network solutions that use etcd to store data, Weave Net saves its settings and data in a /weavedb/weave-netdata.db file and shares it with each pod created by the DaemonSet. Each of these pods takes the IP address of the node's physical interface when it is created. Each pod has two containers: weave and weave-npc (Network Policy Controller).
The weave container manages all Weave operations on the node, while the weave-npc container manages the Kubernetes NetworkPolicies.
As we can see on the picture below, a "weave" bridge is created on each node by its Weave pod. Containers are connected to the host bridge through a virtual interface, and communication between hosts is encapsulated with VxLAN.
It is possible to define a Weave password to encrypt network communication (documentation: configuration options). More information about encryption in Weave: https://github.com/weaveworks/weave/issues/3086
It is possible to define more subnetworks. Subnetwork allocation is handled by IPAM and can be done in different modes:
- seed: start with an initial cluster made of a fixed number of devices, each of which is assigned a subnetwork. Devices can then be added to the cluster, either as a permanent part of it (a new fixed device) or dynamically according to requirements, in which case they can later be removed.
- consensus: networks are determined through a consensus algorithm (default choice). This mode suits interactive use of Weave, or fixed clusters where devices are rarely added or removed. It uses the weave prime command, which automatically initializes IP allocation on the nodes.
- observer: nodes are added with an observer role. When it needs addresses (for example when a node is overloaded), an observer asks another node to split its IP pool. This allows cluster nodes to be added dynamically according to requirements (autoscaling).
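Splitting an IP pool between peers can be pictured with a toy model: a peer owning a range hands half of it to a peer that has none. The Python sketch below illustrates only that idea and is not Weave's actual IPAM protocol; the range 10.32.0.0/12 and the peer names are hypothetical.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address


@dataclass
class Peer:
    name: str
    # address ranges owned by the peer, as [start, end) integer pairs
    ranges: list[tuple[int, int]] = field(default_factory=list)

    def size(self) -> int:
        return sum(end - start for start, end in self.ranges)


def donate_half(owner: Peer, requester: Peer) -> None:
    """The owner hands the upper half of its largest range to the requester."""
    start, end = max(owner.ranges, key=lambda r: r[1] - r[0])
    if end - start < 2:
        raise ValueError("range too small to split")
    middle = start + (end - start) // 2
    owner.ranges.remove((start, end))
    owner.ranges.append((start, middle))
    requester.ranges.append((middle, end))


# Hypothetical range 10.32.0.0/12 (2**20 addresses) owned by one peer
base = int(ip_address("10.32.0.0"))
seed = Peer("node-1", [(base, base + 2**20)])
observer = Peer("node-2")  # joins with no addresses of its own
donate_half(seed, observer)
print(ip_address(observer.ranges[0][0]), observer.size())  # 10.40.0.0 524288
```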
Several installation modes are available:
- WeaveNet runs directly on each host.
- WeaveNet is deployed as a DaemonSet on each Kubernetes node (see this link).
Documentation : https://www.weave.works/docs/net/latest/kubernetes/kube-addon/
Conclusion
Network plugins are a real advantage and make network management easier. Each provides a different set of features.
For a POC, or to set up the network quickly, Flannel or WeaveNet is the best choice.
Calico, Contiv and Cilium can use an underlay network (including BGP) directly and avoid VxLAN encapsulation.
Some solutions (Calico, Contiv) can add several virtual networks spanning the whole cluster, so pods on different nodes can connect to the same network.
Cilium is more security focused and offers application layer filtering. It uses BPF to filter at the kernel level, and BPF filtering offers better performance than iptables filtering.
|Solution|Kubernetes networkPolicies|IPv6|Layers used|Networks|Deployment|Command line|Note|
|---|---|---|---|---|---|---|---|
|Calico|Yes|Yes|L3 (IPinIP, BGP)|Many networks on cluster|DaemonSet| | |
|Cilium|Yes + ciliumNetworkPolicies|Yes|L3/L4 + L7 (filtering)|Subnetworks by nodes|DaemonSet| |Can replace kube-proxy (BPF filtering)|
|Contiv|No|Yes|L2 (VxLAN) / L3 (BGP)|Many networks on cluster|DaemonSet| | |
|Flannel|No|No|L2 (VxLAN)|Many networks on same cluster with multiple daemons|DaemonSet|No| |
|Weave Net|Yes|Yes|L2 (VxLAN)|Subnetworks by nodes|DaemonSet|No|Encryption possible, does not use etcd|