OpenStack L3-GRE Network Architecture Analysis (Icehouse), Part 5: Multiple External Networks

In such an environment, all traffic between the compute nodes and the Internet passes through a single network node, and that network node has only one Internet uplink. In practice, a deployment often has to connect to lines from several carriers, so what should be done when one network node needs to connect to multiple external networks at the same time?

OpenStack L3-GRE Network Architecture Analysis (Icehouse), Part 2

The previous article analyzed the network structure on the compute nodes for a single tenant under different network scenarios, but it did not describe how a packet makes its way to the public network once it reaches the network node.


From the previous article we know that, for this tenant's network, the VM instances on the compute nodes are effectively joined at layer 2 with the virtual router on the network node through GRE tunnels, forming a single logical layer-2 segment.


So how exactly is this layer-2 segment stitched together?

Let's take a look at the OVS bridges on the network node.


When a packet arrives at br-tun through the GRE tunnel, it is handed to br-int via the patch-int port on br-tun, which is connected to the patch-tun port on br-int. Because each tunnel ID is mapped to a local VLAN on br-int, the packet is tagged with the correct VLAN ID and delivered to the right VLAN segment. Each of these VLANs has a qr-* interface connected to the virtual router, and that interface is in fact the gateway of the corresponding network, so the packet reaches the router. The virtual router then forwards the packet out of its qg-* interface (the router's default route points to the gateway of the real external network). The qg-* interface sits on the br-ex bridge, which is bridged to the physical NIC eth2, and the packet leaves the host.
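The bridge and patch-port layout described above can be checked directly on the network node. The following is a minimal sketch that wraps the standard Open vSwitch CLI tools in Python; the bridge names (br-tun, br-int, br-ex) match this environment, while the exact ports and flows shown will depend on the deployment.

    import subprocess

    def run(cmd):
        # Run a command and print its output (run as root on the network node).
        print("$ " + " ".join(cmd))
        print(subprocess.check_output(cmd).decode())

    # Show all bridges and their ports, including the patch-int/patch-tun pair
    # that connects br-tun to br-int, and the qg-* port plugged into br-ex.
    run(["ovs-vsctl", "show"])

    # Dump the flow table of br-tun: the rules that map a GRE tunnel ID to a
    # local VLAN tag on br-int are visible here.
    run(["ovs-ofctl", "dump-flows", "br-tun"])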


The qr-* and qg-* interfaces described above can be matched against the interfaces of the virtual router by listing the interfaces inside its network namespace.

So how does this virtual router forward packets? Its interfaces and routing table both live inside that namespace and can be inspected as sketched below.
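A minimal sketch for inspecting the router namespace on the network node. The router UUID below is a placeholder; substitute the ID reported by the "neutron router-list" command.

    import subprocess

    # Placeholder: replace with the actual router UUID from "neutron router-list".
    ROUTER_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    NAMESPACE = "qrouter-" + ROUTER_ID

    def in_namespace(args):
        # Run a command inside the router's network namespace (run as root).
        cmd = ["ip", "netns", "exec", NAMESPACE] + args
        print("$ " + " ".join(cmd))
        print(subprocess.check_output(cmd).decode())

    # The qr-* (tenant gateway) and qg-* (external) interfaces and their addresses.
    in_namespace(["ip", "addr", "show"])

    # The routing table: the default route points at the external network's gateway.
    in_namespace(["ip", "route", "show"])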

 

When a packet passes through the router on its way to an external network, the iptables rules inside the virtual router's namespace perform the relevant NAT operations.

Those NAT rules also show a one-to-one static mapping between 192.168.0.3 and 10.10.20.7: traffic leaving 10.10.20.7 is source-NATed to 192.168.0.3 after being routed, and traffic from outside addressed to 192.168.0.3 is delivered straight to 10.10.20.7. This is how an internal VM becomes reachable from the external network.
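As a rough sketch, those rules can be dumped from inside the router namespace as shown below. The router UUID is again a placeholder, and the exact chain names Neutron generates vary by release, so the comments describe the expected shape of the rules rather than a literal listing.

    import subprocess

    # Placeholder router UUID; substitute the real one from "neutron router-list".
    NAMESPACE = "qrouter-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

    # Dump the NAT table inside the router namespace (run as root).
    # For a floating IP you should see, roughly:
    #   - a DNAT rule translating the floating IP 192.168.0.3 to the fixed IP 10.10.20.7
    #   - an SNAT rule translating 10.10.20.7 back to 192.168.0.3 on the way out
    #   - a catch-all SNAT rule for tenant traffic that has no floating IP
    rules = subprocess.check_output(
        ["ip", "netns", "exec", NAMESPACE, "iptables", "-t", "nat", "-S"]
    ).decode()
    print(rules)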


In the Horizon dashboard, under Access & Security, add an ingress TCP rule with destination port 22 so that the instance can be reached over SSH from outside.
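The same rule can also be added programmatically. The snippet below is a sketch using python-neutronclient; the credentials are placeholders, and it assumes the instance belongs to the tenant's "default" security group.

    from neutronclient.v2_0 import client

    # Placeholder credentials; use your own Keystone endpoint and tenant.
    neutron = client.Client(username="admin",
                            password="secret",
                            tenant_name="demo",
                            auth_url="http://controller:5000/v2.0")

    # Find the tenant's default security group.
    sg = neutron.list_security_groups(name="default")["security_groups"][0]

    # Allow inbound TCP/22 (SSH) from anywhere, matching the rule added in Horizon.
    neutron.create_security_group_rule({
        "security_group_rule": {
            "security_group_id": sg["id"],
            "direction": "ingress",
            "protocol": "tcp",
            "port_range_min": 22,
            "port_range_max": 22,
            "remote_ip_prefix": "0.0.0.0/0",
        }
    })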



OpenStack Redundant HA Architecture Design

Overview

Before you install any hardware or software, you must know what you’re trying to achieve. This section looks at the basic components of an OpenStack infrastructure and organizes them into one of the more common reference architectures. You’ll then use that architecture as a basis for installing OpenStack in the next section.

As you know, OpenStack provides the following basic services:

Compute:
Compute servers are the workhorses of your installation; they’re the servers on which your users’ virtual machines are created. nova-compute controls the life cycle of these VMs.
Networking:

Typically, an OpenStack environment includes multiple servers that need to communicate with each other and with the outside world. Fuel supports both the older nova-network and the newer Neutron-based OpenStack Networking implementations:

  • With nova-network, Flat-DHCP and VLAN modes are available.
  • With neutron, GRE tunnels or VLANs can be used for network segmentation.
Storage:

OpenStack requires block and object storage to be provisioned. Fuel provides the following storage options out of the box:

  • Cinder LVM provides persistent block storage to virtual machines over the iSCSI protocol.
  • The Swift object store can be used by Glance to store VM images and snapshots; it may also be used directly by applications.
  • Ceph combines object and block storage and can replace either one or both of the above.

Compute, Networking, and Storage services can be combined in many different ways. Out of the box, Fuel supports the following deployment configurations:

Multi-node Deployment

In a production environment, you will likely never have a Multi-node deployment of OpenStack, partly because it forces you to make a number of compromises as to the number and types of services that you can deploy. It is, however, extremely useful if you just want to see how OpenStack works from a user’s point of view.

More commonly, your OpenStack installation will consist of multiple servers. Exactly how many is up to you, of course, but the main idea is that your controller(s) are separate from your compute servers, on which your users’ VMs will actually run. One arrangement that will enable you to achieve this separation while still keeping your hardware investment relatively modest is to house your storage on your controller nodes.

Multi-node with HA Deployment

Production environments typically require high availability, which involves several architectural requirements. Specifically, you will need at least three controllers, and certain components will be deployed in multiple locations to prevent single points of failure. That’s not to say, however, that you can’t reduce hardware requirements by combining your storage, network, and controller nodes:

We’ll take a closer look at the details of this deployment configuration in the Details of Multi-node with HA Deployment section.

Details of Multi-node with HA Deployment

OpenStack services are interconnected by RESTful HTTP-based APIs and AMQP-based RPC messages. So redundancy for stateless OpenStack API services is implemented through the combination of Virtual IP (VIP) management using Pacemaker and load balancing using HAProxy. Stateful OpenStack components, such as the state database and messaging server, rely on their respective active/active and active/passive modes for high availability. For example, RabbitMQ uses built-in clustering capabilities, while the database uses MySQL/Galera replication.

Let’s take a closer look at what an OpenStack deployment looks like, and what it will take to achieve high availability for an OpenStack deployment.

Red Hat OpenStack Architectures

Red Hat has partnered with Mirantis to offer an end-to-end supported distribution of OpenStack powered by Fuel. Because Red Hat offers support for a subset of all available open source packages, the reference architecture has been slightly modified to meet Red Hat’s support requirements to provide a highly available OpenStack environment.

Below is the list of modifications:

Database backend:
MySQL with Galera has been replaced with native replication in a Master/Slave configuration. The MySQL master is elected via Corosync, and master/slave status is managed via Pacemaker.
Messaging backend:
RabbitMQ has been replaced with Qpid. Qpid is an AMQP provider that Red Hat offers, but it cannot be clustered in Red Hat’s offering. As a result, Fuel configures three non-clustered, independent Qpid brokers. Fuel still offers HA for the messaging backend via virtual IP management provided by Corosync.
Nova networking:
Neutron (Quantum) is not available for Red Hat OpenStack because the Red Hat kernel lacks GRE tunneling support for Open vSwitch. This issue should be fixed in a future release. As a result, Fuel for Red Hat OpenStack Platform will only support Nova networking.

Multi-node Red Hat OpenStack Deployment

In a production environment, you will likely never have a Multi-node deployment of OpenStack, partly because it forces you to make a number of compromises as to the number and types of services that you can deploy. It is, however, extremely useful if you just want to see how OpenStack works from a user’s point of view.

More commonly, your OpenStack installation will consist of multiple servers. Exactly how many is up to you, of course, but the main idea is that your controller(s) are separate from your compute servers, on which your users’ VMs will actually run. One arrangement that will enable you to achieve this separation while still keeping your hardware investment relatively modest is to house your storage on your controller nodes.

Multi-node with HA Red Hat OpenStack Deployment

Production environments typically require high availability, which involves several architectural requirements. Specifically, you will need at least three controllers, and certain components will be deployed in multiple locations to prevent single points of failure. That’s not to say, however, that you can’t reduce hardware requirements by combining your storage, network, and controller nodes:

OpenStack services are interconnected by RESTful HTTP-based APIs and AMQP-based RPC messages. So redundancy for stateless OpenStack API services is implemented through the combination of Virtual IP (VIP) management using Corosync and load balancing using HAProxy. Stateful OpenStack components, such as the state database and messaging server, rely on their respective active/passive modes for high availability. For example, MySQL uses built-in replication capabilities (plus the help of Pacemaker), while QPID is offered in three independent brokers with virtual IP management to provide high availability.

HA Logical Setup

An OpenStack Multi-node HA environment involves three types of nodes: controller nodes, compute nodes, and storage nodes.

Controller Nodes

The first order of business in achieving high availability (HA) is redundancy, so the first step is to provide multiple controller nodes.

As you may recall, the database uses Galera to achieve HA, and Galera is a quorum-based system. That means you should have at least three controller nodes.

Every OpenStack controller runs HAProxy, which manages a single External Virtual IP (VIP) for all controller nodes and provides HTTP and TCP load balancing of requests going to OpenStack API services, RabbitMQ, and MySQL.

When an end user accesses the OpenStack cloud using Horizon or makes a request to the REST API for services such as nova-api, glance-api, keystone-api, quantum-api, nova-scheduler, MySQL or RabbitMQ, the request goes to the live controller node currently holding the External VIP, and the connection gets terminated by HAProxy. When the next request comes in, HAProxy handles it, and may send it to the original controller or another in the environment, depending on load conditions.
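To make the request flow concrete, here is a minimal sketch of a client call going through the VIP. The VIP address is a placeholder; HAProxy listening on the Keystone public port forwards the request to any healthy keystone-api backend, so the client never needs to know which controller actually served it.

    import requests

    # Placeholder External VIP currently held by one of the controllers.
    KEYSTONE_ENDPOINT = "http://192.168.0.10:5000/v2.0/"

    # The client simply talks to the VIP; HAProxy terminates the connection and
    # load-balances it across the keystone-api processes on the controllers.
    response = requests.get(KEYSTONE_ENDPOINT)
    print(response.status_code, response.json())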

Each of the services housed on the controller nodes has its own mechanism for achieving HA:

  • nova-api, glance-api, keystone-api, quantum-api and nova-scheduler are stateless services that do not require any special attention besides load balancing.
  • Horizon, as a typical web application, requires sticky sessions to be enabled at the load balancer.
  • RabbitMQ provides active/active high availability using mirrored queues.
  • MySQL high availability is achieved through Galera active/active multi-master deployment and Pacemaker.
  • Quantum agents are managed by Pacemaker.
  • Ceph monitors implement their own quorum-based HA mechanism and require time synchronization between all nodes. Clock drift of more than 50 ms may break the quorum or even crash the Ceph service.

Compute Nodes

OpenStack compute nodes are, in many ways, the foundation of your environment; they are the servers on which your users will create their Virtual Machines (VMs) and host their applications. Compute nodes need to talk to controller nodes and reach out to essential services such as RabbitMQ and MySQL. They use the same approach that provides redundancy to the end-users of Horizon and REST APIs, reaching out to controller nodes using the VIP and going through HAProxy.

Neutron Networking: Neutron Routers and the L3 Agent

Neutron L3 Agent / What is it and how does it work?

Neutron has an API extension that allows administrators and tenants to create “routers” that connect to L2 networks. The agent that implements this, known as neutron-l3-agent, uses the Linux IP stack and iptables to perform L3 forwarding and NAT. In order to support multiple routers with potentially overlapping IP addresses, neutron-l3-agent defaults to using Linux network namespaces to provide isolated forwarding contexts. Like the DHCP namespaces that exist for every network defined in Neutron, each router will have its own namespace with a name based on its UUID.
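A quick way to see these namespaces on the node running the L3 and DHCP agents is sketched below; the qrouter-* and qdhcp-* names follow the UUID convention described above.

    import subprocess

    # List all network namespaces on the host (run as root; requires iproute2).
    output = subprocess.check_output(["ip", "netns", "list"]).decode()

    # Each Neutron router gets a qrouter-<uuid> namespace, and each network with
    # DHCP enabled gets a qdhcp-<uuid> namespace.
    for line in output.splitlines():
        name = line.split()[0]
        if name.startswith(("qrouter-", "qdhcp-")):
            print(name)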

Network Design / Implementing Neutron Routers

While deploying instances using provider networks is suitable in many cases, there is a limit to the scalability of such environments. Multiple flat networks require corresponding bridge interfaces, and using VLANs may require manual switch and gateway configuration. All routing is handled by an upstream routing device such as a router or firewall, and said device may also be responsible for NAT as well. Any benefits are quickly outweighed by manual control and configuration processes.

Using the neutron-l3-agent allows admins and tenants to create routers that handle routing between directly-connected LAN interfaces (usually tenant networks, GRE or VLAN) and a single WAN interface (usually a FLAT or VLAN provider network). While it is possible to leverage a single bridge for this purpose (as is often the documented solution), the ability to use already-existing provider networks is my preferred solution.
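As a sketch of what that looks like through the API, the snippet below creates a tenant router, sets its gateway to an existing provider network, and attaches a tenant subnet. It uses python-neutronclient; the credentials, the GATEWAY_NET network name (defined later in this post), and the subnet ID are placeholders for your own environment.

    from neutronclient.v2_0 import client

    # Placeholder credentials for the tenant.
    neutron = client.Client(username="demo",
                            password="secret",
                            tenant_name="demo",
                            auth_url="http://controller:5000/v2.0")

    # Create the router; the L3 agent will host it in a qrouter-<uuid> namespace.
    router = neutron.create_router({"router": {"name": "demo-router"}})["router"]

    # Use an existing provider network (GATEWAY_NET) as the WAN side.
    ext_net = neutron.list_networks(name="GATEWAY_NET")["networks"][0]
    neutron.add_gateway_router(router["id"], {"network_id": ext_net["id"]})

    # Attach a tenant subnet as a directly connected LAN interface (placeholder ID).
    tenant_subnet_id = "replace-with-your-tenant-subnet-id"
    neutron.add_interface_router(router["id"], {"subnet_id": tenant_subnet_id})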

Neutron L3 Agent / Floating IPs

One of the limitations of strictly using provider networks to provide connectivity to instances is that public connectivity directly to instances must be provided by a device outside of Neutron’s control. In the previous walkthroughs, the Cisco ASA in the environment handled both 1:1 static NAT and a many-to-one PAT for outbound connectivity.

In nova-networking, the concept of a “floating ip” is best understood as a 1:1 NAT translation that could be modified on-the-fly and “float” between instances. The IP address used as the floating IP was an address in the same L2 domain as the bridge of the hypervisors (or something routed to the hypervisor.) Assuming multi_host was true, an iptables SNAT/DNAT rule was created on the hypervisor hosting the instance. If the user wanted to reassociate the floating IP with another instance, the rule was removed and reapplied on the appropriate hypervisor using the same floating IP and the newly-associated instance address. Instance IPs never changed – only NAT rules.

Neutron’s implementation of floating IPs differs greatly from nova-network’s, but retains many of the same concepts and functionality. Neutron routers are created that serve as the gateway for instances and are scheduled to a node running neutron-l3-agent. Rather than manipulating iptables on the hypervisors themselves, iptables in the router namespace is modified to perform the appropriate NAT translations. The floating IPs themselves are procured from the provider network that is providing the router with its public connectivity. This means floating IPs are limited to the same L3 network as the router’s WAN IP address.
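As a sketch of that workflow through the API: allocate a floating IP from the external network and associate it with an instance's port, which causes the L3 agent to program the corresponding NAT rules in the router namespace. The credentials, network name, and port ID below are placeholders.

    from neutronclient.v2_0 import client

    # Placeholder credentials.
    neutron = client.Client(username="demo",
                            password="secret",
                            tenant_name="demo",
                            auth_url="http://controller:5000/v2.0")

    # The external (provider) network that gives the router its public connectivity.
    ext_net = neutron.list_networks(name="GATEWAY_NET")["networks"][0]

    # Allocate a floating IP from that network.
    fip = neutron.create_floatingip(
        {"floatingip": {"floating_network_id": ext_net["id"]}}
    )["floatingip"]

    # Associate it with the Neutron port of the target instance (placeholder ID).
    instance_port_id = "replace-with-the-instance-port-id"
    neutron.update_floatingip(fip["id"],
                              {"floatingip": {"port_id": instance_port_id}})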

A logical representation of this concept can be seen below:

While logically it appears that floating IPs are associated directly with instances, in reality a floating IP is associated with a Neutron port. Other port associations include:

  • security groups
  • fixed ips
  • mac addresses

A port is associated with an instance through the “device_id” field shown in the output of the “port-show” command.
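The lookup can also be done the other way around; the sketch below lists the ports whose device_id matches a given Nova instance UUID (a placeholder here), using python-neutronclient.

    from neutronclient.v2_0 import client

    # Placeholder credentials.
    neutron = client.Client(username="demo",
                            password="secret",
                            tenant_name="demo",
                            auth_url="http://controller:5000/v2.0")

    # Placeholder: the UUID of the Nova instance of interest.
    instance_id = "replace-with-the-instance-uuid"

    # Every port whose device_id equals the instance UUID belongs to that instance;
    # each entry carries the fixed IPs, MAC address, and security group associations.
    for port in neutron.list_ports(device_id=instance_id)["ports"]:
        print(port["id"], port["mac_address"], port["fixed_ips"])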

Networking / Layout

For this installment, a Cisco ASA 5510 will once again serve as the lead gateway device. In fact, I’ll be building upon the configuration already in place from the flat and/or VLAN networking demonstration in the previous installments:

10.240.0.0/24 will continue to serve as the management network for hosts. A single VLAN network will be created to demonstrate the ability to use either a flat or VLAN network.

  • VLAN 10 – MGMT – 10.240.0.0/24
  • VLAN 20 – GATEWAY_NET – 192.168.100.0/22

A single interface on the servers will be used for both management and provider network connectivity. Neutron works with Open vSwitch to build peer-to-peer tunnels between hosts that serve to carry encapsulated tenant network traffic.

Networking / L3 Agent Configuration


Neutron Networking: VLAN Provider Networks

In this multi-part blog series I intend to dive into the various components of the OpenStack Neutron project, and also to provide working examples of networking configurations for clouds built with Rackspace Private Cloud powered by OpenStack on Ubuntu 12.04 LTS.

In the previous installment, Neutron Networking: Simple Flat Network, I demonstrated an easy method of providing connectivity to instances using an untagged flat network. In this third installment, I’ll describe how to build multiple provider networks using 802.1q VLAN tagging.

Getting Started / VLAN vs Flat Design

One of the negative aspects of a flat network is that it’s one large broadcast domain. Virtual Local Area Networks, or VLANs, aim to solve this problem by creating smaller, more manageable broadcast domains. From a security standpoint, flat networks provide malicious users the potential to see the entire network from a single host.

VLAN segregation is often used in a web hosting environment where there’s one vlan for web servers (DMZ) and another for database servers (INSIDE). Neither network can communicate directly without a routing device to route between them. With proper security mechanisms in place, if a server becomes compromised in the DMZ it does not have the ability to determine or access the resources in the INSIDE vlan.

The diagrams below are examples of traditional flat and vlan-segregated networks:

VLAN Tagging / What is it and how does it work?

At a basic level on a Cisco switch there are two types of switchports: access ports and trunk ports. Switchports configured as access ports are placed into a single vlan and can communicate with other switchports in the same vlan. Switchports configured as trunks allow traffic from multiple vlans to traverse a single interface. The switch adds a tag to the Ethernet frame that contains the corresponding vlan ID as the frame enters the trunk. As the frame exits the trunk on the other side, the vlan tag is stripped and the traffic forwarded to its destination. Common uses of trunk ports include uplinks to other switches and more importantly in our case, hypervisors serving virtual machines from various networks.

VLAN Tagging / How does this apply to Neutron?

In the previous installment I discussed flat networks and their lack of vlan tagging. All hosts in the environment were connected to access ports in the same vlan, thereby allowing hosts and instances to communicate with one another on the same network. VLANs allow us to not only separate host and instance traffic, but to also create multiple networks for instances similar to the DMZ and INSIDE scenarios above.

Neutron allows users to create multiple provider or tenant networks using vlan IDs that correspond to real vlans in the data center. A single OVS bridge can be utilized by multiple provider and tenant networks using different vlan IDs, allowing instances to communicate with other instances across the environment, and also with dedicated servers, firewalls, load balancers and other networking gear on the same Layer 2 vlan.
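As a sketch, creating one of the provider networks used later in this installment (the DMZ network on VLAN 200) through the API looks roughly like the following. The admin credentials and the physical network label physnet1 are placeholders; the label must match the bridge_mappings configured for the OVS agent on the hosts, and the gateway address is assumed to live on the upstream ASA.

    from neutronclient.v2_0 import client

    # Placeholder admin credentials; provider attributes require admin rights.
    neutron = client.Client(username="admin",
                            password="secret",
                            tenant_name="admin",
                            auth_url="http://controller:5000/v2.0")

    # Create a provider network mapped to a real data-center VLAN.
    # "physnet1" is a placeholder that must match the OVS agent's bridge_mappings.
    network = neutron.create_network({
        "network": {
            "name": "DMZ",
            "provider:network_type": "vlan",
            "provider:physical_network": "physnet1",
            "provider:segmentation_id": 200,
            "shared": True,
        }
    })["network"]

    # Attach a subnet so instances receive addresses from the DMZ range.
    # The gateway address is an assumption (the ASA interface on VLAN 200).
    neutron.create_subnet({
        "subnet": {
            "network_id": network["id"],
            "ip_version": 4,
            "cidr": "192.168.100.0/24",
            "gateway_ip": "192.168.100.1",
        }
    })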

Networking / Layout

For this installment, a Cisco ASA 5510 will once again serve as the lead gateway device. In fact, I’ll be building upon the configuration already in place from the flat networking demonstration in the previous installment. 10.240.0.0/24 will continue to serve as the management network for hosts and the flat provider network, and two new provider networks will be created:

  • VLAN 100 – MGMT – 10.240.0.0/24 (Existing)
  • VLAN 200 – DMZ – 192.168.100.0/24 (NEW)
  • VLAN 300 – INSIDE – 172.16.0.0/24 (NEW)

A single interface on the servers will be used for both management and provider network connectivity.

Networking / Configuration of Network Devices
