docker基础知识之network namespace

network namespace用来隔离Linux系统的网络设备、ip地址、端口号、路由表、防火墙等网络资源。用户可以随意将虚拟网络设备分配到自定义的networknamespace里,而连接真实硬件的物理设备则只能放在系统的根networknamesapce中。

一个物理的网络设备最多存在于一个network namespace,可以通过创建veth pair在不同的network namespace之间创建通道,来达到通讯的目的。

容器的bridge模式的实现思路为创建一个veth pair,一端放置在新的namespace,通常命名为eth0,另外一端放在原先的namespace中连接物理网络设备,以此实现网络通信。

docker daemon负责在宿主机上创建veth pair,把一端绑定到docker0网桥,另一端到新建的network namespace进程中。建立的过程中,docker daemon和dockerinit通过pipe进行通讯。

一、测试例子

测试network namespace的过程比较复杂。

docker默认采用的为bridge模式,在容器所在的宿主机上看到的网卡情况如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
[root@localhost software]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:6c:3e:95 brd ff:ff:ff:ff:ff:ff
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:a5:78:ca brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:a3:75:00:16 brd ff:ff:ff:ff:ff:ff
18: veth71f2650@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether ca:05:f7:db:6f:4c brd ff:ff:ff:ff:ff:ff link-netnsid 0

其中的enp0s3和enp0s8可以忽略,为虚拟机使用的网卡。docker0和veth71f2650@if17是需要关注的网卡。

1
2
3
[root@localhost software]# brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.0242a3750016	no		veth71f2650

下面的操作为在已经运行docker的虚拟机上的,以便于跟docker进行比较。

以下命令根据coolshell中的步骤进行配置,并对执行命令的顺序进行了调整。

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
# 增加network namespace ns1
[root@localhost software]# ip netns add ns1
[root@localhost software]# ip netns
ns1

# 激活namespace ns1中的lo设备
[root@localhost software]# ip netns exec ns1   ip link set dev lo up

# 创建veth pair
[root@localhost software]# ip link add veth-ns1 type veth peer name lxcbr0.1
# 多出了lxcbr0.1@veth-ns1和veth-ns1@lxcbr0.1两个设备
# 后面的操作步骤中将lxcbr0.1位于主网络命名空间中,veth-ns1位于ns1命名空间中
[root@localhost software]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:6c:3e:95 brd ff:ff:ff:ff:ff:ff
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:a5:78:ca brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:a3:75:00:16 brd ff:ff:ff:ff:ff:ff
18: veth71f2650@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether ca:05:f7:db:6f:4c brd ff:ff:ff:ff:ff:ff link-netnsid 0
19: lxcbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether c6:b7:4d:7f:f8:90 brd ff:ff:ff:ff:ff:ff
20: lxcbr0.1@veth-ns1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c6:8a:26:3d:ba:de brd ff:ff:ff:ff:ff:ff
21: veth-ns1@lxcbr0.1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:03:22:93:d6:f4 brd ff:ff:ff:ff:ff:ff

# 将设备veth-ns1放入到ns1命名空间中
[root@localhost software]# ip link set veth-ns1 netns ns1
# 可以看到veth-ns1设备在当前命名空间消失了
[root@localhost software]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:6c:3e:95 brd ff:ff:ff:ff:ff:ff
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:a5:78:ca brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:a3:75:00:16 brd ff:ff:ff:ff:ff:ff
18: veth71f2650@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether ca:05:f7:db:6f:4c brd ff:ff:ff:ff:ff:ff link-netnsid 0
19: lxcbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether c6:b7:4d:7f:f8:90 brd ff:ff:ff:ff:ff:ff
20: lxcbr0.1@if21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c6:8a:26:3d:ba:de brd ff:ff:ff:ff:ff:ff link-netnsid 1
# 同时在命名空间ns1中看到了设备veth-ns1,同时可以看到veth-ns1设备的状态为DOWN
[root@localhost software]# ip netns exec ns1  ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
21: veth-ns1@if20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:03:22:93:d6:f4 brd ff:ff:ff:ff:ff:ff link-netnsid 0

# 将ns1中的veth-ns1设备更名为eth0
[root@localhost software]# ip netns exec ns1  ip link set dev veth-ns1 name eth0
[root@localhost software]# ip netns exec ns1  ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
21: eth0@if20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:03:22:93:d6:f4 brd ff:ff:ff:ff:ff:ff link-netnsid 0

# 为容器中的网卡分配一个IP地址,并激活它
[root@localhost software]# ip netns exec ns1 ifconfig eth0 192.168.10.11/24 up
# 可以看到eth0网卡上有ip地址
[root@localhost software]# ip netns exec ns1  ifconfig
eth0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.10.11  netmask 255.255.255.0  broadcast 192.168.10.255
        ether f2:03:22:93:d6:f4  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# 添加一个网桥lxcbr0,类似于docker中的docker0
[root@localhost software]# brctl addbr lxcbr0
[root@localhost software]# brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.0242a3750016	no		veth71f2650
lxcbr0		8000.000000000000	no

# 关闭生成树协议,默认该协议为关闭状态
[root@localhost software]# brctl stp lxcbr0 off
[root@localhost software]# brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.0242a3750016	no		veth71f2650
lxcbr0		8000.000000000000	no

# 为网桥配置ip地址
ifconfig lxcbr0 192.168.10.1/24 up
[root@localhost software]# ifconfig lxcbr0
lxcbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.10.1  netmask 255.255.255.0  broadcast 192.168.10.255
        inet6 fe80::c4b7:4dff:fe7f:f890  prefixlen 64  scopeid 0x20<link>
        ether c6:b7:4d:7f:f8:90  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 648 (648.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# 将veth设备中的其中一个lxcbr0.1添加到网桥lxcbr0上
[root@localhost software]# brctl addif lxcbr0 lxcbr0.1
# 可以看到网桥lxcbr0中已经包含了设备lxcbr0.1
[root@localhost software]# brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.0242a3750016	no		veth71f2650
lxcbr0		8000.c68a263dbade	no		lxcbr0.1

# 为网络空间ns1增加默认路由规则,出口为网桥ip地址
[root@localhost software]# ip netns exec ns1     ip route add default via 192.168.10.1
[root@localhost software]# ip netns exec ns1 ip route
default via 192.168.10.1 dev eth0
192.168.10.0/24 dev eth0 proto kernel scope link src 192.168.10.11

# 为ns1增加resolv.conf
[root@localhost software]# mkdir -p /etc/netns/ns1
[root@localhost software]# echo "nameserver 8.8.8.8" > /etc/netns/ns1/resolv.conf

二、常用命令

1. 列出当前的network namespace

1.1 使用lsns命令

lsns命令通过读取/proc/${pid}/ns目录下进程所属的命名空间来实现,如果是通过ip netns add场景的命名空间,但是没有使用该命名空间的进程,该命令是看不到的。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# lsns -t net
        NS TYPE NPROCS   PID USER  COMMAND
4026531956 net     383     1 root  /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026532490 net       1  1026 rtkit /usr/libexec/rtkit-daemon
4026532762 net       2 24872 root  /pause
4026532866 net      20 25817 root  /pause
4026532965 net       3 30763 root  /pause
4026533059 net       3  2794 root  /bin/sh -c python /usr/src/app/clean.py "${endpoints}" "${expire}"
4026533163 net       2  1122 102   /docker-java-home/jre/bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan
4026533266 net       4 13920 root  /pause
4026533371 net       2  1844 root  /pause
4026533559 net       3  1067 root  sleep 4

1.2 通过ip netns命令

该命令仅会列出有名字的namespace,对于未命名的不能显示。

  • ip netns identify ${pid} 可以找到进程所属的网络命名空间
  • ip netns list: 显示所有有名字的namespace

2. 通过pid进入具体的network namespace

2.1 通过nsenter命令

nsenter --target $PID --net可以进入到对应的命名空间

2.2 docker --net参数

docker提供了--net参数用于加入另一个容器的网络命名空间docker run -it --net container:7835490487c1 busybox ifconfig

2.3 setns系统调用

一个进程可以通过setns()系统调用来进入到另外一个namespace中。

编写setns.c程序,该程序会进入到进程id所在的网络命令空间,并使用gcc setns.c -o setns进行编译,编译完成后执行./setns /proc/4913/ns/net ifconfig可以看到网卡的信息为容器中的网卡信息。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  int fd = open(argv[1], O_RDONLY);
  if (setns(fd, 0) == -1) {
    perror("setns");
    exit(-1);
  }
  execvp(argv[2], &argv[2]);
  printf("execvp exit\n");
}

如果执行./setns /proc/4913/ns/net /bin/bash,在宿主机上查看docker进程和/bin/bash进程的网络命名空间/proc/${pid}/ns/net,会发现都指向lrwxrwxrwx 1 root root 0 Sep 14 14:42 net -> net:[4026532133]同一个位置。

3. pid的获取方式

最简单的方式上文第1点中的PID列

3.1 /proc/[pid]/ns

可以使用如下命令查看当前容器在宿主机上的进程id。

1
docker inspect --format '{{.State.Pid}}' a1bf0119d891

每个进程在/proc/${pid}/ns/目录下都会创建其对应的虚拟文件,并链接到一个真实的namespace文件上,如果两个进程下的链接文件链接到同一个地方,说明两个进程同属于一个namespace。

1
2
3
4
5
6
7
8
[root@localhost runc]# ls -l /proc/4913/ns/
total 0
lrwxrwxrwx 1 root root 0 Sep 11 00:21 ipc -> ipc:[4026532130]
lrwxrwxrwx 1 root root 0 Sep 11 00:21 mnt -> mnt:[4026532128]
lrwxrwxrwx 1 root root 0 Sep 11 00:18 net -> net:[4026532133]
lrwxrwxrwx 1 root root 0 Sep 11 00:21 pid -> pid:[4026532131]
lrwxrwxrwx 1 root root 0 Sep 11 00:21 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Sep 11 00:21 uts -> uts:[4026532129]

reference