veth pair devices in Linux

A veth pair is a pair of interconnected virtual network devices: packets sent into one end come out the other. It is most commonly used to connect two network namespaces; keeping both ends in the same namespace is possible, but it runs into a number of restrictions, as the test below shows.

┌──────────────────────────────────────────────────────────────────────────────┐
│                                                                              │
│                                                                              │
│                               network protocol                               │
│                                                                              │
│                                                                              │
└────────────────────▲─────────────────────────▲──────────────────────▲────────┘
                     │                         │                      │
                     │                         │                      │
                     │                         │                      │
                     │                         │                      │
                     │                         │                      │
               ┌─────▼────┐              ┌─────▼────┐           ┌─────▼────┐
               │          │              │          │           │          │
               │   eth0   │              │  veth0   ◀───────────▶  veth1   │
               │          │              │          │           │          │
               └─────▲────┘              └──────────┘           └──────────┘

             physical network
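
To make the cross-namespace use case from the diagram concrete, here is a minimal sketch; the namespace name ns1, the device names vethA/vethB, and the 10.1.1.0/24 addresses are arbitrary values chosen for the example:

$ ip netns add ns1
$ ip link add vethA type veth peer name vethB
# Move one end into the namespace; it disappears from the host's device list
$ ip link set vethB netns ns1
# Assign addresses and bring up both ends
$ ip addr add 10.1.1.1/24 dev vethA
$ ip link set vethA up
$ ip netns exec ns1 ip addr add 10.1.1.2/24 dev vethB
$ ip netns exec ns1 ip link set vethB up
# Packets entering vethA come out of vethB inside ns1, so this ping crosses namespaces
$ ping -c 2 10.1.1.2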

Hands-on

Ping test between veth devices

1. Ping test with both veth devices in the same namespace

Configure veth0 with IP 192.168.100.10; once the device is brought up, the host's routing table gains an entry for the 192.168.100.0 destination:

[root@localhost vagrant]# ip link add veth0 type veth peer name veth1
[root@localhost vagrant]# ip addr add 192.168.100.10/24 dev veth0
[root@localhost vagrant]# ip addr add 192.168.100.11/24 dev veth1
## A veth device is down by default after creation, so there is no route yet
[root@localhost vagrant]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U     100    0        0 eth0
192.168.33.0    0.0.0.0         255.255.255.0   U     101    0        0 eth1

## Bringing up veth0 adds a route
[root@localhost vagrant]# ip link set veth0 up
[root@localhost vagrant]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U     100    0        0 eth0
192.168.33.0    0.0.0.0         255.255.255.0   U     101    0        0 eth1
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 veth0

## Bringing up veth1 adds yet another route for the same subnet
[root@localhost vagrant]# ip link set veth1 up
[root@localhost vagrant]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:26:10:60 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
       valid_lft 86214sec preferred_lft 86214sec
    inet6 fe80::5054:ff:fe26:1060/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:98:06:20 brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.11/24 brd 192.168.33.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe98:620/64 scope link
       valid_lft forever preferred_lft forever
4: veth1@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e2:15:95:0a:1f:da brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.11/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::e015:95ff:fe0a:1fda/64 scope link
       valid_lft forever preferred_lft forever
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b2:2c:f6:e4:74:c5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.10/24 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::b02c:f6ff:fee4:74c5/64 scope link
       valid_lft forever preferred_lft forever
[root@localhost vagrant]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U     100    0        0 eth0
192.168.33.0    0.0.0.0         255.255.255.0   U     101    0        0 eth1
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 veth0
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 veth1
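
With two identical 192.168.100.0/24 entries in the table, ip route get shows which route the kernel would actually pick for a given destination (typically the first matching entry, veth0 here):

## Ask the kernel which route and device it would use for this address
[root@localhost vagrant]# ip route get 192.168.100.11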

By default, the ARP table looks like this:

# arp
Address                  HWtype  HWaddress           Flags Mask            Iface
localhost.localdomain            (incomplete)                              veth0
192.168.33.1             ether   0a:00:27:00:00:00   C                     eth1
gateway                  ether   52:54:00:12:35:02   C                     eth0
10.0.2.3                 ether   52:54:00:12:35:03   C                     eth0

Now run ping -I veth0 192.168.100.11 -c 2. By default, both veth1 and veth0 receive the ARP request, but no ARP reply is ever sent. The cause is a pair of ARP-related kernel parameters: the request arrives carrying a source address that is local to the host, which the kernel rejects unless accept_local is enabled, and reverse-path filtering (rp_filter) drops packets whose return route does not lead back out the interface they arrived on. The following commands lift these restrictions:

echo 1 > /proc/sys/net/ipv4/conf/veth1/accept_local
echo 1 > /proc/sys/net/ipv4/conf/veth0/accept_local
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/veth0/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/veth1/rp_filter
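
With the restrictions lifted, the same ping should now receive replies; the ARP exchange can also be watched from a second terminal (a quick check):

# In a second terminal, observe the ARP request/reply on the peer device
tcpdump -n -i veth1 arp
# Then rerun the test
ping -I veth0 192.168.100.11 -c 2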

Deleting a veth pair

# Deleting veth0 automatically removes veth1 as well
$ ip link delete veth0
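
A quick check that the peer went away with it:

# This should now report that device veth1 does not exist
$ ip link show veth1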

The relationship between container and host veth pair devices

One end of the veth pair sits inside the container while the other sits in the host network namespace. How do we find out which veth device in the host network namespace corresponds to eth0 inside a container?

The principle: every veth device has an ifindex and an iflink value. The ifindex of eth0 inside the container equals the iflink of its peer veth device in the host network namespace, and vice versa.

Method one

Get the iflink value: cat /sys/class/net/eth0/iflink

The ifindex can be read the same way: cat /sys/class/net/eth0/ifindex
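
Putting method one together, a minimal sketch that scans the host for the peer of a container's eth0; the container name web is a hypothetical placeholder, and docker exec is assumed to be available:

# Read the container-side iflink ("web" is a hypothetical container name)
IFLINK=$(docker exec web cat /sys/class/net/eth0/iflink)
# On the host, find the veth device whose ifindex matches it
for dev in /sys/class/net/veth*; do
    if [ "$(cat "$dev/ifindex")" = "$IFLINK" ]; then
        echo "host peer: $(basename "$dev")"
    fi
done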

Method two

$ ip link show eth0
3: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 96:5f:80:a3:a3:01 brd ff:ff:ff:ff:ff:ff

Here 3 is eth0's ifindex and 18 is its iflink, i.e. the ifindex of the other device in the veth pair.

In the host network namespace, find the veth device with that ifindex:

$ ip addr 
18: veth0e09999e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default
    link/ether de:b0:74:89:e8:3e brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::dcb0:74ff:fe89:e83e/64 scope link
       valid_lft forever preferred_lft forever

Here 18 is this device's ifindex and 3 is the ifindex of its veth peer.
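
As an alternative, the veth driver also exposes the peer's index through its ethtool statistics (assuming ethtool is installed):

# peer_ifindex in the statistics output is the ifindex of the other end
$ ethtool -S veth0e09999e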
