Source Code Analysis: Mbuf¶
Updated 2019.03.30
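How It Works¶
DPDK stores the metadata and the packet data together in a single mbuf, and keeps the mbuf structure as small as possible: it currently occupies only two cache lines, with the most frequently accessed members in the first one.
From front to back, an mbuf consists of the mbuf header (the rte_mbuf structure), the headroom, the packet data, and the tailroom. An application can also reserve a fixed-length private data area between the mbuf header and the headroom. The headroom size is set in the DPDK build configuration file (for example common_linuxapp), e.g. CONFIG_RTE_PKTMBUF_HEADROOM=128. The basic layout of an mbuf is shown in the figure below:
The table below lists what some pointers, members, and function results refer to; the mbuf pointer is abbreviated as m:
Item | Meaning |
---|---|
m | The header, i.e. the rte_mbuf structure |
m->buf_addr | Start address of the headroom |
m->data_off | Offset of the start of the data relative to buf_addr |
m->buf_len | Length of the memory after the mbuf header and private data, headroom included |
m->pkt_len | Total data length of the whole mbuf chain |
m->data_len | Length of the data held in this segment |
m->buf_addr+m->data_off | Start address of the data |
rte_pktmbuf_mtod(m) | Same as above |
rte_pktmbuf_data_len(m) | Same as m->data_len |
rte_pktmbuf_pkt_len | Same as m->pkt_len |
rte_pktmbuf_data_room_size | Same as m->buf_len (takes the mempool as its argument) |
rte_pktmbuf_headroom | Length of the headroom |
rte_pktmbuf_tailroom | Length of the free space at the tail |
Note: on a freshly allocated mbuf, data_off = MIN(headroom_len, buf_len)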
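The relationships in the table can be checked with a few lines of arithmetic. The following is only a minimal sketch, not DPDK code: `struct toy_mbuf` and the `toy_*` functions are hypothetical stand-ins that model just the three fields involved, so the example builds without DPDK headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few rte_mbuf fields used here. */
struct toy_mbuf {
    uint16_t buf_len;   /* usable buffer length, headroom included */
    uint16_t data_off;  /* offset of the data from buf_addr */
    uint16_t data_len;  /* bytes of data in this segment */
};

/* Headroom is the gap between buf_addr and the start of the data. */
uint16_t toy_headroom(const struct toy_mbuf *m)
{
    return m->data_off;
}

/* Tailroom is whatever is left after the headroom and the data. */
uint16_t toy_tailroom(const struct toy_mbuf *m)
{
    assert(m->data_off + m->data_len <= m->buf_len);
    return (uint16_t)(m->buf_len - m->data_off - m->data_len);
}

/* Grow the data area at the tail; 0 on success, -1 if it does not fit. */
int toy_append(struct toy_mbuf *m, uint16_t len)
{
    if (len > toy_tailroom(m))
        return -1;
    m->data_len = (uint16_t)(m->data_len + len);
    return 0;
}
```

With buf_len = 1712 and data_off = 128, the model gives a tailroom of 1584 for an empty mbuf and 184 after appending 1400 bytes, which are the values printed by the demo run later in this document.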
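The buffer in the figure above has a single data segment. In some cases, such as handling jumbo frames, several mbufs can be linked together to form one packet. The figure below shows an mbuf with three data segments:
In a chained mbuf, only the first mbuf structure carries the metadata.
The following code creates two mbufs, adds data to each, and finally chains them together. Along the way it prints some of the values from the table above, which helps in understanding what each pointer and length means. Error handling is omitted.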
static int mbuf_demo(void)
{
    int ret;
    struct rte_mempool *mpool;
    struct rte_mbuf *m, *m2;
    struct rte_pktmbuf_pool_private priv;

    /* 16 bytes of private data per mbuf; the data room shrinks by the same amount */
    priv.mbuf_data_room_size = 1600 + RTE_PKTMBUF_HEADROOM - 16;
    priv.mbuf_priv_size = 16;

    mpool = rte_mempool_create("test_pool",
                               ITEM_COUNT,
                               ITEM_SIZE,
                               CACHE_SIZE,
                               sizeof(struct rte_pktmbuf_pool_private),
                               rte_pktmbuf_pool_init,
                               &priv,
                               rte_pktmbuf_init,
                               NULL,
                               0,
                               MEMPOOL_F_SC_GET);

    m = rte_pktmbuf_alloc(mpool);
    mbuf_dump(m);                    // (1)

    rte_pktmbuf_append(m, 1400);     /* reserve 1400 bytes of data in m */
    mbuf_dump(m);                    // (2)

    m2 = rte_pktmbuf_alloc(mpool);
    rte_pktmbuf_append(m2, 500);
    mbuf_dump(m2);

    ret = rte_pktmbuf_chain(m, m2);  /* m becomes the head of the chain */
    mbuf_dump(m);                    // (3)
    return ret;
}
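First, note the two assignments to priv and the &priv argument passed to rte_mempool_create. To demonstrate user private data, priv is passed in when the mempool is created, which places 16 bytes of private data right after the header of every mbuf, followed by the headroom. The number of objects in the pool, the size of each object, and the cache size are: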
#define ITEM_COUNT 1024
#define ITEM_SIZE (1600 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
#define CACHE_SIZE 32
1600 is the estimated maximum length of a packet.
At (1), mbuf m has just been allocated and its data length is 0. The output is:
RTE_PKTMBUF_HEADROOM: 128
sizeof(mbuf): 128
m: 0x7fbf1a810000
m->buf_addr: 0x7fbf1a810090
m->data_off: 128
m->buf_len: 1712
m->pkt_len: 0
m->data_len: 0
m->buf_addr+m->data_off: 0x7fbf1a810110
rte_pktmbuf_mtod(m): 0x7fbf1a810110
rte_pktmbuf_data_len(m): 0
rte_pktmbuf_pkt_len(m): 0
rte_pktmbuf_headroom(m): 128
rte_pktmbuf_tailroom(m): 1584
rte_pktmbuf_data_room_size(mpool): 1712
rte_pktmbuf_priv_size(mpool): 16
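The addresses and sizes in this dump follow directly from the layout header + private data + headroom + data. The sketch below redoes that arithmetic in plain C; the `TOY_*` constants and `toy_*` functions are hypothetical names, and the sizes are simply the ones reported above (a 128-byte struct rte_mbuf, 16 bytes of private data, a 128-byte headroom).

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the dump above. */
enum {
    TOY_MBUF_SIZE = 128,   /* sizeof(struct rte_mbuf) */
    TOY_PRIV_SIZE = 16,    /* per-mbuf private data */
    TOY_HEADROOM  = 128,   /* RTE_PKTMBUF_HEADROOM */
    TOY_MAX_PKT   = 1600   /* estimated maximum packet length */
};

/* The segment buffer starts right after the header and the private data. */
uint64_t toy_buf_addr(uint64_t m)
{
    return m + TOY_MBUF_SIZE + TOY_PRIV_SIZE;
}

/* First data byte of a freshly allocated mbuf: buf_addr + data_off. */
uint64_t toy_data_addr(uint64_t m)
{
    return toy_buf_addr(m) + TOY_HEADROOM;
}

/* Data room given to the pool: packet + headroom, minus the private area. */
unsigned toy_data_room(void)
{
    return TOY_MAX_PKT + TOY_HEADROOM - TOY_PRIV_SIZE;
}

/* One mempool element: header + private data + data room. */
unsigned toy_elt_size(void)
{
    unsigned size = TOY_MBUF_SIZE + TOY_PRIV_SIZE + toy_data_room();

    /* Must equal ITEM_SIZE = 1600 + sizeof(struct rte_mbuf) + headroom. */
    assert(size == TOY_MAX_PKT + TOY_MBUF_SIZE + TOY_HEADROOM);
    return size;
}
```

For m = 0x7fbf1a810000 this yields buf_addr = 0x7fbf1a810090 and a data address of 0x7fbf1a810110, and a data room of 1712 bytes, matching the printed values.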
Shown as a figure:
At (2), rte_pktmbuf_append is used to simulate filling m with 1400 bytes of data. The output is now:
m: 0x7fbf1a810000
m->buf_addr: 0x7fbf1a810090
m->data_off: 128
m->buf_len: 1712
m->pkt_len: 1400
m->data_len: 1400
m->buf_addr+m->data_off: 0x7fbf1a810110
rte_pktmbuf_mtod(m): 0x7fbf1a810110
rte_pktmbuf_data_len(m): 1400
rte_pktmbuf_pkt_len(m): 1400
rte_pktmbuf_headroom(m): 128
rte_pktmbuf_tailroom(m): 184
rte_pktmbuf_data_room_size(mpool): 1712
rte_pktmbuf_priv_size(mpool): 16
Shown as a figure:
Next, m2 is allocated and given some data, and at (3) m and m2 are chained with m as the head of the chain. The output for m is now:
m: 0x7fbf1a810000
m->buf_addr: 0x7fbf1a810090
m->data_off: 128
m->buf_len: 1712
m->pkt_len: 1900
m->data_len: 1400
m->buf_addr+m->data_off: 0x7fbf1a810110
rte_pktmbuf_mtod(m): 0x7fbf1a810110
rte_pktmbuf_data_len(m): 1400
rte_pktmbuf_pkt_len(m): 1900
rte_pktmbuf_headroom(m): 128
rte_pktmbuf_tailroom(m): 184
rte_pktmbuf_data_room_size(mpool): 1712
rte_pktmbuf_priv_size(mpool): 16
Note the change in pkt_len: it now includes the 500 bytes of m2. Printing m->next at this point would show that m->next == m2.
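The bookkeeping behind that result can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the DPDK implementation: `struct toy_seg` and `toy_chain` are hypothetical, and they keep only the fields needed to show how the head segment accumulates the totals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down segment descriptor. */
struct toy_seg {
    struct toy_seg *next;
    uint32_t pkt_len;   /* meaningful in the head segment only */
    uint16_t data_len;  /* data held by this segment */
    uint8_t  nb_segs;   /* meaningful in the head segment only */
};

/* Append the chain starting at 'tail' to the chain starting at 'head'. */
int toy_chain(struct toy_seg *head, struct toy_seg *tail)
{
    struct toy_seg *last = head;

    assert(head != NULL && tail != NULL);
    if (head->nb_segs + tail->nb_segs > UINT8_MAX)
        return -1;                   /* segment counter would overflow */

    while (last->next != NULL)       /* walk to the current last segment */
        last = last->next;
    last->next = tail;

    /* Only the head carries the totals for the whole packet. */
    head->nb_segs = (uint8_t)(head->nb_segs + tail->nb_segs);
    head->pkt_len += tail->pkt_len;
    tail->pkt_len = tail->data_len;  /* tail is no longer a packet head */
    return 0;
}
```

Chaining a 1400-byte head with a 500-byte segment leaves data_len of the head at 1400 while its pkt_len becomes 1900, the same pattern as in the output above.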
Data Structures¶
rte_mbuf(librte_mbuf/rte_mbuf.h):
struct rte_mbuf {
    MARKER cacheline0;
    void *buf_addr; /**< Virtual address of segment buffer. */
    phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
    uint16_t buf_len; /**< Length of segment buffer. */
    /* next 6 bytes are initialised on RX descriptor rearm */
    MARKER8 rearm_data;
    uint16_t data_off;
    /**
     * 16-bit Reference counter.
     * It should only be accessed using the following functions:
     * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
     * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
     * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
     * config option.
     */
    union {
        rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
        uint16_t refcnt; /**< Non-atomically accessed refcnt */
    };
    uint8_t nb_segs; /**< Number of segments. */
    uint8_t port; /**< Input port. */
    uint64_t ol_flags; /**< Offload features. */
    /* remaining bytes are set on RX when pulling packet from descriptor */
    MARKER rx_descriptor_fields1;
    /*
     * The packet type, which is the combination of outer/inner L2, L3, L4
     * and tunnel types.
     */
    union {
        uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
        struct {
            uint32_t l2_type:4; /**< (Outer) L2 type. */
            uint32_t l3_type:4; /**< (Outer) L3 type. */
            uint32_t l4_type:4; /**< (Outer) L4 type. */
            uint32_t tun_type:4; /**< Tunnel type. */
            uint32_t inner_l2_type:4; /**< Inner L2 type. */
            uint32_t inner_l3_type:4; /**< Inner L3 type. */
            uint32_t inner_l4_type:4; /**< Inner L4 type. */
        };
    };
    uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
    uint16_t data_len; /**< Amount of data in segment buffer. */
    uint16_t vlan_tci; /**< VLAN Tag Control Identifier (CPU order) */
    union {
        uint32_t rss; /**< RSS hash result if RSS enabled */
        struct {
            union {
                struct {
                    uint16_t hash;
                    uint16_t id;
                };
                uint32_t lo;
                /**< Second 4 flexible bytes */
            };
            uint32_t hi;
            /**< First 4 flexible bytes or FD ID, dependent on
                 PKT_RX_FDIR_* flag in ol_flags. */
        } fdir; /**< Filter identifier if FDIR enabled */
        struct {
            uint32_t lo;
            uint32_t hi;
        } sched; /**< Hierarchical scheduler */
        uint32_t usr; /**< User defined tags. See rte_distributor_process() */
    } hash; /**< hash information */
    uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */
    uint16_t vlan_tci_outer; /**< Outer VLAN Tag Control Identifier (CPU order) */
    /* second cache line - fields only used in slow path or on TX */
    MARKER cacheline1 __rte_cache_aligned;
    union {
        void *userdata; /**< Can be used for external metadata */
        uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
    };
    struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
    struct rte_mbuf *next; /**< Next segment of scattered packet. */
    /* fields to support TX offloads */
    union {
        uint64_t tx_offload; /**< combined for easy fetch */
        struct {
            uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
            uint64_t l3_len:9; /**< L3 (IP) Header Length. */
            uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
            uint64_t tso_segsz:16; /**< TCP TSO segment size */
            /* fields for TX offloading of tunnels */
            uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
            uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
            /* uint64_t unused:8; */
        };
    };
    /** Size of the application private data. In case of an indirect
     * mbuf, it stores the direct mbuf private data size. */
    uint16_t priv_size;
    /** Timesync flags for use with IEEE1588. */
    uint16_t timesync;
    /* Chain of off-load operations to perform on mbuf */
    struct rte_mbuf_offload *offload_ops;
} __rte_cache_aligned;
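Allocation and Release¶
Meta Information¶
See Meta Information. It appears that the NIC does not fill in l2_type, l3_type, and similar fields on the Rx side.
Direct and Indirect mbufs¶
The mbuf described above consists of the mbuf structure header, the headroom, the data, and so on, and actually holds data; such an mbuf is called a direct mbuf. In some situations, for example when packets need to be duplicated or fragmented, another kind of mbuf may be used. It does not hold data itself but refers to the data of a direct mbuf, much like a shallow copy of an object. This kind of mbuf is called an indirect mbuf.
An indirect mbuf is created with the attach operation. Every mbuf has a reference count: when a direct mbuf is attached to, its reference count is incremented by 1, and on detach the count is decremented by 1. When the count reaches 0, nobody is using the direct mbuf any more and it can be freed.
The indirect mbuf mechanism has some restrictions:
- An mbuf cannot be attached to an indirect mbuf
- Before the attach, the reference count of the mbuf that is to become indirect must be 1, that is, it is not referenced by any other mbuf
- An indirect mbuf cannot be attached to a direct mbuf again unless it is detached first
Although attach/detach can be called directly, the clone operation is the recommended way to make a shallow copy of an mbuf, because clone handles chained mbufs correctly.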
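The reference counting described above can be illustrated with a toy model. This is a sketch only and does not use the DPDK API: `struct toy_buf`, `toy_init`, `toy_attach`, and `toy_detach` are hypothetical, and the assertions encode the restrictions listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mbuf model: a buffer pointer plus a reference count. */
struct toy_buf {
    char    *buf_addr;       /* data buffer this mbuf points at */
    uint16_t refcnt;         /* reference counter */
    struct toy_buf *direct;  /* owner of the buffer; NULL when direct */
    char     storage[64];    /* buffer owned by this mbuf when direct */
};

void toy_init(struct toy_buf *m)
{
    m->buf_addr = m->storage;
    m->refcnt = 1;
    m->direct = NULL;
}

/* Make 'mi' an indirect view of the direct mbuf 'md'. */
void toy_attach(struct toy_buf *mi, struct toy_buf *md)
{
    assert(md->direct == NULL);  /* only a direct mbuf owns data */
    assert(mi->direct == NULL);  /* mi must be detached first */
    assert(mi->refcnt == 1);     /* mi is not referenced by anyone */

    md->refcnt++;                /* the direct mbuf gains a user */
    mi->buf_addr = md->buf_addr; /* share the data, do not copy it */
    mi->direct = md;
}

/* Undo toy_attach(): 'mi' points at its own storage again. */
void toy_detach(struct toy_buf *mi)
{
    assert(mi->direct != NULL);
    mi->direct->refcnt--;
    mi->direct = NULL;
    mi->buf_addr = mi->storage;
}
```

After one attach the direct buffer's count is 2 and both mbufs point at the same bytes; after the detach the count is back to 1.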
Packet Type¶
Overview¶
rte_mbuf has a 32-bit packet type member, laid out as follows:
0               4               8               12              16
+---------------+---------------+---------------+---------------+
| outer_L2_type | outer_L3_type | outer_L4_type | tunnel_type   |
+---------------+---------------+---------------+---------------+
| inner_L2_type | inner_L3_type | inner_L4_type |               |
+---------------+---------------+---------------+---------------+
For convenience, all 32 bits can be accessed at once through the packet_type member. Different NICs may report different packet types for the same packet.
Two examples follow. A packet with this encapsulation:
<'ether type'=0x0800
| 'version'=4, 'protocol'=0x29
| 'version'=6, 'next header'=0x3A
| 'ICMPv6 header'>
is reported by an i40e NIC with the following packet type:
RTE_PTYPE_L2_ETHER |
RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
RTE_PTYPE_TUNNEL_IP |
RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
RTE_PTYPE_INNER_L4_ICMP
A packet with this encapsulation:
<'ether type'=0x86DD
| 'version'=6, 'next header'=0x2F
| 'GRE header'
| 'version'=6, 'next header'=0x11
| 'UDP header'>
is reported by an i40e NIC with the following packet type:
RTE_PTYPE_L2_ETHER |
RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
RTE_PTYPE_TUNNEL_GRENAT |
RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
RTE_PTYPE_INNER_L4_UDP
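To make the layout concrete, the first example can be composed into a single 32-bit value and taken apart again one nibble at a time. This is a small sketch: the RTE_PTYPE_* constants are redefined locally with the values quoted later in this document so that it builds without DPDK headers, and `toy_example_ptype` and `toy_nibble` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Redefined locally with the values quoted in this document. */
#define RTE_PTYPE_L2_ETHER                  0x00000001
#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x00000090
#define RTE_PTYPE_TUNNEL_IP                 0x00001000
#define RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x00600000
#define RTE_PTYPE_INNER_L4_ICMP             0x05000000

/* Packet type of the first example (IPv6/ICMPv6 tunnelled in IPv4). */
uint32_t toy_example_ptype(void)
{
    return RTE_PTYPE_L2_ETHER |
           RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
           RTE_PTYPE_TUNNEL_IP |
           RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
           RTE_PTYPE_INNER_L4_ICMP;
}

/*
 * Return the 4-bit field number 'idx' of a packet type:
 * 0 = outer L2, 1 = outer L3, 2 = outer L4, 3 = tunnel,
 * 4 = inner L2, 5 = inner L3, 6 = inner L4.
 */
unsigned toy_nibble(uint32_t ptype, unsigned idx)
{
    assert(idx < 8);
    return (ptype >> (4 * idx)) & 0xfu;
}
```

The composed value is 0x05601091. Its outer L4 and inner L2 nibbles are 0, which reflects that this example reports no outer L4 type and no inner L2 type.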
Meaning of Each Type¶
RTE_PTYPE_UNKNOWN¶
No packet type information is available:
/**
* No packet type information.
*/
#define RTE_PTYPE_UNKNOWN 0x00000000
RTE_PTYPE_L2_ETHER¶
L2: Ethernet. In tunneling cases it describes the outer packet:
/**
* Ethernet packet type.
* It is used for outer packet for tunneling cases.
*
* Packet format:
* <'ether type'=[0x0800|0x86DD]>
*/
#define RTE_PTYPE_L2_ETHER 0x00000001
RTE_PTYPE_L2_ETHER_TIMESYNC¶
L2: time sync:
/**
* Ethernet packet type for time sync.
*
* Packet format:
* <'ether type'=0x88F7>
*/
#define RTE_PTYPE_L2_ETHER_TIMESYNC 0x00000002
RTE_PTYPE_L2_ETHER_ARP¶
L2: ARP:
/**
* ARP (Address Resolution Protocol) packet type.
*
* Packet format:
* <'ether type'=0x0806>
*/
#define RTE_PTYPE_L2_ETHER_ARP 0x00000003
RTE_PTYPE_L2_ETHER_LLDP¶
L2: LLDP:
/**
* LLDP (Link Layer Discovery Protocol) packet type.
*
* Packet format:
* <'ether type'=0x88CC>
*/
#define RTE_PTYPE_L2_ETHER_LLDP 0x00000004
RTE_PTYPE_L2_ETHER_NSH¶
L2: NSH:
/**
* NSH (Network Service Header) packet type.
*
* Packet format:
* <'ether type'=0x894F>
*/
#define RTE_PTYPE_L2_ETHER_NSH 0x00000005
RTE_PTYPE_L2_ETHER_VLAN¶
L2: VLAN:
/**
* VLAN packet type.
*
* Packet format:
* <'ether type'=[0x8100]>
*/
#define RTE_PTYPE_L2_ETHER_VLAN 0x00000006
RTE_PTYPE_L2_ETHER_QINQ¶
L2: QinQ:
/**
* QinQ packet type.
*
* Packet format:
* <'ether type'=[0x88A8]>
*/
#define RTE_PTYPE_L2_ETHER_QINQ 0x00000007
RTE_PTYPE_L2_MASK¶
Mask of the L2 types:
/**
* Mask of layer 2 packet types.
* It is used for outer packet for tunneling cases.
*/
#define RTE_PTYPE_L2_MASK 0x0000000f
RTE_PTYPE_L3_IPV4¶
L3: IPv4 without header options, bit pattern 0001:
/**
* IP (Internet Protocol) version 4 packet type.
* It is used for outer packet for tunneling cases, and does not contain any
* header option.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'ihl'=5>
*/
#define RTE_PTYPE_L3_IPV4 0x00000010
RTE_PTYPE_L3_IPV4_EXT¶
L3: IPv4 with header options, bit pattern 0011:
/**
* IP (Internet Protocol) version 4 packet type.
* It is used for outer packet for tunneling cases, and contains header
* options.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'ihl'=[6-15], 'options'>
*/
#define RTE_PTYPE_L3_IPV4_EXT 0x00000030
RTE_PTYPE_L3_IPV6¶
L3: IPv6 without extension headers, bit pattern 0100:
/**
* IP (Internet Protocol) version 6 packet type.
* It is used for outer packet for tunneling cases, and does not contain any
* extension header.
*
* Packet format:
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=0x3B>
*/
#define RTE_PTYPE_L3_IPV6 0x00000040
RTE_PTYPE_L3_IPV4_EXT_UNKNOWN¶
L3: IPv4 that may or may not contain header options, bit pattern 1001:
/**
* IP (Internet Protocol) version 4 packet type.
* It is used for outer packet for tunneling cases, and may or maynot contain
* header options.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'ihl'=[5-15], <'options'>>
*/
#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN 0x00000090
RTE_PTYPE_L3_IPV6_EXT¶
L3: IPv6 with extension headers, bit pattern 1100:
/**
* IP (Internet Protocol) version 6 packet type.
* It is used for outer packet for tunneling cases, and contains extension
* headers.
*
* Packet format:
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=[0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
* 'extension headers'>
*/
#define RTE_PTYPE_L3_IPV6_EXT 0x000000c0
RTE_PTYPE_L3_IPV6_EXT_UNKNOWN¶
L3: IPv6 that may or may not contain extension headers, bit pattern 1110:
/**
* IP (Internet Protocol) version 6 packet type.
* It is used for outer packet for tunneling cases, and may or maynot contain
* extension headers.
*
* Packet format:
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=[0x3B|0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
* <'extension headers'>>
*/
#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN 0x000000e0
RTE_PTYPE_L3_MASK¶
Mask of the L3 types:
/**
* Mask of layer 3 packet types.
* It is used for outer packet for tunneling cases.
*/
#define RTE_PTYPE_L3_MASK 0x000000f0
RTE_PTYPE_L4_TCP¶
L4: TCP. Over IPv4, the packet must not be a fragment (MF=0 and frag_offset=0):
/**
* TCP (Transmission Control Protocol) packet type.
* It is used for outer packet for tunneling cases.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=6, 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=6>
*/
#define RTE_PTYPE_L4_TCP 0x00000100
RTE_PTYPE_L4_UDP¶
L4: UDP. Over IPv4, the packet must not be a fragment (MF=0 and frag_offset=0):
/**
* UDP (User Datagram Protocol) packet type.
* It is used for outer packet for tunneling cases.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=17, 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=17>
*/
#define RTE_PTYPE_L4_UDP 0x00000200
RTE_PTYPE_L4_FRAG¶
L4: fragmented IPv4 or IPv6 packet. Such a packet cannot be recognized as any other L4 type:
/**
* Fragmented IP (Internet Protocol) packet type.
* It is used for outer packet for tunneling cases.
*
* It refers to those packets of any IP types, which can be recognized as
* fragmented. A fragmented packet cannot be recognized as any other L4 types
* (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP,
* RTE_PTYPE_L4_NONFRAG).
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'MF'=1>
* or,
* <'ether type'=0x0800
* | 'version'=4, 'frag_offset'!=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=44>
*/
#define RTE_PTYPE_L4_FRAG 0x00000300
RTE_PTYPE_L4_SCTP¶
L4: SCTP. Over IPv4, the packet must not be a fragment (MF=0 and frag_offset=0):
/**
* SCTP (Stream Control Transmission Protocol) packet type.
* It is used for outer packet for tunneling cases.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=132, 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=132>
*/
#define RTE_PTYPE_L4_SCTP 0x00000400
RTE_PTYPE_L4_ICMP¶
L4: ICMP. Over IPv4, the packet must not be a fragment (MF=0 and frag_offset=0):
/**
* ICMP (Internet Control Message Protocol) packet type.
* It is used for outer packet for tunneling cases.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=1, 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=1>
*/
#define RTE_PTYPE_L4_ICMP 0x00000500
RTE_PTYPE_L4_NONFRAG¶
L4: non-fragmented IP packet that cannot be recognized as any of the other L4 types:
/**
* Non-fragmented IP (Internet Protocol) packet type.
* It is used for outer packet for tunneling cases.
*
* It refers to those packets of any IP types, while cannot be recognized as
* any of above L4 types (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
* RTE_PTYPE_L4_FRAG, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP).
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'!=[6|17|44|132|1]>
*/
#define RTE_PTYPE_L4_NONFRAG 0x00000600
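The "Packet format" rules quoted for the outer L4 types over IPv4 can be read as a small decision procedure: check for fragmentation first, then look at the protocol number. The sketch below follows those rules in software; it is a hypothetical classifier (`toy_ipv4_l4_ptype` is not a DPDK function), and the RTE_PTYPE_* constants are redefined locally with the values quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Redefined locally with the values quoted in this document. */
#define RTE_PTYPE_L4_TCP     0x00000100
#define RTE_PTYPE_L4_UDP     0x00000200
#define RTE_PTYPE_L4_FRAG    0x00000300
#define RTE_PTYPE_L4_SCTP    0x00000400
#define RTE_PTYPE_L4_ICMP    0x00000500
#define RTE_PTYPE_L4_NONFRAG 0x00000600

/*
 * Hypothetical software classifier. 'ip' points at the first byte of an
 * IPv4 header that holds at least the 20 fixed bytes.
 */
uint32_t toy_ipv4_l4_ptype(const uint8_t *ip)
{
    uint16_t frag;

    assert(ip != NULL);
    assert((ip[0] >> 4) == 4);  /* caller has already matched IPv4 */

    /* Bytes 6-7: three flag bits followed by the 13-bit fragment offset. */
    frag = (uint16_t)((ip[6] << 8) | ip[7]);
    if ((frag & 0x2000) != 0 || (frag & 0x1fff) != 0)
        return RTE_PTYPE_L4_FRAG;   /* MF=1 or frag_offset!=0 */

    switch (ip[9]) {                /* byte 9: protocol */
    case 6:   return RTE_PTYPE_L4_TCP;
    case 17:  return RTE_PTYPE_L4_UDP;
    case 132: return RTE_PTYPE_L4_SCTP;
    case 1:   return RTE_PTYPE_L4_ICMP;
    default:  return RTE_PTYPE_L4_NONFRAG;
    }
}
```

Note that a fragmented UDP packet is reported as RTE_PTYPE_L4_FRAG rather than RTE_PTYPE_L4_UDP, which is exactly the mutual exclusion stated in the header comment.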
RTE_PTYPE_L4_MASK¶
Mask of the L4 types:
/**
* Mask of layer 4 packet types.
* It is used for outer packet for tunneling cases.
*/
#define RTE_PTYPE_L4_MASK 0x00000f00
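The L4 values are enumerated codes inside a 4-bit field, not independent bit flags: RTE_PTYPE_L4_FRAG (0x300), for instance, has the TCP and UDP bits both set. A check therefore has to mask first and then compare the whole field. The sketch below contrasts the two approaches; `toy_is_tcp` and `toy_is_tcp_bit_test` are hypothetical helpers and the constants are redefined locally with the values quoted above.

```c
#include <stdint.h>

/* Redefined locally with the values quoted in this document. */
#define RTE_PTYPE_L4_TCP  0x00000100
#define RTE_PTYPE_L4_FRAG 0x00000300
#define RTE_PTYPE_L4_ICMP 0x00000500
#define RTE_PTYPE_L4_MASK 0x00000f00

/* Correct: isolate the L4 field, then compare the whole value. */
int toy_is_tcp(uint32_t ptype)
{
    return (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP;
}

/* Incorrect on purpose: treats the type as an independent bit flag. */
int toy_is_tcp_bit_test(uint32_t ptype)
{
    return (ptype & RTE_PTYPE_L4_TCP) != 0;
}
```

The bit test wrongly reports fragments and ICMP packets as TCP, because 0x300 and 0x500 both contain bit 0x100.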
RTE_PTYPE_TUNNEL_IP¶
Tunnel: IP in IP:
/**
* IP (Internet Protocol) in IP (Internet Protocol) tunneling packet type.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=[4|41]>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=[4|41]>
*/
#define RTE_PTYPE_TUNNEL_IP 0x00001000
RTE_PTYPE_TUNNEL_GRE¶
Tunnel: GRE:
/**
* GRE (Generic Routing Encapsulation) tunneling packet type.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=47>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=47>
*/
#define RTE_PTYPE_TUNNEL_GRE 0x00002000
RTE_PTYPE_TUNNEL_VXLAN¶
Tunnel: VXLAN:
/**
* VXLAN (Virtual eXtensible Local Area Network) tunneling packet type.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=17
* | 'destination port'=4789>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=17
* | 'destination port'=4789>
*/
#define RTE_PTYPE_TUNNEL_VXLAN 0x00003000
RTE_PTYPE_TUNNEL_NVGRE¶
Tunnel: NVGRE:
/**
* NVGRE (Network Virtualization using Generic Routing Encapsulation) tunneling
* packet type.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=47
* | 'protocol type'=0x6558>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=47
* | 'protocol type'=0x6558'>
*/
#define RTE_PTYPE_TUNNEL_NVGRE 0x00004000
RTE_PTYPE_TUNNEL_GENEVE¶
Tunnel: GENEVE:
/**
* GENEVE (Generic Network Virtualization Encapsulation) tunneling packet type.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=17
* | 'destination port'=6081>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=17
* | 'destination port'=6081>
*/
#define RTE_PTYPE_TUNNEL_GENEVE 0x00005000
RTE_PTYPE_TUNNEL_GRENAT¶
Tunnel: Teredo, VXLAN, or GRE, used when the NIC cannot tell these types apart:
/**
* Tunneling packet type of Teredo, VXLAN (Virtual eXtensible Local Area
* Network) or GRE (Generic Routing Encapsulation) could be recognized as this
* packet type, if they can not be recognized independently as of hardware
* capability.
*/
#define RTE_PTYPE_TUNNEL_GRENAT 0x00006000
RTE_PTYPE_TUNNEL_GTPC¶
Tunnel: GTP-C:
/**
* GTP-C (GPRS Tunnelling Protocol) control tunneling packet type.
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=17
* | 'destination port'=2123>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=17
* | 'destination port'=2123>
* or,
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=17
* | 'source port'=2123>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=17
* | 'source port'=2123>
*/
#define RTE_PTYPE_TUNNEL_GTPC 0x00007000
RTE_PTYPE_TUNNEL_GTPU¶
Tunnel: GTP-U:
/**
* GTP-U (GPRS Tunnelling Protocol) user data tunneling packet type.
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=17
* | 'destination port'=2152>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=17
* | 'destination port'=2152>
*/
#define RTE_PTYPE_TUNNEL_GTPU 0x00008000
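Several of the tunnel types above (VXLAN, GENEVE, GTP-C, GTP-U) are identified purely by UDP port number. The following sketch turns the quoted "Packet format" rules into a lookup; `toy_udp_tunnel_ptype` is a hypothetical helper, ports are assumed to be in host byte order, and the constants are redefined locally with the values quoted above.

```c
#include <stdint.h>

/* Redefined locally with the values quoted in this document. */
#define RTE_PTYPE_UNKNOWN       0x00000000
#define RTE_PTYPE_TUNNEL_VXLAN  0x00003000
#define RTE_PTYPE_TUNNEL_GENEVE 0x00005000
#define RTE_PTYPE_TUNNEL_GTPC   0x00007000
#define RTE_PTYPE_TUNNEL_GTPU   0x00008000

/* Map UDP ports (host byte order) to a tunnel packet type. */
uint32_t toy_udp_tunnel_ptype(uint16_t src_port, uint16_t dst_port)
{
    switch (dst_port) {
    case 4789: return RTE_PTYPE_TUNNEL_VXLAN;
    case 6081: return RTE_PTYPE_TUNNEL_GENEVE;
    case 2123: return RTE_PTYPE_TUNNEL_GTPC;
    case 2152: return RTE_PTYPE_TUNNEL_GTPU;
    default:   break;
    }
    /* GTP-C is the only type above that also matches on the source port. */
    if (src_port == 2123)
        return RTE_PTYPE_TUNNEL_GTPC;
    return RTE_PTYPE_UNKNOWN;       /* not a UDP tunnel listed here */
}
```

A software parser would only call such a function after it has already matched UDP as the outer L4 protocol.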
RTE_PTYPE_TUNNEL_ESP¶
Tunnel: ESP:
/**
* ESP (IP Encapsulating Security Payload) tunneling packet type.
*
* Packet format:
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=51>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=51>
*/
#define RTE_PTYPE_TUNNEL_ESP 0x00009000
RTE_PTYPE_TUNNEL_MASK¶
Mask of the tunnel types:
/**
* Mask of tunneling packet types.
*/
#define RTE_PTYPE_TUNNEL_MASK 0x0000f000
RTE_PTYPE_INNER_L2_ETHER¶
Inner L2: Ethernet:
/**
* Ethernet packet type.
* It is used for inner packet type only.
*
* Packet format (inner only):
* <'ether type'=[0x800|0x86DD]>
*/
#define RTE_PTYPE_INNER_L2_ETHER 0x00010000
RTE_PTYPE_INNER_L2_ETHER_VLAN¶
Inner L2: VLAN:
/**
* Ethernet packet type with VLAN (Virtual Local Area Network) tag.
*
* Packet format (inner only):
* <'ether type'=[0x800|0x86DD], vlan=[1-4095]>
*/
#define RTE_PTYPE_INNER_L2_ETHER_VLAN 0x00020000
RTE_PTYPE_INNER_L2_ETHER_QINQ¶
Inner L2: QinQ:
/**
* QinQ packet type.
*
* Packet format:
* <'ether type'=[0x88A8]>
*/
#define RTE_PTYPE_INNER_L2_ETHER_QINQ 0x00030000
RTE_PTYPE_INNER_L2_MASK¶
Mask of the inner L2 types:
/**
* Mask of inner layer 2 packet types.
*/
#define RTE_PTYPE_INNER_L2_MASK 0x000f0000
RTE_PTYPE_INNER_L3_IPV4¶
Inner L3: IPv4 without header options:
/**
* IP (Internet Protocol) version 4 packet type.
* It is used for inner packet only, and does not contain any header option.
*
* Packet format (inner only):
* <'ether type'=0x0800
* | 'version'=4, 'ihl'=5>
*/
#define RTE_PTYPE_INNER_L3_IPV4 0x00100000
RTE_PTYPE_INNER_L3_IPV4_EXT¶
Inner L3: IPv4 with header options:
/**
* IP (Internet Protocol) version 4 packet type.
* It is used for inner packet only, and contains header options.
*
* Packet format (inner only):
* <'ether type'=0x0800
* | 'version'=4, 'ihl'=[6-15], 'options'>
*/
#define RTE_PTYPE_INNER_L3_IPV4_EXT 0x00200000
RTE_PTYPE_INNER_L3_IPV6¶
Inner L3: IPv6 without extension headers:
/**
* IP (Internet Protocol) version 6 packet type.
* It is used for inner packet only, and does not contain any extension header.
*
* Packet format (inner only):
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=0x3B>
*/
#define RTE_PTYPE_INNER_L3_IPV6 0x00300000
RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN¶
Inner L3: IPv4 that may or may not contain header options:
/**
* IP (Internet Protocol) version 4 packet type.
* It is used for inner packet only, and may or maynot contain header options.
*
* Packet format (inner only):
* <'ether type'=0x0800
* | 'version'=4, 'ihl'=[5-15], <'options'>>
*/
#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x00400000
RTE_PTYPE_INNER_L3_IPV6_EXT¶
Inner L3: IPv6 with extension headers:
/**
* IP (Internet Protocol) version 6 packet type.
* It is used for inner packet only, and contains extension headers.
*
* Packet format (inner only):
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=[0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
* 'extension headers'>
*/
#define RTE_PTYPE_INNER_L3_IPV6_EXT 0x00500000
RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN¶
Inner L3: IPv6 that may or may not contain extension headers:
/**
* IP (Internet Protocol) version 6 packet type.
* It is used for inner packet only, and may or maynot contain extension
* headers.
*
* Packet format (inner only):
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=[0x3B|0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
* <'extension headers'>>
*/
#define RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x00600000
RTE_PTYPE_INNER_L3_MASK¶
Mask of the inner L3 types:
/**
* Mask of inner layer 3 packet types.
*/
#define RTE_PTYPE_INNER_L3_MASK 0x00f00000
RTE_PTYPE_INNER_L4_TCP¶
Inner L4: TCP. Over IPv4, the packet must not be a fragment (MF=0 and frag_offset=0):
/**
* TCP (Transmission Control Protocol) packet type.
* It is used for inner packet only.
*
* Packet format (inner only):
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=6, 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=6>
*/
#define RTE_PTYPE_INNER_L4_TCP 0x01000000
RTE_PTYPE_INNER_L4_UDP¶
Inner L4: UDP. Over IPv4, the packet must not be a fragment (MF=0 and frag_offset=0):
/**
* UDP (User Datagram Protocol) packet type.
* It is used for inner packet only.
*
* Packet format (inner only):
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=17, 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=17>
*/
#define RTE_PTYPE_INNER_L4_UDP 0x02000000
RTE_PTYPE_INNER_L4_FRAG¶
Inner L4: fragmented IP packet, which may or may not carry an L4 packet:
/**
* Fragmented IP (Internet Protocol) packet type.
* It is used for inner packet only, and may or maynot have layer 4 packet.
*
* Packet format (inner only):
* <'ether type'=0x0800
* | 'version'=4, 'MF'=1>
* or,
* <'ether type'=0x0800
* | 'version'=4, 'frag_offset'!=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=44>
*/
#define RTE_PTYPE_INNER_L4_FRAG 0x03000000
RTE_PTYPE_INNER_L4_SCTP¶
Inner L4: SCTP. Over IPv4, the packet must not be a fragment (MF=0 and frag_offset=0):
/**
* SCTP (Stream Control Transmission Protocol) packet type.
* It is used for inner packet only.
*
* Packet format (inner only):
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=132, 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=132>
*/
#define RTE_PTYPE_INNER_L4_SCTP 0x04000000
RTE_PTYPE_INNER_L4_ICMP¶
Inner L4: ICMP. Over IPv4, the packet must not be a fragment (MF=0 and frag_offset=0):
/**
* ICMP (Internet Control Message Protocol) packet type.
* It is used for inner packet only.
*
* Packet format (inner only):
* <'ether type'=0x0800
* | 'version'=4, 'protocol'=1, 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'=1>
*/
#define RTE_PTYPE_INNER_L4_ICMP 0x05000000
RTE_PTYPE_INNER_L4_NONFRAG¶
Inner L4: non-fragmented IP packet, which may or may not carry some other, unknown L4 type:
/**
* Non-fragmented IP (Internet Protocol) packet type.
* It is used for inner packet only, and may or maynot have other unknown layer
* 4 packet types.
*
* Packet format (inner only):
* <'ether type'=0x0800
* | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0, 'frag_offset'=0>
* or,
* <'ether type'=0x86DD
* | 'version'=6, 'next header'!=[6|17|44|132|1]>
*/
#define RTE_PTYPE_INNER_L4_NONFRAG 0x06000000
RTE_PTYPE_INNER_L4_MASK¶
Mask of the inner L4 types:
/**
* Mask of inner layer 4 packet types.
*/
#define RTE_PTYPE_INNER_L4_MASK 0x0f000000
RTE_ETH_IS_IPV4_HDR¶
Macro: checks whether the outer L3 type is IPv4:
/**
* Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
* one, bit 4 is selected to be used for IPv4 only. Then checking bit 4 can
* determine if it is an IPv4 packet.
*/
#define RTE_ETH_IS_IPV4_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV4)
RTE_ETH_IS_IPV6_HDR¶
Macro: checks whether the outer L3 type is IPv6:
/**
* Check if the (outer) L3 header is IPv6. To avoid comparing IPv6 types one by
* one, bit 6 is selected to be used for IPv6 only. Then checking bit 6 can
* determine if it is an IPv6 packet.
*/
#define RTE_ETH_IS_IPV6_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV6)
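The single-bit checks work because of how the outer L3 values were chosen: every IPv4 variant (0x10, 0x30, 0x90) has bit 4 set and bit 6 clear, while every IPv6 variant (0x40, 0xc0, 0xe0) has bit 6 set and bit 4 clear. The sketch below verifies this with the values quoted in this document; the constants and the two macros are redefined locally so that it builds without DPDK headers, and `toy_is_ipv4` and `toy_is_ipv6` are hypothetical wrappers.

```c
#include <stdint.h>

/* Redefined locally with the values quoted in this document. */
#define RTE_PTYPE_L3_IPV4             0x00000010
#define RTE_PTYPE_L3_IPV4_EXT         0x00000030
#define RTE_PTYPE_L3_IPV6             0x00000040
#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN 0x00000090
#define RTE_PTYPE_L3_IPV6_EXT         0x000000c0
#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN 0x000000e0

#define RTE_ETH_IS_IPV4_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV4)
#define RTE_ETH_IS_IPV6_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV6)

/* The macros yield the masked bit, so normalise to 0 or 1 for callers. */
int toy_is_ipv4(uint32_t ptype)
{
    return RTE_ETH_IS_IPV4_HDR(ptype) != 0;
}

int toy_is_ipv6(uint32_t ptype)
{
    return RTE_ETH_IS_IPV6_HDR(ptype) != 0;
}
```

Because the macros return the masked bit rather than 1, they are best used in a boolean context or normalised as above.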
RTE_ETH_IS_TUNNEL_PKT¶
Macro: checks whether the packet is a tunnel packet:
/* Check if it is a tunneling packet */
#define RTE_ETH_IS_TUNNEL_PKT(ptype) ((ptype) & \
(RTE_PTYPE_TUNNEL_MASK | \
RTE_PTYPE_INNER_L2_MASK | \
RTE_PTYPE_INNER_L3_MASK | \
RTE_PTYPE_INNER_L4_MASK))
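The macro treats a packet as tunnelled when any of the tunnel or inner fields is non-zero, that is, when any bit in 0x0ffff000 is set. The sketch below applies it to a plain Ethernet/IPv4/TCP packet type and to the two i40e examples from the overview; the masks and the macro are redefined locally with the values quoted in this document, and `toy_is_tunnel` is a hypothetical wrapper.

```c
#include <stdint.h>

/* Redefined locally with the values quoted in this document. */
#define RTE_PTYPE_TUNNEL_MASK   0x0000f000
#define RTE_PTYPE_INNER_L2_MASK 0x000f0000
#define RTE_PTYPE_INNER_L3_MASK 0x00f00000
#define RTE_PTYPE_INNER_L4_MASK 0x0f000000

#define RTE_ETH_IS_TUNNEL_PKT(ptype) ((ptype) & \
        (RTE_PTYPE_TUNNEL_MASK | \
         RTE_PTYPE_INNER_L2_MASK | \
         RTE_PTYPE_INNER_L3_MASK | \
         RTE_PTYPE_INNER_L4_MASK))

/* Normalise the macro result to 0 or 1. */
int toy_is_tunnel(uint32_t ptype)
{
    return RTE_ETH_IS_TUNNEL_PKT(ptype) != 0;
}
```

Here 0x00000111 is L2_ETHER | L3_IPV4 | L4_TCP, and 0x05601091 and 0x026060e1 are the combined values of the first and second i40e examples. A packet type with only an inner field set, such as 0x00100000, also counts as a tunnel packet.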
References¶
- DPDK Programmer’s Guide - Mbuf Library
- DPDK source: lib/librte_mbuf/rte_mbuf.h
- DPDK source: lib/librte_mbuf/rte_mbuf_ptype.h
- DPDK Test Plans - Unified Packet Type Tests