Terminology
===========
- Send/Receive Queues
QP (Queue Pair): Combines an RQ and an SQ. Generally irrelevant for the following notes
RQ (Receive Queue): Holds posted receive buffers (work requests) for incoming data
SQ (Send Queue): Holds posted send work requests
CQ (Completion Queue): Completed operations reported here
EQ (Event Queue): Completions generate events (at specified rate) which in turn generate IRQs
WR/WQ (Work Request / Work Queue): This is basically buffers (SG-lists) which should be either sent or used for data reception
*QE (* Queue Entry): An entry on the corresponding queue (WQE, CQE, EQE)
Flow: WQE --submit work--> WQ --execute--> SQ/RQ --on completion-> CQ --signal--> EQ -> IRQ
* Completion Event Moderation: Reduce the amount of reported events (EQ)
- Offloads
RSS (Receive Side Scaling): Distribute load across CPU cores
LRO (Large Receive Offload): Group packets and deliver them to user space as a single large grouped packet [ 'ethtool -K' shows if LRO is on/off ]
- Various
AEV (Asynchronous Event): Errors, etc.
SRQ (Shared Receive Queue): A receive queue shared between multiple QPs
ICM (Interconnect Context Memory): Address Translation Tables, Control Objects, User Access Region (registers)
MPT (Memory Protection Table): Memory region translation and access-rights entries
RMP (Receive Memory Pool): Receive buffer pool shared between multiple RQs (mlx5)
TIR (Transport Interface Receive): Receive-side transport object; performs RSS/LRO and steers traffic to RQs (mlx5)
RQT (RQ Table): Table of RQs that a TIR spreads traffic over (used for RSS)
MCG (Multicast Group): Multicast steering group
Driver
======
- Network packets are streamed to ring buffers (with all Ethernet, IP, and UDP/TCP headers).
The number of ring buffers depends on the VMA_RING_ALLOCATION parameter:
0 - per network interface
1 - per IP
=> 10 - per socket
20 - per thread (which was used to create the socket)
30 - per core
31 - per core (with some affinity of threads to cores)
- The memory for the ring buffers is allocated based on VMA_MEM_ALLOC_TYPE:
0 - malloc (this will be very slow if large buffers are requested)
1 - contiguous
=> 2 - HugePages
- The number of buffers per ring is controlled with VMA_RX_BUFS (this is the total across all rings)
* Each buffer is VMA_MTU bytes
* Recommended: VMA_RX_BUFS ~ #rings * VMA_RX_WRE (the number of WREs allocated on all interfaces)
LibVMA
======
There are three interfaces:
- MP-RQ (Multi-packet Receive Queue): vma_cyclic_buffer_read
This is useful for processing data streams when the packet size stays constant and the packet flow doesn't change
drastically over time. Requires ConnectX-5 or newer.
* Use 'vma_add_ring_profile' to configure the ring buffer (specifies the buffer size & the packet size)
* Set per-socket SO_VMA_RING_ALLOC_LOGIC using setsockopt
* Call 'vma_cyclic_buffer_read' to access the raw ring buffer, specifying the minimum and maximum number of packets to return
* The returned 'completion' structure references a position in the ring buffer. Packets in the ring buffer
include all headers (Ethernet - 14 bytes, IP - 20 bytes, UDP - 8 bytes).
* Meanwhile, new packets are written in the remaining part of the ring buffer (until the linear end of the
buffer - consequently the returned data is not overwritten).
* The buffer is rewound only on a call to 'vma_cyclic_buffer_read'. Fewer than the specified minimum number of
packets can be returned if the read is currently near the end of the buffer and there is not enough space to
fulfill the minimum requirement.
* To ensure enough space for the follow-up packets, the buffer size and the min/max packet counts must be kept
in sync. It should never happen that space for only a few packets is left when the end of the buffer is
close.
- SocketXtreme: socketxtreme_poll
A more complex interface allowing more control over processing, particularly for packets of varying size.
Requires ConnectX-5 or newer.
* Get the ring buffers associated with a socket with 'get_socket_rings_num' and 'get_socket_rings_fds'
* Get ready completions on the specified ring buffer with 'socketxtreme_poll' (pass the 'fd' returned by 'get_socket_rings_fds')
* Two types of completions: 'VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED' and 'VMA_SOCKETXTREME_PACKET'.
* For the second type, process the associated list of buffers and keep reference counts with 'socketxtreme_ref_vma_buf' /
'socketxtreme_free_vma_buf'.
* Clean/unreference received packets with socketxtreme_free_vma_packets
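The steps above might look roughly as follows. This is an untested sketch: it requires a ConnectX-5 NIC with libvma preloaded, and the exact struct and field names (e.g. 'vma_completion_t' and its 'events'/'packet' members) should be verified against <mellanox/vma_extra.h>:

```c
/* Sketch of a SocketXtreme polling loop; not runnable without libvma and
 * supported hardware, so struct/field names here are best-effort recall. */
#include <mellanox/vma_extra.h>

void poll_loop(int sock_fd)
{
    struct vma_api_t *vma = vma_get_api();
    if (!vma)
        return;                              /* libvma not loaded */

    int ring_fd;
    vma->get_socket_rings_fds(sock_fd, &ring_fd, 1);

    struct vma_completion_t comps[16];
    for (;;) {
        int n = vma->socketxtreme_poll(ring_fd, comps, 16, 0);
        for (int i = 0; i < n; i++) {
            if (comps[i].events & VMA_SOCKETXTREME_PACKET) {
                /* walk the packet's buffer list here, then release it */
                vma->socketxtreme_free_vma_packets(&comps[i].packet, 1);
            }
            /* handle VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED for TCP */
        }
    }
}
```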
- Zero Copy: recvfrom_zcopy
The simplest interface, working with ConnectX-3 cards and newer. The packet is still written to the ring buffers, but the
data is not copied out of them; this interface provides a way to get pointers to locations in the ring buffer. There is a
slight overhead compared to the MP-RQ approach to prepare the list of packet pointers.
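A rough sketch of the zero-copy path, again untested here (it needs libvma preloaded); 'vma_get_api', 'recvfrom_zcopy', 'free_packets', and the 'MSG_VMA_ZCOPY' flag are taken from <mellanox/vma_extra.h> as recalled, so verify the exact signatures against the installed header:

```c
/* Sketch of zero-copy reception via the VMA extra API. On success with the
 * MSG_VMA_ZCOPY flag set, 'buf' holds packet descriptors whose iovecs point
 * straight into the ring buffer rather than copied data. */
#include <mellanox/vma_extra.h>

void read_zcopy(int sock_fd)
{
    struct vma_api_t *vma = vma_get_api();
    if (!vma)
        return;                              /* fall back to plain recvfrom() */

    char buf[4096];                          /* receives descriptors, not data */
    int flags = 0;
    int ret = vma->recvfrom_zcopy(sock_fd, buf, sizeof(buf), &flags, NULL, NULL);
    if (ret > 0 && (flags & MSG_VMA_ZCOPY)) {
        struct vma_packets_t *pkts = (struct vma_packets_t *)buf;
        /* each packet carries an iovec list pointing into the ring buffer;
         * process it, then return the buffers to libvma */
        vma->free_packets(sock_fd, pkts->pkts, pkts->n_packet_num);
    }
}
```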