Terminology
===========
 - Send/Receive Queues
    QP (Queue Pair): Combines RQ and SQ. Generally irrelevant for the following.
    RQ (Receive Queue):
    SQ (Send Queue):
    CQ (Completion Queue): Completed operations are reported here
    EQ (Event Queue): Completions generate events (at a specified rate), which in turn generate IRQs
    WR/WQ (Work Request Queue): Basically buffers (SG-lists) which should either be sent or used for data reception
    *QE (* Queue Element): A single entry in the corresponding queue (WQE, CQE, EQE)

    Flow: WQE --submit work--> WQ --execute--> SQ/RQ --on completion--> CQ --signal--> EQ --> IRQ
    * Completion Event Moderation: Reduce the number of reported events (EQ)

 - Offloads
    RSS (Receive Side Scaling): Distribute load across CPU cores
    LRO (Large Receive Offload): Group packets and deliver them to user space as a single large packet
      [ ethtool -k shows whether LRO is on/off ]

 - Various
    AEV (Asynchronous Event): Errors, etc.
    SRQ (Shared Receive Queue):
    ICM (Interconnect Context Memory): Address Translation Tables, Control Objects, User Access Region (registers)
    MPT (Memory Protection Table):
    RMP (Receive Memory Pool):
    TIR (Transport Interface Receive):
    RQT (RQ Table):
    MCG (Multicast Group):

Driver
======
 - Network packets are streamed to ring buffers (with all Ethernet, IP, and UDP/TCP headers).
   The number of ring buffers depends on the VMA_RING_ALLOCATION parameter:
        0 - per network interface
        1 - per IP
    => 10 - per socket
       20 - per thread (the thread that created the socket)
       30 - per core
       31 - per core (with some affinity of threads to cores)
 - The memory for the ring buffers is allocated based on VMA_MEM_ALLOC_TYPE:
        0 - malloc (this will be very slow if large buffers are requested)
        1 - contiguous
    =>  2 - HugePages
 - The number of buffers is controlled with VMA_RX_BUFS (this is the total across all rings)
    * Each buffer is VMA_MTU bytes
    * Recommended: VMA_RX_BUFS ~ #rings * VMA_RX_WRE (number of WRE allocated on all interfaces)

LibVMA
======
 There are three interfaces:
 - MP-RQ (Multi-packet Receive Queue): vma_cyclic_buffer_read
    This is useful for processing data streams when the packet size stays constant and the packet flow
    doesn't change drastically over time. Requires ConnectX-5 or newer.
    * Use 'vma_add_ring_profile' to configure the size of the ring buffer (it specifies the buffer size and the packet size)
    * Set the per-socket SO_VMA_RING_ALLOC_LOGIC using setsockopt
    * Call 'vma_cyclic_buffer_read' to access the raw ring buffer; the call specifies the minimum and maximum
      number of packets to return
    * The returned 'completion' structure references the position in the ring buffer. Packets in the ring buffer
      include all headers (Ethernet - 14 bytes, IP - 20 bytes, UDP - 8 bytes).
    * Meanwhile, new packets are written to the remaining part of the ring buffer (up to the linear end of the
      buffer; consequently, the returned data is not overwritten).
    * The buffer is rewound only on a call to 'vma_cyclic_buffer_read'. Fewer packets than the specified minimum
      can be returned if the current position is near the end of the buffer and there is not enough space left
      to fulfill the minimum requirement.
    * To ensure enough space for the follow-up packets, the buffer size must be coordinated with the min/max
      packet counts. It should never happen that space for only a few packets is left when the end of the
      buffer is close.
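Since MP-RQ hands out raw packets with all headers, the consumer has to skip them to reach the data. The following is a minimal sketch, assuming untagged Ethernet and an IPv4 header without options (the 14/20/8 byte sizes quoted in these notes). The helper 'udp_payload' is a hypothetical name used for illustration; it is not part of libvma.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header sizes as quoted in the notes (assumption: no VLAN tag, no IP options). */
enum {
    ETH_HDR_LEN = 14,
    IP_HDR_LEN  = 20,
    UDP_HDR_LEN = 8,
    ALL_HDR_LEN = ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN  /* 42 bytes */
};

/* Hypothetical helper: return a pointer to the UDP payload of one raw packet
 * and store the payload length in *payload_len.
 * Returns NULL if the packet is too short to hold all three headers. */
const uint8_t *udp_payload(const uint8_t *pkt, size_t pkt_len, size_t *payload_len)
{
    assert(pkt != NULL && payload_len != NULL);
    if (pkt_len < ALL_HDR_LEN)
        return NULL;
    *payload_len = pkt_len - ALL_HDR_LEN;
    return pkt + ALL_HDR_LEN;
}
```

A real consumer would also validate the EtherType and the IP header length field before trusting the fixed 42-byte offset.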
 - SocketXtreme: socketxtreme_poll
    A more complex interface that allows more control over processing, in particular for packets of varying size.
    Requires ConnectX-5 or newer.
    * Get the ring buffers associated with a socket using 'get_socket_rings_num' and 'get_socket_rings_fds'
    * Get ready completions on the specified ring buffer with 'socketxtreme_poll' (pass the 'fd' returned by
      'get_socket_rings_fds')
    * There are two types of completions: 'VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED' and 'VMA_SOCKETXTREME_PACKET'.
    * For the second type, process the associated list of buffers and maintain reference counts with
      'socketxtreme_ref_vma_buf' and 'socketxtreme_free_vma_buf'.
    * Clean up/unreference received packets with 'socketxtreme_free_vma_packets'
 - Zero Copy: recvfrom_zcopy
    The simplest interface; it works with ConnectX-3 cards. Packets are still written to the ring buffers,
    but the data is not copied out of them. Instead, the interface provides pointers to locations in the
    ring buffer. There is a slight overhead compared to the MP-RQ approach, because a list of packet pointers
    has to be prepared.
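
The zero-copy idea of handing out pointers into the ring buffer instead of copying data can be illustrated with a small, self-contained sketch. It does not use the libvma API: 'build_packet_list' is a hypothetical helper, and it assumes fixed-size packets stored back to back, which is a simplification of what a real ring buffer contains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical helper (not a libvma call): describe up to max_pkts packets of
 * fixed size pkt_size stored back to back in ring[0..ring_len) by filling
 * out[] with pointers into the ring. No packet data is copied.
 * Returns the number of entries written. */
size_t build_packet_list(uint8_t *ring, size_t ring_len, size_t pkt_size,
                         struct iovec *out, size_t max_pkts)
{
    assert(ring != NULL && out != NULL && pkt_size > 0);
    size_t n = 0;
    for (size_t off = 0; n < max_pkts && off + pkt_size <= ring_len; off += pkt_size) {
        out[n].iov_base = ring + off;   /* points into the ring itself */
        out[n].iov_len  = pkt_size;
        n++;
    }
    return n;
}
```

Building this list is the extra per-call work mentioned in the Zero Copy notes; the packet bytes themselves stay where they were written.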