Architecture
============
 - The current implementation follows the UFO architecture: the reader and the dataset builder are split into two filters.
    * The reader is multi-threaded. However, only a single instance of the builder can be scheduled. This could limit the maximum throughput on dual-head or even single-head, but many-core, systems.
    * Another problem here is timing. All events in the builder are initiated from the reader. Consequently, it seems we can't time out on a semi-complete dataset if no new data is arriving.
    * Besides performance, this is also critical for stability. With continuous streaming there is no problem. However, if a finite number of frames is requested and some packets are lost, the software will wait forever for the missing bits.

Problems
========
 - When streaming at high speed (~ 16 data streams; 600 Mbit/s and 600 kpck/s each), the data streams quickly get desynchronized (but all packets are delivered).
    * It is unclear whether the problem is on the receiver side (no CPU cores are overloaded) or the desynchronization first appears on the simulation sender. A test with real hardware is required.
    * For borderline scenarios, increasing the number of buffers from 2 to 10-20 helps. But at full speed, even thousands of buffers are not enough. Packet counts quickly drift apart.
    * Further increasing the packet buffer provided to 'recvmmsg' does not help (even if blocking is enforced until all packets are received).
    * At the speed specified above, the system also works without libvma.
    * Actually, with libvma a larger buffer is required. In the beginning, the performance of libvma ramps up gradually (it has always been like that), and during this period significant desynchronization happens. To compensate for it, we need about 400 buffers with libvma, compared to only 10 if standard Linux networking is used.
 - In any case (libvma or not), some packets will be lost in the beginning if high-speed communication is tested.
    * Usually, the first packets are transferred OK, but then a few packets are occasionally lost here and there (resulting in broken frames). This basically breaks grabbing a few packets and exiting. It is unclear whether this is a server- or client-side problem (it makes sense to see how real hardware behaves).
    * Can we pre-heat to avoid this speeding-up problem (increase pre-allocated buffers, disable power-saving mode, ??) Or will it not be a problem with real hardware either? We can send UDP packets (they should be sent from another host), but packets are still lost:
        for i in $(seq 4000 4015); do echo "data" > /dev/udp/192.168.34.84/$i; done
    * The following doesn't help: a new version of libvma, tuning of the options.
 - Communication breaks with small MTU sizes (below 1500), but this is probably not important (packets are delivered, but with extreme latencies; probably some tuning of the network stack is required).
 - Technically, everything should work if we start the UFO server when data is already being streamed. However, the first dataset could be any. Therefore, the check fails as the data is shifted by a random number of datasets.

Questions
=========
 - Can we pre-allocate several UFO buffers for forthcoming events? Currently, we need to buffer out-of-order packets and copy them later (or buffer everything for simplicity). We can avoid this data copy if we can get at least one packet in advance.
 - How can I execute the 'generate' method on the 'reductor' filter if there is no new data on the input for the specified amount of time? One option is sending an empty buffer with metadata indicating a timeout. But this is again hackish.
 - Can we use 16-bit buffers? I can set the dimensions to 1/4 of the correct value to work around this. But is it possible to do in a clean way?
 - What is the 'ufotools' Python package mentioned in the documentation? Just a typo?
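
Sketch: timing out semi-complete datasets
=========================================
 - The timing problem from the Architecture notes (no timeout on a semi-complete dataset when no new data arrives) can be illustrated outside of UFO. The Python sketch below is only an illustration under assumptions: the class and method names ('DatasetBuilder', 'add_packet', 'poll') are hypothetical and not part of the UFO API, and each dataset is assumed to consist of exactly one packet per stream. The point is that completion is driven by arriving packets, while expiry needs a separate periodic call that does not depend on new data.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PendingDataset:
    """Packets collected so far for one dataset id (hypothetical layout)."""
    first_seen: float
    packets: dict = field(default_factory=dict)  # stream index -> payload


class DatasetBuilder:
    """Collects one packet per stream into a dataset and expires stale ones.

    Illustrative only; none of these names exist in UFO.
    """

    def __init__(self, n_streams, timeout, clock=time.monotonic):
        self.n_streams = n_streams
        self.timeout = timeout      # seconds a dataset may stay incomplete
        self.clock = clock          # injectable so the logic can be tested
        self.pending = {}           # dataset id -> PendingDataset

    def add_packet(self, dataset_id, stream, payload):
        """Store a packet; return (id, packets, True) when complete, else None."""
        entry = self.pending.get(dataset_id)
        if entry is None:
            entry = self.pending[dataset_id] = PendingDataset(self.clock())
        entry.packets[stream] = payload
        if len(entry.packets) == self.n_streams:
            del self.pending[dataset_id]
            return dataset_id, entry.packets, True
        return None

    def poll(self):
        """Return and drop datasets that stayed incomplete for >= timeout.

        Has to be called periodically, independent of packet arrival.
        """
        now = self.clock()
        stale = sorted(i for i, e in self.pending.items()
                       if now - e.first_seen >= self.timeout)
        return [(i, self.pending.pop(i).packets, False) for i in stale]


if __name__ == "__main__":
    fake_now = [0.0]                       # fake clock for the demonstration
    builder = DatasetBuilder(n_streams=2, timeout=0.5, clock=lambda: fake_now[0])
    builder.add_packet(8, 0, b"c")         # second stream never delivers
    fake_now[0] = 0.6
    print(builder.poll())                  # dataset 8 is flushed as incomplete
```

 - In a receiver loop, 'poll' could be called whenever a socket read with a timeout expires, so that a finite acquisition with lost packets terminates instead of waiting forever. Whether something equivalent can be triggered inside a UFO 'reductor' filter is exactly the open question above.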