Sending a FIX message takes too long. NOS (NewOrderSingle) serialization takes ~1us, but sending it to the socket takes 50-100us on a 1G NIC using a localhost address. Can this be made faster? Should the Poco socket be non-blocking (sock->setBlocking(false))? Should some NIC settings be applied for low latency (LL)?
There is no point in making message serialization fast if we cannot put the message on the wire at a similar speed...
Environment: Ubuntu 13.10 x64, gcc 4.8.
Timing figures for sending/receiving messages (like the ones we have for serialization/deserialization) would help here.
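To make the send-side timing concrete, here is a minimal sketch of how the cost of a single blocking send() could be measured over the loopback. It uses plain POSIX sockets rather than Poco, and the helper names (make_loopback_pair, time_sends) are hypothetical, not part of the library. It only times the hand-off to the kernel on 127.0.0.1, so it says nothing about a real NIC.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a connected TCP pair over 127.0.0.1.
// On success the two descriptors are stored in client/server.
bool make_loopback_pair(int& client, int& server) {
    client = server = -1;
    int listener = ::socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return false;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof(addr);

    client = ::socket(AF_INET, SOCK_STREAM, 0);
    bool ok = client >= 0 &&
        ::bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        ::listen(listener, 1) == 0 &&
        ::getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) == 0 &&
        ::connect(client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
    if (ok) server = ::accept(listener, nullptr, nullptr);
    ::close(listener);

    if (server < 0) {
        if (client >= 0) ::close(client);
        client = -1;
        return false;
    }
    return true;
}

// Times `iterations` blocking send() calls of `size` bytes each and returns
// the per-call durations in nanoseconds (empty vector on error).
std::vector<long long> time_sends(int fd, int drain_fd, std::size_t size, int iterations) {
    using clock = std::chrono::steady_clock;
    std::vector<char> msg(size, 'x'), sink(size);
    std::vector<long long> ns;
    for (int i = 0; i < iterations; ++i) {
        auto t0 = clock::now();
        ssize_t n = ::send(fd, msg.data(), msg.size(), MSG_NOSIGNAL);
        auto t1 = clock::now();
        if (n != static_cast<ssize_t>(size)) return {};
        ns.push_back(std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());

        // Drain the peer outside the timed region so its buffer never fills.
        std::size_t got = 0;
        while (got < size) {
            ssize_t r = ::recv(drain_fd, sink.data(), size - got, 0);
            if (r <= 0) return {};
            got += static_cast<std::size_t>(r);
        }
    }
    return ns;
}
```

Calling time_sends(client, server, 200, 10000) and sorting the result would give median and tail figures to set against the ~1us serialization time; the 200-byte message size is only a guess at a typical NOS.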
I doubt it will have anything to do with Poco, or with whether the socket is blocking or non-blocking. Once the buffer is transferred to the kernel for sending, it is largely out of our hands. There may be some tweaks possible at the NIC level, but these are device- and environment-specific. If there are more general optimisations available then of course we should consider them.
But 50 to 100us sounds excessive, especially if your testing was via the loopback. However, this issue is really in the realm of individual production environments, and is out of scope.
I agree that we should endeavour to make sure our software is as fast as possible. We can also recommend optimal production environments. For example, Solarflare OpenOnload addresses this issue directly. I have used this package along with their cards before with fantastic results.
Sure, we have almost no control over the kernel beyond recommending settings for low latency...
I know about Solarflare cards and I'm trying to use them in the prod environment. Still, typical numbers for 1G and 10G cards, localhost, and maybe Onload cards would be helpful.
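On the recommended-settings point, one socket-level option that is commonly set for small latency-sensitive messages is TCP_NODELAY, which disables Nagle's algorithm. Nobody in this thread has measured it, so treat the sketch below as an assumption to verify rather than a known fix. It uses raw POSIX calls, and the helper names are hypothetical; Poco sockets expose the same option through setNoDelay().

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper: disables Nagle's algorithm so small writes are not
// held back waiting to be coalesced. Returns true on success.
bool set_tcp_nodelay(int fd) {
    int on = 1;
    return ::setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == 0;
}

// Hypothetical helper: reports whether TCP_NODELAY is currently set on fd.
bool tcp_nodelay_enabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    return ::getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) == 0 && value != 0;
}
```

Whether this changes the 50-100us figure would have to be checked with actual timings on the hardware in question.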
One more point, which came up just recently when I tried to work in coro mode. Coro means that everything runs in the main thread, which cannot block on a long operation or wait for completions. If it blocks or waits, the entire system blocks. I switched from pipelined mode to coro expecting a latency reduction. But I got just the opposite, because every call to send blocks and waits until the bytes have been handed off somewhere in the kernel and/or network. So sending 3 NOS in a cycle takes 3 times the sending time and blocks the main thread for that long... What should I do about that? Can sending in coro mode be done asynchronously?
Yes, agreed that performance numbers on current production hardware would be useful.
Regarding coroutines, we could try making the socket non-blocking. That would mean the main loop would need to repeatedly check the connection to see whether there are pending bytes to send. Might be worth a look. Did you raise a ticket for this one?
Not yet. I'll create one.
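For reference in that ticket, the non-blocking idea for coro mode could look roughly like the sketch below: send() never blocks, unsent bytes are kept in a queue, and the main loop calls flush() on each pass until the queue is empty. This is plain POSIX code, not the library's implementation, and PendingSender is a hypothetical name.

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical wrapper around an already-connected socket descriptor.
class PendingSender {
public:
    explicit PendingSender(int fd) : fd_(fd) {
        assert(fd_ >= 0);
        // Switch the descriptor to non-blocking mode.
        int flags = ::fcntl(fd_, F_GETFL, 0);
        ::fcntl(fd_, F_SETFL, flags | O_NONBLOCK);
    }

    // Queues a message and attempts to send right away; never blocks.
    // Returns false only on a hard socket error.
    bool enqueue(const std::string& msg) {
        pending_ += msg;
        return flush();
    }

    // Called from the main loop on every pass. Sends as much as the kernel
    // accepts and keeps the remainder for the next call.
    bool flush() {
        while (!pending_.empty()) {
            ssize_t n = ::send(fd_, pending_.data(), pending_.size(), MSG_NOSIGNAL);
            if (n > 0) {
                pending_.erase(0, static_cast<std::size_t>(n));
                continue;
            }
            if (n < 0 && errno == EINTR) continue;
            if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                return true;  // kernel buffer full: try again on the next pass
            }
            return false;     // hard error
        }
        return true;
    }

    std::size_t pending_bytes() const { return pending_.size(); }

private:
    int fd_;
    std::string pending_;
};
```

With this shape, three NOS sent in one cycle become three enqueue() calls that return immediately. The trade-off is that the main loop has to keep calling flush() (or poll the socket for writability) while pending_bytes() is non-zero.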