Monday, June 07, 2010

InnoDB now supports native AIO on Linux

With the exception of Windows, InnoDB has used 'simulated AIO' on all other platforms to perform certain IO operations. The IO requests that go through simulated AIO are the write requests and the readahead requests for the data file pages. Let us first look at what 'simulated AIO' means in this context.

We call it 'simulated AIO' because it appears asynchronous from the context of a query thread, but from the OS perspective the IO calls are still synchronous. The query thread simply queues the request in an array and then returns to its normal work. One of the IO helper threads, which are background threads, then takes the request from the queue and issues a synchronous IO call (pread/pwrite), meaning it blocks on the IO call. Once it returns from the pread/pwrite call, the helper thread calls the IO completion routine on the block in question, which, in the case of a read, includes merging any buffered operations. In the case of a write, the block is marked as 'clean' and is removed from the flush_list. Some other bookkeeping also happens in the IO completion routine.
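To make this concrete, here is a rough sketch of the simulated AIO pattern described above. This is not InnoDB's actual code; the request struct, queue helper and completion hook are hypothetical placeholders. The point is simply that the helper thread blocks inside pread/pwrite while the query thread that queued the request has already moved on.

/* Minimal sketch of a simulated AIO helper thread (hypothetical names). */
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>
#include <stddef.h>

struct io_request {
    int    fd;        /* data file descriptor          */
    void  *buf;       /* page buffer                   */
    size_t len;       /* page size, e.g. 16 KB         */
    off_t  offset;    /* file offset of the page       */
    int    is_write;  /* 1 = write request, 0 = read   */
};

/* Placeholder hooks: the real engine has its own queue and completion code. */
struct io_request *dequeue_request(void);
void io_complete(struct io_request *req);

void *io_helper_thread(void *arg)
{
    (void) arg;
    for (;;) {
        struct io_request *req = dequeue_request();  /* blocks if queue is empty */

        /* Synchronous from the OS point of view: the helper thread
         * sleeps in the kernel until the IO finishes. */
        if (req->is_write)
            pwrite(req->fd, req->buf, req->len, req->offset);
        else
            pread(req->fd, req->buf, req->len, req->offset);

        /* Merge buffered operations on a read, or mark the block clean
         * and remove it from the flush_list on a write. */
        io_complete(req);
    }
    return NULL;
}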

What we have changed in the InnoDB Plugin 1.1 is to use the native AIO interface on Linux. Note that this feature requires that your system has libaio installed. libaio is a thin wrapper around the kernelized AIO on Linux; it is different from POSIX AIO, which requires user-level threads to service AIO requests. There is a new boolean switch, innodb_use_native_aio, to choose between simulated and native AIO, the default being native AIO.
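If you want to try the old code path, the switch is set like any other startup option, for example in my.cnf (a minimal example; the option is read at server startup, so it cannot be changed on a running server):

[mysqld]
# Default is 1 (use native AIO on Linux).
# Set to 0 to fall back to the simulated AIO code path.
innodb_use_native_aio = 0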

How does this change the design of the InnoDB IO subsystem? Now the query thread, instead of enqueueing the IO request, actually dispatches the request to the kernel and returns to its normal work. The IO helper thread, instead of picking up enqueued requests, waits on the IO wait events for any completed IO requests. As soon as it is notified by the kernel that a certain request has been completed, it calls the IO completion routine on that request and then goes back to waiting on the IO wait events. In this new design the IO requesting thread becomes a kind of dispatcher while the background IO thread takes on the role of a collector, as in the sketch below.
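Here is a rough sketch of that dispatcher/collector split using the libaio calls directly. Again, the function names and structure are hypothetical, not InnoDB's actual code; it only illustrates the io_submit/io_getevents pattern. Compile with -laio on a system that has libaio installed.

/* Sketch of native AIO dispatch and collection with libaio (hypothetical names). */
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_DEPTH 256            /* requests one helper thread handles */

static io_context_t ctx;           /* one AIO context per helper thread  */

/* Called once at startup. */
static void aio_init(void)
{
    memset(&ctx, 0, sizeof(ctx));
    int err = io_setup(QUEUE_DEPTH, &ctx);
    if (err != 0) {
        fprintf(stderr, "io_setup failed: %s\n", strerror(-err));
        exit(EXIT_FAILURE);
    }
}

/* The "dispatcher": a query thread hands the write to the kernel and
 * immediately returns to its normal work. */
static void dispatch_page_write(struct iocb *cb, int fd, void *page,
                                size_t page_size, long long offset)
{
    struct iocb *list[1] = { cb };

    io_prep_pwrite(cb, fd, page, page_size, offset);
    cb->data = page;                       /* handed back on completion */
    if (io_submit(ctx, 1, list) != 1)
        perror("io_submit");
}

/* The "collector": the background IO helper thread waits for completed
 * requests and runs the completion routine on each one. */
static void io_collector_loop(void)
{
    struct io_event events[QUEUE_DEPTH];

    for (;;) {
        int n = io_getevents(ctx, 1, QUEUE_DEPTH, events, NULL);
        for (int i = 0; i < n; i++) {
            /* events[i].data is the page buffer attached above; this is
             * where the completion routine (mark block clean, remove from
             * flush_list, ...) would run. */
            void *page = events[i].data;
            (void) page;
        }
    }
}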

What will this buy us? The answer is simple: scalability. For example, consider a system that is heavily IO bound. In InnoDB one IO helper thread works on a maximum of 256 IO requests at a time. Assume that the heavy workload results in the queue being filled up. With simulated AIO the IO helper thread will go through these requests one by one, making a synchronous call for each request. This means serialisation, forcing the request that is serviced last to wait for the other 255 requests before it gets a chance. What this implies is that with simulated AIO there can be at most 'n' IO requests in parallel inside the kernel, where 'n' is the total number of IO helper threads (this is not entirely true because query threads are also allowed to issue synchronous requests, but I'll gloss over that detail for now). With native AIO all 256 requests are dispatched to the kernel, and if the underlying OS can service more requests in parallel then we'll take advantage of that.

The work of coalescing contiguous requests is now offloaded to the kernel/IO scheduler. What this means is that which IO scheduler you are using, or the properties of your RAID/disk controller, may now have more effect on the overall IO performance. This is also because many more IO requests will now be inside the kernel than before. Though we have not run tests to certify any particular IO scheduler, the conventional wisdom has been that for database engine workloads the noop or deadline scheduler would likely give optimal performance. I have heard that lately a lot of improvements have gone into CFQ as well. It is for you to try and, as always, YMMV. We look forward to hearing your story.
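A quick way to see and change which scheduler a block device is using is through sysfs (the device name sdb here is only an example; adjust it for your system):

# Show the available schedulers; the active one is in brackets.
cat /sys/block/sdb/queue/scheduler
# prints something like: noop deadline [cfq]

# Switch the device to the deadline scheduler (does not persist across reboots).
echo deadline > /sys/block/sdb/queue/scheduler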

NOTE: InnoDB has always used native AIO on Windows and it continues to do so in Plugin 1.1. innodb_use_native_aio will have no effect on Windows.
