The state needed to process a single RPC request and send a response is kept in the svc_rqst structure. In particular, memory is needed to hold the request while it is being decoded, and to hold the encoded reply.

NOTE: this is all changing between 2.6.18 and 2.6.19; the two arrays below are replaced by a single array, some small helper functions are removed, etc.

rq_argpages and rq_respages are each an array of RPCSVC_MAXPAGES page pointers; the memory in these pages is used to hold requests and replies. The xdr_bufs rq_arg and rq_res point into these pages: the rq_arg.head and rq_arg.tail iovecs always point to memory within rq_argpages[0], and rq_arg.pages always points to rq_argpages[1], and similarly for rq_res and rq_respages. (XXX: That looks wrong for rq_res at least: note how the tail is allocated in nfsd_read, for example.)

We now trace the life of an NFS request.

First, before any requests are even received, svc_create_thread(nfsd, nfsd_svc) is used to start a thread running nfsd(). Before doing so it also allocates a new struct svc_rqst (rqstp), which it passes to nfsd(), and allocates enough pages to contain 2 pages plus 2 + 32k bytes. (Note that the memory it allocates at the same time for rq_argp and rq_resp is to hold C structures with decoded arguments and with replies to be encoded, not to hold the raw xdr data.) rq_arghi is set to the number of pages allocated, pointers to which are stored in the first rq_arghi elements of rq_argpages.

The server's main loop is in the function nfsd(). It calls svc_recv(), which calls svc_pushback_allpages(), which walks the first rq_resused elements of rq_respages, decrementing rq_resused to 0, and moves each non-null page it finds to the end of the rq_argpages list, incrementing rq_arghi as it appends.
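
The pushback step just described can be sketched in userspace. This is a model that follows the prose above, not the kernel source: struct model_page, struct model_rqst, MODEL_MAXPAGES and model_pushback_allpages are hypothetical names standing in for struct page, struct svc_rqst, RPCSVC_MAXPAGES and svc_pushback_allpages.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAXPAGES 8                /* assumed stand-in for RPCSVC_MAXPAGES */

struct model_page { int id; };          /* assumed stand-in for struct page */

/* Only the fields the text talks about; the real svc_rqst has many more. */
struct model_rqst {
    struct model_page *argpages[MODEL_MAXPAGES];
    struct model_page *respages[MODEL_MAXPAGES];
    int arghi;      /* number of valid entries in argpages */
    int resused;    /* number of slots in use in respages */
};

/*
 * Walk respages from the top down. Every non-NULL page is appended to
 * argpages and its respages slot is cleared; NULL slots are skipped.
 * On return resused is 0.
 */
static void model_pushback_allpages(struct model_rqst *rq)
{
    while (rq->resused > 0) {
        struct model_page *p = rq->respages[--rq->resused];

        if (p == NULL)
            continue;
        assert(rq->arghi < MODEL_MAXPAGES);
        rq->argpages[rq->arghi++] = p;
        rq->respages[rq->resused] = NULL;
    }
}
```

Because the walk runs from the highest used slot downwards, pages come back onto argpages in the reverse of their order in respages. The model does not depend on that order, and nothing in the text suggests the real code does either.
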
svc_recv() then allocates and adds to rqstp->rq_argpages enough pages to restore rq_arghi to its initial value; thus each time through the loop we return the two arrays to the same state they were in after svc_create_thread().

Then svc_recv() sets rq_arg.head[0] to the whole of the first page, sets the tail to nothing, sets rq_argused to 1, sets rq_arg.pages to &rq_argpages[1], sets rq_arg.page_base to 0, sets rq_arg.page_len to (pages-2)*PAGE_SIZE (one page for the arg head, one page left for the response), and sets rq_arg.len to (pages-1)*PAGE_SIZE. After actually reading in the data from the sk_buff, these lengths are modified to reflect the data actually copied (note the tail isn't used), and rq_argused is incremented by the number of pages actually used (in addition to the head) in the xdr_buf.

XXX Warning: this is a bit disorganized and probably wrong, especially from this point on. Don't read without a copy of the code on hand to compare.

Next svc_process() is called from the nfsd() main loop. It calls svc_take_page(), which decrements rqstp->rq_arghi, copies the last argpage into the respages array at index rqstp->rq_resused, and increments rqstp->rq_resused. (XXX Why doesn't it zero out the entry of argpages it just copied?) It points rqstp->rq_res.head[0] at this page (setting the length of the iovec to 0; that length is used as a write pointer), then sets rq_res.pages to &rqstp->rq_respages[1]. After processing the RPC header, it calls nfsd_dispatch() (in the v3 and v4 cases). XXX nfsd_dispatch() may do some more fiddling with the page arrays, especially in the case of reads or writes (see below). Then it does a svc_send() or svc_drop().

Both svc_send() and svc_drop() then call svc_sock_release(), which calls svc_free_allpages(), which decrements rq_resused to 0, calling put_page() on every page it finds.
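
The take and release steps can be modelled the same way. Again this is a sketch built from the description, not copied from the kernel: model_take_page, model_put_page and model_free_allpages are hypothetical names, and the refs counter is an assumed simplification of the page reference count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAXPAGES 8                /* assumed stand-in for RPCSVC_MAXPAGES */

struct model_page { int refs; };        /* assumed: just a reference count */

struct model_rqst {
    struct model_page *argpages[MODEL_MAXPAGES];
    struct model_page *respages[MODEL_MAXPAGES];
    int arghi;      /* number of valid entries in argpages */
    int argused;    /* argpages entries holding the request */
    int resused;    /* number of slots in use in respages */
};

/* Move the last free argpage to the end of respages, or fail if none is left. */
static int model_take_page(struct model_rqst *rq)
{
    if (rq->argused >= rq->arghi)
        return -ENOMEM;
    assert(rq->resused < MODEL_MAXPAGES);
    rq->arghi--;
    /* As in the description, the old argpages entry is left in place. */
    rq->respages[rq->resused++] = rq->argpages[rq->arghi];
    return 0;
}

/* Drop one reference; the page only goes away once nobody else holds it. */
static void model_put_page(struct model_page *p)
{
    assert(p->refs > 0);
    p->refs--;
}

/* Drop our reference on every response page and empty the respages list. */
static void model_free_allpages(struct model_rqst *rq)
{
    while (rq->resused > 0) {
        struct model_page *p = rq->respages[--rq->resused];

        if (p == NULL)
            continue;
        model_put_page(p);
        rq->respages[rq->resused] = NULL;
    }
}
```

In this model a page that some other holder still references would keep refs above zero after model_free_allpages(), which is the situation described for pages still in use by the filesystem or the networking code.
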
(Note that the put_page() in svc_free_allpages() allows nfsd to forget about the page, but doesn't necessarily release it completely: the page may still be associated with a file, or ->sendpage() (which is what svc_send() uses) may have returned before actually sending the page. In those cases the filesystem and/or networking code will still hold references to the page.)

XXX Don't understand svc_release_skb(). What does it do? Why is it called where it's called?
XXX Then isn't the pushback_unused_pages() a no-op? XXX still true?

XXX: Invariants?:
    (rq_respages[i] != NULL) iff (i < rq_resused)
    (rq_argpages[i] != NULL) iff (i < rq_arghi)
    rqstp->rq_argused <= rqstp->rq_arghi

svc_pushback_allpages: sets all the rq_respages to NULL, decrements rq_resused to 0, increments rq_arghi for each non-null respage and transfers each such page to rq_argpages.
svc_pushback_unused_pages: same as svc_pushback_allpages, but stops if we get to a respage that's equal to rq_res.pages.
svc_free_allpages: same as svc_pushback_allpages, but does a put_page() instead of transferring to the argpages array.
svc_take_page: returns -ENOMEM if rq_argused >= rq_arghi; otherwise transfers a page from argpages to respages, decrementing rq_arghi and incrementing rq_resused.

Comments in svc.h claim: the first page in the rq_pages list is always used for head and tail. The xdr_buf.pages pointer always points to the second page in the relevant list.

Note the real work of reads is done in the xdr encoding (nfsd4_encode_read). So how does the whole read process work for e.g. nfs4?

nfsd4_encode_read: does svc_take_page for each page needed to hold the requested amount of read data. Also sets up read->rd_iov's to point to these pages.
nfsd_read: if the backing filesystem has a sendfile method, calls svc_pushback_unused_pages(rqstp) and uses sendfile and nfsd_read_actor to set up the pages instead. Otherwise calls vfs_readv(), which uses the iovecs set up in encode_read above.
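
The page-per-chunk step attributed to nfsd4_encode_read above might look roughly like this. It is a counters-only model under stated assumptions: MODEL_PAGE_SIZE is an assumed 4096-byte page, struct model_iovec records a respages index instead of a real iov_base, and model_encode_read is a hypothetical name, not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL          /* assumed page size */

/* Stand-in for struct kvec: remembers which respages slot backs the data. */
struct model_iovec {
    int page;       /* index into respages */
    size_t len;     /* bytes of read data this page will hold */
};

struct model_rqst {
    int arghi;      /* number of valid entries in argpages */
    int argused;    /* argpages entries holding the request */
    int resused;    /* number of slots in use in respages */
};

/* Counters-only version of the take step described in the helper summary. */
static int model_take_page(struct model_rqst *rq)
{
    if (rq->argused >= rq->arghi)
        return -ENOMEM;
    rq->arghi--;
    rq->resused++;
    return 0;
}

/*
 * Take one page per MODEL_PAGE_SIZE chunk of the requested length and
 * record it in iov[]. Returns the number of iovecs filled, or -ENOMEM
 * if the pages or the iov array run out.
 */
static int model_encode_read(struct model_rqst *rq, size_t len,
                             struct model_iovec *iov, int max_iov)
{
    int v = 0;

    while (len > 0) {
        size_t chunk = len < MODEL_PAGE_SIZE ? len : MODEL_PAGE_SIZE;
        int slot = rq->resused;         /* slot the taken page lands in */

        if (v >= max_iov || model_take_page(rq) != 0)
            return -ENOMEM;
        iov[v].page = slot;
        iov[v].len = chunk;
        v++;
        len -= chunk;
    }
    return v;
}
```

With one response page already taken for the head (resused == 1), a 10000-byte read in this model takes three more pages, and the last iovec covers the remaining 1808 bytes.
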
So in the sendfile case, we end up with the rq_respages list containing a bunch of pages that we don't actually need.

XXX What the heck is rq_restailpage?
XXX rqstp->rq_reserved?
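
The three conjectured invariants listed earlier can be written down as a runtime check against a small model of the two arrays. This is only a sketch for experimenting with the bookkeeping: struct model_rqst, MODEL_MAXPAGES and model_invariants_hold are hypothetical names, and the invariants themselves are the unverified ones from the XXX list, not something the kernel asserts.

```c
#include <assert.h>     /* callers can assert() on the result */
#include <stddef.h>

#define MODEL_MAXPAGES 8                /* assumed stand-in for RPCSVC_MAXPAGES */

struct model_page { int id; };          /* assumed stand-in for struct page */

struct model_rqst {
    struct model_page *argpages[MODEL_MAXPAGES];
    struct model_page *respages[MODEL_MAXPAGES];
    int arghi;      /* number of valid entries in argpages */
    int argused;    /* argpages entries holding the request */
    int resused;    /* number of slots in use in respages */
};

/* Returns 1 if all three conjectured invariants hold, 0 otherwise. */
static int model_invariants_hold(const struct model_rqst *rq)
{
    int i;

    for (i = 0; i < MODEL_MAXPAGES; i++) {
        /* (rq_respages[i] != NULL) iff (i < rq_resused) */
        if ((rq->respages[i] != NULL) != (i < rq->resused))
            return 0;
        /* (rq_argpages[i] != NULL) iff (i < rq_arghi) */
        if ((rq->argpages[i] != NULL) != (i < rq->arghi))
            return 0;
    }
    /* rq_argused <= rq_arghi */
    return rq->argused <= rq->arghi;
}
```

In this model, a take that copies argpages[rq_arghi] into respages without clearing the old entry makes the second check fail until that entry is set to NULL. That may be what the question about zeroing the copied entry is getting at, or it may mean the second invariant is simply not one the code maintains.
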