NFSd4 seqid-based reply cache:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    struct nfs4_replay {
            u32             rp_status;
            unsigned int    rp_buflen;
            char            *rp_buf;
            unsigned int    rp_allocated;
            char            rp_ibuf[NFSD4_REPLAY_ISIZE];
    };

This struct is embedded in the nfs4_stateowner struct. A pointer to the replay is also kept in the nfsd4_op struct. The struct is initialized in nfs4state.c:alloc_init_open_stateowner() and alloc_init_open_lockowner().

In nfsd4_proc_compound(), operations are processed and encoded one by one. The function that processes an op that takes a stateid sets NFSERR_REPLAY_ME if it detects a replay, and then the caller (nfsd4_proc_compound()) sets the op->replay field to the nfs4_replay struct in the appropriate stateowner. Then nfs4xdr.c:nfsd4_encode_replay() uses the pointer in the nfsd4_op to encode the cached reply. That function appears to be incorrect, as it doesn't check the status of the original reply!

The nfs4_replay struct is filled in on successful replies by the xdr encode routine for the appropriate op; all such routines use the ENCODE_SEQID_OP_HEAD and ENCODE_SEQID_OP_TAIL macros to handle this. There are 6 such operations: close, open, open_confirm, open_downgrade, lock, and locku. The current code does this for the first four but forgets to do it for lock and locku.

The checking of the stateid to detect replays is done by nfs4_preprocess_seqid_op() for all of the operations except open, which does this on its own in nfsd4_process_open1().

Note that open and the others seem to handle replays without a previously saved reply differently: open just soldiers on with the processing, figuring it'll just try the open again and fail in the same way again, whereas the others go ahead and use the nfs4_replay as it is (which defaults to an NFS4ERR_SERVERFAULT). (Though actually, if a successful open is required before any of those other operations, perhaps this can never happen.)

NFSd xid-based cache:
^^^^^^^^^^^^^^^^^^^^^

See fs/nfsd/nfscache.c.
Note that the cache should probably index on IP address, or client, or whatever, as well, but it currently only uses xid, procedure, protocol, and version.

nfssvc.c:nfsd_dispatch() calls nfsd_cache_lookup() as the first thing, before any other processing. As required, nfsd_cache_lookup() either encodes the cached reply or sets aside a cache entry to hold the forthcoming reply (remember that we have to be able to look up ongoing replies as well as completed ones, so there has to be a placeholder there). In the case of in-progress replies it just drops the request. After all other processing is done, nfsd_dispatch() calls nfsd_cache_update() to store the result in the cache.

Note that nfsd_cache_lookup() is also passed proc->pc_cachetype to decide how to deal with the request; the choices are RC_NOCACHE, RC_REPLSTAT, and RC_REPLBUFF.

Further discussion
^^^^^^^^^^^^^^^^^^

Note that the seqid-based cache doesn't need to worry about concurrent requests: it can lock out any processing related to that stateowner for the duration of the op. The client shouldn't be resending aggressively if (as we hope) we are using a reliable transport, so lock contention should be rare.

There is, however, a problem in the current implementation: we rely on a single semaphore to lock all state, but that semaphore is dropped before we encode and save the reply, so a retransmission can be processed against a replay buffer that has not been filled in yet. The likelihood of this race should be mitigated by the use of a reliable transport, but this is still a bug.
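
As an illustration of the xid-based cache described earlier, here is a minimal user-space sketch of the lookup/update cycle: a miss reserves an in-progress placeholder, a hit on an in-progress entry drops the request, and a hit on a completed entry replays the saved reply. This is not the code in fs/nfsd/nfscache.c. The sketch_* functions, the SKETCH_* and ENTRY_* values, the struct layouts, and the fixed-size table are all hypothetical, and the reading of RC_REPLSTAT as "save only the status" and RC_REPLBUFF as "save the encoded reply" is an assumption. Only the key fields (xid, procedure, protocol, version) and the three cache-type names come from the notes above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cache types named in the notes above. */
enum cache_type { RC_NOCACHE, RC_REPLSTAT, RC_REPLBUFF };

/* Hypothetical lookup results: process the request, drop it, or replay. */
enum lookup_result { SKETCH_DOIT, SKETCH_DROPIT, SKETCH_REPLY };

/* Hypothetical entry states; ENTRY_UNUSED is 0 so a zeroed table is empty. */
enum entry_state { ENTRY_UNUSED, ENTRY_INPROG, ENTRY_DONE };

/* The four fields the notes say the cache is indexed on. */
struct cache_key { uint32_t xid, proc, prot, vers; };

struct cache_entry {
    enum entry_state state;
    struct cache_key key;
    enum cache_type  type;
    uint32_t         status;   /* saved status (assumed use of RC_REPLSTAT) */
    char             buf[64];  /* saved reply (assumed use of RC_REPLBUFF) */
    size_t           len;
};

#define SKETCH_CACHE_SIZE 8
static struct cache_entry cache[SKETCH_CACHE_SIZE];

static int key_equal(const struct cache_key *a, const struct cache_key *b)
{
    return a->xid == b->xid && a->proc == b->proc &&
           a->prot == b->prot && a->vers == b->vers;
}

/* Called before any other processing of a request. */
enum lookup_result sketch_cache_lookup(const struct cache_key *key,
                                       enum cache_type type,
                                       struct cache_entry **out)
{
    struct cache_entry *free_slot = NULL;
    size_t i;

    *out = NULL;
    if (type == RC_NOCACHE)
        return SKETCH_DOIT;            /* never cached, always processed */

    for (i = 0; i < SKETCH_CACHE_SIZE; i++) {
        struct cache_entry *e = &cache[i];

        if (e->state == ENTRY_UNUSED) {
            if (!free_slot)
                free_slot = e;
            continue;
        }
        if (!key_equal(&e->key, key))
            continue;
        if (e->state == ENTRY_INPROG)
            return SKETCH_DROPIT;      /* original still being processed */
        *out = e;
        return SKETCH_REPLY;           /* completed: replay saved reply */
    }

    /* Miss: reserve a placeholder so a retransmission can find it.
     * If the table is full, the request is processed uncached. */
    if (free_slot) {
        free_slot->state = ENTRY_INPROG;
        free_slot->key = *key;
        free_slot->type = type;
        *out = free_slot;
    }
    return SKETCH_DOIT;
}

/* Called after all other processing is done, to store the result. */
void sketch_cache_update(struct cache_entry *e, uint32_t status,
                         const char *reply, size_t len)
{
    if (!e)
        return;
    e->status = status;
    if (e->type == RC_REPLBUFF) {
        if (len > sizeof(e->buf)) {
            e->state = ENTRY_UNUSED;   /* too large to cache: free the slot */
            return;
        }
        memcpy(e->buf, reply, len);
        e->len = len;
    }
    e->state = ENTRY_DONE;
}
```

A dispatcher in the style of nfsd_dispatch() would call sketch_cache_lookup() first, act on the result, and call sketch_cache_update() with the returned entry once the reply has been encoded. The sketch does no locking and never evicts entries, both of which a real cache would need. Adding the client address to struct cache_key and key_equal() would be one way to address the indexing concern raised above.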