The code in net/sunrpc/cache.c and include/linux/sunrpc/cache.h is used for, among other things, caching export options and authentication credentials.

The cache code provides an interface for maintaining a cache (kept as a hash table) whose entries may be updated or looked up by kernel threads. In addition, it uses a /proc interface to let the kernel request an update of a cache entry from userspace.

Besides receiving and responding to such requests, userspace can insert entries into caches on its own initiative, flush entries out of a cache, and list the contents of caches.

Userspace sees directories in /proc/net/rpc/, one named after each cache, containing the following files:

"channel": A write to this file should be formatted as an ASCII-encoded, whitespace-delimited series of tokens, terminated by a newline, taking the form

	"key expiry value\n"

where expiry is in unix time, and "key" and "value" are sequences of tokens dependent on the particular cache involved. The result of such a write is to create a new entry in the corresponding cache using the provided information. For example,

	echo "polevault.citi.umich.edu user bfields 1074030507 13" \
		>/proc/net/rpc/nfs4.nametoid/channel

inserts an entry into the name-to-id cache which maps the user bfields@citi.umich.edu, on the client polevault.citi.umich.edu, to the uid 13, and lets that entry expire at time 1074030507 (Jan. 13, 2004, 4:48pm EST).

Writes of the form "key expiry\n" are also acceptable, and insert a "negative" cache entry (e.g., writing "polevault.citi.umich.edu user bfields 1074030507\n" would insert an entry saying that we know of no such user).

Reads from the "channel" file block until the kernel makes a request for a new cache entry. The request it makes will have the form "key\n", where "key" is formatted in the same way as for writes. Thus a daemon can poll the channel file for reads and respond with writes to the file.

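The two write formats accepted by a "channel" file ("key expiry value\n" and the negative form "key expiry\n") can be illustrated with a small userspace sketch. The helper name format_channel_line() is hypothetical and not part of the kernel or any library; the sketch assumes the key and value tokens are already whitespace-separated and makes no attempt to escape special characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): format one line for a cache's
 * "channel" file.
 *
 *   key    - cache-specific key tokens, already whitespace-separated
 *   expiry - expiry in unix time (seconds)
 *   value  - cache-specific value tokens, or NULL for a negative entry
 *
 * Returns the length of the line, or -1 if it does not fit in buf.
 */
static int format_channel_line(char *buf, size_t size, const char *key,
			       long expiry, const char *value)
{
	int n;

	assert(buf != NULL && key != NULL);

	if (value != NULL)
		n = snprintf(buf, size, "%s %ld %s\n", key, expiry, value);
	else	/* negative entry: "key expiry\n" */
		n = snprintf(buf, size, "%s %ld\n", key, expiry);

	if (n < 0 || (size_t)n >= size)
		return -1;	/* encoding error or truncated output */
	return n;
}
```

A daemon answering a request read from the channel file would build a line this way and then write the resulting buffer back to the same file.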
"flush": writing a time to this file expires all cache entries which were last updated before the given time. So echo `date +'%s'` >/proc/net/rpc/nfs4.nametoid/channel clears out the name-to-id cache. "content": this is optional; when the file is available, it pretty-prints the contents of the file, one entry per line. What follows are rough notes on the kernel side of this. Kernel code instantiates a new cache by providing some code fragments which are passed to the DefineCacheLookup() macro, which defines a lookup function. One useful thing to know: Every item in the cache has *always* had its memory allocated by lookup, regardless of INPLACE. So when we do a lookup that creates a new entry, the entry we used to do the lookup is ours to discard. cache.c notes: meaning of CACHE_VALID: seems mainly to be used to track whether an upcall has filled in a request yet: causes an -EAGAIN return in cache check (and usually will result in an upcall being queued) causes inplace instead of swap in lookup in !INPLACE case (so the _lookup in a _parse will typically update in place). set in cache_fresh (which is called in lookup and in cache_check). extremely rough outline of cache_check: set rv to: -EAGAIN if item expired, flushed, or not cache_valid -ENOENT if negative or in above cases if cache_req not passed. 0 otherwise refresh_age = expiry_time - last_refresh; age = now - last_refresh if rv = -EAGAIN or age > refresh_age/2, set cache_pending, and, if not previously set, queue upcall. Note that a cache_put is done if the return is nonzero. Outline of lifetime of an upcall: lookup returns non-cache-valid entry w/ key info only filled in. cache_check queues upcall (if cache_req passed to it), returns -EAGAIN lookup in _parse triggers revisit using cache_fresh, which calls cache_revisit_request. For more details, see "Upcalls" and "deferral" sections below. Outline of DefineCacheLookup: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ item is what we're looking up. 
lock is a rw spinlock, taken for write if "set" or "new", read otherwise.

find tmp in hash table matching item:
	!INPLACE and tmp cache_valid and set and !new: goto "otherwise", below.
	cache_get tmp.
	if set:
		!INPLACE and tmp cache_valid and new:
			insert new here, set hashed bit on new, clear on tmp,
			swap tmp and new (so tmp now hashed, new not)
		copy cache_negative bit from item to tmp, and UPDATE if not cache_negative.
	unlock.
	cache_fresh tmp (update expiry, last_refresh time, set cache_valid, run cache_revisit_request if previously invalid, clear cache_pending, calling queue_loose (removing from upcall queue) if was set before).
	!INPLACE and set and new, then cache_fresh new with 0 expiry. (XXX This I don't understand.)
	if new, cache_put new
	return tmp
if fail to find anything:
	if new:
		insert new at head of hash chain
		set cache_hashed
		if set: copy cache_negative bit and UPDATE.
		unlock
		if set: cache_fresh
		return new
	otherwise:
		unlock
		kmalloc new
		cache_init new
		cache_get new
		INIT
		retry at first lock above.

Who uses INPLACE? There are 9 caches I know of: 2 idmap caches, svc_expkey, svc_export, rsi, rsc, ip_map, and auth_domain. Only svc_export uses INPLACE. Why?

Upcalls
^^^^^^^

data structures:

filp->private_data is a pointer to a

	struct cache_reader {
		struct cache_queue	q;
		int			offset; /* if non-0, we have a refcnt on next request */
	};

	struct cache_request {
		struct cache_queue	q;
		struct cache_head	*item;
		char			*buf;
		int			len;
		int			readers;
	};

PDE(inode)->data is a pointer to a cache_detail, which is in part:

	struct cache_detail {
		....
		struct list_head	queue;
		struct proc_dir_entry	*proc_ent;
		atomic_t		readers;	/* how many times is /channel open */
		time_t			last_close;	/* if no readers, when did last close */
	};

where the queue above is a bunch of these:

	struct cache_queue {
		struct list_head	list;
		int			reader;	/* if 0, then request */
	};

The reader field tells you whether the cache_queue is embedded in a cache_request or a cache_reader.

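As a rough illustration of how the reader field can be used to recover the enclosing structure, here is a standalone userspace sketch. The struct definitions are simplified stand-ins for the kernel types quoted above (item is a plain void pointer here), container_of is redefined locally, and the helpers queue_to_request() and queue_to_reader() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types quoted above. */
struct list_head { struct list_head *next, *prev; };

struct cache_queue {
	struct list_head list;
	int reader;		/* if 0, then request */
};

struct cache_request {
	struct cache_queue q;
	void *item;		/* struct cache_head * in the kernel */
	char *buf;
	int len;
	int readers;
};

struct cache_reader {
	struct cache_queue q;
	int offset;
};

/* Local equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical helper: the enclosing request, or NULL if cq is a reader. */
static struct cache_request *queue_to_request(struct cache_queue *cq)
{
	assert(cq != NULL);
	return cq->reader ? NULL : container_of(cq, struct cache_request, q);
}

/* Hypothetical helper: the enclosing reader, or NULL if cq is a request. */
static struct cache_reader *queue_to_reader(struct cache_queue *cq)
{
	assert(cq != NULL);
	return cq->reader ? container_of(cq, struct cache_reader, q) : NULL;
}
```

Code walking the cache_detail queue can check reader on each cache_queue before deciding which of the two enclosing types it is looking at.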
The queue is set up so that cache_readers and cache_requests are interspersed, with each cache_reader referring to the next thing that is a request. Multiple readers may read a request (which seems suboptimal).

Deferral
^^^^^^^^

rqstp->rq_chandle is one of these:

	struct cache_req {
		struct cache_deferred_req *(*defer)(struct cache_req *req);
	};

Whenever cache_check returns -EAGAIN, it also calls net/sunrpc/cache.c:cache_defer_req(), which calls the defer method above.

The defer() method is expected to take its argument, cast it to the request it's embedded in, and use that to construct a deferred request, in which it embeds a cache_deferred_req (below). Then cache_defer_req adds the deferred request to a fifo and a hash. If at any point we have more than DFR_MAX (currently 300) deferred requests, we randomly drop either this one or the oldest one, and immediately call the revisit() method with toomany=1.

	struct cache_deferred_req {
		struct list_head	hash;	/* on hash chain */
		struct list_head	recent;	/* on fifo */
		struct cache_head	*item;	/* cache item we wait on */
		time_t			recv_time;
		void			*owner;	/* we might need to discard all deferred requests
						 * owned by someone */
		void			(*revisit)(struct cache_deferred_req *req,
						   int too_many);
	};

cache_revisit_request() is what is called (by means of cache_fresh()) when a lookup makes a cache item valid; there is always a lookup in the parse() method that does this when we receive a valid downcall. It looks up the item in the hash, moves matches to a pending list, and then calls the revisit() method (with toomany=0).

There is also a cache_clean_deferred(), which deletes all deferred requests (calling revisit() with toomany=1) whose "owner" field has a given value. This is used currently in svc_destroy() to destroy all upcalls for a given rpc server.

The svcsock code sets the defer() method to svc_defer(). This fails if the request uses more than 1 page.
If the rqstp already has a deferred request associated with it (this would happen if we revisit a request and then have to defer it again, such as when a single request requires multiple upcalls), then we just use that again. Otherwise, we construct a new one, kmalloc'ing memory for the raw request data, setting owner to rqstp->rq_server, etc. The structure actually returned is a svc_deferred_req:

	struct svc_deferred_req {
		u32			prot;	/* protocol (UDP or TCP) */
		struct sockaddr_in	addr;
		struct svc_sock		*svsk;	/* where reply must go */
		struct cache_deferred_req handle;
		int			argslen;
		u32			args[0];
	};

The server's revisit() method is svc_revisit(), which frees the request if toomany is nonzero, but otherwise adds it to the svsk->sk_deferred list.

cache item reference counting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

cache_get and cache_put modify the reference count on an item. cache_put returns 1 iff the reference count has gone to zero and the CACHE_HASHED flag bit is zero, in which case the caller should free the item and related resources. This is done by the "put" method for each cache, which always has the form

	if (cache_put(h, cd)) {
		/* free item here */
	}

The CACHE_HASHED flag tells whether the item is currently referred to by one of the hashes; it is set in cache_lookup and cleared only in cache_clean. The cache itself does not keep a reference count on items.

cache cleaning
^^^^^^^^^^^^^^

Entries are expired from the cache by cache_clean(). It uses globals current_detail and current_index. First it advances current_detail until it either falls off the list of caches or finds a cache whose "nextcheck" time we've reached. If the latter, current_index (used to index hash buckets) is set to 0 and nextcheck on the current_detail is set to 30 minutes from now.

Then it searches the hash buckets in current_detail for a nonempty bucket.
If it finds a nonempty bucket, it searches for a cache entry that has either expired (its expiry_time has passed) or that was last refreshed before the flush_time of the current cache. If it finds such an item, it removes it from any list of pending upcalls. If the item has a nonzero reference count, we continue and look for a different item instead.

If an appropriate item is found, it is removed from this hash bucket and destroyed (by doing a cache_get, clearing CACHE_HASHED, then doing a cache_put; cache_put only destroys on the last reference count *if* CACHE_HASHED is not set).

cache_clean() uses a -1 return to indicate to the caller that it fell off the end of the list of cache_details.

cache_clean() is called from do_cache_clean() every 5 jiffies, until it returns -1, at which point it waits 30 seconds. Also, cache_flush calls it until it returns -1 twice, which ensures it has run through all the caches at least once.

When a process writes to one of the "flush" files, write_flush() is called, which sets flush_time on the cache in question to the provided time, sets nextcheck to the current time, and then calls cache_flush(). This ensures that all unreferenced cache entries last refreshed before the provided time are purged. (Referenced cache entries may stick around for a little while, but when someone attempts to use one, cache_check will notice that flush_time is greater than last_refresh time.)

cache_flush() is also called (via cache_purge()) when destroying caches, and in a few other places. Then there are a bunch of calls to cache_flush() from export.c, which I don't understand.
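
The interaction between the reference count and CACHE_HASHED described above can be sketched as a small userspace model. This is an illustration only: struct cache_head_model and the model_* functions are hypothetical names, the real cache_head carries more state, and the kernel code does its counting and flag handling under locks with atomic operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a cache item (illustration only). */
struct cache_head_model {
	int refcnt;
	bool hashed;	/* models the CACHE_HASHED flag bit */
};

/* Models cache_get: take a reference. */
static void model_get(struct cache_head_model *h)
{
	h->refcnt++;
}

/*
 * Models cache_put: drop a reference. Returns 1 iff the count has gone
 * to zero and the item is not hashed, in which case the caller frees it.
 */
static int model_put(struct cache_head_model *h)
{
	assert(h->refcnt > 0);
	h->refcnt--;
	return h->refcnt == 0 && !h->hashed;
}

/*
 * Models how cache_clean destroys one expired item: skip it if it is
 * still referenced; otherwise get, clear the hashed flag, then put.
 * Returns 1 if the item should now be freed, 0 if it was skipped.
 */
static int model_clean_one(struct cache_head_model *h)
{
	if (h->refcnt != 0)
		return 0;	/* still referenced: look for another item */
	model_get(h);
	h->hashed = false;	/* item is no longer in the hash */
	return model_put(h);
}
```

In this model a put on a hashed item never reports it as freeable, which is why the cleaner clears the flag between its own get and put.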