To address:

- Current kernel code does not depend on knowing the table of all exported filesystems. Why is this?
- Higher-level description of export code (the svc_caches/ material is low-level and out of date).
- Decide on pseudo-export design.

Export requirements:

- Must be able to service readdir on the pseudofilesystem (and the corresponding MNT commands that list exports), so we need a static list of paths somewhere, either as an explicit data structure or as a filesystem namespace.
- Would like to be able to start serving filesystems before they are all online. On a server with thousands of exports and lots of disks especially, mounting all those filesystems, replaying journals, spinning up disks, etc. may take a long time. Note that mountd currently tries to stat *everything* before answering a single fsid->path request.
- People sometimes want to be able to unmount exported filesystems. Currently you can run exportfs -f, then unmount/remount.

From Neil Brown:

On Wednesday March 7, bfields@fieldses.org wrote:
> So could you remind me what the use cases are here? Who is it that
> requires demand loading, and why?

Partly it is the principle that demand-based configuration is more flexible. Witness the various efforts to replace rc.d scripts with something event/demand based.

The IP->clientname table must be demand loaded because you obviously cannot know all needed IP addresses in advance. (The rmtab experience proves that.)

The clientname+path->export-options table must be demand loaded because, depending a bit on how you choose client names and how complicated /etc/exports is, you either don't know all client names in advance, or computing them all is complex and wasteful.

The fsid->path table could possibly be made 'static', but I think demand-loading is still best. There are multiple possible fsids for some filesystems, and telling the kernel about all of them when only one will be used seems wasteful.
And the filesystems may not all be available when you try to create the static table. You could update the table at every mount, but with demand-loading, you don't have to.

Imagine having hundreds of filesystems on some sort of library (a CD library?) where each can be identified by a UUID which gets stored in the fsid in the filehandle. Imagine a simple extension to mountd so that a call-out were made when an unknown filehandle arrived. This callout could mount the required filesystem and export it. Maybe the library only allows 3 filesystems to be mounted at a time, so it would unmount the least-recently-used one.

From Olaf:

On Thursday 08 March 2007 06:14, J. Bruce Fields wrote:
> Maybe. Is this practical? Do we know of any cases of users doing
> this?
> Do you block forever if you try to access 4 filesystems at once? I
> dunno....

IIRC SGI had a storage appliance a while back that included a tape robot, but it was hiding the details somewhere deep inside XFS. I remember seeing patches involving nfsd and dmapi (I can see you cringe, Christoph :-)

Note that in real-life scenarios, we're sometimes talking about literally thousands of exported file systems. My previous employer has a customer with such a setup, using NetApp filers. We had some trouble getting the Linux client to survive in this environment, as it ran out of privileged ports way too quickly. Absurd as it may sound, this kind of setup seems to be the trend.

Now think about handling a system with several thousand exported file systems on the server side: if you need to look at each file system before nfsd is ready to service requests, we're talking of a considerable delay in boot time. In the worst case we're talking about several thousand *disks* that need to be spun up, and fuses going pop-pop-pop.

Short summary: if you want to scale beyond small work group servers, you need something that scales well. Demand loading the exports table does.
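Neil's CD-library example is essentially a demand-loaded cache with least-recently-used eviction. The C sketch below models it under loud assumptions: the three-slot limit, `struct library`, `library_lookup()`, and the `hypothetical_mount_and_export()` / `hypothetical_unexport_and_unmount()` stand-ins are all illustrative and are not part of nfs-utils or the kernel. A real version would be driven by mountd when an unknown fsid arrives.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the "CD library" scenario: each filesystem is
 * identified by a UUID carried in the filehandle's fsid, and the library
 * can only have MAX_MOUNTED filesystems mounted at once. */
#define MAX_MOUNTED 3
#define UUID_LEN 64

struct slot {
    char uuid[UUID_LEN];      /* UUID (fsid) of the mounted filesystem */
    unsigned long last_used;  /* logical time of the most recent access */
    int in_use;               /* nonzero if this slot holds a filesystem */
};

struct library {
    struct slot slots[MAX_MOUNTED];
    unsigned long clock;      /* logical clock, bumped on every lookup */
    unsigned mounts;          /* how many mounts the callout performed */
    unsigned unmounts;        /* how many LRU evictions happened */
};

/* Placeholders for what a real mountd callout would do; hypothetical,
 * they only log here. */
static void hypothetical_mount_and_export(const char *uuid)
{
    printf("mount+export %s\n", uuid);
}

static void hypothetical_unexport_and_unmount(const char *uuid)
{
    printf("unexport+unmount %s\n", uuid);
}

/* Handle a request for filesystem `uuid`: on a hit, refresh its LRU
 * timestamp; on a miss, mount it, evicting the least-recently-used
 * filesystem if every slot is busy. Returns the slot index used. */
int library_lookup(struct library *lib, const char *uuid)
{
    int i, victim = -1;

    lib->clock++;
    for (i = 0; i < MAX_MOUNTED; i++) {
        struct slot *s = &lib->slots[i];
        if (s->in_use && strcmp(s->uuid, uuid) == 0) {
            s->last_used = lib->clock;
            return i;
        }
    }

    /* Miss: take a free slot if there is one. */
    for (i = 0; i < MAX_MOUNTED; i++) {
        if (!lib->slots[i].in_use) {
            victim = i;
            break;
        }
    }
    /* Otherwise evict the slot with the oldest access time. */
    if (victim < 0) {
        victim = 0;
        for (i = 1; i < MAX_MOUNTED; i++)
            if (lib->slots[i].last_used < lib->slots[victim].last_used)
                victim = i;
        hypothetical_unexport_and_unmount(lib->slots[victim].uuid);
        lib->unmounts++;
    }

    snprintf(lib->slots[victim].uuid, UUID_LEN, "%s", uuid);
    lib->slots[victim].last_used = lib->clock;
    lib->slots[victim].in_use = 1;
    hypothetical_mount_and_export(uuid);
    lib->mounts++;
    return victim;
}

/* Returns nonzero if the filesystem with this UUID is currently mounted. */
int library_is_mounted(const struct library *lib, const char *uuid)
{
    for (int i = 0; i < MAX_MOUNTED; i++)
        if (lib->slots[i].in_use && strcmp(lib->slots[i].uuid, uuid) == 0)
            return 1;
    return 0;
}
```

For example, looking up filesystems a, b, and c, then a again, then d evicts b, the least recently used. Note that this sketch never blocks: a fourth filesystem simply displaces one of the others. That sidesteps Bruce's question rather than answering it, since a real implementation could only unmount a filesystem that is not busy.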
From Olaf Kirch:

I asked:
> So why does demand-loading scale better? Is the worry just the kernel
> memory required to store the export table for thousands of mostly
> inactive exports?

It means you can start serving files without having to wait for all file systems to be mounted (and having their journals replayed, etc). All you need is a way for mountd to figure out whether a file system is there already (so we can push the rootfh into the kernel) or whether it's not (so nfsd can return EJUKEBOX or defer the request).

From Neil Brown:

You've got to put that v4 pseudo root somewhere...

It just needs cleverness in nfs-utils to auto-bind-mount things into the pseudoroot... but I guess people cannot magically unmount things then.

How about this. We add an export option "follow-symlinks" so that when nfsd is asked to stat a symlink it does a 'stat' instead of an 'lstat' (effectively). Then we get mountd to make a tmpfs in /var/lib/nfs/pseudoroot which contains directories and symlinks to the various export point names in etab. This tmpfs is exported as fsid=0,follow-symlinks.

Problem solved?

Of course if different clients get to see different exports, then we might need multiple tmpfs's in /var/lib/nfs/pseudoroot/$CLIENT/ ....
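Neil's pseudoroot proposal can be sketched with plain POSIX calls. Assumptions, all hypothetical: `pseudoroot_add()` and `pseudo_getattr()` are illustrative helpers, not nfs-utils functions; "follow-symlinks" is the option proposed in the message, not an existing export option; any directory stands in for the tmpfs at /var/lib/nfs/pseudoroot; and nested export points (both /export and /export/home, say) are rejected rather than handled.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define PSEUDO_PATH_MAX 4096  /* illustrative buffer size */

/* Hypothetical helper: add one export point to a pseudoroot directory.
 * For root "/var/lib/nfs/pseudoroot" and export "/export/home" it creates
 * the directory <root>/export and the symlink <root>/export/home ->
 * /export/home. Returns 0 on success, -1 on error. */
int pseudoroot_add(const char *root, const char *export_path)
{
    char link[PSEUDO_PATH_MAX];
    size_t rootlen = strlen(root);
    struct stat st;
    char *p;

    /* Only absolute, non-root export paths make sense here. */
    if (export_path[0] != '/' || export_path[1] == '\0')
        return -1;
    if (snprintf(link, sizeof link, "%s%s", root, export_path) >= (int)sizeof link)
        return -1;

    /* Create each intermediate directory below root ("mkdir -p"). */
    for (p = link + rootlen + 1; (p = strchr(p, '/')) != NULL; p++) {
        *p = '\0';
        if (mkdir(link, 0755) != 0) {
            /* An existing directory is fine, but refuse to descend through
             * a symlink, which would create entries inside a real exported
             * filesystem; nested export points are therefore rejected. */
            if (errno != EEXIST || lstat(link, &st) != 0 || !S_ISDIR(st.st_mode))
                return -1;
        }
        *p = '/';
    }

    /* The final component is a symlink to the real export point. */
    return symlink(export_path, link);
}

/* Hypothetical model of the proposed "follow-symlinks" export option:
 * with it set, attributes come from the symlink's target (stat) rather
 * than from the link itself (lstat). */
int pseudo_getattr(const char *path, int follow_symlinks, struct stat *st)
{
    return follow_symlinks ? stat(path, st) : lstat(path, st);
}
```

Calling `pseudoroot_add()` once per etab entry would populate the tree; passing /var/lib/nfs/pseudoroot/$CLIENT as the root models the per-client variant. Mounting the tmpfs and exporting it with fsid=0 is left out of the sketch.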