The Dice Project


Task: Common home directory issues
Group:roger
Stage:1


Description

It has been agreed that users will have a single common home directory. The process required for this needs to be defined clearly.

"Creating common home directories can be seen as a completely self-contained and separate task, which can be tackled and completed well before the main DICE tasks. We can and should try to keep it separate from DICE as much as we can."

Note that other reports may also consider home directory issues, for example the Default and Group Environments report deals with start-up files, and the Filesystem Maps report is concerned with home directory location and mounting.

Report Changes

2002-02-14
Description of NFS vs copy options, and decision.
Description of /legacy
Inclusion of "problem areas" appendix
Other minor tweaks
2002-02-19
(Fairly major re-structuring)
Removed "move (copy) option" for unified home directories
Added "Creation of a single login directory" section
Added "How to mount /legacy filesystems" sub-section
Added exporting and access to home directories section
Added "Outstanding" actions sub-section
To skip the background & historical stuff, start reading here.

Issues

Although home directories within DICE (which it is assumed will still be accessible under /home) will be common across all sites, they will still need to be hosted at a particular site. This means that a user's current home directory will - in most cases - be identical to their DICE home directory. However, some users have more than one home directory (at one or more sites) - how are these to be amalgamated?

Filesystem structures

Two proposals have been made for the possible locations of primary and secondary home directories - the "all-in-one", and the "sibling".

The first (the "all-in-one" option) is to locate all "secondary" home directories in the "primary" home directory, one level down and named after the secondary site. Thus, if a user has his or her primary home directory at CogSci, with secondary home directories at AI and CS, the amalgamated home directory might look something like:

        /home/<user>               - local home directory stuff
        /home/<user>/AI            - home directory stuff from AI
        /home/<user>/CS            - home directory stuff from CS
    
The second option (the "sibling") is to have legacy directories available in a separate hierarchy, /legacy, with the old domains side-by-side, and user directories mounted beneath. Thus, for the same example as above, the structure would look like:

        /home/<user>               - local home directory stuff
        /legacy/dai/home/<user>    - home directory stuff from AI
        /legacy/dcs/home/<user>    - home directory stuff from CS
    
In fact, all three user directories would be available under /legacy, but it is assumed that the primary home directory would usually be accessed via /home.

The final form of the legacy structure decided upon (dealt with in the Filesystem Maps report) is the "sibling" option, with a hierarchical structure, /legacy/domain (see below). This allows the implementation of a completely separate structure, without the need to interfere with users' home directories.

Given the above legacy structure, the (initial) implementation will auto-mount home directories via NFS. The "move (copy) option" was discarded because of the sheer volume of data concerned: given the resources and timescales involved, copying is just not an option.

NFS mounting

The auto-mount option obviates the need for additional disk space in the initial stages. Users would also need to be given guidelines about the length of time the automounts will be available, and - presumably - sufficient disk space would need to be made available at the primary home directory site (at some point) in order for the user to amalgamate his or her home directories without the need for NFS mounts (since NFS will disappear at some point anyway). Creation and dissemination of such guidelines and copying advice are not part of this task.

It is to be stressed that auto-mounting of secondary home directories would be a short-term provision, and that after a specified time this provision would be withdrawn. Arrangements for the exporting of home directories from various NFS servers within .cogsci, .dai, and .dcs domains to all of .informatics would need to be made (with consequent changes in firewalls for NFS traffic, etc).

To support NFS mounting of home directories, we need to update legacy netgroups and files/maps to export to DICE machines, and create automount maps in the DICE world to access them (the latter is the province of the Filesystem Maps report).

Given this, the task of creating a single common home directory becomes simpler - being reduced to:

Multiple directories at "primary" site

Some users have two "home" directories at the same site - notably the semi-autonomous Language Technology Group (LTG) at BP. Assuming BP was the primary site for such a user, the question of how to integrate the two "home directory" locations still needs to be dealt with. As stated above, a solution to this is not really necessary at this stage - it just means that any alternative "home" directories at a site will not be available under /legacy unless explicitly mounted.

There may be other wrinkles & gotchas at other sites. These need to be identified fairly early on in the process.

No obvious "primary" site

It may also be the case that students' filespace may span two or more sites, with significant usage at each site (for example, joint AI/CS students, who have a 50/50 split). In these cases, it is not always obvious which location is (should be) the primary one, and some decision needs to be taken about where the primary home directory will be.

This decision will be a unilateral one, taken by computing staff - students don't get to decide (it is assumed that a student's home directory will be deemed to be the one on the machines at the site at which he or she is based - using the office location of the student's Director of Studies if there is any uncertainty).

It is possible that a student's working patterns may change if they can log on anywhere and get the same filespace & environment. Will network performance be an issue here? (If so, we might need to consider moving some students' home directories). After consultation, it would appear that a comparison with the running of the CS1 lab in Appleton Tower (with no local servers at all, i.e. everything sited at KB) suggests it shouldn't be a problem.

Creation of a single login directory

This requires the nomination of a primary home directory for all users with home directories at more than one site, and then exporting this directory so that it is available as /home/user on all machines on which the user currently has an account.

Initially (pre-DICE), this will only affect the above-mentioned users (those with home directories at more than one site), since it is not the intention to provide login access for all users at all sites on legacy machines (this will be achieved under DICE). No additional passwd or automount map entries will be required (although some sites may choose to make additional entries) - at this stage, we are only intending to modify existing entries so that any user with multiple accounts gets the same login directory at each site at which he or she currently has an account.

In most cases, the primary home directory will be obvious (it will be the only home directory, and so no action will be required), but each site will need to check its local users to establish this (and to liaise with other sites to make sure that any given user has the same home directory everywhere). This may involve comparison of passwd files to identify multiple accounts, or possibly this information can be extracted from the Divisional database.

Further investigation revealed that account information held in the database can be used for this - taking the UID as the key, it is possible to extract all entries where the UID exists in two or more of the .dai, .dcs and .cogsci domains. A first pass at this returns 1160 accounts (including pseudo-user accounts).
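The extraction described above can be sketched in a few lines of shell. This is only an illustration: the input format (one "domain:login:uid" record per line) and the function name multi_homed are assumptions made for the example, not the real database schema or an existing tool.

```shell
#!/bin/sh
# Hypothetical input format (an assumption, not the real database export):
#   <domain>:<login>:<uid>        e.g.  dai:fred:1234

# multi_homed: read account records on stdin and print, for every UID that
# appears in two or more distinct domains, the UID and the domains it is in.
multi_homed() {
    awk -F: '
        # Count each (domain, uid) pair once, even if it is listed twice.
        !seen[$1, $3]++ {
            n[$3]++
            doms[$3] = (n[$3] > 1) ? (doms[$3] "," $1) : $1
        }
        END {
            for (uid in n)
                if (n[uid] >= 2)
                    print uid, doms[uid]
        }
    ' | sort -n
}

# Usage (accounts.txt is a hypothetical export file):
#   multi_homed < accounts.txt
```

The output is one line per multi-homed UID, which could then be checked by hand for pseudo-user and admin accounts.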

Pseudo-users and admin accounts will not be included in the single login directory scheme (this may involve manual deletions from any automatically generated lists).

Nomination Of A Primary Home Directory
It is possible to extract user information from the database so as to identify users with multiple accounts. This information can then be used to contact each user and request a primary home directory nomination. This can be done in a variety of ways:

It was finally decided to opt for the last item, and lists have been published for multi-homed users at each site. These can be viewed as follows:

Multi-homed accounts at FH/SB
Multi-homed accounts at BP
Multi-homed accounts at KB

Users wishing to comment on, or request changes to, the nominated home directory should contact their local support person.

Incoming
It will be the responsibility of each site to modify local home directory automount maps so that any local users with remote primary directories mount the correct home directory locally. It will not be the responsibility of such sites (secondary login sites) to make sure that any home directories so mounted integrate into the local environment (logins at secondary sites are "unsupported"). [Note: the use of "secondary login site" applies per user - a server may be the primary login site for one user, and a secondary login site for someone else.]

This task is merely concerned with user access to this common home directory, and does not address the issues of day-to-day use, such as mail delivery & forwarding, user web space, environment, etc (these issues are being dealt with by other tasks).

It will also be necessary for each site to create local auto_legacy maps to make home directories available under /legacy (see "/legacy" below).

Outgoing
In order to set up single login directories and the /legacy hierarchy, arrangements will need to be made for each site to export all home directory partitions to all other sites within Informatics, subject to various constraints and caveats - see "Who gets home directories?" below.

/legacy

During the home directories unification period, users will need access to secondary home directories from legacy (and DICE) machines, as well as primary home directories, in order to have access to all their files.

          CS-based user               AI-based user               CG-based user
CS files  /home/<user>                /legacy/dcs/home/<user>     /legacy/dcs/home/<user>
AI files  /legacy/dai/home/<user>     /hame/<user> (legacy)       /legacy/dai/home/<user>
                                      /home/<user> (DICE)
CG files  /legacy/cogsci/home/<user>  /legacy/cogsci/home/<user>  /home/<user>
                                                                  /legacy/cogsci/projecs/ltg/users/<user>

The table above shows how home and legacy files would appear in the DICE & legacy worlds (the view would be identical but for legacy AI machines). The table also shows an example of the un-unified "home" directories for LTG users at BP, and the retention of /hame at AI.

Since it is quite likely that users at AI will have included /hame in scripts, etc, it was thought necessary to retain this form for primary login directories in the AI legacy world. However, this will not be the case under /legacy, as this will be a new hierarchy, and we do not want references to old naming conventions (such as /hame) in the DICE world.
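The naming rules above can be expressed as a small shell function. This is a sketch only: the function name files_path and its arguments are invented for illustration, "legacy" is taken to mean a legacy AI machine in the /hame case, and the extra LTG directory at BP is not handled.

```shell
#!/bin/sh
# files_path PRIMARY FILES USER [WORLD]
#   PRIMARY  domain hosting the user's primary home directory (dcs, dai, cogsci)
#   FILES    domain whose files are wanted
#   WORLD    "dice" (default) or "legacy"; only matters for AI-based users
# All names here are hypothetical; this only restates the /home and /legacy rules.
files_path() {
    primary=$1 files=$2 user=$3 world=${4:-dice}
    if [ "$files" != "$primary" ]; then
        # Secondary files always live in the new /legacy hierarchy.
        echo "/legacy/$files/home/$user"
    elif [ "$primary" = dai ] && [ "$world" = legacy ]; then
        # /hame is retained for primary login directories in the AI legacy world.
        echo "/hame/$user"
    else
        echo "/home/$user"
    fi
}

# Usage:
#   files_path dai dcs fred        # CS files of an AI-based user
```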

How to mount /legacy filesystems?
The support of /legacy will require the creation of automount maps for each legacy site (DICE maps are a separate task). If we are to make home directories accessible as, for example, /legacy/cogsci/home/<user>, then we have two options, namely:

The structure of /legacy will be decided elsewhere (it seems likely that, under DICE, the /legacy stuff will be implemented as mounts under /amd/legacy with links to /legacy/domain/home, etc).

This task is to co-ordinate the implementation of a mirror structure in the legacy world (or as close an implementation as is practically possible, just for the sake of consistency). If the structure is agreed, we don't really need to worry about how each site implements this. What we should aim for is common information from which to generate both DICE and legacy automount maps, to minimise possible synchronisation errors (which means we need methods to access this information, and tools to generate new maps from it).

Each site needs to create maps for legacy machines, such as:

(or whatever names are agreed upon). Similar maps should also exist in the DICE world. There will also need to be an update procedure for the commonly-held mounting information.
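As a rough illustration of generating maps from commonly-held information, the sketch below turns a shared table into indirect automount map entries for one domain. The table format, the server names in the usage note and the function name legacy_map_entries are all assumptions; the real map names and the source of the mounting information have still to be agreed.

```shell
#!/bin/sh
# Hypothetical common table (an assumption), one line per home directory:
#   <domain> <user> <server>:<exported path>
# Lines starting with "#" and blank lines are ignored.

# legacy_map_entries DOMAIN: read the table on stdin and print indirect
# automount map entries ("key location") for that domain, sorted by key.
legacy_map_entries() {
    awk -v dom="$1" '
        NF == 0 || $1 ~ /^#/ { next }    # skip blank lines and comments
        $1 == dom { print $2, $3 }       # key is the user, location is server:/path
    ' | sort
}

# Usage (mounts.txt is a hypothetical copy of the common table):
#   legacy_map_entries dai < mounts.txt
```

Running the same function once per domain, on both DICE and legacy machines, would keep the two sets of maps derived from a single source.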

Who gets home directories?

It is probably too simplistic to ask all sites to export all user home directories to all domains (.cogsci, .dai, & .dcs). Consequently we need to decide how we control NFS exporting and mounting. There are four issues here:

At what level are home directories exported?
There is - as yet - no consensus about this, although most sites would appear to export at the partition level. It seems likely that we will continue with this arrangement, expanding exports to other sites as necessary.
How are exports controlled?
It is not a good idea to export all home directories to all machines (self-managed machines should certainly be excluded, as their security status cannot be guaranteed). Therefore, lists of approved hosts or subnets will need to be created for each site.

It appears that CS can easily generate a list of local centrally-managed machines to export to ("They just fall naturally out of the LCFG files" via 'info.groups cs-linux'), and this list can be used by the other sites. Both CogSci and AI need to generate similar lists. Each site will need to define what local machines it considers "secure" (and create a method to generate a list of such machines). The resulting lists can then be used to export home directories to all machines. For example, at BP we could use the fact that all Suns have an inventory name and are connected to local subnets, and are all centrally managed:

    % niscat hosts | egrep "U[15]|B1" | awk '{print $3,$1}' | \
      egrep '^129\.215\.(110|144|165|174|197)\.'
    
Where are home directories exported to?
The current model is that all staff should be able to log on to all machines (except servers and self-managed machines). Student logins will be restricted (probably just to managed student machines), but student home directories may be available everywhere (so staff can check files, etc?).

If all home directories should be available on all (trusted) machines, then there needs to be only one group of machines to export to, the group of said trusted machines (however defined). If, however, different categories of user have to have their home directories exported to different groups of machines, other lists will have to be generated. For example, we might need lists for:

...which implies that staff and student directories should be on separate partitions (unless home directories are exported at the user, rather than filesystem, level)? Or do we allow selective mounting on the client side to restrict access?

How are home directories mounted?
In both the /home and /legacy cases (on legacy machines), directories can be mounted either directly in the desired location (/home/<user> or /legacy/<domain>/home/<user>), or indirectly via some "hidden" mount-point and symbolic links. No final decision has been taken about which is the preferred option.

Notes & Queries

For examples, see Neil & Co's list of potentially problematic situations (in the Appendix).

It is assumed that start-up files in any secondary home directories will effectively become redundant. Although logins will be preserved at "secondary" sites (users will get the new home directory), they will be informed that such accounts are "unsupported" (since start-up files may not be compatible with that site) - which is less of a workload than explicitly identifying and disabling all such accounts.

Other issues relating to home directories, which are not covered by this report, are:

New (DICE) users will not, by default, have an accessible legacy account - such an account would have to be specifically enabled (although the home directory and passwd file entry may already be present on a legacy system). How the legacy account is used (if at all) is a site-specific issue, and the site administrators may decide that it is better to leave the passwd file entry out altogether (since it is, effectively, only a cosmetic benefit - easier for COs to map UID to login name).

Whilst it will still be possible to create accounts for existing users on other legacy Suns (at other sites), it is not assumed that this will be a requirement of DICE integration or site-specific legacy support.

Summary

Little change should be needed to the provision of home directories on primary legacy machines, as this will essentially be the status quo (data at secondary sites may need to change, as already mentioned).

A primary home directory needs to be identified, and secondary home directories also made available, both achieved via NFS exporting & local automounting (although using different mount points). The physical integration will be achieved at a later stage (before NFS disappears).

File-sharing (NFS exporting) across all sites has to be established to enable access to a user's home directory from any location - both for primary and secondary home directories. This has firewall & security implications. It is hoped that this mechanism can be tested by setting up the /legacy structure alongside (and without affecting) existing access methods.

Timescales

It is hoped that any amalgamation of home directories can be done as a distinct sub-task, prior to their inclusion into the DICE world (the latter is not a part of the former, but follows on as a separate stage).

Phase 1 would be the collection and collation of primary & secondary home-directory information. This might take 1-2 weeks, allowing for (prompt) user (staff) feedback. This is already under way.

Phase 2 would be the creation of the /legacy structure and associated automount maps - this is a completely self-contained task. A test structure - with maps - is already in place at CogSci, but further work needs to be done to fully implement this.

Phase 3 would be to arrange for NFS exporting of home directory partitions to relevant netgroups (these netgroups need to be identified or constructed). About 1 week.

Phase 4 would be the creation of new auto_home maps. 1 week? (Some of this may be done in conjunction with the first two phases, but is more likely to be the responsibility of the Filesystem Maps group).

Dependencies

Amalgamation of (multiple) home directories on same-site legacy machines.

Provision of automount maps for DICE machines, including additional maps for secondary mounts. (Possibly another task?)

Update mechanism for NIS/NIS+ on legacy machines to enable NFS exporting to new DICE machines.

"Creating common home directories can be seen as a completely self-contained and separate task, which can be tackled and completed well before the main DICE tasks. We can and should try to keep it separate from DICE as much as we can."

Actions

Completed

2002-02-05
Re-statement of situation & re-structuring of requirements following "home directories via NFS" decision

Current

2002-02-19
Creation of skeleton (test) /legacy structure and associated maps on at least one site.
Check local users at each site to identify multi-sited accounts.

Outstanding


Appendix:
Neil & Co's potentially problematic situations

Example 1

(Neil)

User X, sitting at a legacy machine, wants to access files in user Y's file space. If Y is a new, Inf-only account (with no legacy login), then /home/Y/ (or ~Y/) isn't going to work on the legacy machine. User X will have to log in to an Inf machine.

If Y is an existing legacy user, say a member of staff who keeps practical files in their home directory and has published docs to students saying "Copy the prac 1 files from ~Y/prac1/part1.tex and modify ...", then either that member of staff will have to change their docs and place these files in some suitable place available to both legacy and Inf machines, or the student will have to know that, depending on whether they are using an Inf or a legacy machine, they may have to replace ~Y with something else, perhaps /home/<Y's new UUN>/.

Example 2

(Neil)

A new first-year AI undergrad, user s6666666, starts in October. He's having problems with a practical and sends a mail to his tutor (an AI PhD with access mainly to a legacy machine), saying the Prolog file in his home directory isn't working as expected and asking if the tutor could have a look at it.

I think the tutor (unless we've made it clear beforehand that it won't work) is going to expect to be able to do something like "less ~s6666666/prac1.pl" and have that work. They're both sitting in SB and they are both AI/Informatics users (as far as they're concerned), so I think the tutor will expect it to work without first having to log in to an Inf machine.

Example 3

(Jeremy)

As soon as we go to common home directories, we will break forwarding from users' secondary accounts. A worked example:

User "fred" has an account fred@dai with a ~/.forward to fred@cogsci. When his dai account starts mounting cogsci:/home/fred, ~/.forward will no longer be there (because his primary home directory doesn't have one). So at DAI he will have to ask for an alias to be put in /etc/aliases for "fred: fred@cogsci", otherwise mail to DAI will just get delivered 'locally' (be that /var/mail, ~/.mail, etc; whatever way - that doesn't want to happen).

To make the move to common home directories work, we will have to put in a system-level redirect for every secondary account mail address. This can be done fairly easily using the current value of the user's .forward. This will be transparent to the user (unless they are one of the Neil-like exceptions :-). It shouldn't break anything, as it is legacy-to-legacy.
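A minimal sketch of how such a redirect might be generated from a saved copy of a user's old .forward is given below. The function name alias_from_forward is invented for the example, and it assumes the .forward holds plain addresses; entries that pipe to a program, name a file, or are quoted or backslash-escaped are skipped so that they can be dealt with by hand.

```shell
#!/bin/sh
# alias_from_forward LOGIN FILE
# Print an /etc/aliases-style line ("login: address[, address...]") built
# from the plain addresses found in FILE, a copy of the user's old ~/.forward.
# Hypothetical helper: anything starting with |, /, " or \ is left out.
alias_from_forward() {
    login=$1
    targets=$(tr ',' '\n' < "$2" | awk '
        { gsub(/^[ \t]+|[ \t]+$/, "") }            # trim surrounding blanks
        $0 != "" && $0 !~ /^[|\/"\\]/ {
            out = (out == "") ? $0 : (out ", " $0)
        }
        END { print out }
    ')
    if [ -n "$targets" ]; then
        echo "$login: $targets"
    fi
}

# Usage (the path is a hypothetical saved copy of the old file):
#   alias_from_forward fred /tmp/fred.forward
```

For the "fred" example above, a .forward containing fred@cogsci would yield the alias line "fred: fred@cogsci".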



Unless explicitly stated otherwise, all material is copyright The University of Edinburgh