The Dice Project


Task: User web space
Group: Neil, Julieta
Stage: 1


Description

Planning how user web space will be provided in the informatics domain.

Report changes

2002-02-18
Slight reorganisation of initial report. The Issues have now become the Options.
2002-03-03
Added Summary section.
Added legacy username mapping issue.
Added issue of what server modules/functionality to include.
Added Decisions section.
Updated actions.
2002-03-15
Added note on how the user web pages may be exported and the automounter maps may work, in the Options, Physical location of files section.
Moved Actions and Timescales into the Stage 1 progress document.
Updated decision on URL.
2002-03-21
Updated decision on URL, inf rather than informatics.
Updated note on automounter map, in the Options, Physical location of files section.
2002-04-15
Added Technical Details section to cover implementation details. Its first contents cover possible disk space requirements.
2002-04-26
Added specification of hardware.

Summary

We need to provide the ability for users to publish their own web pages within the new DICE environment. Ideally this task would only consider the DICE world, but we will also consider legacy world issues.

This task needs to decide how user web pages are presented to the outside world, and how the technology behind the scenes achieves this. We have assumed that Apache will be used as the HTTP server, as the computing staff have significant expertise with it.

Issues

Users will need to be able to publish their personal HTML within the new DICE environment. There are various things to consider in implementing this:

Options/Assessment

Physical location of user files
Where will these pages be served from?
  1. The user's home directory, like ~<user>/public_html/ at AI and CogSci (the web server needs to mount the home directories of all users). Going this route means that only one disk quota is required per user, but it possibly exposes the user's entire filespace to the web.
  2. Some central partition, like /public/<user>/web/ at CS, or perhaps just /webpages/<user>/ (all clients need to be able to mount this). This also means an additional disk quota to manage per user. We could also ensure that the web server cannot mount user home directories, so that there is some protection against faulty scripts, links etc. exposing the entire user filespace.

    Note: If we go this route, then we need to consider how the automounter maps are generated. If all user web pages are held on a single partition on the web server, then this is fairly straightforward: /webpages/<user>/ would be found at userwebhost:/disk/home/webpages/<user>/. However, if at some point we won't be able, or won't want, to hold all user web pages on a single partition, then something more elaborate is needed. The partition would probably be stored in the person's LDAP entry, like their home directory, e.g.:

       dn: uid=neilb, ou=People, dc=inf,dc=ed,dc=ac,dc=uk
       gecos: Neil Brown
       ...
       homePartition: cn=u8, ou=Partitions, dc=inf,dc=ed,dc=ac,dc=uk
       webPartition: cn=web1, ou=Partitions, dc=inf,dc=ed,dc=ac,dc=uk
       
    where web1 would be defined in the Partitions unit as a particular exported filesystem on the user web pages host, i.e. in the same form as u8.

    Update 21/3/2002: Following the last COs meeting, it has been decided that users will get to their web space via the /public/<user>/web/ file system path, and to their CGI scripts via /public/<user>/cgi/. To keep things simple, it has also been decided to go with one large (possibly logical) partition, so that the automounter map can be generated simply. This also has performance benefits, as the automounter is not required to locate the user's web space on the actual web server: we will always know that the files are in a single (logical) partition.
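A minimal sketch of how the automounter map might be generated, assuming a hypothetical generator script rather than any existing DICE tool. It emits one indirect-map line per user (key, then server:path). The default host and path, userwebhost:/disk/home/webpages, come from the example in the note; the web2/userwebhost2 partition and the s0123456 entry are made up for illustration. With the single (logical) partition that has been decided on, every user gets the default location; the partition lookup only matters if the web space is ever split, as the note considers.

```python
"""Sketch: generate an indirect automounter map for user web space.

Hypothetical: nothing here is an existing DICE tool. Each person is given
as (uid, webPartition DN or None); None means the single default partition.
"""

# Default location, taken from the userwebhost example in the note.
DEFAULT_LOCATION = ("userwebhost", "/disk/home/webpages")

# Hypothetical Partitions table: partition cn -> (server, exported path).
PARTITIONS = {
    "web1": ("userwebhost", "/disk/home/webpages"),
    "web2": ("userwebhost2", "/disk/web2/webpages"),
}


def partition_cn(dn):
    """Return the cn value of a DN like 'cn=web1, ou=Partitions, ...'."""
    first_rdn = dn.split(",")[0].strip()
    attr, _, value = first_rdn.partition("=")
    if attr.strip().lower() != "cn":
        raise ValueError(f"expected a cn= RDN, got {first_rdn!r}")
    return value.strip()


def map_line(uid, web_partition_dn=None):
    """Return one map entry: '<uid><TAB><server>:<path>/<uid>'."""
    if web_partition_dn is None:
        server, path = DEFAULT_LOCATION
    else:
        server, path = PARTITIONS[partition_cn(web_partition_dn)]
    return f"{uid}\t{server}:{path}/{uid}"


def build_map(people):
    """Return the whole map, one line per person, sorted by uid."""
    ordered = sorted(people, key=lambda person: person[0])
    return "\n".join(map_line(uid, dn) for uid, dn in ordered)


if __name__ == "__main__":
    print(build_map([
        ("neilb", "cn=web1, ou=Partitions, dc=inf,dc=ed,dc=ac,dc=uk"),
        ("s0123456", None),
    ]))
```

Each map line mounts the user's whole area, which is assumed here to contain the web/ and cgi/ subdirectories seen as /public/<user>/web/ and /public/<user>/cgi/.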

There are other ways users could access the space in option 2 above: a CVS interface to their web pages, as there is for the Division web server www.informatics.ed.ac.uk, or perhaps even a web/FTP interface like those Yahoo!, Freeserve etc. provide. However, users are likely to see this as a step backwards, and timescales probably mean that mounting /public/<user>/web/ will be easier and more popular.

URL
We can either host user web pages on the existing www.informatics web server, or have an additional web server just for user web pages.
  1. If we host personal web pages on www.informatics.ed.ac.uk, then we have to make sure that users cannot affect the performance of this machine, as it is our official presence on the web. We don't want it to be compromised by user CGI scripts, or by users presenting personal pages as official pages. User CGI scripts and PHP scripting are not available on www.inf at present, and currently all web pages must validate as compliant HTML and are managed via CVS. These restrictions could be overcome if we did want to host user pages on www.informatics.
  2. We could follow the EUCS route and provide home pages on a URL like http://homepages.informatics.ed.ac.uk/~user/ (see this paper). This would allow external users to easily distinguish personal web pages from official Division web pages. We could then run this service on a separate machine and configure Apache as required. Should a user script or page manage to bring the server to its knees, www.informatics.ed.ac.uk would still be available for official pages. If necessary, we could redirect URLs of the form www.informatics.ed.ac.uk/~user/ to homepages.informatics.ed.ac.uk/~user/.
Legacy considerations
What should happen to legacy user web pages? It would seem unreasonable for them simply to cease to exist, with all requests for www.legacy.ed.ac.uk/~user/ automatically redirected to www.inf.ed.ac.uk/~user/. However, this would be the easiest option from our point of view, as we try to deprecate the use of www.legacy.ed.ac.uk/~user/.

We could try to make the legacy pages read-only, so that users can no longer maintain them, forcing them to move to the new service. But then how do users update the legacy pages to say "go look here now"? And would this even be technically feasible?

Teaching material in staff's legacy web space. Currently there is a substantial amount of teaching material held in staff personal pages. We don't want to perpetuate this, as all teaching material should be migrated to the official www.informatics web service.

Username mapping. If a legacy site decides to unify usernames, or to remove its legacy password file, then this will affect www.legacy.ed.ac.uk/~<user>. Currently only Computer Science is considering unifying usernames within its legacy domain. For example, every student other than this year's intake whose username is currently of the form 'nrb' has a personal web page at www.dcs.ed.ac.uk/~nrb/. If their usernames are unified to the form 's0123456' in the DCS legacy world, their web page will move to www.dcs.ed.ac.uk/~s0123456/. This may not be an issue for undergraduates, but may be for final year PhD students. Perhaps aliases or redirects could be put in place so that both old and new usernames work.
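Aliases or redirects for renamed users could be driven by a simple lookup table. This sketch assumes a hypothetical USERNAME_MAP from old DCS usernames to unified ones and rewrites only the path part of a legacy URL. In Apache itself the same lookup could be done with mod_rewrite's RewriteMap; the table and function here are illustrative only.

```python
# Hypothetical table of old DCS usernames -> unified usernames.
USERNAME_MAP = {"nrb": "s0123456"}


def legacy_redirect(path):
    """Return the new /~user/... path for a renamed user, or None."""
    if not path.startswith("/~"):
        return None
    # Split '/~nrb/cv.html' into user 'nrb' and the rest 'cv.html'.
    user, _, rest = path[2:].partition("/")
    new_user = USERNAME_MAP.get(user)
    if new_user is None:
        return None  # unknown or already-new username: serve as normal
    return f"/~{new_user}/{rest}"
```

Returning None for usernames that are already in the new form means the new URLs are served directly and cannot loop back into a redirect.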

If a legacy site decides to remove all user accounts from the legacy password file, then continuing to provide www.legacy.ed.ac.uk/~user/ will depend on how a site implements the Apache UserDir directive.

To encourage people to migrate their pages to the new service, perhaps when people browse to a legacy URL, some form of nuisance-ware could pop up a message saying that the page should be moved, encouraging the reader to mail the owner asking them to move it. The practicalities of implementing this, and the political ramifications, probably mean this is a non-starter.

Apache Functionality
Currently the three legacy sites provide different features to users, for example whether users can run their own CGI scripts, PHP, SSIs or HTTPS. We need to decide which features we will provide. If we don't provide PHP, for example, then users who rely on it on a legacy service may be reluctant to move to the new service. Obviously security will be an issue here.
Do we need to?
For the sake of completeness: do we need, or want, to provide personal web space when services like Yahoo! and Geocities already provide it for free? From a student's point of view this could be beneficial, as their web pages would continue on these free services after they leave the Division, rather than being removed.

Decisions

Location of files
Hosting user web pages on the actual web server will give us performance benefits, some resilience to network failures and perhaps most importantly, some extra security (especially if the server isn't allowed to mount user home directories, though this may cause computing staff problems when maintaining the server). This user web space must be exported and mountable by all DICE client machines. How this appears to the user is still up for decision, but something like CS's /public/<user>/web/ would suffice.
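On the security point, the core check is whether a requested file, once symbolic links are resolved, still lies inside the user's web space. A minimal sketch, assuming the root of the user's web space is known; in practice Apache's own Options settings (for example SymLinksIfOwnerMatch) would do the enforcement, and this only illustrates the idea.

```python
import os


def inside(root, path):
    """True if path, with all symlinks resolved, lies within root."""
    root_real = os.path.realpath(root)
    target_real = os.path.realpath(path)
    # commonpath compares whole path components, so /x/web2 is not
    # mistaken for being inside /x/web, as a string prefix test would be.
    return os.path.commonpath([root_real, target_real]) == root_real
```

A link such as web/escape pointing at a file in the user's home directory resolves outside the web root and is rejected, which is exactly the exposure the separate partition is meant to limit.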

We need to consider the implications for how the automounter map will work. At CS all user web pages are on a single partition on the web server, which makes map generation simple. If this were to change to multiple partitions, then something more flexible would be needed (see the note above).

Providing a CVS, FTP or web interface to this area would take more effort to implement and maintain, and would probably be seen as a step backwards by users. Although there is extra management effort in maintaining this web space as well as each user's home directory, it does allow us to see at once all the users that have web pages.

URL
Again, for security, and so as not to compromise the performance of the existing www.informatics.ed.ac.uk, a separate web server for user web pages is desirable. After some discussion on the cos@inf mailing list, it was decided to follow EUCS's lead regarding the URL. This means the official URL that users should publicise is http://homepages.inf.ed.ac.uk/user/. However, several variations of this will also be accepted, namely:

These will be implemented via redirects, so that the browser will display the official URL even if it was initially directed to one of the alternatives.

Again, similarly to what EUCS provide, URLs of the form www.informatics.ed.ac.uk/~user/ will be redirected to the official homepages.inf.ed.ac.uk/user/. We wouldn't want to encourage this form of URL, but staff may prefer to use this "more professional" looking form.
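The redirect rule amounts to mapping a /~user/... path to the official /user/... path on homepages.inf.ed.ac.uk. A minimal sketch of that mapping, assuming only www.informatics.ed.ac.uk is confirmed as a source host; www.inf.ed.ac.uk and the tilde form on the official host are assumed examples of the accepted variations. In Apache this would normally be a mod_alias RedirectMatch (or mod_rewrite) rule; the Python just makes the mapping explicit and testable.

```python
import re
from urllib.parse import urlsplit

OFFICIAL_HOST = "homepages.inf.ed.ac.uk"

# Hosts whose /~user/ URLs are redirected. Only www.informatics.ed.ac.uk
# is named in the decision; the other two are assumptions.
REDIRECT_HOSTS = {
    "www.informatics.ed.ac.uk",
    "www.inf.ed.ac.uk",
    "homepages.inf.ed.ac.uk",
}

# '/~user' optionally followed by '/anything'.
TILDE_PATH = re.compile(r"^/~([^/]+)(/.*)?$")


def official_url(url):
    """Return the official homepage URL for url, or None if no redirect applies."""
    parts = urlsplit(url)
    if parts.hostname not in REDIRECT_HOSTS:
        return None
    match = TILDE_PATH.match(parts.path)
    if match is None:
        return None  # already official, or not a personal page
    user, rest = match.group(1), match.group(2) or "/"
    target = f"http://{OFFICIAL_HOST}/{user}{rest}"
    if parts.query:
        target += "?" + parts.query
    return target
```

Because official URLs contain no tilde, they never match the pattern, so a redirect cannot send the browser round in a loop.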

Legacy
Legacy user web servers will probably continue as they currently do, though users will be encouraged to migrate to the new service. How this "encouragement" will be achieved is yet to be decided. After a certain length of time, a drastic solution would be to say that all www.legacy.ed.ac.uk/~user/ pages will be automatically redirected to homepages.inf.ed.ac.uk/~user.
Apache
We need to provide SSIs and user CGI scripts as a minimum. By locating the user web pages and scripts on the web server, we can control whether all, or some users, are allowed to run CGI scripts.
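One way to control per user whether CGI scripts run is to generate the Apache configuration from an allow list. A minimal sketch, assuming the single partition lives at the hypothetical path /disk/home/webpages with each user's scripts in a cgi/ subdirectory; Options +ExecCGI and SetHandler cgi-script are standard Apache directives, but the generator itself is illustrative only.

```python
# Hypothetical location of the single web partition on the server.
WEB_ROOT = "/disk/home/webpages"


def cgi_stanza(user, web_root=WEB_ROOT):
    """Return an Apache <Directory> block enabling CGI in one user's cgi/ dir."""
    return (
        f'<Directory "{web_root}/{user}/cgi">\n'
        "    Options +ExecCGI\n"
        "    SetHandler cgi-script\n"
        "</Directory>\n"
    )


def cgi_config(allowed_users):
    """Concatenate stanzas for the users allowed to run CGI scripts."""
    return "".join(cgi_stanza(user) for user in sorted(set(allowed_users)))
```

Users absent from the allow list get no stanza, so their cgi/ directories fall back to whatever the server-wide default is, which would presumably have ExecCGI turned off.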

Extra functionality such as PHP, mod_perl and HTTPS is still up for discussion. PHP (version to be decided) and HTTPS are likely to be provided.

Actions and Timescales

The Actions and Timescales sections have been renamed and moved into the Stage 1 progress document.

Dependencies


Technical Details

The report ends here. What follows are various technical bits and pieces that need to be noted somewhere, so we don't forget!

Disk Space

We need to estimate how much disk space will be required to host all users' personal web pages. There has not yet been a decision on the actual web quota that each class of user will get, and the following numbers of users in each class are rough estimates, but they will get the ball rolling:
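Group of users | Number in group | Possible web quota A (MB) | Possible web quota B (MB) | Possible web quota C (MB)
UG1 | 300 | 2 | 1 | 5
UG2 | 200 | 2 | 1 | 10
UG3 | 170 | 10 | 5 | 20
UG4 | 120 | 10 | 5 | 40
MSc | 80 | 20 | 10 | 40
PhD | 180 | 20 | 10 | 50
Staff | 260 | 50 | 50 | 100
Total (MB) | | 22,100 | 17,550 | 49,900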

Assuming that these figures are in the right ballpark, a 50GB disk would be large enough, but something nearer 100GB would allow us to be more flexible.
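The totals in the table are simply the sum of users times quota for each of the three quota columns. This short calculation reproduces them from the table's own figures, assuming decimal units (1GB = 1000MB) when comparing against the 50GB disk.

```python
# Rough user numbers and the three possible web quotas (MB) per group,
# in the same order as the quota columns of the disk space table.
GROUPS = {
    "UG1": (300, (2, 1, 5)),
    "UG2": (200, (2, 1, 10)),
    "UG3": (170, (10, 5, 20)),
    "UG4": (120, (10, 5, 40)),
    "MSc": (80, (20, 10, 40)),
    "PhD": (180, (20, 10, 50)),
    "Staff": (260, (50, 50, 100)),
}


def totals_mb(groups):
    """Total MB needed for each quota option: sum of users x quota."""
    n_options = len(next(iter(groups.values()))[1])
    return [
        sum(users * quotas[i] for users, quotas in groups.values())
        for i in range(n_options)
    ]


if __name__ == "__main__":
    for option, total in enumerate(totals_mb(GROUPS), start=1):
        # Decimal units assumed: 1GB = 1000MB.
        print(f"option {option}: {total:,} MB = {total / 1000:.1f} GB")
```

The largest option, 49,900MB, only just fits a 50GB disk, which is why the extra headroom of a 100GB disk is attractive. Staff account for 26,000MB of that largest total.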

A quick trawl of /home/<user>/public_html on the AI and CogSci machines, and a look at /public/ at CS, finds the following space being used:

This gives a rough total of approximately 12GB of user web files currently in the Division.
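A trawl like this could be repeated with a small script. A minimal sketch, assuming the /home/*/public_html layout mentioned above (the pattern is a parameter, so the CS /public/ area could be scanned the same way). It sums apparent file sizes and does not follow symbolic links, so its figures will differ somewhat from du, which counts allocated disk blocks.

```python
import glob
import os
import stat


def tree_bytes(root):
    """Sum the apparent sizes of regular files under root, not following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable: skip it
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def web_usage(pattern="/home/*/public_html"):
    """Return {directory: bytes} for every directory matching pattern."""
    return {d: tree_bytes(d) for d in glob.glob(pattern) if os.path.isdir(d)}


if __name__ == "__main__":
    usage = web_usage()
    print(f"{len(usage)} directories, {sum(usage.values()) / 1e9:.1f} GB")
```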

Hardware Specification

A reasonable hardware specification for the user web server would be a minimum of:

Some details are still required, such as IDE or SCSI disks, and a 100Mbit or 1Gbit network interface; hopefully someone who knows more about this sort of thing will be better placed to complete the spec. A quick look at the CS and Informatics web server network traffic suggests that a 100Mbit network interface would be sufficient, as www.inf averages about 2Mbps, ignoring backup traffic.
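The arithmetic behind the 100Mbit judgement is short. This sketch uses the 2Mbps www.inf average from the text and assumes, without measurement, that the user web server would see traffic of a similar order; an average also hides peaks, so the real headroom is smaller than the ratio suggests.

```python
LINK_BPS = 100_000_000    # 100Mbit network interface
AVERAGE_BPS = 2_000_000   # ~2Mbps average for www.inf, from the text

# Fraction of the link used on average.
utilisation = AVERAGE_BPS / LINK_BPS

# Bytes transferred per day at that average rate (86,400 seconds, 8 bits/byte).
daily_bytes = AVERAGE_BPS * 86_400 // 8

if __name__ == "__main__":
    print(f"average utilisation: {utilisation:.0%}")
    print(f"average daily volume: {daily_bytes / 1e9:.1f} GB")
```

At about 2% average utilisation, a 100Mbit interface leaves a factor of around fifty for bursts before a 1Gbit interface would be worth considering.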



Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh