Task: Network
Group: gdmr,cms,jmho
Stage: 1
2002/03/20 16:34:05
Meantime it's important not to assume that machines are necessarily installed in their final network location!
KB is reasonably OK. There are enough free switch ports for up to 40 new 100baseTX connections, though there is little room for expanding the number of Gbit-attached machines. This may need to be addressed in the medium term.
FH might need another card in the server-room switch, but apart from that should be OK on the 100baseTX front. Again, Gbit-connected machines may need to be considered later.
SB hasn't quite had all its switches fully deployed yet. We'll need to keep an eye on the server-area 100baseTX count. Again, Gbit-connected machines may need to be considered later.
BP needs some work to bring things fully up to spec. Once that's done there should be sufficient network space for full DICE deployment. There is some concern about the amount of physical space available in the machine room.
The KB EdLAN link has been upgraded to 1000baseSX. We're waiting to hear back from EUCS on the best way to upgrade FH, SB, BP -- they would have to be 1000baseLX over SM, but it's not clear yet if there are fibres in place and where they should be terminated.
Moving to OSPF would make some things possible which aren't with RIP, and some other things easier. In particular, RIP(v1) assumes identical subnet masks; and feeding the same subnets into EdLAN using RIP from several places interacts badly with its internal routing. On the other hand, OSPF does make some things, notably route filtering, more complicated. Adopting OSPF would have to be coordinated with EUCS.
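The subnet-mask limitation arises because RIPv1 updates carry no mask, so a receiving router has to guess one. The following is a minimal sketch of that guess, not router code: the function names and the private example addresses are ours, and it deliberately ignores RIPv1's special handling of host routes.

```python
import ipaddress


def classful_prefix_length(address: str) -> int:
    """Prefix length implied by the old class A/B/C rules."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return 8    # class A
    if first_octet < 192:
        return 16   # class B
    if first_octet < 224:
        return 24   # class C
    raise ValueError("class D/E addresses do not name unicast networks")


def ripv1_assumed_network(route_address: str, interface: str) -> ipaddress.IPv4Network:
    """Network a RIPv1 receiver would assume for a mask-less advertisement.

    Inside the receiving interface's own major network the interface's mask
    is reused; anywhere else the classful mask is applied.
    """
    iface = ipaddress.IPv4Interface(interface)
    major = ipaddress.IPv4Network(
        (str(iface.ip), classful_prefix_length(str(iface.ip))), strict=False
    )
    if ipaddress.IPv4Address(route_address) in major:
        prefix = iface.network.prefixlen
    else:
        prefix = classful_prefix_length(route_address)
    return ipaddress.IPv4Network((route_address, prefix), strict=False)


# Hypothetical case: the advertised subnet is really 172.16.64.0/22, but the
# receiver's interface uses a /24 mask, so only a quarter of it is assumed.
assumed = ripv1_assumed_network("172.16.64.0", "172.16.1.1/24")
print(assumed)
```

With mixed mask lengths inside one major network the assumed network and the real one disagree, which is the situation OSPF (or RIPv2) avoids by carrying the mask in each advertisement.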
Emergency "JCB-proofing" links to a site could be wireless, MegaStream, ISDN, modem or whatever happened to be available at the time. The University phone system doesn't offer ISDN. BT's ISDN2e lines cost roughly 300 pounds each to install, plus 100 pounds quarterly rental; other suppliers might be around too.
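As a rough worked example of the ISDN2e figures quoted above, the sketch below totals installation and rental only. Call charges are not covered by the text and are left out, and the line and year counts are purely illustrative.

```python
INSTALL_CHARGE = 300      # pounds per line, approximate figure from the text
QUARTERLY_RENTAL = 100    # pounds per line per quarter, approximate


def isdn2e_cost(lines: int, years: int) -> int:
    """Installation plus rental for a number of lines, excluding call charges."""
    rental_per_line = QUARTERLY_RENTAL * 4 * years
    return lines * (INSTALL_CHARGE + rental_per_line)


# One line for its first year, and (hypothetically) one line at each of three sites.
print(isdn2e_cost(lines=1, years=1))
print(isdn2e_cost(lines=3, years=1))
```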
For KB it is also intended to redeploy the former EdLAN link components now that the main link has been upgraded to Gbit, as the existing CS-EE 10base2+fibre IP-level backup connection will cease to exist on March 25th. The redeployment will result in a 100Mb ether-level connection under the control of Spanning Tree Protocol. This will provide some redundancy against switch faults at KB without the firewalling complications of the current setup.
There are several big problems with running lots of subnets between the sites:
Keeping distinct subnets at each site solves those problems, but pushes all the inter-site traffic through the "front-door" filtering routers. Those would therefore need to be fast. There would also have to be at least a second router at each site for redundancy, though it could probably be smaller. Options: iptables on Linux; ipfilter on big Suns; Cisco or equivalent, as EE are expecting to use for the new DEE network. We really don't like having dependencies on expensive one-off boxes, which argues against pure switch/router solutions. IPfilter is mature and proven. Suns are more expensive than PCs, but Ultra-5s or Blade-100s with GigaSwift cards might suffice. IPtables is still very definitely under development. On the other hand, the routing architecture for this design would be straightforward, and could use RIP (or perhaps RIPv2) throughout. There would be some load-sharing, though appropriate kernel support in the hosts being routed would be required in order to achieve maximal effect.
There's a diagram of what this would look like below. The semi-circles represent (one or more parallel) routers. Note that Appleton Tower would continue to run as a satellite site -- for simplicity the existing Sun routers at AT would be relocated or retired, and there would then be no inter-VLAN routing done there at all (hence there's no router shown there in the diagram). However, it makes a lot of sense to run the lab as a satellite of SB rather than of KB, particularly if SB's upgraded Gbit connection were to be terminated at AT rather than OC, so that's what's shown.
One thing to be aware of with this approach (though one we do have at present with the "wireless" wire) is that any subnet which were to be shared across sites for whatever reason could only be routed at one site, and so all machines on that subnet would appear to be part of that routing site. For low-traffic or "unsupported" subnets that might not be a problem.
Bear in mind that this is the logical diagram. Physically the connections to the transit subnet run as VLANs over the same pieces of fibre as sites' external connections, so this arrangement doesn't give any additional redundancy against external connectivity problems. That would have to come through completely separate connections.
Routing and filtering are both more complicated under this scheme but there are several compensatory benefits:
Note that subnets shared across sites still present some problems, though of a different nature. By using OSPF we would be able to advertise routes through more than one of our external routers. Against that, the route that any packet took to a host on a shared subnet would not necessarily be optimal, so we would not want there to be much external traffic to such hosts (the transit routers don't count here, as there should be very little external traffic sent directly to them, if any). Also, partitioning of the VLAN at EdLAN would result in some hosts on the subnet being unable to communicate with the others.
This option has its attractions, but we would need to liaise with EUCS to make it work. Initial discussions have been promising, and several implementation strategies appear possible. Which one is adopted would depend mainly on how they interacted with the EdLAN routers, as they all involve roughly the same amount of work at our end. Specifically, do we run in one OSPF area spanning all the sites or several; and do we share a common OSPF incarnation in the EdLAN routers with the rest of EdLAN, or do they run a separate incarnation for us? There may also be implications for aggregation and stubbiness, and for route filtering, but these shouldn't cause major problems either way.
We are aiming towards this scheme, though with the option of adopting the second if either the costs turn out such that its simplicity wins or the necessary routing framework cannot be set up jointly with EUCS.
Clearly the transit subnet model proposed above implies that the filter rulesets should be unified, given that packets for any site can, in principle, arrive through any other. Complete unification would not be necessary if the distinct subnets model were adopted, as there would be no point in a site's filter rules admitting all traffic destined for the other sites if those other sites' machines would not normally be accessible through the site (though JCB-proof paths might require that at least some other-site rules be incorporated). Of course, all sites' filter rules would accept all traffic originating from the other sites.
At present KB (and AT) filters are generated by combining rule files and executable scripts, each of which is designed to incorporate the rules necessary for a particular filtering task; while other sites are using more conventional static configuration files. Neither of these mechanisms would be suitable as-is: KB's generated rulesets are quite site-specific, while static files appear to be too inflexible. This area will require more investigation, though one possible approach would be to extend the KB mechanism to produce meta-ruleset scripts which would be suitable for execution at each site to generate the eventual site-specific rulesets.
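One way to picture the meta-ruleset idea is a shared set of rule templates expanded with per-site values. The sketch below is only an illustration under our own assumptions: the rule syntax is invented (it is not IPfilter or iptables syntax), the site networks are documentation addresses rather than real allocations, and the existing KB mechanism is not shown.

```python
from string import Template

# Hypothetical rule templates shared by every site (invented syntax).
COMMON_RULES = [
    "permit tcp from any to $site_net port 22",
    "permit udp from $peer_nets to $site_net port 123",
    "deny ip from any to $site_net",
]

# Hypothetical per-site networks, using a documentation address range.
SITES = {
    "KB": "192.0.2.0/26",
    "FH": "192.0.2.64/26",
    "SB": "192.0.2.128/26",
    "BP": "192.0.2.192/26",
}


def generate_site_ruleset(site, sites=SITES, rules=COMMON_RULES):
    """Expand the shared templates into the ruleset for one site."""
    peers = ",".join(net for name, net in sorted(sites.items()) if name != site)
    values = {"site_net": sites[site], "peer_nets": peers}
    return [Template(rule).substitute(values) for rule in rules]


for line in generate_site_ruleset("KB"):
    print(line)
```

The templates could be maintained in one place and expanded wherever the filter actually runs, which is the property the meta-ruleset approach is after.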
A complicating factor is that the best choice for perimeter filter software appears to be IPfilter, which does not run on current DICE platforms. Perhaps generating the meta-ruleset scripts on DICE machines for subsequent execution on the filter machines would suffice. This requires more thought.
Whatever network model and filtering software is used, the potential for asymmetric routing paths makes it desirable that the rulesets used be state-free in general. This is unfortunate, as stateful rulesets can be somewhat easier to write, but the alternative risks connections being broken as the underlying routing shifts. However, some statefulness would certainly be beneficial for transitory connections such as DNS queries and xdm's chooser mechanism. (The alpha-test versions of IPfilter 4.0 contain state-synchronisation code, so eventually all the perimeter filters could perhaps share state. This is still some way off, however!)
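The asymmetric-routing problem can be shown with a toy model: two filters that do not share state, with the outbound packet leaving through one and the reply arriving through the other. Everything here (class names, addresses, ports) is made up for illustration and does not model any particular filter package.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    ack: bool  # set on every TCP segment except the opening SYN


@dataclass
class StatefulFilter:
    """Admits inbound packets only for connections it saw leave."""
    connections: set = field(default_factory=set)

    def outbound(self, p: Packet) -> None:
        self.connections.add((p.src, p.sport, p.dst, p.dport))

    def allows_inbound(self, p: Packet) -> bool:
        return (p.dst, p.dport, p.src, p.sport) in self.connections


def stateless_allows_inbound(p: Packet) -> bool:
    """State-free 'established' rule: admit anything with ACK set."""
    return p.ack


def demo():
    router_a, router_b = StatefulFilter(), StatefulFilter()
    syn = Packet("10.0.0.5", "192.0.2.80", 40000, 80, ack=False)
    reply = Packet("192.0.2.80", "10.0.0.5", 80, 40000, ack=True)
    router_a.outbound(syn)  # the connection leaves through router A
    return (
        router_a.allows_inbound(reply),   # reply returns the same way
        router_b.allows_inbound(reply),   # reply returns via router B
        stateless_allows_inbound(reply),  # state-free rule, either router
    )


print(demo())
```

The reply is refused only in the case where it returns through the router that never saw the connection start, which is why unsynchronised state and shifting routes do not mix.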
The initial proposal would be to create a unified ruleset based on the union of the existing ones, with site-specific hooks if the distinct subnets model is adopted. This would then be reviewed and adjusted before deployment, and could of course be altered in the light of experience.
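A minimal sketch of the "union plus site-specific hooks" idea follows. It treats rules as opaque, order-independent labels, which real filter rules are not, and the rule names and per-site lists are hypothetical rather than taken from the existing configurations.

```python
def unified_ruleset(site_rules):
    """Ordered, de-duplicated union of every site's rules."""
    seen, unified = set(), []
    for rules in site_rules.values():
        for rule in rules:
            if rule not in seen:
                seen.add(rule)
                unified.append(rule)
    return unified


def site_hooks(site_rules):
    """Rules not common to all sites, kept per site as hooks."""
    common = set.intersection(*(set(rules) for rules in site_rules.values()))
    return {
        site: [rule for rule in rules if rule not in common]
        for site, rules in site_rules.items()
    }


# Hypothetical existing rulesets, one list per site.
EXISTING = {
    "KB": ["allow-ssh-in", "allow-smtp-to-mailhub", "allow-xdm-chooser"],
    "FH": ["allow-ssh-in", "allow-smtp-to-mailhub"],
    "SB": ["allow-ssh-in", "allow-legacy-print"],
}

print(unified_ruleset(EXISTING))
print(site_hooks(EXISTING))
```

Under the transit subnet model only the union would be needed; the hooks become relevant if the distinct subnets model is adopted.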
On the assumption that legacy machines would be brought inside the DICE perimeter, the unified ruleset would also have to take account of legacy machine requirements.
In addition to establishing a sound perimeter, it would certainly be desirable for some internal machines to apply their own additional network filtering. Some additional internal firewalling of groups of machines might also be required; how this would be implemented remains to be considered.
Note also that there are other network-based access control mechanisms in place. Two which will certainly have to be reviewed are: the NFS share permissions, as it is the intention that all filesystems should be mountable everywhere; and TCP-wrapper rules, so as to ensure that consistent rules are applied across all sites.
It is proposed in the first instance that the existing "makeDNS" program, as used for .dcs and fairly extensively around the rest of the University, be used to generate the DNS zone files. This utility transforms the well-understood /etc/hosts format into files suitable for feeding to a DNS master. It does have some limitations, and is certainly in need of an overhaul, but it should serve well enough for the first phase of the project. It is assumed that some form of remote file editing mechanism will be available to simplify the process, at least from its user's point of view. Note however that the suggested use of a hosts-format source file from which the DNS zones are generated does not imply any commitment to making the information available in such a format outside the DNS system itself.
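To illustrate the kind of transformation involved, here is a minimal sketch that turns hosts-format lines into forward and reverse records. It is not makeDNS and makes no claim about how makeDNS behaves: the function name, the choice of CNAMEs for aliases, and the example.org domain are all assumptions of ours.

```python
import ipaddress


def hosts_to_records(hosts_text: str, domain: str):
    """Build simple A/CNAME and PTR record lines from /etc/hosts-style text."""
    forward, reverse = [], []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, canonical, *aliases = line.split()
        fqdn = f"{canonical}.{domain}."
        forward.append(f"{canonical} IN A {address}")
        forward.extend(f"{alias} IN CNAME {canonical}" for alias in aliases)
        pointer = ipaddress.ip_address(address).reverse_pointer
        reverse.append(f"{pointer}. IN PTR {fqdn}")
    return forward, reverse


SAMPLE = """\
# hypothetical hosts-format source
192.0.2.10 alpha www   # web server
192.0.2.11 beta
"""

forward, reverse = hosts_to_records(SAMPLE, "example.org")
print("\n".join(forward + reverse))
```

A real zone file also needs SOA and NS records and serial-number handling, none of which is attempted here.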
The .inf space is currently managed jointly with .dcs; this should be split apart as soon as practicable.
None of the Informatics zones is signed at present. Doing so is not a stage-1 task, but should be considered again early in stage 2.
The existing dns object has been upgraded for bind9 and ported into DICE using "minimal conversion". Full ngeneric conversion has yet to be done, one of the problems being maintaining backward compatibility with the KB legacy Suns which also use the same code to configure their DNS.
DNS service is a cheap operation for modern machines, and there is really little reason not to run slave nameservers on all DICE systems. In any case, full end-to-end DNSsec implies that response signatures be checked as close to the calling application as possible. Dropping Hesiod will reduce the size of the zones carried considerably; and some optimisation in terms of the reverse zones would be possible though perhaps hardly worthwhile.
There are also advantages to establishing central servers at each site to carry all the DICE and legacy zones, and to act as caches through which most other machines would be configured to forward external queries. This might be done on the sites' network infrastructure machines.
At least two legacy names, dns.dcs.ed.ac.uk and dns2.dcs.ed.ac.uk, are widely known and will require to be perpetuated more-or-less indefinitely. MX and other backward-compatibility records will also be required for .dcs, .dai and .cogsci, and probably other legacy domains too. These domains could, of course, be served from the DICE nameservers; there's no particular reason to set up anything separate. However, the widely-known addresses for these machines are on external subnets, and at least in the first instance only Suns are likely to have sufficient protection available to them.
In the longer term there is the question of whether addresses are a property of the network, being given by it to machines, or whether they do actually belong to the machines and so come from lcfg. This question is likely to result in considerable debate, and isn't addressed further here! One argument in favour of the former viewpoint is that layering in networks is a good thing from the understandability and maintainability point of view.
Perhaps this is a stage-2 task, but if so it's one that's worth investigating early on.
Local network monitoring will also be performed, as is currently done, with common index pages pointing out to generated site-specific pages.
KB switches are currently managed using my package.
SB and FH switches are managed using my package, but need to be split apart. Some of the link names could do with being a little more descriptive.
BP isn't managed using any package yet; but that'll come when the network there is overhauled.
KB, FH and SB are all currently monitored from KB, but will move to being site-local once the switch configurations are moved to DICE.
The existing time synchronisation network has three stratum-2 servers at KB, with all the other machines using them as time sources. (Serving NTP is a lightweight operation, as the daemons ramp the interval between queries up to several tens of minutes once everything has stabilised, so this doesn't cause any load problem.)
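The load claim can be checked with some back-of-envelope arithmetic. The sketch below assumes the stock ntpd polling range of 2^6 to 2^10 seconds (a site may configure a longer maximum) and a purely hypothetical count of 1000 clients, each polling every server.

```python
def poll_intervals(min_exponent: int = 6, max_exponent: int = 10):
    """Power-of-two poll intervals, in seconds, that a client ramps through."""
    return [2 ** e for e in range(min_exponent, max_exponent + 1)]


def queries_per_server_per_second(clients: int, poll_interval_s: int) -> float:
    """Steady-state query rate at one server when every client polls it."""
    return clients / poll_interval_s


intervals = poll_intervals()
print(intervals)
print(queries_per_server_per_second(1000, intervals[0]))   # just after startup
print(queries_per_server_per_second(1000, intervals[-1]))  # once stabilised
```

Even at the shortest interval the per-server rate in this example is only a handful of small UDP exchanges per second, and it falls to about one per second once the clients have settled.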
For the initial DICE deployment it is proposed to keep this existing setup. Once the dust has settled, the intention is to (logically) disperse the S2 servers across the Division sites for robustness, possibly also adding a fourth.
Two other aspects of our NTP net are adequate for now but should be revisited later:
The "wireless" network is somewhat anomalous, as it currently exists across all four sites but is routed only at KB. As the use of a VPN endpoint is to be considered in stage 2, which will cover the wireless network as well as external access, the existing setup will be retained for now.
The fall-back position, should the transit network not be possible for some reason, is to put all the traffic through the main "external" routers. If necessary, faster hardware might have to be thrown in, though this could be decided later in the light of experience.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh