Diffstat (limited to 'docs/infrastructure.txt')
-rw-r--r-- | docs/infrastructure.txt | 110 |
1 files changed, 110 insertions, 0 deletions
diff --git a/docs/infrastructure.txt b/docs/infrastructure.txt
new file mode 100644
index 0000000..dc6a57e
--- /dev/null
+++ b/docs/infrastructure.txt
@@ -0,0 +1,110 @@
+Networks
+========
+    192.168.11.0/24   (18-port IB switch): Legacy network, non-production systems including storage
+    192.168.12.0/24   (12-port IB switch): KATRIN Storage network
+    192.168.13.0/24   (12-port IB switch): HPC Cloud & Computing network
+    192.168.26.0/24   (Ethernet): Infrastructure network (OpenShift nodes and everything else)
+    192.168.16.0/22   External IPs for testing and production
+    192.168.111.0/24  (OpenVPN): Gateway to Katrin network using Master1 tunnel
+    192.168.112.0/24  (OpenVPN): Gateway to Katrin network using Master2 tunnel
+
+    192.168.212.0/24
+    192.168.213.0/24
+    192.168.226.0/24  (Ethernet): Staging network (Virtual OpenShift and other nodes)
+    192.168.216.0/22  External IPs for staging
+    192.168.221.0/24  (OpenVPN): Gateway to Katrin network using staging Master1 tunnel
+    192.168.222.0/24  (OpenVPN): Gateway to Katrin network using staging Master2 tunnel
+
+KIT resources
+=============
+  - ipekatrin*.ipe.kit.edu                   Cluster nodes
+  - ipekatrin[1:2].ipe.kit.edu               Master nodes with fixed IPs (either one may be down)
+    katrin[1:2].ipe.kit.edu                  Virtual IPs assigned to master nodes (HA)
+    kaas.kit.edu (katrin.ipe.kit.edu)        DNS-based load balancer between katrin[1:2].ipe.kit.edu
+    *.kaas.kit.edu (*.katrin.ipe.kit.edu)    Default application domain?
+  - katrin.kit.edu                           Apache/mod_proxy pod (in DNS, add a CNAME to katrin.ipe.kit.edu)
+
+    openshift.ipe.kit.edu                    Gateway (VIP) to staging cluster (a single IP migrating between 2 nodes)
+  - *.openshift.ipe.kit.edu                  Default application domain for staging cluster
+
+Storage
+=======
+    LVM VGs
+        VolGroup00
+            -> LogVol*: System partitions
+            -> docker-pool: Docker storage
+        Katrin
+            -> Heketi PD (we reserve space, but do not configure Heketi so far)
+            -> vg_*
+            -> Heketi-managed Gluster Volumes
+            -> Katrin (mounted at '/mnt/ands')
+            -> Space for manually-managed Gluster Bricks
+            -> Storage for Galera / Cassandra / etc.?
+
+    Gluster Volume Types:
+        tmp:    distribute?       Various data that should be preserved, but are not critical if lost or temporarily inaccessible (logs, etc.) [ check whether we can still write if one brick is gone ]
+        cfg:    replica=3         Small and critical data sets (configs, sources, etc.)
+        cache:  replica+arbiter   Large re-generatable data which should nevertheless always be available [ potentially we can use disperse to save space ]
+        data:   replica+arbiter   Very large and critical data
+        db:     dispersed         A few very large files, e.g. a large single-table database (ADEI: many tables)
+
+    Scaling storage:
+        cfg:         3 nodes are enough
+        cache/data:  [d][d][a] => [da][d ][ad][ d] => [d ][d ][ d][ d][aa] => further increase in pairs; at some point add a second arbiter node
+
+    Gluster Volumes:
+        provision     cfg    /mnt/provision      Provisioning volume, not expected to be mounted in the containers (may temporarily contain secret information, etc.)
+        openshift     cfg    /mnt/openshift      Multi-purpose: various small configurations (adei, apache, etc.)
+        temporary     tmp    /mnt/temporary      Multi-purpose: various logs & temporary files
+        ?adei         cfg    /mnt/adei/adei
+        adei-db       cache  /mnt/adei/db
+        adei-tmp      tmp    /mnt/adei/tmp
+        katrin-mysql  data   /mnt/katrin/mysql
+        katrin-data   cfg    /mnt/katrin/archive
+        katrin-kali   cache  /mnt/katrin/storage
+        katrin-tmp    tmp    /mnt/katrin/workspace
+
+    OpenShift Volumes:
+        etc           cfg/ro    openshift      Various configurations (ADEI & Apache configs, other stuff in etc.)
+        src           cfg/ro    openshift      Interpreted source files
+        log           tmp/rw    tmp            Stuff in /var/log
+        tmp           tmp/rw    tmp            Various temporary files
+        adei-db       data/rw   adei-db        ADEI cache database and a few primary sources [ would take ages to regenerate, so we cannot really treat it as a dispensable cache ]
+        adei-tmp      tmp/rw    adei-tmp       ADEI, Apache, and Cron logs [ technically, downloads also live here, and they are more cache than tmp... but it is fine for now ]
+        adei-cfg      cfg/ro    adei?          ADEI & Apache configs
+        adei-src      cfg/ro    adei?          ADEI sources
+        katrin-mysql  cfg/rw    katrin-mysql   KATRIN Database with configurations, etc.
+        katrin-data   data/rw   katrin-data    KATRIN data archives, all primary raw data from Orca, etc.
+        katrin-kali   cache/rw  katrin-kali    Generated ROOT files [ Can we make this separation? Marco uses hardlinks ]
+        katrin-proc   tmp/rw    katrin-proc    Data processing volume (inbox, etc.)
+
+Services
+========
+  - Keepalived
+  - OpenVPN
+  - Gluster
+  - MySQL Galera (?)
+  - Cassandra (?)
+  - oVirt (?)
+  - OpenShift Master / Node
+  - Heketi
+  - Apache Router
+  - ADEI Services
+  - Apache Spark, etc.
+ +Inventories +=========== + - staging & production will be operating in parallel (staging in vagrant and production on bare-metal) + - testing is just pre-production tests which will be removed once production is running + +Labels +====== + - We specify if node is master and provides fat storage for glusterfs + - All nodes currently in 'infra' region (for example, student computers will be non-infra nodes; nodes outside of KIT as well) + - The servers in cellar are in 'default' zone (if we put something in the 4th floor server room, we would define a new zone there) + +Computing +========= + - Define CUDA nodes and OpenCL nodes + - Intel Xeon Phi is replaced by new Tesla in the ipepdvcompute2 + - Gen1 UFO servers does not support "Above 64G decoding" and can't run Xeon Phi. May be we can put it in new Phi server. |
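The subnet plan in the Networks section can be sanity-checked with a few lines of code. Below is a minimal sketch in Python using only the standard-library `ipaddress` module. The CIDR ranges and short labels are transcribed from the document; `NETWORKS`, `parse_plan` and `classify` are hypothetical names, not existing tooling in this repository. It assumes the listed ranges are the complete address plan, checks that no two ranges overlap, and reports which documented network an address falls into.

```python
import ipaddress

# Hypothetical data structure: subnets transcribed from the "Networks" section.
# 192.168.212.0/24 and 192.168.213.0/24 carry no description in the document.
NETWORKS = {
    "192.168.11.0/24": "Legacy network (18-port IB switch)",
    "192.168.12.0/24": "KATRIN Storage network (12-port IB switch)",
    "192.168.13.0/24": "HPC Cloud & Computing network (12-port IB switch)",
    "192.168.26.0/24": "Infrastructure network (Ethernet)",
    "192.168.16.0/22": "External IPs for testing and production",
    "192.168.111.0/24": "OpenVPN gateway via Master1 tunnel",
    "192.168.112.0/24": "OpenVPN gateway via Master2 tunnel",
    "192.168.212.0/24": "staging (no description in the document)",
    "192.168.213.0/24": "staging (no description in the document)",
    "192.168.226.0/24": "Staging network (Ethernet)",
    "192.168.216.0/22": "External IPs for staging",
    "192.168.221.0/24": "OpenVPN gateway via staging Master1 tunnel",
    "192.168.222.0/24": "OpenVPN gateway via staging Master2 tunnel",
}


def parse_plan(plan):
    """Parse CIDR strings into networks; raise ValueError if any two ranges overlap."""
    nets = [(ipaddress.ip_network(cidr), desc) for cidr, desc in plan.items()]
    for i, (a, _desc_a) in enumerate(nets):
        for b, _desc_b in nets[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"{a} overlaps {b}")
    return nets


def classify(address, nets):
    """Return the description of the documented network containing address, or None."""
    ip = ipaddress.ip_address(address)
    for net, desc in nets:
        if ip in net:
            return desc
    return None


if __name__ == "__main__":
    plan = parse_plan(NETWORKS)
    for addr in ("192.168.12.7", "192.168.18.200", "192.168.218.1", "10.0.0.1"):
        print(addr, "->", classify(addr, plan))
```

For example, 192.168.18.200 lands in the /22 of external testing/production IPs and 192.168.218.1 in the staging /22, while 10.0.0.1 matches nothing and yields None. Keeping the plan as data makes it easy to re-run the overlap check whenever a new range (say, another OpenVPN tunnel) is added.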