# Steps to set up the KDB infrastructure in OpenShift
Web interface: https://kaas.kit.edu:8443/console/
Command-line interface:
```
oc login kaas.kit.edu:8443
oc project katrin
```
## Overview
The setup uses (at least) three containers:
* `kdb-backend` is a MySQL/MariaDB container that provides the database backend
used by the KDB server. It hosts the `katrin` and `katrin_run` databases.
* `kdb-server` runs the KDB server process inside an Apache environment. It
provides the web interface (`kdb-admin.fcgi`) and the KaLi service
(`kdb-kali.fcgi`).
* `run-processing` periodically retrieves run files from several DAQ machines
and adds the processed files to the KDB runlist. This process could be
distributed over several containers for the individual systems (`fpd` etc.).
> The ADEI server hosting the `adei` MySQL database runs in an independent project with hostname `mysql.adei.svc`.
A persistent storage volume is needed for the MySQL data (volume group `db`)
and for the copied/processed run files (volume group `katrin`). The latter is
shared between the KDB server and the run processing applications.
## MySQL backend
### Application
This container is based on the official Red Hat MariaDB Docker image. The
OpenShift application is created via the CLI:
```
oc new-app -e MYSQL_ROOT_PASSWORD=XXX --name=kdb-backend registry.access.redhat.com/rhscl/mariadb-101-rhel7
```
Because KDB uses two databases (`katrin`, `katrin_run`) and must be able to
create and edit database users, a root password has to be defined here.
### Volumes
This container needs a persistent storage volume for the database content. In
OpenShift this is done by removing the default storage and adding a persistent
volume `kdb-backend` for MySQL data: `db: /kdb/mysql/data -> /var/lib/mysql/data`
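This can be done in the web console or, as a sketch, via the CLI; the name of the
default volume (listed by the first command) and the claim name `db` are assumptions
that have to be adapted to the actual setup:
```
# List the current volumes to find the name of the default (ephemeral) volume
oc set volume dc/kdb-backend

# Replace it with the persistent claim (names are assumptions -- adapt as needed)
oc set volume dc/kdb-backend --remove --name=kdb-backend-volume-1
oc set volume dc/kdb-backend --add --name=kdb-backend \
    --type=persistentVolumeClaim --claim-name=db --mount-path=/var/lib/mysql/data
```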
### Final steps
It makes sense to add readiness/liveness probes as well: TCP socket, port 3306.
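For example (assuming the deployment config created above is named `kdb-backend`):
```
oc set probe dc/kdb-backend --readiness --liveness --open-tcp=3306
```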
> It is possible to access the MySQL server inside a container: `mysql -h kdb-backend.katrin.svc -u root -p -A`
## KDB server
### Application
The container is created from a `Dockerfile` available in GitLab:
https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles/tree/kdbserver
The app is created via the CLI, but manual changes are necessary later on:
```
oc new-app https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles.git --name=kdb-server
```
> The build fails because the branch name and user credentials are not defined.
The build settings must be adapted before the image can be created (a CLI sketch is given after the list).
* Set the git branch name to `kdbserver`.
* Add a source secret `katrin-gitlab` that provides the git user credentials,
i.e. the `katrin` username and corresponding password for read-only access.
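A possible CLI sketch for these changes (the way the secret is created may differ,
e.g. if the credentials are managed centrally):
```
# Point the build config at the kdbserver branch
oc patch bc/kdb-server -p '{"spec":{"source":{"git":{"ref":"kdbserver"}}}}'

# Create the source secret with the read-only git credentials and attach it
oc create secret generic katrin-gitlab --type=kubernetes.io/basic-auth \
    --from-literal=username=katrin --from-literal=password=XXX
oc set build-secret --source bc/kdb-server katrin-gitlab

# Trigger a new build
oc start-build kdb-server
```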
When a container instance (pod) is created in OpenShift, the main script
`/run-httpd.sh` starts the Apache webserver with the KDB fastcgi module.
### Volumes
Just like the MySQL backend, the container needs persistent storage enabled: `katrin: /data -> /mnt/katrin/data`
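A minimal CLI sketch, assuming a persistent volume claim named `katrin` already
exists (any sub-path mapping on the volume is omitted here):
```
oc set volume dc/kdb-server --add --name=katrin \
    --type=persistentVolumeClaim --claim-name=katrin --mount-path=/mnt/katrin/data
```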
### Config Maps
Some default configuration files for the Apache web server and the KDB server
installation are provided with the Dockerfile. The webserver config should
work correctly as it is. The main config must be updated so that the correct
servers/databases are used. A config map `kdbserver-config` is created with
mountpoint `/config` in the container (a CLI example is given after the file list):
* `kdbserver.conf` is the main config for the KDB server instance. For the
steps outlined here, it should contain the following entries:
```
sql_server = kdb-backend.katrin.svc
sql_adei_server = mysql.adei.svc
sql_katrin_dbname = katrin
sql_run_dbname = katrin_run
sql_adei_dbname = adei_katrin
sql_user = root
sql_password = XXX
sql_adei_user = katrin
sql_adei_password = XXX
use_adei_cache = true
adei_service_url = http://adei-katrin.kaas.kit.edu/adei
adei_public_url = http://katrin.kit.edu/adei-katrin
```
* `log4cxx.properties` defines the terminal/logfile output settings. By default,
all log output is shown on `stdout` (and visible in the OpenShift log).
> Files in `/config` are symlinked to the respective files inside the container by `/run-httpd.sh`.
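Assuming both files have been prepared locally, the config map could be created
and mounted like this (the deployment config name `kdb-server` is assumed):
```
oc create configmap kdbserver-config \
    --from-file=kdbserver.conf --from-file=log4cxx.properties
oc set volume dc/kdb-server --add --name=kdbserver-config \
    --type=configmap --configmap-name=kdbserver-config --mount-path=/config
```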
### Database setup
The KDB server sources provide a SQL dump file to initialize the database. To
create an empty database with all necessary tables, run the `mysql` command:
```
mysql -h kdb-backend.katrin.svc -u root -p < /src/kdbserver/Data/katrin-db.sql
```
Alternatively, a full backup of the existing database can be imported:
```
tar -xJf /src/kdbserver/Data/katrin-db-bkp.sql.xz -C /tmp
mysql -h kdb-backend.katrin.svc -u root -p < /tmp/katrin-db-bkp.sql
```
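To check that the import worked, the tables can be listed, for example:
```
mysql -h kdb-backend.katrin.svc -u root -p -e 'SHOW TABLES IN katrin; SHOW TABLES IN katrin_run;'
```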
> To clean database tables, execute MySQL `DROP TABLE` statements and re-initialize the dropped tables from the `katrin-db.sql` file.
### IDLE storage
IDLE provides local storage on the server-side file system. An empty IDLE
repository with default datasets is created by executing this command:
```
/opt/kasper/bin/idle SetupPublicDatasets
```
This creates a directory `.../storage/idle/KatrinIdle` on the storage volume
that can be filled with contents from a backup archive. The `oc rsync` command
can be used to transfer files to a running container (pod) in OpenShift.
> After restoring the data, the file permissions should be fixed so that KDB can access it.
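For example, a local backup of the IDLE repository could be copied into a running
pod like this (a sketch: the pod name and the local backup directory are
placeholders, and the target path assumes the `katrin` volume is mounted at
`/mnt/katrin/data`):
```
# Find the running kdb-server pod
oc get pods -l app=kdb-server

# Copy the backup into the IDLE storage directory (pod name is a placeholder)
oc rsync ./KatrinIdle-backup/ kdb-server-1-abcde:/mnt/katrin/data/storage/idle/KatrinIdle/
```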
### Final steps
Again, readiness/liveness probes can be added: TCP socket, port 80.
To make the KDB server interface accessible to the outside, a route must be
added in OpenShift: `http://kdb.kaas.kit.edu -> kdb-server:80`
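Both steps can also be done via the CLI, for example (the route name is an assumption):
```
oc set probe dc/kdb-server --readiness --liveness --open-tcp=80
oc expose service kdb-server --name=kdb --hostname=kdb.kaas.kit.edu
```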
> The web interface is now available at http://kdb.kaas.kit.edu/kdb-admin.fcgi
## Run processing
### Application
The setup for the run processing service is similar to the KDB server, with
the container being created from a GitLab `Dockerfile` as well:
https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles/tree/inlineprocessing
The app is created via the CLI, but manual changes are necessary later on:
```
oc new-app https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles.git --name=run-processing
```
> The build fails because the branch name and user credentials are not defined.
The build settings must be adapted before the image can be created.
* Set the git branch name to `inlineprocessing`.
* Use the source secret `katrin-gitlab` that was created before.
#### Run environment
When a container instance (pod) is created in OpenShift, the main script
`/run-loop.sh` executes the processing script `process-system.py` in a
continuous loop with a user-defined delay. The script is configured by the
following environment variables, which can be defined in the OpenShift
deployment configuration:
* `PROCESS_SYSTEMS` defines one or more DAQ systems configured in the file
`ProcessingConfig.py`: `fpd`, `mos`, etc.
* `PROCESS_FLAGS` defines additional options passed to the script, e.g.
`--pull` to automatically retrieve run files from configured DAQ machines.
* `REFRESH_INTERVAL` defines the waiting time between consecutive executions.
Note that the `/run-loop.sh` script waits until `process-system.py` has finished
before starting the next loop iteration, so the delay is always added on top of
the time the script needs to process all files.
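These variables can be set on the deployment config via the CLI, e.g. (values are
examples only; the refresh interval is assumed to be given in seconds):
```
oc set env dc/run-processing PROCESS_SYSTEMS=fpd PROCESS_FLAGS=--pull REFRESH_INTERVAL=300
```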
### Volumes
The run processing stores files that need to be accessible by the KDB server
application. Hence, the same persistent volume is used in this container:
`katrin: /data -> /mnt/katrin/data`
To ensure that all processes can read/write correctly, the file permissions are
relaxed (this can be done in an OpenShift terminal or remote shell):
```
mkdir -p /mnt/katrin/data/{inbox,archive,storage,workspace,logs,tmp}
chown -R katrin: /mnt/katrin/data
chmod -R ug+rw /mnt/katrin/data
```
### Config Maps
Just like with the KDB server, a config map `run-processing-config` with
mountpoint `/config` should be added, which defines the configuration of the
processing script:
* `ProcessingConfig.py` is the main config where the DAQ machines are defined
with their respective storage paths. The file also defines a list of
processing steps to be executed for each run file; these steps may have
to be adapted where necessary.
* `datamanager.cfg` defines the interface to the KaLi web service. It must be
configured so that the KDB server instance from above is used:
```
url = http://kdb-server.katrin.svc/kdb-kali.fcgi
user = katrin
password = XXX
timeout_seconds = 300
cache_age_hours = -1
```
* `rsync-filter` is applied with the `rsync` command that copies run files
from the DAQ machines. It can be adapted to exclude certain directories,
e.g. old run files that do not need to be processed.
* `log4cxx.properties` configures terminal/logfile output, see above.
> Files in `/config` are symlinked to the respective files inside the container by `/run-loop.sh`.
#### SSH keys
A second config map `run-processing-ssh` is required to provide SSH keys that
are used to authenticate remote connections to the DAQ machines. The map with
mountpoint `/.ssh` should contain the files `id_dsa`, `id_dsa.pub` and
`known_hosts` and must be adapted as necessary.
> This assumes that the SSH credentials have been added to the respective machines beforehand!
> The contents of `known_hosts` should be updated with the output of `ssh-keyscan` for the configured DAQ machines.
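A sketch of how this map could be created, assuming the key files are available
locally (the DAQ host is the FPD machine used in the rsync example below):
```
# Collect the host keys of the configured DAQ machines
ssh-keyscan 192.168.110.76 >> known_hosts

# Create the config map and mount it at /.ssh
oc create configmap run-processing-ssh \
    --from-file=id_dsa --from-file=id_dsa.pub --from-file=known_hosts
oc set volume dc/run-processing --add --name=run-processing-ssh \
    --type=configmap --configmap-name=run-processing-ssh --mount-path=/.ssh
```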
### Notes
The script `/run-loop.sh` pulls files from the DAQ machines and processes
them automatically, newest first. Where necessary, run files can be copied
manually (FPD example; adapt the options and `rsync-filter` file as required):
```
rsync -rltD --verbose --append-verify --partial --stats --compare-dest=/mnt/katrin/data/archive/FPDComm_530 --filter='. /opt/processing/system/rsync-filter' --log-file='/mnt/katrin/data/logs/rsync_fpd.log' katrin@192.168.110.76:/Volumes/DAQSTORAGE/data/ /mnt/katrin/data/inbox/FPDComm_530
```
If runs were not processed correctly, one can trigger manual reprocessing
from an OpenShift terminal (with run numbers `START`, `END` as necessary):
```
./process-system.py -s fpd -r START END
```