author Suren A. Chilingaryan <csa@suren.me> 2024-07-29 22:32:00 +0200
committer Suren A. Chilingaryan <csa@suren.me> 2024-07-29 22:32:00 +0200
commit 4175af7f92ad7357b83ceb56f2a6d42a8243cd80
tree bb4af8cbb7495c179b0e257ca337f10132f63170 /docs/troubleshooting.txt
parent 0fbe7da54cd41846d7debfc49d25397ad8fc69a0
Update documentation & users
Diffstat (limited to 'docs/troubleshooting.txt')
-rw-r--r-- docs/troubleshooting.txt 14
1 files changed, 13 insertions, 1 deletions
diff --git a/docs/troubleshooting.txt b/docs/troubleshooting.txt
index 315f9f4..0621b25 100644
--- a/docs/troubleshooting.txt
+++ b/docs/troubleshooting.txt
@@ -151,8 +151,17 @@ nodes: domino failures
* This might continue indefinitely as one node after another gets disconnected, pods get rescheduled, and the process never stops
* The only solution is to temporarily remove some pods, e.g. ADEI pods can easily be removed and then provisioned back
-pods: very slow scheduling (normal start time in seconds range), failed pods, rogue namespaces, etc...
+pods: failed or very slow scheduling (normal start time in seconds range), failed pods, rogue namespaces, etc...
====
+ - LSDF mounts might cause pod-scheduling to fail
+ * It seems OpenShift tries to index (chown+chmod) files on mount and times out if the LSDF volume has too many small files...
+ * Reducing the number of files with 'subPath' doesn't help here, but setting a more specific 'networkPath' in the pv does
+ * The suggestion is to remove fsGroup from the 'dc' definition, but it is added automatically if pods use network volumes;
+   setting the volume 'gid' (cifs mount parameters specified in 'mountOptions' in the pv definition) to match fsGroup doesn't help either
+ * The timeout seems to be fixed at 2m and is not configurable...
+ * Later versions of OpenShift have the 'fsGroupChangePolicy=OnRootMismatch' parameter, but it is not present in 3.9
+ => Honestly, the solution is unclear besides reducing the number of files or mounting a small subset of the share with few files
+
- OpenShift has numerous problems with cleaning up resources after pods. The problems are more likely to happen on
heavily loaded systems: cpu, io, interrupts, etc.
* This may be indicated in the logs with various errors reporting inability to stop containers/processes, free network
@@ -450,3 +459,6 @@ Various
- IPMI may cause problems as well. Particularly, the mounted CDrom may start complaining. Easiest is
just to remove it from the running system with
echo 1 > /sys/block/sdd/device/delete
+
+ - 'oc get scc' reports the server doesn't have a resource type "scc"
+   Delete the 'apiserver-*' pod in the 'kube-service-catalog' namespace (it will be restarted)
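
The last fix above (deleting the 'apiserver-*' pod) could be scripted roughly as follows. This is only a sketch, not part of the commit: the function name 'restart_catalog_apiserver' and the 'OC' override variable are hypothetical, and it assumes 'oc' is already logged in with rights to delete pods in 'kube-service-catalog'.

```shell
#!/bin/sh
# Sketch: delete the service-catalog 'apiserver-*' pod(s) so they get restarted.
# Assumption: OC is a hypothetical override for the client binary (defaults to 'oc').
restart_catalog_apiserver() {
    ns=kube-service-catalog
    # 'oc get pods -o name' prints one resource per line, e.g. pods/apiserver-abcde
    ${OC:-oc} -n "$ns" get pods -o name | while read -r pod; do
        case "${pod#*/}" in
            # Only touch pods whose name starts with 'apiserver-'
            apiserver-*) ${OC:-oc} -n "$ns" delete "$pod" </dev/null ;;
        esac
    done
}
```

After sourcing the snippet, run 'restart_catalog_apiserver' and then retry 'oc get scc' once the pod is back.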