A monitoring dashboard disappeared after a routine GKE node upgrade, exposing a gap in how the cluster's resources were managed. The NGINX Ingress controller pod was evicted and restarted, but the Grafana ingress resource vanished, leaving the service inaccessible. Investigation traced the disappearance to two factors: a missing TLS secret and an admission webhook failure. The TLS secret had been created manually and was not managed by any controller, so the webhook rejected the ingress when the controller restarted. The underlying cause was that the ingress had been created with `kubectl apply` and no reconciliation mechanism was in place, so nothing recreated it once it was gone.

The fix was to migrate to Helm charts, which store release state and can recreate missing resources. Helm keeps resource management consistent and handles dependencies between resources. Similar problems can be avoided by managing secrets with operators and by understanding how admission webhooks behave. Helm-managed resources, operator-managed resources, and resources with owner references survive pod evictions; manually applied resources and those that reference missing dependencies are more vulnerable. The incident highlighted the importance of managing production resources with tools such as Helm, Kustomize, or GitOps.
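The two risk factors named above (an ingress that references a TLS secret that does not exist, and a resource that no tool or controller owns) can be checked mechanically. The following is a minimal sketch, not the author's tooling: it works on plain dictionaries shaped like the JSON that `kubectl get ... -o json` returns and does not talk to a cluster. The function names, the `grafana-tls` secret name, and the `monitoring` namespace are hypothetical, and the Helm check assumes the resource carries the conventional `app.kubernetes.io/managed-by: Helm` label.

```python
from typing import Iterable


def missing_tls_secrets(ingress: dict, secrets: Iterable[dict]) -> list[str]:
    """Return TLS secret names the Ingress references that are absent from its namespace."""
    namespace = ingress.get("metadata", {}).get("namespace", "default")
    # Secrets are namespaced, so only secrets in the Ingress's namespace count.
    existing = {
        s["metadata"]["name"]
        for s in secrets
        if s["metadata"].get("namespace", "default") == namespace
    }
    referenced = [
        tls["secretName"]
        for tls in ingress.get("spec", {}).get("tls", [])
        if "secretName" in tls
    ]
    return [name for name in referenced if name not in existing]


def management_status(resource: dict) -> str:
    """Classify a resource as 'helm-managed', 'owned', or 'unmanaged'."""
    meta = resource.get("metadata", {})
    labels = meta.get("labels") or {}
    # Assumption: Helm-managed resources carry this conventional label.
    if labels.get("app.kubernetes.io/managed-by") == "Helm":
        return "helm-managed"
    # A non-empty ownerReferences list means some other object owns this one.
    if meta.get("ownerReferences"):
        return "owned"
    return "unmanaged"


# Hypothetical manifest resembling the manually applied Grafana ingress.
grafana_ingress = {
    "kind": "Ingress",
    "metadata": {"name": "grafana", "namespace": "monitoring"},
    "spec": {"tls": [{"hosts": ["grafana.example.com"], "secretName": "grafana-tls"}]},
}

# A secret with the right name in the wrong namespace does not satisfy the reference.
secrets_in_cluster = [
    {"kind": "Secret", "metadata": {"name": "grafana-tls", "namespace": "default"}},
]

print(missing_tls_secrets(grafana_ingress, secrets_in_cluster))  # ['grafana-tls']
print(management_status(grafana_ingress))  # unmanaged
```

A check like this only flags resources that are at risk. It does not recreate anything, which is the part that Helm, Kustomize, or a GitOps workflow is meant to cover.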
dev.to
