Notifications when Kubernetes (Cron)Jobs fail?
What do you do when you have CronJobs running in your Kubernetes cluster and want to know when a job fails? Do you manually check the execution status? Painful. Or do you rely on roundabout Prometheus queries, adding unnecessary overhead? Not ideal… But worry not! Let me suggest a way to receive notifications immediately when jobs fail, using two nifty tools:
- cmaster11/Overseer — an open-source monitoring tool.
- Notify17 — a notification app that lets you receive notifications on Android/iOS and web.
Note: cmaster11/Overseer is a heavily modified fork of the amazing skx/Overseer tool, e.g. with added support for Kubernetes eventing. All original credits for this tool go to skx!
Brief tech excursion: Kubernetes events
The underlying trick we will use is watching the stream of Kubernetes events. (A list of basic events can be found in the Kubernetes source code.)
Try running the following command in your cluster:
kubectl get events --all-namespaces
Most likely, you will see some interesting events happening. In my stream, I see a job that failed to create a pod. Womp womp.
50s Normal Pulling Pod pulling image "alpine"
23s Normal Pulled Pod Successfully pulled image "alpine"
23s Normal Created Pod Created container
23s Normal Started Pod Started container
2m39s Normal SuccessfulCreate Job Created pod: test-74rz4
22s Warning BackoffLimitExceeded Job Job has reached the specified backoff limit
You might notice that one of the events is `BackoffLimitExceeded`. This event is generated whenever a Job fails and no more retries are available. This is the event we're going to watch with Overseer.
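If you want to see just these failure events, you can filter the stream yourself. This is a quick sketch, assuming a kubectl version that supports field selectors on events (it requires access to a running cluster):

```shell
# Stream only Job failure events across all namespaces.
# --field-selector filters server-side on the event's reason field.
kubectl get events --all-namespaces \
  --field-selector reason=BackoffLimitExceeded \
  --watch
```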
Overseer
Overseer can easily be run in Kubernetes using the provided example. More specifically, we will use the following files:
- `000-namespace.yaml`: the Overseer Kubernetes `Namespace` resource.
- `redis.yaml`: the database where the alerts/found events will be stored.
- `001-service-account-k8s-event-watcher.yaml`: a service account that lets Overseer watch Kubernetes events.
- `overseer-k8s-event-watcher.yaml`: the Overseer worker that will watch for new Kubernetes events.
- `overseer-bridge-webhook-n17.yaml`: the notification system that will inform us about found events.
To start, we’ll set up the core of Overseer with the following commands:
kubectl apply -f https://raw.githubusercontent.com/cmaster11/overseer/3f8ee2bbc1e5452d292e14c8b3e78960385b7ac9/example-kubernetes/000-namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/cmaster11/overseer/3f8ee2bbc1e5452d292e14c8b3e78960385b7ac9/example-kubernetes/redis.yaml
kubectl apply -f https://raw.githubusercontent.com/cmaster11/overseer/3f8ee2bbc1e5452d292e14c8b3e78960385b7ac9/example-kubernetes/001-service-account-k8s-event-watcher.yaml
kubectl apply -f https://raw.githubusercontent.com/cmaster11/overseer/3f8ee2bbc1e5452d292e14c8b3e78960385b7ac9/example-kubernetes/overseer-k8s-event-watcher.yaml
You can monitor the process (in Linux) with:
watch kubectl -n overseer get pod
When all pods are up and running, let’s proceed with the notifier!
Notify17
To set up the notifier:
- Create a Notify17 account (it's free!).
- Next, create a notification template from the dashboard by pressing the import button and pasting the following configuration:
Once you’ve imported the template, save it by clicking the Save button.
The last step is to set up Overseer's webhook bridge. Copy the file https://github.com/cmaster11/overseer/blob/3f8ee2bbc1e5452d292e14c8b3e78960385b7ac9/example-kubernetes/overseer-bridge-webhook-n17.yaml to a local directory and replace `REPLACE_TEMPLATE_API_KEY` with your notification template API key. Then apply the file with `kubectl apply -f FILE_PATH`.
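The substitution can also be scripted. This is a minimal sketch using a stand-in file so it runs anywhere; with the real manifest, point `sed` at your downloaded `overseer-bridge-webhook-n17.yaml` instead (the key value below is a placeholder, not a real Notify17 key):

```shell
# Placeholder for your notification template API key.
N17_API_KEY="n17-example-key"

# Stand-in for the relevant line of the downloaded manifest.
echo 'value: "REPLACE_TEMPLATE_API_KEY"' > bridge.yaml

# Replace the placeholder in-place with your key.
sed -i "s/REPLACE_TEMPLATE_API_KEY/$N17_API_KEY/" bridge.yaml
cat bridge.yaml
# prints: value: "n17-example-key"

# Finally (against your cluster): kubectl apply -f bridge.yaml
```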
And we’re done!
Test
To test the whole system, you can try to apply the failing job example file:
kubectl apply -f https://raw.githubusercontent.com/cmaster11/overseer/master/example-kubernetes/example-failing-job/job-fail.yaml
The job will fail and in a few seconds Overseer should generate an alert and send it through Notify17!
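For reference, a minimal failing Job looks roughly like this. The names and the `backoffLimit` value here are illustrative, not necessarily the exact contents of the linked file:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-fail-test
spec:
  backoffLimit: 2            # once retries run out, BackoffLimitExceeded is emitted
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: fail
          image: alpine
          command: ["sh", "-c", "exit 1"]   # always exits with an error
```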
P.S. If something doesn't work, remember that `kubectl get pod` and `kubectl logs POD_NAME` are your friends.
Cleanup
To clean up Overseer, just delete its namespace with:
kubectl delete ns overseer