Kubernetes Monitoring at a Glance
To start this off, I’d like you to picture the last time you experienced an outage for anything that you manage. How did you find out there was an outage? What did you do to fix the issue? Did you have any advance warning there might be an issue, or anything to go on to find the cause?
Hopefully by the end of this, I’ll be able to show you how you can answer those questions in the context of a Kubernetes cluster. I’ll be getting technical, but I’ll be glossing over some of the very in-depth stuff, and I’ll be using just the one example I have in my lab. With that said, let’s go!
Prometheus #
Prometheus is one of the most flexible bits of monitoring software I’ve seen, and that’s because there are so many client libraries for devs to pull into their programs to let Prometheus scrape metrics. In short, it collects a bunch of information from whatever you point it at, and puts it in a time-series database. That means you can look back at any specific point in time and see exactly what state your stuff was in. By itself, however, it’s not the most useful thing. That’s why we need something to actually expose those metrics.
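For a rough idea of what “pointing it at something” looks like, here’s a minimal sketch of a scrape config. The job names, service names and ports below are placeholders for illustration rather than copies of my configs:
# prometheus.yml (sketch) - each job is just a set of HTTP endpoints Prometheus pulls /metrics from.
# Service names and ports here are illustrative, not taken from my repo.
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: node-exporter
    static_configs:
      - targets: ["node-exporter.monitoring.svc:9100"]
  - job_name: kube-state-metrics
    static_configs:
      - targets: ["kube-state-metrics.monitoring.svc:8080"]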
Node Exporter & Kube-State-Metrics #
As I alluded to just then, Node Exporter and Kube-State-Metrics do exactly what they say on the tin. One exports information about nodes, and the other gathers a lot of information about the state of the cluster and what’s inside it. The great thing about both of these is that they consume very few resources. As it stands, kube-state-metrics and node-exporter are using 120MB RAM and 25 millicores combined while grabbing metrics every 10 seconds!
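To give you a flavour of what they expose, here are a few of the series you’ll see once Prometheus is scraping them. These are the well-known default names, but it’s worth checking your own /metrics endpoints to see exactly what your versions emit:
# From node-exporter (per-node hardware/OS stats)
node_cpu_seconds_total          # CPU time per core and mode
node_memory_MemAvailable_bytes  # RAM actually available for use
node_filesystem_avail_bytes     # free disk space per mount
# From kube-state-metrics (state of Kubernetes objects)
kube_pod_status_phase           # Running / Pending / Failed per pod
kube_deployment_status_replicas_available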
Grafana #
So you’ve got the metrics, and you’ve got the things that expose the metrics, but how do you actually see them? This is where Grafana comes to the rescue! Above you’ll see a dashboard that uses some metrics from both Node Exporter and Kube-State-Metrics, and it really does look quite nice.
Grafana can take inputs from quite a few different data sources and graph out whatever you’d like. It can handle alerting as well. I personally have it sending messages through my Telegram bot whenever my resource usage gets too high, or when my NGINX ingress controller drops below a 99% connection acceptance rate over the last 10 minutes.
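Under the hood, an alert like that is just a PromQL query being evaluated. Here’s a rough sketch of the same idea written as a Prometheus alerting rule (I drive mine from Grafana instead); the ingress-nginx metric names are assumptions and can vary between controller versions, so check your controller’s /metrics output before copying this:
# Sketch only - metric names below are assumptions, verify against your ingress controller.
groups:
  - name: ingress-nginx
    rules:
      - alert: LowConnectionAcceptanceRate
        for: 10m
        expr: |
          sum(rate(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[10m]))
          /
          sum(rate(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[10m]))
          < 0.99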
You really can graph this out to your heart’s content, but that does make it quite hard to get started. When you’ve got a million different options, it can be quite hard to pick just the one. That’s why I’ve got a bit of a setup guide below.
Setup #
Now for this, I have my own GitHub repo which has all of the configs. There are a couple of prerequisites here:
- Have your own Kubernetes cluster set up (setup guide here if you don’t have one yet!)
- If you want to expose this publicly, you’ll need an ingress set up. I personally use NGINX ingress, and have added the service configs in that repo as well.
One thing to note too: these configs are not production-ready. However, I have used persistent volumes where it makes sense, just so you don’t lose monitoring data whenever you kill a deployment. It’s not strictly needed, but I’d definitely recommend it if you’re wanting to have this up for longer than a few days at a time.
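For anyone who hasn’t used one before, a PersistentVolumeClaim is only a few lines. This is a sketch with illustrative names and sizes rather than the exact claim from the repo:
# Sketch of a PersistentVolumeClaim so Prometheus data survives pod restarts.
# Name, namespace and size here are illustrative; use the versions in the repo.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi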
So firstly, you’ll want to clone my repo!
# Make sure you're in a folder you want to save these files in!
git clone https://github.com/LiamHardman/lhc-monitoring-public.git
cd lhc-monitoring-public
When you’ve cloned the repo, I’d definitely recommend opening up these files and having a look at how you might set this up differently. The good (and bad) thing with Kubernetes is that you can set this up in many different ways. Would you set these up using Helm, or something like Kustomize? For simplicity, I just have them as vanilla Kubernetes manifests, but feel free to send me a message if you have a better way of doing things!
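If you did fancy the Kustomize route, the entry point is just a kustomization.yaml at the repo root. This is a sketch rather than something that ships in the repo, with directory names taken from the commands further down:
# kustomization.yaml (sketch, not in the repo) - would let you apply everything with:
#   kubectl apply -k .
# Each directory listed here would need its own kustomization.yaml naming its manifests.
resources:
  - node-exporter/
  - kube-state-metrics/
  - prometheus/
  - grafana/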
First, we’ll apply node-exporter and kube-state-metrics:
kubectl apply -f node-exporter/ && kubectl apply -f kube-state-metrics/
You’ll see that with kube-state-metrics we’ve got a few more files than in node-exporter. Here, we’ve got a ClusterRole along with a ServiceAccount. That’s because kube-state-metrics collects quite a lot of information about the objects running across the cluster from the Kubernetes API, and needs read permissions to be able to do that.
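Here’s roughly what that boils down to, heavily trimmed; the ClusterRole in the repo covers more resource types, and there’s also a ClusterRoleBinding tying the two together:
# Trimmed sketch of the read-only access kube-state-metrics needs.
# The full ClusterRole in the repo lists many more resource types than this.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "services", "namespaces"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "daemonsets", "statefulsets"]
    verbs: ["list", "watch"]
With that applied, let’s get Prometheus set up, and we can use it to test whether our metrics are getting picked up!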
kubectl apply -f prometheus/
It may take a little while for this one to spin up, since we’ve got a PersistentVolumeClaim. For me, it takes an extra 30 seconds or so compared to a pod that has no permanent storage requirements.
Assuming you don’t have an NGINX ingress controller, you’ll need to run the command below to access Prometheus:
kubectl port-forward svc/prometheus-service -n monitoring 8090:8090
When this is running, you should be able to go to localhost:8090 in a browser and see the Prometheus UI. Head on over to Status -> Targets and you should see kube-state-metrics and node-exporter showing as up. The number of pods up for each one may differ for you since we’re using a DaemonSet for node-exporter, but I have 2/2 for node-exporter, and 1/1 for kube-state-metrics.
If they’re not showing as up, you’ll need to take a look at your service configs and check whether your pods are running.
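A few commands I’d usually reach for at that point; the namespace and resource names here are assumptions, so swap in your own:
# Are the pods actually running?
kubectl get pods -n monitoring
# Do the services have endpoints pointing at those pods?
kubectl get endpoints -n monitoring
# If a pod is up but the target is still down, the logs usually say why
kubectl logs -n monitoring deploy/kube-state-metrics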
So all of this is quite cool, but there’s no point if we can’t actually visualize it, so let’s get Grafana installed!
kubectl apply -f grafana/
Again, if you don’t have the NGINX ingress controller set up, you’ll need to perform a port-forward, this time on port 3000 and for the Grafana service:
kubectl port-forward svc/grafana -n monitoring 3000:3000
You also won’t need to apply the grafana-ext.yml file if you’re not using an ingress controller for inbound internet connectivity.
Fingers crossed, you should get a prompt to log in with the default Grafana credentials of admin:admin, and a prompt to change that password.
Then you can get started applying your dashboards; I’ve included some JSON files in the dashboards folder that should get you going. The Prometheus data source should already be configured for you.
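If you’re curious how that works (or ever need to point Grafana at a different Prometheus), it’s just a small provisioning file along these lines. The URL matches the prometheus-service and port 8090 used earlier, but treat the rest as a sketch rather than the exact file in the repo:
# Grafana data source provisioning (sketch) - mounted into the Grafana pod.
# The service name and port should match your Prometheus service.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-service.monitoring.svc:8090
    isDefault: true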
To import the dashboards, go to Dashboards -> New -> Import and then paste in the JSON. I’d definitely recommend taking a look at some other dashboards too, since there’s a lot out there, and I only took the ones that looked nicest to me. Everyone’s got their own tastes, and their own requirements.
Conclusion #
So now you’ve got your own monitoring stack (or at least know how to get one up!) and you’ve had a bit of an overview of what’s out there. There’s a heck of a lot more though; I didn’t even touch Promtail or Loki! Watch this space: over the next few months I’ll be writing more here and there as I learn more, and I’ll always be happy to share!