My New Server Rack: Nouki

GNU+Linux Gentoo PostgreSQL Prometheus Systemd ZFS — Published on .

After setting up mieshu, nouki is the next server to work on in my home rack. Nouki is intended to be my main database server, mainly for PostgreSQL, but perhaps later in its life also MySQL, if I ever want to run a service that doesn't support superior databases.

The setup for nouki is much simpler in comparison; the base system is almost identical to mieshu's. This server runs ZFS on two NVMe disks in a mirror configuration. It is also a Gentoo-based system, again with systemd rather than OpenRC. The experience of systemd on mieshu was much less painful than I anticipated; it seems to have had time to mature, though I still dislike how it kills diversity in init/service managers on GNU+Linux.
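
For reference, a two-disk mirror like that can be created along these lines; the pool name, device paths, and ashift value here are placeholders rather than nouki's actual layout:

zpool create -o ashift=12 tank mirror /dev/nvme0n1 /dev/nvme1n1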

Both PostgreSQL and ZFS have received some tweaking to run more smoothly. I’m no DBA, so if you see anything silly in here, do let me know so I can improve my life.

For ZFS, tweaking was rather minimal. I've made a separate dataset for PostgreSQL to use, with recordsize=8K set to match PostgreSQL's 8kB page size.
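
Creating such a dataset might look roughly like this; the pool name, dataset name, and mountpoint are examples, not necessarily what nouki uses:

zfs create -o recordsize=8K -o mountpoint=/var/lib/postgresql tank/postgres

For PostgreSQL, I've altered a bit more. First and foremost, the pg_hba.conf to allow access from machines in my tinc-based VPN.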

host    all             all             10.57.0.0/16            scram-sha-256

The postgresql.conf file received the following treatment, based solely on the guidance provided by PGTune.

listen_addresses = '10.57.101.20'
max_connections = 200
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 5242kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 12
max_parallel_workers_per_gather = 4
max_parallel_workers = 12
max_parallel_maintenance_workers = 4
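
Most of these settings only take effect after restarting the service, after which the active values can be verified from psql. The exact unit name depends on the installed PostgreSQL slot:

systemctl restart postgresql-15
psql -U postgres -c 'SHOW shared_buffers;'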

With this, PostgreSQL seems to perform very well on this machine, and applications using it are noticeably faster. Sadly I have no timings from when it all ran on my desktop, so I can't say exactly how much faster everything has become.

Additionally, I wanted to start gathering metrics from my machines and services, so I can start thinking about dashboards and alerts. I've chosen the current industry standard, Prometheus, for this. Since I consider Prometheus to be a database for metrics, it has been deployed on my database server as well.

Prometheus is currently set to scrape metrics from the node_exporter and postgresql_exporter, and seems to work fine. I expect I'll need to tweak how long metrics are retained at some point, since I've seen it use quite a lot of memory when storing a large number of metrics for a very long time.
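
A minimal scrape configuration for those two exporters could look something like this, assuming both run locally on their default ports:

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
  - job_name: postgres
    static_configs:
      - targets: ['localhost:9187']

If retention does become a problem, Prometheus can be told to keep less history with the --storage.tsdb.retention.time flag, for example --storage.tsdb.retention.time=90d.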

To actually see the metrics and have alerts, I currently intend to go with Grafana. I already have ntfy running, and it appears relatively simple to mold Grafana alerts into ntfy notifications. To do this properly, I will need some machines to handle regular workloads. Most likely these will be Intel NUCs or similar machines, as they draw very little power for reasonable performance. Raspberry Pi units would be cheaper, but also seem vastly less powerful, and I'd need to ensure all my intended workloads can run on ARM, which could become a nuisance very quickly.
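
On the ntfy side the plumbing looks simple enough: ntfy accepts plain HTTP POSTs, so a Grafana webhook contact point, or a small relay script in between, should be able to push alerts with little more than the following. The server and topic here are made up:

curl -d "PostgreSQL on nouki is down" https://ntfy.example.com/alerts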

As I already have an Intel NUC to play with, that's what I'll be experimenting with for the coming few days to see if this setup can work for my needs. Perhaps I can try out a highly available K3s cluster in the near future!