Part 1: Monitoring Synology DS418 with AWS Prometheus

I recently purchased a Synology DS418 NAS to automate my family’s photo, document, and device backups. Each of us has a computer, an iPhone, and an iPad, and it had become a pain to manually back up the data from each of these devices.

Specs

  • Synology DS418

It was easy to set up the NAS and install the 4 hard drives; however, within a week of setup and beginning the data backups, one of the drives went into an unrecoverable, abnormal state. I received emails like this:

Storage pool 1 (raid10) on ds418_backup has degraded (total number of drives: 4; number of active drives: 3).

Which is fine, but I want to have historical monitoring data and be able to set up my own custom alerts. (Aside: to determine whether this was a bay issue or a drive issue, I swapped the flagged drive with a drive in another bay, but neither has produced any errors since the swap.)
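As a concrete example of the kind of custom alert I have in mind, node_exporter's mdadm collector exposes per-array disk counts. A Prometheus alerting rule along these lines could replace the email notification (a sketch: the metric names come from the mdadm collector, while the group name, the 5-minute duration, and the labels are my own choices):

```yaml
# Sketch of a custom alerting rule for a degraded md array.
# node_md_disks / node_md_disks_required are exposed by node_exporter's
# mdadm collector; duration and severity are illustrative choices.
groups:
  - name: nas
    rules:
      - alert: RaidDegraded
        expr: node_md_disks{state="active"} < node_md_disks_required
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "RAID array {{ $labels.device }} on {{ $labels.instance }} is degraded"
```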

I use Prometheus/Grafana with Alertmanager for monitoring and alerting at my current job as a Platform Engineer, and I recently found that AWS has a Managed Service for Prometheus. I would like to experiment with the managed AWS service, and this seems like a good opportunity to do so.

I planned to run node_exporter on the NAS, and after downloading the most recent arm64 archive, it started up without an issue.

pjames@ds418_backup:/volume7/apps$ wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-arm64.tar.gz
pjames@ds418_backup:/volume7/apps$ tar zxvf node_exporter-1.1.2.linux-arm64.tar.gz
pjames@ds418_backup:/volume7/apps$ ./node_exporter-1.1.2.linux-arm64/node_exporter
level=info ts=2021-03-07T22:09:08.406Z caller=node_exporter.go:178 msg="Starting node_exporter" version="(version=1.1.2, branch=HEAD, revision=b597c1244d7bef49e6f3359c87a56dd7707f6719)"
level=info ts=2021-03-07T22:09:08.406Z caller=node_exporter.go:179 msg="Build context" build_context="(go=go1.15.8, user=root@f07de8ca602a, date=20210305-09:32:30)"
level=info ts=2021-03-07T22:09:08.407Z caller=filesystem_common.go:74 collector=filesystem msg="Parsed flag --collector.filesystem.ignored-mount-points" flag=^/(dev|proc|sys|var/lib/docker/.+)($|/)
level=info ts=2021-03-07T22:09:08.408Z caller=filesystem_common.go:76 collector=filesystem msg="Parsed flag --collector.filesystem.ignored-fs-types" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
level=info ts=2021-03-07T22:09:08.409Z caller=node_exporter.go:106 msg="Enabled collectors"
level=info ts=2021-03-07T22:09:08.409Z caller=node_exporter.go:113 collector=arp
level=info ts=2021-03-07T22:09:08.409Z caller=node_exporter.go:113 collector=bcache
level=info ts=2021-03-07T22:09:08.409Z caller=node_exporter.go:113 collector=bonding
level=info ts=2021-03-07T22:09:08.409Z caller=node_exporter.go:113 collector=btrfs
level=info ts=2021-03-07T22:09:08.409Z caller=node_exporter.go:113 collector=conntrack
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=cpu
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=cpufreq
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=diskstats
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=edac
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=entropy
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=fibrechannel
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=filefd
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=filesystem
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=hwmon
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=infiniband
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=ipvs
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=loadavg
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=mdadm
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=meminfo
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=netclass
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=netdev
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=netstat
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=nfs
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=nfsd
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=powersupplyclass
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=pressure
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=rapl
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=schedstat
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=sockstat
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=softnet
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=stat
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=textfile
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=thermal_zone
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=time
level=info ts=2021-03-07T22:09:08.410Z caller=node_exporter.go:113 collector=timex
level=info ts=2021-03-07T22:09:08.411Z caller=node_exporter.go:113 collector=udp_queues
level=info ts=2021-03-07T22:09:08.411Z caller=node_exporter.go:113 collector=uname
level=info ts=2021-03-07T22:09:08.411Z caller=node_exporter.go:113 collector=vmstat
level=info ts=2021-03-07T22:09:08.411Z caller=node_exporter.go:113 collector=xfs
level=info ts=2021-03-07T22:09:08.411Z caller=node_exporter.go:113 collector=zfs
level=info ts=2021-03-07T22:09:08.411Z caller=node_exporter.go:195 msg="Listening on" address=:9100
level=info ts=2021-03-07T22:09:08.411Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
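One step I skipped above is verifying the archive checksum. The node_exporter release page publishes a sha256sums.txt file next to the tarballs, and the check looks like this (sketched with a stand-in file so the snippet is self-contained; against the real download you would fetch sha256sums.txt from the release page and run the same sha256sum -c):

```shell
# Stand-in for the real archive so this sketch runs anywhere; with the
# actual download, sha256sums.txt comes from the GitHub release page.
printf 'stand-in archive contents\n' > node_exporter-1.1.2.linux-arm64.tar.gz
sha256sum node_exporter-1.1.2.linux-arm64.tar.gz > sha256sums.txt

# Verify: prints "node_exporter-1.1.2.linux-arm64.tar.gz: OK" on a match.
sha256sum -c sha256sums.txt
```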

I added an upstart init script in /etc/init/ so that node_exporter starts back up if the server reboots. Here is my simple startup script:

# Start this service on startup
start on startup
# Exec the process. Use a fully qualified path so there is no reliance on $PATH
exec /volume7/apps/node_exporter-1.1.2.linux-arm64/node_exporter

Now I am able to start and stop node_exporter with start node_exporter and stop node_exporter, and it should start up automatically when the NAS reboots. Pointing my browser at http://{NAS_IP}:9100/metrics, I can see metrics being published. Success!

This concludes Part 1. In Part 2, I will work on connecting my NAS Prometheus metrics to the AWS Managed Prometheus Service.