Prometheus: unexpected end of JSON input

In this blog post I will show you how I solved the "Prometheus: unexpected end of JSON input" error.

The service status shows that Prometheus cannot start.

# systemctl status prometheus.service
● prometheus.service - Monitoring system and time series database
   Loaded: loaded (/lib/systemd/system/prometheus.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2022-04-27 11:31:59 CEST; 9min ago
     Docs: https://prometheus.io/docs/introduction/overview/
  Process: 3428 ExecStart=/usr/bin/prometheus $ARGS (code=exited, status=1/FAILURE)
 Main PID: 3428 (code=exited, status=1/FAILURE)

Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Main process exited, code=exited, status=1/FAILURE
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Failed with result 'exit-code'.
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Service RestartSec=100ms expired, scheduling restart.
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Scheduled restart job, restart counter is at 5.
Apr 27 11:31:59 prometheus systemd[1]: Stopped Monitoring system and time series database.
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Start request repeated too quickly.
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Failed with result 'exit-code'.
Apr 27 11:31:59 prometheus systemd[1]: Failed to start Monitoring system and time series database.

The journal suggests there's a broken JSON file to find.

# journalctl -u prometheus.service -f
-- Logs begin at Thu 2022-04-21 14:20:44 CEST. --
Apr 27 11:31:59 prometheus prometheus[3428]: level=info ts=2022-04-27T09:31:59.438273038Z caller=main.go:517 msg="Scrape manager stopped"
Apr 27 11:31:59 prometheus prometheus[3428]: level=error ts=2022-04-27T09:31:59.438550231Z caller=main.go:686 err="opening storage failed: block dir: \"/var/lib/prometheus/metrics2/01FYVDSN8KYK9SEFT6EAD7KRA6\": unexpected end of JSON input"
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Main process exited, code=exited, status=1/FAILURE
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Failed with result 'exit-code'.
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Service RestartSec=100ms expired, scheduling restart.
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Scheduled restart job, restart counter is at 5.
Apr 27 11:31:59 prometheus systemd[1]: Stopped Monitoring system and time series database.
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Start request repeated too quickly.
Apr 27 11:31:59 prometheus systemd[1]: prometheus.service: Failed with result 'exit-code'.
Apr 27 11:31:59 prometheus systemd[1]: Failed to start Monitoring system and time series database.

A quick Google search for the error turned up nothing conclusive, so I went looking myself.

Hunt for root cause

  • There’s enough free space to write files in this location.
  • I don’t use NFS; the data lives on XFS.
  • The previous log (/var/log/prometheus/) shows a clean startup.
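The first two checks can be reproduced with standard commands (the paths assume the Debian package layout used here):

```shell
# Is there free space on the volume holding the data directory?
df -h /var/lib/prometheus

# What filesystem is it on? (Expecting xfs, not nfs.)
stat -f -c %T /var/lib/prometheus
```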

Fix

The meta.json in the most recent block directory is empty (0 bytes), so I'll delete that directory.

ls -alh /var/lib/prometheus/metrics2/*/meta.json | awk '{print $5" - "$9}'
930 - /var/lib/prometheus/metrics2/01FXNNACW65B4XZ1P86W5C6MKT/meta.json
930 - /var/lib/prometheus/metrics2/01FXQK464WAPXAMHDE3RWXGCB0/meta.json
930 - /var/lib/prometheus/metrics2/01FXSGXCRGP0DE8ZQ4ZRND552C/meta.json
930 - /var/lib/prometheus/metrics2/01FXVEQ4AV8EH9EDMEKV1TRKZ3/meta.json
930 - /var/lib/prometheus/metrics2/01FXXCGMJZERBJCKVAQSYM4WCG/meta.json
930 - /var/lib/prometheus/metrics2/01FXZAA41ZAZYRHN415FE3CX56/meta.json
930 - /var/lib/prometheus/metrics2/01FY183K7PSVG2ZACXVTG6DJXM/meta.json
930 - /var/lib/prometheus/metrics2/01FY35XCQTFN1T931BT98KK99G/meta.json
930 - /var/lib/prometheus/metrics2/01FY53PVTA9NGW1YQQ9868NSTQ/meta.json
930 - /var/lib/prometheus/metrics2/01FY71GCRJF9202770248J335E/meta.json
930 - /var/lib/prometheus/metrics2/01FY8Z9TXDP194BXKEH3FAPN8P/meta.json
930 - /var/lib/prometheus/metrics2/01FYAX39CBRAX0VS7E93XZQ2FZ/meta.json
930 - /var/lib/prometheus/metrics2/01FYCTWT4M5QV6A0W1P34A02E1/meta.json
930 - /var/lib/prometheus/metrics2/01FYERPMJ38ZDN3XB4DQTNDTB9/meta.json
930 - /var/lib/prometheus/metrics2/01FYGPG00KFZRSVBV4JM9NFRRH/meta.json
930 - /var/lib/prometheus/metrics2/01FYJM9TK6TDTNQ64C399A46KZ/meta.json
930 - /var/lib/prometheus/metrics2/01FYMJ36ZTYFV09RWRYY3BN7PV/meta.json
930 - /var/lib/prometheus/metrics2/01FYPFWJRB5NAQACM4H2D8AK01/meta.json
930 - /var/lib/prometheus/metrics2/01FYRDP2N3EJ4XRPHKPXS6H49C/meta.json
930 - /var/lib/prometheus/metrics2/01FYTBFV9Q5XQ38WYF26HHKE5H/meta.json
308 - /var/lib/prometheus/metrics2/01FYV026RKCB7XRKNAGDG4ZW0D/meta.json
732 - /var/lib/prometheus/metrics2/01FYV02K8P30TVW73Y4RYZ3DH5/meta.json
308 - /var/lib/prometheus/metrics2/01FYV6XY0J1RW12NXC4FTMRDN0/meta.json
0 - /var/lib/prometheus/metrics2/01FYVDSN8KYK9SEFT6EAD7KRA6/meta.json
rm -rf /var/lib/prometheus/metrics2/01FYVDSN8KYK9SEFT6EAD7KRA6
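For next time, a find one-liner can flag any block directory whose meta.json is empty; the metrics2 path mirrors the listing above, and DATA_DIR is just a convenience variable I've introduced:

```shell
# List TSDB block directories whose meta.json is zero bytes.
# Review the output before deleting anything.
DATA_DIR=${DATA_DIR:-/var/lib/prometheus/metrics2}
find "$DATA_DIR" -maxdepth 2 -name meta.json -size 0 -printf '%h\n'
```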

Conclusion

I don’t think my KVM guests always go down cleanly at shutdown: Prometheus was compacting its data when the guest went away.

Looking at issue 4058, it’s noted that the code doesn’t sync this file before closing it, although it does so everywhere else; I guess that fix will land in 2.9.0. It might have helped, though it’s rather a case of putting on a tourniquet before shooting oneself in the foot.
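A zero-byte file is only the most obvious corruption; after an unclean shutdown it may be worth validating every meta.json before starting the service. A minimal sketch, using python3 -m json.tool since jq isn’t installed everywhere (DATA_DIR is my own shorthand for the data directory above):

```shell
# Report any meta.json that does not parse as valid JSON.
DATA_DIR=${DATA_DIR:-/var/lib/prometheus/metrics2}
for f in "$DATA_DIR"/*/meta.json; do
  python3 -m json.tool "$f" >/dev/null 2>&1 || echo "corrupt: $f"
done
```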