Common symptoms
- Requests timing out from clients or load balancer
- 500 or 502 responses from Pinpoint
- Sudden increase in errors on the Hetzner load balancer or Grafana dashboards
Quick health checks
- Check load balancer & infra metrics
- Open the Hetzner Cloud / load balancer dashboard in Grafana:
https://grafana.devops.arabiaweather.com/d/d351478f-c755-4390-b0ee-f07b9a9081e12/hetzner-cloud?orgId=1&from=now-30m&to=now&timezone=browser&var-datasource=ef5oqqahf6nlse&var-servers=114907978&var-servers=114908447&var-load_balancers=5264917
- Hit a known forecast API
- Use a representative forecast call and check the status code and latency.
- Check ModMS status
- UI:
http://modms.devops.arabiaweather.com/web/
- Quick model freshness query:
http://modms.devops.arabiaweather.com/q?fields=model%20|%20last_updated&from=models
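The quick health checks above can be run as one-liners. Note the forecast URL below is a placeholder (this doc does not list the real endpoint); the ModMS freshness query is the one given above.

```shell
# Probe a URL and print its HTTP status and total time.
# Never aborts the script, even if the endpoint is unreachable.
probe() {
  curl -s -o /dev/null -m 10 -w 'status=%{http_code} time=%{time_total}s\n' "$1" || true
}

# Representative forecast call -- hypothetical URL, substitute a known-good one.
probe "https://pinpoint.example.com/forecast?point=31.95,35.93"

# ModMS model freshness query (from the quick health checks above).
probe "http://modms.devops.arabiaweather.com/q?fields=model%20|%20last_updated&from=models"
```

A `status=000` means the connection itself failed (DNS, refused, or timeout), as opposed to an HTTP-level 4xx/5xx from the service.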
Typical root causes
- ModMS issues
- ModMS /web not reachable or very slow
- ModMS q query failing or showing very old last_updated values
- Redis problems
- Cluster issues on node03.cluster.devops.arabiaweather.com (cache or overrides unavailable)
- Geos issues
- Geos servers or location data unavailable, especially during Pinpoint bootup
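A quick triage over these upstreams can be sketched as a loop of HTTP reachability checks. The Geos URL is a placeholder (the real endpoint is not listed in this doc); Redis is TCP, not HTTP, so it is checked separately in the Redis step below.

```shell
# One-shot reachability triage for the HTTP upstreams.
# Prints each URL with its HTTP status (000 = connection failed).
for url in \
  "http://modms.devops.arabiaweather.com/web/" \
  "http://geos.example.internal/health"; do  # second URL is hypothetical
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url" || true)
  echo "$url -> $code"
done
```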
Null data responses
Sometimes Pinpoint returns null data for specific parameters while the request itself succeeds.
In most cases this is caused by missing or incomplete data in ModMS for the underlying model runs.
How to debug null data
- Identify the parameter
- From the client response, note which parameter(s) are null.
- Check parameter configuration
- In the Pinpoint repo, locate the parameter definition in:
Pinpoint/configs/parameters/*.json
- Confirm which model(s) and fields this parameter depends on.
- Verify ModMS has data for those models
- Open http://modms.devops.arabiaweather.com/web/ and check the last updated time for the relevant model.
- On n01.modms (or the main ModMS data node), check NetCDF runs under /data/nc for that model.
- Ensure you have yesterday’s run and today’s run (for example, previous cycle + 06 run).
- If only the latest run (e.g. 06) exists without the previous run, forecast data can be unavailable for hours before that run time.
- If runs are missing
- Treat this as a ModMS ingestion issue:
- Investigate Aggregator/Indexer for that model.
- Use the ModMS operations and GlusterFS docs for deeper debugging.
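The run-presence check above can be sketched as a small script to run on the ModMS data node. The model directory name and the run-cycle naming inside filenames are assumptions; adjust to the actual /data/nc layout.

```shell
# Check that both the previous and the latest run cycles exist for a model
# under /data/nc. MODEL_DIR and the filename pattern are hypothetical.
MODEL_DIR="${MODEL_DIR:-/data/nc/gfs}"

has_run() {
  # Succeeds if any NetCDF file mentioning the given run cycle exists.
  ls "$MODEL_DIR"/*"$1"*.nc >/dev/null 2>&1
}

if has_run 00 && has_run 06; then
  echo "previous and latest runs present"
else
  echo "a run is missing -- treat as a ModMS ingestion issue"
fi
```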
Step-by-step debugging
- Check load balancer & node health
- Use the Hetzner Cloud Grafana dashboard to confirm traffic levels, error rates, and per-node health.
- Test Pinpoint directly
- Run the forecast API curl above and note:
- HTTP status (200 vs 4xx/5xx)
- Response time
- If status is 500/502 or very slow, check ModMS
- Open http://modms.devops.arabiaweather.com/web/
- Run the models query to confirm data freshness:
http://modms.devops.arabiaweather.com/q?fields=model%20|%20last_updated&from=models
- Validate Redis
- From one of the Pinpoint nodes, verify you can reach the Redis cluster on node03.cluster.devops.arabiaweather.com:6379.
- Check for Redis-side errors or resource exhaustion.
- Validate Geos & locations
- Confirm Geos endpoints are reachable from a Pinpoint node.
- If Pinpoint recently restarted and cannot load locations, you may see failures early in boot.
- If upstreams are healthy but issues persist
- Check the per-node Pinpoint logs and HAProxy metrics (via Grafana Loki dashboards) for spikes in errors or restarts.
- Consider rolling Pinpoint node restarts via Docker Compose as described in the deployment docs.
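The Redis reachability step above can be sketched as follows, run from one of the Pinpoint nodes. It assumes redis-cli may not be installed and falls back to a raw TCP probe via bash's /dev/tcp.

```shell
# Verify the Redis cluster is reachable from a Pinpoint node.
REDIS_HOST="node03.cluster.devops.arabiaweather.com"
REDIS_PORT=6379

port_open() {
  # Succeeds if a TCP connection to host:port opens within 3 seconds.
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if command -v redis-cli >/dev/null 2>&1; then
  redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" ping || echo "PING failed -- check Redis health"
elif port_open "$REDIS_HOST" "$REDIS_PORT"; then
  echo "tcp port reachable (install redis-cli for a real PING)"
else
  echo "cannot reach $REDIS_HOST:$REDIS_PORT"
fi
```

A successful `PONG` confirms Redis is serving commands; a reachable port with a failed PING points at Redis-side errors or resource exhaustion rather than networking.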

