
Common symptoms

  • Requests timing out at the client or the load balancer
  • 500 or 502 responses from Pinpoint
  • Sudden increase in error rates on the Hetzner load balancer or in the Grafana dashboards

Quick health checks

  • Check load balancer & infra metrics
    • Open the Hetzner Cloud / load balancer dashboard in Grafana:
      https://grafana.devops.arabiaweather.com/d/d351478f-c755-4390-b0ee-f07b9a9081e12/hetzner-cloud?orgId=1&from=now-30m&to=now&timezone=browser&var-datasource=ef5oqqahf6nlse&var-servers=114907978&var-servers=114908447&var-load_balancers=5264917
  • Call a known forecast endpoint
    • Send a representative forecast request and check the status code and latency:
      curl -i "http://pinpoint.devops.arabiaweather.com/api/v1.2/forecast/201034068?start=now&interval=264&parameters=surface.temperature,clouds.total_cover,clouds.low_cover,clouds.medium_cover,clouds.high_cover&hourly=true&overrides=true&model=aifs"
  • Check ModMS status
    • UI: http://modms.devops.arabiaweather.com/web/
    • Quick model freshness query:
      http://modms.devops.arabiaweather.com/q?fields=model%20|%20last_updated&from=models
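The forecast check can be scripted so that the status code and latency come back on one line. This is a hypothetical sketch: the function names, the 30-second timeout and the 2-second "slow" threshold are illustrative assumptions, not documented Pinpoint limits. It relies only on curl's standard write-out variables %{http_code} and %{time_total}.

```shell
#!/usr/bin/env bash
# Sketch: probe a Pinpoint URL and classify the result.
# Assumption: the 2-second "slow" threshold is illustrative, not an agreed SLO.

# Print "<http_code> <time_total_seconds>" for a URL; the body is discarded.
probe_url() {
  local url="$1"
  # -s: silent; -o /dev/null: drop the body; -w: print write-out variables;
  # --max-time: give up after 30s so a hung backend does not hang the check.
  curl -s -o /dev/null --max-time 30 -w '%{http_code} %{time_total}' "$url"
}

# Print OK, SLOW, CLIENT_ERROR, SERVER_ERROR or NO_RESPONSE.
classify_probe() {
  local code="$1" seconds="$2" slow_threshold="${3:-2}"
  if [[ "$code" == "000" ]]; then
    echo "NO_RESPONSE"    # curl received no HTTP status (timeout, DNS, refused)
  elif (( code >= 500 )); then
    echo "SERVER_ERROR"   # e.g. the 500/502 responses listed under symptoms
  elif (( code >= 400 )); then
    echo "CLIENT_ERROR"
  elif awk -v t="$seconds" -v max="$slow_threshold" 'BEGIN { exit !(t + 0 > max + 0) }'; then
    echo "SLOW"           # awk does the float comparison bash cannot
  else
    echo "OK"
  fi
}

# Run only when a URL is passed, e.g.: ./probe.sh "<forecast URL>"
if [[ $# -ge 1 ]]; then
  read -r code secs <<< "$(probe_url "$1")"
  echo "status=$code time=${secs}s verdict=$(classify_probe "$code" "$secs")"
fi
```

A status of 000 means curl got no HTTP response at all, which points at the load balancer or the network rather than at Pinpoint's own error handling.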

Typical root causes

  • ModMS issues
    • ModMS /web not reachable or very slow
    • The /q models query failing or returning very old last_updated values
  • Redis problems
    • Redis on node03.cluster.devops.arabiaweather.com unreachable or degraded, so the cache or overrides are unavailable
  • Geos issues
    • Geos servers or location data unavailable; this matters most during Pinpoint startup, when locations are loaded
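For the "very old last_updated values" case, the staleness judgement can be automated once you have a model's last_updated value from the /q query. A minimal sketch, assuming the value is a timestamp that GNU date can parse; the 12-hour threshold and the function names are hypothetical, so tune the threshold to each model's update cycle.

```shell
#!/usr/bin/env bash
# Sketch: flag a ModMS model whose last_updated time is too old.
# Assumptions (not from ModMS docs): last_updated is parseable by GNU
# `date -d`, and 12 hours is an illustrative staleness threshold.

# Convert a timestamp string to Unix epoch seconds (interpreted as UTC).
to_epoch() {
  date -u -d "$1" +%s
}

# Print STALE or FRESH given last-update and current epoch seconds.
freshness() {
  local last_epoch="$1" now_epoch="$2" max_age_hours="${3:-12}"
  local age=$(( now_epoch - last_epoch ))
  if (( age > max_age_hours * 3600 )); then
    echo "STALE"
  else
    echo "FRESH"
  fi
}

# Run only when a timestamp is passed: ./freshness.sh "<last_updated>" [hours]
if [[ $# -ge 1 ]]; then
  last_epoch=$(to_epoch "$1") || { echo "cannot parse timestamp: $1" >&2; exit 1; }
  freshness "$last_epoch" "$(date -u +%s)" "${2:-12}"
fi
```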

Null data responses

Pinpoint sometimes returns null values for specific parameters even though the request itself succeeds. In most cases, the cause is missing or incomplete data in ModMS for the underlying model runs.
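When a response is large, a quick filter helps spot which keys came back null. This is a crude text-matching sketch, not a JSON parser, and it assumes nothing about Pinpoint's response schema beyond it being JSON. For null entries inside arrays (such as an hourly series), a real JSON tool is more reliable, for example jq -c '[paths(. == null)]'.

```shell
#!/usr/bin/env bash
# Crude sketch: list JSON keys whose value is literally null.
# Limitation: this does not parse JSON, so nulls inside arrays
# (e.g. [12.1, null, ...]) and nested structure are not reported.
null_keys() {
  # Match "key": null (optional whitespace), keep only the key, de-duplicate.
  grep -oE '"[^"]+"[[:space:]]*:[[:space:]]*null' \
    | sed -E 's/^"([^"]+)".*/\1/' \
    | sort -u
}

# Run only when a saved response file is passed: ./null_keys.sh resp.json
if [[ $# -ge 1 ]]; then
  null_keys < "$1"
fi
```

For example, save the forecast response with curl -s "<forecast URL>" > resp.json, then run ./null_keys.sh resp.json.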

How to debug null data

  1. Identify the parameter
    • From the client response, note which parameter(s) are null.
  2. Check parameter configuration
    • In the Pinpoint repo, locate the parameter definition in:
      Pinpoint/configs/parameters/*.json
    • Confirm which model(s) and fields this parameter depends on.
  3. Verify ModMS has data for those models
    • Open http://modms.devops.arabiaweather.com/web/ and check the last_updated time for the relevant model(s).
    • On n01.modms (or the main ModMS data node), check NetCDF runs under /data/nc for that model:
      cd /data/nc/<model_name>/
      ls
    • Confirm that both yesterday’s run and today’s run are present (for example, the previous cycle plus today’s 06 run).
    • If only the latest run (e.g. 06) exists and the previous run is missing, forecast data for the hours before that run time can be unavailable.
  4. If runs are missing
    • Treat this as a ModMS ingestion issue:
      • Investigate the Aggregator/Indexer for that model.
      • Use the ModMS operations and GlusterFS docs for deeper debugging.
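The configuration lookup and the run check can be scripted as below. This is a sketch under assumptions: the parameter name is assumed to appear verbatim as a JSON string in its config file, and entries under /data/nc/<model_name>/ are assumed (hypothetically) to contain their run date as YYYYMMDD; adjust the glob to the real naming.

```shell
#!/usr/bin/env bash
# Sketch of two null-data checks. Paths and file naming are assumptions.

# List parameter config files that mention a parameter name.
# Assumption: the name appears verbatim as a JSON string in its config file.
find_param_configs() {
  local config_dir="$1" param="$2"
  # -l prints matching file names only; -F treats the pattern as a fixed
  # string, so the dots in "surface.temperature" are not regex wildcards.
  grep -lF -- "\"$param\"" "$config_dir"/*.json
}

# Check a model's NetCDF directory for runs on two dates (YYYYMMDD).
# Hypothetical naming: each run's file or directory name contains its date.
check_runs() {
  local model_dir="$1" today="$2" yesterday="$3" status=0 day
  for day in "$yesterday" "$today"; do
    # compgen -G succeeds only if the glob matches at least one entry.
    if compgen -G "$model_dir/*${day}*" > /dev/null; then
      echo "found run(s) for $day"
    else
      echo "MISSING run for $day"
      status=1
    fi
  done
  return "$status"
}
```

For example, on the ModMS node with GNU date: check_runs /data/nc/<model_name> "$(date -u +%Y%m%d)" "$(date -u -d yesterday +%Y%m%d)". A MISSING line for yesterday corresponds to the "only the latest run exists" case described above.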

Step-by-step debugging

  1. Check load balancer & node health
    • Use the Hetzner Cloud Grafana dashboard to confirm traffic levels, error rates, and per-node health.
  2. Test Pinpoint directly
    • Run the forecast API curl above and note:
      • HTTP status (200 vs 4xx/5xx)
      • Response time
  3. If the status is 500/502 or the response is very slow, check ModMS
    • Open http://modms.devops.arabiaweather.com/web/
    • Run the models query to confirm data freshness:
      curl -i "http://modms.devops.arabiaweather.com/q?fields=model%20|%20last_updated&from=models"
  4. Validate Redis
    • From one of the Pinpoint nodes, verify that you can reach the Redis cluster at node03.cluster.devops.arabiaweather.com:6379.
    • Check Redis for errors or resource exhaustion (for example, memory or connection limits).
  5. Validate Geos & locations
    • Confirm that the Geos endpoints are reachable from a Pinpoint node.
    • If Pinpoint restarted recently and could not load locations, expect failures shortly after startup.
  6. If upstreams are healthy but issues persist
    • Check the per-node Pinpoint logs and HAProxy metrics in Grafana (Loki dashboards) for error spikes or restarts.
    • Consider a rolling restart of the Pinpoint nodes with Docker Compose, as described in the deployment docs.
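The Redis and Geos reachability checks can be run from a Pinpoint node with a small script. A minimal sketch under assumptions: redis-cli may not be installed on the node (the script then falls back to a bare TCP connect using bash's /dev/tcp), the Geos endpoint is not documented here so GEOS_URL is a hypothetical placeholder you must set, and the reply matching is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: check Redis and Geos reachability from a Pinpoint node.
# Assumptions: GEOS_URL is a hypothetical placeholder (the real Geos endpoint
# is not documented here); if redis-cli is missing, only the TCP port is tested.

REDIS_HOST="${REDIS_HOST:-node03.cluster.devops.arabiaweather.com}"
REDIS_PORT="${REDIS_PORT:-6379}"

# Map raw `redis-cli ping` output (stdout and stderr) to a verdict.
interpret_redis_reply() {
  case "$1" in
    *PONG*)                     echo "OK" ;;
    *NOAUTH*)                   echo "REACHABLE_AUTH_REQUIRED" ;;  # server wants AUTH
    "" | *"Could not connect"*) echo "UNREACHABLE" ;;
    *)                          echo "ERROR: $1" ;;
  esac
}

# Prefer redis-cli; otherwise just test whether the TCP port accepts connections.
check_redis() {
  if command -v redis-cli > /dev/null; then
    interpret_redis_reply "$(timeout 5 redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" ping 2>&1)"
  elif timeout 5 bash -c "exec 3<>/dev/tcp/$REDIS_HOST/$REDIS_PORT" 2> /dev/null; then
    echo "TCP_OPEN"  # port is open; Redis itself is not verified
  else
    echo "UNREACHABLE"
  fi
}

# Probe Geos over HTTP; prints SKIPPED when GEOS_URL is not set.
check_geos() {
  local url="${GEOS_URL:-}" code
  if [[ -z "$url" ]]; then
    echo "SKIPPED (set GEOS_URL)"
    return 0
  fi
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url")
  if [[ "$code" == 2* ]]; then echo "OK ($code)"; else echo "FAILED ($code)"; fi
}

# Run the checks only when invoked as: ./upstreams.sh run
if [[ "${1:-}" == "run" ]]; then
  echo "redis: $(check_redis)"
  echo "geos:  $(check_geos)"
fi
```

For example: GEOS_URL="<a Geos URL>" ./upstreams.sh run. REACHABLE_AUTH_REQUIRED means the server answered but the check needs credentials (redis-cli -a, or the REDISCLI_AUTH environment variable).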