| Sams Server Raw Data Storage | | sams.devops.arabiaweather.com | Disk filling with raw data | Satellite data accumulation | Regularly delete old raw data | `cd /data/raw-data`<br>`# remove old data` |
| Sams Log Files | Sensu Alert | sams.devops.arabiaweather.com | Excessive log file growth | Log files grew too large on the Sams server | Truncated log files | `df -h`<br>`find / -xdev -type f -size +500M 2>/dev/null`<br>`truncate -s 0 /var/lib/docker/containers/<container-id>/*.log` |
| Historical Server pengine Cronjob | | htz-historical-01 (144.76.56.17) | High memory usage | pengine cronjob consuming excessive memory | Killed all pengine processes | `killall -9 pengine` |
| Redis Sentinel Alert | | cluster-n03, cluster-n04 | Unnecessary monitoring alert | Only two Redis nodes; Sentinel not required | Disabled Redis Sentinel alert | No command specified |
| Redis Ping Failure (node03 → node04) | | cluster-n03, cluster-n04 | Slave-to-master ping failure | node03 slave couldn’t ping node04 master on port 3680 through HAProxy | Restarted HAProxy on node03 | `redis-cli info replication`<br>`systemctl reload haproxy` |
| Bader 1 | | bader-deploy (85.10.197.28) | Containers unhealthy | Docker containers showing unhealthy status | Restart unhealthy containers | `docker container ls -a`<br>`# restart unhealthy container` |
| Bader 2 | | bader.arabiaweather.com | Restart order causing lost events | After `last_state` was removed and the processed map keys were cleared, the aggregator republished before the engine was ready | Restart engine first, then aggregator | `cd data/aggregator-downloads/`<br>`rm last_state`<br>`redis-cli keys 'maps:processed_files:*' \| xargs redis-cli del`<br>`systemctl restart engine.service`<br>`systemctl restart aggregator.service` |
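
The raw-data row in the table above leaves the deletion step as a placeholder comment. The following is a minimal sketch of what that step might look like, assuming old files can be identified by modification time. The function name `prune_raw_data` and the 7-day default retention are hypothetical and do not come from the runbook; `-delete` assumes GNU or BSD `find`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete raw-data files older than a retention window.
# The default directory comes from the table; the default retention of
# 7 days is an assumption and should be adjusted to the real policy.
prune_raw_data() {
    local raw_dir="${1:-/data/raw-data}"
    local retention_days="${2:-7}"

    # Refuse to run against a missing directory rather than guessing.
    if [ ! -d "$raw_dir" ]; then
        echo "prune_raw_data: no such directory: $raw_dir" >&2
        return 1
    fi

    # -mtime +N matches files whose age in whole days is greater than N.
    # -xdev keeps find on one filesystem; -print lists each file removed.
    find "$raw_dir" -xdev -type f -mtime +"$retention_days" -print -delete
}

# Example (not executed here): prune files older than 14 days.
# prune_raw_data /data/raw-data 14
```

Replacing `-print -delete` with `-print` alone gives a dry run that lists the candidates without removing anything, which may be worth doing before the first real run.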