This document tracks common operational issues, their resolutions, and troubleshooting procedures for ModMS and Pinpoint services.

Operational Issues Log

| Component | Example Server | Issue | Details | Action Taken | Command |
| --- | --- | --- | --- | --- | --- |
| Sams Server Raw Data Storage | sams.devops.arabiaweather.com | Disk filling with raw data | Satellite data accumulation | Regularly delete old raw data | `cd /data/raw-data`<br>`# remove old data` |
| Sams Log Files (Sensu Alert) | sams.devops.arabiaweather.com | Excessive log file growth | Log files grew too large on the Sams server | Truncated log files | `df -h`<br>`find / -xdev -type f -size +500M 2>/dev/null`<br>`truncate -s 0 /var/lib/docker/containers/<container-id>/*.log` |
| Historical Server pengine Cronjob | htz-historical-01 (144.76.56.17) | High memory usage | pengine cronjob consuming excessive memory | Killed all pengine processes | `killall -9 pengine` |
| Redis Sentinel Alert | cluster-n03, cluster-n04 | Unnecessary monitoring alert | Only two Redis nodes; Sentinel not required | Disabled Redis Sentinel alert | No command specified |
| Redis Ping Failure (node03 → node04) | cluster-n03, cluster-n04 | Slave-to-master ping failure | node03 slave couldn’t ping node04 master on port 3680 through HAProxy | Restarted HAProxy on node03 | `redis-cli info replication`<br>`systemctl reload haproxy` |
| Bader 1 | bader-deploy (85.10.197.28) | Containers unhealthy | Docker containers showing unhealthy status | Restart unhealthy containers | `docker container ls -a`<br>`# restart unhealthy container` |
| Bader 2 | bader.arabiaweather.com | Restart order causing lost events | Removed last_state and cleared processed map keys; aggregator republished before engine was ready | Restart engine first, then aggregator | `cd data/aggregator-downloads/`<br>`rm last_state`<br>`redis-cli keys 'maps:processed_files:*' \| xargs redis-cli del`<br>`systemctl restart engine.service`<br>`systemctl restart aggregator.service` |
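
The Sams raw data storage entry records the cleanup only as a comment ("remove old data"). One possible way to make that step repeatable is an age-based prune, sketched below. The function name `prune_raw_data` and the 14-day retention in the usage line are illustrative assumptions, not values taken from this log; the right retention window depends on how long the satellite data is actually needed. The `-delete` action is available in GNU and BSD `find` but is not part of POSIX.

```shell
#!/usr/bin/env bash
# Sketch: delete files under a raw-data directory once they pass a retention age.
# The directory and retention period are passed in; both are assumptions here.

prune_raw_data() {
    local dir="$1" days="$2"
    # Refuse an empty or missing path so a typo cannot point find at "/".
    [ -n "$dir" ] && [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # -mtime +N matches files whose age, counted in whole 24-hour periods, exceeds N.
    # -xdev keeps find on one filesystem; -print lists each file as it is removed.
    find "$dir" -xdev -type f -mtime +"$days" -print -delete
}

# Example (hypothetical retention of 14 days):
#   prune_raw_data /data/raw-data 14
```

Running the same `find` without `-delete` first shows what would be removed, which is a cheap check before scheduling it from cron.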

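The Bader 1 entry likewise leaves the restart itself as a comment. A minimal sketch of that step follows, assuming the affected containers define a Docker health check (containers without one never report `unhealthy`). The function name `restart_unhealthy` is hypothetical; `docker ps --filter health=unhealthy` and `docker restart` are standard Docker CLI commands.

```shell
#!/usr/bin/env bash
# Sketch: restart every running container whose health check reports "unhealthy".

restart_unhealthy() {
    local ids
    # -q prints only container IDs; the health filter keeps failing containers.
    ids=$(docker ps -q --filter health=unhealthy)
    if [ -z "$ids" ]; then
        echo "no unhealthy containers"
        return 0
    fi
    # Word splitting on $ids is intentional: one argument per container ID.
    # shellcheck disable=SC2086
    docker restart $ids
}

# Example:
#   restart_unhealthy
```

A restart only clears the symptom. If the same container turns unhealthy again, `docker inspect` on that container shows the recent health check output under `State.Health`.
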
Known Instability Issues