General Troubleshooting
This page is a high-level operations runbook. For detailed architecture and deployment docs, see:
If Pinpoint or ModMS returns 5xx errors, start with the Failures & Downtime sections in:
Once you have identified whether the issue is primarily with Pinpoint or ModMS, follow the detailed steps in those FAQs and architecture pages before escalating further.
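A quick probe of both services can give a first guess at whether the problem is mainly Pinpoint, ModMS, or both. This is a minimal sketch, not team tooling. It assumes each service exposes an HTTP health URL. PINPOINT_HEALTH_URL and MODMS_HEALTH_URL are hypothetical placeholders, not documented endpoints. An unreachable service or a 5xx response counts as unhealthy. The result only suggests which section of this runbook to open first.

```python
"""Rough first-pass triage: is the problem primarily Pinpoint, ModMS, or both?

Assumptions (hypothetical, not from the runbook): each service exposes an HTTP
health URL, and a 5xx response or a connection failure counts as "unhealthy".
"""
import urllib.error
import urllib.request
from typing import Optional

# Hypothetical health endpoints; replace with the real ones from the FAQs.
PINPOINT_HEALTH_URL = "http://pinpoint.example.internal/health"
MODMS_HEALTH_URL = "http://modms.example.internal/health"


def probe(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.


def is_unhealthy(status: Optional[int]) -> bool:
    """Treat unreachable services and 5xx responses as unhealthy."""
    return status is None or status >= 500


def triage(pinpoint_status: Optional[int], modms_status: Optional[int]) -> str:
    """Map probe results onto the runbook section to start with."""
    pinpoint_bad = is_unhealthy(pinpoint_status)
    modms_bad = is_unhealthy(modms_status)
    if pinpoint_bad and modms_bad:
        return "When both are degraded"
    if modms_bad:
        return "When ModMS is down"
    if pinpoint_bad:
        return "When Pinpoint is down"
    return "No 5xx or outage seen; check the dashboards for latency or stale data"
```

For example, triage(probe(PINPOINT_HEALTH_URL), probe(MODMS_HEALTH_URL)) returns the heading to start from. A 4xx is treated as healthy here. The assumption is that it points at a request problem rather than an outage.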
When Pinpoint is down
If the issue is primarily user-facing errors from Pinpoint:
- Start with the Failures & Downtime section in the Pinpoint FAQ.
- Use the recommended Grafana dashboards and API checks there to determine if the problem is:
  - Load balancer / infrastructure
  - Pinpoint containers or per-node issues
  - Upstream ModMS, Redis, or Geos
- If ModMS is suspected, continue with the ModMS sections below.
When ModMS is down
If ModMS itself is unhealthy (UI down, stale models, or Pinpoint 5xx traced to ModMS):
- Follow the Failures & Downtime section in the ModMS FAQ.
- For storage and ingestion specifics, see these ModMS pages:
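Stale models are one of the ModMS symptoms listed above. One rough way to spot them from a node is to check file modification times. This is a minimal sketch with two hypothetical settings. MODEL_DIR is an assumed directory of model files, not a documented path. MAX_AGE_SECONDS is an assumed freshness threshold. Anything older than the threshold is only a candidate for a closer look in the ModMS FAQ.

```python
"""Sketch: flag possibly stale model files by modification time.

Assumptions (hypothetical): models are files under MODEL_DIR, and a file older
than MAX_AGE_SECONDS counts as stale. Neither value comes from the ModMS FAQ.
"""
import os
import time
from typing import List, Optional, Tuple

MODEL_DIR = "/data/models"      # hypothetical location
MAX_AGE_SECONDS = 6 * 60 * 60   # hypothetical freshness threshold (6 hours)


def stale_files(root: str, max_age: float,
                now: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return (path, age_in_seconds) for files under root older than max_age.

    Results are sorted oldest first.
    """
    if not os.path.isdir(root):
        # os.walk silently yields nothing for a missing path, which would
        # look like "nothing is stale"; fail loudly instead.
        raise FileNotFoundError(root)
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                age = now - os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if age > max_age:
                stale.append((path, age))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Raising on a missing directory matters here. An unmounted storage path would otherwise produce an empty, reassuring result.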
When both are degraded
Sometimes both Pinpoint and ModMS show issues (for example, high error rates, timeouts, and stale data):
- Use Pinpoint’s Failures & Downtime flow to confirm whether upstreams are the root cause.
- If ModMS is the bottleneck, prioritize restoring:
  - GlusterFS and /data mounts
  - ModMS Server, Indexer, and critical Aggregators
- Once ModMS is stable and models are fresh, validate Pinpoint again from its FAQ checklist.
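The recovery order for the case where both are degraded can be written as a checklist that reports the first step still failing. The step names come from this runbook. The only real check included is os.path.ismount for the /data mount. You decide how each other step is judged healthy, and you pass that in. The function names are illustrative and do not refer to existing team tooling.

```python
"""Sketch of the recovery order for incidents where both services are degraded.

Step names follow the runbook: storage first, then ModMS services, then
Pinpoint validation. Function names are hypothetical, for illustration only.
"""
import os
from typing import Dict, Optional

RECOVERY_ORDER = [
    "GlusterFS and /data mounts",
    "ModMS Server",
    "ModMS Indexer",
    "critical Aggregators",
    "Pinpoint FAQ checklist",
]


def data_mount_ok(path: str = "/data") -> bool:
    """True if path is a mount point (a real check via os.path.ismount)."""
    return os.path.ismount(path)


def first_blocking_step(status: Dict[str, bool]) -> Optional[str]:
    """Return the first step, in recovery order, not confirmed healthy.

    Steps missing from status count as unhealthy: unverified is not restored.
    Returns None once every step is confirmed.
    """
    for step in RECOVERY_ORDER:
        if not status.get(step, False):
            return step
    return None
```

For example, first_blocking_step({"GlusterFS and /data mounts": data_mount_ok()}) points at the ModMS Server step once the mount is back. Later steps are not worth checking until the earlier ones pass.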
High-level debugging flow
Servers & access
For an up-to-date view of servers and environments, refer to:
Use your local SSH configuration (for example, entries in ~/.ssh/config) to reach these nodes, following team security guidelines.
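To confirm you have SSH entries for the nodes you need, you can list the Host aliases defined in your config. This is a minimal sketch of a simplified parser. It assumes a plain config without Include or Match directives. It skips wildcard and negated patterns, because those are not aliases you can connect to directly. It does not know which hostnames the team uses.

```python
"""List concrete Host aliases from an OpenSSH client config (simplified parser).

Assumption: no Include/Match handling; only plain "Host" lines are read.
"""
import os
import re
from typing import List

# "Host" keyword (case-insensitive) followed by whitespace or "=", then patterns.
# "HostName" does not match because a letter, not a separator, follows "Host".
HOST_LINE = re.compile(r"host\s*[=\s]\s*(.+)", re.IGNORECASE)


def host_aliases(config_text: str) -> List[str]:
    """Return Host aliases in file order, skipping wildcards and negations."""
    aliases = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = HOST_LINE.match(line)
        if not match:
            continue
        for pattern in match.group(1).split():
            if pattern.startswith("!") or any(c in pattern for c in "*?"):
                continue  # patterns, not hosts you can ssh to directly
            aliases.append(pattern)
    return aliases


def read_aliases(path: str = "~/.ssh/config") -> List[str]:
    """Read the config at path (default: the per-user file) and list its aliases."""
    with open(os.path.expanduser(path), encoding="utf-8") as handle:
        return host_aliases(handle.read())
```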
