This document describes how to verify, troubleshoot, and recover ModMS and Pinpoint services when issues arise.

General Troubleshooting

This page is a high-level operations runbook. For detailed architecture and deployment docs, see:

If Pinpoint or ModMS returns 5xx errors, start with the Failures & Downtime sections in:

Once you have identified whether the issue is primarily with Pinpoint or ModMS, follow the detailed steps in those FAQs and architecture pages before escalating further.

When Pinpoint is down

If the issue is primarily user-facing errors from Pinpoint:
  • Start with the Failures & Downtime section in the Pinpoint FAQ.
  • Use the recommended Grafana dashboards and API checks there to determine if the problem is:
    • Load balancer / infrastructure
    • Pinpoint containers or per-node issues
    • Upstream ModMS, Redis, or Geos
  • If ModMS is suspected, continue with the ModMS sections below.
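The triage above can be sketched as a small decision function. This is only an illustration, not the team's tooling: the field names (lb_ok, nodes_ok, upstreams_ok) are hypothetical, and the order of checks (infrastructure first, then upstreams, then Pinpoint containers) is an assumption. In practice the inputs would come from the Grafana dashboards and API checks described in the Pinpoint FAQ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PinpointProbe:
    """Hypothetical summary of checks an operator has already run."""
    lb_ok: bool                                                   # load balancer / infrastructure reachable
    nodes_ok: Dict[str, bool] = field(default_factory=dict)       # per-node container health
    upstreams_ok: Dict[str, bool] = field(default_factory=dict)   # e.g. ModMS, Redis, Geos


def classify(probe: PinpointProbe) -> str:
    """Map probe results to one of the three problem areas in the list above."""
    if not probe.lb_ok:
        return "load balancer / infrastructure"
    # Assumption: a failing upstream is checked before blaming Pinpoint itself,
    # because upstream failures can make healthy containers look unhealthy.
    bad_upstreams = sorted(name for name, ok in probe.upstreams_ok.items() if not ok)
    if bad_upstreams:
        return "upstream: " + ", ".join(bad_upstreams)
    bad_nodes = sorted(name for name, ok in probe.nodes_ok.items() if not ok)
    if bad_nodes:
        return "pinpoint containers: " + ", ".join(bad_nodes)
    return "no fault found by these checks"
```

If the result names ModMS as a failing upstream, that is the cue to move on to the ModMS sections.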

When ModMS is down

If ModMS itself is unhealthy (UI down, stale models, or Pinpoint 5xx traced to ModMS):

When both are degraded

Sometimes both Pinpoint and ModMS show issues (for example, high error rates, timeouts, and stale data):
  1. Use Pinpoint’s Failures & Downtime flow to confirm whether upstreams are the root cause.
  2. If ModMS is the bottleneck, prioritize restoring:
    • GlusterFS and /data mounts
    • ModMS Server, Indexer, and critical Aggregators
  3. Once ModMS is stable and models are fresh, validate Pinpoint again from its FAQ checklist.
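Two of the conditions in the steps above, "the /data mount is present" and "models are fresh", can be checked mechanically. The sketch below is a rough illustration under stated assumptions: the model directory and the freshness threshold are hypothetical parameters supplied by the caller, and file modification time is used as a stand-in for whatever freshness signal ModMS actually exposes.

```python
import os
import time
from pathlib import Path
from typing import Optional


def data_mount_present(mount_point: str = "/data") -> bool:
    """True if mount_point is currently a mount point on this host."""
    return os.path.ismount(mount_point)


def newest_file_age_seconds(directory: str, now: Optional[float] = None) -> Optional[float]:
    """Age of the most recently modified file under directory, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(directory).rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def models_fresh(directory: str, max_age_seconds: float) -> bool:
    """Assumption: models count as fresh if any file was written within max_age_seconds."""
    age = newest_file_age_seconds(directory)
    return age is not None and age <= max_age_seconds
```

A check like this only confirms that files are being written; it does not replace validating Pinpoint from its FAQ checklist.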

High-level debugging flow

Servers & access

For an up-to-date view of servers and environments, refer to:

Use your local SSH configuration (for example, entries in ~/.ssh/config) to reach these nodes, following team security guidelines.