Root Cause
The server scans all NetCDF files at startup to build its metadata index. As the number of files grows over time, this scan takes longer.
Why this happens:
- The server needs to index all available model runs
- Metadata is extracted from each NetCDF file
- This index is used for efficient querying during runtime
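To make the cost concrete, here is a minimal sketch of an eager startup scan. It is not the server's actual code: `build_index` and `extract_metadata` are hypothetical names, and the metadata step is a stand-in that only reads the file size, where a real server would open each NetCDF file.

```python
import os


def extract_metadata(path):
    # Hypothetical stand-in: a real server would open the NetCDF file and
    # read its dimensions, variables, and attributes here.
    return {"size_bytes": os.path.getsize(path)}


def build_index(root):
    """Walk `root` and index every .nc file before any request is served."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".nc"):
                path = os.path.join(dirpath, name)
                index[path] = extract_metadata(path)
    return index
```

Every file is visited once per startup, so under these assumptions the work grows roughly linearly with the number of files.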
Current Limitation
Lazy loading of NetCDF files and their metadata is not currently implemented, so all files must be scanned at startup. Loading them lazily would significantly reduce startup time for large datasets.
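For illustration only, a lazy variant might look like the sketch below. The class `LazyMetadataIndex` and its methods are hypothetical and do not exist in the server; the extraction step is again a stand-in that reads the file size.

```python
import os


class LazyMetadataIndex:
    """Extract metadata on first access instead of at startup."""

    def __init__(self, root):
        self.root = root
        self._cache = {}
        self.extract_calls = 0  # counts real extractions, for illustration

    def _extract(self, path):
        # Hypothetical stand-in for opening the NetCDF file.
        self.extract_calls += 1
        return {"size_bytes": os.path.getsize(path)}

    def get(self, relative_path):
        # Only touch the file the first time it is requested.
        if relative_path not in self._cache:
            full_path = os.path.join(self.root, relative_path)
            self._cache[relative_path] = self._extract(full_path)
        return self._cache[relative_path]
```

The likely trade-off of such a design is that the first request for each file pays the extraction cost that startup pays today.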
Recommended Solution
For a cold startup, use a gradual scaling approach.
Cold Start Strategy:
- Start with minimal replicas (recommended: 2)
- Wait for initialization - Let those replicas be ready and fully initialized
- Scale up gradually - Scale up to the desired number of replicas once the initial ones are operational
Benefits:
- The initial load is distributed across multiple instances
- Users can access the service while additional replicas are starting
- System resources are used more efficiently
- No single point of failure during startup
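The cold start strategy can be sketched as a small routine. This is a sketch under assumptions: `set_replicas` and `ready_replicas` are hypothetical callbacks that you would wire to whatever orchestrator runs the service, and the polling interval and timeout are illustrative defaults, not recommended values.

```python
import time


def gradual_scale_up(set_replicas, ready_replicas, desired, initial=2,
                     poll_seconds=5.0, timeout_seconds=600.0,
                     sleep=time.sleep):
    """Start a few replicas, wait until they are ready, then scale up."""
    initial = min(initial, desired)
    set_replicas(initial)

    # Wait for the initial replicas to finish their metadata scan.
    waited = 0.0
    while ready_replicas() < initial:
        if waited >= timeout_seconds:
            raise TimeoutError("initial replicas did not become ready in time")
        sleep(poll_seconds)
        waited += poll_seconds

    # Only now request the full replica count.
    if desired > initial:
        set_replicas(desired)
```

Passing the callbacks in keeps the sketch independent of any particular platform and makes it easy to exercise with fakes.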
Performance Impact
Factors affecting startup time:
- Number of NetCDF files in storage
- File sizes and complexity
- Storage I/O performance
- Server hardware resources
Typical startup times:
- Small datasets (< 1000 files): ~30 seconds
- Medium datasets (1000-10000 files): ~1-2 minutes
- Large datasets (> 10000 files): ~3-5 minutes
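As a rough aid for sizing startup timeouts, the bands above can be encoded as a lookup. The function name `expected_startup_range` is hypothetical, and the numbers are only the approximate figures from this section, not measurements of your deployment.

```python
def expected_startup_range(file_count):
    """Return (low, high) startup time in seconds for a given file count."""
    if file_count < 1000:
        return (30, 30)      # small datasets: ~30 seconds
    if file_count <= 10000:
        return (60, 120)     # medium datasets: ~1-2 minutes
    return (180, 300)        # large datasets: ~3-5 minutes
```

Actual times also depend on file sizes, storage I/O, and hardware, so treat the upper bound as a starting point and add a safety margin.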
Monitoring
Monitor the server startup logs to track:
- Time taken for metadata scanning
- Number of files processed
- Any errors during initialization
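If you are instrumenting a scan yourself, the three quantities above can be collected in one pass. The sketch below is an assumption-laden illustration: `scan_with_stats`, the logger name, and the log message wording are all hypothetical and do not reflect the server's real log format.

```python
import logging
import time

logger = logging.getLogger("startup")  # hypothetical logger name


def scan_with_stats(paths, extract):
    """Run `extract` on each path and report duration, count, and errors."""
    start = time.monotonic()
    processed = 0
    errors = 0
    for path in paths:
        try:
            extract(path)
            processed += 1
        except Exception:
            # Keep scanning; one bad file should not hide the others.
            errors += 1
            logger.exception("metadata extraction failed for %s", path)
    elapsed = time.monotonic() - start
    logger.info("metadata scan finished: %d files processed, %d errors, %.1f s",
                processed, errors, elapsed)
    return {"processed": processed, "errors": errors,
            "elapsed_seconds": elapsed}
```

Returning the counters as a dictionary also makes them easy to export as metrics rather than parsing them back out of the logs.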

