
Architecture Diagram

Figure: ModMS system architecture, showing the Aggregator, Indexer, Server, and Receiver components and the data flow between them.

Components

The Aggregator is the data ingestion component, responsible for downloading and preprocessing model data.

Responsibilities:
  • Downloads models according to schedule
  • Performs model-specific post-processing on GRIB files
  • Notifies the Indexer when new files are ready
  • Must run on the same server as the Indexer so that both share a filesystem
The Aggregator and Indexer must run on the same server to share the filesystem for efficient data transfer.
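
Because both components see the same filesystem, the handoff can be as simple as "finish the file, then announce its path". The sketch below illustrates one way to do that safely; it is not the ModMS implementation. The function name `publish_and_notify`, the `.part` suffix, and the `notify` callback are all hypothetical, and how the Aggregator actually notifies the Indexer is not described here.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def publish_and_notify(data: bytes, dest: Path, notify: Callable[[Path], None]) -> Path:
    """Write a processed file atomically, then announce it (hypothetical helper)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temporary file next to the destination so the final rename
    # stays on one filesystem and the reader never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=str(dest.parent), suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, dest)  # atomic rename within one filesystem
    except BaseException:
        # Clean up the partial file if writing or renaming failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    # Only notify once the complete file is visible under its final name.
    notify(dest)
    return dest
```

Notifying only after the rename means the receiving side can open the path immediately without checking whether the write has finished.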

Data Flow

1. Data Download
The Aggregator downloads model data from external sources according to schedule.

2. Preprocessing
Model-specific post-processing is applied to GRIB files.

3. Conversion
The Indexer converts GRIB files to NetCDF format for efficient storage and querying.

4. Storage
Processed files are copied to the GlusterFS distributed storage system.

5. Indexing
The Indexer notifies the Server of new model runs available for querying.

6. Serving
The Server makes the data available through its flexible API.
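
The ordering of the six steps can be sketched as a small runner. This is an illustration only: the step names, `run_pipeline`, and the handler mapping are hypothetical and do not correspond to any ModMS API. The point it shows is that each step runs only after the previous one has succeeded, so a failed conversion never reaches storage or indexing.

```python
from typing import Callable, Dict, List

# Step names mirror the six stages of the data flow, in order (names are illustrative).
STEPS = ("download", "preprocess", "convert", "store", "index", "serve")


def run_pipeline(run_id: str, handlers: Dict[str, Callable[[str], None]]) -> List[str]:
    """Run every stage in order for one model run; stop at the first failure."""
    completed: List[str] = []
    for step in STEPS:
        # An exception raised by a handler propagates, so later stages never
        # run against an incomplete model run.
        handlers[step](run_id)
        completed.append(step)
    return completed
```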

Storage Architecture

ModMS uses GlusterFS for distributed storage.
The GlusterFS mountpoint at /data must be properly configured and accessible. If issues occur, see the FAQ for troubleshooting steps.
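
A quick pre-flight check of the mountpoint might look like the sketch below. It uses only the Python standard library; the function name `check_data_mount` is hypothetical. It confirms that the path exists, is a mountpoint, and is readable. It does not verify that the mounted filesystem is actually GlusterFS.

```python
import os
from typing import List


def check_data_mount(path: str = "/data") -> List[str]:
    """Return a list of problems found with the mountpoint (empty if none)."""
    problems: List[str] = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    # A plain directory here usually means the volume failed to mount.
    if not os.path.ismount(path):
        problems.append(f"{path} is not a mountpoint")
    # Read and traverse permissions are needed to list and open files.
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{path} is not readable")
    return problems
```

An empty list means the basic checks passed; any returned message is a starting point for the troubleshooting steps in the FAQ.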

Servers & environments

A typical ModMS deployment uses:
  • n01.modms.devops.arabiaweather.com
  • n02.modms.devops.arabiaweather.com
  • indexer.modms.devops.arabiaweather.com (Indexer server)
These nodes share the GlusterFS volume mounted at /data and host the Aggregator, Indexer, and Server components.

GlusterFS Architecture Details

ModMS uses a replicated GlusterFS volume with:
  • Multiple brick nodes for data storage
  • Arbiter node for split-brain prevention
  • Self-healing daemon for automatic recovery
  • TCP transport for network communication
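
For orientation, a replicated volume with an arbiter is typically created with `gluster volume create ... replica 3 arbiter 1`, where the last brick in each replica set acts as the arbiter. The sketch below only assembles that command as an argument list and does not run it. The function name and the three-brick layout (two data bricks plus one arbiter) are assumptions; the actual ModMS volume name, brick count, and brick paths are not stated here.

```python
from typing import List


def gluster_create_args(volume: str, bricks: List[str]) -> List[str]:
    """Build the argv for creating a replica-3 volume with one arbiter brick.

    Assumes exactly three bricks in "host:/path" form; the last one is the arbiter.
    """
    if len(bricks) != 3:
        raise ValueError("this sketch expects exactly 3 bricks (2 data + 1 arbiter)")
    return [
        "gluster", "volume", "create", volume,
        "replica", "3", "arbiter", "1",  # arbiter stores metadata only
        "transport", "tcp",
        *bricks,
    ]
```

The arbiter brick holds only file metadata, which lets it break ties between the two data bricks without storing a third full copy of the data.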

Performance Considerations

The Server scans all NetCDF files at startup, which can take approximately 3 minutes with large datasets.
For cold starts, start with 2 container replicas and scale up once they are ready.
The Aggregator and Indexer must share a filesystem, requiring them to run on the same server.
NetCDF format is used for efficient querying and reduced storage overhead compared to raw GRIB files.
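
The startup scan and the "scale up once ready" advice can be pictured with a small readiness flag. This is a sketch under assumptions, not the Server's code: the class name `NetcdfCatalog`, the `.nc` file extension, and the idea of exposing `ready` to a health probe are all hypothetical.

```python
import threading
from pathlib import Path
from typing import List


class NetcdfCatalog:
    """Minimal catalog that reports ready only after a full directory scan."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.files: List[Path] = []
        self._ready = threading.Event()

    def scan(self) -> int:
        # Walk the whole tree once; on a large dataset this is the slow part.
        self.files = sorted(p for p in self.root.rglob("*.nc") if p.is_file())
        self._ready.set()
        return len(self.files)

    @property
    def ready(self) -> bool:
        # A readiness probe could return this so traffic waits for the scan.
        return self._ready.is_set()
```

With a flag like this, an orchestrator can hold traffic and delay further scale-up until the initial replicas report ready.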