Architecture

Architecture Diagram

Components

Aggregator
Indexer
Server
Receiver

Aggregator

The data ingestion component responsible for downloading and preprocessing model data. Responsibilities:

Downloads models according to schedule
Performs model-specific post-processing on GRIB files
Notifies the indexer when new files are ready
Must be on the same server as the indexer to share the filesystem

The Aggregator and Indexer must run on the same server to share the filesystem for efficient data transfer.

Indexer

The data processing component that converts and prepares data for storage. Responsibilities:

Creates NetCDF files from GRIB files
Copies output files to GlusterFS mountpoint
Notifies the server of new model runs
Responsible for naming variables consistently

Variable naming is critical for API queries. Ensure variables are properly defined in grib[12].csv files.

Data Flow

Data Download

The Aggregator downloads model data from external sources according to schedule.

Preprocessing

Model-specific post-processing is applied to GRIB files.

Conversion

The Indexer converts GRIB files to NetCDF format for efficient storage and querying.

Storage

Processed files are copied to the GlusterFS distributed storage system.

Indexing

The Indexer notifies the Server of new model runs available for querying.

Serving

The Server makes the data available through its flexible API.
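The six stages above can be summarized as an ordered table of stage and owning component. The following is a minimal sketch for illustration only, not ModMS code: `PIPELINE`, `stages_for`, and `handoffs` are hypothetical names, and the ownership of each stage is read off the component responsibilities listed earlier.

```python
from typing import List, Tuple

# Ordered (stage, owning component) pairs, taken from the Data Flow steps.
PIPELINE: List[Tuple[str, str]] = [
    ("Data Download", "Aggregator"),
    ("Preprocessing", "Aggregator"),
    ("Conversion", "Indexer"),
    ("Storage", "Indexer"),
    ("Indexing", "Indexer"),
    ("Serving", "Server"),
]


def stages_for(component: str) -> List[str]:
    """Return the stages a component owns, in pipeline order."""
    return [stage for stage, owner in PIPELINE if owner == component]


def handoffs() -> List[Tuple[str, str, str]]:
    """Return (sender, receiver, last_stage_of_sender) wherever ownership changes."""
    result = []
    for (stage, owner), (_, next_owner) in zip(PIPELINE, PIPELINE[1:]):
        if owner != next_owner:
            result.append((owner, next_owner, stage))
    return result
```

The two handoffs this yields correspond to the two notifications in the text: the Aggregator notifying the Indexer when files are ready, and the Indexer notifying the Server of new model runs.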

Storage Architecture

ModMS uses GlusterFS for distributed storage.

The GlusterFS mountpoint at /data must be properly configured and accessible. If issues occur, see the FAQ for troubleshooting steps.
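One way to verify the mountpoint from a node is to check that `/data` exists, is an actual mount point, and is readable. This is a minimal sketch using only the Python standard library. `check_mountpoint` is a hypothetical helper, not part of ModMS, and it does not confirm that the mount is specifically a GlusterFS volume.

```python
import os
from typing import List


def check_mountpoint(path: str = "/data") -> List[str]:
    """Return a list of problems found with the mountpoint (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        # Nothing else can be checked if the directory is missing.
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    if not os.path.ismount(path):
        # The directory exists but no filesystem is mounted on it.
        problems.append(f"{path} is not a mount point")
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{path} is not readable by the current user")
    return problems
```

Calling `check_mountpoint()` on a healthy node should return an empty list. Any returned message points to the condition to investigate.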

Servers & environments

Core ModMS nodes
File storage

A typical ModMS deployment uses:

n01.modms.devops.arabiaweather.com
n02.modms.devops.arabiaweather.com
indexer.modms.devops.arabiaweather.com (Indexer server)

These nodes share the GlusterFS volume mounted at /data and host the Aggregator, Indexer, and Server components.

All ModMS nodes must have:

GlusterFS peers in Connected state
The volume mounted at /data
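The peer requirement can be checked by parsing the output of `gluster peer status`. The sketch below assumes the usual text layout of that command, with one `Hostname:` line followed by a `State:` line per peer. The function names are hypothetical, and the output format should be confirmed against the installed GlusterFS version before relying on it.

```python
import subprocess
from typing import Dict, List


def parse_peer_states(status_text: str) -> Dict[str, str]:
    """Map each peer hostname to its reported state string."""
    states: Dict[str, str] = {}
    hostname = None
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            hostname = value
        elif key == "State" and hostname is not None:
            states[hostname] = value
            hostname = None
    return states


def disconnected_peers(status_text: str) -> List[str]:
    """Return hostnames whose state does not end with '(Connected)'."""
    return [
        host
        for host, state in parse_peer_states(status_text).items()
        if not state.endswith("(Connected)")
    ]


def read_peer_status() -> str:
    """Run `gluster peer status` locally and return its stdout."""
    result = subprocess.run(
        ["gluster", "peer", "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a node with the `gluster` CLI installed, `disconnected_peers(read_peer_status())` should return an empty list when every peer is in the Connected state.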

For operational commands and troubleshooting, see:

GlusterFS Architecture Details

ModMS uses a replicated GlusterFS volume with:

Multiple brick nodes for data storage
Arbiter node for split-brain prevention
Self-healing daemon for automatic recovery
TCP transport for network communication

Performance Considerations

Server Startup Time

The server scans all NetCDF files at startup, which can take approximately 3 minutes with large datasets.

For cold starts, start with 2 container replicas and scale up once they are ready.
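The "scale up once ready" step can be automated by polling a readiness check with a timeout that comfortably exceeds the roughly 3-minute scan. This is a minimal sketch: `wait_until_ready` is a hypothetical helper, the `is_ready` callable is an assumption (this page does not say how the Server reports readiness), and the default timeout and interval are illustrative values.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True if the service became ready, False on timeout.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A deployment script could call this against the initial 2 replicas and trigger the scale-up only when it returns `True`.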

File System Requirements

The Aggregator and Indexer must share a filesystem, requiring them to run on the same server.
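Whether two directories on one host sit on the same filesystem can be checked by comparing their device IDs. This is a minimal sketch: `on_same_filesystem` is a hypothetical helper, and the directories passed to it would be whatever working paths the Aggregator and Indexer are configured to use, which this page does not specify.

```python
import os


def on_same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths live on the same filesystem on this host.

    Compares st_dev, the ID of the device containing each path. Device IDs
    are only meaningful within a single host, so run this on the shared server.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```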

Storage Optimization

NetCDF format is used for efficient querying and reduced storage overhead compared to raw GRIB files.
