This project implements a machine learning pipeline for analyzing Magnetospheric Multiscale (MMS) spacecraft data using Gaussian Mixture Models (GMM). The pipeline processes ion spectrogram data to identify and classify different plasma regions in the magnetosphere.
- Reads MMS spacecraft data from CDF (Common Data Format) files
- Extracts ion spectrogram data and timestamps
- Resamples the data to 1-minute intervals for analysis
- Uses spacepy.pycdf for handling CDF files
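The 1-minute resampling step can be sketched with pandas as follows. This is a minimal illustration, not the code in read_cdf.py: the helper name `resample_spectrogram`, the per-bin mean, and the synthetic 4.5-second input are all assumptions, and the CDF reading itself is left out.

```python
import numpy as np
import pandas as pd


def resample_spectrogram(times, spectrogram, rule="1min"):
    """Hypothetical helper: average spectrogram rows into fixed time bins."""
    frame = pd.DataFrame(spectrogram, index=pd.DatetimeIndex(times))
    # Assumption: each 1-minute bin is summarized by the mean of its samples.
    return frame.resample(rule).mean()


# Synthetic stand-in for arrays read from a CDF file:
# 40 samples at a 4.5 s cadence, 32 energy channels.
times = pd.Timestamp("2017-01-01") + pd.to_timedelta(np.arange(40) * 4.5, unit="s")
spec = np.random.default_rng(0).random((40, 32))

resampled = resample_spectrogram(times, spec)
print(resampled.shape)
```

Each row of `resampled` corresponds to one minute and each column to one energy channel.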
Processes ion spectrogram data to extract three key features:
- ratio_max_width: Measures peak characteristics in the energy spectrum
- ratio_high_low: Compares high-energy (>4000 eV) to low-energy (<100 eV) intensities
- norm_Bt: Normalizes the total magnetic field to 50 nT
- Zeros all features if:
  - High-energy spectrum shows strong linear correlation (r > 0.7)
  - No valid peaks are detected
- Saves processed features to CSV format
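Two of the features and the correlation-based zeroing rule might look roughly like the sketch below. The exact definitions are not given above, so several choices here are assumptions: band intensities are compared by their means, the correlation is Pearson r between log10(energy) and intensity, and norm_Bt is the field strength divided by 50 nT. `sketch_features` is a hypothetical name, and ratio_max_width and the peak check are omitted because their definitions are not stated.

```python
import numpy as np

HIGH_EV = 4000.0    # high-energy threshold (eV), from the feature list
LOW_EV = 100.0      # low-energy threshold (eV), from the feature list
BT_REF_NT = 50.0    # reference field strength (nT)
R_LIMIT = 0.7       # correlation above which features are zeroed


def sketch_features(energies, intensities, bt_nt):
    """Hypothetical helper: ratio_high_low and norm_Bt for one spectrum."""
    energies = np.asarray(energies, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    high_mask = energies > HIGH_EV
    low_mask = energies < LOW_EV

    # Assumption: "linear correlation" means Pearson r between log10(energy)
    # and intensity over the high-energy channels, compared as a signed value.
    r = np.corrcoef(np.log10(energies[high_mask]), intensities[high_mask])[0, 1]
    if r > R_LIMIT:
        return {"ratio_high_low": 0.0, "norm_Bt": 0.0}

    # Assumption: the two bands are compared by their mean intensities.
    ratio = intensities[high_mask].mean() / intensities[low_mask].mean()
    return {"ratio_high_low": ratio, "norm_Bt": bt_nt / BT_REF_NT}


energies = [10, 50, 500, 5000, 10000, 20000]
falling = sketch_features(energies, [4, 4, 1, 3, 2, 1], bt_nt=25.0)
rising = sketch_features(energies, [4, 4, 1, 1, 2, 3], bt_nt=25.0)
print(falling, rising)
```

In this toy input, the spectrum whose high-energy tail rises steadily with energy is zeroed, while the falling one keeps its feature values.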
Creates visual representations of the GMM clustering results:
- Loads processed features and trained GMM model
- Generates 2D scatter plots of the first two features
- Visualizes GMM clusters using:
  - Color-coded cluster assignments
  - Ellipses showing Gaussian distributions (2σ)
  - Clear labels and legends
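One common way to obtain a 2σ ellipse from a fitted 2D Gaussian component is an eigendecomposition of its covariance matrix. The sketch below shows that calculation with NumPy only; `ellipse_params` is a hypothetical helper, and the vis_v*.py scripts may draw their ellipses differently.

```python
import numpy as np


def ellipse_params(cov, n_std=2.0):
    """Hypothetical helper: full width, full height, and rotation (degrees)
    of the n_std ellipse for a 2x2 covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # put the major axis first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Semi-axis length is n_std * sqrt(eigenvalue); double it for the full axis.
    width, height = 2.0 * n_std * np.sqrt(eigvals)
    # Rotation of the major axis relative to the x-axis.
    angle = np.degrees(np.arctan2(eigvecs[1, 0], eigvecs[0, 0]))
    return width, height, angle


width, height, angle = ellipse_params(np.array([[4.0, 0.0], [0.0, 1.0]]))
print(width, height, angle)
```

The returned values match the `width`, `height`, and `angle` arguments of `matplotlib.patches.Ellipse`, which expects full axis lengths and an angle in degrees.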
- Raw CDF data → read_cdf.py
- Resampled time series → feature_engineering.py
- Extracted features → GMM model
- Cluster assignments → Visualization
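The last two stages of this flow, from extracted features to cluster assignments, can be illustrated with scikit-learn's `GaussianMixture`. This is a sketch under stated assumptions: the feature matrix is synthetic, and the number of components (2 here) and the `full` covariance type are placeholders, since the settings used in gmm_v1.py are not stated.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the three extracted features:
# two well-separated groups of 100 samples each.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=[0.2, 0.1, 0.3], scale=0.05, size=(100, 3)),
    rng.normal(loc=[0.8, 0.9, 0.7], scale=0.05, size=(100, 3)),
])

# Assumption: n_components and covariance_type are illustrative choices.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(features)   # one cluster index per sample

print(labels.shape, gmm.means_.shape, gmm.covariances_.shape)
```

A fitted model like this can be persisted with `joblib.dump` and reloaded with `joblib.load` for the visualization step.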
- Python 3.9
- spacepy
- numpy
- pandas
- scikit-learn
- matplotlib
- joblib
gaussian_mixed_model_mms/
├── data/
│ ├── processed/
│ └── raw/
├── models/
│ └── gmm_v1.py
├── scripts/
│ ├── feature_engineering.py
│ └── read_cdf.py
├── visualizations/
│ ├── vis_v1.py
│ ├── vis_v2.py
│ └── vis_v3.py
├── requirements.txt
├── Dockerfile
└── README.md
- Clone the repository:
    git clone https://github.com/YOUR_USERNAME/gaussian_mixed_model_mms.git
    cd gaussian_mixed_model_mms
- Using Docker:
    docker build -t gaussian_mixed_model_mms .
    docker run -v $(pwd)/data:/gaussian_mixed_model_mms/data gaussian_mixed_model_mms
- Place raw CDF files in the data/raw directory
- Run the scripts in order:
    python scripts/read_cdf.py
    python scripts/feature_engineering.py
    python visualizations/vis_v3.py
- Processed features are saved in data/processed/features.csv
- GMM model is saved in models/gmm_model.pkl
- Visualization plots show cluster assignments and Gaussian components
- The project is designed for MMS ion spectrogram analysis
- Features are carefully chosen to identify plasma regions
- Visualization helps in understanding the clustering results