SeismoScan combines scientific computing tools, machine learning frameworks, and geospatial software to detect submarine faults from bathymetry data. These technologies were selected through a structured evaluation process, comparing multiple data formats, model architectures, and library options before choosing a stack that balances accuracy, scalability, and usability.
NetCDF was chosen as the primary storage format for bathymetry because it stores multidimensional scientific data efficiently. It keeps latitude, longitude, and depth as named arrays and stores metadata such as units alongside them in the same file. NetCDF also integrates well with our preprocessing environment and machine learning tools.
Other formats such as GeoTIFF and HDF5 were evaluated, but NetCDF offered the best balance of file size, structure, compatibility with the Generic Mapping Tools (GMT), and ease of use in Python.
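To make the layout described above concrete, the following sketch writes a tiny synthetic bathymetry grid to a NetCDF file and reads it back. It rests on several assumptions: it uses `scipy.io.netcdf_file`, which handles only the classic NetCDF 3 format, and the source does not say which reader SeismoScan actually uses. The variable names `lat`, `lon`, and `depth`, the `units` attribute, and both helper functions are hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file


def write_grid(path, lat, lon, depth):
    """Write a 2-D depth grid with its coordinate axes (hypothetical layout)."""
    with netcdf_file(path, "w") as nc:
        nc.createDimension("lat", len(lat))
        nc.createDimension("lon", len(lon))
        v_lat = nc.createVariable("lat", "f8", ("lat",))
        v_lon = nc.createVariable("lon", "f8", ("lon",))
        v_depth = nc.createVariable("depth", "f4", ("lat", "lon"))
        v_lat[:] = lat
        v_lon[:] = lon
        v_depth[:] = depth
        v_depth.units = "m"  # metadata travels with the array


def read_grid(path):
    """Read the grid back as plain NumPy arrays plus the depth units."""
    with netcdf_file(path, "r", mmap=False) as nc:
        lat = nc.variables["lat"][:].copy()
        lon = nc.variables["lon"][:].copy()
        depth = nc.variables["depth"][:].copy()
        units = nc.variables["depth"].units
    if isinstance(units, bytes):
        units = units.decode()
    return lat, lon, depth, units


# Synthetic 3 x 4 grid (values are illustrative, not real survey data).
lat_in = np.array([10.0, 10.5, 11.0])
lon_in = np.array([120.0, 120.5, 121.0, 121.5])
depth_in = np.array(
    [
        [-3000.0, -3010.0, -3200.0, -3210.0],
        [-3005.0, -3015.0, -3205.0, -3215.0],
        [-3010.0, -3020.0, -3210.0, -3220.0],
    ]
)

with tempfile.TemporaryDirectory() as tmp:
    grid_path = os.path.join(tmp, "bathymetry_demo.nc")
    write_grid(grid_path, lat_in, lon_in, depth_in)
    lat_out, lon_out, depth_out, units = read_grid(grid_path)
```

Real bathymetry products are often distributed as NetCDF 4 files, which need a different reader; the array-plus-metadata structure shown here is the same either way.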
Python is the core language for SeismoScan. It provides strong support for scientific computing and machine learning, making it ideal for building a workflow that involves numerical processing, spatial analysis, and model development.
Its extensive library ecosystem, readability, and adoption across research fields made it the most practical and scalable choice compared to alternatives like C or Julia.
SeismoScan uses a hybrid machine learning approach. Instead of relying on a single model, the system combines multiple specialized techniques to detect and refine potential fault structures in bathymetry data.
A convolutional neural network (CNN) identifies spatial edges and depth discontinuities that often correspond to tectonic structures. Because it learns these patterns from training data, it serves as the first stage of fault detection.
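The kind of feature a convolutional layer responds to can be sketched without a deep learning framework. The example below slides a fixed 3 x 3 Sobel kernel over a synthetic depth grid containing a single step. This is an illustration only: a trained CNN learns its own kernels, and the grid, the 200 m step, and the function name `conv2d_valid` are assumptions made for the example.

```python
import numpy as np


def conv2d_valid(grid, kernel):
    """Slide `kernel` over `grid` without padding, summing elementwise products."""
    kh, kw = kernel.shape
    out_h = grid.shape[0] - kh + 1
    out_w = grid.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(grid[i:i + kh, j:j + kw] * kernel)
    return out


# Hand-written horizontal-gradient (Sobel) kernel standing in for a learned filter.
SOBEL_X = np.array(
    [
        [-1.0, 0.0, 1.0],
        [-2.0, 0.0, 2.0],
        [-1.0, 0.0, 1.0],
    ]
)

# Synthetic 6 x 8 grid: flat seabed at -3000 m with a 200 m drop from column 4 onward.
depth = np.full((6, 8), -3000.0)
depth[:, 4:] = -3200.0

response = conv2d_valid(depth, SOBEL_X)
edge_columns = np.flatnonzero(np.abs(response).max(axis=0) > 0)
```

The response is zero over the flat seabed and non-zero only in the output columns whose 3 x 3 window straddles the step, which is the behaviour the detection stage relies on.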
The clustering stage groups nearby detections into continuous shapes. This reduces noise, removes isolated points, and organizes the model outputs into coherent structural lines.
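A minimal sketch of this grouping step follows. The source does not name the clustering algorithm; DBSCAN from scikit-learn is assumed here because it assigns isolated points to a noise label instead of forcing them into a cluster. The coordinates and the `eps` and `min_samples` values are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical detection coordinates in grid cells: two linear features
# plus two isolated false positives.
line_a = [(float(x), 0.0) for x in range(6)]        # 6 points along y = 0
line_b = [(10.0, float(y)) for y in range(10, 15)]  # 5 points along x = 10
isolated = [(20.0, 0.0), (0.0, 20.0)]
points = np.array(line_a + line_b + isolated)

# eps: maximum neighbour distance; min_samples: neighbours (including the
# point itself) needed for a point to count as a core point.
labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(points)

# Label -1 marks noise; every other label is one grouped structure.
structures = {
    int(label): points[labels == label]
    for label in np.unique(labels)
    if label != -1
}
```

Each entry in `structures` holds the points of one candidate structure, and the two isolated detections are dropped because they carry the noise label.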
A Random Forest model evaluates the grouped structures and distinguishes between likely faults and natural seabed variations. Because it averages many decision trees and reports how much each input feature contributes, it adds stability and interpretability to the detection pipeline.
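The classification stage might look roughly like the sketch below, which uses scikit-learn's `RandomForestClassifier`. The three per-structure features (length, mean slope, linearity) and all training values are synthetic assumptions made up for illustration; the source does not list the features SeismoScan actually uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200

# Hypothetical features per grouped structure:
# length (km), mean slope (degrees), linearity (0 to 1).
fault_like = np.column_stack(
    [rng.normal(20, 3, n), rng.normal(25, 4, n), rng.normal(0.9, 0.03, n)]
)
seabed_like = np.column_stack(
    [rng.normal(5, 2, n), rng.normal(8, 3, n), rng.normal(0.5, 0.1, n)]
)
X = np.vstack([fault_like, seabed_like])
y = np.array([1] * n + [0] * n)  # 1 = likely fault, 0 = natural variation

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Score one new (hypothetical) structure and inspect feature contributions.
candidate = np.array([[18.0, 22.0, 0.88]])
fault_probability = clf.predict_proba(candidate)[0, 1]
importances = clf.feature_importances_
```

`fault_probability` gives a score that can be thresholded, and `importances` shows which features drive the decision, which is where the interpretability comes from.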
Together, these components create a robust workflow that can interpret complex seafloor topography and produce reliable predictions.
Handles fast numerical computation and large array operations essential for analyzing bathymetry grids.
Used to format final outputs, organize coordinate data, and support optional logging and configuration management.
Reads NetCDF files, loads depth maps, and extracts structured spatial information.
Provides mathematical tools for smoothing, filtering, and interpolating bathymetry surfaces during preprocessing.
Powers clustering algorithms and the Random Forest model used in the classification stage of the pipeline.
Used for training convolutional neural networks to detect spatial features in the input grid.
GMT is used to compute slopes and visualize bathymetry, which assists in validating model predictions. PyGMT allows Python-based integration for preprocessing steps such as gradient extraction and geographic transformations.
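For readers who want to see what a slope computation involves, here is a small NumPy stand-in. It is not GMT's own algorithm and not the project's code; it assumes a regular grid with spacing given in metres, and the function name `slope_degrees` is hypothetical.

```python
import numpy as np


def slope_degrees(depth, dx, dy):
    """Slope magnitude in degrees on a regular grid with spacing dx, dy in metres."""
    # np.gradient returns derivatives in axis order: rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(depth, dy, dx)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# Synthetic plane that deepens by 100 m for every 100 m travelled along x.
x = np.arange(5) * 100.0
depth = -1000.0 - np.tile(x, (4, 1))

slope = slope_degrees(depth, dx=100.0, dy=100.0)
```

On this plane the slope is 45 degrees everywhere, which makes it a convenient sanity check before comparing against GMT output on real grids.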
Helps verify predicted faults against known geological features and provides a powerful visual environment for exploring bathymetry datasets.
The technologies chosen for SeismoScan complement one another and create a reliable, research-ready workflow. Python allows rapid development and strong ML support, NetCDF provides structured geospatial data storage, and GMT provides slope and gradient computation. Combined with a hybrid ML pipeline, the system delivers a balanced solution that is efficient, scalable, and scientifically meaningful.