Project Requirements
Overview
This page summarizes the major requirements of the SeismoScan system and outlines
the development process used to build the project. These requirements guide the
functionality, performance expectations, and constraints of the final application.
Functional Requirements
Organized using MoSCoW prioritization (Must Have, Should Have, Could Have, and Will Not Include):
Must Have
- Command-line interface for running the program
- Ability to load and process NetCDF bathymetry data
- A functioning ML pipeline for fault detection
- Output of predicted coordinates in a text-based format
- Support for a configuration file
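As a minimal sketch of the configuration-file requirement, the snippet below loads a hypothetical INI-style config with Python's standard configparser. The section and key names (input, detection, output, sensitivity, etc.) are illustrative assumptions, not the final schema:

```python
from configparser import ConfigParser

# Hypothetical example config; the actual sections and keys
# depend on the final SeismoScan design.
EXAMPLE_CONFIG = """\
[input]
bathymetry_file = survey_area.nc

[detection]
sensitivity = 0.5

[output]
coordinates_file = faults.txt
"""

config = ConfigParser()
config.read_string(EXAMPLE_CONFIG)

# Typed accessors keep validation close to the load step.
sensitivity = config.getfloat("detection", "sensitivity")
bathy_path = config.get("input", "bathymetry_file")
```

In practice the same ConfigParser object would be read from a file path passed on the command line rather than from an inline string.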
Should Have
- Confidence scores in a separate output file
- Debug/log files for troubleshooting
- Dependency installer or setup utility
Could Have
- Adjustable sensitivity parameters
- Optional output attributes (slope, azimuth, etc.)
- GMT-generated fault visualization
Will Not Include
- GUI interface
- Cloud or database storage
- Real-time performance guarantees
Performance Requirements
- Process typical datasets within ~15 minutes
- Detect at least 50% of faults (goal: 70–80%)
- Keep false positives under 30% (goal: 20%)
- Produce consistent results across repeated runs
- Handle the largest provided dataset without memory failure
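The detection-rate and false-positive targets above can be measured with a simple matching routine. The sketch below is one possible approach, assuming predictions and ground truth are coordinate pairs matched within a tolerance; the function name and matching rule are illustrative, not the project's actual evaluation method:

```python
def evaluate_detections(predicted, known, tol=0.01):
    """Return (recall, false-positive rate) for coordinate-pair detections.

    A known fault counts as detected if any prediction lies within `tol`
    of it on both axes; a prediction with no nearby known fault counts
    as a false positive. (Hypothetical matching rule for illustration.)
    """
    def near(a, b):
        return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol

    detected = sum(any(near(k, p) for p in predicted) for k in known)
    false_pos = sum(not any(near(p, k) for k in known) for p in predicted)

    recall = detected / len(known) if known else 0.0
    fp_rate = false_pos / len(predicted) if predicted else 0.0
    return recall, fp_rate
```

Against the targets, a run would pass the "must" threshold when recall >= 0.5 and fp_rate < 0.3.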
Environmental Requirements
- Must support bathymetry stored in NetCDF format
- Must run in a Linux environment
- Must use Python for ML ecosystem compatibility
- Outputs must be in human-readable text files
Development Process Overview
The SeismoScan project follows a structured development cycle, progressing from research
to a complete ML-driven detection system:
1. Research & Planning: Understand bathymetry, submarine faults, and client needs.
2. Technology Evaluation: Compare data formats and ML approaches, and select the final stack.
3. Preprocessing Workflow: Build tools for slope calculation, gradient extraction, filtering, and data cleanup.
4. ML Pipeline Development: Train and integrate CNN feature detection, clustering, and Random Forest classification.
5. Integration: Link preprocessing, ML models, configuration settings, and output generation.
6. Refinement: Tune accuracy, add optional features, generate logs, and validate results.
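The slope-calculation step of the preprocessing workflow can be sketched with NumPy's gradient function, treating the bathymetry as a 2-D depth grid. The function name and default grid spacing are assumptions for illustration:

```python
import numpy as np

def slope_degrees(depth, dx=1.0, dy=1.0):
    """Slope magnitude (degrees) from a 2-D depth grid.

    dx and dy are the grid spacings along columns and rows;
    np.gradient returns per-axis derivatives, and the combined
    gradient magnitude is converted to an angle.
    """
    dz_dy, dz_dx = np.gradient(depth, dy, dx)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
```

Steep, linear ridges in the resulting slope grid are the kind of feature the downstream ML pipeline would flag as fault candidates.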