In Danish waters, nearly three quarters of the seafloor has never been mapped to modern standards. Worldwide, the situation is much the same: the most recent “Seabed 2030 Update” reports global seafloor coverage of only 27.3%. Modern hydrographic surveys demand dedicated vessels sailing dense track lines in controlled patterns. The resulting data is of high quality, but the process is costly and time-consuming, limiting the number of hydrographic surveys that can be completed each year. The result is a persistent shortfall: hydrographic resources are concentrated on high-priority areas essential for navigational safety, leaving extensive regions unmapped or covered only by older surveys.
Further complicating matters, the seabed is far from static. Currents and waves continually reshape sandy or soft bottoms, while human activities such as port expansions, channel dredging, and the construction of offshore wind farms introduce further changes to the marine environment.
The limitations of current survey capacity cannot be addressed by navy vessels alone; broader participation and innovation are required. The Danish Geodata Agency (DGA) has therefore explored the possibility of leveraging crowdsourced bathymetry (CSB), where ordinary vessels contribute depth measurements during routine operations. This distributed approach offers the potential to increase coverage, keep pace with dynamic seabed changes, and gradually validate the oldest data. It can also serve as a decision-support tool to prioritise re-surveying efforts and, where data quality meets required standards (e.g. IHO S-44), contribute directly to official charting.
To address these challenges, DGA joined the European research project MobiSpaces, funded by the EU’s Horizon Europe programme. Over the last three years, MobiSpaces has developed technologies for data governance, analytics, and edge computing across mobility domains, from urban traffic to the maritime environment. DGA collaborated with the Austrian Institute of Technology (AIT) in the “CrowdSeaMapping” use case, which investigates how depth data collected from ordinary vessels can be integrated into modern workflows by using machine learning models to automatically identify errors and anomalies in crowdsourced depth data.
Leveraging the crowd
Perhaps the most widely recognized example of successful crowdsourcing is Wikipedia. In the geospatial domain, projects such as OpenStreetMap have mapped large parts of the world through voluntary contributions, sometimes surpassing the detail available in official national or commercial datasets.
A comparable approach has emerged in the marine sector, formalised by the International Hydrographic Organization (IHO) under the term Crowdsourced Bathymetry (CSB). Most modern vessels are already equipped with reliable echo sounders and GNSS systems, yet these measurements are typically used only for real-time navigation and are not stored. The main limitation is the absence of a dedicated system for capturing and transmitting this data. High-speed links such as 4G and 5G are confined to coastal waters, while satellite bandwidth remains costly. Consequently, any practical data collector must be capable of storing raw GNSS and echo sounder data onboard, deferring transmission until an affordable broadband connection becomes available. To ensure regulatory compliance, the data collector is geofenced to restrict collection to the Danish Exclusive Economic Zone (EEZ).
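To make the store-and-forward and geofencing requirements concrete, the sketch below shows one way such a collector could be structured in Python. It is purely illustrative, not the actual implementation: the EEZ boundary file, the SQLite buffer, and all function names are assumptions, and the point-in-polygon test relies on the shapely library.

```python
# Illustrative geofenced store-and-forward collector (not the production system).
import json
import sqlite3
from shapely.geometry import Point, shape

with open("danish_eez.geojson") as f:          # hypothetical EEZ boundary file
    EEZ = shape(json.load(f)["features"][0]["geometry"])

db = sqlite3.connect("csb_buffer.db")          # onboard buffer for deferred upload
db.execute("CREATE TABLE IF NOT EXISTS soundings "
           "(ts TEXT, lat REAL, lon REAL, depth_m REAL, uploaded INTEGER DEFAULT 0)")

def record_sounding(ts, lat, lon, depth_m):
    """Store a sounding only if the GNSS fix lies inside the Danish EEZ."""
    if EEZ.contains(Point(lon, lat)):          # geofence check (lon, lat order)
        db.execute("INSERT INTO soundings (ts, lat, lon, depth_m) VALUES (?,?,?,?)",
                   (ts, lat, lon, depth_m))
        db.commit()

def pending_batches(limit=1000):
    """Return buffered rows for transmission once a cheap link is available."""
    return db.execute("SELECT rowid, ts, lat, lon, depth_m FROM soundings "
                      "WHERE uploaded = 0 LIMIT ?", (limit,)).fetchall()
```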
Proposed architecture
Crowdsourcing, however, poses its own problems. First, data collection is not supervised by a qualified surveyor who can ensure optimal system operation. Second, as the crowd grows, so does the amount of data that needs to be transmitted and processed, and data cleaning is labour- and time-consuming. DGA and AIT therefore jointly experimented with federated learning to handle the processing on the data collector itself. This not only reduces the amount of data that needs to be transmitted but also frees up resources, as most of the data cleaning operations are handled directly at the source, i.e. the edge nodes of the federated learning system (Fig. 1).
Fig 1: Federated learning data pipeline
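As a rough illustration of the federated principle in Fig. 1, the toy example below lets each edge node refine a local parameter on its own soundings and ship only that parameter to an aggregator, which combines the updates weighted by sample count (a FedAvg-style step). This is a conceptual sketch under our own simplifying assumptions, not the MobiSpaces code; only model parameters travel, never the raw CSB records.

```python
# Toy federated-averaging round: edge nodes train locally, the server aggregates.
import numpy as np

def local_update(weights, local_depths, lr=0.01, epochs=5):
    """Toy local training: nudge a depth estimate toward the local soundings."""
    w = weights.copy()
    for _ in range(epochs):
        grad = w - np.mean(local_depths)       # gradient of 0.5 * (w - mean)^2
        w -= lr * grad
    return w, len(local_depths)

def federated_average(updates):
    """Aggregate edge updates weighted by the number of local samples."""
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

# Example round with three vessels contributing different amounts of data.
global_w = np.array([20.0])                    # prior depth estimate (m)
updates = [local_update(global_w, np.array(d))
           for d in ([19.5, 19.7, 19.6], [20.4, 20.2], [19.9])]
global_w = federated_average(updates)          # new shared model parameter
```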
Raw CSB data is prone to artefacts such as false bottoms from double returns, spikes from aeration or cavitation, offsets from unmodelled draught or tides, and occasional timing or sensor errors. Manual data cleaning would normally be necessary but is infeasible at this scale, so the CrowdSeaMapping approach proposes that data collectors employ an onboard AI model, MapFed, which allows for continuous modelling of the seafloor and the detection of anomalies.
MapFed learns the expected depth distribution for each location using an adaptive prototype approach. The model is initialized from an existing bathymetric grid – in this case, the Danish Depth Model – and can be continuously trained with incoming survey and/or CSB data. Measurements that deviate significantly from the learned distribution are flagged locally. Flagged points are then submitted for expert review. Hydrographers assess whether these represent true artefacts to be discarded or valid deviations that should be assimilated into updated bathymetric grids. This workflow creates a feedback loop: domain expertise continuously improves both the anomaly detection model and the underlying bathymetric reference.
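The snippet below sketches our reading of this per-cell idea: each grid cell keeps a running estimate of the expected depth, seeded from the Danish Depth Model, and flags soundings that deviate by more than a few standard deviations. The threshold, the update rule, and all names are assumptions for illustration, not the actual MapFed model.

```python
# Per-cell "expected depth" prototype with a simple running mean/variance update.
from dataclasses import dataclass

@dataclass
class CellPrototype:
    mean: float        # expected depth (m), initialised from the DDM grid
    var: float = 1.0   # running variance of accepted soundings
    n: int = 1

    def flag_or_update(self, depth, k=4.0, max_n=500):
        """Return True if the sounding looks anomalous, else assimilate it."""
        z = abs(depth - self.mean) / (self.var ** 0.5)
        if z > k:
            return True                         # send to hydrographer review
        self.n = min(self.n + 1, max_n)         # cap n so the cell stays adaptive
        delta = depth - self.mean
        self.mean += delta / self.n             # running-mean update
        self.var += (delta * (depth - self.mean) - self.var) / self.n
        return False

cell = CellPrototype(mean=23.4)                 # seeded from the DDM value
cell.flag_or_update(23.1)                       # plausible sounding -> assimilated
cell.flag_or_update(2.8)                        # likely false bottom -> flagged
```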
Tests and Results
Several components of the proposed system were evaluated both at sea and in controlled environments. The data collector was installed aboard a vessel and underwent extensive field testing, demonstrating reliable performance under operational conditions. In parallel, the MapFed anomaly detection model and the federated learning setup were validated in a laboratory environment, confirming their technical feasibility and potential for reducing data storage and transfer requirements.
Although full integration of hardware and software components is still required before the system can be considered commercially deployable, the trials have clearly demonstrated the feasibility of the approach and its potential to deliver scalable crowdsourced bathymetry.
Hardware
The data collector was developed by the Danish company Sternula and consists of a hardware interface that listens to a ship’s NMEA network, which connects the ship’s sensors such as the echo sounder and GNSS receiver. The data collector also carries a Raspberry Pi compute module that runs the MapFed model (Fig. 2).
Fig 2: Edge device installed aboard Dana.
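For illustration, the fragment below shows how depth and position might be pulled off an NMEA 0183 stream exposed over TCP, pairing each DPT depth sentence with the most recent GGA fix. The gateway address, port, and sentence handling are assumptions; the production collector developed by Sternula may work differently.

```python
# Minimal NMEA 0183 listener: pair DPT depths with the latest GGA position fix.
import socket

def nmea_lat_lon(fields):
    """Convert GGA ddmm.mmmm / dddmm.mmmm fields to signed decimal degrees."""
    lat = int(fields[2][:2]) + float(fields[2][2:]) / 60.0
    lon = int(fields[4][:3]) + float(fields[4][3:]) / 60.0
    if fields[3] == "S": lat = -lat
    if fields[5] == "W": lon = -lon
    return lat, lon

def read_stream(host="192.168.1.10", port=10110):   # assumed NMEA-over-TCP gateway
    sock = socket.create_connection((host, port))
    pos = None
    for line in sock.makefile():
        fields = line.strip().split("*")[0].split(",")   # drop checksum, split
        sentence = fields[0]
        if sentence.endswith("GGA") and fields[2]:
            pos = nmea_lat_lon(fields)                   # latest GNSS fix
        elif sentence.endswith("DPT") and fields[1] and pos:
            yield pos, float(fields[1])                  # (lat, lon), depth in metres
```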
First deployment (2023)
Fig 3: The track line of the first sea trial. Background: Danish Depth Model v2
The system was first tested in the summer of 2023 on board the research vessel Dana IV (operated by DTU Aqua). During this trial, the data collector operated continuously and without failure for 37 days (Fig. 3). The collected CSB data proved immediately useful in validating two incoming but conflicting surveys of the Skagerrak. In addition, the results were later incorporated into the second version of the Danish Depth Model (DDM), released in August 2024, demonstrating that cleaned CSB data provided better input than interpolated estimates. While the coverage was limited – approximately 8,500 grid cells of 50 × 50 m, corresponding to ~21 km² – this represented a significant first step for DGA.
Second deployment (2024–2025)
Fig 4: Track line from the second sea trial in 2024–2025. Background: Danish Depth Model v2
A second, longer deployment took place between April 2024 and March 2025. Over this period, the system again proved robust, operating for nearly a year without intervention. During this test, depth data covering large areas of the North Sea was collected, which can potentially be used in future versions of the DDM.
Data Assessment
To assess MapFed’s performance, data from this second deployment was manually cleaned by a hydrographic expert and compared against the model’s output. The comparison showed strong agreement between the MapFed model and expert judgment, on over 85% of the records, even though the model had to be trained on the public DDM, which has only 50 m resolution and depth values of varying reliability, ranging from high-quality multibeam surveys to interpolated estimates from historical lead-line measurements. To avoid propagating uncertainty, interpolated values were excluded during training, which left gaps in areas where no suitable reference data existed. Unsurprisingly, MapFed struggled most in areas with lower DDM reliability, since the absence of reliable training data led to misclassifications.
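For readers who want to reproduce this kind of comparison on their own data, a minimal version could look like the following; the CSV file and column names are invented for the example and do not reflect DGA’s internal formats.

```python
# Compare the model's accept/flag decision with the expert's label per record.
import pandas as pd

df = pd.read_csv("deployment2_labels.csv")            # hypothetical review export
agree = (df["mapfed_flagged"] == df["expert_flagged"]).mean()
print(f"Agreement with expert cleaning: {agree:.1%} of {len(df)} records")

# A confusion table shows where disagreements concentrate, e.g. in cells where
# the DDM offered no reliable training reference.
print(pd.crosstab(df["expert_flagged"], df["mapfed_flagged"],
                  rownames=["expert"], colnames=["mapfed"]))
```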