4 A 'check list' for selecting global ocean asset data from the Inventory


This Chapter guides account compilers through a stepwise approach to using the Inventory to identify and assess the suitability of different global datasets. The stepwise approach works through the columns of the Inventory from left to right:

Step 1: Identify global datasets that are relevant to ocean asset account components and asset type or condition of interest (columns B-E in the Excel Inventory).

Step 2: Check datasets against the quality assessment criteria to help inform the selection of global datasets that are suitable for the national context and ocean accounting priorities (columns F-AA in the Excel Inventory).

Step 2 provides a ‘check list’ of key considerations to keep in mind when deciding on suitable global data to fill national ocean asset data gaps. If documented, these considerations will provide important justification for decisions on global datasets used in national ocean asset accounts. This documentation will also support data quality assurance processes for account compilation. The Inventory is purposefully structured to support the consideration of criteria that are central to many data quality assurance processes implemented by national statistical offices: relevance, interpretability, institutional environment, accuracy and accessibility. A ‘check list’ template is provided in Annex III, which can be used to record any considerations on the data quality criteria in a structured way.

Whilst the assessment steps and ‘check list’ follow the structure of the Inventory, they are not intended to be prescriptive. The different criteria do not have to be considered in any particular order, or comprehensively. Users may choose to go through all criteria or select specific criteria that are particularly relevant to their interests and needs. Which criteria are applied when deciding whether to use a dataset will depend on the national context, ocean accounting priorities and available resources.
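Where compilers keep their ‘check list’ notes digitally, a small structured record can make the documented justification easy to review later. The sketch below is hypothetical: it is not the Annex III template, and the field names and decision labels are illustrative assumptions. It only shows one way to keep notes per criterion and to flag criteria that were not assessed (which, as noted above, is allowed).

```python
from dataclasses import dataclass, field

# The five data quality criteria named in this Chapter.
CRITERIA = (
    "relevance",
    "interpretability",
    "institutional environment",
    "accuracy",
    "accessibility",
)


@dataclass
class ChecklistRecord:
    """Hypothetical record of one dataset's assessment (not the Annex III template)."""

    dataset_name: str
    notes: dict = field(default_factory=dict)  # criterion -> list of recorded considerations
    decision: str = "undecided"  # free text, e.g. "use", "use with caution", "do not use"

    def add_note(self, criterion: str, note: str) -> None:
        """Record a consideration under one of the five criteria."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion!r}")
        self.notes.setdefault(criterion, []).append(note)

    def unassessed(self) -> list:
        """Criteria with no recorded considerations (skipping criteria is allowed)."""
        return [c for c in CRITERIA if c not in self.notes]


record = ChecklistRecord("Global Mangrove Watch")
record.add_note("relevance", "Smaller coastal mangroves (< 0.2 ha) often not captured")
record.decision = "use with caution"
print(record.unassessed())
```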

Step 1: Identify available datasets

The Inventory is organised by account component and ocean asset:

  • ‘Account component’ identifies the components of ocean accounts for which the datasets are relevant (as per Table 2). These include: Ecosystem extent, Ecosystem condition, Individual environmental assets, Carbon, Produced assets, Pressures, and Designated use.

  • ‘Ocean asset (type/condition)’ refers to the type of ocean asset or related conditions that the datasets are relevant for. These include: marine and coastal ecosystem types, marine and coastal condition parameters and indicators, marine and coastal environmental resources (as listed in the GOAP Technical Guidance), carbon parameters related to marine and coastal ecosystems, marine and coastal produced assets, use designations for marine and coastal areas/activities, and parameters and indicators for four types of pressures (water emissions, solid wastes, atmospheric deposition, intensity of use).

Filter options: The ‘How to use’ tab in the Inventory provides two interactive filter menus that allow users to pre-select the datasets shown in the ‘Inventory’ tab. Use these filter options to find datasets relevant to the account component and asset type or condition for which data is needed. Once a pre-selection is made, Columns D and E in the ‘Inventory’ tab provide further information about:

  • The official ‘Dataset name’.

  • ‘Data measure and units’.
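For compilers who prefer to work outside Excel, the two filter menus can be mimicked on an exported copy of the ‘Inventory’ tab. This is a minimal sketch assuming the tab has been saved as CSV with the column headers named in the text; the sample rows are illustrative only, and real Inventory cells may contain several values or different wording, which would call for looser matching than the exact comparison used here.

```python
import csv
import io

# Column headers as named in the text; assumes the 'Inventory' tab was saved as CSV.
COMPONENT = "Account component"
ASSET = "Ocean asset (type/condition)"


def filter_inventory(rows, component, asset):
    """Mimic the two filter menus: keep rows matching both selections."""
    return [
        (row["Dataset name"], row["Data measure and units"])
        for row in rows
        if row[COMPONENT].strip() == component and row[ASSET].strip() == asset
    ]


# Illustrative rows only; real entries and wording in the Inventory will differ.
SAMPLE_CSV = """Account component,Ocean asset (type/condition),Dataset name,Data measure and units
Ecosystem extent,Mangroves (coastal),Global Mangrove Watch,Extent (illustrative)
Ecosystem extent,Seagrasses (coastal),Example dataset B,Extent (illustrative)
Carbon,Mangroves (coastal),Example dataset C,Carbon stock (illustrative)
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = filter_inventory(rows, "Ecosystem extent", "Mangroves (coastal)")
print(matches)  # [('Global Mangrove Watch', 'Extent (illustrative)')]
```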

For example, Fiji used global data to develop pilot mangrove ecosystem accounts (Box 2). To find available datasets for the ecosystem extent of mangroves in the Inventory, select ‘Ecosystem extent’ in the ‘Account component’ filter menu and ‘Mangroves (coastal)’ in the ‘Ocean asset’ filter menu.

After completing a search, clear the selections in both filter menus before starting a new search.

Box 2. Fiji used Global Mangrove Watch to develop pilot mangrove ecosystem accounts

Fiji’s ocean accounts pilot study used the Global Mangrove Watch dataset. As shown above, this dataset would be identified in the Inventory by selecting ‘Ecosystem extent’ as the ‘Account component’ and ‘Mangroves (coastal)’ as the ‘Ocean asset’. Fiji’s pilot ocean accounts were led by Aberystwyth University and Solo Earth Observation and generated preliminary mangrove ecosystem accounts for 10 of 14 provinces within Fiji. The Global Mangrove Watch dataset provided spatial coverage of the Fijian islands and time-series data (1996, 2007 – 2010, 2015, 2016), which facilitated alignment with other environmental, economic and social mangrove-related datasets in Fiji. The dataset was used to produce ecosystem extent accounts for mangroves, calculating the change in cover per province between 2008 and 2016 (inclusive). The extent of mangroves was integrated with modelled indicators of condition, such as tree height and above-ground biomass, to estimate ecosystem service supply.

To assess the suitability of the Global Mangrove Watch data for use in Fiji’s pilot ocean accounts, the dataset was compared with available regional and national datasets on mangrove extent. The comparison found both strengths and limitations for the use of the global dataset:

[-] At a higher resolution, mangroves in urban areas were poorly represented within the Global Mangrove Watch dataset. Further, smaller coastal mangroves (< 0.2 ha) on exposed coasts were often not captured.

[+] The strength of Global Mangrove Watch was that it provided greater resolution and accuracy in the coverage of deltaic mangroves, relative to national datasets.

The experience from Fiji shows that, for the future development of Global Mangrove Watch, it would be useful to consider how multiple datasets could be best integrated through ground truthing, to increase the accuracy and quality of the dataset for use at the national and sub-national scale.
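The per-province change in cover described in Box 2 is, at its core, simple arithmetic once mangrove extent has been summarised per province for the opening and closing years (a GIS overlay step not shown here). The sketch below uses hypothetical figures, not Fiji’s results, and reports only net change; a full extent account would also separate additions from reductions.

```python
import math


def extent_change(opening_ha, closing_ha):
    """Opening/closing extent and net change per area unit (e.g. province)."""
    table = {}
    for unit, opening in opening_ha.items():
        closing = closing_ha[unit]
        net = closing - opening
        # Percentage change is undefined where there was no opening extent
        pct = 100.0 * net / opening if opening else math.nan
        table[unit] = {"opening": opening, "closing": closing,
                       "net_change": net, "pct_change": pct}
    return table


# Hypothetical figures for two provinces, not Fiji's results.
opening_2008 = {"Province A": 1200.0, "Province B": 450.0}
closing_2016 = {"Province A": 1150.0, "Province B": 470.0}

for unit, row in extent_change(opening_2008, closing_2016).items():
    print(f"{unit}: {row['net_change']:+.1f} ha ({row['pct_change']:+.1f}%)")
```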

Step 2: Check available datasets against the quality assessment criteria

Structure of this section:

x) DATA QUALITY ASSESSMENT CRITERIA in the Inventory (relevance, interpretability, institutional environment, accuracy and accessibility)

Data quality assessment action.

Inventory columns to check:

  • Tells you which columns of the Inventory provide relevant information.

Things to consider:

  • Highlights points to think about when making a decision on using a dataset.

Country experience:

Note: The criteria, assessment action and Inventory columns to check follow the structure of the Excel Inventory from left to right.

a) RELEVANCE of the datasets for the national ocean accounting priorities

Assess whether the spatial resolution and coverage of the available datasets are suitable.

Inventory columns to check:

  • ‘Spatial resolution’ refers to the spatial resolution of the available data.

  • ‘Geographic constraints’ refers to whether the available data has any geographic constraints with regard to global coverage.

Things to consider:

  • What spatial resolution is needed? This will depend on the question that the accounts are meant to address. It is advisable to consider spatial resolution together with other criteria: the datasets with the highest resolution are not necessarily the best suited to the national context, and high-resolution data may involve trade-offs and computational challenges. As the experience from Vietnam’s pilot seagrass extent accounts (Box 3) shows, there may be country-specific reasons for selecting datasets with lower resolution.

  • Some datasets that are described as ‘global’ may not provide full or equal global coverage: certain areas may not be covered, and/or the quality of the data may vary between geographic areas (e.g. data quality may be poor for specific locations). Where geographic constraints exist, check whether they apply to your accounting area. The example from Fiji (Box 2) and experiences from Canada (Box 4) illustrate the importance of cross-referencing global data with national data to critically evaluate its fitness for purpose for national ocean asset accounting.

  • Where data does not provide the necessary spatial resolution or coverage, are there suitable techniques to downscale or extrapolate that could be employed with the available resources?

  • Are the geographical coverage and spatial resolution consistent with other relevant existing (national) data? Consistency opens up opportunities for further integrated analysis.
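A quick way to reason about whether a given spatial resolution can capture small features is to compare the area of one pixel with the patch sizes of interest. The sketch below does this for square pixels; the "at least four pixels" threshold is an assumed rule of thumb, not a standard, and real detectability also depends on the classification method, mixed pixels and edge effects. The 0.2 ha figure comes from Box 2; no pixel size for Global Mangrove Watch is implied.

```python
def pixel_area_ha(pixel_size_m):
    """Area of one square pixel in hectares (1 ha = 10,000 m²)."""
    return pixel_size_m ** 2 / 10_000


def pixels_per_patch(patch_ha, pixel_size_m):
    """How many pixels a patch of the given area would span."""
    return patch_ha / pixel_area_ha(pixel_size_m)


def likely_resolvable(patch_ha, pixel_size_m, min_pixels=4):
    """Rule of thumb only: treat a patch as mappable if it spans >= min_pixels pixels."""
    return pixels_per_patch(patch_ha, pixel_size_m) >= min_pixels


# A 0.2 ha patch (the patch size mentioned in Box 2) at several pixel sizes
for size_m in (10, 30, 100):
    print(size_m, round(pixels_per_patch(0.2, size_m), 2), likely_resolvable(0.2, size_m))
```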

Box 3. Global ocean data enabled the development of pilot ecosystem accounts for mangroves, seagrasses and coral reefs in the Quảng Ninh region of Vietnam

In 2019, Vietnam developed its first pilot ecosystem extent accounts for mangroves, seagrasses and coral reefs, as well as condition accounts for water quality. A national inventory on mangroves and their extent was available to support the pilot mangrove extent accounts. However, Vietnam did not have spatially explicit data on the location and extent of seagrasses and corals. To fill this gap in the national data inventories, the pilot study team supplemented the available national information with spatial data from global datasets for seagrasses and corals. The Global Distribution of Seagrasses and Global Distribution of Coral Reefs datasets, provided by UNEP-WCMC on the Ocean Data Viewer, were used to calculate the proportion of seagrass and coral for each marine basic spatial unit in the Quảng Ninh pilot accounting area. This enabled the initial development of a more comprehensive set of pilot ocean ecosystem extent accounts in Vietnam.

Vietnam updated its initial ocean ecosystem accounts in 2021 with a second pilot study for the Quảng Ninh region. This time, satellite imagery data from Landsat 8 was trialled for the seagrass ecosystem accounts. For the coral ecosystem accounts, the study team used the Allen Coral Atlas provided by the Allen Coral Atlas Partnership and Arizona State University. It is also proposed to use this satellite imagery to compile an ecosystem condition account for mangroves, using the satellite data to derive condition variables such as canopy cover or fragmentation.
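Box 3 does not describe how the proportion of seagrass and coral per marine basic spatial unit (BSU) was calculated; in a GIS this is often done by overlaying ecosystem polygons with the BSU layer. The sketch below is a simplified raster alternative, assuming a presence grid already aligned with a grid of BSU labels and equal-area cells; the grids shown are hypothetical.

```python
from collections import defaultdict


def proportion_per_bsu(bsu_grid, presence_grid):
    """Share of equal-area grid cells in each BSU where the ecosystem is present."""
    total = defaultdict(int)
    present = defaultdict(int)
    for bsu_row, presence_row in zip(bsu_grid, presence_grid):
        for bsu, is_present in zip(bsu_row, presence_row):
            if bsu is None:  # cell outside the accounting area
                continue
            total[bsu] += 1
            if is_present:
                present[bsu] += 1
    return {bsu: present[bsu] / total[bsu] for bsu in total}


# Hypothetical 3 x 3 example: BSU labels and rasterised seagrass presence (1 = present)
bsu_ids = [["A", "A", "B"],
           ["A", "A", "B"],
           [None, "B", "B"]]
seagrass = [[1, 0, 0],
            [1, 1, 0],
            [1, 1, 0]]
print(proportion_per_bsu(bsu_ids, seagrass))  # {'A': 0.75, 'B': 0.25}
```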

Assess if the temporal resolution and likelihood of future production of the datasets are suitable.

Inventory columns to check:

  • ‘Time series availability’ refers to whether data observations are available for different periods or points in time (e.g. for different years, decades, five-year periods). The time series may be available in one dataset (see e.g. GMCSD-2 Global Mangrove Carbon, 2000-2012) or in multiple, comparable editions of the same data product (see e.g. Global Ocean Gridded L4 Sea Surface Height and Derived Variables; University of Hamburg-Sea level SSH from C3S)[8].

  • ‘First observation’ refers to the first year in the time series for which data observations are available (where time series exist).

  • ‘Latest observation’ refers to the latest year in the time series for which data observations are available (where time series exist), or to the year that the data are intended to represent (where time series do not exist).

  • ‘Publication date of latest observation’ refers to the year in which the dataset was published or released.

  • ‘Likelihood of future production’ refers to the likelihood that further editions of the dataset will be produced in the future. This has been assessed based on available indicative information and should not be taken as a guarantee of future production.
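The time-related columns above can be checked against an accounting period in a few lines. This is a minimal sketch with a hypothetical dataset record; the class and function names are illustrative. Note that it only tests whether the observation span brackets the period, not whether there is an observation in every year of it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemporalMetadata:
    """Values as they might be read from the Inventory's time-related columns."""
    first_observation: Optional[int]  # None where no time series exists
    latest_observation: int
    publication_year: int


def spans_period(meta, start, end):
    """True if the observation span brackets the accounting period.

    This does not guarantee an observation in every year of the period.
    """
    first = meta.first_observation
    if first is None:
        first = meta.latest_observation
    return first <= start and meta.latest_observation >= end


def publication_lag(meta):
    """Years between the latest observation and its publication."""
    return meta.publication_year - meta.latest_observation


# Hypothetical dataset, not a real Inventory entry
meta = TemporalMetadata(first_observation=1996, latest_observation=2016, publication_year=2019)
print(spans_period(meta, 2008, 2016), spans_period(meta, 2010, 2020), publication_lag(meta))
```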

Things to consider:

  • Are time series available? If yes, do they align with your accounting periods of interest?

  • Can the datasets be used in combination, or in combination with national data, to cover the accounting period of interest?

  • Can data be extrapolated to the accounting period?

  • Is there flexibility to adapt the accounting period to available data to enable initial account development?

  • Accounts are intended to be updated regularly. The likelihood of a dataset being updated or produced in the future will determine whether it can be used consistently for account production moving forward or whether it will have to be replaced by other data.

  • Where future production or updates are unlikely, e.g. where a dataset is a ‘one off’ product or a time series is not likely to be continued, a dataset can still be useful for the initial development and testing of pilot accounts or for creating a baseline. For example, ‘one off’ global maps of seagrasses and corals were used as a starting point for developing Vietnam’s first pilot ocean ecosystem accounts for Quảng Ninh (Box 5). It may also be possible to use such data to create time series, for example as training data for classifying remote sensing observations to identify local/national ocean assets, or by integrating this baseline data with other time-series data (e.g. spatial data on designated use, Box 5).

  • It is not always clear whether a dataset is likely to be produced or updated in the future. Different factors may give an indication of whether future production is likely. For example, where time series data are available across multiple editions of the same data product, this may be an indication that the time series will be continued in the future through production of further editions. Metadata on the INSTITUTIONAL ENVIRONMENT of the dataset may provide further indications for the likelihood of future production (e.g. see ‘Data custodian’ and ‘Global policy relevance’).

  • The gap between the latest observation and its publication date shows how soon after the observation period the data are made available. This is an indicator of the timeliness of the data and should be considered in the context of future account production. For example, a substantial gap (e.g. several years) may mean that data availability constrains the timeliness of future accounts, which could limit their ability to influence decision-making cycles.
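Where observations do not fall exactly on the accounting years, a simple option is linear interpolation between the nearest observation years, or linear extrapolation beyond them. The sketch below assumes a linear trend between (or beyond) observations, which ecosystems rarely follow exactly, so such estimates should be flagged as modelled and extrapolated values treated with particular caution. The observation values are hypothetical.

```python
from bisect import bisect_left


def estimate_value(observations, year):
    """Linear interpolation between the two bracketing observation years, or
    linear extrapolation from the two nearest years outside the observed span."""
    if year in observations:
        return observations[year]
    years = sorted(observations)
    if len(years) < 2:
        raise ValueError("at least two observations are needed")
    i = bisect_left(years, year)
    i = min(max(i, 1), len(years) - 1)  # clamp so (i - 1, i) is a valid pair
    y0, y1 = years[i - 1], years[i]
    v0, v1 = observations[y0], observations[y1]
    return v0 + (v1 - v0) * (year - y0) / (y1 - y0)


# Hypothetical extent observations (ha)
obs = {2010: 100.0, 2015: 90.0, 2016: 88.0}
print(estimate_value(obs, 2012))  # 96.0 (interpolated)
print(estimate_value(obs, 2018))  # 84.0 (extrapolated; treat with caution)
```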

b) INTERPRETABILITY of the datasets and processing steps required

Assess if the format of the data is suitable.

Inventory columns to check:

  • ‘Data format’ refers to the type of the data. This may be vector data (point, polyline, polygon), raster data or data in tabular format. Where spatially explicit ocean ecosystem asset accounts are being developed, these will require suitable georeferenced data formats, such as raster or vector data, or means of creating these formats.

  • ‘File format’ refers to the format in which the dataset is available. Data may be available as Shapefile, TIFF, GeoTIFF, KML, File Geodatabase, ASCII grid, CSV, XLSX, Web Map Service, Web Feature Service, netCDF, GRD98, mpk, or other formats.

Things to consider:

  • Point or polyline data provide information about the location of an ecosystem, but not about ecosystem boundaries or extent. They are therefore less suitable for developing ecosystem extent accounts. However, it might be possible to estimate the extent of ecosystems based on point or polyline data by combining them with other data types (e.g. location of ecosystems, non-spatially explicit data on area in hectares or km²). For example, this was done for Vietnam’s first pilot seagrass and coral accounts (Box 5). Consider also whether the data can be extrapolated to make estimates for a larger area.

  • Is the dataset in a file format that can be used with the available computational resources and technical expertise?

  • Many ocean accounts are developed using Geographical Information System (GIS) techniques and the data formats described above. Further details on the technical terms and features of these data are available via online resources such as the QGIS training materials: https://docs.qgis.org/3.16/en/docs/gentle_gis_introduction/index.html
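One simple way to combine point data with a non-spatially explicit area total is to split the total across spatial units in proportion to the number of presence points in each. This is a sketch under a strong assumption (each point represents an equal share of the area) and is not necessarily the method used in Vietnam’s pilot; all figures are hypothetical.

```python
def allocate_area(total_area_ha, point_counts):
    """Split a non-spatial area total across units in proportion to point counts."""
    n_points = sum(point_counts.values())
    if n_points == 0:
        raise ValueError("no points to allocate the area to")
    return {unit: total_area_ha * count / n_points for unit, count in point_counts.items()}


# Hypothetical: 500 ha reported in the literature, 10 presence points across three BSUs
print(allocate_area(500.0, {"BSU-1": 6, "BSU-2": 3, "BSU-3": 1}))
# {'BSU-1': 300.0, 'BSU-2': 150.0, 'BSU-3': 50.0}
```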

Assess the wider interpretability of the data, including its underlying assumptions, limitations and possible sources of error.

Inventory columns to check:

  • ‘Acquisition method’ refers to how the dataset was produced. The Inventory presents information on whether the dataset provides raw or processed data, and whether the data were generated using in-situ collection, remote sensing or modelling.

  • ‘Metadata availability’ refers to the availability of information that enables the correct interpretation and use of the dataset. The metadata will provide important information to determine suitability of a dataset for a specific national context and what limitations and possible sources of error might need to be considered when using it.

  • ‘Supporting documentation’ refers to the availability of user guides, manuals or similar documents that enable use of the dataset.

Things to consider:

  • Are there any caveats for aggregated products that were produced by combining different datasets? For example, do they include data that may be outdated?

  • Are any sources of error described that should be factored in when using the data? Have other users shared any information on use constraints?

  • For raster or polygon data, consider whether there were any assumptions in how the data were assigned spatially that might result in errors, such as an under- or overestimation of ecosystem extent (see also ‘Data format’). For example, the HELCOM seagrass dataset was composed of presence/absence XY points provided by several partner countries, which were re-sampled to a coarser grid (5 km²). As a result of rasterising point data in this manner, and in the absence of thorough metadata, the estimated seagrass extent was 5 to 8 times higher than estimates from literature values. To correct this overestimation, the extent was limited using factors such as depth and seabed type, based on known parameters for seagrass distribution.

  • What are the assumptions, criteria, parameters used in the production of the dataset?

  • Does the dataset account for ecosystem fragmentation?

  • Is extent based on actual observations or modelled based on assumptions of presence/absence?

  • For remote sensing and modelled data:

    • Was the model used to process satellite data or predict ecosystem extent trained in the accounting area or in an area with comparable conditions? The experience from Vietnam’s second pilot study on seagrass ecosystem accounts shows that satellite data calibrated for different conditions may still be useful for calculating ecosystem extent in the absence of better data. However, in some cases, this may lead to an overestimation (see Box 6) or underestimation of extent.

    • How well has the data been ground truthed? How many data points were used for the validation? Ideally, multiple data points that adequately cover the different locations and environmental conditions of the accounting area should have been used.

    • Where has the data been ground truthed? Has it been tested in locations with similar conditions to those in the accounting area? If not, this might have implications for the validity and accuracy of the data for the specific country context. It may require additional work to validate the data for the accounting area.

  • Is additional work needed and is sufficient technical and computational capacity available to process the data to make it consistent and applicable to the accounting area?

  • In what state are the data needed? 1) Raw data can be processed using local algorithms and models for interpretation targeted to the national context. 2) Pre-processed, analysis-ready data allow streamlined account production but may offer limited scope for localised interpretation. The choice may depend on the priority for national ocean account development (e.g. rapid initial account development or targeted, precise accounts) and/or on the resources and technical and computational capacity available. For example, local classification algorithms specific to the area of interest can be developed for raw satellite imagery. This may be better suited to national or local applications; however, it requires more technical expertise and resources than using pre-processed data.

  • Different satellite imagery may be best for different regions of the world, as illustrated by the experience from Vietnam’s second pilot study for seagrass ecosystem accounts (Box 6). This should also be critically evaluated.
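The HELCOM example above can be illustrated with a toy calculation: counting every coarse cell that contains a presence point as fully covered inflates extent, whereas keeping only the share of each cell with suitable depth and seabed reduces it. The 5 km² cell area follows the text; the depth limit, seabed classes, finer-resolution samples per cell and the cells themselves are hypothetical, and the toy numbers do not reproduce the 5 to 8 times overestimate reported for HELCOM.

```python
def naive_extent(cells, cell_area_km2):
    """Count every cell containing a presence point as fully covered."""
    return sum(cell_area_km2 for cell in cells if cell["presence"])


def constrained_extent(cells, cell_area_km2, max_depth_m, suitable_seabed):
    """Keep only the share of each presence cell with suitable depth and seabed.

    Assumes each coarse cell comes with finer-resolution (depth_m, seabed) samples.
    """
    total = 0.0
    for cell in cells:
        if not cell["presence"]:
            continue
        samples = cell["subcells"]
        suitable = sum(1 for depth_m, seabed in samples
                       if depth_m <= max_depth_m and seabed in suitable_seabed)
        total += cell_area_km2 * suitable / len(samples)
    return total


# Hypothetical cells; the depth limit and seabed classes are illustrative parameters.
cells = [
    {"presence": True, "subcells": [(3, "sand"), (8, "sand"), (15, "sand"), (4, "rock")]},
    {"presence": True, "subcells": [(2, "mud"), (25, "mud")]},
    {"presence": False, "subcells": [(5, "sand"), (6, "sand")]},
]
print(naive_extent(cells, 5.0))                             # 10.0
print(constrained_extent(cells, 5.0, 10, {"sand", "mud"}))  # 5.0
```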

c) INSTITUTIONAL ENVIRONMENT of the datasets’ production

Assess if the institutional environment surrounding the data production is suitable for how the accounts will be used.

Inventory columns to check:

  • ‘Data custodian’ refers to the organisation or institution that looks after and provides the dataset. Data custodians may be national government agencies or institutions (e.g. NOAA), UN agencies or programmes (e.g. UNEP), expert organisations (e.g. UNEP-WCMC), universities, research facilities (e.g. Woods Hole Oceanographic Institution), non-governmental organisations (e.g. The Nature Conservancy), or data service providers for specific sectors (e.g. Axiom EMI).

  • ‘Global policy relevance’ refers to whether the dataset is produced, formally listed, recognised or proposed for an official global policy process such as the Sustainable Development Goals or the post-2020 global biodiversity framework.

  • ‘Authoritativeness’ refers to whether the data producer can be considered to be objective, independent, professional and mandated to collect and provide data. The criterion provides information on whether the dataset has been produced on the basis of scientific/expert peer-review, by a UN agency or programme (e.g. FAO statistics), or by a national government agency (e.g. NOAA).

Things to consider:

  • Information about the institutional environment can be helpful when deciding whether a dataset can be used with confidence or whether additional work is required to validate the data. Consider:

    • Does the data custodian have an official mandate to collect and provide data? For example, is the dataset produced by a national government agency or as part of official UN statistics? Is the dataset linked to an official global policy process?

    • Is the data producer considered to be objective and independent? For example, is the dataset produced by a recognised expert organisation or research institution?

    • Can it be assumed that the dataset has been produced following robust scientific methods and internationally agreed standards? For example, has the dataset been through scientific or expert peer review? Has it been produced as part of official statistics? Has it been produced, formally listed, recognised or proposed to provide indicators for targets under multilateral environmental agreements and associated policy processes (e.g. the Sustainable Development Goals or the post-2020 global biodiversity framework)?

  • If a dataset is linked to an official global policy process, this may give further indication of the likelihood of future production and application of internationally agreed standards.

  • If a dataset is produced by an organisation with a mandate and funding for regular data production, this also provides confidence in future production. One example of such an organisation is the National Aeronautics and Space Administration (NASA).

d) ACCURACY of the datasets

Assess if the accuracy and reliability of the data are suitable for how the accounts will be used.

Inventory columns to check:

  • ‘Quality assurance’ refers to whether documentation on quality assurance and validation is available for the dataset.

  • ‘Errata and known issues’ refers to whether information about corrected errors and/or known issues is available for the dataset. Where information is available, these issues may be highlighted in ‘Errata’ documentation, providing short descriptions of the nature of the issue.

Things to consider: The Inventory identifies whether information and documentation about quality assurance, known errors and issues is made available by the data provider. However, the Inventory does not provide details about quality assurance, validation and ground truthing, sources of error or known issues as the relevance of these will depend on the national context and priorities for ocean account development.

  • Where information about quality assurance and/or errors and issues is available, this provides confidence that accuracy, reliability, validity and consistency have been considered, and data quality standards applied, in the data production.

  • Where available, the information about quality assurance and/or errors and issues should be reviewed before using a dataset to check whether:

    • The applied data quality standards conform with any relevant national data quality standards.

    • Existing validation and ground truthing of the data are relevant for the national context.

    • Any known issues with the global data or their quality affect their use in the accounting area of the national ocean accounts.

    • Additional work is needed to ensure and/or increase the accuracy and reliability for the national context (e.g. ground truthing for the accounting area).

  • Where information about quality assurance and/or errors is NOT available, this does not necessarily mean that the data is inaccurate, unreliable, inconsistent or not valid. However, caution should be applied when using the data. Additional work may be required to do quality assurance checks and validation before the data can be used with confidence. The available metadata, including information about the acquisition method, should be reviewed carefully to identify any relevant methodological assumptions, criteria, parameters or sources of error that might be described there.
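Where compilers carry out their own validation (for example ground truthing in the accounting area), commonly used map-accuracy measures can be computed from paired reference and mapped labels at validation points. The sketch below is a general illustration, not a procedure prescribed by the Inventory; the labels are hypothetical, and with few validation points such estimates are highly uncertain, so points should ideally cover the different locations and conditions of the accounting area.

```python
from collections import Counter


def accuracy_summary(reference, mapped, target):
    """Overall, producer's and user's accuracy from paired validation points."""
    if not reference or len(reference) != len(mapped):
        raise ValueError("need equal-length, non-empty label lists")
    pairs = Counter(zip(reference, mapped))
    correct = sum(n for (ref, pred), n in pairs.items() if ref == pred)
    hits = pairs[(target, target)]
    in_reference = sum(n for (ref, _), n in pairs.items() if ref == target)
    in_map = sum(n for (_, pred), n in pairs.items() if pred == target)
    return {
        "overall": correct / len(reference),
        # Producer's accuracy = 1 - omission error for the target class
        "producers": hits / in_reference if in_reference else float("nan"),
        # User's accuracy = 1 - commission error for the target class
        "users": hits / in_map if in_map else float("nan"),
    }


# Hypothetical ground-truth labels versus the global map at 10 validation points
reference = ["mangrove"] * 5 + ["other"] * 5
mapped = ["mangrove"] * 4 + ["other"] + ["other", "mangrove", "mangrove", "other", "other"]
print(accuracy_summary(reference, mapped, "mangrove"))
```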

e) ACCESSIBILITY of the datasets

Assess if the data are easy to access and use.

Inventory columns to check:

  • ‘Availability online’ refers to whether the dataset is easily accessible online and downloadable (in one or multiple open access formats), or whether the data provider must be contacted for access and/or permission.

Things to consider:

  • Is the data easy to access and download online? Are instructions provided?

  • Do you have the right software and know how to access and use the data in the available formats? (see ‘File format’ under INTERPRETABILITY)

  • Where the data is not available to download, are clear instructions given for how it can be accessed? Is an email address provided? Is there an online request form?

  • In some cases, the link to access the data is at the bottom of the page or on a separate page. The link in the Inventory should guide you directly, or as close as possible to the place on the website where the data can be downloaded.

Assess if the data are available for free and can be used for national ocean accounts without restrictions.

Inventory columns to check:

  • ‘Terms of use’ refers to the terms of use for the data. This includes information on a) whether the data is available for free or requires payment, and b) whether there are any restrictions or requirements for the use of the data (e.g. attributions required; permission required; no derivative products; non-commercial use only; etc.).

Things to consider:

  • Most of the datasets in the Inventory are available for free and with minimal use restrictions (usually requiring attribution).

  • In some cases, use may be restricted. Restrictions are often related to commercial use, which should not affect the use of the data for national ocean account development. For example, data from the World Database on Protected Areas is freely available for non-commercial use, whereas commercial users must pay a fee via the IBAT[10] Alliance.

  • The Inventory currently includes one dataset that is available at a cost: Axiom EMI Oil & Gas and Renewables Data (Global): Offshore Wind Database (Renewables). Fees are more likely to apply for datasets related to specific industries/sectors that are provided by commercial data service providers. In some cases it may be worth considering the use of paid data services, where they exist, to fill specific data gaps.