Downloading Water Column Sonar Data from AWS

This tutorial covers how to download water column sonar data stored in an Amazon Web Services (AWS) S3 bucket using S3 file paths provided by the map viewer data discovery portal. Two methods for downloading these files will be demonstrated.

Setup

Parse JSON data

From the map viewer, you can access a JSON-formatted listing of S3 file paths for a selected dataset.

[Figure: s3_downloads_1of2.png]

[Figure: s3_downloads_2of2.png]

Read in and parse the JSON data from the resulting URL to extract the file paths needed to download the files.
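A minimal sketch of this step is shown here using only the Python standard library. The exact layout of the JSON listing is not reproduced in this tutorial, so the sketch assumes only that the file paths appear somewhere in the document as strings beginning with "s3://". The helper names fetch_json and collect_s3_paths are hypothetical, and the URL must be copied from the map viewer.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=30):
    """Download a URL and decode its body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_s3_paths(node):
    """Recursively gather every string that starts with 's3://'.

    Assumption: the listing stores file paths as 's3://...' strings.
    Walking the whole structure avoids depending on specific field names.
    """
    if isinstance(node, str):
        return [node] if node.startswith("s3://") else []
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return []
    paths = []
    for child in children:
        paths.extend(collect_s3_paths(child))
    return paths


# Usage (the URL comes from the map viewer and is not filled in here):
# listing = fetch_json(listing_url)
# s3_paths = collect_s3_paths(listing)
```

If the real listing uses a known field for the paths, reading that field directly would be simpler than the recursive walk.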

Download data

Data files are stored in an Amazon Web Services S3 bucket and are accessible for immediate download using a variety of tools. This tutorial covers two methods: using the boto3 library, and using Python's subprocess module to run AWS CLI commands.

Method 1: Using boto3

The boto3 library provides an object-oriented, well-documented interface to S3. We can configure a boto3 resource to access the bucket "noaa-wcsd-pds" as an anonymous user, using low-level settings from botocore.
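One way to set up anonymous access is sketched here. It passes botocore's UNSIGNED marker as the signature version so that requests are sent without credentials. The helper name anonymous_bucket is hypothetical, and boto3 is assumed to be installed before the helper is called.

```python
BUCKET_NAME = "noaa-wcsd-pds"


def anonymous_bucket(name=BUCKET_NAME):
    """Return a boto3 Bucket resource whose requests are sent unsigned."""
    # Imported inside the function so the helper can be defined
    # even in an environment where boto3 is not yet installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.client import Config

    # signature_version=UNSIGNED tells botocore to skip request signing,
    # which is what allows access without AWS credentials.
    s3 = boto3.resource("s3", config=Config(signature_version=UNSIGNED))
    return s3.Bucket(name)


# Usage:
# bucket = anonymous_bucket()
```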

Let's parse the JSON data for files and store them in a list to drive data downloading. We'll pull file names and object keys from the JSON.

The object key looks like:
'data/raw/Okeanos_Explorer/EX1709/EM302/0352_20171031_101728_EX1709_MB.wcd'

The filename:
'0352_20171031_101728_EX1709_MB.wcd'
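The split into object key and filename can be sketched as below, assuming each path in the JSON has the form "s3://<bucket>/<object key>". The function name split_s3_path is hypothetical.

```python
def split_s3_path(s3_path):
    """Split 's3://bucket/some/key.ext' into (bucket, key, filename)."""
    prefix = "s3://"
    if not s3_path.startswith(prefix):
        raise ValueError(f"not an S3 path: {s3_path!r}")
    # Everything up to the first '/' is the bucket; the rest is the key.
    bucket, _, key = s3_path[len(prefix):].partition("/")
    # The filename is the last component of the key.
    filename = key.rsplit("/", 1)[-1]
    return bucket, key, filename


# Usage, given a list of paths called s3_paths:
# files = [split_s3_path(p)[1:] for p in s3_paths]  # (key, filename) pairs
```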

Some entries in the listing may refer to tarballs. Because acoustic data in the bucket are stored as individual files, we'll filter the bucket for the files associated with any tarball. Finally, we'll pull the README files, which contain essential metadata about the datasets the data belong to.
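Filtering the bucket can be sketched with the objects.filter(Prefix=...) collection on a boto3 Bucket resource. The helper name find_keys is hypothetical. The example in the usage comment assumes that README files sit under the cruise prefix and have "README" in their file name, which should be checked against the bucket documentation.

```python
def find_keys(bucket, prefix, contains=""):
    """List object keys under `prefix` whose file name contains `contains`.

    `bucket` is expected to behave like a boto3 Bucket resource, i.e. to
    expose `bucket.objects.filter(Prefix=...)` yielding objects with `.key`.
    """
    matches = []
    for obj in bucket.objects.filter(Prefix=prefix):
        filename = obj.key.rsplit("/", 1)[-1]
        if contains in filename:
            matches.append(obj.key)
    return matches


# Usage (prefix and name pattern are assumptions, not confirmed layout):
# readme_keys = find_keys(bucket, "data/raw/Okeanos_Explorer/EX1709/", "README")
```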

Now let's download the data. Loop through the list of file information we created in the previous step and use boto3 to download the files.
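The download loop might look like the sketch below. It assumes the file list holds (object key, filename) pairs and that bucket is a boto3 Bucket resource configured for anonymous access. The function name download_files and the destination directory are hypothetical.

```python
import os


def download_files(bucket, items, dest_dir="."):
    """Download each (key, filename) pair into dest_dir; return local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for key, filename in items:
        local_path = os.path.join(dest_dir, filename)
        # Skip files that already exist so the loop can be re-run safely.
        if not os.path.exists(local_path):
            bucket.download_file(key, local_path)
        saved.append(local_path)
    return saved


# Usage:
# download_files(bucket, files, dest_dir="wcsd_data")
```

Skipping existing files is a design choice for interrupted runs. It does not detect partially written files, so remove any incomplete file before re-running.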

Method 2: Using AWS CLI

AWS CLI is a tool that allows interaction with an S3 bucket from the command line. To get started, first install the tool. Note that there are two versions of AWS CLI available with different installation requirements. Version 1 requires Python 3.6+ on your system. Version 2 requires an AWS account. Once installed, use Python's subprocess module to run AWS CLI commands.
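A minimal sketch of calling the CLI from Python follows. It assumes the aws executable is on your PATH, and it uses the CLI's --no-sign-request option so that the request is sent without credentials. The helper names build_cp_command and download_with_cli are hypothetical.

```python
import os
import subprocess


def build_cp_command(bucket, key, dest_dir="."):
    """Build an `aws s3 cp` command that copies one object into dest_dir."""
    # A trailing separator makes the CLI treat the destination as a directory.
    dest = os.path.join(dest_dir, "")
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", dest, "--no-sign-request"]


def download_with_cli(bucket, key, dest_dir="."):
    """Run the copy command; raises CalledProcessError if the CLI fails."""
    os.makedirs(dest_dir, exist_ok=True)
    subprocess.run(build_cp_command(bucket, key, dest_dir), check=True)


# Usage:
# download_with_cli(
#     "noaa-wcsd-pds",
#     "data/raw/Okeanos_Explorer/EX1709/EM302/0352_20171031_101728_EX1709_MB.wcd",
#     dest_dir="wcsd_data",
# )
```

Passing the command as a list rather than a single shell string avoids quoting problems with unusual characters in object keys.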

Additional Resources

Documentation on the bucket structure, along with tools and tutorials for data processing, is listed below.

AWS Registry Page: https://registry.opendata.aws/ncei-wcsd-archive/

Tutorials: https://cires.gitbook.io/ncei-wcsd-archive/

Contact: wcd.info@noaa.gov