Yesterday I was asked if we had some (very) old files from a project. They were originally generated by one of our freelancers who, at the end of the project, had given us a backup of his external drive for archiving.

My go-to for this kind of long-term storage is AWS S3 or Glacier. I’d backed up the drive to a bucket and had forgotten about it until now.

The aws CLI is powerful but inscrutable, given that it can do anything the web console can, for any AWS service.

The files were from Sibelius (.sib) and for Clarinet, Flute and Sax. Cue some googling about the aws s3api list-objects command.

You can combine several search terms in the --query parameter, which takes a JMESPath expression, to limit the results of the list-objects call. Searches are case sensitive, and I wasn’t sure how the files would have been named originally, so I went for:

--query "Contents[?(contains(Key,'larinet') || contains(Key,'lute') || contains(Key,'ax')) && contains(Key,'.sib')]"

This gives me the best chance of finding Clarinet or clarinet, etc. (no good if they’re named CLARINET, but what kind of monster would do that?).
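If the monster scenario is a worry, one option is to list every key under the prefix and do a case-insensitive match locally instead. This is a minimal sketch, not what I actually ran: filter_sib_keys is a made-up helper name, it assumes keys contain no newlines, and it is slightly stricter than the query above (it wants the instrument name before a trailing .sib).

```shell
# Hypothetical helper: reads one S3 key per line on stdin and keeps
# Sibelius files for the three instruments, ignoring case.
filter_sib_keys() {
  grep -iE '(clarinet|flute|sax).*\.sib$'
}

# Usage (bucket and prefix are the placeholders used in this post):
# aws s3api list-objects --bucket "my-bucket-name" --prefix 'path/to/files/' \
#   --query 'Contents[].Key' --output json | jq -r '.[]' | filter_sib_keys
```

The trade-off is that every key under the prefix comes back from S3 before filtering, though --query is applied on the client side anyway.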

All I need from the results is the path (the Key) of each S3 object for the get-object call. Here’s the full command, with the bucket and a prefix to narrow the search:

aws s3api list-objects --bucket "my-bucket-name" --prefix 'path/to/files/' --query "Contents[?(contains(Key,'larinet') || contains(Key,'lute') || contains(Key,'ax')) && contains(Key,'.sib')]"

This gives me an array of JSON objects like:

[
    {
        "Key": "path/to/files/Project backup/Clarinets/example.sib",
        "LastModified": "2017-09-18T21:01:19+00:00",
        "ETag": "\"e9abc61fab413ec5de1126e7ab48ac38\"",
        "Size": 226180,
        "StorageClass": "STANDARD",
        "Owner": {
            "DisplayName": "owner-name",
            "ID": "d9fc4a52c54711ec9d640242ac120002f54ce640c54711ec9d640242ac120002"
        }
    }
]
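Since only the Key is needed, the query can also project just that field by appending .Key to the filter expression. A sketch under that assumption; list_sib_keys is a made-up wrapper name, and its two arguments are the bucket and prefix:

```shell
# Same filter as before, with a trailing .Key projection so the result
# is a JSON array of key strings rather than full objects.
sib_query="Contents[?(contains(Key,'larinet') || contains(Key,'lute') || contains(Key,'ax')) && contains(Key,'.sib')].Key"

# Hypothetical wrapper: $1 is the bucket name, $2 is the key prefix.
list_sib_keys() {
  aws s3api list-objects --bucket "$1" --prefix "$2" \
    --query "$sib_query" --output json
}

# Usage:
# list_sib_keys "my-bucket-name" 'path/to/files/' > keys.json
```

With this shape, any later jq step would read the strings directly with '.[]' rather than '.[] | .Key'.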

We can then loop through these, downloading the files. Instead of just dumping them all in one directory, let’s replicate the path structure they’ve been saved to:

# Get the list of objects to download - note that the Key search is case sensitive
# hence using larinet instead of clarinet or Clarinet, etc
aws s3api list-objects --bucket "my-bucket-name" --prefix 'path/to/files/' --query "Contents[?(contains(Key,'larinet') || contains(Key,'lute') || contains(Key,'ax')) && contains(Key,'.sib')]" > files.json

# loop through the list, grabbing each object, putting it in the relevant folder
jq -r '.[] | .Key' files.json | while IFS= read -r key
  do dir="$(dirname "$key")"
     file="$(basename "$key")"
     # create the local folder, then download the object into it
     mkdir -p "$dir"
     aws s3api get-object --bucket "my-bucket-name" --key "$key" "${dir}/${file}"
  done
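To check the path handling before downloading anything, the same dirname/basename split can be run on its own. A small sketch: preview_targets is a hypothetical name, it reads one key per line on stdin, and it only prints, without calling AWS.

```shell
# Hypothetical helper: for each S3 key on stdin, print the local
# directory and file name the download loop would use.
preview_targets() {
  while IFS= read -r key; do
    dir="$(dirname "$key")"
    file="$(basename "$key")"
    printf 'dir=%s file=%s\n' "$dir" "$file"
  done
}

# Usage:
# jq -r '.[] | .Key' files.json | preview_targets
```

Reading with IFS= and -r keeps spaces and backslashes in keys such as "Project backup" intact.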