browniebroke.com

Add cache-control header to an entire S3 Bucket using Boto3

May 01, 2019 · 1 min read

I recently came across a task that seemed pretty generic, but for which I couldn't find an existing solution online: updating the Cache-Control header on every object in an S3 bucket using boto3.

Existing solutions

  • Do it through the AWS Management Console, but I wanted to script it.
  • Do it using Boto 2, but it's no longer installed on our system and I didn't want to reintroduce an outdated dependency.

Boto3 solution

After a bit of fiddling and digging through the documentation, I came up with this pretty simple script:

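import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-public-bucket')
# Only keys starting with 'static' are updated; use bucket.objects.all() for the whole bucket
for summary in bucket.objects.filter(Prefix='static'):
    obj = summary.Object()
    # Copy the object onto itself: S3 metadata can only be changed through a copy
    obj.copy_from(
        CopySource={
            'Bucket': bucket.name,
            'Key': obj.key,
        },
        CacheControl='public,max-age=604800,immutable',
        # REPLACE discards the existing metadata, so carry over the Content-Type
        # and user metadata explicitly (reading them triggers a HEAD request)
        ContentType=obj.content_type,
        Metadata=obj.metadata,
        MetadataDirective='REPLACE',
    )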

The right value depends a lot on your use case; you might want to read the excellent Cache-Control for civilians post, which goes into detail.
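If a single value doesn't fit every file, the header can be chosen per key. Below is a minimal sketch of a hypothetical `cache_control_for` policy: fingerprinted assets (a hex hash in the filename, which is an assumed naming convention) are cached for a year and marked immutable, HTML is revalidated on every request, and everything else gets the 7-day max-age used in the script (604800 s = 7 × 24 × 3600). The regex and the values are illustrative assumptions, not recommendations.

```python
import re

# Hypothetical convention: build tools insert a hex content hash before the
# extension, e.g. "static/app.3f2a9c1b.js"
FINGERPRINTED = re.compile(r"\.[0-9a-f]{8,}\.\w+$")

ONE_WEEK = 7 * 24 * 60 * 60    # 604800 seconds, the max-age used in the script
ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds


def cache_control_for(key):
    """Return a Cache-Control value for an S3 key (illustrative policy only)."""
    if FINGERPRINTED.search(key):
        # The URL changes whenever the content does, so long caching is safe
        return f"public,max-age={ONE_YEAR},immutable"
    if key.endswith(".html"):
        # Make browsers revalidate so new deployments are picked up
        return "no-cache"
    return f"public,max-age={ONE_WEEK}"
```

In the loop, this would replace the hard-coded string, e.g. `CacheControl=cache_control_for(obj.key)`.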

The script relies on the copy_from API, which is a slightly unnatural way to achieve this goal, but it does the trick: S3 doesn't let you edit an object's metadata in place, so copying the object onto itself with new metadata is the only way to change it. Hopefully this helps others.
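For comparison, here is a minimal sketch of the same copy-onto-itself approach using the low-level client API (`list_objects_v2` pagination, `head_object`, `copy_object`). The helper names `copy_kwargs` and `set_cache_control` are hypothetical, and the sketch assumes only the Content-Type and user metadata need carrying over; other headers such as Content-Encoding would have to be copied the same way.

```python
def copy_kwargs(bucket, key, head, cache_control):
    """Build copy_object arguments that rewrite `key` onto itself with a new
    Cache-Control, carrying over Content-Type and user metadata from `head`
    (a head_object response), since REPLACE would otherwise drop them."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "CacheControl": cache_control,
        "ContentType": head["ContentType"],
        "Metadata": head.get("Metadata", {}),
        "MetadataDirective": "REPLACE",
    }


def set_cache_control(client, bucket, prefix, cache_control):
    """Rewrite Cache-Control on every object under `prefix`; return the keys touched."""
    updated = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page with no matching keys
        for item in page.get("Contents", []):
            key = item["Key"]
            head = client.head_object(Bucket=bucket, Key=key)
            client.copy_object(**copy_kwargs(bucket, key, head, cache_control))
            updated.append(key)
    return updated


if __name__ == "__main__":
    import boto3  # imported here so the helpers above can be exercised without AWS

    keys = set_cache_control(
        boto3.client("s3"),
        "my-public-bucket",
        "static",
        "public,max-age=604800,immutable",
    )
    print(f"Updated {len(keys)} objects")
```

Note that a single CopyObject request only handles objects up to 5 GB; larger objects would need a multipart copy.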

Failed attempts along the way:

  • I initially tried the put API, but this actually overwrites the existing file with an empty one, which is not what I wanted.
  • Copying the object onto itself failed without MetadataDirective='REPLACE', because S3 rejects a copy of an object onto itself unless something such as its metadata changes. I was about to give up when I discovered that option.
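To confirm the copy worked and that neither failure mode crept in, each object can be checked with `head_object` afterwards. This is a minimal sketch with hypothetical helpers `check_object` and `find_problems`; treating a zero-byte object as a problem is an assumption (it is what the failed put attempt produced) and would also flag objects that are legitimately empty.

```python
def check_object(key, head, expected_cache_control):
    """Return a list of problems for one object, given its head_object response."""
    problems = []
    if head.get("CacheControl") != expected_cache_control:
        problems.append(f"{key}: Cache-Control is {head.get('CacheControl')!r}")
    if head.get("ContentLength", 0) == 0:
        # An empty body is what overwriting via put() without a Body produced
        problems.append(f"{key}: object is empty")
    return problems


def find_problems(client, bucket, prefix, expected_cache_control):
    """HEAD every object under `prefix` and collect any problems found."""
    problems = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for item in page.get("Contents", []):
            head = client.head_object(Bucket=bucket, Key=item["Key"])
            problems.extend(check_object(item["Key"], head, expected_cache_control))
    return problems


if __name__ == "__main__":
    import boto3  # imported here so check_object can be tested without AWS

    for problem in find_problems(
        boto3.client("s3"), "my-public-bucket", "static",
        "public,max-age=604800,immutable",
    ):
        print(problem)
```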
Tags: boto3, aws, s3, bucket, python, python3, cache-control