CodePipeline and S3

Using CodePipeline and S3 to protect your software deploys in AWS

One of the great threats to your software is supply chain poisoning, where malicious code gets inserted into your system. This typically happens when third-party packages or frameworks are compromised by bad actors, and there have been several examples of this happening to npm and PyPI packages as well as Docker images.

This is one of the reasons why you should never download and install third-party components as part of your build scripts. Let’s dive right in and settle what should be done instead.

How I do it

Make sure to check in those package-lock.json and requirements.txt files. They should still be updated regularly, but not automatically at every build, mind you. You should also set up different target environments for your deployments. A minimum of two, but preferably three - Development, Staging and Production. The preferred way to do this in AWS is to have a separate account for each environment. You could use different regions instead, but regions differ in which services are available, so that choice depends entirely on the services and needs you have in each specific case.
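As a sketch of what the separate-accounts setup can look like with CDK, here is a minimal app entry point. The stack class, account IDs and region are placeholders, not values from any particular setup:

```
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';

// Placeholder stack - your actual service resources would be defined here.
class MyServiceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
  }
}

const app = new App();

// One stack instance per target environment, each pinned to its own account.
new MyServiceStack(app, 'MyService-Dev', { env: { account: '111111111111', region: 'eu-west-1' } });
new MyServiceStack(app, 'MyService-Staging', { env: { account: '222222222222', region: 'eu-west-1' } });
new MyServiceStack(app, 'MyService-Prod', { env: { account: '333333333333', region: 'eu-west-1' } });
```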

The development environment is where you move fast and break things. This is where you have your CI/CD pipeline running to support swift development.

Staging is where integration and regression tests are run. When deploying to the staging environment, I suggest using AWS CodePipeline with S3 as the source of your code for the deployment. This strategy lets you guarantee that you’re using the exact same code in a production deployment, by simply copying the object from your staging S3 bucket to your production S3 bucket. Not bad, eh?
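The promotion step itself can be as small as a single copy call. Here is a sketch using the AWS SDK for JavaScript v3; the bucket names and object key are made-up placeholders, and the credentials used need read access to the staging bucket and write access to the production bucket:

```
import { S3Client, CopyObjectCommand } from '@aws-sdk/client-s3';

// Promote the exact artifact that was tested in staging by copying the
// S3 object as-is into the production bucket.
async function promoteRelease(): Promise<void> {
  const s3 = new S3Client({});
  await s3.send(
    new CopyObjectCommand({
      CopySource: 'my-staging-deploy-bucket/releases/myapp-latest.zip',
      Bucket: 'my-production-deploy-bucket',
      Key: 'releases/myapp-latest.zip',
    }),
  );
}

promoteRelease().catch((err) => {
  console.error('Promotion failed:', err);
  process.exit(1);
});
```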

Another benefit is that it’s easy to do a rollback, since you can select a previous version of the S3 object as the trigger for the CodePipeline. When using S3 as a source, you can get the pipeline to trigger automatically on a specific object (key) in a bucket. I like to create two copies of the same release: one with the version and/or short git hash as a suffix, and a second one with a known suffix, like latest. I then use the copy with the known name as the source for the pipeline. That way the pipeline is always triggered by an object with the same name, but I still have easy access to previous versions. Again, not bad, eh?

Do note, though, that to get a pipeline to trigger on a specific S3 upload event you need to add a CloudTrail event selector for the S3 event. If you’re using CDK to deploy your pipeline, the example in the documentation will create a multi-region trail with management events enabled, which might become expensive. Expensive is rarely good, regardless of whether it’s money, time or performance we’re talking about.
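To keep costs down, the trail can be scoped to just the trigger object. A minimal CDK sketch of that idea follows; the bucket, object key and construct names are placeholders, the trail is single-region with management events turned off, and only write data events for the key the pipeline watches are logged:

```
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as cloudtrail from 'aws-cdk-lib/aws-cloudtrail';
import * as codepipeline from 'aws-cdk-lib/aws-codepipeline';
import * as codepipeline_actions from 'aws-cdk-lib/aws-codepipeline-actions';

export class DeployPipelineStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Versioned bucket so previous releases stay available for rollback.
    const releaseBucket = new s3.Bucket(this, 'ReleaseBucket', { versioned: true });

    // The pipeline always watches the copy with the known name ("latest"),
    // while suffixed copies (e.g. myapp-1.2.3-abc1234.zip) sit next to it.
    const releaseKey = 'releases/myapp-latest.zip';

    // Scoped-down trail: single region, no management events, and only
    // write data events for the one object key the pipeline cares about.
    const trail = new cloudtrail.Trail(this, 'ReleaseTrail', {
      isMultiRegionTrail: false,
      managementEvents: cloudtrail.ReadWriteType.NONE,
    });
    trail.addS3EventSelector([{ bucket: releaseBucket, objectPrefix: releaseKey }], {
      readWriteType: cloudtrail.ReadWriteType.WRITE_ONLY,
    });

    const sourceOutput = new codepipeline.Artifact();
    const sourceAction = new codepipeline_actions.S3SourceAction({
      actionName: 'S3Source',
      bucket: releaseBucket,
      bucketKey: releaseKey,
      output: sourceOutput,
      trigger: codepipeline_actions.S3Trigger.EVENTS, // trigger via CloudTrail events, not polling
    });

    new codepipeline.Pipeline(this, 'DeployPipeline', {
      stages: [
        { stageName: 'Source', actions: [sourceAction] },
        // Your real deploy stages go here; a manual approval keeps the sketch valid.
        {
          stageName: 'Approve',
          actions: [new codepipeline_actions.ManualApprovalAction({ actionName: 'Approve' })],
        },
      ],
    });
  }
}
```

Since the bucket is versioned, a rollback is then just a matter of copying a previous version (or one of the suffixed release copies) back onto the latest key, and the pipeline picks it up like any other release.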