Duel stack life - exchanging momentum for speed

The need for speed - analysing the speed differences between AWS and GCP

James Montgomery

5 minute read

TL;DR

I knew when I began to duel stack this site that it would introduce challenges which would not exist in a single-cloud solution. In this case, I found myself chasing speed demons at the expense of time for the rest of the project. I’ve enjoyed every minute of it - however, there’s a lesson in there.

Multi-cloud thoughts

This site is a project which began as a reason to explore the GCP Firebase hosting product. Extending it to AWS has added an interesting dimension whilst still being relevant to the project itself. There are some strong feelings out there on multi-cloud as a default strategy:

Multi-cloud also leads to inevitable comparisons. One might be the speed of operation or perceived performance. This comparison, whilst easy to consider on the surface, is fraught with blind alleys and data fallacies. This tweet offers a useful bingo card for the subject:

I had previously arrived at the decision to employ the AWS version of this site in an active-passive manner with manual intervention. I also spent some time considering how I might go about changing posture to automatic failover or even active-active. For science.
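Even with a manual posture, some signal is needed to decide when to flip. A minimal sketch of the kind of availability probe that could drive that decision - the Amplify hostname below is a placeholder, and a real setup would want health history, alerting and DNS automation rather than a print statement:

```python
# Sketch only: a basic availability probe for an active-passive posture.
# A real failover decision would want health history, alerting and DNS
# automation; this just checks whether the primary answers successfully.
import requests

PRIMARY = "https://ja.mesmontgomery.co.uk"   # Firebase, the active side
SECONDARY = "https://aws.example.com"        # Amplify, passive - placeholder

def is_healthy(url: str, timeout: float = 5.0) -> bool:
    try:
        return requests.get(url, timeout=timeout).ok
    except requests.RequestException:
        return False

if not is_healthy(PRIMARY):
    print(f"Primary {PRIMARY} looks unhealthy - consider pointing DNS at {SECONDARY}")
```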

Comparing GCP and AWS

Prior to any of this, I did some basic analysis of the site via Google’s own PageSpeed Insights. Then I was introduced to GTmetrix, which offered deeper insight alongside the same PageSpeed score.
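PageSpeed Insights can also be queried programmatically, which makes spot checks easier to repeat. A minimal sketch using the v5 API - the response field names are my reading of the API at the time of writing, and the Amplify URL is a placeholder:

```python
# Sketch: query the PageSpeed Insights v5 API for both hosting locations.
# Field names reflect my reading of the API; an API key may be needed for
# anything beyond occasional use. The Amplify URL is a placeholder.
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def pagespeed_summary(url: str, strategy: str = "desktop") -> dict:
    """Return the Lighthouse performance score and first contentful paint."""
    resp = requests.get(PSI_ENDPOINT, params={"url": url, "strategy": strategy}, timeout=60)
    resp.raise_for_status()
    lighthouse = resp.json()["lighthouseResult"]
    return {
        "url": url,
        "performance_score": lighthouse["categories"]["performance"]["score"],
        "first_contentful_paint": lighthouse["audits"]["first-contentful-paint"]["displayValue"],
    }

for site in ("https://ja.mesmontgomery.co.uk", "https://aws.example.com"):
    print(pagespeed_summary(site))
```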

Point in time performance (Canada)

Point in time report for ja.mesmontgomery.co.uk
Naturally, I should compare this to the AWS hosting location. I expected the same results; after all, it was just a static site.
Point in time report for the Amplify location
I wasn’t expecting this difference in the time to fully load the page. I understand that a point measurement is not indicative of overall performance, so I took a view over a longer period of time.

Historic performance (Canada)

GCP last week

Firebase daily performance history

Amplify last week

Amplify daily performance history

The key takeaway was that the time to first paint, a key user-perceivable metric, was consistently higher via the Amplify solution from this test location. GTmetrix describes first paint time as:

First paint time is the first point at which the browser does any sort of rendering on the page. Depending on the structure of the page, this first paint could just be displaying the background colour (including white), or it could be a majority of the page being rendered.

The monitoring interval was still daily and this was a single location, so we can’t really establish anything more concrete.
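One way to get more frequent samples than a daily report is a small script run on a schedule from a single vantage point. A rough sketch - it measures end-to-end fetch time rather than first paint (which needs a real browser), and the Amplify hostname is a placeholder:

```python
# Sketch: collect timing samples more often than a daily report allows.
# requests.get measures the full fetch (DNS, TLS, download), not first
# paint - that needs a real browser - but it is enough to spot a
# consistent gap between two origins from one vantage point.
import csv
import time
from datetime import datetime, timezone

import requests

TARGETS = {
    "firebase": "https://ja.mesmontgomery.co.uk",
    "amplify": "https://aws.example.com",   # placeholder hostname
}

def fetch_seconds(url: str) -> float:
    start = time.perf_counter()
    requests.get(url, timeout=30)
    return time.perf_counter() - start

with open("timings.csv", "a", newline="") as fh:
    writer = csv.writer(fh)
    for name, url in TARGETS.items():
        writer.writerow([
            datetime.now(timezone.utc).isoformat(),
            name,
            round(fetch_seconds(url), 3),
        ])
```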

Indeed, when I used another test location I got results comparable to Firebase:

Amplify via GTmetrix London

In all likelihood, this is a location-specific issue whereby the test is hitting a sub-optimal edge or related connectivity, assuming all other factors in the test are equal. But what if it wasn’t? I delved deeper.

Getting into the weeds

I noticed via GTmetrix that the cached items had a low TTL relative to Firebase: 5 seconds vs 1 hour.

TTL comparison
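This is easy to confirm outside GTmetrix by inspecting the response headers directly. A quick sketch, with the Amplify URL again a placeholder:

```python
# Sketch: compare the caching headers each host returns for the same page.
# The Amplify URL is a placeholder for the real hosting location.
import requests

for name, url in {
    "firebase": "https://ja.mesmontgomery.co.uk",
    "amplify": "https://aws.example.com",
}.items():
    headers = requests.get(url, timeout=30).headers
    print(name, headers.get("Cache-Control"), headers.get("Age"))
```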

I hypothesised that perhaps there was less edge caching. Here is the quote from the Amplify user guide on the impact of TTL selection:

You can control how long your objects stay in a CDN cache before the CDN forwards another request to your origin. Reducing the duration enables you to serve dynamic content. Increasing the duration means your users get better performance because your objects are more likely to be served directly from the edge cache. A longer duration also reduces the load on your origin.

Updating the TTL was straightforward:

UI for changing the TTL
I found that changing this didn’t take effect on redeploying the latest build; I needed to push a change. Unexpectedly, this only updated the TTL on the new items:
Effect of the TTL change

The Amplify documentation on application performance was recently updated to reflect their instant cache invalidation approach. The observed TTL changes were consistent with that approach.

Rather than go deeper into trying to understand the Amplify stack at this point, I put a Cloudflare proxy in place to force caching of all items with a TTL greater than or equal to Firebase’s. The performance from this location was unchanged.
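For those curious, the equivalent Cloudflare configuration is a page rule that caches everything at the edge for at least an hour. A sketch of what that might look like via the Page Rules API - the zone ID, token and hostname are placeholders, and the payload shape is my reading of Cloudflare’s API reference rather than a tested call:

```python
# Sketch: a Cloudflare page rule forcing edge caching of everything for an
# hour. Payload shape is my reading of the Page Rules API reference; the
# zone ID, API token and proxied hostname are all placeholders.
import requests

ZONE_ID = "your-zone-id"
API_TOKEN = "your-api-token"

rule = {
    "targets": [{
        "target": "url",
        "constraint": {"operator": "matches", "value": "aws.example.com/*"},
    }],
    "actions": [
        {"id": "cache_level", "value": "cache_everything"},
        {"id": "edge_cache_ttl", "value": 3600},  # seconds, matching Firebase's 1 hour
    ],
    "status": "active",
}

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/pagerules",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=rule,
    timeout=30,
)
resp.raise_for_status()
```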

It was time to take a step back - this was not how to draw a horse:

This is not how to draw a horse
Image author unknown. Here is the first place I saw it.

A wider view

I considered how you might operationally support multiple CDNs and assert confidence in parallel content delivery. For that outcome, I’d need to measure within the context of CDN performance domains. CDNPerf emerged as an interesting tool, as did the associated PerfOps product. We can see the geographical differences here via point-in-time tests from CDNPerf test locations around the world:

Firebase Global Latency

Firebase Global Latency via CDNPerf

Amplify Global Latency

Amplify Global Latency via CDNPerf
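A CDNPerf-style latency test boils down to timing how quickly a connection can be established from a given vantage point. A minimal sketch of the same idea run locally - it only reflects whatever network the script happens to run on, and the Amplify hostname is a placeholder:

```python
# Sketch: a rough, local version of a latency spot check - time the TCP
# connection to each host over a few attempts. Results only reflect the
# network this runs on; the Amplify hostname is a placeholder.
import socket
import statistics
import time

HOSTS = {
    "firebase": "ja.mesmontgomery.co.uk",
    "amplify": "aws.example.com",
}

def connect_ms(host: str, port: int = 443) -> float:
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=10):
        pass
    return (time.perf_counter() - start) * 1000

for name, host in HOSTS.items():
    samples = [connect_ms(host) for _ in range(5)]
    print(f"{name}: median {statistics.median(samples):.1f} ms over {len(samples)} attempts")
```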

PerfOps and FlexBalancer

The FlexBalancer service was an unexpected find. It appeared to take metrics such as user experience and geo-proximity into consideration when providing a DNS-based response. The product was not generally available when I enquired; however, the PerfOps team were kind enough to give me access to their open beta.

I’ve had some time to experiment with it. Their product is an interesting approach to traffic steering and I hope it does well. At the time of writing, there was a free tier on the pricing page and I hope to take advantage of it upon launch.
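To be clear about the concept rather than FlexBalancer’s actual API, latency-aware steering amounts to answering DNS queries with whichever provider has recently been fastest for the requester’s region. An illustration with made-up numbers and hostnames:

```python
# Illustration only - not FlexBalancer's API. Latency-aware steering:
# answer with the CNAME of whichever provider has been fastest for the
# requester's region, falling back to a default when there's no data.
# All names and numbers below are made up.
RECENT_LATENCY_MS = {
    "eu-west": {"firebase": 38, "amplify": 55},
    "us-east": {"firebase": 62, "amplify": 41},
}

CNAMES = {
    "firebase": "site.firebase.example",
    "amplify": "site.amplify.example",
}

def steer(region: str, default: str = "firebase") -> str:
    """Pick the CNAME to answer with for a resolver in `region`."""
    measurements = RECENT_LATENCY_MS.get(region)
    if not measurements:
        return CNAMES[default]
    fastest = min(measurements, key=measurements.get)
    return CNAMES[fastest]

print(steer("us-east"))   # amplify wins in this made-up data
print(steer("ap-south"))  # no data, so the default provider answers
```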

In conclusion

As I based this site on a template, I’ve had to understand how it works before I could reach conclusions on content performance. To complicate matters, the CDN used for primary content delivery is only part of the user experience due to external resource dependencies.

A multi-platform approach must have clear value. That value might be the experience of learning, assurance of content delivery performance, or something which roots itself in survival. It is interesting to see the tools and services develop in this space. PerfOps’ FlexBalancer has a somewhat “cake and eat it too” potential when it comes to the use of multiple CDNs.

Relevant posts

Acknowledgements