I knew when I began to dual-stack this site that it would introduce challenges which would not exist in a single-cloud solution. In this case, I found myself chasing speed demons at the expense of time for other projects. I’ve enjoyed every minute of it; however, there’s a lesson in there.
This site began, in part, as a reason to explore the GCP Firebase hosting product. Extending it to AWS has added an interesting dimension whilst still being relevant to the project itself. There are some strong feelings out there on multi-cloud as a default strategy:
So people are piling on to Keith for daring to suggest that multi-cloud may not be a winning strategy.
Allow me to chime in and outright state that multi-cloud for a generic workload is, absent a compelling business reason, a moronic default. https://t.co/aRqF7Rlb48
- Corey Quinn (@QuinnyPig), March 7, 2019
Multi-cloud also leads to inevitable comparisons. One might be the speed of operation or perceived performance. This comparison, whilst easy to consider on the surface, is fraught with blind alleys and data fallacies. This tweet offers a useful bingo card for the subject:
15 data fallacies and biases pic.twitter.com/fpcjoYcZq6
- Emilio Ferrara (@jabawack), March 16, 2019
I had previously arrived at the decision to employ the AWS version of this site in an active-passive manner with manual intervention. I also spent some time considering how I might go about changing posture to automatic failover or even active-active. For science.
Comparing GCP and AWS
Prior to any of this, I did some basic analysis of the site via Google’s own PageSpeed Insights. Then I was introduced to GTmetrix, which offered deeper insight alongside the same PageSpeed score.
Point in time performance (Canada)
Historic performance (Canada)
GCP last week
Amplify last week
The key takeaway was that the time to first paint, a key user-perceivable metric, was consistently higher via the Amplify solution from this test location. GTmetrix describes first paint time as:
First paint time is the first point at which the browser does any sort of rendering on the page. Depending on the structure of the page, this first paint could just be displaying the background colour (including white), or it could be a majority of the page being rendered.
The monitoring interval was still daily and this was a single location, so there was not enough data to establish anything more concrete.
Indeed, when I used another test location, the Amplify results were comparable to Firebase:
In all likelihood, this is a location-specific issue: the test location is probably using a sub-optimal edge or related connectivity, assuming all other factors in the test are equal. But what if it wasn’t? I delved deeper.
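One way to keep a single test location from dominating the picture is to summarise timings per provider and location before comparing them. The following is a minimal sketch of that idea, not something I ran against GTmetrix: the function name, the location names and every number in `samples` are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_by_location(samples):
    """Group (provider, location, first_paint_ms) samples and take the median of each group."""
    grouped = defaultdict(list)
    for provider, location, first_paint_ms in samples:
        grouped[(provider, location)].append(first_paint_ms)
    return {key: median(values) for key, values in grouped.items()}


# Hypothetical first paint timings in milliseconds (illustrative values only).
samples = [
    ("firebase", "vancouver", 400),
    ("firebase", "vancouver", 420),
    ("firebase", "vancouver", 410),
    ("amplify", "vancouver", 900),
    ("amplify", "vancouver", 950),
    ("amplify", "vancouver", 870),
    ("amplify", "london", 380),
    ("amplify", "london", 400),
]

summary = median_by_location(samples)
for (provider, location), value in sorted(summary.items()):
    print(f"{provider:9s} {location:10s} median first paint: {value} ms")
```

With data shaped like this, a provider that looks slow from one location but not another points towards an edge or routing issue rather than the hosting product itself.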
Getting into the weeds
I noticed via GTmetrix that the cached items had a low TTL relative to Firebase: 5 seconds versus 1 hour.
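For anyone wanting to make the same comparison without a testing tool, the TTL a cache will honour is normally carried in the `Cache-Control` response header. Below is a minimal sketch of reading it. The helper name is my own, and the two header strings are assumed examples that match the 5 second and 1 hour values above, not headers captured from either platform.

```python
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the TTL in seconds that a shared cache would use, or None if none is given."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    # s-maxage targets shared caches such as CDNs and takes precedence over max-age.
    for key in ("s-maxage", "max-age"):
        value = directives.get(key, "")
        if value.isdigit():
            return int(value)
    return None


# Assumed header values, chosen to mirror the TTLs observed above.
amplify_ttl = max_age_seconds("public, max-age=5")
firebase_ttl = max_age_seconds("max-age=3600")
print(amplify_ttl, firebase_ttl)
```

In practice the header string would come from a real response, for example via `curl -I` against the page or asset in question.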
I hypothesised that perhaps there was less edge caching. Here is the quote from the Amplify user guide on the impact of TTL selection:
You can control how long your objects stay in a CDN cache before the CDN forwards another request to your origin. Reducing the duration enables you to serve dynamic content. Increasing the duration means your users get better performance because your objects are more likely to be served directly from the edge cache. A longer duration also reduces the load on your origin.
Updating the TTL was straightforward:
The Amplify documentation on application performance was recently updated to reflect their instant cache invalidation approach. The observed TTL changes were consistent with that approach.
Rather than dig further into the Amplify stack at this point, I put a Cloudflare proxy in place to force caching of all items with a TTL greater than or equal to that of Firebase. The performance from this location was unchanged.
It was time to take a step back - this was not how to draw a horse:
A wider view
I considered how you might operationally support multiple CDNs and assert confidence in parallel content delivery. For that outcome, I’d need to measure within the context of CDN performance domains. CDNPerf emerged as an interesting tool, as did the associated PerfOps product. We can see the geographical differences via these point-in-time tests from CDNPerf test locations around the world:
Firebase Global Latency
Amplify Global Latency
PerfOps and FlexBalancer
The FlexBalancer service was an unexpected find. It appeared to take metrics such as user experience and geo-proximity into account when providing a DNS-based response. The product was not generally available when I enquired; however, the PerfOps team were kind enough to give me access to their open beta.
I’ve had some time to experiment with it. Their product is an interesting approach to traffic steering and I hope it does well. At the time of writing, there was a free tier on the pricing page and I hope to take advantage of it upon launch.
As I based this site on a template, I’ve had to understand how it works before I could reach conclusions on content performance. To complicate matters, the CDN used for primary content delivery is only part of the user experience due to external resource dependencies.
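To make the idea of latency-aware steering concrete, here is a minimal sketch of the kind of decision such a service might make per region. It is not FlexBalancer's algorithm or API, which I have not seen. The function, the region codes, the latency figures and the health flags are all hypothetical.

```python
from typing import Dict


def pick_cdn(region: str,
             latency_ms: Dict[str, Dict[str, float]],
             healthy: Dict[str, bool],
             default: str) -> str:
    """Pick the healthy CDN with the lowest measured latency for a region."""
    candidates = {
        cdn: by_region[region]
        for cdn, by_region in latency_ms.items()
        if healthy.get(cdn, False) and region in by_region
    }
    if not candidates:
        # No usable measurement for this region: fall back to the primary.
        return default
    return min(candidates, key=candidates.get)


# Hypothetical measurements in milliseconds, keyed by CDN and then region.
latency_ms = {
    "firebase": {"ca": 120.0, "eu": 40.0},
    "amplify": {"ca": 300.0, "eu": 35.0},
}
healthy = {"firebase": True, "amplify": True}

for region in ("ca", "eu", "au"):
    print(region, "->", pick_cdn(region, latency_ms, healthy, default="firebase"))
```

The fallback to a default keeps the behaviour close to the active-passive posture described earlier: when there is no data, or one side is marked unhealthy, traffic goes to the remaining option.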
A multi-platform approach must have clear value. That value might be the experience of learning, content delivery performance assurance or it might be one which roots itself in survival. It is interesting to see the tools and services develop in this space. PerfOps’ FlexBalancer has a somewhat “cake and eat it too” potential when it comes to the use of multiple CDNs.