
Originally published on an external platform.

Recently I wrote about how to use the aws_elasticache_replication_group Terraform resource to create an ElastiCache cluster in AWS. Here is something I discovered later.

Updating anything on an existing ElastiCache cluster kept indicating that Terraform would re-create the aws_elasticache_replication_group… but why?

Even though I was only changing parameters like autoscaling settings or the number of nodes, Terraform still tried to re-create the entire cluster. After inspecting the provider code, nothing looked immediately responsible. Then I checked the provider version: we were on an older release (3.0.0), and there I found the culprit.

"auth_token": {
    Type:             schema.TypeString,
    Optional:         true,
    DiffSuppressFunc: suppressAuthTokenDiff,
    Description:      "Password used to access a password-protected server",
    Sensitive:        true,
    ForceNew:         true,
},
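For context, a DiffSuppressFunc in the Terraform plugin SDK has the shape func(k, old, new string, d *schema.ResourceData) bool, where returning true hides the diff for that attribute. The sketch below drops the *schema.ResourceData parameter so it runs standalone; the real suppressAuthTokenDiff lives in the AWS provider source, so the body here is an illustrative assumption, not the provider's actual logic:

```go
package main

import "fmt"

// Illustrative sketch only. The real suppressAuthTokenDiff is in the
// AWS provider; a plausible token suppressor hides the diff when no
// new token was supplied on an update. Note that even with a suppress
// function, a value that does appear changed on a ForceNew field still
// triggers replacement.
func suppressAuthTokenDiff(k, old, new string) bool {
	// Returning true hides the diff for this attribute.
	return new == ""
}

func main() {
	fmt.Println(suppressAuthTokenDiff("auth_token", "old-secret", ""))        // true: diff hidden
	fmt.Println(suppressAuthTokenDiff("auth_token", "old-secret", "rotated")) // false: diff shown
}
```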

Basically, I had created an Auth-enabled Redis cluster for the default Redis user. Later, whenever I used the same tfvars to update any parameter, Terraform assumed the auth token was being rotated. This happens because neither Terraform nor the AWS ElastiCache API verifies whether the auth token has actually changed. It looks something like this in the plan output:

# aws_elasticache_replication_group.default must be replaced
-/+ resource "aws_elasticache_replication_group" "default" {
      ~ auth_token = (sensitive value) # forces replacement
        # ... other attributes
    }

Bottom line: neither the AWS ElastiCache API nor the Terraform provider (v3.0.0) detects that the auth token is unchanged. The token always looks new, and because the field is marked ForceNew, that triggers a full resource replacement.

The Fix

The solution was simple: I made the auth_token assignment conditional.

From this:

auth_token = var.transit_encryption_enabled ? var.auth_token : null

To this:

auth_token = var.transit_encryption_enabled && var.existing_cluster == false ? var.auth_token : null

Now, when var.existing_cluster is set to true, var.auth_token is never passed for existing clusters, and Terraform performs the update in place without recreating the cluster.
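Put together, the relevant pieces look roughly like this. This is a sketch, not the full module: the variable defaults and the surrounding resource arguments are assumptions, while the variable names and the conditional come from the change above:

```hcl
variable "transit_encryption_enabled" {
  type    = bool
  default = true
}

variable "existing_cluster" {
  description = "Set to true when managing a cluster that already exists, so auth_token is left unset."
  type        = bool
  default     = false
}

variable "auth_token" {
  type      = string
  sensitive = true
  default   = null
}

resource "aws_elasticache_replication_group" "default" {
  # ... other arguments elided ...

  transit_encryption_enabled = var.transit_encryption_enabled

  # Only pass the token when the cluster is first created; for existing
  # clusters, null keeps Terraform from diffing the ForceNew field.
  auth_token = var.transit_encryption_enabled && var.existing_cluster == false ? var.auth_token : null
}
```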

Another option would be to upgrade the AWS provider to a version where ForceNew has been removed from this field:

"auth_token": {
    Type:             schema.TypeString,
    Optional:         true,
    DiffSuppressFunc: suppressAuthTokenDiff,
    Description:      "Password used to access a password-protected server",
    Sensitive:        true,
    // ForceNew:  true, (removed in newer versions)
},
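If upgrading is viable for you, the change is a version constraint in the required_providers block. The constraint below is illustrative only; check the provider changelog for the exact release that removed ForceNew from auth_token before pinning:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Illustrative constraint; verify the changelog entry that
      # removed ForceNew from auth_token before choosing a version.
      version = ">= 4.0"
    }
  }
}
```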

However, that wasn’t straightforward for our infrastructure due to other dependencies. By making this simple conditional change, we saved ourselves a lot of headache!

Happy Terraforming!!
