Working with Google Cloud Managed Instance Groups

Google Cloud Managed Instance Groups (MIGs) are groups of identical virtual machine instances that serve the same purpose.

Instances are created based on an Instance Template which defines the configuration that all instances will use including image, instance size, network, etc.

MIGs that host services are fronted by a load balancer which distributes client requests across the instances in the group.

MIG instances can also run batch processing applications which do not serve client requests and do not require a load balancer.

MIGs can be configured for autoscaling to increase the number of VM instances in the group based on CPU load or demand.

They can also auto-heal by replacing failed instances. Health checks are used to make sure each instance is responding correctly.

MIGs should be Regional and use VM instances in at least two different zones of a region. Regional MIGs can have up to 2000 instances.

Terraform

Two different modules authored by Google can be used to create an Instance Template and MIG:

  • Instance template: terraform-google-modules/vm/google//modules/instance_template
  • Multi-version MIG: terraform-google-modules/vm/google//modules/mig_with_percent

To optionally create an internal load balancer in front of the MIG, use: GoogleCloudPlatform/lb-internal/google

The following examples create a service account, two instance templates, a MIG, and an internal load balancer.

Pre-requisites

  • A custom image should be created with nginx installed and running at boot.
  • A VPC with a proxy-only subnet is required (a Terraform sketch of this subnet appears after the service account example below).
  • The instance template requires a service account.
# Enable IAM API
resource "google_project_service" "project" {
  project            = "my-gcp-project-1234"
  service            = "iam.googleapis.com"
  disable_on_destroy = false
}

# Service Account required for the Instance Template module
resource "google_service_account" "sa" {
  project      = "my-gcp-project-1234"
  account_id   = "sa-mig-test"
  display_name = "Service Account MIG test"
  depends_on   = [google_project_service.project]
}

Update project, account_id, and display_name with appropriate values.
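The proxy-only subnet listed as a prerequisite above can also be expressed in Terraform. The following is a minimal sketch, assuming the VPC already exists; the subnet name, network reference, and CIDR range are placeholders to adjust for your environment.

# Proxy-only subnet used by Envoy-based internal load balancers in this region
# (name, network, and range below are example values, not from this post)
resource "google_compute_subnetwork" "proxy_only" {
  project       = "my-gcp-project-1234"
  name          = "us-central1-proxy-only"
  region        = "us-central1"
  network       = "my-vpc" # or a reference such as module.vpc_central.network_name
  ip_cidr_range = "10.129.0.0/23"
  purpose       = "REGIONAL_MANAGED_PROXY" # older provider versions use "INTERNAL_HTTPS_LOAD_BALANCER"
  role          = "ACTIVE"
}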

Instance Templates

The instance template defines the instance configuration. This includes which network to join, any labels to apply to the instance, the size of the instance, network tags, disks, custom image, etc.

The MIG deployment requires an instance template.

The instance template requires that a source image have already been created.

In this terraform code example, two instance templates are created:

  • “A” template – initial version to use in the MIG
  • “B” template – future upgrade version to use with an optional canary update method

During the initial deployment, each instance template can point to the same custom image for the source_image value. In the future, each instance template should point to a different custom image.

# Instance Template "A"
# Module src: https://github.com/terraform-google-modules/terraform-google-vm/blob/master/modules/instance_template
# Registry: https://registry.terraform.io/modules/terraform-google-modules/vm/google/latest/submodules/instance_template
# Creates google_compute_instance_template
module "instance_template_A" {
  source     = "terraform-google-modules/vm/google//modules/instance_template"
  region     = "us-central1"
  project_id = "my-gcp-project-1234"
  subnetwork = "us-central-01"

  service_account = {
    email  = google_service_account.sa.email
    scopes = ["cloud-platform"]
  }

  name_prefix    = "nginx-a"
  tags           = ["nginx"]
  labels         = { mig = "nginx" }
  machine_type   = "f1-micro"
  startup_script = "sed -i 's/nginx/'$HOSTNAME'/g' /var/www/html/index.nginx-debian.html"

  source_image_project = "my-gcp-project-1234"
  source_image         = "image-nginx"
  disk_size_gb         = 10
  disk_type            = "pd-balanced"
  preemptible          = true
}

# Instance Template "B"
module "instance_template_B" {
  source     = "terraform-google-modules/vm/google//modules/instance_template"
  region     = "us-central1"
  project_id = "my-gcp-project-1234"
  subnetwork = "us-central-01"

  service_account = {
    email  = google_service_account.sa.email
    scopes = ["cloud-platform"]
  }

  name_prefix    = "nginx-b"
  tags           = ["nginx"]
  labels         = { mig = "nginx" }
  machine_type   = "f1-micro"
  startup_script = "sed -i 's/nginx/'$HOSTNAME'/g' /var/www/html/index.nginx-debian.html"

  source_image_project = "my-gcp-project-1234"
  source_image         = "image-nginx"
  disk_size_gb         = 10
  disk_type            = "pd-balanced"
  preemptible          = true
}

Update the following with appropriate values:

  • Module name
  • region
  • project_id
  • subnetwork – the VPC subnet to use for instances deployed via the template
  • name_prefix – prefix for the instance template name; a version number will be appended to it.
    • Be sure to include any specific versioning to indicate what is in the custom image.
    • Lowercase only.
  • tags – any required tags
  • labels – labels to apply to instances deployed via the template
  • machine_type – machine size to use
  • startup_script – startup script to run on each boot (not just deployment)
  • source_image_project – project where the image resides
  • source_image – image name
  • disk_size_gb – size of the boot disk
  • disk_type – type of boot disk
  • preemptible – if set to true, instances can be preempted as needed by Google Cloud.

More instance template module options are available; see the module documentation in the Terraform Registry (linked above).

Changes to the instance template will result in a new version of the template. The MIG will be modified to use the new version. All MIG instances will be recreated. See the update_policy section of the MIG module definition (below) to control the update behavior.

Managed Instance Group

The MIG creates the set of instances using the instance template (and therefore the same custom image). Instances are customized as usual during first boot.

A custom startup script can run every time the instance starts and configure the VM further. See the startup script overview in the Compute Engine documentation.
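For a longer script, the one-line startup_script in the template modules above could be written as a heredoc instead. The sketch below reuses the nginx substitution from this post; the extra nginx restart is an assumption and should be adjusted for whatever is baked into your custom image.

  # Multi-line startup script passed to the instance template module.
  # Runs on every boot, not just at first deployment.
  startup_script = <<-EOT
    #!/bin/bash
    # Replace the placeholder in the nginx index page with this VM's hostname
    sed -i "s/nginx/$HOSTNAME/g" /var/www/html/index.nginx-debian.html
    # Assumption: nginx is installed in the custom image and managed by systemd
    systemctl restart nginx
  EOT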

In this Regional MIG terraform example, the initial set of instances is deployed using the “A” template, set as the instance_template_initial_version.

The same “A” template is also set for the instance_template_next_version with a value of 0 for the next_version_percent.

In a future canary update, set the instance_template_next_version to the “B” template with an appropriate value for next_version_percent.

# Regional Managed Instance Group with support for canary updates
# Module src: https://github.com/terraform-google-modules/terraform-google-vm/tree/master/modules/mig_with_percent
# Registry: https://registry.terraform.io/modules/terraform-google-modules/vm/google/latest/submodules/mig_with_percent
# Creates google_compute_health_check.http (optional), google_compute_health_check.https (optional),
# google_compute_health_check.tcp (optional), google_compute_region_autoscaler.autoscaler (optional),
# and google_compute_region_instance_group_manager.mig

module "mig_nginx" {
  source      = "terraform-google-modules/vm/google//modules/mig_with_percent"
  project_id  = "my-gcp-project-1234"
  hostname    = "mig-nginx"
  region      = "us-central1"
  target_size = 4

  instance_template_initial_version = module.instance_template_A.self_link

  instance_template_next_version = module.instance_template_A.self_link
  next_version_percent           = 0

  //distribution_policy_zones = ["us-central1-a", "us-central1-f"]

  # See https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_region_instance_group_manager#nested_update_policy
  update_policy = [{
    type                         = "PROACTIVE"
    instance_redistribution_type = "PROACTIVE"
    minimal_action               = "REPLACE"
    max_surge_percent            = null
    max_unavailable_percent      = null
    max_surge_fixed              = 4
    max_unavailable_fixed        = null
    min_ready_sec                = 50
    replacement_method           = "SUBSTITUTE"
  }]

  named_ports = [{
    name = "web"
    port = 80
  }]

  health_check = {
    type                = "http"
    initial_delay_sec   = 30
    check_interval_sec  = 30
    healthy_threshold   = 1
    timeout_sec         = 10
    unhealthy_threshold = 5
    response            = ""
    proxy_header        = "NONE"
    port                = 80
    request             = ""
    request_path        = "/"
    host                = ""
  }

  autoscaling_enabled = false
  /*
  max_replicas                 = var.max_replicas
  min_replicas                 = var.min_replicas
  cooldown_period              = var.cooldown_period
  autoscaling_cpu              = var.autoscaling_cpu
  autoscaling_metric           = var.autoscaling_metric
  autoscaling_lb               = var.autoscaling_lb
  autoscaling_scale_in_control = var.autoscaling_scale_in_control
  */
}

Update the following with appropriate values:

  • Module name
  • project_id
  • hostname – the prefix for provisioned VM names/hostnames. Will have a random set of 4 characters appended to the end.
  • region
  • target_size – number of instances to create in the MIG. Does not need to equal the number of zones in distribution_policy_zones.
  • instance_template_initial_version – template to use for initial deployment
  • instance_template_next_version – template to use for future canary update
  • next_version_percent – percentage of instances in the group (of target_size) that should use the canary update
  • distribution_policy_zones – zone names in the region where VMs should be provisioned.
    • Optional. If not specified, the Google-authored terraform module will automatically select each zone in the region.
      • Example: us-central1 region has 4 zones so each zone will be populated in this field. This directly impacts the update_policy and its max_surge_fixed value.
    • This value cannot be changed later. The module will ignore any changes.
      • The MIG will need to be destroyed and recreated to update the zones to use.
    • More than two zones can be specified.
    • The target_size does not need to match the number of zones specified.
    • See About regional MIGs in the Compute Engine documentation.
  • update_policy – specifies how instances should be recreated when a new version of the instance template is available.
    • type set to
      • PROACTIVE will update all instances in a rolling fashion.
        • Leave max_unavailable_fixed as null which results in a value of 0, meaning no live instances can be unavailable.
        • Recommended
      • OPPORTUNISTIC means “only when you manually initiate the update on selected instances or when new instances are created. New instances can be created when you or another service, such as an autoscaler, resizes the MIG. Compute Engine does not actively initiate requests to apply opportunistic updates on existing instances.”
        • Not recommended
    • max_surge_fixed indicates the number of additional instances that are temporarily added to the group during an update.
      • These new instances will use the updated template.
      • Should be greater than or equal to the number of zones in distribution_policy_zones. If there are no zones specified in distribution_policy_zones, as mentioned previously, the Google-authored MIG module will automatically select all the zones in the region.
    • replacement_method can be set to either of the following values:
      • RECREATE preserves the instance name by deleting the old instance and then creating a new one with the same name.
      • SUBSTITUTE will create new instances with new names.
        • Results in a faster upgrade of the MIG – instances are available sooner than using RECREATE.
        • Recommended.
    • See the module documentation in the Terraform Registry and Automatically apply VM configuration updates in a MIG in the Compute Engine documentation.
  • named_ports – set the port name and port number as appropriate
  • health_check – set the check type, port, and request_path as appropriate
  • Autoscaling can also be configured; see Autoscaling groups of instances in the Compute Engine documentation. A sketch of the optional autoscaling inputs follows this list.
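As a rough sketch of the optional inputs mentioned above, the commented-out autoscaling block in the MIG module could be filled in along these lines. The values are placeholders, and the exact attribute shape expected by autoscaling_cpu should be confirmed against the module's variables.tf.

  # Pin instances to specific zones (optional; cannot be changed after creation)
  distribution_policy_zones = ["us-central1-a", "us-central1-b", "us-central1-f"]

  # Scale between 4 and 8 instances based on average CPU utilization
  autoscaling_enabled = true
  min_replicas        = 4
  max_replicas        = 8
  cooldown_period     = 60

  # Target 60% average CPU; check the module's variables.tf for the full
  # set of attributes this object accepts
  autoscaling_cpu = [{
    target = 0.6
  }]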

More MIG module options are available; see the module documentation in the Terraform Registry (linked above).

Changes to the MIG may result in VMs needing to be updated. See the update_policy section of the MIG module definition (above) to configure the behavior when updating the MIG members.

Load Balancer

An Internal Load Balancer can make a MIG highly available to internal clients.

module "ilb_nginx" {
 source = "GoogleCloudPlatform/lb-internal/google"
 version = "~4.0"
 project = "my-gcp-project-1234"
 network = module.vpc_central.network_name
 subnetwork = module.vpc_central.subnets["us-central1/central-01-subnet-ilb"].name
 region = "us-central1"
 name = "ilb-nginx"
 ports = ["80"]
 source_tags = ["nginx"]
 target_tags = ["nginx"]

 backends = [{
  group = module.mig_nginx.instance_group
  description = ""
  failover = false
 }]

 health_check = {
  type = "http"
  check_interval_sec = 30
  healthy_threshold = 1
  timeout_sec = 10
  unhealthy_threshold = 5
  response = ""
  proxy_header = "NONE"
  port = 80
  request = ""
  request_path = "/"
  host = ""
  enable_log = false
  port_name = "web"
 }
}

Update the following with appropriate values:

  • Module name
  • project
  • network and subnetwork – the VPC and the subnet where the load balancer's frontend IP will be allocated
  • region
  • name
  • ports – the port to listen on
  • source_tags and target_tags – network tags to use, should be present on the MIG members via the instance template.
  • backends – points to the MIG
  • health_check – should generally match the MIG healthcheck.

More options are available; see variables.tf and main.tf in the module source.

Be sure to consider any necessary firewall rules, especially if using network tags.
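For example, a rule allowing Google's health check probes to reach the nginx instances by network tag might look like the sketch below. The health check source ranges are Google's published ranges; the rule name and project are placeholders, and any ranges for internal clients would be added separately.

# Allow Google Cloud health check probes to reach the MIG members on port 80
resource "google_compute_firewall" "allow_health_checks_nginx" {
  project = "my-gcp-project-1234"
  name    = "allow-health-checks-nginx"
  network = module.vpc_central.network_name

  allow {
    protocol = "tcp"
    ports    = ["80"]
  }

  # Google Cloud health check source ranges
  source_ranges = ["130.211.0.0/22", "35.191.0.0/16"]

  # Only instances carrying this tag (set via the instance template) are affected
  target_tags = ["nginx"]
}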

The Google-authored MIG module has create_before_destroy set to true, so a new MIG can replace an existing one as a backend behind the load balancer with only a brief outage (less than 10 seconds). The new MIG will be created and added as a backend, and then the old MIG will be destroyed.

Day 2 operations

Changing size of MIG

If needed, adjust the target_size value of the MIG module to increase or decrease the number of instances. Adjustments take place right away.

If the group is being grown while a new template is in place and the update_policy type is OPPORTUNISTIC, the new instances will be deployed using the new template.

Changing the zones to use for a MIG

The zones cannot be changed after creation; the MIG must be destroyed and recreated.

Deleting MIG members

Deleting a MIG member automatically reduces the target number of instances for the MIG. The deleted member is not replaced.

Restarting MIG members

Do not manually restart a MIG instance from within the VM itself. This will cause its health check to fail, and the MIG will delete and recreate the VM using the template.

Use the RESTART/REPLACE button in the Cloud Console and choose the Restart option. This affects all instances in the group, but can be limited to only acting against a maximum number at a time (“Maximum unavailable instances”).

The “Replace” option within RESTART/REPLACE will delete and recreate instances using the current template.

Updating MIG instances to a new version

When a new version of the custom image is released, such as when it has been updated with new software, the MIG can be updated in a controlled fashion until all members are running the updated version, without any outage.

The MIG module update_policy setting is very important for this process to ensure there is no outage:

  • max_surge_fixed is the number of additional instances created in the MIG and verified healthy before the old ones are removed.
    • Should be set to greater than or equal to the number of zones in distribution_policy_zones
  • max_unavailable_fixed should be set to null which equals 0: no live instances will be unavailable during the update.

The MIG module has options for two different instance templates in order to support performing a canary update where only a percentage of instances are upgraded to the new version:

  • instance_template_initial_version – template to use for initial deployment
  • instance_template_next_version – template to use for future canary update
  • next_version_percent – percentage of instances in the group (of target_size) that should use the canary update

Initially, both options may point to the same template and 0% is allocated to the “next” version.

If a load balancer is used, newly created instances that are verified healthy will automatically be selected to respond to client requests.

Canary update

To move a percentage of instances to the “next” version via a “canary” update:

  1. Set the instance_template_next_version to point to an instance template which uses an updated custom image
  2. Set the next_version_percent to an appropriate percentage of instances in the group that should use the “next” template.
  3. Make sure update_policy has type set to PROACTIVE – this will cause the change to take effect right away.

When applied via terraform, all instances will be recreated (adhering to the update_policy) but a percentage of instances will be created using the “next” template.
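For example, the relevant lines of the MIG module above would change along these lines for a canary that moves 25% of the group to the “B” template (the percentage is only an illustration):

  # Canary: 25% of the target_size instances use the updated "B" template
  instance_template_initial_version = module.instance_template_A.self_link

  instance_template_next_version = module.instance_template_B.self_link
  next_version_percent           = 25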

After the canary update has been validated and all instances should be upgraded, see the steps below for a Regular update.

Regular update

To update all instances at once (adhering to the update_policy):

  1. Set both the instance_template_initial_version and instance_template_next_version to point to an instance template which uses an updated custom image
  2. Set the next_version_percent to 0.
  3. Make sure update_policy has type set to PROACTIVE – this will cause the change to take effect right away.

When applied via terraform, all instances will be recreated (adhering to the update_policy).
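Continuing the same example, the full rollout to the “B” template would look like this:

  # Full rollout: every instance uses the updated "B" template
  instance_template_initial_version = module.instance_template_B.self_link

  instance_template_next_version = module.instance_template_B.self_link
  next_version_percent           = 0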

2 thoughts on “Working with Google Cloud Managed Instance Groups”

  1. Robert

    Hi,

    Thanks for the post. I’m following it to slowly set up my own infra consisting of a managed group and a load balancer. One thing I don’t understand: You wrote that it is necessary to have a proxy-only subnet. However, according to these examples: https://github.com/terraform-google-modules/terraform-google-lb-internal, it is optional to create such a network. I mean, they don’t create it explicitly.
    Nevertheless, assume that I want to create a proxy-only subnet for my load balancer, then:
    1. Do I have to create a separate subnet for the load balancer and the MIG, i.e., the LB runs in subnet A and VMs from the template in subnet B?
    2. Where did you get “module.vpc_central.{network_name, subnets}” from? I looked through the available terra modules and couldn’t find it.

    Thanks,
    Robert

    1. chouse (post author)

      Hi Robert, it’s been a bit since I wrote this post so bear with me a little bit here as I outline my thought process on answering your questions.

      In the module “ilb_nginx”, it is using as its source GoogleCloudPlatform/lb-internal/google which according to https://registry.terraform.io/modules/GoogleCloudPlatform/lb-internal/google/latest is sourced from https://github.com/GoogleCloudPlatform/terraform-google-lb-internal which is the link you mentioned in the comment, so I am just confirming we are referring to the same module source.

      Now, on that module’s GitHub README in the Usage section, it doesn’t have network or subnetwork in the example, but if we look in its variables.tf we can see network and subnetwork are defined with a default value of “default”, and if we look in its main.tf we can see var.network is consumed by a google_compute_network data source (the VPC) and var.subnetwork by a google_compute_subnetwork data source (the subnet), and since these are both data sources (instead of resources), they should already exist.

      Now, when we look further down in the module’s main.tf we see the google_compute_forwarding_rule which uses the network and subnetwork variables. According to https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_forwarding_rule (which has a ton of examples), the `subnetwork` argument is “The subnetwork that the load balanced IP should belong to for this Forwarding Rule. This field is only used for INTERNAL load balancing”. So this is only where the client-facing IP address of the ILB should be allocated, nothing about proxy-only.

      In the first example on the page, they do create a google_compute_subnetwork with purpose INTERNAL_HTTPS_LOAD_BALANCER indicating it is a proxy-only subnet, and then they do also create a subnet to use for the client-facing load balanced IP of the internal load balancer, which is referred to in the google_compute_forwarding_rule as the subnetwork argument, but you’ll note that the proxy_subnet while created is never referenced in the example.

      If we take a look at https://cloud.google.com/load-balancing/docs/proxy-only-subnets, we can understand why we do need to create a proxy-only subnet, but don’t necessarily need to refer to it in terraform when setting up ILBs – the VPC is capable of having a single proxy-only subnet, and internal load balancers deployed on the VPC will automatically use the proxy-only subnet. The proxy-only subnet IP is the source of forwarded requests as seen by backend instances.

      So to finally answer your questions, yes, you should have a separate subnet for the frontend interface of the load balancer (internal clients will make requests to IPs in this subnet), and then you will need the proxy-only subnet where forwarded requests will come from heading towards the backend, and finally a subnet for your instances to use, as part of a MIG.

      I believe you should be able to actually put the client-facing ILB interface and backend instances on the same subnet, but it’s a little cleaner if you separate them, and then you can apply VPC firewall rules to keep clients from hitting backends directly.

      Client -> front end ILB subnet -> ILB proxy-only subnet -> MIG instance backend/subnet.

      Regarding the “module.vpc_central” I reference, good catch on that, I do not have access to the source where I designed this, but I was probably using Google’s net-vpc module, https://github.com/GoogleCloudPlatform/cloud-foundation-fabric/tree/master/modules/net-vpc.

      You can refer to the VPC that it creates by using module.vpc.network.id or module.vpc.network.self_link, and to subnetworks by name: module.vpc.subnet_self_links["subnetwork-name"].

      Hope this helps!

