
Automating Compute Engine Local SSD configuration in Windows


Similar to other public cloud providers, Google Cloud offers the ability to attach local high-speed disks to virtual machines (VMs). This offering is called Local SSD.

The “Local” in Local SSD means that the disks are physically attached to the hypervisor host where the VM is executing. This provides the VM with high-speed, low-latency data operations over the NVMe interface.

On Windows GCE instances (VMs), Local SSD volumes are typically used for temporary or ephemeral data such as the Windows Pagefile, Microsoft SQL Server Temp database, or other high-speed caching needs.


Certain GCE instance machine sizes come with different quantities of Local SSD disks, which are always 375 GB in size. They can be striped together to create larger volumes with higher performance using logical volume management tools in the Operating System.
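As a rough illustration of how Local SSD disks are requested when a VM is created, here is a minimal Terraform sketch that declares a Windows instance with four Local SSD disks. The instance name, zone, machine type, image, and network are placeholder assumptions rather than values from this post; check that the machine type you pick supports the number of disks you ask for.

```hcl
# Hypothetical example: name, zone, machine type, image, and network are placeholders.
resource "google_compute_instance" "win_sql" {
  name         = "win-sql-01"
  machine_type = "n2-standard-8"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "windows-cloud/windows-2022"
    }
  }

  # Each scratch_disk block attaches one 375 GB Local SSD disk over NVMe.
  scratch_disk {
    interface = "NVME"
  }
  scratch_disk {
    interface = "NVME"
  }
  scratch_disk {
    interface = "NVME"
  }
  scratch_disk {
    interface = "NVME"
  }

  network_interface {
    network = "default"
  }
}
```

The disks show up inside Windows as four separate raw devices; striping them together is still done in the guest OS.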

Read more about Local SSD Performance.


However, again like other public clouds, Local SSD storage is ephemeral and comes with some risks.

Data on volumes created with Local SSD disks can be irretrievably lost and no guarantee is made as to the safety or durability of that data.

Specifically, if the OS is shut down from within the Guest Operating System, the GCE instance will power off and the Local SSD disks and the data they contain will be lost. When the VM is started back up via the Cloud Console, it will have new, empty Local SSD disks which must be initialized again in order to be used.

There are other situations where data on Local SSD disks can be lost, such as if the hypervisor host has an issue and Google Cloud cannot migrate the Local SSD data along with the VM to a new host.

Read more about Local SSD data persistence.

So, generally speaking, Local SSDs offer great performance, but they should only hold data whose loss will not impact the application.

Usage with localssd_init

In Google Cloud, Local SSD disks show up as raw disks that need to be initialized and formatted before they can be used.

To automate the initialization of Local SSDs on Windows GCE instances and create usable volumes, I have developed a simple PowerShell script called localssd_init.ps1 which can be found in my GitHub repository gce-localssd-init-powershell.

The script is configured to check if certain drive letters are present. If the drive letters are missing, it will recreate them using a specific configuration of Local SSD disks.

To use the script, simply copy it to your server and then sign it or configure the ExecutionPolicy.

Next, edit the localssd_init.ps1 script and configure the volumes that will be created using Local SSD disks, starting around line 50:

$LocalSSDConfig = @(
    [LocalSSDVol]@{Name = "SQLTempDB"; DriveLetter = 'E'; LocalSSDQty = 2; NTFSAlloc = '65536' },
    [LocalSSDVol]@{Name = "Pagefile"; DriveLetter = 'P'; LocalSSDQty = 2; NTFSAlloc = '8192'; PostScript = "C:\path\to\pagefile.ps1" }
)

Add, remove, or modify entries in the array. Be sure to have a comma after each line, except for the last line, and separate the key=value pairs with a semicolon.

For each entry,

  • Update the Name and DriveLetter where the volume will be mounted. The name will be set for the Storage Pool and Volume.
  • Set the LocalSSDQty to the number of 375 GB Local SSD disks to stripe across and set the NTFSAlloc to the needed NTFS Allocation unit size.

Optionally, set PostScript to the path to a custom script to run after the volume has been created. The custom script could change the system Pagefile configuration to use the new drive (a reboot will be required) or restart SQL Server so it recreates the Temp database files.

The script uses Windows Storage Spaces to create a new Storage Pool from the required quantity of Local SSD disks with “Simple” resiliency, which is equivalent to striping. It then creates a new Volume that uses the maximum size of the pool, formats it with NTFS using the required allocation unit size, and mounts it at the required drive letter.

During server creation, the localssd_init.ps1 script can be used to create the Local SSD-backed volumes and then run at every subsequent boot to make sure they exist and recreate them if they are missing.

Simply run the script and the disks will be created. The script will log to the standard output and also create entries in the System Event Log.

To enable the script to run at every boot, use this snippet to register a scheduled job (visible in Windows Task Scheduler) that runs at startup. Be sure to update the path to the script:

  Set-ExecutionPolicy RemoteSigned
  # Placeholder path; point this at your copy of the script
  $path = 'C:\path\to\localssd_init.ps1'
  $trigger = New-JobTrigger -AtStartup -RandomDelay 00:00:10
  Register-ScheduledJob -Trigger $trigger -FilePath $path -Name localssd_init


If you are on the fence about using Local SSD on Windows GCE instances because those disks might need to be initialized again after an outage, check out localssd_init.ps1 to make sure they’re always available for use.

Configuring Private Google Access

Private Google Access (PGA) is the method by which clients in Google Cloud Compute Engine (GCE), Google Cloud VMware Engine (GCVE), or on-premise environments can access Google Cloud APIs without having Internet access or needing an assigned external IP address.

For on-premise clients, this connectivity can take place through private hybrid connections such as Interconnect which may be faster than the subscribed Internet service.

For an overview, check out

Ultimately, PGA is delivered through a set of public IP addresses which are not advertised on the Internet but are available via the Google Cloud VPC in your project.

  • for is used for services that are supported by VPC Service Controls and blocks access to services that do not support VPC service controls.
  • for is used for most services (except Workspace web or interactive websites) regardless of their support for VPC Service Controls.

In this post, we’ll cover working with which supports most APIs.

To enable clients to use PGA to access common APIs, two things need to be configured:

  1. Routing: Clients need to have a network route to reach these IP addresses.
  2. DNS: Clients need to resolve the public API fully-qualified domain name (FQDN, such as to one of the 4 IP addresses in the range:, .9, .10, and .11.


In order to access the range for, we need to first determine where our clients are located on the network and how they get to the Internet.

Google Compute Engine

Compute Engine clients on a VPC must meet the requirements to use PGA. Two of these requirements are the VM must not have an external public IP address and it must use a subnet which has PGA enabled. Enabling PGA is done per-subnet, not per-VPC. Clients will access the range through the VPC’s standard default internet gateway route.
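Since PGA is switched on per subnet, a minimal Terraform sketch of a subnet with it enabled might look like the following. The VPC name, subnet name, region, and CIDR range are assumptions for illustration only.

```hcl
# Hypothetical names and CIDR range; adjust to your environment.
resource "google_compute_network" "vpc" {
  name                    = "example-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "central" {
  name          = "us-central-01"
  region        = "us-central1"
  network       = google_compute_network.vpc.id
  ip_cidr_range = "10.10.0.0/24"

  # PGA is enabled per subnet, not per VPC.
  private_ip_google_access = true
}
```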

If the default internet gateway route has been overridden by a custom route that directs Internet-bound traffic ( to a firewall appliance, then an additional custom route must be created for the range with a next hop of default-internet-gateway and a priority that is higher than the custom route for

This way, traffic for will go out the default Internet gateway of the VPC. It doesn’t actually go to the Internet – instead the underlying Google Cloud network will accept that traffic and direct it to the private internal interfaces of Google APIs.

In the screenshot below, the routes are ordered from highest priority (lowest number) to lowest priority (highest number). The private-google-access route has a priority of 90 which is a higher priority than the route for with a priority of 100 that leads to another network. Therefore, traffic for the PGA range will go out the default Internet gateway of the VPC and not onward to the other network like all other Internet traffic.
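The custom route described above could be sketched in Terraform roughly as follows. The route name and priority of 90 mirror the example in this post; the `network` variable is an assumption, and 199.36.153.8/30 is the range Google documents for private.googleapis.com, so substitute the restricted range if that is the one you use.

```hcl
# Assumed input: the VPC that carries the overriding default route.
variable "network" {
  type        = string
  description = "Name or self link of the VPC network"
}

resource "google_compute_route" "private_google_access" {
  name       = "private-google-access"
  network    = var.network
  dest_range = "199.36.153.8/30"

  # Send the PGA range out the VPC's default internet gateway
  # instead of following the custom default route.
  next_hop_gateway = "default-internet-gateway"

  # Lower number means higher priority; 90 wins over a custom default route at 100.
  priority = 90
}
```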

Google Cloud VMware Engine

If Google Cloud VMware Engine (GCVE) is configured to use VMware Engine Internet access, then traffic for will flow through the GCVE Internet access service.

If Internet access for GCVE workloads has been disabled and Internet-bound traffic is configured to use another network, then traffic for will flow through the GCVE VPC peering connection to your peered VPC and route based on the routing configuration of your VPC.

The VPC in your project should be advertising a route for for with a priority that overrides the default internet gateway route. Therefore, an additional custom route must be created for the range with a next hop of default-internet-gateway and a priority that is higher than the custom route for

See the previous screenshot for an example.

See Private Google Access: a closer look for more details about PGA for GCVE.


Typically, on-premise clients trying to reach the range for would take the default route out to the internet. But, since this range is not available on the internet, communication will fail.

Supporting Private Google Access with on-premise clients requires hybrid connectivity between the Google Cloud project VPC and the on-premise environment such as through Interconnect or Cloud VPN.

A route for must be advertised by the Cloud Router that is associated with the hybrid connectivity. This way, clients on-premise will have a route to that leads back over the hybrid connectivity, instead of out to the internet.
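One way to advertise the range from a Cloud Router is a custom advertisement. The following is a sketch under assumptions: the router name, region, ASN, and `network` variable are placeholders, and 199.36.153.8/30 is the documented range for private.googleapis.com.

```hcl
# Assumed input: the VPC associated with the Interconnect or Cloud VPN.
variable "network" {
  type        = string
  description = "Name or self link of the VPC network"
}

resource "google_compute_router" "hybrid" {
  name    = "cr-hybrid"
  region  = "us-central1"
  network = var.network

  bgp {
    asn            = 64514
    advertise_mode = "CUSTOM"

    # Keep advertising the VPC subnets alongside the custom range.
    advertised_groups = ["ALL_SUBNETS"]

    advertised_ip_ranges {
      range       = "199.36.153.8/30"
      description = "Private Google Access"
    }
  }
}
```

With a custom advertise mode the default subnet advertisements are only kept if `ALL_SUBNETS` is listed explicitly, which is why it appears here.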

Additionally, if the VPC associated with the hybrid connectivity (Interconnect or Cloud VPN) has its default internet gateway route overridden by a custom route that directs Internet-bound traffic ( to a firewall appliance, then an additional custom route must be created for the range with a next hop of default-internet-gateway and a priority that is higher than the custom route for

This way, on-premise clients trying to reach the range will be directed over the hybrid connectivity and then out the VPC’s default internet gateway to access PGA.


To test connectivity for any client to the range for once routing has been configured, use the Test-NetConnection PowerShell cmdlet with port 80 or 443 (only HTTP/S is supported, ICMP Ping is not). The TcpTestSucceeded value should return True if the routing was configured successfully:

Test-NetConnection -computername -port 443

ComputerName     :
RemoteAddress    :
RemotePort       : 443
InterfaceAlias   : Ethernet
SourceAddress    :
TcpTestSucceeded : True

Verification of connectivity should be completed prior to making any DNS changes to avoid an outage for clients trying to reach Google APIs.


Once routing is in place and tested, changes in DNS must be made. These changes should be done on the DNS server used by clients.

To override the DNS resolution of a Google API fully-qualified domain name (FQDN), we need to create a new zone in DNS along with A- and CNAME-records that resolve the API FQDN to the private IPs. This way, clients will no longer use the publicly-advertised DNS zone and records which resolve to the public IP of the API and will instead use the private IP from the DNS server’s overriding zone.

If a client is configured to check its local hosts file first for DNS resolution prior to using the internal or public DNS, entries in the hosts file can be configured to point a public API FQDN to a PGA IP. This is a safe way to fully test PGA on a single host before affecting all clients by updating the common internal DNS server.

In this example, we override resolution of and point it to one of the IP addresses,

% cat /etc/hosts
# Host Database
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##       localhost broadcasthost
::1             localhost

Any requests on this client to will now go to and should be successful as long as routing is in place.

To enable all clients which use a common DNS server to use PGA IPs instead of the public IP, create a DNS zone for in the private DNS server.

Within the new zone, create A-records for using the four different IP addresses in the range:,,, and

On-premise Active Directory clients

If clients are part of an Active Directory (AD) domain, AD DNS can be used to create the zone and A-records:

Be sure to create A-records for each of the IPs:

Next, create a wildcard CNAME-record for * which points to the A records.

Now, when any client tries to resolve any hostname, it will be given one of the PGA IPs.

Compute Engine with Google Cloud DNS

If clients use Google Cloud DNS, such as Compute Engine instances, this same configuration can be made in Cloud DNS.

Create a new private zone for, and then create a record set for and add the four IP addresses, and then also create a record set for the wildcard CNAME *

Terraform can be used to create zones and record sets. Following is an example for setting up

resource "google_dns_managed_zone" "dns-zone-googleapis-com" {
  name        = "googleapis-com"
  project     = google_project.svpc.project_id
  dns_name    = ""
  description = "Private Google Access"

  visibility = "private"

  private_visibility_config {
    networks {
      network_url =
    }
  }
}

resource "google_dns_record_set" "private-google-access-a" {
  project      = google_project.svpc.project_id
  name         = "private.${google_dns_managed_zone.dns-zone-googleapis-com.dns_name}"
  managed_zone =
  type         = "A"
  ttl          = 300

  rrdatas = ["", "", "", ""]
}

resource "google_dns_record_set" "private-google-access-cname" {
  project      = google_project.svpc.project_id
  name         = "*.${google_dns_managed_zone.dns-zone-googleapis-com.dns_name}"
  managed_zone =
  type         = "CNAME"
  ttl          = 300
  rrdatas      = []
}

Google Cloud VMware Engine

Google Cloud VMware Engine clients will need to use a DNS server such as Active Directory or Google Cloud DNS (through a Cloud DNS inbound server policy) to resolve overriding domain names.
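If Cloud DNS is the resolver of choice, the inbound server policy mentioned above can be sketched like this. The policy name and the `network` variable are assumptions.

```hcl
# Assumed input: the VPC whose Cloud DNS zones should be reachable.
variable "network" {
  type        = string
  description = "ID or self link of the VPC network"
}

resource "google_dns_policy" "inbound" {
  name = "inbound-forwarding"

  # Allocates inbound forwarder addresses in the VPC's subnets so that
  # clients outside the VPC can send DNS queries to Cloud DNS.
  enable_inbound_forwarding = true

  networks {
    network_url = var.network
  }
}
```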

Additional API domains

Additional zones may need to be configured in the DNS server other than depending on what the client is trying to access. This could include and others. See Domain Options for

For example, to override the domain, create A records for the domain itself and then a wildcard CNAME for * which points to the domain A records.
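As a sketch of that pattern in Terraform, here is an overriding zone for gcr.io, which is one of the domains Google lists for Private Google Access. The choice of gcr.io, the resource names, and the `network` variable are my assumptions for illustration; the four addresses are the documented private.googleapis.com range.

```hcl
# Assumed input: the VPC that should see the private zone.
variable "network" {
  type        = string
  description = "ID or self link of the VPC network"
}

locals {
  pga_ips = ["199.36.153.8", "199.36.153.9", "199.36.153.10", "199.36.153.11"]
}

resource "google_dns_managed_zone" "gcr_io" {
  name        = "gcr-io"
  dns_name    = "gcr.io."
  description = "Private Google Access override for gcr.io"
  visibility  = "private"

  private_visibility_config {
    networks {
      network_url = var.network
    }
  }
}

# A records for the domain itself.
resource "google_dns_record_set" "gcr_io_a" {
  name         = google_dns_managed_zone.gcr_io.dns_name
  managed_zone = google_dns_managed_zone.gcr_io.name
  type         = "A"
  ttl          = 300
  rrdatas      = local.pga_ips
}

# Wildcard CNAME that points every hostname in the zone at the A records.
resource "google_dns_record_set" "gcr_io_cname" {
  name         = "*.${google_dns_managed_zone.gcr_io.dns_name}"
  managed_zone = google_dns_managed_zone.gcr_io.name
  type         = "CNAME"
  ttl          = 300
  rrdatas      = [google_dns_managed_zone.gcr_io.dns_name]
}
```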

When using any Google Cloud service, be sure to review which API FQDNs are in use and determine if additional overriding zones need to be created in the DNS server.

For further implementation details about Private Google Access, see

Working with Google Cloud Managed Instance Groups

Google Cloud Managed Instance Groups (MIGs) are groups of identical virtual machine instances that serve the same purpose.

Instances are created based on an Instance Template which defines the configuration that all instances will use including image, instance size, network, etc.

MIGs that host services are fronted by a load balancer which distributes client requests across the instances in the group.

MIG instances can also run batch processing applications which do not serve client requests and do not require a load balancer.

MIGs can be configured for autoscaling to increase the number of VM instances in the group based on CPU load or demand.

They can also auto-heal by replacing failed instances. Health checks are used to make sure each instance is responding correctly.

MIGs should be Regional and use VM instances in at least two different zones of a region. Regional MIGs can have up to 2000 instances.


Two different modules authored by Google can be used to create an Instance Template and MIG:

  • Instance template: terraform-google-modules/vm/google//modules/instance_template
  • Multi-version MIG: terraform-google-modules/vm/google//modules/mig_with_percent

To optionally create an Internal HTTP load balancer, use: GoogleCloudPlatform/lb-internal/google

The following examples create a service account, two instance templates, a MIG, and an Internal HTTP load balancer.


  • A custom image should be created with nginx installed and running at boot.
  • A VPC with a proxy-only subnet is required.
  • The instance template requires a service account.
# Enable IAM API
resource "google_project_service" "project" {
 project = "my-gcp-project-1234"
 service = ""
 disable_on_destroy = false
}

# Service Account required for the Instance Template module
resource "google_service_account" "sa" {
 project = "my-gcp-project-1234"
 account_id = "sa-mig-test"
 display_name = "Service Account MIG test"
 depends_on = [ google_project_service.project ]
}

Update project, account_id, and display_name with appropriate values.

Instance Templates

The instance template defines the instance configuration. This includes which network to join, any labels to apply to the instance, the size of the instance, network tags, disks, custom image, etc.

The MIG deployment requires an instance template.

The instance template requires that a source image have already been created.

In this terraform code example, two instance templates are created:

  • “A” template – initial version to use in the MIG
  • “B” template – future upgrade version to use with an optional canary update method

During the initial deployment, each instance template can point to the same custom image for the source_image value. In the future, each instance template should point to a different custom image.

# Instance Template "A"
# Module src:
# Registry:
# Creates google_compute_instance_template
module "instance_template_A" {
 source = "terraform-google-modules/vm/google//modules/instance_template"
 region = "us-central1"
 project_id = "my-gcp-project-1234"
 subnetwork = "us-central-01"
 service_account = {
  email =
  scopes = ["cloud-platform"]
 }

 name_prefix = "nginx-a"
 tags = ["nginx"]
 labels = { mig = "nginx" }
 machine_type = "f1-micro"
 startup_script = "sed -i 's/nginx/'$HOSTNAME'/g' /var/www/html/index.nginx-debian.html"

 source_image_project = "my-gcp-project-1234"
 source_image = "image-nginx"
 disk_size_gb = 10
 disk_type = "pd-balanced"
 preemptible = true
}

# Instance Template "B"
module "instance_template_B" {
 source = "terraform-google-modules/vm/google//modules/instance_template"
 region = "us-central1"
 project_id = "my-gcp-project-1234"
 subnetwork = "us-central-01"
 service_account = {
  email =
  scopes = ["cloud-platform"]
 }

 name_prefix = "nginx-b"
 tags = ["nginx"]
 labels = { mig = "nginx" }
 machine_type = "f1-micro"
 startup_script = "sed -i 's/nginx/'$HOSTNAME'/g' /var/www/html/index.nginx-debian.html"

 source_image_project = "my-gcp-project-1234"
 source_image = "image-nginx"
 disk_size_gb = 10
 disk_type = "pd-balanced"
 preemptible = true
}

Update the following with appropriate values:

  • Module name
  • region
  • project_id
  • subnetwork – the VPC subnet to use for instances deployed via the template
  • name_prefix – prefix for the instance template name; a version suffix is attached to the name.
    • Be sure to include any specific versioning to indicate what is in the custom image.
    • Lowercase only.
  • tags – any required network tags
  • labels – labels to apply to instances deployed via the template
  • machine_type – machine size to use
  • startup_script – startup script to run on each boot (not just deployment)
  • source_image_project – project where the image resides
  • source_image – image name
  • disk_size_gb – size of the boot disk
  • disk_type – type of boot disk
  • preemptible – if set to true, instances can be pre-empted as needed by Google Cloud.

More instance template module options are available:

Changes to the instance template will result in a new version of the template. The MIG will be modified to use the new version. All MIG instances will be recreated. See the update_policy section of the MIG module definition (below) to control the update behavior.

Managed Instance Group

The MIG creates the set of instances using the same custom image and instance template. Instances are customized as usual during first boot.

A custom startup script can run every time the instance starts and configure the VM further. See Overview  |  Compute Engine Documentation  |  Google Cloud

In this Regional MIG terraform example, the initial set of instances are deployed using the “A” template set as the instance_template_initial_version.

The same “A” template is also set for the instance_template_next_version with a value of 0 for the next_version_percent.

In a future canary update, set the instance_template_next_version to the “B” template with an appropriate value for next_version_percent.

# Regional Managed Instance Group with support for canary updates 
# Module src: 
# Registry: 
# Creates google_compute_health_check.http (optional), google_compute_health_check.https (optional), google_compute_health_check.tcp (optional), google_compute_region_autoscaler.autoscaler (optional), google_compute_region_instance_group_manager.mig 

module "mig_nginx" { 
 source = "terraform-google-modules/vm/google//modules/mig_with_percent" 
 project_id = "my-gcp-project-1234" 
 hostname = "mig-nginx" 
 region = "us-central1" 
 target_size = 4
 instance_template_initial_version = module.instance_template_A.self_link 

 instance_template_next_version = module.instance_template_A.self_link 
 next_version_percent = 0 
 //distribution_policy_zones = ["us-central1-a", "us-central1-f"]
 update_policy = [{ # See 
  type = "PROACTIVE" 
  instance_redistribution_type = "PROACTIVE" 
  minimal_action = "REPLACE" 
  max_surge_percent = null 
  max_unavailable_percent = null 
  max_surge_fixed = 4 
  max_unavailable_fixed = null 
  min_ready_sec = 50 
  replacement_method = "SUBSTITUTE"
 }]
 named_ports = [{ 
  name = "web" 
  port = 80
 }]
 health_check = { 
  type = "http" 
  initial_delay_sec = 30 
  check_interval_sec = 30 
  healthy_threshold = 1 
  timeout_sec = 10 
  unhealthy_threshold = 5 
  response = "" 
  proxy_header = "NONE" 
  port = 80 
  request = "" 
  request_path = "/" 
  host = ""
 }
 autoscaling_enabled = "false" 
 max_replicas = var.max_replicas 
 min_replicas = var.min_replicas 
 cooldown_period = var.cooldown_period 
 autoscaling_cpu = var.autoscaling_cpu 
 autoscaling_metric = var.autoscaling_metric 
 autoscaling_lb = var.autoscaling_lb 
 autoscaling_scale_in_control = var.autoscaling_scale_in_control
}

Update the following with appropriate values:

  • Module name
  • project_id
  • hostname – the prefix for provisioned VM names/hostnames. Will have a random set of 4 characters appended to the end.
  • region
  • target_size – number of instances to create in the MIG. Does not need to equal the number of zones in distribution_policy_zones.
  • instance_template_initial_version – template to use for initial deployment
  • instance_template_next_version – template to use for future canary update
  • next_version_percent – percentage of instances in the group (of target_size) that should use the canary update
  • distribution_policy_zones – zone names in the region where VMs should be provisioned.
    • Optional. If not specified, the Google-authored terraform module will automatically select each zone in the region.
      • Example: us-central1 region has 4 zones so each zone will be populated in this field. This directly impacts the update_policy and its max_surge_fixed value.
    • This value cannot be changed later. The module will ignore any changes.
      • The MIG will need to be destroyed and recreated to update the zones to use.
    • More than two zones can be specified.
    • The target_size does not need to match the number of zones specified.
    • See About regional MIGs  |  Compute Engine Documentation  |  Google Cloud .
  • update_policy – specifies how instances should be recreated when a new version of the instance template is available.
    • type set to
      • PROACTIVE will update all instances in a rolling fashion.
        • Leave max_unavailable_fixed as null which results in a value of 0, meaning no live instances can be unavailable.
        • Recommended
      • OPPORTUNISTIC means “only when you manually initiate the update on selected instances or when new instances are created. New instances can be created when you or another service, such as an autoscaler, resizes the MIG. Compute Engine does not actively initiate requests to apply opportunistic updates on existing instances.”
        • Not recommended
    • max_surge_fixed indicates the number of additional instances that are temporarily added to the group during an update.
      • These new instances will use the updated template.
      • Should be greater than or equal to the number of zones in distribution_policy_zones. If there are no zones specified in distribution_policy_zones, as mentioned previously, the Google-authored MIG module will automatically select all the zones in the region.
    • replacement_method can be set to either of the following values:
      • RECREATE instance name is preserved by deleting the old instance and then creating a new one with the same name.
      • SUBSTITUTE will create new instances with new names.
        • Results in a faster upgrade of the MIG – instances are available sooner than using RECREATE.
        • Recommended.
    • See Terraform Registry and Automatically apply VM configuration updates in a MIG  |  Compute Engine Documentation  |  Google Cloud
  • named_ports – set the port name and port number as appropriate
  • health_check – set the check type, port, and request_path as appropriate
  • Autoscaling can also be configured. See Autoscaling groups of instances  |  Compute Engine Documentation  |  Google Cloud
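For reference, the module's autoscaling options wrap a google_compute_region_autoscaler resource. A standalone sketch of that resource, assuming CPU-based scaling, might look like the following; the autoscaler name, replica counts, CPU target, and `mig_self_link` variable are illustrative assumptions. Use either the module's autoscaling settings or a standalone autoscaler for a given MIG, not both.

```hcl
# Assumed input: self link of an existing regional MIG.
variable "mig_self_link" {
  type        = string
  description = "Self link of the regional managed instance group to scale"
}

resource "google_compute_region_autoscaler" "nginx" {
  name   = "mig-nginx-autoscaler"
  region = "us-central1"
  target = var.mig_self_link

  autoscaling_policy {
    min_replicas    = 2
    max_replicas    = 8
    cooldown_period = 60

    # Add instances when average CPU utilization across the group exceeds 60%.
    cpu_utilization {
      target = 0.6
    }
  }
}
```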

More MIG module options are available:

Changes to the MIG may result in VMs needing to be updated. See the update_policy section of the MIG module definition (above) to configure the behavior when updating the MIG members.

Load Balancer

An Internal Load Balancer can make a MIG highly available to internal clients.

module "ilb_nginx" {
 source = "GoogleCloudPlatform/lb-internal/google"
 version = "~> 4.0"
 project = "my-gcp-project-1234"
 network = module.vpc_central.network_name
 subnetwork = module.vpc_central.subnets["us-central1/central-01-subnet-ilb"].name
 region = "us-central1"
 name = "ilb-nginx"
 ports = ["80"]
 source_tags = ["nginx"]
 target_tags = ["nginx"]

 backends = [{
  group = module.mig_nginx.instance_group
  description = ""
  failover = false
 }]

 health_check = {
  type = "http"
  check_interval_sec = 30
  healthy_threshold = 1
  timeout_sec = 10
  unhealthy_threshold = 5
  response = ""
  proxy_header = "NONE"
  port = 80
  request = ""
  request_path = "/"
  host = ""
  enable_log = false
  port_name = "web"
 }
}

Update the following with appropriate values:

  • Module name
  • project
  • network and subnetwork – the VPC and proxy-only subnet to use
  • region
  • name
  • ports – the port to listen on
  • source_tags and target_tags – network tags to use, should be present on the MIG members via the instance template.
  • backends – points to the MIG
  • health_check – should generally match the MIG healthcheck.

More options are available, : see the module source for and

Be sure to consider any necessary firewall rules, especially if using network tags.
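For example, health check probes need to be able to reach the MIG members. A minimal sketch of an ingress rule for that, assuming the nginx network tag and port 80 used in this post, could look like this. The rule name and `network` variable are assumptions; the two source ranges are the probe ranges Google documents for health checks.

```hcl
# Assumed input: the VPC that hosts the MIG.
variable "network" {
  type        = string
  description = "Name or self link of the VPC network"
}

resource "google_compute_firewall" "allow_health_checks" {
  name      = "allow-health-checks-nginx"
  network   = var.network
  direction = "INGRESS"

  # Google Cloud health check probe ranges.
  source_ranges = ["35.191.0.0/16", "130.211.0.0/22"]

  # Only instances carrying this network tag (set in the instance template) are affected.
  target_tags = ["nginx"]

  allow {
    protocol = "tcp"
    ports    = ["80"]
  }
}
```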

The Google-authored MIG module has create_before_destroy set to true, so a new MIG can replace an existing one as a backend behind the load balancer with only a very brief outage (less than 10 seconds). The new MIG will be created and added as a backend, and then the old MIG will be destroyed.

Day 2 operations

Changing size of MIG

If needed, adjust the target_size value of the MIG module to increase or decrease the number of instances. Adjustments take place right away.

If the number of instances is increased while a new template is in place and the update_policy type is OPPORTUNISTIC, the added instances will be deployed using the new template.

Changing the zones to use for a MIG

The zones cannot be changed after creation. The MIG must be destroyed and recreated.

Deleting MIG members

Deleting a MIG member automatically reduces the target number of instances for the MIG. The deleted member is not replaced.

Restarting MIG members

Do not manually restart a MIG instance from within the VM itself. This will cause its healthcheck to fail and the MIG will delete/recreate the VM using the template.

Use the RESTART/REPLACE button in the Cloud Console and choose the Restart option. This affects all instances in the group, but can be limited to only acting against a maximum number at a time (“Maximum unavailable instances”).

The “Replace” option within RESTART/REPLACE will delete and recreate instances using the current template.

Updating MIG instances to a new version

When a new version of the custom image is released, such as when it has been updated with new software, the MIG can be updated in a controlled fashion until all members are running the updated version, without any outage.

The MIG module update_policy setting is very important for this process to ensure there is no outage:

  • max_surge_fixed is the number of additional instances created in the MIG and verified healthy before the old ones are removed.
    • Should be set to greater than or equal to the number of zones in distribution_policy_zones
  • max_unavailable_fixed should be set to null which equals 0: no live instances will be unavailable during the update.

The MIG module has options for two different instance templates in order to support performing a canary update where only a percentage of instances are upgraded to the new version:

  • instance_template_initial_version – template to use for initial deployment
  • instance_template_next_version – template to use for future canary update
  • next_version_percent – percentage of instances in the group (of target_size) that should use the canary update

Initially, both options may point to the same template and 0% is allocated to the “next” version.

If a load balancer is used, newly created instances that are verified healthy will automatically be selected to respond to client requests.

Canary update

To move a percentage of instances to the “next” version via a “canary” update:

  1. Set the instance_template_next_version to point to an instance template which uses an updated custom image
  2. Set the next_version_percent to an appropriate percentage of instances in the group that should use the “next” template.
  3. Make sure update_policy has type set to PROACTIVE – this will cause the change to take effect right away.

When applied via terraform, all instances will be recreated (adhering to the update_policy) but a percentage of instances will be created using the “next” template.
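To make the canary split concrete, here is roughly what it looks like on the underlying google_compute_region_instance_group_manager resource, with two version blocks and a percentage on the second. This is a sketch under assumptions, not the module's exact internals: the names, the 25% split, and the two template variables are placeholders.

```hcl
# Assumed inputs: self links of the "A" and "B" instance templates.
variable "template_a" {
  type = string
}

variable "template_b" {
  type = string
}

resource "google_compute_region_instance_group_manager" "nginx" {
  name               = "mig-nginx"
  base_instance_name = "mig-nginx"
  region             = "us-central1"
  target_size        = 4

  # Instances not claimed by the canary version stay on the current template.
  version {
    name              = "initial"
    instance_template = var.template_a
  }

  # 25% of target_size (one instance here) moves to the canary template.
  version {
    name              = "next"
    instance_template = var.template_b

    target_size {
      percent = 25
    }
  }

  update_policy {
    type                  = "PROACTIVE"
    minimal_action        = "REPLACE"
    replacement_method    = "SUBSTITUTE"
    max_surge_fixed       = 4
    max_unavailable_fixed = 0
  }
}
```

Only one version block may omit target_size; it receives whatever instances are left over after the canary percentage is applied.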

After the canary update has been validated and all instances should be upgraded, see the steps below for a Regular update.

Regular update

To update all instances at once (adhering to the update_policy):

  1. Set both the instance_template_initial_version and instance_template_next_version to point to an instance template which uses an updated custom image
  2. Set the next_version_percent to 0.
  3. Make sure update_policy has type set to PROACTIVE – this will cause the change to take effect right away.

When applied via terraform, all instances will be recreated (adhering to the update_policy).

Site to Site VPN between Google Cloud and pfSense on VMware at home

I’ve always wanted to set up a Site to Site VPN between a cloud provider and my home network. What follows is a guide inspired by Configure Google Cloud HA VPN with BGP on pfSense but customized for a Google Wi-Fi home network and updated with some pfSense changes that I had to figure out.

Home Network

When we built the house in 2015, I set up a 3-pack of original Google Wi-Fi (not the “Nest” version) to use as my router and access points throughout the house. Google Wi-Fi is great – it’s very easy to get started. Once deployed, it can generally be thought of as “set it and forget it”. It doesn’t provide all the bells and whistles that some of the more advanced home routers offer, but this can be a blessing in disguise because there is less to fiddle with and potentially mess up. Most importantly, it delivers a reliable experience for the family.

My home lab is a simple Intel NUC with a dual-core Intel Core i3-6100U 2.3 GHz CPU and 32 GB RAM. It runs a standalone instance of VMware ESXi 7. I run a few VMs when I need to, but nothing “production”.

Site-to-Site VPN with Google Cloud

Since switching to a full-time focus on cloud engineering and architecture, one of the things I’ve always wanted to try is to set up a Site-to-Site IPsec VPN tunnel with BGP between my home and a virtual private cloud (VPC) network to better understand the customer experience for VPN configuration and network management.

As I mentioned earlier, Google Wi-Fi is rather basic and doesn’t offer any VPN capability, but it can do port forwarding, and when combined with a virtual appliance, that’s all we really need.

pfSense overview

Since Google Wi-Fi does not have any VPN capabilities, I intend to use a pfSense virtual appliance in ESXi to act as a router for virtual machine clients on an internal ESXi host-only network. The host-only network will have no physical uplinks so the only way out to the Internet or the private cloud network is through the pfSense router.

pfSense will provide DHCP, DNS, NAT, and routing/default gateway services only to the clients on the internal host-only network.

Because of the way it is designed, no other router can sit between Google Wi-Fi and the internet without some loss of functionality, including mesh networking, and we do not want to disturb the other users of the network (family), so we will create an isolated network with pfSense on the ESXi host.

The pfSense VM will have two virtual NICs:

  • NIC1 is connected to the “VM Network” and has Internet access through the home network.
  • NIC2 is connected to the internal isolated “host-only” network which does not have any connectivity to the Internet.
Network diagram showing connectivity between the Internet and pfSense running as a virtual machine straddling two networks in the ESXi host on the Intel NUC.

pfSense installation

Netgate has a comprehensive guide on how to install the pfSense virtual appliance on VMware ESXi.

Following are some installation tips that I found to be helpful:

  • Upload the pfSense ISO to an ESXi datastore – don’t forget to unzip it first.
  • When creating a new VM for pfSense on ESXi 7, select Guest OS family “Other” and Guest OS version “FreeBSD Pre-11 versions (64-bit)”
  • VM Hardware:
    • Set CPU to 2
    • Set Memory to 1 GB
    • Set Hard Disk to 8 GB
      • Make sure the SCSI adapter is LSI Logic Parallel
    • Set Network Adapter 1 to the home/internet network, mark it as Connect
    • Add a second Network Adapter for the host-only network, leave it as E1000, mark it as Connect
    • CD/DVD Drive 1 set to Datastore ISO file and browse for the pfSense ISO, mark it as Connect

Boot the VM off the ISO, accept the defaults and let it reboot.

On first boot, the WAN interface will have a DHCP IP from the home network (Google Wi-Fi assigns in the range) and the internal-facing LAN interface will have a static IP of If this is incorrect, use the “Assign interfaces” menu item in the console to set which NIC corresponds to WAN and LAN appropriately. Use the ESXi configuration page to find the MAC address of each NIC and which network it is connected to in order to configure them appropriately.

Port-forward IPsec ports to pfSense

After pfSense is installed, we need to port-forward the external Internet-facing IPsec ports on the Google Wi-Fi router to the pfSense VM.

Google has recently relocated management of Google Wi-Fi to the Google Home app. Look for the Wi-Fi area, click the “gear” icon in upper right, select “Advanced networking”, and then “Port management”.

Use the “+” button to add a new rule. Scroll through the IPv4 tab to find the new “pfSense” entry and select it. Verify the MAC address shown is the same as the pfSense VM’s WAN NIC connected to the home network (“VM Network”). Add an entry for UDP 500. Repeat for UDP 4500.

Note: It is not possible to configure port forwarding unless the internal target is online. The Google Home app will only show a list of active targets that are connected to the network. If the pfSense host is not present, verify the VM is powered on and connected to the home network.

Port forwarding rules for inbound UDP/500 and UDP/4500 forwarding to the pfSense NIC1 on the home network

By default, pfSense only allows management access through its LAN interface, so the next step is to deploy a Jump VM with a web browser on the host-only network. Use the VM console to access the Jump VM desktop and launch the browser since it will not be reachable on the home network (in case you wanted to RDP). Verify it has a IP. It should also be able to reach the internet but this is not required.

pfSense initial configuration

On the Jump VM, browse to, accept the certificate warning, and log in as admin with password pfsense. Step through the wizard.

Some tips:

  • Set the Hostname and Domain to something different than the rest of the network.
  • Configure WAN interface: Uncheck “Block RFC1918 Private Networks”
  • Set a secure password for admin
  • Select Interfaces | WAN
    • Uncheck “Block bogon networks” if selected
    • Click Save and then Apply

Google Cloud VPN configuration

Use the Google Cloud Console for the following steps:

  • Networking | VPC Networks
    • Create a new VPC network or use an existing one. It should have Dynamic routing mode set to Global.
  • Networking | Hybrid Connectivity | VPN
    • Create a new VPN Connection
      • Classic VPN
      • Select VPC network created earlier
      • Create a new external IP address or use an available one
      • Tunnels – set Remote peer IP address to the home external internet IPv4 address (from home, visit and note the IPv4 address)
      • Generate and save the pre-shared key – it is needed for pfSense.
      • Select Dynamic (BGP) routing option and create a new Cloud Router. Set Google ASN to 65000. Create a new BGP session, set Peer ASN (pfSense) to 65001. Enter Cloud Router BGP IP of, and BGP peer IP (pfSense) of
      • Note the external public IP address of the Cloud VPN.
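For readers who prefer Terraform over the Console, the steps above could be sketched roughly as follows. This is an approximation under stated assumptions, not an export of the configuration used here: every resource name and variable is made up for the example, and the home IP address, pre-shared key, and BGP link addresses are left as variables to fill in. Unlike the Console, Terraform does not create the ESP, UDP 500, and UDP 4500 forwarding rules for a Classic VPN gateway automatically, so they are declared explicitly.

```hcl
# All names below are hypothetical; supply your own values for the variables.
variable "region" {
  type    = string
  default = "us-central1"
}
variable "home_public_ip" {
  type = string # external IPv4 address of the home network
}
variable "shared_secret" {
  type      = string
  sensitive = true # the pre-shared key also entered in pfSense
}
variable "cloud_router_bgp_range" {
  type = string # Cloud Router BGP IP in CIDR form
}
variable "pfsense_bgp_ip" {
  type = string # BGP peer IP configured on pfSense
}

resource "google_compute_network" "vpc" {
  name                    = "home-vpn-vpc"
  auto_create_subnetworks = false
  routing_mode            = "GLOBAL"
}

resource "google_compute_address" "vpn_ip" {
  name   = "home-vpn-ip"
  region = var.region
}

resource "google_compute_vpn_gateway" "classic" {
  name    = "home-vpn-gw"
  network = google_compute_network.vpc.id
  region  = var.region
}

# Classic VPN needs forwarding rules for ESP, UDP 500 and UDP 4500.
resource "google_compute_forwarding_rule" "esp" {
  name        = "home-vpn-esp"
  region      = var.region
  ip_protocol = "ESP"
  ip_address  = google_compute_address.vpn_ip.address
  target      = google_compute_vpn_gateway.classic.id
}

resource "google_compute_forwarding_rule" "udp" {
  for_each    = toset(["500", "4500"])
  name        = "home-vpn-udp${each.key}"
  region      = var.region
  ip_protocol = "UDP"
  port_range  = each.key
  ip_address  = google_compute_address.vpn_ip.address
  target      = google_compute_vpn_gateway.classic.id
}

resource "google_compute_router" "router" {
  name    = "home-vpn-router"
  network = google_compute_network.vpc.id
  region  = var.region

  bgp {
    asn = 65000 # Google ASN from the steps above
  }
}

resource "google_compute_vpn_tunnel" "tunnel" {
  name               = "home-vpn-tunnel"
  region             = var.region
  peer_ip            = var.home_public_ip
  shared_secret      = var.shared_secret
  target_vpn_gateway = google_compute_vpn_gateway.classic.id
  router             = google_compute_router.router.id

  depends_on = [
    google_compute_forwarding_rule.esp,
    google_compute_forwarding_rule.udp,
  ]
}

resource "google_compute_router_interface" "bgp" {
  name       = "home-vpn-if"
  router     = google_compute_router.router.name
  region     = var.region
  ip_range   = var.cloud_router_bgp_range
  vpn_tunnel = google_compute_vpn_tunnel.tunnel.name
}

resource "google_compute_router_peer" "pfsense" {
  name            = "pfsense"
  router          = google_compute_router.router.name
  region          = var.region
  peer_ip_address = var.pfsense_bgp_ip
  peer_asn        = 65001 # pfSense ASN from the steps above
  interface       = google_compute_router_interface.bgp.name
}
```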

pfSense IPsec configuration

Use the Jump VM web browser for these steps in the pfSense web interface:

  • System | Advanced | Firewall & NAT tab: Allow APIPA traffic
  • VPN | IPsec, Add P1
    • Set Remote Gateway to the Google Cloud VPN external public IP recorded previously.
    • Set “My identifier” to be “IP address” and enter the external public IPv4 address of the home network recorded earlier.
    • Enter the Pre-Shared Key generated for the Google Cloud VPN tunnel
      • It may not be possible to paste the key in to the VM console – visit and create a new “Burn after reading” paste with the key and then access the paste from the Jump VM to retrieve the key.
    • Set the Phase 1 Encryption Algorithm to AES256-GCM
    • Set Life Time to 36000
  • Save and apply changes
  • Show P2 entries, Add P2
    • Mode: Routed (VTI)
    • Local network: Address, BGP IP
    • Remote network: Address, BGP IP
    • Protocol: ESP
    • Encryption Algorithms
      • AES, 128 bits
      • AES128-GCM, 128 bits
      • AES192-GCM, Auto
      • AES256-GCM, Auto
    • Hash Algorithms: SHA256
    • PFS key group: 14 (2048 bit)
  • Save, Apply changes
  • Click on Firewall | Rules, select IPsec from along the top, Add a new rule
    • Set Protocol to Any
  • Save rule, Apply changes

pfSense BGP configuration

Go to System | Package Manager, click on Available Packages, search for “frr”. Install “frr”. This will connect out to the Internet to retrieve the packages. Wait for it to complete successfully.

Go to Services | FRR Global/Zebra

  • Global Settings
    • Enable FRR
    • Enter a master password.
    • Set Syslog Logging to enabled and set Package Logging Level to Extended
  • Click on Access Lists along the top
    • Add a new Access List
      • Name: GCP
      • Access List Entries: set Sequence to 0, set Action to Permit, check box for Source Any
      • Click Save
  • Click on Prefix Lists along the top
    • Add a new Prefix List
      • Name: IPv4-any
      • Prefix List Entries: set Sequence to 0, set Action to Permit, check box for Any
      • Click Save
  • Click on BGP along the top
    • Enable BGP Routing
    • Set Local AS to 65001 (GCP Cloud Router was set to 65000)
    • Set Router ID to (GCP Cloud Router was set to
    • Set Hold Time to 30
    • At the bottom, set Networks to Distribute to
    • Click Save
  • Click Neighbors along the top, add a new Neighbor
    • Name/Address:
    • Remote AS: 65000
    • Prefix List Filter: IPv4-any, for both Inbound & Outbound
    • Path Advertise: All Paths to Neighbor
    • Save

Checking status

In pfSense, click on Status | FRR

In the Zebra Routes area, you should see “B>*” entries for subnets in the GCP VPC “via” (BGP IP of GCP Cloud Router)

In the BGP Routes area, you should see Networks listed for GCP VPC subnets, with Next Hop of (BGP IP of GCP Cloud Router) and Path of 65000 (GCP Cloud Router ASN)

BGP Neighbors should list as a neighbor with remote AS 65000, local AS 65001 and a number of “accepted prefixes” which are the VPC subnets.

Visit the Cloud VPN area in Google Cloud Console. The VPN Tunnel should show Established, and the BGP session should also show BGP established.

Visit the VPC and click on its Routes. There should be one listed for the on-premise pfSense LAN, via next hop

Validating connectivity

At this point, VMs in GCP should be able to communicate with VMs in the on-premise pfSense LAN network.

Create a GCE instance with no public IP and attach it to the VPC subnet. Make sure the firewall rules that apply to the instance permit ingress traffic from the on-premises pfSense LAN network on the appropriate ports and protocols:

  • icmp
  • TCP 22 for SSH
  • TCP 3389 for RDP
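A firewall rule covering the list above might look something like the following sketch. The rule name and both variables are hypothetical; the source range should be whatever CIDR the pfSense LAN actually uses.

```hcl
# Hypothetical inputs: the VPC network name and the pfSense LAN range.
variable "vpc_network" {
  type = string
}
variable "pfsense_lan_cidr" {
  type = string # CIDR of the on-premises pfSense LAN
}

resource "google_compute_firewall" "allow_from_pfsense_lan" {
  name          = "allow-from-pfsense-lan"
  network       = var.vpc_network
  direction     = "INGRESS"
  source_ranges = [var.pfsense_lan_cidr]

  # Ping for basic reachability checks
  allow {
    protocol = "icmp"
  }

  # SSH and RDP
  allow {
    protocol = "tcp"
    ports    = ["22", "3389"]
  }
}
```

Because the rule has no target tags or service accounts, it applies to every instance in the network. Add target_tags to narrow it if that is too broad.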

Wrapping up

If things are not connecting, double-check everything, but also be sure to check the logs in pfSense and in GCP Cloud Logging. The most frequent issue I encountered was a mismatch of proposals by not selecting the right ciphers for the tunnel, or not setting my identifier properly. Also consider how firewall rules will impact communication.

Finally, the settings outlined here are obviously not meant for production use. I don’t claim to understand BGP any more than what it took to get pfSense working with Cloud VPN, so some of the settings I recommend could be enhanced and tightened from a security perspective. As always, your mileage may vary.

Changing GCP machine and disk size

Changing the machine size or disk size of a Google Cloud (GCP) Compute Engine (GCE) virtual machine is a typical “Day 2” activity that an operations team may perform as the needs of the application running in the VM evolve past what was initially specified during deployment.

As a best practice, all infrastructure deployment and modifications should be performed via Infrastructure-as-Code (IaC) where resources are defined using a declarative language such as Terraform and then a deployment process runs to create or update the resource using cloud APIs.

Changing machine size

For a given GCP Terraform google_compute_instance, change the machine_type value to one which meets the cpu/memory requirement:

  • See API machine type names (third-party site)
  • See GCP Terraform provider documentation
    • In the google_compute_instance set allow_stopping_for_update = true to avoid having to manually stop the VM prior to making the update in Terraform. With this argument set, Terraform will stop the instance during terraform apply and then start the instance when complete.
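A minimal sketch of such a change is shown below. The instance name, image, and network are placeholders chosen for the example, not values from this environment.

```hcl
resource "google_compute_instance" "example" {
  name = "example-vm"
  zone = "us-central1-a"

  # Changing this value (for example from "e2-micro") resizes the VM.
  machine_type = "e2-small"

  # Lets Terraform stop and start the VM itself during terraform apply.
  allow_stopping_for_update = true

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11" # placeholder image
    }
  }

  network_interface {
    network = "default" # placeholder network
  }
}
```

Without allow_stopping_for_update, the apply is expected to fail for a running instance until the VM has been stopped manually.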

Increasing disk size

See Working with persistent disks  |  Compute Engine Documentation

Disk sizes may only be increased, not decreased.

Google recommends taking a snapshot of a disk prior to increasing its size. The snapshot is for safekeeping in case there is an issue with the overall process so that the data is not lost.
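If the snapshot should also be managed in Terraform, one possible sketch is below. The snapshot name and the variable are hypothetical, and taking the snapshot through the Console or gcloud just before the resize is an equally reasonable choice.

```hcl
# Hypothetical input: name or self link of the disk about to be resized.
variable "disk_to_snapshot" {
  type = string
}

resource "google_compute_snapshot" "pre_resize" {
  name        = "pre-resize-snapshot"
  source_disk = var.disk_to_snapshot
  zone        = "us-central1-a" # zone of the source disk
}
```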

If a smaller size is set, terraform will plan to destroy the disk and create a new one.

  • This can be prevented by setting the lifecycle argument on the google_compute_disk resource causing the plan to fail:
lifecycle {
 prevent_destroy = true
}

Increasing the size of a disk can be done via Google Cloud Console, gcloud command line, or API/terraform. For IaC purposes, only terraform should be used.

Increasing boot disk size

VMs using public images automatically resize the root partition and file system after you’ve resized the boot disk on the VM and restarted the VM. If you are using an image that does not support this functionality, you must manually resize the root partition and file system.

Working with persistent disks | Compute Engine documentation

If the VM was created in terraform and did not have a boot disk created separately with a specific size, setting a new boot disk size in the google_compute_instance resource will cause terraform to recreate the VM.

VMs should be created in terraform with separate/independently-created boot & data disk google_compute_disk resources in order to safely increase the size of the disks in the future.

Create VMs in terraform with separate/independent boot & data disks.


data "google_compute_image" "debian9" {
 project = "debian-cloud"
 name = "debian-9-stretch-v20211105"
}

# Boot disk, created independently of the VM so it can be resized later
resource "google_compute_disk" "test-np5-boot" {
 project = <project_id>
 name = "test-np5-boot"
 type = "pd-standard"
 zone = "us-central1-a"
 size = 30

 image = data.google_compute_image.debian9.self_link
}

# Data disk, also created independently of the VM
resource "google_compute_disk" "test-np5-data1" {
 project = <project_id>
 name = "test-np5-data1"
 type = "pd-standard"
 zone = "us-central1-a"
 size = 10
}

resource "google_compute_instance" "test-np5" {
 name = "test-np5"
 machine_type = "e2-micro"
 zone = "us-central1-a"
 project = <project_id>

 allow_stopping_for_update = true

 # Reference the separately created boot disk
 boot_disk {
  source = google_compute_disk.test-np5-boot.self_link
 }

 # Reference the separately created data disk
 attached_disk {
  source = google_compute_disk.test-np5-data1.self_link
 }

 network_interface {
  subnetwork = "uscentral1"
  subnetwork_project = shared_vpc_host_project
 }

 metadata = {
  serial-port-logging-enable = true
  serial-port-enable = true
 }
}

Be sure to specify a specific name for the google_compute_image (as shown) so that the boot disk is not flagged to be recreated when a new version is released.

By default, a boot disk created separately from the VM will still be deleted when the instance is deleted. Set auto_delete = false in the boot_disk section of the google_compute_instance to prevent this behavior.

To increase the size of the boot disk, change the size value for the google_compute_disk called by google_compute_instance boot_disk argument:

resource "google_compute_disk" "test-np5-boot" {
 project = <project_id>
 name = "test-np5-boot"
 type = "pd-standard"
 zone = "us-central1-a"
 size = 40

 image = data.google_compute_image.debian9.self_link
}

Terraform will update the size of the boot disk. The VM will not be restarted automatically, even if google_compute_instance has allow_stopping_for_update set to true because the change is being made to the google_compute_disk resource, not the VM instance.

Manually restart the VM during a maintenance window. If using a public image, or an image customized from a public image, the OS boot disk and partition should be expanded automatically.

If not, see Resize the file system and partitions.

Adding a new data disk

In terraform, create a new disk using the google_compute_disk resource. Example:

resource "google_compute_disk" "data1" {
 project = <project_id>
 name = "test-np4-data1"
 type = "pd-standard"
 zone = "us-central1-a"
 size = 10

 lifecycle {
  prevent_destroy = true
 }
}

Modify the terraform VM google_compute_instance resource to include the attached_disk argument which references the google_compute_disk resource.


attached_disk {
 source = google_compute_disk.data1.self_link
}

Increasing data disk size

Modify the terraform google_compute_disk data disk size argument:

resource "google_compute_disk" "test-np5-data1" {
 project = <project_id>
 name = "test-np5-data1"
 type = "pd-standard"
 zone = "us-central1-a"
 size = 20
}

Terraform will update the size of the data disk. The VM will not be restarted automatically, even if google_compute_instance has allow_stopping_for_update set to true because the change is being made to the google_compute_disk resource, not the VM instance.

Modern OSs should automatically detect the capacity change of the data disk. If not, perform a rescan using the method provided by the operating system.

For exact steps to increase the size of a filesystem after increasing the disk size, see Resize the file system and partitions (select “Linux instances” or “Windows instances”).