Author Archives: chouse

GCP IPSec VPN to on-prem pfSense for Internet egress

Overview

Several different components are involved:

Google Cloud VPN (basic, route-based)
Google Cloud VPC with one or more subnets
pfSense Community Edition (Virtual Machine)
Google Cloud Compute Engine (GCE) (Virtual Machine for testing)
One or more internet providers for the on-prem environment.

Essentially, Google Cloud VPN will establish an IPSec tunnel to the WAN/external interface of pfSense. A static route of 0.0.0.0/0 will be created in the Google Cloud VPC to direct all egress traffic (such as from a GCE instance) to the VPN tunnel. pfSense will have its own VPN configuration to establish the on-prem end of the tunnel, as well as some firewall and NAT rules to properly pass GCP traffic to the Internet.

On-prem

In my lab, I have a pfSense virtual machine (running on vSphere 7) with a virtual NIC connected to my LAN. My LAN uses a cable provider for Internet access. I also have a Wireless 5G Internet provider, which is connected to the pfSense virtual machine through a USB Ethernet adapter passed directly to the VM in USB passthrough mode.

The pfSense WAN interface is assigned to the USB adapter and has a public IP address from the Wireless 5G provider. pfSense also has a LAN interface with a static IP from the 192.168.86.0/23 subnet of my internal network.

I do not use pfSense as my main router; it is used only as a VPN gateway for connecting to GCP.

GCP

In GCP, I have a single VPC with a single subnet: 10.252.1.0/24. I have created the following VPC firewall rules:

permit Identity-Aware Proxy so that I can use browser-based SSH to connect to GCE instances that have private IPs
permit all traffic from my on-prem subnet 192.168.86.0/23.

There is a test GCE instance on the subnet at 10.252.1.30.

A Classic Cloud VPN was created with a public IP and an IKEv2 tunnel using a pre-shared key. This is a route-based tunnel (not BGP), and the remote network range was set to 0.0.0.0/0. This tells GCP that all IP addresses (the Internet) can be reached through the VPN tunnel, and it automatically creates a route for 0.0.0.0/0 in the VPC with a next hop of the VPN tunnel. The VPC’s default route for 0.0.0.0/0 with a next hop of “default internet gateway” should be deleted.

Note that GCP Classic Cloud VPN configurations are not recommended. Instead, a Cloud VPN HA configuration with BGP should be used. pfSense is compatible with Cloud VPN HA and BGP. See Site to Site VPN between Google Cloud and pfSense on VMware at home for more details including how to install and configure pfSense from scratch as well as use pfSense with BGP.

pfSense configuration

From the VPN menu, choose IPsec. Create a new Phase1 (P1). Leave the Interface as WAN. In the Remote Gateway field, enter the public IP of the GCP Cloud VPN gateway. Provide the pre-shared key that was used to create the GCP Cloud VPN tunnel.

Set Encryption Algorithm to AES256-GCM and Life Time to 36000 seconds (10 hours).

Save it and then Add a Phase2 (P2). Set Mode to Tunnel IPv4. Set Local Network to Network with address 0.0.0.0/0. This tells GCP that this tunnel accepts traffic destined for 0.0.0.0/0. Set the Remote Network to Network and put in the remote (GCP) subnet. In my case, I set it to 10.252.1.0/24.
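The Phase 2 local and remote networks act as the tunnel's traffic selectors. The Python sketch below shows why a 0.0.0.0/0 local network lets traffic to any Internet destination use the tunnel, while the remote side stays limited to the GCP subnet. It is only an illustration under that simplified model, not how pfSense or its IPsec daemon actually evaluates policies. The helper name matches_selectors and the 198.51.100.x destination (a documentation range) are hypothetical.

```python
import ipaddress

# Phase 2 networks from the pfSense configuration described above.
LOCAL_NET = ipaddress.ip_network("0.0.0.0/0")       # on-prem side: any address
REMOTE_NET = ipaddress.ip_network("10.252.1.0/24")  # GCP subnet


def matches_selectors(src: str, dst: str) -> bool:
    """Return True if a packet arriving from GCP fits the Phase 2 networks.

    Traffic leaving the tunnel on the pfSense side must come from the
    remote (GCP) network and be headed for the local network, which is
    0.0.0.0/0 here, so any destination qualifies.
    """
    return (ipaddress.ip_address(src) in REMOTE_NET
            and ipaddress.ip_address(dst) in LOCAL_NET)


print(matches_selectors("10.252.1.30", "198.51.100.20"))  # GCE instance to the Internet
print(matches_selectors("10.252.2.5", "198.51.100.20"))   # source outside the GCP subnet
```

The first call prints True and the second False. Narrowing LOCAL_NET to 192.168.86.0/23 would turn this into an ordinary site-to-site tunnel with no Internet egress.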

Check the box for AES256-GCM and set it to 128 bits. Save it.

From the Status drop-down, choose IPsec. The tunnel should come up shortly. GCP should also show the tunnel as up.

From the Firewall menu, choose Rules and then IPsec. Add a new rule for IPv4 for any source/port to any destination/port, any protocol.

On my LAN router, I added a static route for 10.252.1.0/24 with a next hop of the pfSense LAN IP. With that in place, I was able to ping the GCE instance from my computer, and the GCE instance could ping IPs in my 192.168.86.0/23 network through pfSense.

At this point, communication should be working between GCP and on-prem networks, but there is one more step to enable Internet egress.

From the Firewall menu, choose NAT and then Outbound. Set Outbound NAT mode to Hybrid (automatic + custom rules). Create a new Mapping. Set Interface to WAN and Source to 10.252.1.0/24. Save and apply the changes.
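Conceptually, this mapping rewrites the source address of GCP-originated packets that leave the WAN interface so that they carry the WAN public IP. Replies from the Internet therefore return to pfSense and can be sent back through the tunnel. The Python sketch below models only that decision. It ignores port translation and connection state, and it assumes a hypothetical WAN address of 203.0.113.10 (a documentation range).

```python
import ipaddress
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    egress_interface: str


# Mapping source from the text; the WAN address is a hypothetical stand-in.
NAT_SOURCE = ipaddress.ip_network("10.252.1.0/24")
WAN_PUBLIC_IP = "203.0.113.10"


def apply_outbound_nat(pkt: Packet) -> Packet:
    """Rewrite the source address of GCP traffic leaving through WAN."""
    if (pkt.egress_interface == "WAN"
            and ipaddress.ip_address(pkt.src) in NAT_SOURCE):
        return pkt._replace(src=WAN_PUBLIC_IP)
    return pkt  # no matching mapping: the packet is left unchanged


print(apply_outbound_nat(Packet("10.252.1.30", "198.51.100.20", "WAN")))
```

Without the mapping, the packet would leave with its private 10.252.1.30 source, which cannot be routed back from the Internet.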

Now, the GCE instance can reach the Internet through pfSense. To check, run curl ifconfig.me on the GCE instance and confirm that the public IP address returned is the one assigned to the pfSense WAN interface.

Recall that I said I had two internet providers: Cable and Wireless 5G. With pfSense, I can easily toggle which provider is used by GCP for Internet access. From the System menu, choose Routing. The Gateways will be listed. By default, only the WAN_DHCP gateway was listed with the public IP for the Wireless 5G provider. I added a new gateway using the LAN interface and my LAN default gateway of 192.168.86.1.

When I change the default gateway to my LAN gateway and apply the changes, my GCE instance accesses the internet through pfSense’s LAN interface using the cable provider instead of the Wireless 5G connection on the WAN interface. Technically, at this point the Outbound NAT mapping created earlier is not required, but it doesn’t hurt to leave it in place. Toggling the pfSense default gateway back to WAN and applying the change causes the GCE instance to again access the internet using the Wireless 5G connection.

pfSense is a great environment to experiment with site-to-site VPNs, NAT, and routing, and is pretty robust and capable of accomplishing a lot of different lab activities.

Automating Compute Engine Local SSD configuration in Windows

Overview

Similar to other public cloud providers, Google Cloud offers the ability to attach local high-speed disks to virtual machines (VMs). This offering is called Local SSD.

The “Local” in Local SSD means that the disks are physically attached to the hypervisor host where the VM is executing. This gives the VM high-speed, low-latency data operations over the NVMe interface.

On Windows GCE instances (VMs), Local SSD volumes are typically used for temporary or ephemeral data such as the Windows Pagefile, Microsoft SQL Server Temp database, or other high-speed caching needs.

Performance

Certain GCE instance machine sizes come with different quantities of Local SSD disks, which are always 375 GB in size. They can be striped together to create larger volumes with higher performance using logical volume management tools in the Operating System.
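Every Local SSD disk is 375 GB, so the raw size of a striped volume is simply the disk count times 375 GB. The Python sketch below works that out and converts the result to binary GiB, which is closer to what Windows tools display. It ignores Storage Spaces metadata overhead, so the usable size will be somewhat smaller. The function name is hypothetical.

```python
from __future__ import annotations

LOCAL_SSD_BYTES = 375 * 10**9  # each Local SSD disk is 375 GB (decimal)


def striped_capacity(disk_count: int) -> tuple[int, float]:
    """Return (raw GB, raw GiB) for a stripe across disk_count Local SSDs.

    Storage Spaces metadata overhead is ignored, so real usable space
    will be slightly lower.
    """
    if disk_count < 1:
        raise ValueError("a volume needs at least one Local SSD disk")
    total_bytes = disk_count * LOCAL_SSD_BYTES
    return disk_count * 375, round(total_bytes / 2**30, 2)


for n in (1, 2, 4):
    gb, gib = striped_capacity(n)
    print(f"{n} disk(s): {gb} GB raw, about {gib} GiB")
```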

Risks

However, again like other public clouds, Local SSD storage is ephemeral and comes with some risks.

Data on volumes created with Local SSD disks can be irretrievably lost and no guarantee is made as to the safety or durability of that data.

Specifically, if the OS is shut down from within the guest, the GCE instance will power off and the Local SSD disks and the data they contain will be lost. When the VM is started back up via the Cloud Console, it will have new, empty Local SSD disks, which must be initialized again before they can be used.

There are other situations where data on Local SSD disks can be lost, such as if the hypervisor host has an issue and Google Cloud cannot migrate the Local SSD data along with the VM to a new host.

Read more about Local SSD data persistence.

So, generally speaking, Local SSDs offer great performance, but their behavior needs to be understood, and the application must be able to tolerate the loss of the data stored on them.

Usage with localssd_init

In Google Cloud, Local SSD disks show up as raw disks that need to be initialized and formatted before they can be used.

To automate the initialization of Local SSDs on Windows GCE instances and create usable volumes, I have developed a simple PowerShell script called localssd_init.ps1 which can be found in my GitHub repository gce-localssd-init-powershell.

The script is configured to check if certain drive letters are present. If the drive letters are missing, it will recreate them using a specific configuration of Local SSD disks.

To use the script, simply copy it to your server and then sign it or configure the ExecutionPolicy.

Next, edit the localssd_init.ps1 script and configure the volumes that will be created using Local SSD disks, starting around line 50:

$LocalSSDConfig = @(
    [LocalSSDVol]@{Name = "SQLTempDB"; DriveLetter = 'E'; LocalSSDQty = 2; NTFSAlloc = '65536' },
    [LocalSSDVol]@{Name = "Pagefile"; DriveLetter = 'P'; LocalSSDQty = 2; NTFSAlloc = '8192'; PostScript = "C:\path\to\pagefile.ps1" }
)

Add, remove, or modify entries in the array. Be sure to have a comma after each line, except for the last line, and separate the key=value pairs with a semicolon.

For each entry,

Update the Name and DriveLetter where the volume will be mounted. The name will be set for the Storage Pool and Volume.
Set the LocalSSDQty to the number of 375 GB Local SSD disks to stripe across and set the NTFSAlloc to the needed NTFS Allocation unit size.

Optionally, set PostScript to the path of a custom script to run after the volume has been created. The custom script could change the system Pagefile configuration to use the new drive (a reboot will be required) or restart SQL Server so that it recreates the Temp database files.

The script uses Windows Storage Spaces to create a new Storage Pool from the required number of Local SSD disks with “Simple” resiliency, which is equivalent to striping. It then creates a new volume using the maximum size of the pool, formats it as NTFS with the specified allocation unit size, and mounts it at the specified drive letter.

During server creation, the localssd_init.ps1 script can be used to create the Local SSD-backed volumes and then run at every subsequent boot to make sure they exist and recreate them if they are missing.
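The check-and-recreate behavior can be sketched in Python to make the logic explicit. This is not the author's PowerShell script. The entries mirror the $LocalSSDConfig example, while plan_volumes, its available_disks argument, and the validation rules are hypothetical illustrations.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalSSDVol:
    # Mirrors the fields of the $LocalSSDConfig entries in localssd_init.ps1.
    name: str
    drive_letter: str
    local_ssd_qty: int
    ntfs_alloc: int
    post_script: Optional[str] = None


CONFIG = [
    LocalSSDVol("SQLTempDB", "E", 2, 65536),
    LocalSSDVol("Pagefile", "P", 2, 8192, r"C:\path\to\pagefile.ps1"),
]


def plan_volumes(config: list[LocalSSDVol], present_letters: set[str],
                 available_disks: int) -> list[LocalSSDVol]:
    """Return the configured volumes whose drive letters are missing.

    Raises ValueError if drive letters repeat or if the missing volumes
    need more raw Local SSD disks than are available.
    """
    letters = [v.drive_letter.upper() for v in config]
    if len(letters) != len(set(letters)):
        raise ValueError("each volume needs its own drive letter")
    present = {letter.upper() for letter in present_letters}
    missing = [v for v in config if v.drive_letter.upper() not in present]
    needed = sum(v.local_ssd_qty for v in missing)
    if needed > available_disks:
        raise ValueError(f"need {needed} Local SSD disks, found {available_disks}")
    return missing


# After a stop/start, both drives are gone and four fresh raw disks appear.
for vol in plan_volumes(CONFIG, present_letters={"C"}, available_disks=4):
    print(f"create {vol.name} on {vol.drive_letter}: using {vol.local_ssd_qty} disks")
```

On a normal reboot where E: and P: still exist, the plan is empty, and the script can exit without touching any disks.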

Simply run the script and the disks will be created. The script will log to the standard output and also create entries in the System Event Log.

To enable the script to run at every boot, use this snippet to register a PowerShell scheduled job (it appears in Windows Task Scheduler) that runs at startup. Be sure to update the path to the script:

  # Allow locally created scripts to run
  Set-ExecutionPolicy RemoteSigned
  $path = "C:\path\to\localssd_init.ps1"
  # Start the job at boot, with up to 10 seconds of random delay
  $trigger = New-JobTrigger -AtStartup -RandomDelay 00:00:10
  Register-ScheduledJob -Trigger $trigger -FilePath $path -Name localssd_init
  # Confirm that the job was registered
  Get-ScheduledJob

Conclusion

If you are on the fence about using Local SSD on Windows GCE instances because those disks might need to be initialized again after an outage, check out localssd_init.ps1 to make sure they’re always available for use.

Configuring Private Google Access


Private Google Access (PGA) is the method by which clients, whether resources in Google Cloud such as Google Cloud Compute Engine (GCE) and Google Cloud VMware Engine (GCVE) or systems on-premise, can access Google Cloud APIs without having Internet access or needing an assigned external IP address.

For on-premise clients, this connectivity can take place through private hybrid connections such as Interconnect which may be faster than the subscribed Internet service.

For an overview, check out https://cloud.google.com/vpc/docs/private-google-access

Ultimately, PGA is delivered through a set of public IP addresses which are not advertised on the Internet but are available via the Google Cloud VPC in your project.

199.36.153.4/30 for restricted.googleapis.com is used for services that are supported by VPC Service Controls and blocks access to services that do not support VPC Service Controls.
199.36.153.8/30 for private.googleapis.com is used for most services (except Workspace web or interactive websites) regardless of their support for VPC Service Controls.

In this post, we’ll cover working with private.googleapis.com which supports most APIs.

To enable clients to use PGA to access common APIs, two things need to be configured:

Routing: Clients need to have a network route to reach these IP addresses.
DNS: Clients need to resolve the public API fully-qualified domain name (FQDN, such as storage.googleapis.com) to one of the 4 IP addresses in the 199.36.153.8/30 range: 199.36.153.8, .9, .10, and .11.
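The two /30 ranges and their four addresses each can be checked with Python's standard ipaddress module. The sketch lists the private.googleapis.com addresses and classifies an address by range. Note that the network is iterated directly, because .hosts() would drop the first and last addresses of a /30, and all four are valid PGA addresses. The function name pga_domain is hypothetical.

```python
import ipaddress
from typing import Optional

PRIVATE_GOOGLEAPIS = ipaddress.ip_network("199.36.153.8/30")
RESTRICTED_GOOGLEAPIS = ipaddress.ip_network("199.36.153.4/30")

# Iterate the network itself so that all four addresses are included.
PRIVATE_IPS = [str(ip) for ip in PRIVATE_GOOGLEAPIS]


def pga_domain(address: str) -> Optional[str]:
    """Return which PGA domain an address belongs to, or None."""
    ip = ipaddress.ip_address(address)
    if ip in PRIVATE_GOOGLEAPIS:
        return "private.googleapis.com"
    if ip in RESTRICTED_GOOGLEAPIS:
        return "restricted.googleapis.com"
    return None


print(PRIVATE_IPS)
print(pga_domain("199.36.153.10"))
```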

Routing

In order to access the 199.36.153.8/30 range for private.googleapis.com, we need to first determine where our clients are located on the network and how they get to the Internet.

Google Compute Engine

Compute Engine clients on a VPC must meet the requirements to use PGA. Two of these requirements are that the VM must not have an external IP address and that it must use a subnet with PGA enabled. PGA is enabled per subnet, not per VPC. Clients will access the 199.36.153.8/30 range through the VPC’s standard default internet gateway route.

If the default internet gateway route has been overridden by a custom route that directs Internet-bound traffic (0.0.0.0/0) to a firewall appliance, then an additional custom route must be created for the 199.36.153.8/30 range with a next hop of default-internet-gateway and a priority that is higher than the custom route for 0.0.0.0/0.

This way, traffic for 199.36.153.8/30 will go out the default Internet gateway of the VPC. It doesn’t actually go to the Internet – instead the underlying Google Cloud network will accept that traffic and direct it to the private internal interfaces of Google APIs.

In the screenshot below, the routes are ordered from highest priority (lowest number) to lowest priority (highest number). The private-google-access route has a priority of 90 which is a higher priority than the route for 0.0.0.0/0 with a priority of 100 that leads to another network. Therefore, traffic for the PGA range 199.36.153.8/30 will go out the default Internet gateway of the VPC and not onward to the other network like all other Internet traffic.
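The route choice can be sketched in Python. The sketch assumes the VPC routing order documented by Google Cloud for this simple case: the most specific destination wins first, and the lowest priority number breaks ties between equally specific routes. It leaves out subnet routes, network tags, policy-based routes, and ECMP. The route names are hypothetical stand-ins for those in the screenshot. Under this model, the /30 route would win even at equal priority because of its longer prefix, so the priority of 90 mainly makes the intent explicit.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    name: str
    dest_range: str
    priority: int  # lower number = higher priority
    next_hop: str


ROUTES = [
    Route("egress-to-firewall", "0.0.0.0/0", 100, "firewall-appliance"),
    Route("private-google-access", "199.36.153.8/30", 90, "default-internet-gateway"),
]


def select_route(destination: str, routes: list) -> Optional[Route]:
    """Pick a route: longest matching prefix first, then lowest priority value."""
    ip = ipaddress.ip_address(destination)
    candidates = [r for r in routes if ip in ipaddress.ip_network(r.dest_range)]
    if not candidates:
        return None
    return min(candidates,
               key=lambda r: (-ipaddress.ip_network(r.dest_range).prefixlen, r.priority))


print(select_route("199.36.153.9", ROUTES).next_hop)   # PGA traffic
print(select_route("198.51.100.7", ROUTES).next_hop)   # other Internet traffic
```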

Google Cloud VMware Engine

If Google Cloud VMware Engine (GCVE) is configured to use VMware Engine Internet access, then traffic for 199.36.153.8/30 will flow through the GCVE Internet access service.

If Internet access for GCVE workloads has been disabled and Internet-bound traffic is configured to use another network, then traffic for 199.36.153.8/30 will flow through the GCVE VPC peering connection to your peered VPC and route based on the routing configuration of your VPC.

The VPC in your project should be advertising a route for 0.0.0.0/0 with a priority that overrides the default internet gateway route. Therefore, an additional custom route must be created for the 199.36.153.8/30 range with a next hop of default-internet-gateway and a priority that is higher than the custom route for 0.0.0.0/0.

See the previous screenshot for an example.

See Private Google Access: a closer look for more details about PGA for GCVE.

On-premise

Typically, on-premise clients trying to reach the 199.36.153.8/30 range for private.googleapis.com would take the default route out to the internet. But, since this range is not available on the internet, communication will fail.

Supporting Private Google Access with on-premise clients requires hybrid connectivity between the Google Cloud project VPC and the on-premise environment such as through Interconnect or Cloud VPN.

A route for 199.36.153.8/30 must be advertised by the Cloud Router that is associated with the hybrid connectivity. This way, clients on-premise will have a route to 199.36.153.8/30 that leads back over the hybrid connectivity, instead of out to the internet.

Additionally, if the VPC associated with the hybrid connectivity (Interconnect or Cloud VPN) has its default internet gateway route overridden by a custom route that directs Internet-bound traffic (0.0.0.0/0) to a firewall appliance, then an additional custom route must be created for the 199.36.153.8/30 range with a next hop of default-internet-gateway and a priority that is higher than the custom route for 0.0.0.0/0.

This way, on-premise clients trying to reach the 199.36.153.8/30 range will be directed over the hybrid connectivity and then out the VPC’s default internet gateway to access PGA.

Testing

To test connectivity from any client to the 199.36.153.8/30 range for private.googleapis.com once routing has been configured, use the Test-NetConnection PowerShell cmdlet with port 80 or 443 (only HTTP and HTTPS are supported; ICMP ping is not). The TcpTestSucceeded value should return True if routing was configured successfully:

Test-NetConnection -computername 199.36.153.8 -port 443


ComputerName     : 199.36.153.8
RemoteAddress    : 199.36.153.8
RemotePort       : 443
InterfaceAlias   : Ethernet
SourceAddress    : 10.210.32.3
TcpTestSucceeded : True
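On clients without PowerShell, a plain TCP connection attempt gives the same signal as TcpTestSucceeded. The following is a minimal Python sketch using only the standard library. The function name tcp_test and the 5-second default timeout are arbitrary choices.

```python
import socket


def tcp_test(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Like TcpTestSucceeded, this only shows that routing and the TCP
    handshake work; it does not test DNS or the HTTPS layer.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Example: tcp_test("199.36.153.8", 443) should return True once routing works.
```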

Verification of connectivity should be completed prior to making any DNS changes to avoid an outage for clients trying to reach Google APIs.

DNS

Once routing is in place and tested, changes in DNS must be made. These changes should be done on the DNS server used by clients.

To override the DNS resolution of a Google API fully-qualified domain name (FQDN), we need to create a new zone in DNS along with A- and CNAME-records that resolve the API FQDN to the private IPs. This way, clients will no longer use the publicly-advertised DNS zone and records which resolve to the public IP of the API and will instead use the private IP from the DNS server’s overriding zone.

If a client is configured to check its local hosts file first for DNS resolution prior to using the internal or public DNS, entries in the hosts file can be configured to point a public API FQDN to a PGA IP. This is a safe way to fully test PGA on a single host before affecting all clients by updating the common internal DNS server.

In this example, we override resolution of storage.googleapis.com and point it to one of the private.googleapis.com IP addresses, 199.36.153.8.

% cat /etc/hosts
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1       localhost
255.255.255.255 broadcasthost
::1             localhost
199.36.153.8  storage.googleapis.com

Any requests on this client to storage.googleapis.com will now go to 199.36.153.8 and should be successful as long as routing is in place.
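To double-check which names a hosts file sends to PGA, a small parser helps. This Python sketch reads hosts-format text (an IP address followed by one or more names, with # comments) and reports entries that point into 199.36.153.8/30. The function name pga_overrides is hypothetical.

```python
from __future__ import annotations

import ipaddress

PGA_RANGE = ipaddress.ip_network("199.36.153.8/30")


def pga_overrides(hosts_text: str) -> dict[str, str]:
    """Map hostname -> IP for hosts entries that point into 199.36.153.8/30."""
    overrides = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # first field is not an IP address
        if ip in PGA_RANGE:  # IPv6 entries such as ::1 never match
            for name in names:
                overrides[name.lower()] = address
    return overrides


sample = """127.0.0.1       localhost
255.255.255.255 broadcasthost
::1             localhost
199.36.153.8  storage.googleapis.com
"""
print(pga_overrides(sample))  # {'storage.googleapis.com': '199.36.153.8'}
```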

To enable all clients which use a common DNS server to use PGA IPs instead of the public IP, create a DNS zone for googleapis.com in the private DNS server.

Within the new googleapis.com zone, create A-records for private.googleapis.com using the four different IP addresses in the 199.36.153.8/30 range: 199.36.153.8, 199.36.153.9, 199.36.153.10, and 199.36.153.11.

On-premise Active Directory clients

If clients are part of an Active Directory (AD) domain, AD DNS can be used to create the zone and A-records.

Be sure to create A-records for each of the IPs.

Next, create a wildcard CNAME-record for *.googleapis.com which points to the private.googleapis.com A records.

Now, when any client tries to resolve any googleapis.com hostname, it will be given one of the PGA IPs.
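A toy resolver shows how the override zone answers queries. The wildcard CNAME catches names under googleapis.com that have no explicit record, and its target resolves to the four A-records. The Python below models only this zone, queries no real DNS, and uses a hypothetical dictionary layout. Its wildcard handling is deliberately simplified: it matches only one label below the wildcard, while real DNS servers follow the fuller wildcard rules. A zone for another domain such as gcr.io would have the same shape.

```python
from __future__ import annotations

# Simplified model of the googleapis.com override zone described above.
ZONE = {
    "private.googleapis.com.": ("A", ["199.36.153.8", "199.36.153.9",
                                      "199.36.153.10", "199.36.153.11"]),
    "*.googleapis.com.": ("CNAME", ["private.googleapis.com."]),
}


def resolve(name: str, zone: dict = ZONE, max_hops: int = 8) -> list[str]:
    """Return the A-record addresses for name, following CNAMEs in the zone."""
    name = name.lower().rstrip(".") + "."
    for _ in range(max_hops):
        record = zone.get(name)
        if record is None:
            # Simplified wildcard match: replace only the first label with "*".
            parent = name.split(".", 1)[1]
            record = zone.get("*." + parent)
        if record is None:
            raise LookupError(f"{name} is not covered by the override zone")
        rtype, values = record
        if rtype == "A":
            return values
        name = values[0]  # CNAME: continue the lookup at the target name
    raise LookupError("too many CNAME hops")


print(resolve("storage.googleapis.com"))
```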

Compute Engine with Google Cloud DNS

If clients use Google Cloud DNS, such as Compute Engine instances, this same configuration can be made in Cloud DNS.

Create a new private zone for googleapis.com, and then create a record set for private.googleapis.com and add the four IP addresses, and then also create a record set for the wildcard CNAME *.googleapis.com.

Terraform can be used to create zones and record sets. Following is an example for setting up googleapis.com:

resource "google_dns_managed_zone" "dns-zone-googleapis-com" {
  name        = "googleapis-com"
  project     = google_project.svpc.project_id
  dns_name    = "googleapis.com."
  description = "Private Google Access"

  visibility = "private"

  private_visibility_config {
    networks {
      network_url = google_compute_network.shared_hub.id
    }
  }
}

resource "google_dns_record_set" "private-google-access-a" {
  project      = google_project.svpc.project_id
  name         = "private.${google_dns_managed_zone.dns-zone-googleapis-com.dns_name}"
  managed_zone = google_dns_managed_zone.dns-zone-googleapis-com.name
  type         = "A"
  ttl          = 300

  rrdatas = ["199.36.153.8", "199.36.153.9", "199.36.153.10", "199.36.153.11"]
}

resource "google_dns_record_set" "private-google-access-cname" {
  project      = google_project.svpc.project_id
  name         = "*.${google_dns_managed_zone.dns-zone-googleapis-com.dns_name}"
  managed_zone = google_dns_managed_zone.dns-zone-googleapis-com.name
  type         = "CNAME"
  ttl          = 300
  rrdatas      = [google_dns_record_set.private-google-access-a.name]
}

Google Cloud VMware Engine

Google Cloud VMware Engine clients will need to use a DNS server such as Active Directory or Google Cloud DNS (through a Cloud DNS inbound server policy) to resolve overriding domain names.

Additional API domains

Additional zones may need to be configured in the DNS server other than googleapis.com depending on what the client is trying to access. This could include gcr.io and others. See Domain Options for private.googleapis.com.

For example, to override the domain gcr.io, create A records for the domain itself and then a wildcard CNAME for *.gcr.io which points to the domain A records.

When using any Google Cloud service, be sure to review which API FQDNs are in use and determine if additional overriding zones need to be created in the DNS server.

For further implementation details about Private Google Access, see https://cloud.google.com/vpc/docs/configure-private-google-access

Working with Google Cloud Managed Instance Groups


Google Cloud Managed Instance Groups (MIGs) are groups of identical virtual machine instances that serve the same purpose.

Instances are created based on an Instance Template which defines the configuration that all instances will use including image, instance size, network, etc.

MIGs that host services are typically fronted by a load balancer, which distributes client requests across the instances in the group.

MIG instances can also run batch processing applications which do not serve client requests and do not require a load balancer.

MIGs can be configured for autoscaling to increase the number of VM instances in the group based on CPU load or demand.

They can also auto-heal by replacing failed instances. Health checks are used to make sure each instance is responding correctly.
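Health-check thresholds decide when an instance changes state. An instance needs healthy_threshold consecutive successful probes to be marked healthy and unhealthy_threshold consecutive failed probes to be marked unhealthy. The Python sketch below models only that counting rule, using the values from the MIG example later in this post (healthy_threshold = 1, unhealthy_threshold = 5). The class is hypothetical, starts in the healthy state as an assumption, and ignores probe intervals, timeouts, and the initial delay.

```python
class HealthState:
    """Track consecutive probe results against health-check thresholds."""

    def __init__(self, healthy_threshold: int = 1, unhealthy_threshold: int = 5):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True  # assumption: the instance starts out healthy
        self._streak = 0     # consecutive probes that disagree with the state

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            self._streak = 0  # probe agrees with the current state
        else:
            self._streak += 1
            limit = self.healthy_threshold if probe_ok else self.unhealthy_threshold
            if self._streak >= limit:
                self.healthy = probe_ok
                self._streak = 0
        return self.healthy


state = HealthState()
print([state.record(False) for _ in range(5)])  # unhealthy after the 5th failure
print(state.record(True))                       # one success restores health
```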

MIGs should be Regional and use VM instances in at least two different zones of a region. Regional MIGs can have up to 2000 instances.
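To get a feel for how a regional MIG spreads instances, the Python sketch below divides a target size as evenly as possible across zones. It is only an approximation, since Compute Engine makes the actual placement decisions, for example when a zone lacks capacity. The zone list assumes the four us-central1 zones referenced later in this post, and the function name spread is hypothetical.

```python
from __future__ import annotations


def spread(target_size: int, zones: list[str]) -> dict[str, int]:
    """Split target_size across zones as evenly as possible."""
    if not zones:
        raise ValueError("at least one zone is required")
    base, extra = divmod(target_size, len(zones))
    # The first `extra` zones each receive one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


ZONES = ["us-central1-a", "us-central1-b", "us-central1-c", "us-central1-f"]
print(spread(4, ZONES))  # one instance per zone
print(spread(6, ZONES))  # two of the zones get a second instance
```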

Terraform

Two different modules authored by Google can be used to create an Instance Template and MIG:

Instance template: terraform-google-modules/vm/google//modules/instance_template
Multi-version MIG: terraform-google-modules/vm/google//modules/mig_with_percent

To optionally create an Internal HTTP load balancer, use: GoogleCloudPlatform/lb-internal/google

The following examples create a service account, two instance templates, a MIG, and an Internal HTTP load balancer.

Pre-requisites

A custom image should be created with nginx installed and running at boot.
A VPC with a proxy-only subnet is required.
The instance template requires a service account.
- The API iam.googleapis.com must be enabled on the project

# Enable IAM API
resource "google_project_service" "project" {
 project = "my-gcp-project-1234"
 service = "iam.googleapis.com"
 disable_on_destroy = false
}

# Service Account required for the Instance Template module
resource "google_service_account" "sa" {
 project = "my-gcp-project-1234"
 account_id = "sa-mig-test"
 display_name = "Service Account MIG test"
 depends_on = [ google_project_service.project ]
}

Update project, account_id, and display_name with appropriate values.

Instance Templates

The instance template defines the instance configuration. This includes which network to join, any labels to apply to the instance, the size of the instance, network tags, disks, custom image, etc.

The MIG deployment requires an instance template.

The instance template requires that a source image have already been created.

In this terraform code example, two instance templates are created:

“A” template – initial version to use in the MIG
“B” template – future upgrade version to use with an optional canary update method

During the initial deployment, each instance template can point to the same custom image for the source_image value. In the future, each instance template should point to a different custom image.

# Instance Template "A"
# Module src: https://github.com/terraform-google-modules/terraform-google-vm/blob/master/modules/instance_template
# Registry: https://registry.terraform.io/modules/terraform-google-modules/vm/google/latest/submodules/instance_template
# Creates google_compute_instance_template
module "instance_template_A" {
 source = "terraform-google-modules/vm/google//modules/instance_template"
 region = "us-central1"
 project_id = "my-gcp-project-1234"
 subnetwork = "us-central-01"
 
 service_account = {
  email = google_service_account.sa.email
  scopes = ["cloud-platform"]
 }

 name_prefix = "nginx-a"
 tags = ["nginx"]
 labels = { mig = "nginx" }
 machine_type = "f1-micro"
 startup_script = "sed -i 's/nginx/'$HOSTNAME'/g' /var/www/html/index.nginx-debian.html"

 source_image_project = "my-gcp-project-1234"
 source_image = "image-nginx"
 disk_size_gb = 10
 disk_type = "pd-balanced"
 preemptible = true
}

# Instance Template "B"
module "instance_template_B" {
 source = "terraform-google-modules/vm/google//modules/instance_template"
 region = "us-central1"
 project_id = "my-gcp-project-1234"
 subnetwork = "us-central-01"
 
 service_account = {
  email = google_service_account.sa.email
  scopes = ["cloud-platform"]
 }

 name_prefix = "nginx-b"
 tags = ["nginx"]
 labels = { mig = "nginx" }
 machine_type = "f1-micro"
 startup_script = "sed -i 's/nginx/'$HOSTNAME'/g' /var/www/html/index.nginx-debian.html"

 source_image_project = "my-gcp-project-1234"
 source_image = "image-nginx"
 disk_size_gb = 10
 disk_type = "pd-balanced"
 preemptible = true
}

Update the following with appropriate values:

Module name
region
project_id
subnetwork – the VPC subnet to use for instances deployed via the template
name_prefix – prefix for the instance template name; a version suffix will be appended to it.
- Be sure to include any specific versioning to indicate what is in the custom image.
- Lowercase only.
tags – any required network tags
labels – labels to apply to instances deployed via the template
machine_type – machine size to use
startup_script – startup script to run on each boot (not just deployment)
source_image_project – project where the image resides
source_image – image name
disk_size_gb – size of the boot disk
disk_type – type of boot disk
preemptible – if set to true, instances can be pre-empted as needed by Google Cloud.
- Preemptible instances can run for up to 24 hours before being stopped.
- The MIG will recreate replacements when preemptible capacity is available again.
- This is a cost-saving measure and should be used where possible.
- See Instance groups in the Compute Engine documentation.

More instance template module options are available:

See the module source for variables.tf and main.tf
See the resource definition for google_compute_instance_template

Changes to the instance template will result in a new version of the template. The MIG will be modified to use the new version. All MIG instances will be recreated. See the update_policy section of the MIG module definition (below) to control the update behavior.

Managed Instance Group

The MIG creates the set of instances using the same custom image and instance template. Instances are customized as usual during first boot.

A custom startup script can run every time the instance starts and configure the VM further. See the Overview page in the Compute Engine documentation.

In this Regional MIG terraform example, the initial set of instances is deployed using the “A” template, which is set as the instance_template_initial_version.

The same “A” template is also set for the instance_template_next_version with a value of 0 for the next_version_percent.

In a future canary update, set the instance_template_next_version to the “B” template with an appropriate value for next_version_percent.
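When planning a canary, it helps to know roughly how many instances each template will receive. The Python sketch below converts next_version_percent into instance counts. Rounding the canary count down is an assumption made for illustration. Compute Engine's own rounding, particularly across zones in a regional MIG, may differ, so confirm the actual split in the console. The function name is hypothetical.

```python
from __future__ import annotations


def canary_split(target_size: int, next_version_percent: int) -> tuple[int, int]:
    """Return (initial-version count, next-version count) for a MIG."""
    if not 0 <= next_version_percent <= 100:
        raise ValueError("next_version_percent must be between 0 and 100")
    # Assumption: a fractional canary count is rounded down.
    canary = target_size * next_version_percent // 100
    return target_size - canary, canary


print(canary_split(4, 0))   # initial deployment: every instance uses "A"
print(canary_split(4, 25))  # one instance moves to the "B" template
```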

# Regional Managed Instance Group with support for canary updates 
# Module src: https://github.com/terraform-google-modules/terraform-google-vm/tree/master/modules/mig_with_percent 
# Registry: https://registry.terraform.io/modules/terraform-google-modules/vm/google/latest/submodules/mig_with_percent 
# Creates google_compute_health_check.http (optional), google_compute_health_check.https (optional), google_compute_health_check.tcp (optional), google_compute_region_autoscaler.autoscaler (optional), google_compute_region_instance_group_manager.mig 

module "mig_nginx" { 
 source = "terraform-google-modules/vm/google//modules/mig_with_percent" 
 project_id = "my-gcp-project-1234" 
 hostname = "mig-nginx" 
 region = "us-central1" 
 target_size = 4
 
 instance_template_initial_version = module.instance_template_A.self_link 

 instance_template_next_version = module.instance_template_A.self_link 
 next_version_percent = 0 
 
 //distribution_policy_zones = ["us-central1-a", "us-central1-f"]
 
 update_policy = [{ # See https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_region_instance_group_manager#nested_update_policy 
  type = "PROACTIVE" 
  instance_redistribution_type = "PROACTIVE" 
  minimal_action = "REPLACE" 
  max_surge_percent = null 
  max_unavailable_percent = null 
  max_surge_fixed = 4 
  max_unavailable_fixed = null 
  min_ready_sec = 50 
  replacement_method = "SUBSTITUTE" 
  }] 
 
 named_ports = [{ 
  name = "web" 
  port = 80 
 }] 
 
 health_check = { 
  type = "http" 
  initial_delay_sec = 30 
  check_interval_sec = 30 
  healthy_threshold = 1 
  timeout_sec = 10 
  unhealthy_threshold = 5 
  response = "" 
  proxy_header = "NONE" 
  port = 80 
  request = "" 
  request_path = "/" 
  host = "" 
 } 
 
 autoscaling_enabled = "false" 
 /* 
 max_replicas = var.max_replicas 
 min_replicas = var.min_replicas 
 cooldown_period = var.cooldown_period 
 autoscaling_cpu = var.autoscaling_cpu 
 autoscaling_metric = var.autoscaling_metric 
 autoscaling_lb = var.autoscaling_lb 
 autoscaling_scale_in_control = var.autoscaling_scale_in_control 
 */ 
}

Update the following with appropriate values:

Module name
project_id
hostname – the prefix for provisioned VM names/hostnames. Will have a random set of 4 characters appended to the end.
region
target_size – number of instances to create in the MIG. Does not need to equal the number of zones in distribution_policy_zones.
instance_template_initial_version – template to use for initial deployment
instance_template_next_version – template to use for future canary update
next_version_percent – percentage of instances in the group (of target_size) that should use the canary update
distribution_policy_zones – zone names in the region where VMs should be provisioned.
- Optional. If not specified, the Google-authored terraform module will automatically select each zone in the region.
  - Example: us-central1 region has 4 zones so each zone will be populated in this field. This directly impacts the update_policy and its max_surge_fixed value.
- This value cannot be changed later. The module will ignore any changes.
  - The MIG will need to be destroyed and recreated to update the zones to use.
- More than two zones can be specified.
- The target_size does not need to match the number of zones specified.
- See About regional MIGs in the Compute Engine documentation.
update_policy – specifies how instances should be recreated when a new version of the instance template is available.
- type set to
  - PROACTIVE will update all instances in a rolling fashion.
    - Leave max_unavailable_fixed as null which results in a value of 0, meaning no live instances can be unavailable.
    - Recommended
  - OPPORTUNISTIC means “only when you manually initiate the update on selected instances or when new instances are created. New instances can be created when you or another service, such as an autoscaler, resizes the MIG. Compute Engine does not actively initiate requests to apply opportunistic updates on existing instances.”
    - Not recommended
- max_surge_fixed indicates the number of additional instances that are temporarily added to the group during an update.
  - These new instances will use the updated template.
  - Should be greater than or equal to the number of zones in distribution_policy_zones. If there are no zones specified in distribution_policy_zones, as mentioned previously, the Google-authored MIG module will automatically select all the zones in the region.
- replacement_method can be set to either of the following values:
  - RECREATE preserves the instance name by deleting the old instance and then creating a new one with the same name.
  - SUBSTITUTE will create new instances with new names.
    - Results in a faster upgrade of the MIG – instances are available sooner than using RECREATE.
    - Recommended.
- See the Terraform Registry and “Automatically apply VM configuration updates in a MIG” in the Compute Engine documentation.
named_ports – set the port name and port number as appropriate
health_check – set the check type, port, and request_path as appropriate
Autoscaling can also be configured. See “Autoscaling groups of instances” in the Compute Engine documentation.
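For orientation, the sketch below shows roughly where the zone, named-port, and health-check settings land on the underlying google_compute_region_instance_group_manager resource. It is not the module's source: the resource names and zones are placeholders, and the referenced google_compute_instance_template.nginx_v1 and google_compute_health_check.nginx are hypothetical resources assumed to be defined elsewhere.

```hcl
# Sketch only: names, zones, and referenced resources are assumptions.
resource "google_compute_region_instance_group_manager" "nginx" {
  project            = "my-gcp-project-1234"
  name               = "nginx-mig"
  region             = "us-central1"
  base_instance_name = "nginx" # Compute Engine appends a random 4-character suffix
  target_size        = 3

  # Three instances across two zones: target_size need not match the zone count.
  distribution_policy_zones = ["us-central1-a", "us-central1-b"]

  version {
    instance_template = google_compute_instance_template.nginx_v1.self_link
  }

  # Port name that the load balancer health check can refer to (port_name = "web").
  named_port {
    name = "web"
    port = 80
  }

  # Autohealing: recreate instances that fail this health check.
  auto_healing_policies {
    health_check      = google_compute_health_check.nginx.id
    initial_delay_sec = 300
  }
}
```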

More MIG module options are available:

See the module source of variables.tf and main.tf
See the resource definition for google_compute_region_instance_group_manager

Changes to the MIG may result in VMs needing to be updated. See the update_policy section of the MIG module definition (above) to configure the behavior when updating the MIG members.

Load Balancer

An Internal Load Balancer can make a MIG highly available to internal clients.

module "ilb_nginx" {
 source = "GoogleCloudPlatform/lb-internal/google"
 version = "~> 4.0"
 project = "my-gcp-project-1234"
 network = module.vpc_central.network_name
 subnetwork = module.vpc_central.subnets["us-central1/central-01-subnet-ilb"].name
 region = "us-central1"
 name = "ilb-nginx"
 ports = ["80"]
 source_tags = ["nginx"]
 target_tags = ["nginx"]

 backends = [{
  group = module.mig_nginx.instance_group
  description = ""
  failover = false
 }]

 health_check = {
  type = "http"
  check_interval_sec = 30
  healthy_threshold = 1
  timeout_sec = 10
  unhealthy_threshold = 5
  response = ""
  proxy_header = "NONE"
  port = 80
  request = ""
  request_path = "/"
  host = ""
  enable_log = false
  port_name = "web"
 }
}

Update the following with appropriate values:

Module name
project
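network and subnetwork – the VPC and the subnet from which the load balancer's internal IP address is allocated (this module creates an internal passthrough load balancer, which does not use a proxy-only subnet)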
region
name
ports – the port to listen on
source_tags and target_tags – network tags to use. source_tags identify the clients allowed to reach the load balancer; target_tags must be present on the MIG members via the instance template.
backends – points to the MIG
health_check – should generally match the MIG healthcheck.

More options are available; see the module source for variables.tf and main.tf.

Be sure to consider any necessary firewall rules, especially if using network tags.
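For example, Google Cloud health-check probes come from 35.191.0.0/16 and 130.211.0.0/22, so the backends need an ingress rule that allows those ranges. A minimal sketch, assuming the MIG members carry the nginx tag and that the network comes from the same module.vpc_central used above; the rule name is hypothetical. Depending on its version and inputs, the lb-internal module may already create a comparable rule, so check its main.tf before adding a duplicate.

```hcl
# Allow Google Cloud health-check probes to reach the nginx backends on port 80.
resource "google_compute_firewall" "allow_nginx_health_checks" {
  project   = "my-gcp-project-1234"
  name      = "allow-nginx-health-checks" # hypothetical name
  network   = module.vpc_central.network_name
  direction = "INGRESS"

  # Published source ranges used by Google Cloud health-check probers.
  source_ranges = ["35.191.0.0/16", "130.211.0.0/22"]
  target_tags   = ["nginx"]

  allow {
    protocol = "tcp"
    ports    = ["80"]
  }
}
```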

The Google-authored MIG module sets create_before_destroy to true, so a new MIG can replace an existing one as a backend behind the load balancer with only a very brief outage (less than 10 seconds). The new MIG is created and added as a backend, and then the old MIG is destroyed.

Day 2 operations

Changing size of MIG

If needed, adjust the target_size value of the MIG module to increase or decrease the number of instances. Adjustments take place right away.

When increasing the number of instances while a new template is in place and the update_policy type is OPPORTUNISTIC, the added instances are created from the new template.

Changing the zones to use for a MIG

Zones cannot be changed after creation. The MIG must be destroyed and recreated.

Deleting MIG members

Deleting a MIG member automatically reduces the target number of instances for the MIG. The deleted member is not replaced.

Restarting MIG members

Do not manually restart a MIG instance from within the VM itself. This will cause its healthcheck to fail and the MIG will delete/recreate the VM using the template.

Use the RESTART/REPLACE button in the Cloud Console and choose the Restart option. This affects all instances in the group, but can be limited to acting on a maximum number of instances at a time (“Maximum unavailable instances”).

The “Replace” option within RESTART/REPLACE will delete and recreate instances using the current template.

Updating MIG instances to a new version

When a new version of the custom image is released, such as when it has been updated with new software, the MIG can be updated in a controlled fashion until all members are running the updated version, without any outage.

The MIG module update_policy setting is very important for this process to ensure there is no outage:

max_surge_fixed is the number of additional instances created in the MIG and verified healthy before the old ones are removed.
- Should be set to greater than or equal to the number of zones in distribution_policy_zones
max_unavailable_fixed should be set to null which equals 0: no live instances will be unavailable during the update.
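A sketch of those settings on the underlying google_compute_region_instance_group_manager resource, deriving max_surge_fixed from the zone list so the two cannot drift apart. The names, zones, and the google_compute_instance_template.nginx_v2 reference are hypothetical; max_unavailable_fixed is set to 0 explicitly rather than left null, and the author's module may shape its update_policy input differently.

```hcl
locals {
  # Zones the MIG spans; max_surge_fixed below is kept at one per zone.
  zones = ["us-central1-a", "us-central1-b"]
}

resource "google_compute_region_instance_group_manager" "nginx" {
  name                      = "nginx-mig"
  region                    = "us-central1"
  base_instance_name        = "nginx"
  target_size               = 3
  distribution_policy_zones = local.zones

  version {
    instance_template = google_compute_instance_template.nginx_v2.self_link
  }

  update_policy {
    type                  = "PROACTIVE"         # roll out as soon as the template changes
    minimal_action        = "REPLACE"           # replace outdated VMs rather than restarting them
    replacement_method    = "SUBSTITUTE"        # replacement VMs get new names
    max_surge_fixed       = length(local.zones) # temporary extra VMs, at least one per zone
    max_unavailable_fixed = 0                   # no serving VM is taken down ahead of its replacement
  }
}
```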

The MIG module has options for two different instance templates in order to support performing a canary update where only a percentage of instances are upgraded to the new version:

instance_template_initial_version – template to use for initial deployment
instance_template_next_version – template to use for future canary update
next_version_percent – percentage of instances in the group (of target_size) that should use the canary update

Initially, both template inputs may point to the same template, with 0% allocated to the “next” version.
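Under the hood, the google_compute_region_instance_group_manager resource expresses such a split with two version blocks: the one with a target_size percent receives that share of instances, and the one without a target_size receives the rest. A sketch, assuming the three inputs arrive as variables with the names used above; the MIG name, region, and size are placeholders, and this illustrates the resource's mechanism rather than the module's exact source.

```hcl
variable "instance_template_initial_version" { type = string }
variable "instance_template_next_version" { type = string }
variable "next_version_percent" { type = number }

resource "google_compute_region_instance_group_manager" "nginx" {
  name               = "nginx-mig"
  region             = "us-central1"
  base_instance_name = "nginx"
  target_size        = 3

  # No target_size: this version receives every instance not claimed below.
  version {
    name              = "initial"
    instance_template = var.instance_template_initial_version
  }

  # Receives next_version_percent of the group's target_size.
  version {
    name              = "next"
    instance_template = var.instance_template_next_version
    target_size {
      percent = var.next_version_percent
    }
  }
}
```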

If a load balancer is used, newly created instances that are verified healthy will automatically be selected to respond to client requests.

Canary update

To move a percentage of instances to the “next” version via a “canary” update:

Set the instance_template_next_version to point to an instance template which uses an updated custom image
Set the next_version_percent to an appropriate percentage of instances in the group that should use the “next” template.
Make sure update_policy has type set to PROACTIVE – this will cause the change to take effect right away.
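As an illustration, a canary that moves 20% of the group to an updated image might change only these inputs on the module call. The template resources nginx_v1 and nginx_v2 are hypothetical, the template reference format (self_link here) is an assumption about what the module expects, and the module's source and remaining inputs are omitted and stay as they were.

```hcl
module "mig_nginx" {
  # source and all other inputs unchanged (omitted here)

  # Most instances stay on the current template...
  instance_template_initial_version = google_compute_instance_template.nginx_v1.self_link
  # ...while the canary share is built from the updated custom image.
  instance_template_next_version = google_compute_instance_template.nginx_v2.self_link
  next_version_percent           = 20

  # update_policy type must remain PROACTIVE for the change to apply right away.
}
```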

When applied with Terraform, all instances will be recreated (adhering to the update_policy), with next_version_percent of them created from the “next” template.

After the canary update has been validated and all instances should be upgraded, see the steps below for a Regular update.

Regular update

To update all instances at once (adhering to the update_policy):

Set both the instance_template_initial_version and instance_template_next_version to point to an instance template which uses an updated custom image
Set the next_version_percent to 0.
Make sure update_policy has type set to PROACTIVE – this will cause the change to take effect right away.

When applied with Terraform, all instances will be recreated (adhering to the update_policy).
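For example, with a hypothetical updated template resource nginx_v2, the regular update changes only these inputs on the module call. The self_link reference format is an assumption, and the module's source and remaining inputs are omitted and stay as they were.

```hcl
module "mig_nginx" {
  # source and all other inputs unchanged (omitted here)

  # Both versions point at the updated template, so no canary split remains.
  instance_template_initial_version = google_compute_instance_template.nginx_v2.self_link
  instance_template_next_version    = google_compute_instance_template.nginx_v2.self_link
  next_version_percent              = 0
}
```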


chou.se

cloud thots
