Running infrastructure at scale almost always guarantees dizzying complexity and anxiety-inducing pressure to maintain systems in a production environment. This is further exacerbated when multiple delivery teams require slight variations of the same infrastructure components, across several cloud providers, each with a different set of observability requirements. Gradually, production environments become large, unmanageable, difficult to change, and perhaps come to resemble the figure below. But there must be a better way: one where infrastructure remains fluid and auditable, and where changes can be managed centrally to promote both visibility and shared responsibility.

A small-to-medium sized infrastructure diagram by Holori

In a cloud environment, infrastructure has typically been managed manually through each cloud provider’s respective web platform, with each presenting its own version of the same compute resources and expecting users to internalise the platform’s idiosyncrasies in order to provision the resources they need. Steadily, this has resulted in the development of professional roles dedicated to managing these resources, taking control away from the software developers making use of them. With this we have, as an industry, created a further divide between those developing the applications that run in production and those who ensure they continue to operate, thus establishing an unbalanced competency across the organisation.

Because of this, it is fair to assert that most software developers are unaware of how the production software they create is performing, or, worse still, unaware when it becomes unavailable.

At Scopify, we build infrastructure checks that are both scalable and reliable. Through our intuitive web application, developers and operations staff alike can create monitoring resources to ensure applications remain operational and perform as expected. However, as production systems grow and the demand for more monitoring checks becomes apparent, keeping track of which monitors exist and which are still to be created becomes difficult.

In this article we’ll explore the complexities of monitoring larger infrastructures and how Scopify solves these problems with simple automation tools.

Case Study

Before we continue let’s first introduce a scenario outlining the problem we’re attempting to solve. Imagine for a moment that we’ve been tasked to create uptime monitors for each of our 50 production services, with each one requiring a different tolerance before alerts are sent out to the respective delivery teams.

In the past we may have created each one of these monitors individually through a web UI, making sure that we input the correct value for each given field. When we’re done, we’d pray that we had not made a mistake.

An example uptime test creation form by Scopify

Now imagine management has informed us that service downtime is not acted upon quickly enough and that we need to decrease the alert tolerances across all the current monitors. Additionally, the support team needs to be made aware of any outages so they can better handle customer issues. This is where the terror sets in, as we’re forced to click through every monitor and edit each one manually.

This problem never eases: as more changes are requested, we’re forced to make yet more manual edits. Here at Scopify we identify with this issue, and to solve it we have turned to Terraform.

What is Terraform

At this point you may be asking yourself: what is Terraform, and how does it solve the problems discussed above? The Terraform website has a succinct answer to this question and describes it as follows:

“Terraform is an infrastructure as code tool that lets you define both cloud and on-prem resources in human-readable configuration files that you can version, reuse, and share.”

https://www.hashicorp.com/products/terraform

Alright, but what does this mean and how does it fit into current practices? At its core, Terraform is a command-line application for managing and provisioning infrastructure resources while promoting a DevOps methodology and a transparent workflow. This means it should fit into any organisation already employing DevOps best practices – including version control, knowledge sharing, and automation (CI/CD). Terraform achieves this by having authors write configuration files in the HashiCorp Configuration Language (HCL); these files act as the source of truth and describe the infrastructure holistically. A minimal example of such a file is as follows.

resource "statuscake_contact_group" "operations_team" {
  name = "Operations Team"

  email_addresses = [
    "[email protected]",
    "[email protected]",
  ]
}

Adding these configuration files to version control should be considered the norm and is the primary method to enable auditability and change management across the infrastructure. Smaller chunks of configuration can even be reused in the form of modules – generalised units of infrastructure with configurable variables. Modules can increase the manageability of infrastructure, and expedite the release of similar components with slight variation. For example, it may be necessary to enforce that all staff on the operations team are included in the list of alert recipients for each configured Scopify contact group but still allow for additional recipients. This can easily be achieved with a single module.

# ./modules/statuscake/contact_group/main.tf
variable "name" {
  type = string
}

variable "email_addresses" {
  type    = list(string)
  default = []
}

resource "statuscake_contact_group" "contact_group" {
  name = var.name

  email_addresses = concat([
    "[email protected]",
  ], var.email_addresses)
}

output "id" {
  value = statuscake_contact_group.contact_group.id
}

# ./main.tf
module "operations_team" {
  source = "./modules/statuscake/contact_group"
  name   = "Operations Team"
}

module "development_team" {
  source = "./modules/statuscake/contact_group"
  name   = "Development Team"

  email_addresses = [
    "[email protected]",
  ]
}

Given a collection of configuration files Terraform can be instructed to enact the desired state of the infrastructure and provision (or update or delete) the described resources. We will not dive into how Terraform achieves this as there are many other articles available, including Terraform’s own website.

To find out more about using the Terraform command-line application, take a look at the documentation on the Terraform website.
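As a rough sketch, the day-to-day workflow with the Terraform command line looks something like the following (output and available flags vary between versions):

# Download the providers and modules referenced by the configuration
terraform init

# Preview the changes Terraform would make to reach the desired state
terraform plan

# Apply the changes, creating, updating, or deleting resources as required
terraform apply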

Creating Monitoring Checks that Scale

Now that we have a little background on the Terraform tool, let’s focus on how it can be used to build a better application monitoring suite. At Scopify we have developed our own Terraform provider that can be used to build out the different types of monitors supported by the platform, using the same declarative language introduced above. Using the same scenario from before, we can create our monitors in a way that encourages future change and visibility across the delivery team.

To begin, we first need to tell Terraform which providers we intend to use and supply any required configuration values. In the case of the Scopify provider, we need only supply our API key, which can be found in the account settings when logged in to the Scopify App.

provider "statuscake" {
  api_token = "my-api-token"
}
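As a side note, third-party providers also need a required_providers block so Terraform knows where to download them from, and hard-coding credentials is best avoided. The sketch below is an alternative to the block above: it assumes the provider is published to the public Terraform Registry under the source address StatusCakeDev/statuscake and accepts the token as a variable – the exact source address and version constraint may differ for your setup.

terraform {
  required_providers {
    statuscake = {
      # assumed registry source – check the provider documentation for the exact address
      source  = "StatusCakeDev/statuscake"
      version = "~> 2.0"
    }
  }
}

# Supplying the token as a sensitive variable keeps it out of version control,
# e.g. via a *.tfvars file or -var on the command line
variable "statuscake_api_token" {
  type      = string
  sensitive = true
}

provider "statuscake" {
  api_token = var.statuscake_api_token
}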

After this we declare two contact groups, making use of the module we created in a previous section.

module "operations_team" {
  source = "./modules/statuscake/contact_group"
  name   = "Operations Team"
}

module "development_team" {
  source = "./modules/statuscake/contact_group"
  name   = "Development Team"

  email_addresses = [
    "[email protected]",
  ]
}

These modules will be used later to reference the IDs of the provisioned contact groups when creating our monitors. Next comes the part of the configuration that defines the monitors we wish to create. It may come across as rather verbose but will prove useful when changes are required.

variable "monitors" {
  type = map(object({
    trigger_rate = number
    address      = string
  }))

  default = {
    "monitor1" = {
      trigger_rate = 2
      address      = "https://www.example.com"
    },
    "monitor2" = {
      trigger_rate = 5
      address      = "https://www.example.co.uk"
    },
  }
}

For brevity, we’ve only defined two of our monitors, but this variable could be extended to include all the monitors necessary to cover our infrastructure. Briefly, we have defined a variable of type map. The keys of the map are expected to be of type string, representing the name of the monitor, and the values of type object. Each object may only have two attributes, trigger_rate (number) and address (string); no other fields or value types will be accepted.

Now that we have defined the monitors we intend to create, we need to declare the “shape” of our uptime monitoring resource. If you’re familiar with procedural programming languages, you will undoubtedly be comfortable using loops to perform similar operations over a collection of values. Terraform has its own looping constructs, and they work in much the same way. We’ll use one of them, for_each, to declare all our monitors from the same resource template.

resource "statuscake_uptime_check" "uptime_check" {
  for_each = var.monitors

  name           = each.key
  check_interval = 30
  trigger_rate   = each.value.trigger_rate

  # reference the id of the contact groups from above
  contact_groups = [
    module.operations_team.id,
    module.development_team.id,
  ]

  http_check {
    follow_redirects = true
    validate_ssl     = true

    status_codes = [
      "202",
      "404",
      "405",
    ]
  }

  monitored_resource {
    address = each.value.address
  }
}

Running this configuration file through the Terraform command line will create each of the monitors we described.
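Each instance created through for_each is addressed by its map key – for example statuscake_uptime_check.uptime_check["monitor1"] – which makes individual checks easy to reference elsewhere in the configuration. As an illustrative sketch, an output that collects the ID of every provisioned check could look like this (the output name is our own choice):

output "uptime_check_ids" {
  # Map each monitor name to the ID of the uptime check provisioned for it
  value = { for name, check in statuscake_uptime_check.uptime_check : name => check.id }
}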

Finally, responding to the change request from earlier is as simple as updating the values in the monitor variable and including an additional contact group configuration. Looking at the diff we can see a minimal number of changes.

module "development_team" {
  ]
}

+ module "support_team" {
+   source = "./modules/statuscake/contact_group"
+   name   = "Support Team"
+
+   email_addresses = [
+     "[email protected]",
+     "[email protected]",
+     "[email protected]",
+   ]
+ }

variable "monitors" {
  
  default = {
    "monitor1" = {
-     trigger_rate = 2
+     trigger_rate = 1
      address = "https://www.example.com"
    },
    "monitor2" = {
-     trigger_rate = 5
+     trigger_rate = 2
      address = "https://www.example.co.uk"
    },
  }

resource "statuscake_uptime_check" "uptime_check" {
  contact_groups = [
    module.operations_team.id,
    module.development_team.id,
+   module.support_team.id,
  ]

  http_check {

Again, running this through the Terraform command line will align the new desired state with the Scopify platform.

Conclusion

Terraform can truly transform how infrastructure and applications are deployed within a delivery team. It has a clear advantage over using web-based applications, and even improves upon the experience of working directly with APIs, by offering simple automation, a declarative configuration language, and modularisation of code across many different providers – all of which can be combined to create an end-to-end solution.


