Terraform Workspace and Project Best Practices
The way that you approach your project and workspace organization strategy largely depends on your organization's needs, but there are considerations that all Terraform Cloud and Terraform Enterprise users should keep in mind. This article presents best practices for structuring your configuration and your Terraform Cloud organization.
Configuration structure
How you structure your Terraform configuration affects your workspace design and scope. Some considerations, such as the file structure of your configuration and version-control repositories, will depend on your team's needs and preferences. There are best practices that we recommend to all Terraform users that help teams develop Terraform configuration more efficiently.
Avoid duplicate configuration
One common source of technical debt is repetitive Terraform configuration. Teams often duplicate configuration to provision similar resources. If the configuration needs to change, they must make changes in many places rather than just once.
Terraform supports modules, which let you define reusable collections of resources that you can deploy together. By encapsulating your configuration in a module, you only need to update it once for all of the users of the module to receive the changes. Members of your organization can then source the modules from your organization's private module registry. You can also version modules to increase stability. In this case, if a developer pins their configuration to a specific version of a module, they can be confident that the module builds the same infrastructure every time.
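As a sketch of this pattern, the following configuration consumes a module from a private registry and pins it to a specific version. The organization name, module path, and input variable are hypothetical placeholders:

```hcl
module "vpc" {
  # Source the module from your organization's private module registry
  # ("example-org" and the module path are placeholders).
  source  = "app.terraform.io/example-org/vpc/aws"

  # Pin to an exact version so the module builds the same
  # infrastructure on every run.
  version = "1.2.0"

  # Example input variable exposed by the module.
  cidr_block = "10.0.0.0/16"
}
```

When the module's maintainers publish a new version, consumers opt in by updating the `version` argument, rather than receiving changes implicitly.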
Determine repository scope
The structure of your Terraform configuration and how you store it in version control is a part of your workspace management strategy. When creating a workspace connected to a version control repository, you have the option to set the working directory. The working directory defaults to the root of the repository, but you can instead choose a subdirectory if you manage multiple environments or collections of resources in the same repository. This lets you choose to either create one repository per workspace, or to group the configuration for multiple workspaces into one repository. Both approaches are valid, so we recommend that you choose an approach that your organization is most comfortable with.
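If you manage workspaces with the TFE provider, you can set the working directory when you create the workspace. The sketch below assumes a single repository that holds multiple environments in subdirectories; the organization, repository identifier, and paths are hypothetical:

```hcl
resource "tfe_workspace" "network_prod" {
  name         = "network-prod"
  organization = "example-org"

  # Point this workspace at a subdirectory of the shared repository
  # instead of the repository root.
  working_directory = "environments/prod/network"

  vcs_repo {
    identifier     = "example-org/infrastructure"
    oauth_token_id = var.oauth_token_id
  }
}
```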
Evaluate module convenience vs flexibility
Using modules helps reduce duplication in your Terraform configuration, but scoping your modules is a tradeoff between convenience and flexibility. You can design a module to be reusable by exposing input variables that configure its behavior. However, designing your modules with a smaller scope gives you greater flexibility: teams can adopt new features in a new version of one module without affecting the capabilities or resources managed by another.
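A minimal sketch of a narrowly scoped, reusable module might expose only the inputs that vary between callers. The variable names and AMI value are illustrative assumptions:

```hcl
# modules/web-server/variables.tf
variable "environment" {
  description = "Deployment environment, e.g. dev or prod"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type; callers override as needed"
  type        = string
  default     = "t3.micro"
}

# modules/web-server/main.tf
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI
  instance_type = var.instance_type

  tags = {
    Environment = var.environment
  }
}
```

Because the module manages only web servers, a new version that changes how instances are tagged cannot affect, for example, a separately scoped database module.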
Limit child modules
Nesting modules can complicate your configuration's dependency graph and significantly increase the number of references it must track. While the impact of nested modules depends on the modules themselves, we recommend that you limit module nesting to a maximum depth of two, where one module calls another module.
Workspace structure
A Terraform Cloud workspace manages a single state file and the lifecycle of its resources. It is the smallest collection of Terraform Cloud-managed infrastructure. Any operation on a resource can potentially affect other resources managed in the same state file, so it is best to keep the potential blast radius of your operations small. To do so, manage resources in separate workspaces when possible, grouping together only necessary and logically-related resources. For example, even though your application may require both compute resources and a database, these resources can operate independently and should be in their own workspaces. Scoping your configuration and planning your workspace strategy early in your adoption of Terraform Cloud and Terraform Enterprise will simplify your operations and make them safer.
Group by volatility
Volatility refers to the rate of change of the resources in a workspace. Infrastructure such as databases, VPCs, and subnets change much less frequently than infrastructure such as your web servers. By exposing your long-living infrastructure to unnecessary volatility, you introduce more opportunities for accidental changes. When planning your workspace organization, group resources by volatility.
For example, you could group tightly-coupled resources like networking, security, and identity together, and give compute, storage, and databases separate workspaces, since they change at different frequencies. You may scale compute instances multiple times a day, but your database instances probably change far less frequently. By grouping these parts of your infrastructure into separate workspaces, you decouple unrelated resources and reduce the risk of unexpected changes.
Determine stateful vs stateless infrastructure
Stateful resources are ones that you cannot safely delete and recreate because they persist data, such as databases and object storage. By managing stateful resources independently of stateless ones, such as separating databases from compute instances, you limit the blast radius of operations that cause resource recreation and make the workspace easier to refactor in the future. Grouping workspaces that manage stateful resources can help protect against accidental data loss.
Consider the workspace structure in the Volatility section. You could potentially manage filestore and database resources together, as they are both stateful resources. Your compute resources are stateless and should still have a separate workspace.
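For stateful resources, you can add a layer of protection in configuration as well. The following sketch uses Terraform's `prevent_destroy` lifecycle argument so that any plan that would destroy the database fails; the resource arguments are illustrative:

```hcl
resource "aws_db_instance" "main" {
  identifier        = "app-database" # placeholder name
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 20

  lifecycle {
    # Cause any plan that would destroy this stateful resource to
    # error out, guarding against accidental data loss.
    prevent_destroy = true
  }
}
```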
Separate privileges and responsibilities
A best practice is to split up workspaces based on team responsibilities and required privileges. For example, consider an application that requires separate developer and production environments, each with special networking and application infrastructure. One approach is to create four different workspaces, two for the developer environment and two for production.
From the privilege standpoint, this means that only the networking team has permissions to manage the resources in the networking workspaces, and others cannot modify their configuration. If the workspace scope is too large, a user might need more permissions than appropriate in order to access the workspace.
Splitting your workspaces by team also helps limit the responsibility per workspace and allows teams to maintain distinct areas of ownership. If you need to reference attributes of resources managed in other workspaces, you can share the outputs using the tfe_outputs data source. By limiting the scope of each workspace and sharing just the required outputs with others, you reduce the risk of leaking potentially sensitive information in a workspace’s state. To share outputs from a workspace, you must explicitly enable remote state sharing in the workspace settings.
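A sketch of cross-workspace output sharing with the `tfe_outputs` data source follows. It assumes the networking workspace publishes a `subnet_id` output and has enabled remote state sharing; the organization and workspace names are placeholders:

```hcl
# Read outputs published by the networking team's workspace.
data "tfe_outputs" "network" {
  organization = "example-org"
  workspace    = "networking-prod"
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI
  instance_type = "t3.micro"

  # Consume only the shared output, not the full remote state.
  subnet_id = data.tfe_outputs.network.values.subnet_id
}
```

Note that `values` is marked sensitive by the provider, so you may need to wrap individual attributes with `nonsensitive()` where you know they are safe to display.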
Avoid large Terraform plans and applies
Terraform Cloud and Terraform Enterprise execute workloads using agents. Every time an agent refreshes a workspace’s state, it builds a dependency graph of the resources to determine how to sequence operations in the workspace. As the number of resources your workspace manages grows, these graphs become larger and more complex. As these graphs grow, they require more worker RAM to build them. If your agent's performance degrades or workloads take longer to complete, we suggest exploring ways to split up the workspace to reduce the size of the dependency graph.
Determine workspace concurrency vs Terraform parallelism
Concurrency refers to the number of plan and apply operations Terraform Cloud or Terraform Enterprise can run simultaneously. In Terraform Cloud, your subscription limits the maximum concurrency. Terraform Enterprise lets you configure the concurrency, but defaults to 10 concurrent runs. As you increase concurrency, the amount of memory your Terraform Enterprise installation requires increases as well. Refer to the Capacity and Performance documentation for more information.
Parallelism refers to the number of tasks the Terraform CLI performs simultaneously in a single workload. By default, Terraform performs a maximum of 10 operations in parallel. When you run a `terraform apply` command, Terraform refreshes each resource in the state file and compares it to the remote object. Every resource refresh, creation, update, or destruction is an individual operation. If your workload creates 11 resources, Terraform starts by creating the first 10 resources in its dependency graph, and begins creating the 11th once one of the first 10 finishes.
You can increase the parallelism of the Terraform CLI, but this increases a run's CPU usage. We recommend that you instead break down large Terraform configurations into smaller ones with fewer resources when possible. Long-running Terraform workloads are an early sign of a bloated workspace scope.
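If you do need to adjust parallelism for a particular run, the Terraform CLI exposes it as a flag on `plan` and `apply`:

```shell
# Raise the per-run operation limit from the default of 10 to 20.
# This increases CPU usage on the agent running the workload.
terraform apply -parallelism=20
```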
Projects
Projects let you group and scope access to your workspaces. You can group related workspaces into projects and give teams more permissive access to individual projects rather than granting them permissions to the entire organization.
Projects offer several advantages to help you further develop your workspace strategy:
- Increased workspace organization: You can add related workspaces to projects to simplify and organize a team's workspace view.
- Simplified management: You can create project-level permissions and variable sets that apply to all current and future workspaces in the project. For example, you can create a project variable set containing your cloud provider credentials and all workspaces in the project will have access to them.
- Reduced risk with centralized control: You can scope project permissions to only grant teams administrator access to the projects and workspaces they need.
- Efficiency through self-service: Granting users project-level administrator permissions lets them create and manage workspaces.
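As a sketch of project-level variable sets, the following configuration uses the TFE provider to create a project and attach a variable set to it, so that all current and future workspaces in the project inherit the credentials. The organization and names are placeholders:

```hcl
resource "tfe_project" "networking" {
  organization = "example-org"
  name         = "networking"
}

resource "tfe_variable_set" "aws_credentials" {
  name         = "aws-credentials"
  description  = "Cloud provider credentials shared across the project"
  organization = "example-org"
}

# Attach the variable set to the project; every workspace in the
# project receives these variables automatically.
resource "tfe_project_variable_set" "credentials" {
  variable_set_id = tfe_variable_set.aws_credentials.id
  project_id      = tfe_project.networking.id
}
```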
When using projects, we recommend the following:
- Automate the creation of projects, variable sets, and teams together using the TFE provider.
- Restrict the number of project administrators to maintain least privilege.
- Designate a “landing zone” project that will contain workspaces used to create all other projects, teams, and workspaces. This lets you have a variable set that includes the organization token, which the TFE provider can use to create other resources in your organization. You can also create a Sentinel policy to prevent users in other projects from accessing the organization token.
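The automation and landing-zone recommendations above can be sketched with the TFE provider as follows. It assumes the organization token is supplied through a variable set scoped to the landing-zone project; all names are hypothetical:

```hcl
provider "tfe" {
  # Organization token injected via the landing-zone project's
  # variable set (assumption), never hard-coded.
  token = var.tfe_org_token
}

# Create a project and its administering team together.
resource "tfe_project" "app_team" {
  organization = "example-org"
  name         = "app-team"
}

resource "tfe_team" "app_admins" {
  organization = "example-org"
  name         = "app-team-admins"
}

# Grant the team admin access scoped to its own project only,
# rather than organization-wide permissions.
resource "tfe_team_project_access" "app_admin" {
  access     = "admin"
  team_id    = tfe_team.app_admins.id
  project_id = tfe_project.app_team.id
}
```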
Finally, decide on the logical boundaries for your projects. Some considerations to keep in mind include:
- Provider boundaries: For smaller organizations, creating one project per cloud account may make it easier to manage access. Projects can use dynamic credentials by configuring a project variable set to avoid hard-coding long-lived static credentials.
- Least privilege: You can create teams and grant them access to projects with workspaces of similar levels of criticality. For example, a production networking workspace should be in a separate project from a development compute workspace.
- Variable set usage: Project-wide variable sets let you configure and reuse values such as default tags for cost-codes, owners, and support contacts.
- Practitioner efficiency: Consider if it makes sense for a practitioner to need to visit multiple projects to complete a deployment.
Next Steps
This article introduces some considerations to keep in mind as your organization matures its workspace and project usage. Being deliberate about how you use workspaces and projects to organize your infrastructure will ensure smoother and safer operations. Terraform Cloud provides a place to try these concepts hands-on, and you can get started for free.
The HashiCorp Well-Architected Framework provides more in-depth information on Terraform adoption.