Ansible¶
Scope¶
Ansible automation: playbook design, idempotent task authoring, role-based organization, inventory management (static and dynamic), Ansible Vault for secrets, Molecule testing, variable precedence, handlers, and AWX/Ansible Automation Platform.
Agentless automation tool for configuration management, application deployment, and infrastructure provisioning. Ansible uses SSH (or WinRM for Windows) to execute tasks defined in YAML playbooks, requiring no agent installation on managed hosts.
Checklist¶
- [Critical] Use Ansible Vault to encrypt all sensitive data (passwords, API keys, certificates) in playbooks and variable files — never commit plaintext secrets
- [Critical] Ensure all tasks are idempotent — use purpose-built modules (e.g.,
ansible.builtin.copy,ansible.builtin.service) instead ofcommand/shellwherever possible - [Critical] Use roles to organize tasks into reusable, testable units with well-defined interfaces (
defaults/main.ymlfor overridable variables,vars/main.ymlfor internal variables) - [Critical] Maintain an inventory that reflects the actual environment — use dynamic inventory plugins for cloud environments (AWS EC2, Azure, GCP) to auto-discover hosts
- [Critical] Pin collection versions in
requirements.ymlfor reproducible automation (ansible-galaxy collection install -r requirements.yml) - [Recommended] Test roles with Molecule before deploying to production (lint, converge, verify, idempotence checks)
- [Recommended] Use
ansible-lintin CI to enforce playbook best practices and coding standards - [Recommended] Structure variables by precedence:
group_vars/for environment-level,host_vars/for host-specific, roledefaults/for overridable defaults - [Recommended] Use handlers for service restarts after configuration changes — handlers run once at the end of a play even if notified multiple times
- [Recommended] Tag tasks to enable selective execution (
ansible-playbook site.yml --tags "config,deploy") - [Recommended] Use
--check(dry-run) and--diffmodes to preview changes before applying - [Optional] Deploy AWX or Ansible Automation Platform for web-based UI, RBAC, scheduling, and credential management
- [Optional] Use callback plugins or ARA for playbook run reporting and history
- [Optional] Implement custom modules in Python for organization-specific automation that existing modules do not cover
Why This Matters¶
Ansible's agentless architecture is its defining advantage — there is nothing to install, patch, or maintain on managed hosts. SSH and Python (present on virtually every Linux system) are the only requirements. This makes Ansible uniquely practical for brownfield environments, network devices, legacy systems, and air-gapped infrastructure where installing agents is impractical or prohibited. Playbooks are YAML, which is readable by developers and operations staff alike, lowering the barrier to collaboration. Ansible's module library is extraordinarily broad: from cloud provisioning (AWS, Azure, GCP, VMware, OpenStack) to OS configuration, application deployment, networking devices (Cisco IOS, Juniper, Arista), and container orchestration. Unlike Terraform which focuses on infrastructure lifecycle, Ansible excels at the configuration and deployment layers — what happens inside the OS after the VM is provisioned. The two are complementary, not competing. For organizations already using SSH, Ansible integrates into existing access control and audit infrastructure with no additional attack surface.
Core Concepts¶
Inventory¶
Static inventory:
# inventory/production.ini
[webservers]
web1.example.com
web2.example.com
[databases]
db1.example.com ansible_port=2222
[production:children]
webservers
databases
[production:vars]
ansible_user=deploy
ntp_server=time.example.com
Dynamic inventory (AWS EC2):
# inventory/aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
- us-east-1
- us-west-2
keyed_groups:
- key: tags.Environment
prefix: env
- key: instance_type
prefix: type
filters:
tag:ManagedBy: ansible
compose:
ansible_host: private_ip_address
# Verify dynamic inventory
ansible-inventory -i inventory/aws_ec2.yml --graph
Playbooks¶
# site.yml
---
- name: Configure web servers
hosts: webservers
become: true
vars:
http_port: 8080
max_clients: 200
pre_tasks:
- name: Update apt cache
ansible.builtin.apt:
update_cache: true
cache_valid_time: 3600
roles:
- common
- nginx
- app_deploy
post_tasks:
- name: Verify application health
ansible.builtin.uri:
url: "http://localhost:{{ http_port }}/health"
status_code: 200
register: health_check
retries: 5
delay: 10
until: health_check.status == 200
Roles¶
roles/nginx/
|-- defaults/main.yml # overridable default variables
|-- vars/main.yml # internal variables (higher precedence)
|-- tasks/main.yml # task definitions
|-- handlers/main.yml # handler definitions
|-- templates/nginx.conf.j2 # Jinja2 templates
|-- files/ssl/ # static files to copy
|-- meta/main.yml # role metadata and dependencies
|-- molecule/ # test scenarios
# roles/nginx/tasks/main.yml
---
- name: Install nginx
ansible.builtin.apt:
name: nginx
state: present
- name: Deploy nginx configuration
ansible.builtin.template:
src: nginx.conf.j2
dest: /etc/nginx/nginx.conf
owner: root
group: root
mode: '0644'
validate: nginx -t -c %s
notify: Restart nginx
- name: Ensure nginx is running and enabled
ansible.builtin.service:
name: nginx
state: started
enabled: true
# roles/nginx/handlers/main.yml
---
- name: Restart nginx
ansible.builtin.service:
name: nginx
state: restarted
Ansible Vault¶
# Encrypt a file
ansible-vault encrypt group_vars/production/secrets.yml
# Encrypt a string (inline in YAML)
ansible-vault encrypt_string 'SuperSecretPassword' --name 'db_password'
# Edit encrypted file
ansible-vault edit group_vars/production/secrets.yml
# Run playbook with vault password
ansible-playbook site.yml --ask-vault-pass
ansible-playbook site.yml --vault-password-file ~/.vault_pass
# Multiple vault IDs for different secrets
ansible-vault encrypt --vault-id prod@prompt secrets_prod.yml
ansible-vault encrypt --vault-id dev@prompt secrets_dev.yml
ansible-playbook site.yml --vault-id prod@prompt --vault-id dev@prompt
# group_vars/production/secrets.yml (encrypted)
db_password: !vault |
$ANSIBLE_VAULT;1.1;AES256
61626364656667686970...
api_key: !vault |
$ANSIBLE_VAULT;1.1;AES256
71828394051627384950...
Collections¶
# requirements.yml
---
collections:
- name: amazon.aws
version: ">=7.0.0,<8.0.0"
- name: azure.azcollection
version: "2.3.0"
- name: community.vmware
version: "4.0.0"
- name: ansible.posix
version: "1.5.0"
roles:
- name: geerlingguy.docker
version: "6.1.0"
# Install collections and roles
ansible-galaxy collection install -r requirements.yml
ansible-galaxy role install -r requirements.yml
# Install from private automation hub
ansible-galaxy collection install myorg.internal_collection \
--server https://automationhub.example.com/
Key collections:
- amazon.aws — EC2, S3, RDS, EKS, IAM, CloudFormation
- azure.azcollection — VMs, AKS, Storage, networking
- google.cloud — GCE, GKE, Cloud SQL, networking
- community.vmware — vSphere VMs, networking, storage
- openstack.cloud — Nova, Neutron, Cinder, Keystone
- ansible.netcommon — base networking (cli_command, cli_config)
- cisco.ios, junipernetworks.junos, arista.eos — vendor-specific networking
Jinja2 Templating¶
{# templates/nginx.conf.j2 #}
worker_processes {{ ansible_processor_vcpus }};
events {
worker_connections {{ max_clients | default(1024) }};
}
http {
{% for server in virtual_hosts %}
server {
listen {{ http_port }};
server_name {{ server.name }};
{% if server.ssl is defined and server.ssl %}
listen 443 ssl;
ssl_certificate /etc/ssl/{{ server.name }}.crt;
ssl_certificate_key /etc/ssl/{{ server.name }}.key;
{% endif %}
location / {
proxy_pass http://{{ server.upstream }};
}
}
{% endfor %}
}
Idempotency¶
Modules are idempotent by default — running a playbook twice produces no changes on the second run. The command and shell modules are exceptions; guard them with creates or removes:
# BAD — runs every time
- name: Initialize database
ansible.builtin.command: /opt/app/init-db.sh
# GOOD — only runs if the file does not exist
- name: Initialize database
ansible.builtin.command: /opt/app/init-db.sh
args:
creates: /opt/app/.db_initialized
# GOOD — use a purpose-built module instead
- name: Ensure database exists
community.postgresql.postgresql_db:
name: myapp
state: present
Molecule (Testing Framework)¶
# roles/nginx/molecule/default/molecule.yml
---
dependency:
name: galaxy
driver:
name: docker
platforms:
- name: instance
image: ubuntu:22.04
pre_build_image: true
provisioner:
name: ansible
verifier:
name: ansible # or testinfra
# roles/nginx/molecule/default/verify.yml
---
- name: Verify nginx installation
hosts: all
gather_facts: false
tasks:
- name: Check nginx is installed
ansible.builtin.package_facts:
manager: auto
- name: Assert nginx package is present
ansible.builtin.assert:
that: "'nginx' in ansible_facts.packages"
- name: Check nginx is listening on port 80
ansible.builtin.wait_for:
port: 80
timeout: 5
# Run full test sequence
molecule test
# Run individual phases
molecule create # create test instance
molecule converge # run the role
molecule verify # run verification tests
molecule idempotence # run again, expect no changes
molecule destroy # tear down
AWX / Ansible Automation Platform¶
AWX (upstream open-source) and Ansible Automation Platform (Red Hat commercial) provide:
- Web UI: Visual playbook management, job history, real-time output
- RBAC: Role-based access control per project, inventory, credential, and job template
- Credential management: Store SSH keys, cloud credentials, vault passwords securely
- Scheduling: Cron-like scheduling of playbook runs
- Workflow templates: Chain multiple job templates with conditional logic (on success/failure/always)
- API: REST API for integration with ticketing systems, ChatOps, self-service portals
- Execution environments: Container images with specific Ansible versions and collections
Dynamic Inventory¶
# AWS EC2
ansible-inventory -i inventory/aws_ec2.yml --list
# Azure
# inventory/azure_rm.yml
plugin: azure.azcollection.azure_rm
auth_source: auto
include_vm_resource_groups:
- myapp-prod-rg
keyed_groups:
- key: tags.Role
prefix: role
# GCP
# inventory/gcp_compute.yml
plugin: google.cloud.gcp_compute
projects:
- my-gcp-project
zones:
- us-central1-a
filters:
- labels.managed_by = ansible
Connection Plugins¶
# SSH (default for Linux)
[linux_servers]
web1.example.com
# WinRM (Windows)
[windows_servers]
win1.example.com
[windows_servers:vars]
ansible_connection=winrm
ansible_winrm_transport=ntlm
ansible_port=5986
ansible_winrm_server_cert_validation=ignore
# Local (run on Ansible controller)
- name: Run locally
ansible.builtin.command: echo "local task"
delegate_to: localhost
connection: local
# Network devices (network_cli)
[routers]
router1.example.com
[routers:vars]
ansible_connection=ansible.netcommon.network_cli
ansible_network_os=cisco.ios.ios
Ansible for Networking¶
- name: Configure Cisco IOS router
hosts: routers
gather_facts: false
tasks:
- name: Configure interface
cisco.ios.ios_l3_interfaces:
config:
- name: GigabitEthernet0/1
ipv4:
- address: 192.168.1.1/24
state: merged
- name: Configure OSPF
cisco.ios.ios_ospfv2:
config:
processes:
- process_id: 1
router_id: 1.1.1.1
areas:
- area_id: "0"
ranges:
- address: 192.168.0.0
netmask: 255.255.0.0
- name: Save configuration
cisco.ios.ios_config:
save_when: modified
Ansible vs Terraform¶
| Aspect | Ansible | Terraform |
|---|---|---|
| Primary use | Configuration management, app deployment | Infrastructure provisioning |
| Approach | Procedural (ordered task list) | Declarative (desired state) |
| State | Stateless (no state file) | Stateful (state file tracks resources) |
| Agent | Agentless (SSH/WinRM) | Agentless (API calls) |
| Strengths | OS config, app deploy, networking, runbooks | Cloud resource lifecycle management |
| Idempotency | Module-dependent (most are idempotent) | Built-in (plan/apply model) |
| Scope | Very broad (cloud, OS, network, apps) | Infrastructure-focused |
| Complementary? | Yes — configure what Terraform provisions | Yes — provision what Ansible configures |
Typical combined workflow: Terraform creates VMs, VPCs, load balancers. Ansible configures the OS, deploys applications, manages users, and handles ongoing configuration drift on those VMs.
Common Decisions (ADR Triggers)¶
- Ansible vs Terraform vs both: Ansible excels at OS-level configuration and application deployment. Terraform excels at cloud infrastructure lifecycle. Most organizations benefit from both. Document the boundary: what Terraform manages vs what Ansible manages.
- Static vs dynamic inventory: Static inventories are simple but drift from reality. Dynamic inventory plugins auto-discover hosts but require cloud API access and proper tagging. Choose based on environment volatility.
- AWX/AAP vs CLI-only: AWX adds RBAC, scheduling, audit trails, and self-service. CLI-only is simpler and sufficient for small teams. Evaluate based on team size, compliance requirements, and self-service needs.
- Role sourcing strategy: Decide between writing roles from scratch, using Ansible Galaxy community roles, or maintaining a private automation hub. Community roles save time but may not meet security/compliance standards.
- Vault strategy: Single vault password for simplicity or multiple vault IDs per environment for security isolation. Consider who needs access to which secrets and how passwords are distributed.
- Testing approach: Molecule provides comprehensive role testing but adds CI time. Decide which roles require full Molecule testing (critical infrastructure roles) versus lighter-weight linting only.
Reference Architectures¶
Full-Stack Application Deployment¶
Playbook: site.yml
|-- Play 1: Common (all hosts)
| |-- Role: common (NTP, SSH hardening, users, packages)
| |-- Role: monitoring_agent (Prometheus node_exporter)
|
|-- Play 2: Web Tier (webservers)
| |-- Role: nginx (reverse proxy, TLS termination)
| |-- Role: app_deploy (git clone, pip install, systemd service)
| |-- Handler: Restart nginx, Restart app
|
|-- Play 3: Database Tier (databases)
| |-- Role: postgresql (install, pg_hba.conf, replication)
| |-- Role: backup (pg_dump cron, S3 upload)
|
|-- Serial deployment: web tier uses serial: 1 for rolling updates
|-- Vault: database passwords, TLS keys encrypted
Network Automation¶
Inventory (dynamic + static)
|-- Routers (Cisco IOS) — ansible_network_os: cisco.ios.ios
|-- Switches (Arista EOS) — ansible_network_os: arista.eos.eos
|-- Firewalls (Juniper) — ansible_network_os: junipernetworks.junos.junos
|
Playbooks:
|-- network_baseline.yml
| |-- NTP, syslog, SNMP, AAA configuration
| |-- Banner, password policies
|
|-- network_changes.yml
| |-- VLAN provisioning
| |-- ACL updates
| |-- OSPF/BGP neighbor configuration
| |-- Pre-change: backup running config
| |-- Post-change: verify connectivity, save config
|
AWX Workflow:
Backup --> Apply Changes --> Verify --> (on failure) Rollback
Hybrid Cloud Configuration (Terraform + Ansible)¶
Phase 1: Terraform provisions infrastructure
|-- VPC, subnets, security groups
|-- EC2 instances, RDS, ALB
|-- Outputs: instance IPs, RDS endpoint
|-- Generates Ansible dynamic inventory tags
Phase 2: Ansible configures instances
|-- Dynamic inventory discovers tagged EC2 instances
|-- Role: base_os (hardening, patching, monitoring agent)
|-- Role: app_server (runtime, application code, environment variables)
|-- Role: log_shipping (Fluentd/Filebeat to central logging)
|-- Variables: RDS endpoint from Terraform output or SSM Parameter Store
Phase 3: Ansible handles day-2 operations
|-- Scheduled: compliance checks, patch reporting
|-- On-demand: application deployments, config updates
|-- Runbooks: incident response, failover procedures
See Also¶
general/iac-planning.md-- infrastructure as code planning patternsproviders/hashicorp/terraform.md-- Terraform for infrastructure provisioning (complementary to Ansible)providers/hashicorp/packer.md-- Packer with Ansible provisioner for image builds