Ansible, Check_mode, and Async plays.

Do you have a task in your Ansible playbook that takes a long time to run, say a very large package installation or a download across a slow network link? If it runs long enough, the connection to the managed host can time out, and Ansible will treat the task as failed and stop at that point in the playbook.

Async and Poll in a nutshell

The standard way to handle long-running tasks is to use the Ansible async: and poll: keywords. The documentation isn’t really clear on this, so here’s how I think of their actions:

  • The async: B keyword says “Run this command in the background and give it at most B seconds to finish….”
  • The poll: P keyword means “…and check the status every P seconds.”

Thus, a command like this:

- name: Download a big file
  shell: "wget -O /tmp/my_big_file.iso https://example.com/downloads/a_really_big_file.iso"
  async: 120
  poll: 5

(Yes, I know there are more Ansible-friendly ways to download a file from a remote URL, but go along with the example…)
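For the curious, here is a minimal sketch of one of those more Ansible-friendly ways, using the get_url module with the same URL and destination as the example. I have not tuned it for slow links, so treat it as an illustration rather than a drop-in replacement:

```yaml
- name: Download a big file (get_url variant)
  get_url:
    # Same source and destination as the wget example
    url: https://example.com/downloads/a_really_big_file.iso
    dest: /tmp/my_big_file.iso
```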

On a good day, when download speeds are high, the file downloads quickly and Ansible continues on. On days when the Internet connection is slow, Ansible kicks off the wget command and checks every 5 seconds whether it is done. When the command completes, the playbook goes on. If the wget fails (network error, disk write error, etc.), or the command takes longer than 120 seconds, Ansible fails this step as expected.

That’s all well and good. What’s the catch?

Check mode

One feature I love about Ansible is the --check mode option. A well-written Ansible module will run in --check mode and do everything it can to validate that it will execute on the managed systems, without making any changes to them. This is key when you’re working on a playbook to maintain production systems.

Say you know that a configuration file needs a correction applied. You take the playbook you used to build the system originally, check it out of your source control to a new branch and modify the playbook.

But a cautious developer will check that the playbook runs as expected and doesn’t do anything unexpected (reboot the server, stop services, fail midway through, etc.). To do this, run your playbook with the --check flag. The output looks identical to a normal run, but this time the lines marked changed: do not record actual changes; they tell you that the task would make a change.

Some tasks are inherently unsafe for Ansible to run generically in check mode. Tasks such as shell:, command:, and other more “raw” commands can do anything, so Ansible has no way to predict their effects. Because Ansible tries to make sure that a task run in check mode makes no changes whatsoever, it normally skips these tasks rather than executing them.
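One partial workaround, as I understand the module documentation, is the creates: argument: it names a file that the command is expected to produce, which gives Ansible something to check instead of running the command blindly. A minimal sketch, reusing the download example:

```yaml
- name: Download a big file unless it already exists
  shell:
    cmd: "wget -O /tmp/my_big_file.iso https://example.com/downloads/a_really_big_file.iso"
    # If this file is already present, Ansible skips the command.
    # Caveat: wget -O creates the file even when a download is
    # interrupted, so a partial file would also cause a skip.
    creates: /tmp/my_big_file.iso
```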

The check mode execution is handy when combined with the --diff command line flag, but that’s a story for another day.

Async and Check mode

So, using these together makes sense. I want to download a large file over an occasionally slow link but I do not want the download to run when I’m in check mode. You’d think something like the example code from above would be the correct combination:

- name: Download a big file
  shell: "wget -O /tmp/my_big_file.iso https://example.com/downloads/a_really_big_file.iso"
  async: 120
  poll: 5

But when you run it with the --check flag, you get this error:

TASK [Download a big file] ***************************
task path: ./playbook.yml:71
fatal: [localhost]: FAILED! => {
    "changed": false,
    "msg": "check mode and async cannot be used on same task."
}

What to do?

I have to admit, I didn’t think up this workaround – a Mr. Alex Dzyoba documented it on his blog and I came across it here:

https://alex.dzyoba.com/blog/ansible-check-async/

What he documents is using the ansible_check_mode variable to set the async: value to 0 if we’re in check mode, or 120 if we are not. Using our task from above, we would do this:

- name: Download a big file
  shell: "wget -O /tmp/my_big_file.iso https://example.com/downloads/a_really_big_file.iso"
  # 0 (run synchronously) in check mode, 120 seconds otherwise
  async: "{{ ansible_check_mode | ternary(0, 120) }}"
  poll: 5

What ends up happening is based on the ansible_check_mode variable:

  • If we are running in check mode (i.e. ansible_check_mode is true), then the value passed to async: is zero (the first value in the ternary() call). An async: value of 0 is the default and means “run synchronously”, so Ansible doesn’t complain about the conflict.
  • When we are running in normal mode (i.e. ansible_check_mode is false), then the value passed to async: is the second value in the ternary() call, and the task is allowed to run for up to 120 seconds.
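A small variation on the same idea, offered as a sketch: keep the timeout in a variable so the number lives in one place. The variable name download_timeout is my own placeholder, not something defined by Ansible or taken from Alex’s post.

```yaml
- name: Download a big file
  vars:
    # Hypothetical variable: maximum run time in seconds for a real run
    download_timeout: 120
  shell: "wget -O /tmp/my_big_file.iso https://example.com/downloads/a_really_big_file.iso"
  # 0 (synchronous) in check mode, download_timeout otherwise
  async: "{{ ansible_check_mode | ternary(0, download_timeout) }}"
  poll: 5
```

The variable could just as well live in group_vars or be passed on the command line; task-level vars simply keep the sketch self-contained.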

Why Ansible doesn’t handle this automatically is beyond me, but I’m glad to have come across Alex Dzyoba’s website and this method.
