Using Puppet to perform “yum update”

Puppet is a great tool for configuration management.  Chef, slightly younger than Puppet, has matured to be a very good option when choosing a configuration management tool.  In this particular case, the focus is on Puppet.

The Puppet community frowns heavily on using Puppet as the tool for global package management.  At least, that’s the perception.  I have personally witnessed comments like “puppet is not the right tool for this” and “there are better tools out there for it” (a good example is found in the comments on Stack Overflow: http://stackoverflow.com/questions/18087104/yum-updatesand-work-arounds-using-puppet), but I believe those who respond that way just don’t get it.  They don’t see the bigger picture.

Puppet?

Puppet is a configuration management tool for managing your system configurations.  Why is it that those configurations stop at the individual package and its related configuration files?  The configuration of the host is arguably the entire configuration of the host.  Certainly, one can go overboard and maintain the specifics of every one of the 800+ installed packages via custom Puppet modules, yet we all must admit that would be ludicrous.  However, one of the primary goals of a configuration tool like Puppet is to create completely reproducible systems for disaster recovery, and there has to be a more efficient way than writing a module for every package on the system.

Unix/Linux is built upon the foundation of layering: break a large job down into its simplest parts, build tools to handle the individual pieces, and then layer on another tool to manage the individual tools, and so on.

Managing updates via yum (or apt) for your systems via Puppet is not a bad idea, despite what others seem to say.  For it to be successful, it does require you to understand the risks you are taking, what is acceptable, and what your organization’s limits are.

Maintain Your Own Repositories

This cannot be highlighted enough.  Mirror all of the repositories that are used to build systems at your organization, and configure your clients to use your mirrors only.  The reason for this is simple: you want control of your systems, and this includes the software that resides on them.  By mirroring those repositories, you control when they get updated and you have direct access to a ‘frozen’ version of them.  “Frozen?”  Yes! Repositories are, in some cases, made up of thousands of packages, all maintained by different people, all being updated at different times.  In order to maintain control of your systems, you have to control the updates.  You don’t want your repository to be ever so slightly different with each system build, as that goes against what you are trying to accomplish with Puppet in the first place: repeatable processes.

The mirrors should be refreshed only when you decide.  You maintain control, so you sync the repositories to your local mirror on your schedule.  To support this, I built a simple bash script to perform a “reposync” (from the yum-utils package) of all of the appropriate repositories followed by a “createrepo” (from the createrepo package):

...
# Let's start in the top level of our repository tree.  Bail out if the
# directory is missing so that nothing is synced (or deleted) in the wrong place.
cd /var/www/html/repo || exit 1

# Grab the EPEL tree
reposync --arch=x86_64 --newest-only --download_path=EPEL/6/ --repoid=epel --norepopath --delete

# Make sure our Red Hat repo is up to date, both for the Server and the optional
# trees.  There's no need to keep syncing the OS repo as that never changes.
reposync --arch=x86_64 --newest-only --download_path=RHEL/6/Server-Optional/ --repoid=rhel-6-server-optional-rpms --norepopath --delete
reposync --arch=x86_64 --newest-only --download_path=RHEL/6/Server/ --repoid=rhel-6-server-rpms --norepopath --delete

...

# Rebuild the repository data for EPEL.
cd /var/www/html/repo/EPEL/6/ || exit 1
createrepo -d .

# Rebuild the repository data for RHEL.
cd /var/www/html/repo/RHEL/6/Server-Optional/ || exit 1
createrepo -d .
cd /var/www/html/repo/RHEL/6/Server/ || exit 1
createrepo -d .
...

Using your own mirrors will also drastically decrease the amount of time it takes to build or update a machine, because you are pulling all of the packages over your LAN (high speed) rather than the WAN (low speed).
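On the client side, “use your mirrors only” comes down to what is in /etc/yum.repos.d.  The following is a minimal sketch of a helper that prints a repo stanza pointing at an internal mirror.  The function name, the repo id, and the host name mirror.example.com are all hypothetical, and the URL path assumes the web server’s document root is /var/www/html so that the tree built above is served under /repo.

```shell
#!/bin/sh
# Print a yum repo stanza that points a client at an internal mirror.
# Arguments: repo id, human-readable name, base URL of the mirrored tree.
# (Hypothetical helper for illustration; not part of yum or yum-utils.)
render_repo_stanza() {
    repo_id=$1
    repo_name=$2
    base_url=$3
    cat <<EOF
[$repo_id]
name=$repo_name
baseurl=$base_url
enabled=1
gpgcheck=1
EOF
}

# Example (hypothetical host name; writing the file needs root):
# render_repo_stanza local-epel "Local EPEL 6 mirror" \
#     "http://mirror.example.com/repo/EPEL/6/" > /etc/yum.repos.d/local-epel.repo
```

With gpgcheck=1 the client still needs the upstream signing keys imported, and any repo files that point at the upstream servers would have to be disabled or removed for the mirror to be the only source.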

Do not Automatically Reboot

This, too, cannot be highlighted enough.  Rebooting a machine must be done with someone at the helm, with someone watching and waiting to react if something goes horribly wrong.  The last thing you want is all of your servers to reboot after updates where none of them reboot properly, and you don’t discover it until 8am the following morning.  If that is a risk you are willing to take, please be sure to also have an updated version of your resume handy.
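Not rebooting automatically does not mean staying blind to which machines still need a reboot.  As a rough sketch, the functions below compare the running kernel with the newest installed kernel package and only report the result, leaving the reboot itself to a human.  The function names are hypothetical, and the parsing assumes the usual “kernel-VERSION  DATE” lines printed by “rpm -q --last kernel”; kernel flavors with a different package name are not handled.

```shell
#!/bin/sh
# Read `rpm -q --last kernel` output on stdin (newest package first) and
# print the version of the newest installed kernel, without the "kernel-" prefix.
newest_kernel() {
    awk 'NR == 1 { sub(/^kernel-/, "", $1); print $1; exit }'
}

# Succeed (exit status 0) when the running kernel differs from the newest
# installed one, i.e. a reboot is still outstanding.
reboot_pending() {
    running=$1
    newest=$2
    [ "$running" != "$newest" ]
}

# Report only; this never reboots anything.
report_reboot_status() {
    running=$(uname -r)
    newest=$(rpm -q --last kernel | newest_kernel)
    if reboot_pending "$running" "$newest"; then
        echo "reboot pending: running $running, installed $newest"
    else
        echo "no reboot pending: running $running"
    fi
}
```

Calling report_reboot_status after the updates have run gives whoever is at the helm a list to work from.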

Build a SANE schedule

No one wants to update their systems constantly, and taking downtime for each update is not feasible for a successful business.  If you do not have formal maintenance windows, create informal ones.  Pick a point in time (once a month, every quarter, etc.) that is workable for your organization and team (if you are a team of one, balance it against what you are willing to manage).  This is your “go-live” date.

Now work backwards to build your T-minus dates:

  • How long do you need to test the updates (do you even test them)?  Let’s say 3 days of testing, including just running a machine with the updates on it to make sure it doesn’t blow up.
  • How long do you need to update your mirrored versions of the repositories?  Between half a day and a full day is probably about right, including some minor testing.

So, you are looking at about 4 days from start to finish.  This may be excessive for your organization, or it may need to be longer.  However you approach it, keep a sprinkle of sanity in the mix.
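The arithmetic of working backwards can be sketched in a few lines of shell.  This assumes GNU date (for its -d option) and uses the estimates from above: one day for syncing the mirrors followed by three days of testing.  The function names are hypothetical.

```shell
#!/bin/sh
# Print the date N days before a go-live date given as YYYY-MM-DD.
# Relies on GNU date's -d option; -u keeps daylight-saving changes out of it.
days_before() {
    go_live=$1
    days=$2
    date -u -d "$go_live $days days ago" +%Y-%m-%d
}

# Print the T-minus dates for a go-live date: mirrors are synced 4 days
# out, testing begins 3 days out.
print_schedule() {
    go_live=$1
    echo "sync mirrors:  $(days_before "$go_live" 4)"
    echo "begin testing: $(days_before "$go_live" 3)"
    echo "go-live:       $go_live"
}

# Example:
# print_schedule 2024-03-06
```

Adjust the two offsets to whatever testing and sync times fit your organization.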

Implement in Puppet

The implementation is quite simple.  Create a basic class module, let’s call it yum::update, and set the criteria accordingly.  In the example below, the updates will run on the 6th of every month between 11:00am and 11:59am.

class yum::update
{
  # Run a yum update on the 6th of every month between 11:00am and 11:59am.
  # Notes: A longer timeout is required for this particular run.
  #        The time check can be overridden if a specific file exists in /var/tmp.
  #        The shell provider is set because the command and the check rely on
  #        shell syntax (";", "&&", "||", and backticks).
  exec
  {
    "monthly-yum-update":
      command  => "yum clean all; yum -q -y update --exclude cvs; rm -rf /var/tmp/forceyum",
      path     => "/usr/bin:/bin:/usr/sbin:/sbin",
      provider => shell,
      timeout  => 1800,
      onlyif   => "/usr/bin/test `/bin/date +%d` -eq 06 && test `/bin/date +%H` -eq 11 || test -e /var/tmp/forceyum",
  }
}

It probably makes more sense to create a script which does all of the date/time comparison and override functionality for you, manage that script through puppet, and reference that script here in the “command” line above.  Consider that the next step of evolution.  However you choose to proceed is up to you.
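A minimal sketch of what such a script’s core might look like follows.  The function name and its argument order are hypothetical; the logic mirrors the check in the class above (matching day and hour, or the presence of the override file).  The optional fourth and fifth arguments exist only so the check can be exercised without waiting for the 6th of the month.

```shell
#!/bin/sh
# Decide whether the scheduled update should run now.
# Usage: update_due DAY HOUR FLAG_FILE [NOW_DAY NOW_HOUR]
# Exit status 0 means "run the update"; non-zero means "skip".
update_due() {
    want_day=$1
    want_hour=$2
    flag_file=$3
    # Default to the current day of month and hour when not supplied.
    now_day=${4:-$(date +%d)}
    now_hour=${5:-$(date +%H)}

    # The override file forces a run regardless of the clock.
    if [ -e "$flag_file" ]; then
        return 0
    fi

    # Numeric comparison, so "06" and "6" are treated alike.
    [ "$now_day" -eq "$want_day" ] && [ "$now_hour" -eq "$want_hour" ]
}

# Example: same schedule and override file as the class above.
# update_due 6 11 /var/tmp/forceyum && echo "update is due"
```

A script built around this function could be deployed by Puppet as a file resource and then referenced from the exec, which keeps the schedule in one readable place instead of a long one-line shell expression.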

Now, you can include that class in your manifests/nodes.pp per host, globally, or however you wish, as shown below.

node 'web-dev.my.domain.com' inherits default
{
  include postfix
  include mysql
  include syslog::internal
  include openldap::client
  include names
  include pam::client
  include wordpress::dev
  include ssh
  include httpd
  include db::backup
  include yum::update
}

Control, Efficiency, and Security

Updating a system is important to maintaining its security, but it must be carefully balanced against your system’s availability and integrity for your end users and customers.  Automation is king here.  Understand, though, that it also makes your Puppet server that much more critical, and a mistake there can have a much wider impact.  Hence, I stress the testing period and having someone on hand for the final phases.  Once the level of comfort with the process grows, you may want to implement more automation based upon your individual environment (e.g., rebooting development machines automatically might be fine at your organization).