Deployment of Scripts and Static Content with Git, Rsync, and Simple Unix Tools

Web sites and web applications built on a mix of PHP and static content are commonplace in small non-technical organizations and non-profits, but many struggle with maintenance, and indeed with the whole process of development, given their limited resources. The primary focus is on features and functionality, and automation of deployment is frequently an afterthought, where it happens at all. You'll see people copying files into place manually, or cloning a Git repository as the webroot (and then forgetting to restrict access to /.git), and so forth. Mistakes and security holes are the outcome. Thus when working with these smaller organizations, and especially non-profits, it is a good idea to provide software and devops solutions that are simple, well-documented, and easily understood. It is worth sacrificing features and the latest and greatest technology in favor of these goals, as the system will most likely be used and maintained by junior or mid-level developers in the future. Complex setups break or are abandoned because they are hard to understand, and the cost of developers who can maintain or fix such a system is prohibitive.

What I'll outline here is a fairly simple devops setup for deployment of a PHP or static web application to an established server using a mix of Git, rsync, and a few simple and standard Unix tools. I've used variants of this approach with some of the non-profits I've worked with, and it has worked well for groups that rely upon junior developers to maintain and extend their websites. Anyone with a basic understanding of Git and the Unix command line can quickly pick up and understand this system, given clear documentation.

Add a Deploy Script to the Repository

Create a deploy/deploy.sh bash script in the repository. The purpose of this deployment script is to (a) copy the environment-specific file or files into place, and then (b) run rsync to update the code already present in a specified destination directory. The script accepts an environment name and a destination directory as arguments, so it will run as follows, for example:

deploy/deploy.sh production /var/www/html

The code in the repository must be set up such that all of the environment-specific configuration files are named by environment. So if the web application code expects to find config/main.json, then create config/main.local.json, config/main.production.json, and so forth. Since rsync is being used, there must also be a configuration file to specify exclusions in both source and destination. This prevents the copying of unwanted files, or the deletion of files not present in the repository, such as user uploads and the like. The following is an example of the script and rsync filter file:

deploy/deploy.sh

#!/bin/bash
#
# Deploy the codebase.
#
# This must be run by the deploy user, correctly set up with limited sudo
# access, and keys for GitHub access. It takes the following actions:
#
# - Copy environment-specific files into place.
# - Rsync local webroot with repository webroot.
#

set -o nounset
set -o errexit

DIR="$( cd "$( dirname "$0" )" && pwd)"
REPO="${DIR}/.."

function usage () {
  cat <<EOF
Deploy the current checked out site code to the specified destination.

${0} <local|production> <destination>
e.g. ${0} production /var/www/html
EOF
  exit 1
}

# --------------------------------------------------------------------------
# Check arguments.
# --------------------------------------------------------------------------

if [ "${#}" -ne "2" ]; then
  usage
fi

ENV=""
WEBROOT="${2}"
# Important for exclusions; prevent uploaded files, etc, from being deleted,
# and prevent unwanted files from being copied to the destination.
FILTER_FILE="${DIR}/rsync-filter"

# Constrain the environment argument to allowable values.
case "${1}" in
  local|production)
    ENV="${1}"
    ;;

  *)
    usage
    ;;
esac

# --------------------------------------------------------------------------
# Set up configuration for the environment.
# --------------------------------------------------------------------------

# Sort out configuration by environment. This is application-specific,
# and the following is an example.
cp -f \
  "${REPO}/config/main.${ENV}.json" \
  "${REPO}/config/main.json"

# --------------------------------------------------------------------------
# Copy over the deployment.
# --------------------------------------------------------------------------

# Make sure that dotfiles are picked up.
GLOBIGNORE=.:..

# Files should be owned by the www-data user, but we need them owned by the
# deploy user for now.
#
# It is expected that the deploy user running this has very limited sudo
# permissions allowing it to run this command.
sudo chown -R deploy:www-data "${WEBROOT}"

# This will delete unprotected destination files that are not in the source
# directory and are not given a protect rule in the filter file. Beware!
#
# Destination folders may have differences such as updated WordPress files,
# user uploads, content not under version control, etc.
#
# Note that the source directory must end with "/" - otherwise rsync copies over
# the directory not its contents.
rsync \
  --archive \
  --one-file-system \
  --omit-dir-times \
  --quiet \
  --chmod=ug=rwX \
  --delete \
  --filter="merge ${FILTER_FILE}" \
  "${REPO}/" \
  "${WEBROOT}"

# Files should be owned by the www-data user.
#
# It is expected that the deploy user running this has very limited sudo
# permissions allowing it to run this.
sudo chown -R www-data:www-data "${WEBROOT}"

deploy/rsync-filter

# Don't copy over Git-related files.
exclude /.git
exclude /.gitignore

# Don't copy over deployment code.
exclude /deploy

# Preserve files that don't exist in the repository, but should not be
# deleted by deployment.
protect /cache/**
protect /user-uploads/**

Add a Deploy User to the Server

To allow developers to trigger a deployment, the server is provisioned with a deploy user. Developer public keys are added to /home/deploy/.ssh/authorized_keys to allow access, and to control access on a case by case basis. The deploy user is given an SSH key pair allowing it read-only access to the Git repository for the web application hosted on the server. The deploy user is locked down with few permissions, but provided with very limited sudo powers that allow it to update ownership and permissions on the webroot, and thus update the content there.

The deploy user is provided with a thin wrapper script that clones the Git repository, checks out a specified branch, and runs the deployment script provided in that repository. In this way the bulk of the deployment automation is kept under version control.

The following is a provisioning script to create the deploy user and its tools on an Ubuntu 14.04 server. It assumes the use of Apache or Nginx and the standard location for the webroot at /var/www/html. The only things it doesn't accomplish are (a) adding a suitable key pair to /home/deploy/.ssh so that the deploy user can access the Git repository, and (b) appending developer public keys to /home/deploy/.ssh/authorized_keys so that they can log in.

#!/bin/bash
#
# Intended for use on an Ubuntu 14.04 server.
#
# Add the deploy user and necessary permissions.
#
# After this script has run it is necessary to set the deploy user's key pair
# to one that will allow access to the web application Git repository.
#
# /home/deploy/.ssh/id_rsa
# /home/deploy/.ssh/id_rsa.pub
#
# It is also necessary to append developer public keys to:
#
# /home/deploy/.ssh/authorized_keys
#
# to enable them to log in as the deploy user and run deployments.
#

set -o nounset
set -o errexit

NAME="deploy"
DEPLOY_SCRIPT="/home/${NAME}/deploy.sh"

# ------------------------------------------------------------------------
# The deploy user will need git and rsync.
# ------------------------------------------------------------------------

apt-get install -y git rsync

# ------------------------------------------------------------------------
# Add the deploy user if not already present.
# ------------------------------------------------------------------------

id -u ${NAME} > /dev/null 2>&1 || {
  useradd -m -d /home/${NAME} -s /bin/bash ${NAME}
  mkdir -p /home/${NAME}/.ssh
  chmod 700 /home/${NAME}/.ssh
  chown -R ${NAME}:${NAME} /home/${NAME}/.ssh

  # Add the deploy user to the www-data group.
  usermod -a -G www-data ${NAME}
}

# ------------------------------------------------------------------------
# Deploy script stub.
# ------------------------------------------------------------------------

cat > "${DEPLOY_SCRIPT}" <<EOF
#!/bin/bash
#
# Update the local repo and run the deploy/deploy.sh script there.
# Passes through the <env> parameter to that script.
#
# Usage: deploy.sh <env> <branch>
#

set -o nounset
set -o errexit

DIR="\$(cd "\$( dirname "\$0" )" && pwd)"
HOME="\$(echo ~)"

function usage () {
  echo "Usage: \${0} <env> <branch>"
  exit 1
}

# Check arguments.
if [ "\${#}" -ne "2" ]; then
  usage
fi

ENV="\${1}"
REPO_REF="\${2}"
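REPO_REMOTE="origin"
# Name of the GitHub repository to clone. This is a placeholder; replace
# it with the real repository name.
REPO="app"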

if [ ! -d "\${HOME}/app" ]; then
  git clone "git@github.com:example/\${REPO}.git" "\${HOME}/app"
fi

cd "\${HOME}/app"
git fetch "\${REPO_REMOTE}"
git reset --hard "\${REPO_REMOTE}/\${REPO_REF}"

"\${HOME}/app/deploy/deploy.sh" "\${ENV}" /var/www/html
EOF

chown ${NAME}:${NAME} "${DEPLOY_SCRIPT}"
chmod u+x "${DEPLOY_SCRIPT}"

# ------------------------------------------------------------------------
# Add limited sudoer rights to the deploy user.
# ------------------------------------------------------------------------

cat > /etc/sudoers.d/deploy <<EOF
# Rights to set ownership before and after deployment.
Cmnd_Alias DC_1 = /bin/chown -R ${NAME}\:www-data /var/www/html
Cmnd_Alias DC_2 = /bin/chown -R www-data\:www-data /var/www/html
${NAME} ALL=(ALL) NOPASSWD: DC_1, DC_2
EOF

# Files in /etc/sudoers.d should be mode 0440.
chmod 0440 /etc/sudoers.d/deploy
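
The manual steps that the provisioning script leaves out are short. As a sketch of the second one, the hypothetical helper below (add_developer_key is an illustrative name, not something from the scripts above) appends a developer's public key to an authorized_keys file only if it is not already there. The paths in the usage comment assume the deploy user created above.

```shell
#!/bin/bash
#
# Append a developer public key to an authorized_keys file, once.
# Hypothetical helper; run as root or as the deploy user.

# Usage: add_developer_key <public-key-file> <authorized-keys-file>
add_developer_key () {
  local key_file="$1" auth_file="$2"
  local key
  key="$(cat "${key_file}")"

  # Create the file if needed. With sshd's default StrictModes setting,
  # the file must not be writable by group or others.
  touch "${auth_file}"
  chmod 600 "${auth_file}"

  # -x matches whole lines, -F treats the key as a fixed string.
  if ! grep -qxF -- "${key}" "${auth_file}"; then
    printf '%s\n' "${key}" >> "${auth_file}"
  fi
}

# Example (paths are assumptions):
# add_developer_key /tmp/alice.pub /home/deploy/.ssh/authorized_keys
# chown deploy:deploy /home/deploy/.ssh/authorized_keys
```

Running it twice with the same key leaves a single entry, so it is safe to re-run when onboarding developers. Removing a developer's access is then a matter of deleting their line from the file.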

Deployment Workflow

Given this setup, a developer will take the following steps to deploy an update to the application:

1. Write, test, and approve new features.
2. Version the web application.
3. Create a branch for the new version and push it to the repository.
4. SSH to the deploy user on the web application server.
5. Deploy the version branch using a command like ./deploy.sh production release-1.0.0.
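
From the developer's machine, the last three steps might look like the sketch below. It is illustrative only: deploy_release and run are hypothetical helpers, the release-<version> branch naming simply follows the example command above, and the host name is a placeholder. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
#
# Sketch of the developer side of a deployment. Hypothetical helpers;
# the host name passed in is an assumption.

# Run a command, or just print it when DRY_RUN=1.
run () {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: deploy_release <version> <host>
deploy_release () {
  local version="$1" host="$2"
  local branch="release-${version}"

  # Create the version branch and push it to the repository.
  run git checkout -b "${branch}"
  run git push origin "${branch}"

  # SSH in as the deploy user and run the wrapper script in its home directory.
  run ssh "deploy@${host}" "./deploy.sh production ${branch}"
}

# Example:
# DRY_RUN=1 deploy_release 1.0.0 www.example.org
```

Keeping the SSH step as a single remote command means a developer never needs an interactive shell on the server for a routine deployment.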

Limitations

This system works well for web applications that run on a single persistent physical server or virtual instance, and it can be updated and maintained even by junior developers. Once multiple servers are involved, or transient instances that come and go frequently, other deployment approaches or automation layers are needed. Most small organizations don't need more than a single persistent server for their web applications, however, and aiming higher than that is likely to cause more problems than it solves over the long term.

The rsync version varies widely across Linux distributions at the time of writing, and rsync is under active development, so some juggling of options and functionality might be needed if adapting the script examples in this post for use in distributions other than Ubuntu. In particular you may find issues with setting permissions on destination files. If so, it might be better to break that out into a separate command.
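
If --chmod misbehaves with the rsync version at hand, one option is to drop it from the rsync call and set permissions afterwards. The sketch below assumes a chmod that accepts the same symbolic mode, ug=rwX, used in the script (GNU and BSD chmod both do); fix_permissions is a hypothetical helper name. In deploy.sh it would need to run after rsync and before the final chown, while the deploy user still owns the files.

```shell
#!/bin/bash
#
# Set webroot permissions as a separate step instead of rsync --chmod.
# Hypothetical helper; must run while the caller still owns the files.

# Usage: fix_permissions <webroot>
fix_permissions () {
  local webroot="$1"

  # ug=rwX: read and write for user and group; execute only on
  # directories and on files that already have an execute bit set.
  chmod -R ug=rwX "${webroot}"
}

# Example:
# fix_permissions /var/www/html
```

Note one difference from the rsync option: this touches every file under the webroot, including protected ones such as user uploads, not only the files rsync transferred.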

Out of scope of this discussion are other important items such as backups and documentation. A simple, robust backup solution for the application server and the Git repository is vital, probably more so than any other devops automation. Similarly, everything must be clearly documented in a place where that documentation will be maintained as code changes. I recommend using Markdown files in the relevant Git repository, especially if using GitHub or similar services that render the Markdown for easy reading while browsing a repository.