AWS Auto Scaling to Follow the Daily Web Traffic Cycle
AWS offers a configurable mechanism to scale server clusters in response to slow changes in demand, such as the cyclic variation in traffic over the course of a day seen by most web services and sites. This is achieved by deploying an Auto Scaling Group to manage instances, then attaching CloudWatch alarms based on a metric such as CPUUtilization and Scaling Policies that react to the high and low alarms. When average CPU utilization falls below a low threshold, an instance is removed. When it rises above a high threshold, instances are added. This is best done using CloudFormation, so as to wrap all of the necessary definitions into a single stack. The following is an example that has benefited from experience:
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Example template with auto scaling",
  "Parameters": { ... },
  "Resources": {
    "AutoScalingGroupAlertsSNSTopic": {
      "Type": "AWS::SNS::Topic",
      "Properties": {
        "Subscription": [
          {
            "Endpoint": "asg-alerts@example.com",
            "Protocol": "email"
          }
        ]
      }
    },
    "ExampleAutoScalingGroup": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "Properties": {
        "AvailabilityZones": [ "us-east-1a", "us-east-1b", "us-east-1c" ],
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 600,
        "LoadBalancerNames": [ "example-elb" ],
        "LaunchConfigurationName": { "Ref": "LaunchConfiguration" },
        "DesiredCapacity": 15,
        "MinSize": 5,
        "MaxSize": 30,
        "NotificationConfiguration": {
          "TopicARN": { "Ref": "AutoScalingGroupAlertsSNSTopic" },
          "NotificationTypes": [
            "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
            "autoscaling:EC2_INSTANCE_TERMINATE_ERROR"
          ]
        },
        "Tags": []
      }
    },
    "LaunchConfiguration": {
      "Type": "AWS::AutoScaling::LaunchConfiguration",
      "Properties": { ... }
    },
    "ScaleUpPolicy": {
      "Type": "AWS::AutoScaling::ScalingPolicy",
      "Properties": {
        "AdjustmentType": "ChangeInCapacity",
        "AutoScalingGroupName": { "Ref": "ExampleAutoScalingGroup" },
        "EstimatedInstanceWarmup": "600",
        "PolicyType": "StepScaling",
        "StepAdjustments": [
          {
            "MetricIntervalLowerBound": "0",
            "MetricIntervalUpperBound": "10",
            "ScalingAdjustment": "1"
          },
          {
            "MetricIntervalLowerBound": "10",
            "MetricIntervalUpperBound": "20",
            "ScalingAdjustment": "2"
          },
          {
            "MetricIntervalLowerBound": "20",
            "ScalingAdjustment": "3"
          }
        ]
      }
    },
    "ScaleDownPolicy": {
      "Type": "AWS::AutoScaling::ScalingPolicy",
      "Properties": {
        "AdjustmentType": "ChangeInCapacity",
        "AutoScalingGroupName": { "Ref": "ExampleAutoScalingGroup" },
        "Cooldown": "600",
        "PolicyType": "SimpleScaling",
        "ScalingAdjustment": "-1"
      }
    },
    "CPUUtilizationHighAlarm": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "ActionsEnabled": "true",
        "AlarmDescription": "Scale up for average CPUUtilization >= 50%.",
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Statistic": "Average",
        "Period": "60",
        "EvaluationPeriods": "1",
        "Threshold": "50",
        "Unit": "Percent",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [ { "Ref": "ScaleUpPolicy" } ],
        "Dimensions": [
          {
            "Name": "AutoScalingGroupName",
            "Value": { "Ref": "ExampleAutoScalingGroup" }
          }
        ]
      }
    },
    "CPUUtilizationLowAlarm": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "ActionsEnabled": "true",
        "AlarmDescription": "Scale down for average CPUUtilization <= 30%.",
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Statistic": "Average",
        "Period": "60",
        "EvaluationPeriods": "3",
        "Threshold": "30",
        "Unit": "Percent",
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "AlarmActions": [ { "Ref": "ScaleDownPolicy" } ],
        "Dimensions": [
          {
            "Name": "AutoScalingGroupName",
            "Value": { "Ref": "ExampleAutoScalingGroup" }
          }
        ]
      }
    },
    ...
  }
}
Auto Scaling Cannot and Is Not Intended to Deal With Sudden Demand Spikes
It takes a good five minutes at minimum to bring up a new instance in EC2, and longer if the instance has heavy provisioning requirements. In that case it is a very good idea to include creation of an AMI in your deployment process, so that instances can spin up more rapidly. Still, even with a prebaked AMI, it takes five minutes to initialize an instance. Further, it may take a couple of minutes for the chain of AWS scaling processes to trigger the creation of a new instance in response to load, even with fairly sensitive settings on a Scaling Policy.
This means that auto scaling is great for adapting to slowly changing traffic levels, but it isn't the right tool for managing sudden spikes that take place on a very short time frame. For those, look to maintaining excess idle capacity, throttling, third-party caching layers such as CDNs, and the like.
Expect to Save About a Third of Fully Provisioned Costs
For a high traffic consumer web application focused mainly on one geographic region, such as US customers, using auto scaling to follow the daily cycle of traffic should save about a third of the cost of maintaining fixed full capacity overnight. Details vary widely, however. If cost is largely located in the data layer, that may not be amenable to scaling, for example. Further, if a web application can be served from just a couple of medium-sized instances - and there are large sites with traffic characteristics that allow for this, given the use of CDNs for caching - then auto scaling may not be practical or useful. Or rather, it is already happening at the CDN level.
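As a rough illustration of where the one-third figure can come from, consider a hypothetical sinusoidal daily traffic curve peaking at 30 instances and bottoming out at 10; the numbers and curve shape here are invented for the sketch, not drawn from any particular service:

```python
import math

def instances_needed(hour, peak=30.0, trough=10.0):
    """Hypothetical sinusoidal daily demand curve, peaking mid-afternoon."""
    mean = (peak + trough) / 2.0
    amplitude = (peak - trough) / 2.0
    # Peak at hour 15, trough at hour 3.
    return mean + amplitude * math.sin(2.0 * math.pi * (hour - 9.0) / 24.0)

# Fixed provisioning pays for peak capacity around the clock.
fixed_cost = 30.0 * 24.0

# Auto scaling pays, approximately, for the area under the demand curve.
scaled_cost = sum(instances_needed(h) for h in range(24))

savings = 1.0 - scaled_cost / fixed_cost
print(round(savings, 2))  # → 0.33
```

In other words, the savings fraction is just one minus the ratio of average to peak demand; a cycle with a peak 1.5 times the daily average yields the one-third figure.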
CPUUtilization Is a Poor and Unreliable Metric
The default CloudWatch metric of CPUUtilization may or may not correlate well with actual application performance or system load. That is strongly dependent on the application in question, and I've yet to see any identifiable patterns in this behavior. CPUUtilization is worth trying, since it exists already, but it is almost always also worth exporting the actual load average as a metric to CloudWatch. It is easy enough to set up a script and cron job to do that as a part of instance provisioning.
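A sketch of such a script follows. The namespace and metric name are arbitrary choices, and publishing via boto3 is one option among several (the aws CLI works just as well from cron):

```python
# Cron script sketch: report the 1-minute load average to CloudWatch.
# "Example/System" and "LoadAverage1Min" are invented names; use your own.

def read_load_average(proc_loadavg_text):
    """Parse the 1-minute load average from the contents of /proc/loadavg."""
    return float(proc_loadavg_text.split()[0])

def report_load_average():
    # Assumes boto3 is installed and instance credentials are available.
    import boto3
    with open("/proc/loadavg") as f:
        load = read_load_average(f.read())
    boto3.client("cloudwatch").put_metric_data(
        Namespace="Example/System",
        MetricData=[{"MetricName": "LoadAverage1Min", "Value": load, "Unit": "None"}],
    )

# Example of the parsing step on a typical /proc/loadavg line:
print(read_load_average("0.52 0.41 0.30 1/123 4567"))  # → 0.52
```

Invoke `report_load_average()` from a cron entry running every minute, and the resulting custom metric can be used in the CloudWatch alarms in place of CPUUtilization.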
The right level of average CPUUtilization at which to scale up is also very application specific. Set too high, and the application will fall off a performance cliff before new instances have been provisioned. In some applications 75% CPUUtilization is already too high a threshold, because the metric understates the real system load. In others a 50% threshold will rarely trigger, because reported CPUUtilization stays low even when the application is heavily loaded. Looking over application behavior and comparing instance load averages to reported CPUUtilization at various times of the day before implementing auto scaling is a good idea. A better idea is to export the real load average - or another metric that better reflects application load and performance - to CloudWatch and use that.
Aggressive Step Scale Up, Conservative Simple Scale Down
There are two modes of auto scaling, Step Scaling and Simple Scaling. Step Scaling can change the instance count by larger amounts at higher load levels, while Simple Scaling offers a configurable cooldown period after each scaling event, applying to both scaling up and scaling down. Step Scaling has the analogous EstimatedInstanceWarmup setting, but that only acts as a cooldown when scaling up.
For scaling up, it is best to err on the side of creating new instances when uncertain. So set up an alert that reacts after a single short period of high average load. If it is triggered due to a false alert, the worst that can happen is the new servers are scaled down shortly after being created.
"CPUUtilizationHighAlarm": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    ...
    "Statistic": "Average",
    "Period": "60",
    "EvaluationPeriods": "1",
    ...
  }
}
Step Scaling also allows a cluster to more rapidly dig itself out from an unexpected and sustained leap in demand. If a very high utilization metric value results in 3 instances being added rather than only 1, then it will take a third of the time to stabilize the cluster at the higher traffic level. The cluster still behaves normally for slowly increasing traffic, adding one instance at a time. In the example below, with the metric alarm threshold defined to be 50, an alarm at 50-60 adds one instance, an alarm at 60-70 adds two instances, and an alarm above 70 adds three instances.
"ScaleUpPolicy": {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "PolicyType": "StepScaling",
    "StepAdjustments": [
      {
        "MetricIntervalLowerBound": "0",
        "MetricIntervalUpperBound": "10",
        "ScalingAdjustment": "1"
      },
      {
        "MetricIntervalLowerBound": "10",
        "MetricIntervalUpperBound": "20",
        "ScalingAdjustment": "2"
      },
      {
        "MetricIntervalLowerBound": "20",
        "ScalingAdjustment": "3"
      }
    ],
    ...
  }
}
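To make the arithmetic concrete: the step bounds are offsets from the alarm threshold, with the lower bound inclusive and the upper bound exclusive. A minimal sketch of the mapping, assuming a threshold of 50 as in the alarm earlier:

```python
# Hypothetical sketch of how the step policy above chooses an adjustment,
# given an alarm threshold of 50. Bounds are offsets from the threshold;
# None stands in for the unbounded final step.
THRESHOLD = 50.0

STEPS = [
    (0.0, 10.0, 1),   # metric in [50, 60): add 1 instance
    (10.0, 20.0, 2),  # metric in [60, 70): add 2 instances
    (20.0, None, 3),  # metric >= 70: add 3 instances
]

def instances_to_add(metric_value):
    """Return the number of instances the step policy would add."""
    offset = metric_value - THRESHOLD
    for lower, upper, adjustment in STEPS:
        if offset >= lower and (upper is None or offset < upper):
            return adjustment
    return 0  # metric below threshold: the alarm would not fire

print(instances_to_add(55))  # → 1
print(instances_to_add(65))  # → 2
print(instances_to_add(85))  # → 3
```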
Scaling down should be conservative and have a long cooldown. The consequences of scaling down inappropriately are worse than the consequences of scaling up inappropriately, as it can drive a cluster into being overloaded. A particularly pernicious case is when some other temporary systems outage drives the load on the cluster to near zero: a CDN or other proxy fails, or a database failure leaves the servers only lightly loaded because they are returning nothing but error messages. In this situation a Step Scaling scale-down policy would activate repeatedly, because it has no defined cooldown when scaling down. Fixing the overall problem then gains the additional step of manually managing the auto scaling group.
I have found that 10 to 20 minutes is a decent value for a Simple Scaling cooldown. In any emergency that gives plenty of time to focus on the critical issues rather than what the auto scaling group is doing. In any case, it is the need for a defined cooldown to protect against unforeseen circumstances that is the reason for using Simple Scaling for scaling down.
"ScaleDownPolicy": {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "AutoScalingGroupName": { "Ref": "ExampleAutoScalingGroup" },
    "Cooldown": "600",
    "PolicyType": "SimpleScaling",
    "ScalingAdjustment": "-1"
  }
}
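The effect of the cooldown can be sketched as a simple gate on scale-down actions; the alarm timeline here is invented to mimic the outage scenario above:

```python
# Hypothetical sketch of Simple Scaling's cooldown: a scale-down action is
# skipped if it arrives within the cooldown window of the previous one.
COOLDOWN_SECONDS = 600

def filter_scale_downs(alarm_times):
    """Given sorted alarm timestamps in seconds, return those acted upon."""
    acted = []
    for t in alarm_times:
        if not acted or t - acted[-1] >= COOLDOWN_SECONDS:
            acted.append(t)
    return acted

# A failing upstream drives load to near zero and the low alarm fires every
# minute for half an hour; the cooldown limits removals to one per ten minutes.
alarms = list(range(0, 1800, 60))
print(len(filter_scale_downs(alarms)))  # → 3
```

Without the cooldown, the same half hour of spurious alarms would have removed most of the cluster.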
Use a ConnectionDrainingPolicy on Elastic Load Balancers
When an instance is set to be terminated due to a scaling policy action, it is important that open connections be allowed to complete before it is removed. Elastic Load Balancers can be configured to manage this with a ConnectionDrainingPolicy setting. The timeout should be set to a reasonable value for the application in question, which for high traffic applications should rarely be longer than a few seconds.
"ExampleElasticLoadBalancer": {
  "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
  "Properties": {
    "LoadBalancerName": "example-elb",
    "ConnectionDrainingPolicy": {
      "Enabled": true,
      "Timeout": 5
    },
    ...
  }
}
False Alarms Happen
CloudWatch will deliver occasional false alarms for any check on a metric like average CPUUtilization over a single evaluation period. For an alarm that must have few or no false alerts, require the threshold to be passed in two or three consecutive evaluation periods to minimize this issue.
"CPUUtilizationLowAlarm": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    ...
    "Statistic": "Average",
    "Period": "60",
    "EvaluationPeriods": "3",
    ...
  }
}
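As a back-of-the-envelope check on why this works: if a single evaluation period produces a spurious reading with probability p, and periods are treated as independent, then requiring k consecutive breaches cuts the false alarm rate to roughly p to the power k. The value of p here is invented for illustration:

```python
# Assuming independent evaluation periods, each with an invented 2% chance
# of a spurious threshold breach, requiring consecutive breaches sharply
# reduces the rate of false alarms.
p = 0.02  # hypothetical per-period false reading probability

for k in (1, 2, 3):
    print(k, p ** k)
```

Real metric noise is correlated across adjacent periods, so the improvement in practice is smaller than this, but still substantial.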
Set Up Alerts on Scaling Failure Cases
It is easy to set up alerts to an email address on scaling failure cases: instances failing to initialize or instances failing to terminate. Both of these should be rare, and are worth knowing about when they happen. Failure to scale up in particular may be the prelude to a disaster, but fortunately there is usually time to respond if scaling thresholds are set conservatively.
"AutoScalingGroupAlertsSNSTopic": {
  "Type": "AWS::SNS::Topic",
  "Properties": {
    "Subscription": [
      {
        "Endpoint": "asg-alerts@example.com",
        "Protocol": "email"
      }
    ]
  }
},
"ExampleAutoScalingGroup": {
  "Type": "AWS::AutoScaling::AutoScalingGroup",
  "Properties": {
    ...
    "NotificationConfiguration": {
      "TopicARN": { "Ref": "AutoScalingGroupAlertsSNSTopic" },
      "NotificationTypes": [
        "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
        "autoscaling:EC2_INSTANCE_TERMINATE_ERROR"
      ]
    }
  }
},