Blue/Green Deployments on ECS Fargate with CodeDeploy and CDK

How to configure blue/green deployments for an ECS Fargate service using CodeDeploy and AWS CDK — shift traffic gradually, run validation hooks, and roll back automatically on failure.

A standard ECS rolling deployment replaces tasks in place: old tasks stop as new ones start. If the new version has a startup failure or a runtime bug, some users hit the old version and some hit the broken new one during the rollout window. A blue/green deployment avoids this: the new version starts completely in a separate target group, production traffic shifts only after health checks pass, and rollback is a traffic switch back to the still-running old tasks rather than a redeploy.

How blue/green works on ECS

Blue/green on ECS uses CodeDeploy to manage the traffic shift:

  1. New task set starts in a separate (“green”) target group behind the same load balancer
  2. The load balancer health-checks the tasks in the green target group, and CodeDeploy waits until they are healthy
  3. Traffic shifts — either all at once or gradually (canary / linear)
  4. After a configurable bake time, the old (“blue”) task set is terminated
  5. On any failure during validation, CodeDeploy rolls back by shifting traffic back to blue
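The five steps above can be read as a small decision flow. The sketch below is a toy model of that flow, not CodeDeploy's implementation: the class name BlueGreenFlow and its two boolean inputs are hypothetical, chosen only to show where a rollback can happen and why it is cheap while blue is still running.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the five steps above; it is not CodeDeploy's implementation. */
final class BlueGreenFlow {
    enum Outcome { GREEN_LIVE, ROLLED_BACK }

    /** Actions taken, in order, so the sequence can be inspected afterwards. */
    final List<String> actions = new ArrayList<>();

    Outcome deploy(boolean greenHealthy, boolean validationPassed) {
        actions.add("start green task set");                    // step 1
        if (!greenHealthy) {                                    // step 2
            return rollBack();
        }
        actions.add("shift production traffic to green");       // step 3
        if (!validationPassed) {                                // step 5
            return rollBack();
        }
        actions.add("wait for bake time, then terminate blue"); // step 4
        return Outcome.GREEN_LIVE;
    }

    private Outcome rollBack() {
        // Blue is still running at this point, so rolling back is only a routing change
        actions.add("route production traffic to blue, stop green");
        return Outcome.ROLLED_BACK;
    }

    public static void main(String[] args) {
        BlueGreenFlow flow = new BlueGreenFlow();
        System.out.println(flow.deploy(true, false) + " " + flow.actions);
    }
}
```

In this model blue is only terminated on the success path, which is why every failure path can still fall back to it.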

CDK configuration

// ECS service configured for CODE_DEPLOY deployment controller
FargateService service = FargateService.Builder.create(this, "TradingService")
    .cluster(cluster)
    .taskDefinition(taskDef)
    .desiredCount(2)
    .deploymentController(DeploymentController.builder()
        .type(DeploymentControllerType.CODE_DEPLOY)
        .build())
    .build();

Wire the load balancer with two target groups — blue (production) and green (replacement):

// Fargate tasks use awsvpc networking, so both target groups register targets by IP
ApplicationTargetGroup blueTargetGroup = ApplicationTargetGroup.Builder.create(this, "BlueTarget")
    .vpc(vpc)
    .port(8080)
    .protocol(ApplicationProtocol.HTTP)
    .targetType(TargetType.IP)
    .healthCheck(HealthCheck.builder()
        .path("/actuator/health/readiness")
        .healthyThresholdCount(2)
        .unhealthyThresholdCount(3)
        .interval(Duration.seconds(10))
        .build())
    .build();

// Green mirrors blue exactly so either group can serve production traffic
ApplicationTargetGroup greenTargetGroup = ApplicationTargetGroup.Builder.create(this, "GreenTarget")
    .vpc(vpc)
    .port(8080)
    .protocol(ApplicationProtocol.HTTP)
    .targetType(TargetType.IP)
    .healthCheck(HealthCheck.builder()
        .path("/actuator/health/readiness")
        .healthyThresholdCount(2)
        .unhealthyThresholdCount(3)
        .interval(Duration.seconds(10))
        .build())
    .build();

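// Production listener, initially pointing to blue.
// Port 443 defaults to HTTPS in CDK, so a certificate must also be supplied via .certificates(...)
ApplicationListener listener = alb.addListener("ProductionListener",
    BaseApplicationListenerProps.builder()
        .port(443)
        .defaultTargetGroups(List.of(blueTargetGroup))
        .build());

// Test listener, used to validate green before production traffic shifts.
// Plain HTTP is set explicitly (port 8443 would otherwise default to HTTPS)
// to match the http:// URL the validation Lambda calls
ApplicationListener testListener = alb.addListener("TestListener",
    BaseApplicationListenerProps.builder()
        .port(8443)
        .protocol(ApplicationProtocol.HTTP)
        .defaultTargetGroups(List.of(greenTargetGroup))
        .build());

// Register the service with the blue target group so the initial task set receives traffic
service.attachToApplicationTargetGroup(blueTargetGroup);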

CodeDeploy deployment group

EcsDeploymentGroup.Builder.create(this, "DeploymentGroup")
    .service(service)
    .blueGreenDeploymentConfig(EcsBlueGreenDeploymentConfig.builder()
        .blueTargetGroup(blueTargetGroup)
        .greenTargetGroup(greenTargetGroup)
        .listener(listener)
        .testListener(testListener)
        .deploymentApprovalWaitTime(Duration.minutes(0))   // auto-approve
        .terminationWaitTime(Duration.minutes(5))           // wait before killing blue
        .build())
    .deploymentConfig(EcsDeploymentConfig.CANARY_10PERCENT_5MINUTES)
    .build();

CANARY_10PERCENT_5MINUTES shifts 10% of traffic to green, waits 5 minutes, then shifts the remaining 90%. If any alarm attached to the deployment group fires during the 5-minute window, CodeDeploy rolls back.
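To make the difference between canary and linear shifting concrete, here is a minimal sketch that computes the share of production traffic on green over time. It is illustrative arithmetic only: the class and method names are hypothetical, and it assumes the first increment is applied at minute 0.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative traffic-shift schedules: minute offset -> percent of production traffic on green. */
final class TrafficShiftSchedule {

    /** Canary: one small first step, then everything else after the wait. */
    static Map<Integer, Integer> canary(int firstPercent, int waitMinutes) {
        Map<Integer, Integer> schedule = new TreeMap<>();
        schedule.put(0, firstPercent);
        schedule.put(waitMinutes, 100);
        return schedule;
    }

    /** Linear: equal increments at a fixed interval until green has 100%. */
    static Map<Integer, Integer> linear(int stepPercent, int intervalMinutes) {
        if (stepPercent <= 0 || intervalMinutes <= 0) {
            throw new IllegalArgumentException("step and interval must be positive");
        }
        Map<Integer, Integer> schedule = new TreeMap<>();
        int minute = 0;
        for (int percent = stepPercent; percent < 100; percent += stepPercent) {
            schedule.put(minute, percent);
            minute += intervalMinutes;
        }
        schedule.put(minute, 100); // the final step always lands on exactly 100%
        return schedule;
    }

    public static void main(String[] args) {
        System.out.println("canary 10% / 5 min:     " + canary(10, 5));
        System.out.println("linear 10% every 1 min: " + linear(10, 1));
    }
}
```

A canary exposes a small slice of users for the whole wait and then commits in one step, while a linear shift keeps widening exposure, so alarms have more intermediate points at which to stop the rollout.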

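Alternative deployment configs:

  - ALL_AT_ONCE: shifts 100% of traffic to green in a single step
  - CANARY_10PERCENT_15MINUTES: shifts 10%, waits 15 minutes, then shifts the remaining 90%
  - LINEAR_10PERCENT_EVERY_1MINUTES: shifts 10% every minute until all traffic is on green
  - LINEAR_10PERCENT_EVERY_3MINUTES: shifts 10% every 3 minutes until all traffic is on green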

Validation hooks with Lambda

CodeDeploy lifecycle hooks let you run validation before and after traffic shifts. For ECS deployments the hooks are not set on the deployment group; each hook is a Lambda function named in the Hooks section of the AppSpec file (appspec.yaml) supplied with the deployment alongside your task definition:

version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: trading-service
          ContainerPort: 8080
Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:eu-west-2:123456789012:function:ValidateGreenDeployment"
  - AfterAllowTraffic:  "arn:aws:lambda:eu-west-2:123456789012:function:SmokeTestProduction"

The BeforeAllowTraffic Lambda hits the test listener (port 8443) to run smoke tests against the green deployment before any production traffic shifts.

Validation Lambda

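import com.amazonaws.services.codedeploy.AmazonCodeDeploy;
import com.amazonaws.services.codedeploy.AmazonCodeDeployClientBuilder;
import com.amazonaws.services.codedeploy.model.LifecycleEventStatus;
import com.amazonaws.services.codedeploy.model.PutLifecycleEventHookExecutionStatusRequest;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Map;

public class DeploymentValidationHandler implements RequestHandler<Map<String, Object>, Void> {

    // Placeholder URL: replace the host with the ALB DNS name (test listener on port 8443)
    private static final String TEST_LISTENER_URL = "http://alb-test-listener:8443/actuator/health";

    private final AmazonCodeDeploy codeDeploy = AmazonCodeDeployClientBuilder.defaultClient();
    private final HttpClient httpClient = HttpClient.newHttpClient();

    @Override
    public Void handleRequest(Map<String, Object> event, Context context) {
        String deploymentId = (String) event.get("DeploymentId");
        String hookExecutionId = (String) event.get("LifecycleEventHookExecutionId");

        LifecycleEventStatus status;
        try {
            validateGreenTarget();
            status = LifecycleEventStatus.Succeeded;
        } catch (Exception e) {
            // Log the cause so a failed deployment can be diagnosed from the Lambda logs
            context.getLogger().log("Green validation failed: " + e);
            status = LifecycleEventStatus.Failed;
        }

        // Report the result: CodeDeploy continues on Succeeded and rolls back on Failed
        codeDeploy.putLifecycleEventHookExecutionStatus(
            new PutLifecycleEventHookExecutionStatusRequest()
                .withDeploymentId(deploymentId)
                .withLifecycleEventHookExecutionId(hookExecutionId)
                .withStatus(status));
        return null;
    }

    private void validateGreenTarget() throws Exception {
        // Hit the test listener (port 8443) to verify the green deployment is healthy
        HttpRequest request = HttpRequest.newBuilder(URI.create(TEST_LISTENER_URL))
            .timeout(Duration.ofSeconds(5))
            .GET()
            .build();
        HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200 || !response.body().contains("\"status\":\"UP\"")) {
            throw new IllegalStateException("Green target health check failed");
        }
    }
}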

CloudWatch alarms for automatic rollback

Configure alarms that CodeDeploy monitors during the deployment. If any alarm enters ALARM state, the deployment rolls back:

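// errorRateMetric is assumed to be defined elsewhere as an error percentage (0-100)
Alarm errorRateAlarm = Alarm.Builder.create(this, "ErrorRateAlarm")
    .metric(errorRateMetric)
    .threshold(5.0)     // 5% error rate
    .evaluationPeriods(2)
    .build();

EcsDeploymentGroup.Builder.create(this, "DeploymentGroup")
    // ...
    .alarms(List.of(errorRateAlarm))
    .autoRollback(AutoRollbackConfig.builder()
        .failedDeployment(true)      // roll back when the deployment itself fails
        .deploymentInAlarm(true)     // roll back when any attached alarm enters ALARM
        .build())
    .build();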

With automatic rollback on alarm, a deployment that increases error rate is reversed without human intervention — production returns to the known-good blue version in seconds.
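To see what a threshold of 5.0 with two evaluation periods means in practice, the sketch below models the alarm decision in plain Java. It is a simplified illustration, not CloudWatch's evaluation logic: it assumes the metric is an error percentage, that a datapoint breaches when it is greater than or equal to the threshold (the CDK default when no comparison operator is set), and that every one of the last evaluation periods must breach. Missing-data handling is ignored, and all names are hypothetical.

```java
import java.util.List;

/** Simplified model of an error-rate alarm; real CloudWatch evaluation also handles missing data. */
final class ErrorRateAlarmSketch {

    /** Error rate as a percentage; an interval with no requests counts as 0%. */
    static double errorRatePercent(long errors, long requests) {
        return requests == 0 ? 0.0 : 100.0 * errors / requests;
    }

    /** True when each of the most recent evaluationPeriods datapoints is at or above the threshold. */
    static boolean inAlarm(List<Double> datapoints, double threshold, int evaluationPeriods) {
        if (evaluationPeriods <= 0 || datapoints.size() < evaluationPeriods) {
            return false; // not enough data to decide
        }
        List<Double> recent =
            datapoints.subList(datapoints.size() - evaluationPeriods, datapoints.size());
        return recent.stream().allMatch(value -> value >= threshold);
    }

    public static void main(String[] args) {
        List<Double> rates = List.of(
            errorRatePercent(1, 200),   // healthy period
            errorRatePercent(12, 200),  // first breaching period
            errorRatePercent(15, 200)); // second breaching period
        System.out.println("in alarm: " + inAlarm(rates, 5.0, 2));
    }
}
```

Requiring two consecutive breaching periods trades a slower rollback for fewer rollbacks triggered by a single noisy datapoint.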

If you’re building AWS deployment pipelines and want help with zero-downtime deployment strategies, get in touch.

Samuel Jackson

Senior Java Back End Developer & Contractor

Senior Java Back End Developer — Betfair Exchange API specialist, Spring Boot, AWS, and event-driven architecture. 20+ years delivering high-performance systems across betting, finance, energy, retail, and government. Available for Java contracting.