How to configure blue/green deployments for an ECS Fargate service using CodeDeploy and AWS CDK — shift traffic gradually, run validation hooks, and roll back automatically on failure.
A standard ECS rolling deployment replaces tasks in place: old tasks stop as new ones start. If the new version fails at startup or has a runtime bug, some users hit the old version and some hit the broken new one during the rollout window. Blue/green deployment avoids this: the new version starts in full behind a separate target group, production traffic shifts only after health checks pass, and rollback is a listener switch back to the old tasks, which keep running until the termination wait expires.
Blue/green on ECS uses CodeDeploy to manage the traffic shift:
// ECS service configured for CODE_DEPLOY deployment controller
FargateService service = FargateService.Builder.create(this, "TradingService")
    .cluster(cluster)
    .taskDefinition(taskDef)
    .desiredCount(2)
    .deploymentController(DeploymentController.builder()
        .type(DeploymentControllerType.CODE_DEPLOY)
        .build())
    .build();
Wire the load balancer with two target groups — blue (production) and green (replacement):
ApplicationTargetGroup blueTargetGroup = ApplicationTargetGroup.Builder.create(this, "BlueTarget")
    .vpc(vpc)
    .port(8080)
    .protocol(ApplicationProtocol.HTTP)
    .healthCheck(HealthCheck.builder()
        .path("/actuator/health/readiness")
        .healthyThresholdCount(2)
        .unhealthyThresholdCount(3)
        .interval(Duration.seconds(10))
        .build())
    .build();

ApplicationTargetGroup greenTargetGroup = ApplicationTargetGroup.Builder.create(this, "GreenTarget")
    .vpc(vpc)
    .port(8080)
    .protocol(ApplicationProtocol.HTTP)
    .healthCheck(HealthCheck.builder()
        .path("/actuator/health/readiness")
        .healthyThresholdCount(2)
        .unhealthyThresholdCount(3)
        .interval(Duration.seconds(10))
        .build())
    .build();
// Production listener: initially pointing to blue
ApplicationListener listener = alb.addListener("ProductionListener",
    BaseApplicationListenerProps.builder()
        .port(443)
        .defaultTargetGroups(List.of(blueTargetGroup))
        .build());

// Test listener: used by CodeDeploy to validate green before shifting production traffic
ApplicationListener testListener = alb.addListener("TestListener",
    BaseApplicationListenerProps.builder()
        .port(8443)
        .defaultTargetGroups(List.of(greenTargetGroup))
        .build());
EcsDeploymentGroup.Builder.create(this, "DeploymentGroup")
    .service(service)
    .blueGreenDeploymentConfig(EcsBlueGreenDeploymentConfig.builder()
        .blueTargetGroup(blueTargetGroup)
        .greenTargetGroup(greenTargetGroup)
        .listener(listener)
        .testListener(testListener)
        .deploymentApprovalWaitTime(Duration.minutes(0)) // auto-approve
        .terminationWaitTime(Duration.minutes(5))        // wait before killing blue
        .build())
    .deploymentConfig(EcsDeploymentConfig.CANARY_10_PERCENT_5_MINUTES)
    .build();
CANARY_10_PERCENT_5_MINUTES shifts 10% of traffic to green, waits 5 minutes, then shifts the remaining 90%. If an alarm attached to the deployment group enters the ALARM state during that 5-minute window, CodeDeploy rolls back.
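To make the timing concrete, here is a small sketch that models a canary schedule and, for comparison, a linear one as a list of steps. It is a simplified illustration only: `TrafficShiftSchedule`, `Step`, `canary` and `linear` are hypothetical names, not part of the CDK or CodeDeploy APIs, and the model assumes the first increment is applied at minute 0. It uses a record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of traffic-shift timing; not a CDK or CodeDeploy API. */
final class TrafficShiftSchedule {

    /** At `minute` after the shift begins, green serves `greenPercent` of traffic. */
    record Step(int minute, int greenPercent) {}

    /** Canary: shift `percent` at once, wait `waitMinutes`, then shift the rest. */
    static List<Step> canary(int percent, int waitMinutes) {
        return List.of(new Step(0, percent), new Step(waitMinutes, 100));
    }

    /** Linear: shift `percent` more every `intervalMinutes` until green has 100%. */
    static List<Step> linear(int percent, int intervalMinutes) {
        List<Step> steps = new ArrayList<>();
        int minute = 0;
        for (int green = percent; green < 100; green += percent) {
            steps.add(new Step(minute, green));
            minute += intervalMinutes;
        }
        // Final step always lands on exactly 100%, even if percent does not divide 100.
        steps.add(new Step(minute, 100));
        return steps;
    }
}
```

Under these assumptions, `TrafficShiftSchedule.canary(10, 5)` yields two steps (10% at minute 0, 100% at minute 5), which is the window during which a rollback only affects a tenth of production traffic.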
Alternative deployment configs:
LINEAR_10_PERCENT_EVERY_1_MINUTES: 10% more every minute until 100%
ALL_AT_ONCE: immediate cut-over (no canary)
CodeDeploy lifecycle hooks let you run validation before and after traffic shifts:
// AppSpec hook — Lambda is invoked at each lifecycle event
CfnDeploymentGroup.ECSServiceProperty.builder()
.build();
Or define via appspec.yaml deployed with your task definition:
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: trading-service
          ContainerPort: 8080
Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:eu-west-2:123456789:function:ValidateGreenDeployment"
  - AfterAllowTraffic: "arn:aws:lambda:eu-west-2:123456789:function:SmokeTestProduction"
The BeforeAllowTraffic Lambda hits the test listener (port 8443) to run smoke tests against the green deployment before any production traffic shifts.
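import com.amazonaws.services.codedeploy.AmazonCodeDeploy;
import com.amazonaws.services.codedeploy.AmazonCodeDeployClientBuilder;
import com.amazonaws.services.codedeploy.model.LifecycleEventStatus;
import com.amazonaws.services.codedeploy.model.PutLifecycleEventHookExecutionStatusRequest;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

public class DeploymentValidationHandler implements RequestHandler<Map<String, Object>, Void> {
    // Clients are created once per Lambda container and reused across invocations
    private final AmazonCodeDeploy codeDeploy = AmazonCodeDeployClientBuilder.defaultClient();
    private final HttpClient httpClient = HttpClient.newHttpClient();

    @Override
    public Void handleRequest(Map<String, Object> event, Context context) {
        String deploymentId = (String) event.get("DeploymentId");
        String lifecycleHook = (String) event.get("LifecycleEventHookExecutionId");
        try {
            validateGreenTarget();
            reportStatus(deploymentId, lifecycleHook, LifecycleEventStatus.Succeeded);
        } catch (Exception e) {
            context.getLogger().log("Green validation failed: " + e.getMessage());
            reportStatus(deploymentId, lifecycleHook, LifecycleEventStatus.Failed);
        }
        return null;
    }

    // Tell CodeDeploy whether this lifecycle hook passed; Failed aborts the deployment
    private void reportStatus(String deploymentId, String lifecycleHook, LifecycleEventStatus status) {
        codeDeploy.putLifecycleEventHookExecutionStatus(
            new PutLifecycleEventHookExecutionStatusRequest()
                .withDeploymentId(deploymentId)
                .withLifecycleEventHookExecutionId(lifecycleHook)
                .withStatus(status));
    }

    private void validateGreenTarget() throws Exception {
        // Hit the test listener (port 8443) to verify the green deployment is healthy
        HttpRequest request = HttpRequest.newBuilder(
            URI.create("http://alb-test-listener:8443/actuator/health")).GET().build();
        String response = httpClient.send(request, HttpResponse.BodyHandlers.ofString()).body();
        if (!response.contains("\"status\":\"UP\"")) {
            throw new RuntimeException("Green target health check failed");
        }
    }
}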
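The string match on `"status":"UP"` in the handler is brittle: it fails if the JSON contains whitespace after the colon, and it ignores the HTTP status code. A slightly more tolerant check could look like the sketch below. `HealthResponse` and `isUp` are hypothetical names introduced for illustration, and the sketch assumes the top-level `status` field is the first `status` key in the response body, which is how Spring Boot Actuator typically serializes it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper for interpreting an Actuator-style health response. */
final class HealthResponse {

    // Matches the first "status" field, tolerating whitespace around the colon.
    private static final Pattern STATUS =
        Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    /** True only when the endpoint answered 200 and the first status field is UP. */
    static boolean isUp(int httpStatus, String body) {
        if (httpStatus != 200 || body == null) {
            return false;
        }
        Matcher matcher = STATUS.matcher(body);
        return matcher.find() && "UP".equals(matcher.group(1));
    }
}
```

In the handler, this would take the status code and body from the `HttpResponse` returned by the test-listener call, and the hook would report Failed whenever `isUp` returns false. For anything beyond a quick smoke test, a real JSON parser is the safer choice.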
Configure alarms that CodeDeploy monitors during the deployment. If any alarm enters ALARM state, the deployment rolls back:
Alarm errorRateAlarm = Alarm.Builder.create(this, "ErrorRateAlarm")
    .metric(errorRateMetric)
    .threshold(5.0) // 5% error rate
    .evaluationPeriods(2)
    .build();

EcsDeploymentGroup.Builder.create(this, "DeploymentGroup")
    // ...
    .alarms(List.of(errorRateAlarm))
    .autoRollback(AutoRollbackConfig.builder()
        .failedDeployment(true)
        .alarmThreshold(true)
        .build())
    .build();
With automatic rollback on alarm, a deployment that increases error rate is reversed without human intervention — production returns to the known-good blue version in seconds.
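To see how `threshold(5.0)` and `evaluationPeriods(2)` interact, the sketch below models the alarm decision in plain Java. It is a simplification under stated assumptions: the comparison is "greater than or equal to the threshold" (the CDK default when no comparison operator is set), every one of the last `evaluationPeriods` datapoints must breach, and missing data is ignored. `AlarmModel` and `inAlarm` are hypothetical names, not CloudWatch APIs.

```java
import java.util.List;

/** Illustrative model of a threshold alarm; not the CloudWatch implementation. */
final class AlarmModel {

    /**
     * True when the most recent `evaluationPeriods` datapoints are all
     * greater than or equal to `threshold`. Datapoints are ordered oldest first.
     */
    static boolean inAlarm(List<Double> errorRates, double threshold, int evaluationPeriods) {
        if (evaluationPeriods <= 0 || errorRates.size() < evaluationPeriods) {
            return false; // not enough data to evaluate
        }
        List<Double> recent =
            errorRates.subList(errorRates.size() - evaluationPeriods, errorRates.size());
        return recent.stream().allMatch(rate -> rate >= threshold);
    }
}
```

With a threshold of 5.0 and two evaluation periods, a single bad period such as `[1.0, 6.0]` does not trigger the alarm, while two in a row such as `[1.0, 6.0, 7.5]` does. The practical consequence is that the metric period multiplied by the evaluation periods needs to fit inside the canary window, otherwise the alarm cannot fire before the remaining traffic shifts.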