Deploying Spring Boot to AWS ECS with Fargate

Running a Spring Boot service on ECS Fargate is straightforward once you’ve done it a few times, but the first time you encounter the combination of ECR, task definitions, ALB target groups, Parameter Store injection, and GitHub Actions all needing to agree with each other, it can feel like an awful lot of moving parts. This post walks through the whole chain — from Dockerfile to a live, auto-deploying service — with the decisions that actually matter explained.

The setup I describe here is close to what I’ve used for microservices at DWP Digital: containerised Spring Boot, Fargate compute (no EC2 instances to babysit), ALB for HTTPS termination, Parameter Store for secrets, and GitHub Actions for the deployment pipeline.

Dockerfile: Optimise for Java

The most common mistake is treating a Java Dockerfile like a Node or Python one. Java has three specific requirements: a JDK at build time, a JRE at runtime, and a layer cache structured around Maven’s dependency resolution.

# Stage 1: dependency resolution (cached unless pom.xml changes)
FROM eclipse-temurin:21-jdk-alpine AS dependencies
WORKDIR /app
COPY .mvn/ .mvn/
COPY mvnw pom.xml ./
RUN ./mvnw dependency:go-offline -q

# Stage 2: build
FROM dependencies AS build
COPY src/ src/
RUN ./mvnw package -DskipTests -q

# Stage 3: runtime image (JRE only — ~100MB smaller than JDK)
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=build /app/target/*.jar app.jar
USER app
EXPOSE 8080
ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-jar", "app.jar"]

-XX:+UseContainerSupport tells the JVM to respect cgroup memory limits rather than using the host’s total RAM. It has been enabled by default since JDK 10, so on Java 21 the flag is redundant and only makes the intent explicit. -XX:MaxRAMPercentage=75.0 is the setting that does the work: it raises the maximum heap from the default 25% of the container’s memory limit to 75%, leaving the rest for off-heap memory (Metaspace, thread stacks, direct buffers).
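To make the heap arithmetic concrete, here is a minimal sketch that computes the heap ceiling implied by MaxRAMPercentage and prints what the running JVM actually reports. HeapBudget and heapMiB are illustrative names of my own, and the 1024 MiB limit is an assumption taken from the task size used later in this post.

```java
// Sketch only: HeapBudget and heapMiB are hypothetical names, not part of any library.
final class HeapBudget {

    /** Heap ceiling in MiB for a given container limit and MaxRAMPercentage value. */
    static long heapMiB(long containerLimitMiB, double maxRamPercentage) {
        return (long) (containerLimitMiB * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        long containerLimitMiB = 1024; // assumed container memory limit
        long expectedHeap = heapMiB(containerLimitMiB, 75.0);
        System.out.println("Expected heap ceiling: " + expectedHeap + " MiB");
        System.out.println("Left for non-heap use: " + (containerLimitMiB - expectedHeap) + " MiB");

        // What this JVM settled on; run it inside the container to compare.
        long reportedMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM-reported max heap: " + reportedMiB + " MiB");
    }
}
```

The reported figure may come out slightly below the computed ceiling depending on the garbage collector, so treat the comparison as a sanity check rather than an exact match.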

Publishing to ECR

Create a private ECR repository and push from your local machine to verify the setup before wiring GitHub Actions:

aws ecr get-login-password --region eu-west-1 \
  | docker login --username AWS \
    --password-stdin 123456789.dkr.ecr.eu-west-1.amazonaws.com

docker build -t my-service .
docker tag my-service:latest \
  123456789.dkr.ecr.eu-west-1.amazonaws.com/my-service:latest
docker push 123456789.dkr.ecr.eu-west-1.amazonaws.com/my-service:latest

Enable ECR image scanning on push — it’s one checkbox in the console and gives you a free CVE scan of every image before it runs in production.

ECS Task Definition

The task definition is where CPU, memory, environment variables, and the container image come together. Define it as JSON and keep it in source control:

{
  "family": "my-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789:role/my-service-task-role",
  "containerDefinitions": [
    {
      "name": "my-service",
      "image": "123456789.dkr.ecr.eu-west-1.amazonaws.com/my-service:latest",
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
      "environment": [
        { "name": "SPRING_PROFILES_ACTIVE", "value": "prod" }
      ],
      "secrets": [
        {
          "name": "SPRING_DATASOURCE_PASSWORD",
          "valueFrom": "arn:aws:ssm:eu-west-1:123456789:parameter/my-service/prod/db-password"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL",
          "wget -qO- http://localhost:8080/actuator/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/my-service",
          "awslogs-region": "eu-west-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}

The startPeriod: 60 on the health check is important. Failed checks during the start period do not count towards the retry limit, which gives Spring Boot time to finish starting. Without it, ECS can mark the container unhealthy after three failed checks, and a slow JVM startup (common on first boot when the JIT is cold) can trigger a replacement cycle before the service is even ready.
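As a rough illustration of how much time the health check settings buy, the sketch below estimates how long a container that never becomes healthy survives before ECS marks it unhealthy. It is an approximation under my own assumptions: it ignores the per-check timeout and exact scheduling of checks, and HealthCheckWindow is a hypothetical helper, not an AWS API.

```java
// Sketch only: a back-of-envelope estimate, not how ECS computes anything internally.
final class HealthCheckWindow {

    /** Approximate seconds before a never-healthy container is marked unhealthy. */
    static int secondsUntilUnhealthy(int startPeriod, int interval, int retries) {
        // Failures inside the start period are not counted; after that,
        // `retries` consecutive failures spaced `interval` seconds apart are needed.
        return startPeriod + interval * retries;
    }

    public static void main(String[] args) {
        // Values from the task definition above: interval 30, retries 3, startPeriod 60.
        System.out.println("With startPeriod 60: ~" + secondsUntilUnhealthy(60, 30, 3) + "s");
        // Same settings with no start period.
        System.out.println("Without startPeriod: ~" + secondsUntilUnhealthy(0, 30, 3) + "s");
    }
}
```

If your service routinely needs longer than the first figure to start, raise startPeriod rather than retries, so that a container that genuinely fails later is still replaced quickly.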

Parameter Store for Secrets

Never put secrets in plaintext environment variables in the task definition. Use SSM Parameter Store with SecureString parameters:

# application.properties (prod profile)
spring.datasource.password=${SPRING_DATASOURCE_PASSWORD}
# ECS injects SPRING_DATASOURCE_PASSWORD from Parameter Store when the container starts
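Spring Boot also binds environment variables to properties by name, which is why the secret is called SPRING_DATASOURCE_PASSWORD. The sketch below applies the documented naming rule (replace dots with underscores, remove dashes, convert to uppercase) so you can work out the variable name for any other property you want to inject. EnvVarNames and toEnvVarName are hypothetical names of my own, not Spring APIs.

```java
import java.util.Locale;

// Sketch only: mirrors Spring Boot's documented property-to-env-var naming rule.
final class EnvVarNames {

    /** Converts a canonical property name to its environment variable form. */
    static String toEnvVarName(String propertyName) {
        return propertyName
                .replace(".", "_")          // dots become underscores
                .replace("-", "")           // dashes are removed
                .toUpperCase(Locale.ROOT);  // everything is upper-cased
    }

    public static void main(String[] args) {
        System.out.println(toEnvVarName("spring.datasource.password"));
        // prints SPRING_DATASOURCE_PASSWORD
        System.out.println(toEnvVarName("spring.main.lazy-initialization"));
        // prints SPRING_MAIN_LAZYINITIALIZATION
    }
}
```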

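The ECS task execution role needs ssm:GetParameters on the parameter ARNs referenced in the secrets block. It also needs kms:Decrypt if the parameters are encrypted with a customer managed KMS key, and that permission is granted on the key ARN, not the parameter ARN. Scope the IAM policy tightly to the parameters this specific service needs (the key ID below is a placeholder for your own key):

{
  "Effect": "Allow",
  "Action": "ssm:GetParameters",
  "Resource": "arn:aws:ssm:eu-west-1:123456789:parameter/my-service/prod/*"
},
{
  "Effect": "Allow",
  "Action": "kms:Decrypt",
  "Resource": "arn:aws:kms:eu-west-1:123456789:key/<key-id>"
}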

GitHub Actions: Automated Deployment

Wire ECR push and ECS deployment into GitHub Actions on merge to main:

name: Deploy to ECS

on:
  push:
    branches: [main]

env:
  AWS_REGION: eu-west-1
  ECR_REGISTRY: 123456789.dkr.ecr.eu-west-1.amazonaws.com
  ECR_REPOSITORY: my-service
  ECS_CLUSTER: my-cluster
  ECS_SERVICE: my-service
  CONTAINER_NAME: my-service

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up JDK 21
        uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
          cache: 'maven'

      - name: Run tests
        # The Docker build compiles its own JAR with -DskipTests, so tests gate the deploy here
        run: mvn -B verify

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Log in to ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push Docker image
        id: build-image
        run: |
          IMAGE_TAG=${{ github.sha }}
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT

      - name: Download task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition my-service \
            --query taskDefinition > task-definition.json

      - name: Update ECS task definition with new image
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: ${{ env.CONTAINER_NAME }}
          image: ${{ steps.build-image.outputs.image }}

      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true

wait-for-service-stability: true holds the GitHub Actions workflow open until ECS has replaced the old task with the new one and the health checks pass. If deployment fails, the action fails — you get a red build rather than a silent bad deployment.

Right-Sizing Fargate Tasks

ProTip: Java workloads are consistently undersized on Fargate. The temptation is to start with 256 CPU / 512MB memory and scale up. In practice, Spring Boot with a typical set of dependencies (security, data, actuator, web) needs at least 512 CPU / 1024MB to run comfortably. Below that threshold, startup times lengthen (the JIT is CPU-starved), GC pressure increases, and you see latency spikes under load.
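Fargate only accepts certain CPU and memory pairings, so it helps to check a size before putting it in the task definition. The sketch below encodes the combinations as I understand them from the published Fargate task size table, up to 4 vCPU; treat the numbers as an assumption to verify against the current AWS documentation. FargateSizes is a hypothetical helper, not an AWS API.

```java
// Sketch only: covers 0.25 to 4 vCPU; the larger 8 and 16 vCPU sizes are left out.
final class FargateSizes {

    /** True if the CPU units / memory (MiB) pair is a supported Fargate task size. */
    static boolean isValid(int cpuUnits, int memoryMiB) {
        switch (cpuUnits) {
            case 256:  return memoryMiB == 512 || memoryMiB == 1024 || memoryMiB == 2048;
            case 512:  return inRange(memoryMiB, 1024, 4096);
            case 1024: return inRange(memoryMiB, 2048, 8192);
            case 2048: return inRange(memoryMiB, 4096, 16384);
            case 4096: return inRange(memoryMiB, 8192, 30720);
            default:   return false; // unknown or unsupported CPU value in this sketch
        }
    }

    /** Memory must sit between min and max in 1024 MiB steps. */
    private static boolean inRange(int memoryMiB, int min, int max) {
        return memoryMiB >= min && memoryMiB <= max && (memoryMiB - min) % 1024 == 0;
    }

    public static void main(String[] args) {
        System.out.println("512 / 1024:  " + isValid(512, 1024));  // the baseline suggested above
        System.out.println("1024 / 2048: " + isValid(1024, 2048));
        System.out.println("256 / 4096:  " + isValid(256, 4096));  // too much memory for 0.25 vCPU
    }
}
```

One practical consequence is that memory and CPU cannot be tuned independently: giving a 256 CPU task more than 2048 MiB means moving up a CPU tier as well.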

For a trading service where response time consistency matters, I run 1024 CPU / 2048MB for the primary containers. It costs more than you might expect (Fargate pricing adds up quickly for always-on services), but the performance consistency is worth it. If cost is a concern, consider moving steady-state workloads to EC2 Spot with ECS capacity providers and reserving Fargate for burst — but that’s a topic for another post.

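Cold start impact is real but manageable. The first request after a deployment hits a cold JVM. Spring Boot’s lazy initialisation (spring.main.lazy-initialization=true) shortens startup, but it defers bean creation to first use, so some of that cost moves onto the first requests. Set the ECS health check startPeriod generously, and give the service a health check grace period (healthCheckGracePeriodSeconds) so that slow starts are not killed by failing ALB checks. The ALB won’t route traffic until its target group health check passes, so most of the startup cost is paid before any real traffic hits.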

If you’re deploying Java services to AWS and want an engineer who has been through this in production, get in touch.