Spring Boot Microservices — From Java Dev to Architect

Chapter 1

The Palace vs The City

Before a single annotation — you need to feel the difference. Here is the story that will never leave your brain.

🏰

The Monolith — A Medieval Palace

ANALOGY → A medieval palace. The kitchen, treasury, hospital and throne room are all under one roof. If the kitchen catches fire — the king loses his throne. One disaster = full shutdown.

In your 10 years of Java, you built this. One WAR/EAR file. One shared database. One deployment. Change the Order module → rebuild everything → deploy at 3AM → pray nothing breaks.

The real pain you've felt: Long build cycles · One bug kills everything · Scale the whole app just because Payments is slow · All teams step on each other · Can't upgrade Java version in just one module

🏙️

Microservices — A Modern City

ANALOGY → A modern city. The hospital, bank, post office, and fire station are separate buildings. Each has its own staff, its own entrance, its own hours. The bank burning down doesn't close the hospital.

Each microservice is an independent Spring Boot application. Its own database, its own deployment, its own team. They talk over the network. You scale only what's hot. One service going down doesn't bring down the city.

Visual Map

🏙️ Your Microservice City — Every Building Has a Job

Component	The Analogy
🚦 API Gateway	City Entrance
📋 Service Registry	City Directory
🏛️ Config Server	City Hall
🏥 Order Service	The Hospital
🏦 Payment Service	The Bank
🏪 Inventory Service	The Warehouse
📬 Message Bus	Post Office
⚡ Circuit Breaker	Electrical Fuse
🔐 Auth Service	City Police
📡 Observability	CCTV Network

Client Request → API Gateway → Service Registry lookup → Target Service → Response

Dimension	Monolith (Your Past)	Microservices (Your Future)
Deployment	One WAR for everything	Each service deployed independently
Scaling	Scale everything — even for one hot feature	Scale only Payment Service ×10
Failure	One bug = full outage	Payment fails, Orders still work
Database	One shared database for all teams	Each service owns its own DB
Teams	All devs touch the same codebase	Each team owns one service
Tech stack	Locked to one version everywhere	Each service can use different tech

⚠️ When NOT to use Microservices — Interviewers Love This

The "distributed systems tax": network latency between services, no ACID transactions across DBs, operational complexity (K8s, distributed tracing, service meshes). For a small team or early startup → a modular monolith is often the right answer. Extract services when you hit real scaling or team-size problems.

Chapter 2

The Full Request Journey

Trace every step a request takes from the client to your business logic and back.

The 5 Laws

Single Responsibility

One service does one thing. Order Service only manages orders. Never let it touch payments or user profiles. Conway's Law says a system's design ends up mirroring the communication structure of the organisation that builds it, so draw service boundaries along team boundaries.

Own Your Data

Each service has its own private database. No other service may access it directly. This is the hardest and most important rule. Violate it and you've created a distributed monolith — all the complexity, none of the benefits.

Communicate Over the Network

Services talk via REST (synchronous) or messaging (asynchronous). No shared memory. No direct method calls between services. This forces loose coupling at the cost of latency.

Design for Failure

Any service can go down at any time. Circuit breakers, retries, timeouts, and graceful fallbacks are not optional — they are the price of being distributed. Assume everything will fail.

Decentralise Everything

No central orchestration bus. No shared libraries that couple releases. Each team deploys independently, using its own CI/CD pipeline, on its own schedule.

Chapter 3

Service Discovery — Eureka

How do services find each other when IPs change with every deployment?

📋

The Yellow Pages Analogy

PROBLEM: You have 50 Order Service instances. Each has a different IP. Payment Service needs to call one. Hardcoding IPs? They change every deploy. So how do services find each other?

SOLUTION: Before mobile phones, there was a Yellow Pages directory. You looked up "pizza delivery" and got the current phone number — not a hardcoded address. Eureka is the Yellow Pages for your microservices. Services register their address when they start. Others look them up by name.

Register on startup

When Order Service starts, it tells Eureka: "Hi! I'm order-service, I'm at 192.168.1.x:8081". Eureka stores this. Order Service then sends Eureka a heartbeat every 30s. If no heartbeat arrives within the lease expiration window (90s by default) → Eureka removes the instance.

Discover on demand

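When Payment Service wants to call Orders, it looks up order-service by name. The Eureka client keeps a locally cached copy of the registry and refreshes it periodically, so the lookup yields the IP + port of registered instances without a round trip to Eureka on every call. This happens automatically with Feign.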

Load balance automatically

Eureka returns ALL healthy instances. Spring Cloud LoadBalancer picks one (round-robin by default). If an instance is unhealthy, it's removed. Zero manual IP management.
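
The three steps above (register, heartbeat, pick an instance) can be sketched in plain Java. This is a toy model under stated assumptions, not Eureka's API: `MiniRegistry`, its method names, and the addresses are hypothetical, and time is passed in explicitly so the lease expiry is easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory registry: renew leases, drop expired ones, pick round-robin. */
final class MiniRegistry {
    private final long leaseMillis;
    // service name -> (instance address -> time of last heartbeat, in ms)
    private final Map<String, Map<String, Long>> leases = new HashMap<>();
    // service name -> number of picks so far (drives round-robin)
    private final Map<String, Integer> pickCounters = new HashMap<>();

    MiniRegistry(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Called at startup and on every heartbeat; both simply renew the lease. */
    void heartbeat(String service, String address, long nowMillis) {
        leases.computeIfAbsent(service, k -> new LinkedHashMap<>()).put(address, nowMillis);
    }

    /** Instances whose lease has not expired at nowMillis, in registration order. */
    List<String> healthyInstances(String service, long nowMillis) {
        List<String> healthy = new ArrayList<>();
        for (Map.Entry<String, Long> lease : leases.getOrDefault(service, Map.of()).entrySet()) {
            if (nowMillis - lease.getValue() <= leaseMillis) {
                healthy.add(lease.getKey());
            }
        }
        return healthy;
    }

    /** Round-robin over the healthy instances; fails if none are left. */
    String pick(String service, long nowMillis) {
        List<String> healthy = healthyInstances(service, nowMillis);
        if (healthy.isEmpty()) {
            throw new IllegalStateException("No healthy instance of " + service);
        }
        int pickNumber = pickCounters.merge(service, 1, Integer::sum) - 1;
        return healthy.get(pickNumber % healthy.size());
    }

    public static void main(String[] args) {
        MiniRegistry registry = new MiniRegistry(90_000); // 90s lease, as in the client config
        registry.heartbeat("order-service", "10.0.0.1:8081", 0);
        registry.heartbeat("order-service", "10.0.0.2:8081", 0);
        System.out.println(registry.pick("order-service", 1_000));
        System.out.println(registry.pick("order-service", 2_000));
    }
}
```

An instance that stops calling `heartbeat` simply ages out of `healthyInstances`, which is the same idea as the lease expiration setting, minus everything a real registry adds (replication, client-side caching, self-preservation).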

EurekaServerApplication.java + application.yml

@SpringBootApplication
@EnableEurekaServer     // ← ONE annotation = full registry server
public class EurekaServerApplication {
  public static void main(String[] args) {
    SpringApplication.run(EurekaServerApplication.class, args);
  }
}

# application.yml
server:
  port: 8761
eureka:
  client:
    register-with-eureka: false  # server doesn't register itself
    fetch-registry: false

# Visit http://localhost:8761 — you get a beautiful dashboard
# showing all registered services in real time!

application.yml (every client service)

# Every microservice adds these lines to register itself
spring:
  application:
    name: order-service    # ← THIS is the name others use to find you
eureka:
  client:
    service-url:
      defaultZone: http://eureka-server:8761/eureka/
  instance:
    prefer-ip-address: true
    lease-renewal-interval-in-seconds: 30  # heartbeat frequency
    lease-expiration-duration-in-seconds: 90

# Maven dependency needed:
# spring-cloud-starter-netflix-eureka-client

InventoryClient.java (inside Order Service)

// Feign = declarative HTTP. Zero boilerplate. Looks like a local call!
@FeignClient(name = "inventory-service")  // name = Eureka registration name
public interface InventoryClient {

  @GetMapping("/api/inventory/{skuCode}")
  boolean isInStock(@PathVariable("skuCode") String skuCode);  // name the path variable explicitly; Feign may not infer it

  @PostMapping("/api/inventory/reserve")
  ReserveResponse reserve(@RequestBody ReserveRequest request);
}

// Usage in OrderService — no URL, no IP, no port, no RestTemplate!
@Service
public class OrderService {
  @Autowired private InventoryClient inventoryClient;

  public Order placeOrder(OrderRequest request) {
    if (!inventoryClient.isInStock(request.getSkuCode())) {
      throw new OutOfStockException("Item not available");
    }
    // ... save order
  }
}

// Enable Feign in main class:
@EnableFeignClients

🧠

Memory Hook — Yellow Pages

"Eureka = Yellow Pages. Services look up each other by name, not address. One annotation @EnableEurekaServer = the whole directory. Clients say their name in application.yml. Others call them by that name in @FeignClient. No IP ever needed."

Chapter 4

API Gateway — The City Entrance

🚦

Airport Security Analogy

ANALOGY → Every traveller enters through ONE security gate. Security checks your ID once. They route you to the right terminal. Without it: 50 different doors, 50 different security checks, total chaos for travellers and staff alike.

Without a gateway: every client knows every service URL. Auth logic lives in every service. CORS configured everywhere. Rate limiting — nowhere. With a gateway: one URL for everything. Auth once. Route anywhere.

Without Gateway

Client knows all 20 service URLs
JWT validation in every service
CORS config in every service
Rate limiting — manual, inconsistent
Load balancing — client must handle
SSL certs on every service
Versioning chaos

With Gateway

Client calls ONE URL
JWT validated at the door
CORS configured centrally
Rate limiting built-in
Routes to Eureka instances (lb://)
SSL terminates here only
/v1/, /v2/ versioning managed here

application.yml — Gateway Routes

spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service       # lb:// = load balanced via Eureka
          predicates:
            - Path=/api/orders/**        # match this URL pattern
          filters:
            - StripPrefix=1            # remove /api prefix before forwarding
            - name: CircuitBreaker
              args:
                name: orderCB
                fallbackUri: forward:/fallback/orders
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 10
                redis-rate-limiter.burstCapacity: 20

        - id: payment-service
          uri: lb://payment-service
          predicates:
            - Path=/api/payments/**
            - Method=POST                 # only route POST requests
          filters:
            - AddRequestHeader=X-Gateway-Source, spring-cloud-gateway
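
To make `replenishRate` and `burstCapacity` concrete, here is a minimal token bucket in plain Java. It is a sketch of the general technique only: `TokenBucket` and its method names are hypothetical, it keeps state in memory rather than in Redis, and it should not be read as the gateway's actual implementation.

```java
/** Toy token bucket; time is passed in explicitly so behaviour is deterministic. */
final class TokenBucket {
    private final double replenishPerSecond; // tokens added per second
    private final double burstCapacity;      // maximum tokens the bucket can hold
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double replenishPerSecond, double burstCapacity, long nowMillis) {
        this.replenishPerSecond = replenishPerSecond;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity; // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    /** Consumes one token and returns true if the request is allowed at nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        tokens = Math.min(burstCapacity, tokens + elapsedSeconds * replenishPerSecond);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // a gateway would answer 429 Too Many Requests here
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(10, 20, 0); // same numbers as the route config
        int allowed = 0;
        for (int i = 0; i < 25; i++) {
            if (bucket.tryAcquire(0)) {
                allowed++;
            }
        }
        System.out.println("allowed in the first instant: " + allowed);
    }
}
```

With these numbers a client can burst up to 20 requests at once, then settles at roughly 10 per second as tokens trickle back in.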

AuthFilter.java — runs on EVERY request

@Component
public class AuthFilter implements GlobalFilter, Ordered {

  @Autowired private JwtUtil jwtUtil;

  @Override
  public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
    String path = exchange.getRequest().getPath().value();  // value() is the documented accessor for the full path

    // Skip auth for public endpoints
    if (path.startsWith("/api/public")) {
      return chain.filter(exchange);
    }

    String authHeader = exchange.getRequest().getHeaders().getFirst("Authorization");
    if (authHeader == null || !authHeader.startsWith("Bearer ")) {
      exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
      return exchange.getResponse().setComplete();
    }

    String token = authHeader.substring(7);
    if (!jwtUtil.isValid(token)) {
      exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
      return exchange.getResponse().setComplete();
    }
    // Optionally forward userId to downstream services
    ServerHttpRequest mutated = exchange.getRequest().mutate()
      .header("X-User-Id", jwtUtil.extractUserId(token))
      .build();
    return chain.filter(exchange.mutate().request(mutated).build());
  }

  @Override public int getOrder() { return -1; }  // run first
}

FallbackController.java — graceful degradation

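import java.time.Instant;
import java.util.Map;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FallbackController {

  // @RequestMapping rather than @GetMapping: the gateway forwards the original
  // request with its HTTP method intact, so POST and PUT must match too.
  @RequestMapping("/fallback/orders")
  public ResponseEntity<?> ordersFallback() {
    return ResponseEntity
      .status(HttpStatus.SERVICE_UNAVAILABLE)
      .body(Map.of(
        "status", "degraded",
        "message", "Order service temporarily unavailable. Try again shortly.",
        "timestamp", Instant.now()
      ));
  }

  // The payment route only accepts POST, so a GET-only mapping would never be hit.
  @RequestMapping("/fallback/payments")
  public ResponseEntity<?> paymentsFallback() {
    return ResponseEntity.accepted()
      .body(Map.of("message", "Payment queued. You'll receive confirmation via email."));
  }
}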

🧠

Memory Hook — Airport Security

"Gateway = airport security. ONE entrance. lb://service-name = load-balanced route via Eureka. GlobalFilter = security scanner all passengers pass through. Circuit Breaker at gateway level = if a terminal is broken, redirect to waiting area (fallback)."

Chapter 5

Config Server — City Hall

🏛️

City Hall Analogy

ANALOGY → City Hall. All city laws (DB URLs, timeouts, feature flags, passwords) live in one building. Change a law once → every building follows it. You don't need to rebuild a building when the law changes. Spring Cloud Config + @RefreshScope = change config in Git, trigger a refresh, and services pick up the new values without a restart.

ConfigServerApplication.java

@SpringBootApplication
@EnableConfigServer           // ← That's the whole server. One annotation.
public class ConfigServerApplication { ... }

# application.yml
server:
  port: 8888
spring:
  cloud:
    config:
      server:
        git:
          uri: https://github.com/your-org/config-repo
          search-paths: '{application}'   # folder per service name
          default-label: main
          username: ${GIT_USERNAME}        # keep secrets in env vars
          password: ${GIT_TOKEN}

application.yml (each client service)

# Spring Boot 2.4+ and 3.x: use spring.config.import in application.yml
# Older Spring Boot 2.x (before 2.4): put the Config Server URI in bootstrap.yml instead
spring:
  config:
    import: optional:configserver:http://config-server:8888
  application:
    name: order-service      # Config Server fetches order-service.yml from Git
  profiles:
    active: dev              # fetches order-service-dev.yml

# URL pattern served by Config Server:
# /{application}/{profile}  →  order-service/dev  →  order-service-dev.yml
# /{application}/{profile}/{label}  (label = git branch)

Live refresh — no restart needed!

// 1. Annotate beans that use config values with @RefreshScope
@RefreshScope   // ← re-creates this bean when config changes
@RestController
public class OrderController {

  @Value("${order.max-items:10}")     // default value = 10
  private int maxItemsPerOrder;

  @Value("${feature.express-delivery:false}")
  private boolean expressDeliveryEnabled;
}

// 2. Update the value in Git repo
// 3. Call this endpoint on each running instance (NO restart needed):
//    POST http://order-service:8081/actuator/refresh
// 4. @RefreshScope beans on that instance are recreated with new values ✅

# Enable in application.yml:
management:
  endpoints:
    web:
      exposure:
        include: refresh, health, info, metrics

Git repo structure (config-repo/)

config-repo/
├── application.yml              # shared by ALL services
├── order-service/
│   ├── order-service.yml        # order service defaults
│   ├── order-service-dev.yml    # dev profile overrides
│   └── order-service-prod.yml   # prod profile overrides
├── payment-service/
│   ├── payment-service.yml
│   └── payment-service-prod.yml
└── inventory-service/
    └── inventory-service.yml

# Profile precedence (highest to lowest):
# order-service-prod.yml  →  order-service.yml  →  application.yml
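
The precedence rule above can be sketched as a small merge in plain Java. This is an illustration of the lookup order only, under the assumption that properties are flat key/value pairs; `ConfigPrecedence` and its methods are hypothetical names, not part of Spring Cloud Config.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "most specific file wins" for one application and profile. */
final class ConfigPrecedence {

    /** File names consulted for an application/profile, highest precedence first. */
    static List<String> sources(String application, String profile) {
        return List.of(
            application + "-" + profile + ".yml", // profile overrides
            application + ".yml",                 // service defaults
            "application.yml");                   // shared by all services
    }

    /** Merges property maps given highest precedence first: the first value seen for a key wins. */
    static Map<String, String> merge(List<Map<String, String>> highestFirst) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> source : highestFirst) {
            source.forEach(merged::putIfAbsent);
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(sources("order-service", "prod"));
        Map<String, String> effective = merge(List.of(
            Map.of("order.max-items", "50"),                                   // order-service-prod.yml
            Map.of("order.max-items", "10", "feature.express-delivery", "false"), // order-service.yml
            Map.of("logging.level.root", "INFO")));                            // application.yml
        System.out.println(effective.get("order.max-items"));
    }
}
```

The prod file only needs to list the keys it overrides; everything else falls through to the service defaults and then to the shared file.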

🧠

Memory Hook — City Hall

"Config Server = City Hall. All laws in one Git repo. Service fetches its own file by application.name. @RefreshScope + POST /actuator/refresh = update the law live, no demolishing and rebuilding the building."

Chapter 6

Circuit Breaker — The Electrical Fuse

⚡

Why You NEED This — The Cascade Failure Story

PROBLEM: Payment Service is slow. Order Service calls it and every call blocks for up to 30s. At 5,000 req/min the blocked calls pile up → Order Service runs out of threads → Order Service itself crashes → API Gateway can't reach Order Service → entire system down. ONE slow service brought down EVERYTHING. This is cascading failure.

SOLUTION — Electrical Fuse: When too many wires short-circuit (calls fail), the fuse OPENS and stops all current. The house doesn't burn down. After a cool-down, it cautiously tests if the wire is fixed (HALF-OPEN). That is Resilience4J Circuit Breaker.

The 3 states — every interview will ask this

CLOSED: normal flow, failures counted
  ── 50% of calls fail ──▶ OPEN

OPEN: fail fast, fallback returned
  ── wait 10s ──▶ HALF-OPEN

HALF-OPEN: 3 test calls
  ── success ──▶ CLOSED
  ── fail ──▶ OPEN
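
The three states can be written down as a small state machine in plain Java. This is a simplified sketch, not Resilience4j's implementation: `MiniCircuitBreaker` is a hypothetical class, time is passed in by the caller, and in HALF-OPEN a single failed trial call re-opens the breaker, whereas a real library may evaluate a failure rate over the trial calls instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy count-based circuit breaker; time is passed in so transitions are deterministic. */
final class MiniCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final double failureRateThreshold; // 0.5 means 50%
    private final long openWaitMillis;
    private final int halfOpenTrialCalls;

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAtMillis;
    private int halfOpenSuccesses;

    MiniCircuitBreaker(int windowSize, double failureRateThreshold,
                       long openWaitMillis, int halfOpenTrialCalls) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openWaitMillis = openWaitMillis;
        this.halfOpenTrialCalls = halfOpenTrialCalls;
    }

    State state() {
        return state;
    }

    /** Should the protected call be attempted at nowMillis? False means "fail fast, use the fallback". */
    boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= openWaitMillis) {
            state = State.HALF_OPEN; // cool-down over: let trial calls through
            halfOpenSuccesses = 0;
        }
        return state != State.OPEN;
    }

    /** Records the outcome of a call that was allowed through. */
    void record(boolean success, long nowMillis) {
        if (state == State.HALF_OPEN) {
            if (!success) {
                open(nowMillis);       // still broken: back to OPEN
            } else if (++halfOpenSuccesses >= halfOpenTrialCalls) {
                state = State.CLOSED;  // recovered
                window.clear();
            }
            return;
        }
        if (state == State.OPEN) {
            return; // late result from a call started before the breaker opened
        }
        window.addLast(!success);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        long failures = window.stream().filter(failed -> failed).count();
        if (window.size() == windowSize && failures >= failureRateThreshold * windowSize) {
            open(nowMillis);
        }
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAtMillis = nowMillis;
        window.clear();
    }

    public static void main(String[] args) {
        MiniCircuitBreaker breaker = new MiniCircuitBreaker(10, 0.5, 10_000, 3);
        for (int i = 0; i < 10; i++) {
            breaker.record(i % 2 == 0, 0); // alternate success and failure
        }
        System.out.println(breaker.state());
    }
}
```

A caller would check `allowRequest` before each remote call, skip straight to the fallback when it returns false, and report every real outcome through `record`.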

OrderService.java — Resilience4J (NOT Hystrix — Hystrix is deprecated!)

@Service
public class OrderService {

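  // Default Resilience4j aspect order: Retry wraps CircuitBreaker wraps TimeLimiter.
  // Retry is outermost, so every attempt is recorded by the circuit breaker.
  // The fallback sits on @Retry so it runs only after the last attempt has failed.
  @Retry(name = "paymentService", fallbackMethod = "paymentFallback")  // up to 3 attempts in total
  @CircuitBreaker(name = "paymentService")
  @TimeLimiter(name = "paymentService")     // timeout after 3s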
  public CompletableFuture<String> processPayment(Order order) {
    return CompletableFuture.supplyAsync(() ->
      paymentClient.charge(order.getId(), order.getAmount()));
  }

  // ALWAYS provide a fallback. Never fail silently.
  // Method signature must match + add Exception parameter
  public CompletableFuture<String> paymentFallback(Order order, Exception e) {
    log.error("Payment failed, queuing for retry: {}", e.getMessage());
    paymentQueue.enqueue(order);  // save to retry later
    return CompletableFuture.completedFuture(
      "Order confirmed. Payment will be processed shortly.");
  }
}

application.yml — Resilience4J config

resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        sliding-window-size: 10           # evaluate last 10 calls
        failure-rate-threshold: 50        # open if ≥50% fail
        wait-duration-in-open-state: 10s   # stay open 10 seconds
        permitted-number-of-calls-in-half-open-state: 3
        slow-call-rate-threshold: 80      # also open if 80% are slow
        slow-call-duration-threshold: 2s

  retry:
    instances:
      paymentService:
        max-attempts: 3
        wait-duration: 500ms
        retry-exceptions:
          - java.io.IOException
          - java.util.concurrent.TimeoutException

  timelimiter:
    instances:
      paymentService:
        timeout-duration: 3s               # fail fast after 3 seconds
        cancel-running-future: true
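
What `max-attempts: 3` with a fixed `wait-duration` amounts to can be sketched in a few lines of plain Java. This is a hand-rolled illustration, not Resilience4j's API: `SimpleRetry.call` is a hypothetical helper, and it retries on any `RuntimeException` rather than on a configured exception list.

```java
import java.util.function.Supplier;

/** Toy fixed-delay retry: up to maxAttempts calls in total, the first one included. */
final class SimpleRetry {

    static <T> T call(Supplier<T> supplier, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return supplier.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(waitMillis); // wait before the next attempt, not after the last one
                }
            }
        }
        throw lastFailure; // every attempt failed: surface the last error to the caller
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("payment timeout");
            }
            return "charged";
        }, 3, 500);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Note that three attempts means one original call plus two retries, which is easy to misread as "three retries".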

Bulkhead — separate thread pools per dependency

// BULKHEAD ANALOGY: Ship compartments — if one floods, ship doesn't sink
// Without bulkhead: slow Payment hogs ALL threads → Inventory calls also fail
// With bulkhead: Payment has its OWN pool → Inventory pool unaffected

@Bulkhead(name = "paymentService", type = Bulkhead.Type.THREADPOOL)
@CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
public CompletableFuture<String> processPayment(Order order) {
  return CompletableFuture.supplyAsync(() ->
    paymentClient.charge(order.getId(), order.getAmount()));  // same charge(id, amount) call as in OrderService
}

# application.yml
resilience4j:
  thread-pool-bulkhead:
    instances:
      paymentService:
        max-thread-pool-size: 10       # Payment gets max 10 threads
        core-thread-pool-size: 5
        queue-capacity: 20
      inventoryService:
        max-thread-pool-size: 15      # Inventory gets its own pool
        core-thread-pool-size: 8
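
The isolation idea can be shown without Spring. The sketch below swaps in a semaphore (a cap on concurrent calls) instead of the dedicated thread pools configured above, because it is shorter and deterministic; `SimpleBulkhead` is a hypothetical class, not the Resilience4j type.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Toy semaphore bulkhead: each dependency gets its own fixed number of concurrent calls. */
final class SimpleBulkhead {
    private final Semaphore permits;

    SimpleBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /**
     * Runs the call if a permit is free; returns Optional.empty() when this compartment is full.
     * The call must return a non-null value.
     */
    <T> Optional<T> tryCall(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            return Optional.empty(); // rejected immediately instead of queueing behind slow calls
        }
        try {
            return Optional.of(call.get());
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        SimpleBulkhead payment = new SimpleBulkhead(10);   // Payment gets at most 10 concurrent calls
        SimpleBulkhead inventory = new SimpleBulkhead(15); // Inventory has its own, separate limit
        System.out.println(payment.tryCall(() -> "charged").orElse("rejected"));
        System.out.println(inventory.tryCall(() -> "reserved").orElse("rejected"));
    }
}
```

Because the two bulkheads share nothing, exhausting the payment permits has no effect on inventory calls, which is the ship-compartment property the analogy describes.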

🧠

Memory Hook — Electrical Fuse

"Circuit Breaker = electrical fuse. CLOSED (normal), OPEN (blown, fail fast), HALF-OPEN (testing recovery). Use Resilience4J — Hystrix is dead. Always write a fallback. Bulkhead = ship compartments = separate thread pools."

Chapter 7

Async Messaging — The Post Office

📞 REST / Feign — Phone Call

You dial, you wait for answer
Both parties must be available
Use: Real-time queries ("is item in stock?")
Failure: if callee is down → caller fails
Tight coupling between services

📬 Kafka / RabbitMQ — Post Office

Drop a letter in the box, walk away
Recipient reads when ready
Use: Events ("order placed → notify all")
Failure: message queued, delivered later
Loose coupling — sender doesn't know receivers

OrderService.java — Kafka Producer

@Service
public class OrderService {

  @Autowired private KafkaTemplate<String, OrderEvent> kafkaTemplate;

  public Order placeOrder(OrderRequest request) {
    Order order = orderRepository.save(buildOrder(request));

    // Publish event — returns IMMEDIATELY. Doesn't wait for anyone.
    OrderEvent event = new OrderEvent(order.getId(), order.getAmount(),
                                     order.getSkuCode(), order.getUserId());
    kafkaTemplate.send("order-placed-topic", event);

    // Order Service is DONE. It doesn't know or care that:
    // - Inventory Service will reserve the stock
    // - Notification Service will send confirmation email
    // - Analytics Service will update dashboards
    // They ALL get this event independently. Loose coupling!
    return order;
  }
}
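
Why every interested service sees the same event can be sketched with an append-only log and one read offset per consumer group. This is a toy model, not the Kafka client API: `MiniTopic` is a hypothetical class with no partitions, persistence, or networking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy append-only topic with one read offset per consumer group. */
final class MiniTopic<E> {
    private final List<E> log = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    /** Producer side: append and return immediately; no consumer is involved. */
    void send(E event) {
        log.add(event);
    }

    /** Consumer side: everything this group has not read yet; advances only that group's offset. */
    List<E> poll(String groupId) {
        int from = offsets.getOrDefault(groupId, 0);
        List<E> batch = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(groupId, log.size());
        return batch;
    }

    public static void main(String[] args) {
        MiniTopic<String> orderPlaced = new MiniTopic<>();
        orderPlaced.send("order-1");
        System.out.println(orderPlaced.poll("notification-group"));
        System.out.println(orderPlaced.poll("inventory-group"));
    }
}
```

The producer never learns who read the event; adding an analytics consumer later is just a new group id reading the same log from its own offset.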

NotificationService.java — Kafka Consumer

@Service
public class NotificationService {

  @KafkaListener(topics = "order-placed-topic", groupId = "notification-group")
  public void handleOrderPlaced(OrderEvent event) {
    // ⚠️ CRITICAL: Kafka guarantees at-least-once delivery.
    // The SAME message may arrive TWICE (e.g., after a consumer crash).
    // Your consumer MUST be idempotent!

    if (processedEventRepo.exists(event.getOrderId())) {
      log.info("Duplicate event, skipping: {}", event.getOrderId());
      return;  // already processed — idempotent check!
    }

    emailService.sendConfirmation(event.getUserId(), event.getOrderId());
    processedEventRepo.markProcessed(event.getOrderId());
  }
}

// InventoryService also consumes the same event, independently:
@KafkaListener(topics = "order-placed-topic", groupId = "inventory-group")
public void reserveStock(OrderEvent event) {
  inventoryService.reserve(event.getSkuCode(), event.getQuantity());
}
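
The idempotency check can be exercised without Kafka. In this sketch an in-memory set stands in for `processedEventRepo` and a list stands in for `emailService`; `IdempotentNotifier` and its fields are hypothetical, and unlike the listener above it marks the event before the side effect so that check-and-mark is a single step.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy idempotent consumer: a redelivered event produces no second email. */
final class IdempotentNotifier {
    private final Set<String> processedOrderIds = new HashSet<>(); // stand-in for processedEventRepo
    private final List<String> sentEmails = new ArrayList<>();     // stand-in for emailService

    /** Returns true if the event was processed, false if it was a duplicate. */
    boolean handleOrderPlaced(String orderId, String userId) {
        // Set.add returns false when the id is already present, so checking
        // and marking happen in one step with no gap between them.
        if (!processedOrderIds.add(orderId)) {
            return false;
        }
        sentEmails.add("confirmation:" + userId + ":" + orderId);
        return true;
    }

    List<String> sentEmails() {
        return List.copyOf(sentEmails);
    }

    public static void main(String[] args) {
        IdempotentNotifier notifier = new IdempotentNotifier();
        System.out.println(notifier.handleOrderPlaced("order-1", "user-7"));
        System.out.println(notifier.handleOrderPlaced("order-1", "user-7")); // same event delivered again
        System.out.println(notifier.sentEmails().size());
    }
}
```

In a real service the processed-id record lives in a database, and the ordering of "mark" and "send" decides whether a crash in between loses an email or repeats one; this sketch does not settle that trade-off.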

RabbitMQ (alternative to Kafka)

// RabbitMQ: better for task queues, complex routing
// Kafka: better for event streaming, high throughput, replay

// Producer
@Autowired private RabbitTemplate rabbitTemplate;

rabbitTemplate.convertAndSend("order.exchange", "order.placed", event);

// Consumer
@RabbitListener(queues = "order.notification.queue")
public void handleOrder(OrderEvent event) {
  emailService.send(event);
}

// Config
@Bean
public Queue orderQueue() { return new Queue("order.notification.queue"); }

@Bean
public TopicExchange exchange() { return new TopicExchange("order.exchange"); }

@Bean
public Binding binding(Queue q, TopicExchange e) {
  return BindingBuilder.bind(q).to(e).with("order.#");
}

application.yml — Kafka config

spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
      acks: all              # wait for all replicas before success
      retries: 3
    consumer:
      group-id: notification-group
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
      properties:
        spring.json.trusted.packages: "*"   # fine for a demo; restrict to your own event package in production
        max.poll.records: 50

🧠

Memory Hook — Post Office

"Kafka = Post Office. Producer drops the letter (kafkaTemplate.send). Consumer picks it up when ready (@KafkaListener). Idempotent consumer = processing the same letter twice gives the same result. At-least-once delivery = design for duplicates. Different groupId = different consumers EACH get their own copy."

Chapter 8 — Senior Level

Advanced Patterns — Saga, CQRS, Event Sourcing

These patterns separate junior from senior in every interview. Learn the WHY first.

🎭

Saga Pattern — Distributed Transactions

ANALOGY → Booking a wedding. Venue confirms → Caterer books → Photographer reserves. If Photographer cancels, you must call Caterer and Venue to cancel too. These are compensating transactions. No single wedding coordinator has a COMMIT button for all three.

Problem: In microservices, you can't do BEGIN TRANSACTION across 3 different databases. If Order DB commits but Payment DB fails, the two databases disagree and you are left with an inconsistent state.

Choreography Saga — event chain, no central coordinator

// Step 1: Order Service creates order and publishes event
kafkaTemplate.send("order-created", new OrderCreatedEvent(order.getId()));

// Step 2: Payment Service listens and processes
@KafkaListener(topics = "order-created")
public void onOrderCreated(OrderCreatedEvent e) {
  try {
    paymentService.charge(e.getOrderId());
    kafkaTemplate.send("payment-completed", e);      // success
  } catch (Exception ex) {
    kafkaTemplate.send("payment-failed", e);          // compensate!
  }
}

// Step 3: Order Service listens to payment-failed → compensate
@KafkaListener(topics = "payment-failed")
public void onPaymentFailed(OrderCreatedEvent e) {
  orderService.cancel(e.getOrderId());  // COMPENSATING TRANSACTION
  kafkaTemplate.send("order-cancelled", e);
}

Orchestration Saga — central coordinator tells each step what to do

// Orchestrator knows the full flow. Easier to visualise and debug.
@Service
public class OrderSagaOrchestrator {

  public void startOrderSaga(Order order) {
    try {
      // Step 1
      inventoryClient.reserve(order.getSkuCode(), order.getQty());
      try {
        // Step 2
        paymentClient.charge(order.getId(), order.getAmount());
        try {
          // Step 3
          notificationClient.sendConfirmation(order.getUserId());
          orderService.markCompleted(order.getId());
        } catch(Exception e3) {
          // Step 3 failed — compensate steps 1 and 2
          paymentClient.refund(order.getId());
          inventoryClient.release(order.getSkuCode());
          orderService.markFailed(order.getId());
        }
      } catch(Exception e2) {
        inventoryClient.release(order.getSkuCode()); // compensate step 1
        orderService.markFailed(order.getId());      // record the failure, as the other branches do
      }
    } catch(Exception e1) {
      orderService.markFailed(order.getId());
    }
  }
}
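
The nested try/catch blocks grow with every step. One way to flatten them, sketched here in plain Java, is to pair each action with its compensation and undo completed steps in reverse order on failure. `SagaRunner` and `Step` are hypothetical names for this illustration, not part of any Spring library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy saga runner: on failure, completed steps are compensated newest-first. */
final class SagaRunner {

    /** One saga step: an action plus the compensation that undoes it. */
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable compensation;

        Step(String name, Runnable action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    /** Runs the steps in order and returns true if all succeeded, false if the saga was rolled back. */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run(); // undo in reverse order
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = run(List.of(
            new Step("inventory", () -> log.add("reserve"), () -> log.add("release")),
            new Step("payment", () -> log.add("charge"), () -> log.add("refund")),
            new Step("notification",
                () -> { throw new IllegalStateException("mail server down"); },
                () -> { })));
        System.out.println(ok + " " + log);
    }
}
```

A production orchestrator would also persist saga state and cope with compensations that fail themselves; this sketch covers only the ordering.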

📖

CQRS — Command Query Responsibility Segregation

ANALOGY → Hospital with separate departments. Emergency Room handles urgent writes (Commands — fast, transactional, strict). Diagnostic Lab handles detailed reads (Queries — complex, slow, can be eventually consistent). Don't run blood tests in the ER.

Problem: Your order history dashboard needs 5 JOINs across different tables. These complex reads are killing your write performance. CQRS: separate the write model (normalised, optimised for consistency) from the read model (denormalised, optimised for performance, possibly in Elasticsearch).

CQRS — Split controllers

// COMMAND side — writes to MySQL (strong consistency)
@RestController
public class OrderCommandController {
  @PostMapping("/orders")
  public ResponseEntity<?> placeOrder(@RequestBody CreateOrderCommand cmd) {
    orderCommandService.handle(cmd);          // writes to SQL DB
    kafkaTemplate.send("order-events", cmd);   // publish for read model sync
    return ResponseEntity.accepted().build();
  }
}

// QUERY side — reads from Elasticsearch (fast, denormalised)
@RestController
public class OrderQueryController {
  @GetMapping("/orders/{userId}/history")
  public List<OrderSummary> getHistory(@PathVariable String userId) {
    return orderQueryService.findByUser(userId); // reads Elasticsearch
  }
}
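
The split between the two models, and the delay between them, can be sketched in memory. Here plain maps stand in for the SQL write store and the Elasticsearch read store, and a list stands in for the event topic; `MiniCqrs` and its members are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy CQRS: commands hit the write model, queries hit a separately updated read model. */
final class MiniCqrs {

    /** Event emitted by the write side after an order is stored. */
    static final class OrderPlaced {
        final String orderId;
        final String userId;
        final long amount;

        OrderPlaced(String orderId, String userId, long amount) {
            this.orderId = orderId;
            this.userId = userId;
            this.amount = amount;
        }
    }

    private final Map<String, OrderPlaced> ordersById = new HashMap<>();     // write model
    private final Map<String, List<String>> historyByUser = new HashMap<>(); // read model, denormalised
    private final List<OrderPlaced> pendingEvents = new ArrayList<>();       // stand-in for the topic

    /** Command side: store the order and publish an event; the read model is not touched here. */
    void placeOrder(String orderId, String userId, long amount) {
        OrderPlaced event = new OrderPlaced(orderId, userId, amount);
        ordersById.put(orderId, event);
        pendingEvents.add(event);
    }

    /** Projection: applies pending events to the read model (asynchronous in a real system). */
    void project() {
        for (OrderPlaced event : pendingEvents) {
            historyByUser.computeIfAbsent(event.userId, k -> new ArrayList<>())
                         .add(event.orderId + ":" + event.amount);
        }
        pendingEvents.clear();
    }

    /** Query side: reads only the read model, no joins against the write model. */
    List<String> history(String userId) {
        return List.copyOf(historyByUser.getOrDefault(userId, List.of()));
    }

    public static void main(String[] args) {
        MiniCqrs shop = new MiniCqrs();
        shop.placeOrder("order-1", "user-7", 500);
        System.out.println(shop.history("user-7")); // projection has not run yet
        shop.project();
        System.out.println(shop.history("user-7"));
    }
}
```

The window between `placeOrder` and `project` is the eventual consistency the analogy mentions: a query issued in that window does not see the new order yet.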

🧾

Event Sourcing — The Bank Ledger

ANALOGY → A bank's accounting ledger. You never erase old entries. You only ADD new entries (Deposit +₹500, Withdraw -₹200). Current balance = replay all entries. Complete audit trail. You can time-travel to any point in history.

Instead of storing current state (balance: ₹300), you store every event (AccountCreated, Deposited-500, Withdrew-200). To get current state → replay all events. Benefits: full audit trail, time-travel debugging, natural fit with CQRS. Axon Framework is one option for implementing this in Spring Boot.
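
The ledger example can be replayed in a few lines of plain Java. This is a minimal sketch of the idea, not Axon's API: `LedgerReplay` and `AccountEvent` are hypothetical names, and events are modelled as functions from the old balance to the new one, whereas a real event store would persist them as typed records.

```java
import java.util.List;

/** Toy event-sourced account: the balance is never stored, only derived by replaying events. */
final class LedgerReplay {

    /** An immutable fact about the account; applying it yields the next balance. */
    interface AccountEvent {
        long apply(long balance);
    }

    static AccountEvent accountCreated() {
        return balance -> 0L;
    }

    static AccountEvent deposited(long amount) {
        return balance -> balance + amount;
    }

    static AccountEvent withdrew(long amount) {
        return balance -> balance - amount;
    }

    /** Balance after the first eventCount events: "time travel" to that point in the ledger. */
    static long balanceAfter(List<AccountEvent> events, int eventCount) {
        long balance = 0L;
        for (AccountEvent event : events.subList(0, eventCount)) {
            balance = event.apply(balance);
        }
        return balance;
    }

    /** Current balance: replay the whole ledger. */
    static long balance(List<AccountEvent> events) {
        return balanceAfter(events, events.size());
    }

    public static void main(String[] args) {
        List<AccountEvent> ledger = List.of(accountCreated(), deposited(500), withdrew(200));
        System.out.println(balance(ledger));         // 300
        System.out.println(balanceAfter(ledger, 2)); // 500
    }
}
```

Replaying only a prefix of the ledger is what makes the audit and time-travel benefits possible: any past state is one fold away.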

Chapter 9

Security — JWT + OAuth2

🎫

Two Analogies You'll Never Forget

JWT = Hotel Key Card. Security desk verifies your ID once and gives you a signed key card. You use that card to open any room without going back to the desk. Card has expiry. Can't be forged (it's cryptographically signed).

OAuth2 = Passport System. Your passport (issued by Keycloak / Okta = Government) lets you enter any country (service) that trusts that Government. Service validates the signature without calling the Government on every request.
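
The "signed key card" property can be demonstrated with the JDK alone. The sketch below is not a real JWT (no header, no expiry, no claims parsing) and `SignedToken` is a hypothetical class; it only shows how an HMAC signature lets a service detect a forged or altered payload without calling the issuer. In production a maintained JWT library should do this work.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Toy signed token: base64url(payload) + "." + base64url(HMAC-SHA256 signature). */
final class SignedToken {
    private static final String ALGORITHM = "HmacSHA256";

    static String issue(String payload, byte[] secret) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();
        return encoder.encodeToString(body) + "." + encoder.encodeToString(sign(body, secret));
    }

    /** True only if the signature matches the payload under this secret. */
    static boolean verify(String token, byte[] secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 2) {
            return false;
        }
        try {
            Base64.Decoder decoder = Base64.getUrlDecoder();
            byte[] body = decoder.decode(parts[0]);
            byte[] signature = decoder.decode(parts[1]);
            // MessageDigest.isEqual compares without leaking timing information.
            return MessageDigest.isEqual(sign(body, secret), signature);
        } catch (IllegalArgumentException e) {
            return false; // not valid Base64
        }
    }

    private static byte[] sign(byte[] body, byte[] secret) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            return mac.doFinal(body);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] secret = "demo-secret-change-me".getBytes(StandardCharsets.UTF_8);
        String token = issue("userId=42;role=USER", secret);
        System.out.println(token);
        System.out.println(verify(token, secret));
    }
}
```

With a shared secret every verifier could also issue tokens, which is one reason the OAuth2 setups in this chapter rely on public keys (the JWKS endpoint) instead.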

JwtAuthFilter.java

@Component
public class JwtAuthFilter extends OncePerRequestFilter {

  @Override
  protected void doFilterInternal(HttpServletRequest req, ...) {
    String header = req.getHeader("Authorization");

    if (header != null && header.startsWith("Bearer ")) {
      String token = header.substring(7);

      if (jwtUtil.validateToken(token)) {
        String userId = jwtUtil.extractUserId(token);
        List<GrantedAuthority> roles = jwtUtil.extractRoles(token);

        // ⚠️ CRITICAL: NEVER trust userId from request body!
        // Always extract from the VALIDATED JWT claims.
        // A malicious caller can put any userId in the body.
        var auth = new UsernamePasswordAuthenticationToken(userId, null, roles);
        SecurityContextHolder.getContext().setAuthentication(auth);
      }
    }
    filterChain.doFilter(req, res);
  }
}

SecurityConfig.java — Spring Boot 3 OAuth2 Resource Server

@Configuration
public class SecurityConfig {

  @Bean
  SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
    http
      .csrf(csrf -> csrf.disable())
      .sessionManagement(s -> s.sessionCreationPolicy(
        SessionCreationPolicy.STATELESS))       // No sessions with JWT!
      .authorizeHttpRequests(auth -> auth
        .requestMatchers("/api/public/**").permitAll()
        .requestMatchers("/api/admin/**").hasRole("ADMIN")
        .anyRequest().authenticated())
      .oauth2ResourceServer(oauth2 ->
        oauth2.jwt(jwt ->
          jwt.jwtAuthenticationConverter(jwtAuthenticationConverter())));
    return http.build();
  }
}

# application.yml — points to Keycloak / your Auth Server JWKS
spring:
  security:
    oauth2:
      resourceserver:
        jwt:
          jwk-set-uri: http://keycloak:8080/realms/myapp/protocol/openid-connect/certs

Service-to-Service Auth — Client Credentials Flow (no user involved)

// Service-to-service: use OAuth2 Client Credentials grant
// No user. Service authenticates AS ITSELF to get a token.
// Spring Security OAuth2 Client handles this automatically with WebClient

@Bean
public WebClient paymentWebClient(OAuth2AuthorizedClientManager clientManager) {
  ServletOAuth2AuthorizedClientExchangeFilterFunction oauth2 =
    new ServletOAuth2AuthorizedClientExchangeFilterFunction(clientManager);
  oauth2.setDefaultClientRegistrationId("payment-service");
  return WebClient.builder()
    .baseUrl("http://payment-service")
    .apply(oauth2.oauth2Configuration())
    .build();
}

# application.yml — register as OAuth2 client
spring:
  security:
    oauth2:
      client:
        registration:
          payment-service:
            authorization-grant-type: client_credentials
            client-id: order-service
            client-secret: ${ORDER_SERVICE_SECRET}
        provider:
          payment-service:
            token-uri: http://keycloak:8080/realms/myapp/protocol/openid-connect/token

Chapter 10

Observability — The Three Pillars

You can't manage what you can't see. In a distributed system, observability is not optional — it's how you sleep at night.

📋

Logs

What happened? Structured JSON logs with Trace IDs. Stack: ELK (Elasticsearch + Logstash + Kibana) or Grafana Loki. Search all service logs from one UI.

📊

Metrics

How is it performing? Micrometer → Prometheus → Grafana dashboards. CPU, heap, request rate, error rate, circuit breaker state, custom counters.

🔍

Traces

Why is it slow? Zipkin / Jaeger. One Trace ID follows a request across all services. Visualise: Gateway (200ms) → Order (50ms) → Payment (120ms). Instant bottleneck detection.

DISTRIBUTED TRACING ANALOGY: A courier tracking number. One package (HTTP request) crosses 5 warehouses (services). The tracking number (Trace ID) shows every checkpoint, every delay, every handoff. Spring Cloud Sleuth (Spring Boot 2.x) or Micrometer Tracing (Spring Boot 3.x) injects this automatically into every log line and HTTP header.
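
The tracking-number idea reduces to one rule: reuse the incoming trace ID, or create one at the edge, and pass it on. The sketch below shows that rule in plain Java; `TracePropagation` and the `X-Trace-Id` header name are assumptions for this illustration, while real tracers use standard headers such as W3C `traceparent` or B3 and also track spans.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Toy trace propagation: one ID travels with the request through every hop. */
final class TracePropagation {
    static final String TRACE_HEADER = "X-Trace-Id"; // hypothetical header name for this sketch

    /** Reuses the incoming trace ID, or starts a new trace if the request has none. */
    static Map<String, String> withTraceId(Map<String, String> incomingHeaders) {
        Map<String, String> headers = new HashMap<>(incomingHeaders);
        headers.computeIfAbsent(TRACE_HEADER, k -> UUID.randomUUID().toString());
        return headers;
    }

    /** Simulates one service: log with the trace ID, then return the headers for the next hop. */
    static Map<String, String> handle(String service, Map<String, String> incomingHeaders,
                                      List<String> logSink) {
        Map<String, String> headers = withTraceId(incomingHeaders);
        logSink.add("[" + headers.get(TRACE_HEADER) + "] " + service + " handled request");
        return headers;
    }

    public static void main(String[] args) {
        List<String> logs = new ArrayList<>();
        Map<String, String> afterGateway = handle("gateway", Map.of(), logs);
        Map<String, String> afterOrders = handle("order-service", afterGateway, logs);
        handle("payment-service", afterOrders, logs);
        logs.forEach(System.out::println); // every line carries the same trace ID
    }
}
```

Searching the central log store for that one ID then returns the whole journey of the request, in order, across all three services.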

application.yml — observability stack config

management:
  endpoints:
    web:
      exposure:
        include: health, info, metrics, prometheus, circuitbreakers
  metrics:
    tags:
      application: ${spring.application.name}  # tag all metrics with service name
  tracing:
    sampling:
      probability: 1.0   # sample 100% in dev (0.1 in prod)
  zipkin:
    tracing:
      endpoint: http://zipkin:9411/api/v2/spans   # Zipkin endpoint; kept under the single management: key, since YAML keys must not repeat

# Log pattern that includes the trace ID (plain text; use a JSON encoder if ELK expects structured logs)
logging:
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} [%X{traceId}] %-5level %logger{36} - %msg%n"

Chapter 11

Memory Palace — Instant Recall System

One walk through your city and you'll never forget the stack. Guaranteed.

🏙️ The SACRED CITY — walk this every morning

Service Discovery — Eureka

The city Signboard. Find any building by name, not address. @EnableEurekaServer. @FeignClient(name="svc").

API Gateway — Spring Cloud Gateway

Airport Security. ONE entrance. Auth + Rate Limit + Route. lb://service-name = load-balanced via Eureka.

Config Server — Spring Cloud Config

City Hall. All laws in one Git repo. @RefreshScope + POST /actuator/refresh = update without restart.

Resilience — Circuit Breaker + Bulkhead

Electrical fuse + Ship compartments. CLOSED → OPEN → HALF-OPEN. Resilience4J not Hystrix.

Events — Kafka / RabbitMQ

Post Office. At-least-once delivery. Idempotent consumers. Different groupId = each consumer gets own copy.

Distributed Tracing — Zipkin + Sleuth

CCTV network with timestamps. One TraceId follows the request across all services. On Spring Boot 3, Micrometer Tracing takes over Sleuth's role.

Flash Cards — Test Yourself

Eureka server annotation?

@EnableEurekaServer

💡 "Enable the Yellow Pages"

Circuit breaker 3 states?

CLOSED → OPEN → HALF-OPEN

💡 Normal → Blown → Testing

Feign client annotation?

@FeignClient(name="svc")

💡 "Feign = pretend it's local"

Config refresh no restart?

@RefreshScope + POST /actuator/refresh

💡 "Update law, all buildings notified"

Circuit breaker tool 2026?

Resilience4J (NOT Hystrix!)

💡 Hystrix has been in maintenance mode since 2018

Kafka delivery guarantee?

At-least-once → be idempotent

💡 Letter may arrive twice

Gateway load-balanced URI?

lb://service-name

💡 lb = load balancer, instances looked up in Eureka

JWT session policy?

SessionCreationPolicy.STATELESS

💡 Key card = no front desk needed

Saga compensates how?

Undo each previous step on failure

💡 Wedding cancellation chain

CQRS splits what?

Write model vs Read model

💡 ER vs Diagnostic Lab

Bulkhead pattern purpose?

Separate thread pools per service

💡 Ship compartments

Event Sourcing stores?

Events (facts), not current state

💡 Bank ledger entries, not balance

Component	Tool	The Analogy
Service Registry	Netflix Eureka	Yellow Pages directory
API Gateway	Spring Cloud Gateway	Airport security + routing
Config Management	Spring Cloud Config + Git	City Hall (laws change once)
Circuit Breaker	Resilience4J	Electrical fuse
Bulkhead	Resilience4J ThreadPool	Ship compartments
Declarative HTTP	OpenFeign	Looks like a local method call
Async Messaging	Apache Kafka	Post office (fire and forget)
Auth Token	JWT + Keycloak	Hotel key card + passport
Distributed Tracing	Zipkin + Sleuth (Micrometer Tracing on Boot 3)	Parcel tracking number
Metrics	Micrometer + Prometheus + Grafana	City health dashboard
Distributed Transactions	Saga Pattern (Axon)	Wedding booking chain
Read/Write Separation	CQRS	ER vs Diagnostic Lab

Chapter 12

Interview Q&A — Senior Level

Questions that commonly come up in senior Java interviews, with the depth of answer interviewers usually look for.


[Core] Monolith vs Microservices: when would you NOT use Microservices?

A monolith is a single deployable unit; microservices are independently deployable services communicating over a network. The answer interviewers want to hear: Microservices add a "distributed systems tax" — network latency between services, no ACID transactions across service boundaries, significant operational complexity (container orchestration, distributed tracing, service meshes, eventual consistency). You would NOT use microservices for a small team, an early-stage startup, or when the complexity of distribution outweighs the benefits. The right answer is often to start with a well-structured modular monolith and extract services only when you hit real team-size or scaling problems. Conway's Law: your architecture mirrors your team structure.

[Core] Explain Service Discovery. Client-side vs Server-side?

Client-side discovery (Eureka + Spring Cloud LoadBalancer): The client queries the registry, gets a list of healthy instances, and picks one itself using a load balancing algorithm. The client does the routing. Spring Boot: @FeignClient(name="service") + Eureka = client-side. Server-side discovery (Kubernetes Service / AWS ALB): A load balancer sits between client and services. Client calls one stable DNS name; the infrastructure routes to a healthy instance. Client doesn't know about individual instances. Neither is universally better — K8s environments naturally use server-side, Spring Cloud on bare metal/AWS uses client-side. Eureka heartbeat: services register on startup, send heartbeat every 30s, removed if heartbeat stops.
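The client-side half of this answer can be sketched in plain Java: an in-memory registry that records heartbeats, drops instances whose lease has expired, and lets the caller pick an instance round-robin. This is an illustration only, not Eureka's or Spring Cloud LoadBalancer's code; the class name, method names, and the explicit `nowMillis` clock parameter are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ClientSideDiscoverySketch {
    /** Stand-in for the registry: instance address -> time of last heartbeat (ms). */
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();
    private final long leaseMillis;

    ClientSideDiscoverySketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Called once on startup (registration) and again on every heartbeat. */
    void heartbeat(String instance, long nowMillis) {
        lastHeartbeat.put(instance, nowMillis);
    }

    /** Instances whose lease has not expired, in a stable sorted order. */
    List<String> healthyInstances(long nowMillis) {
        List<String> healthy = new ArrayList<>();
        lastHeartbeat.forEach((instance, seenAt) -> {
            if (nowMillis - seenAt <= leaseMillis) healthy.add(instance);
        });
        healthy.sort(String::compareTo);
        return healthy;
    }

    /** The client does the routing itself: round-robin over healthy instances. */
    String choose(long nowMillis) {
        List<String> healthy = healthyInstances(nowMillis);
        if (healthy.isEmpty()) throw new IllegalStateException("no healthy instance");
        return healthy.get(Math.floorMod(counter.getAndIncrement(), healthy.size()));
    }
}
```

With two registered instances, consecutive `choose` calls alternate between them; once one stops sending heartbeats and its lease runs out, only the other is returned. Server-side discovery moves exactly this selection logic out of the client and into the load balancer.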

[Core] Explain Circuit Breaker. What are its 3 states?

CLOSED — normal operation, all calls go through, failures counted in a sliding window (last N calls). OPEN — failure rate threshold exceeded; all calls immediately return the fallback without even calling the downstream service; this prevents cascading failure. HALF-OPEN — after the configured wait duration, a limited number of test calls are made; if they succeed, circuit closes again; if they fail, it re-opens. Use Resilience4J — Hystrix has been in maintenance mode since 2018 and should not be used in new code. Always provide a fallback method — returning a cached response, a sensible default, or queuing for retry is far better than propagating a 500 error up the chain.
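The three states can be sketched as a small state machine in plain Java. This is a simplified illustration, not Resilience4j's implementation: it opens after a number of consecutive failures rather than a failure rate over a sliding window, it allows a single probe in HALF-OPEN, and the class name and the explicit `nowMillis` clock parameter are assumptions.

```java
import java.util.function.Supplier;

final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening (simplification)
    private final long openMillis;        // how long to stay OPEN before allowing a probe
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    CircuitBreakerSketch(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized State state() {
        return state;
    }

    /** Runs the call, or returns the fallback immediately while the circuit is OPEN. */
    synchronized <T> T call(Supplier<T> action, T fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < openMillis) return fallback;  // fail fast, downstream not called
            state = State.HALF_OPEN;                                  // wait elapsed: allow one probe
        }
        try {
            T result = action.get();
            failures = 0;
            state = State.CLOSED;       // success (including a successful probe) closes the circuit
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;     // failed probe or threshold reached: (re)open
                openedAt = nowMillis;
            }
            return fallback;
        }
    }
}
```

The key behaviour to notice is the first branch: while OPEN, the supplier is never invoked, which is what stops a struggling downstream from being hammered by retries.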

[Design] How do you handle distributed transactions across microservices?

Two-Phase Commit (2PC) is rarely practical across microservices: it is a blocking protocol, every participant has to support it (typically via XA), and a slow or failed coordinator leaves resources locked. The usual pattern is Saga. Two styles: Choreography, where each service publishes events and others react; decoupled, but the overall flow is implicit and hard to visualise. Orchestration, where a central Saga Orchestrator coordinates each step in sequence; easier to debug, monitor, and reason about. The key concept is compensating transactions: each step has an "undo" operation. Design the business flow so every step is reversible. Axon Framework provides production-grade Saga support in Spring Boot with built-in state management and replay. Key interview answer: Embrace eventual consistency rather than fighting it. Design operations to be idempotent so retries are safe.
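The orchestration style with compensating transactions can be sketched in plain Java as follows. This is an in-process illustration, not Axon's API: the `Step` record, the `run` method, and the step names used below are hypothetical, and a real orchestrator would also persist saga state between steps.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class SagaOrchestratorSketch {
    /** One saga step: a forward action plus the compensation that undoes it. */
    record Step(String name, Runnable action, Runnable compensation) {}

    /**
     * Runs the steps in order. If one fails, the already completed steps are
     * compensated in reverse order. Returns true only if every step succeeded.
     */
    static boolean run(List<Step> steps, List<String> log) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                log.add("done:" + step.name());
                completed.push(step);
            } catch (RuntimeException e) {
                log.add("failed:" + step.name());
                while (!completed.isEmpty()) {
                    Step undo = completed.pop();       // most recent step first
                    undo.compensation().run();
                    log.add("compensated:" + undo.name());
                }
                return false;
            }
        }
        return true;
    }
}
```

For example, a saga with the steps `reserveStock` then `chargePayment`, where the charge throws, leaves the log as `done:reserveStock`, `failed:chargePayment`, `compensated:reserveStock`: the stock reservation is released even though no cross-service transaction ever existed.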

[Spring] REST vs Event-Driven communication: when do you choose each?

REST/Feign (synchronous): Use when you need an immediate response — "Is this item in stock before I confirm the order?" Both services must be alive. Tighter coupling. Simpler to debug with standard HTTP tooling. Kafka/Messaging (asynchronous): Use when you don't need an immediate response — "Order placed, notify inventory, send email, update analytics." Loose coupling — sender doesn't know or care about consumers. Message persisted so consumer can be offline. High throughput. Critical interview point: Design Kafka consumers to be idempotent because Kafka guarantees at-least-once delivery — the same message may arrive twice after a consumer crash or rebalance. Processing it twice must produce the same result. Check if you've already processed an event ID before acting on it.
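The "check the event ID before acting" advice can be sketched in plain Java. This is a minimal in-memory illustration with hypothetical names (`onOrderCreated`, `stockOf`); no Kafka client is involved, and a real consumer would keep the processed IDs in a database table with a unique key instead of a `HashSet`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IdempotentConsumerSketch {
    // Assumption: in production this set is a table with a unique constraint on eventId.
    private final Set<String> processedEventIds = new HashSet<>();
    private final Map<String, Integer> stock = new HashMap<>();

    IdempotentConsumerSketch(String sku, int initialStock) {
        stock.put(sku, initialStock);
    }

    /** Returns true if the event was applied, false if it was a redelivered duplicate. */
    synchronized boolean onOrderCreated(String eventId, String sku, int quantity) {
        if (!processedEventIds.add(eventId)) {
            return false;                              // already seen: do nothing
        }
        stock.merge(sku, -quantity, Integer::sum);     // apply the side effect exactly once
        return true;
    }

    synchronized int stockOf(String sku) {
        return stock.getOrDefault(sku, 0);
    }
}
```

Delivering the same event twice reduces stock only once. In a real service the deduplication record and the stock update should be committed in the same database transaction, otherwise a crash between the two reintroduces the duplicate.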

[Advanced] What is CQRS and when would you use it?

Command Query Responsibility Segregation separates the write model (optimised for consistency and validation, usually normalised SQL) from the read model (optimised for query performance, possibly denormalised, possibly in Elasticsearch or MongoDB). Use CQRS when: read and write loads are dramatically different (e.g. 100:1 read-to-write ratio), read queries require complex joins that contend with writes, or you need to scale reads independently. The read model stays eventually consistent with the write model via domain events. Tradeoff to explicitly mention: eventual consistency — the read model may lag slightly behind. This is acceptable for reporting/dashboards but not for "is this ticket still available?" Pair with Event Sourcing for the most powerful combination — writes emit events, reads consume them and build their own denormalised views.
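A minimal plain-Java sketch of the split, assuming an event-fed read model: the write side validates a command and appends an event, and a separate denormalised view is built by applying those events later. All class and method names are hypothetical, and both sides live in memory here, whereas in practice they would sit in separate stores connected by a message bus.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CqrsSketch {
    /** Domain event emitted by the write side. */
    record OrderPlaced(String orderId, String customerId, long amountCents) {}

    /** Write side: validates commands and appends events. */
    static final class OrderCommandHandler {
        private final List<OrderPlaced> eventLog = new ArrayList<>();

        OrderPlaced placeOrder(String orderId, String customerId, long amountCents) {
            if (amountCents <= 0) throw new IllegalArgumentException("amount must be positive");
            OrderPlaced event = new OrderPlaced(orderId, customerId, amountCents);
            eventLog.add(event);
            return event;
        }

        List<OrderPlaced> events() {
            return List.copyOf(eventLog);
        }
    }

    /** Read side: a denormalised view that is only as fresh as the events applied so far. */
    static final class CustomerSpendView {
        private final Map<String, Long> totalByCustomer = new HashMap<>();

        void apply(OrderPlaced event) {
            totalByCustomer.merge(event.customerId(), event.amountCents(), Long::sum);
        }

        long totalFor(String customerId) {
            return totalByCustomer.getOrDefault(customerId, 0L);
        }
    }
}
```

Until the view has applied the events, `totalFor` still returns the old value. That gap is the eventual-consistency tradeoff described above, made visible in a few lines.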

[Spring] How do you secure microservices? Explain JWT and OAuth2.

Three layers: Edge security — API Gateway handles JWT validation, rate limiting, HTTPS termination. Service-to-service — OAuth2 Client Credentials flow (service authenticates as itself, no user involved) or mutual TLS. Per-service — each service is a Resource Server, validates the JWT locally using the public key from the JWKS endpoint (no round-trip to Auth Server on every request). JWT = header.payload.signature. Stateless — no server-side session. Services verify the signature using the public key. Critical rule: Never trust the userId from the request body — always extract it from the validated JWT claims. Keycloak or Spring Authorization Server as the identity provider. Use SessionCreationPolicy.STATELESS — no HttpSession with JWT-based auth.
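The header.payload.signature structure and the "take the user from validated claims" rule can be sketched with the JDK alone. Two loud assumptions: this sketch signs with a shared secret (HS256), whereas the setup described above verifies with a public key fetched from a JWKS endpoint, and it pulls the `sub` claim out with a regular expression instead of a JSON parser. In a Spring service this work is done by the resource server support, not by hand-written code like this.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class JwtStructureSketch {
    private static final Base64.Encoder ENC = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DEC = Base64.getUrlDecoder();

    static byte[] hmacSha256(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds header.payload.signature for a JSON payload (helper for this sketch only). */
    static String sign(String payloadJson, byte[] secret) {
        String header = ENC.encodeToString(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = ENC.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signature = ENC.encodeToString(hmacSha256(header + "." + payload, secret));
        return header + "." + payload + "." + signature;
    }

    /** Verifies the signature first, then returns the "sub" claim from the payload. */
    static String verifiedSubject(String token, byte[] secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) throw new IllegalArgumentException("not a JWT");
        byte[] expected = hmacSha256(parts[0] + "." + parts[1], secret);
        if (!MessageDigest.isEqual(expected, DEC.decode(parts[2]))) {
            throw new SecurityException("bad signature");
        }
        String payloadJson = new String(DEC.decode(parts[1]), StandardCharsets.UTF_8);
        Matcher m = Pattern.compile("\"sub\"\\s*:\\s*\"([^\"]*)\"").matcher(payloadJson);
        if (!m.find()) throw new IllegalArgumentException("no sub claim");
        return m.group(1);
    }
}
```

Swapping the payload for one that claims a different `sub` while keeping the old signature makes `verifiedSubject` throw, which is why a user ID read from verified claims can be trusted and one read from the request body cannot. A real validator would also check expiry, issuer, and audience.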

[Design] How do you trace a request across microservices?

A tracing library (Spring Cloud Sleuth on Spring Boot 2.x, Micrometer Tracing on Spring Boot 3) automatically attaches a Trace ID (unique per original request) and a Span ID (unique per unit of work, such as one service call) to the logging context and to outgoing HTTP headers. When Service A calls Service B, the Trace ID propagates in the request header, so both services' log lines carry the same ID. Zipkin or Jaeger aggregates these spans and renders the complete call tree with timing: API Gateway (200ms) → Order Service (50ms) → Inventory Service (30ms) → Payment Service (120ms). That view pinpoints the bottleneck immediately. In interviews, always mention the importance of including the Trace ID in your application logs so you can search all service logs in ELK/Loki for a single request. Also mention the sampling rate (100% in dev, ~10% in prod to manage volume).

[Design] Design an Order Management System using microservices.

Services: Order Service (create/track orders, MySQL), Inventory Service (stock management, MySQL), Payment Service (charge/refund, MySQL), Notification Service (email/SMS, stateless), User Service (auth, MySQL). Infrastructure: API Gateway (Spring Cloud Gateway), Eureka (service discovery), Config Server (Git-backed), Kafka (event bus), Redis (cart caching, session). Flow: POST /orders → Gateway validates JWT → Order Service creates order (PENDING) → publishes OrderCreated event → Inventory Service reserves stock (or publishes StockInsufficient) → Payment Service charges (or publishes PaymentFailed) → Notification Service sends confirmation. Resilience: Circuit breaker on all Feign calls, Saga pattern for distributed transaction, outbox pattern to ensure events are published reliably. Observability: Zipkin traces, Prometheus metrics, Grafana dashboards, ELK for logs.
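The outbox pattern mentioned under Resilience is the least self-explanatory part of this design, so here is a minimal in-memory sketch. The names (`OutboxRow`, `createOrder`, `relay`) are hypothetical, and a synchronized method stands in for the database transaction that would write the order row and the outbox row together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class OutboxSketch {
    record OutboxRow(long id, String type, String orderId) {}

    private final Map<String, String> orders = new LinkedHashMap<>();  // orderId -> status
    private final List<OutboxRow> outbox = new ArrayList<>();          // events not yet published
    private long nextId = 1;

    /** One "transaction": the order and its OrderCreated event are stored together. */
    synchronized void createOrder(String orderId) {
        if (orders.containsKey(orderId)) {
            throw new IllegalStateException("duplicate order " + orderId);
        }
        orders.put(orderId, "PENDING");
        outbox.add(new OutboxRow(nextId++, "OrderCreated", orderId));
    }

    /** Relay: publish pending rows in order; a row is removed only after publishing succeeds. */
    synchronized int relay(Consumer<OutboxRow> publisher) {
        int published = 0;
        while (!outbox.isEmpty()) {
            publisher.accept(outbox.get(0));   // if this throws, the row stays for the next run
            outbox.remove(0);
            published++;
        }
        return published;
    }

    synchronized String statusOf(String orderId) {
        return orders.get(orderId);
    }

    synchronized int pending() {
        return outbox.size();
    }
}
```

If the broker is down, the order still exists and its event waits in the outbox until a later relay run succeeds. A crash after publishing but before removing the row leads to a second publish, which is one more reason the consumers in this design need to be idempotent.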

[Advanced] What is the Bulkhead pattern and how does it differ from Circuit Breaker?

Circuit Breaker: Prevents calls to a failing downstream service. Detects failure rate and stops making calls temporarily. Protects against cascading failure due to a downstream being down or slow. Bulkhead: Separates thread pools per downstream dependency. Prevents one slow downstream from exhausting the ENTIRE application's thread pool. Even if the circuit is closed (service is up but slow), bulkhead ensures Inventory calls won't be starved by slow Payment calls. Analogy: Circuit breaker is the fuse (cuts power when there's a short). Bulkhead is the ship compartment (flood in one room doesn't sink the whole ship). Use both together: Bulkhead limits thread pool per service → Circuit Breaker opens when failure threshold is hit → Fallback returns a sensible default. This combination makes your service resilient to any downstream misbehaviour.
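The isolation idea can be sketched in plain Java. Note the swapped-in technique: the answer above describes separate thread pools, while this sketch caps concurrent calls per dependency with a `Semaphore`, which is the simpler of the two bulkhead variants. The class and method names are hypothetical and this is not Resilience4j's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class BulkheadSketch {
    private final Map<String, Semaphore> slotsByDependency = new ConcurrentHashMap<>();
    private final int maxConcurrentPerDependency;

    BulkheadSketch(int maxConcurrentPerDependency) {
        this.maxConcurrentPerDependency = maxConcurrentPerDependency;
    }

    /** Runs the call if the dependency has a free slot, otherwise returns the fallback at once. */
    <T> T call(String dependency, Supplier<T> action, T fallback) {
        Semaphore slots = slotsByDependency.computeIfAbsent(
                dependency, d -> new Semaphore(maxConcurrentPerDependency));
        if (!slots.tryAcquire()) {
            return fallback;            // this dependency is saturated; others are unaffected
        }
        try {
            return action.get();
        } finally {
            slots.release();            // always free the slot, even if the call throws
        }
    }
}
```

With a limit of one, a second call to `payment` made while the first is still running gets the fallback, while a call to `inventory` in the same moment goes through. That per-dependency budget is what keeps slow Payment calls from starving Inventory calls.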

From Java Developer to Microservices Architect
