Spring Batch — Multi-Threaded Step (Parallel Processing Deep Dive)

Processing large volumes quickly often requires parallelism. Spring Batch supports parallel execution via multi-threaded steps (TaskExecutor), partitioning, and split/flow. Choosing and implementing the correct approach requires understanding thread-safety, transaction boundaries, chunk semantics, and where shared state can cause subtle bugs.


📌 What you’ll learn

  • Difference: multi-threaded step vs partitioning
  • How to configure a multi-threaded step with TaskExecutor
  • Thread-safety rules for readers, processors, and writers
  • Transactional and chunk implications
  • Performance tuning: chunk size, pool size, and commit intervals
  • Common pitfalls and best practices for production

🧠 Multi-threaded Step vs Partitioning — Quick Comparison

| Aspect | Multi-threaded Step | Partitioning |
| --- | --- | --- |
| Concurrency model | Multiple threads process the same step (shared Step instance) | Master creates partitions; each partition runs a separate step (can be remote) |
| Use case | Medium datasets where reader/writer are thread-safe | Large datasets, heavy processing, or when data can be split by key |
| Isolation | Shared memory; must handle thread-safety | Each partition can have isolated resources & transactions |
| Complexity | Lower to configure | Higher; can scale across machines |

🔧 Configure a Multi-threaded Step (TaskExecutor)

The simplest way to run a step with multiple threads is to provide a TaskExecutor to the step builder. The example below uses ThreadPoolTaskExecutor.

@Bean
public Step multiThreadedStep() {
  return stepBuilderFactory.get("multiThreadedStep")
    .<Input, Output>chunk(50)
    .reader(threadSafeReader())
    .processor(threadSafeProcessor())
    .writer(threadSafeWriter())
    .taskExecutor(taskExecutor())
    .throttleLimit(10)   // maximum concurrently running tasks
    .build();
}

@Bean
public TaskExecutor taskExecutor() {
  ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
  exec.setCorePoolSize(10);
  exec.setMaxPoolSize(20);
  exec.setQueueCapacity(50);
  exec.setThreadNamePrefix("batch-thread-");
  exec.initialize();
  return exec;
}

Note: throttleLimit(int) caps how many chunks may execute concurrently (the default is 4, so without it most of the pool goes unused). Set it in line with your TaskExecutor pool size; in Spring Batch 5 it is deprecated in favor of bounding the TaskExecutor itself.
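The reader is usually the hardest piece to make thread-safe: stateful readers such as FlatFileItemReader keep a cursor and must not be shared across threads as-is. One sketch, using Spring Batch's SynchronizedItemStreamReader to serialize read() calls while processing and writing stay parallel (the Input type, field names, and file path are placeholders, not from this article):

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
public SynchronizedItemStreamReader<Input> threadSafeReader() {
  // FlatFileItemReader is stateful (it tracks the current line) and therefore
  // not safe to share between chunk threads.
  FlatFileItemReader<Input> delegate = new FlatFileItemReaderBuilder<Input>()
      .name("inputReader")
      .resource(new FileSystemResource("data/input.csv"))  // placeholder path
      .delimited()
      .names("id", "value")                                // placeholder fields
      .targetType(Input.class)
      .build();

  // The wrapper synchronizes read() (and open/update/close), so each thread
  // receives a distinct item while the delegate's internal state stays consistent.
  SynchronizedItemStreamReader<Input> reader = new SynchronizedItemStreamReader<>();
  reader.setDelegate(delegate);
  return reader;
}
```

Reads become a serialized section, so this pays off most when processing and writing dominate each chunk's cost.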


🧩 Deep Dive: ThreadPoolTaskExecutor configuration explained

Below is a production-grade, line-by-line explanation of the ThreadPoolTaskExecutor bean and how each property affects runtime behavior, queuing, and scaling.

@Bean
public TaskExecutor taskExecutor() {
  ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
  exec.setCorePoolSize(10);
  exec.setMaxPoolSize(20);
  exec.setQueueCapacity(50);
  exec.setThreadNamePrefix("batch-thread-");
  exec.initialize();
  return exec;
}

๐Ÿ” 1) Core Pool Size — setCorePoolSize(10)

- The core pool size is the number of threads that are kept alive and ready to process tasks even when idle. - When the step starts, Spring will immediately use up to 10 threads to process chunks in parallel. - These threads are long-lived — they reduce startup latency and keep throughput steady.

๐Ÿ” 2) Max Pool Size — setMaxPoolSize(20)

- This is the absolute ceiling of concurrent threads the executor may create. - Extra threads are created only when the queue is full and all core threads are busy. - Execution order for capacity expansion:

1) Use up to corePoolSize threads
2) If core threads busy → tasks are placed in the queue
3) If queue fills → create extra threads up to maxPoolSize
4) If queue is full and max threads reached → tasks are rejected (TaskRejectedException)
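ThreadPoolTaskExecutor delegates to java.util.concurrent.ThreadPoolExecutor, so the four-step expansion order above can be observed directly with the JDK class. A minimal sketch, scaled down to core=2, max=4, queue=2 (the class and method names are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {

    // Saturates a pool shaped like the bean above (scaled down) and returns
    // {poolSize, queuedTasks, rejectedCount} at the moment of saturation.
    static int[] saturate() throws InterruptedException {
        ThreadPoolExecutor exec = new ThreadPoolExecutor(
                2, 4, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try { release.await(); } catch (InterruptedException ignored) { }
        };

        // Tasks 1-2 occupy the core threads; tasks 3-4 wait in the queue;
        // tasks 5-6 force extra threads up to maxPoolSize.
        for (int i = 0; i < 6; i++) {
            exec.execute(blocker);
        }

        int rejected = 0;
        try {
            exec.execute(blocker);  // queue full AND max threads busy -> rejected
        } catch (RejectedExecutionException e) {
            rejected = 1;
        }

        int[] result = {exec.getPoolSize(), exec.getQueue().size(), rejected};
        release.countDown();
        exec.shutdown();
        exec.awaitTermination(10, TimeUnit.SECONDS);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = saturate();
        System.out.printf("pool=%d queued=%d rejected=%d%n", r[0], r[1], r[2]);
        // prints: pool=4 queued=2 rejected=1
    }
}
```

Spring's ThreadPoolTaskExecutor wraps the seventh task's RejectedExecutionException in a TaskRejectedException, which is what surfaces in a saturated step.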

๐Ÿ” 3) Queue Capacity — setQueueCapacity(50)

- The queue holds tasks waiting to be executed when all core threads are busy. - A large queue means the pool is less likely to create extra threads (maxPoolSize may never be reached). - A small queue forces the executor to create threads more aggressively, which can cause thread churn.

Guidance:

  • If your workload is IO-bound, a larger queue helps smooth bursts (e.g., corePoolSize × 3).
  • If your workload is CPU-bound, keep the queue smaller to avoid long waiting delays (e.g., corePoolSize × 1).
  • If queue fills and tasks are rejected, you'll see TaskRejectedException — treat that as a signal to increase capacity or throttle job submissions.

๐Ÿ” 4) Thread Name Prefix — setThreadNamePrefix("batch-thread-")

- Gives each thread a readable name such as batch-thread-1, which is invaluable for debugging and log correlation. - Use meaningful prefixes in production for easier searching/filtering in logs and thread dumps.

๐Ÿ” 5) initialize() — eager initialization

- Calling initialize() prepares the underlying ThreadPoolExecutor and thread factory beforehand. - Without it, initialization may be lazy at first task submission — which can make timing and monitoring inconsistent.


⚙️ Runtime Behavior Summary

Incoming task (chunk)
│
├── If core threads (10) available → execute immediately
│
├── Else → push into queue (up to 50 waiting chunks)
│
├── If queue is full → create new threads (up to 20 max)
│
└── If max threads used AND queue full → reject task → possible Step failure

Key takeaway: queueCapacity largely controls whether maxPoolSize is ever used. Tune both together — they interact in non-obvious ways.


⚠ Important Spring Batch Considerations

  • Transactions: Each chunk is committed in the thread that handled it. More threads mean more concurrent transactions — ensure your DB can handle that.
  • Connection pool sizing: size HikariCP (or your pool) so it can serve every concurrent chunk transaction:
    spring.datasource.hikari.maximum-pool-size >= number of concurrent chunk threads (up to maxPoolSize), plus headroom for non-batch usage
  • Idempotency: Because chunks may be retried/rolled back and reprocessed, make external side-effects idempotent (upserts, outbox).
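A hedged application.properties sketch for the executor above (maxPoolSize = 20); the exact numbers depend on your non-batch connection usage and the database's connection limits:

```properties
# Cover every concurrent chunk transaction (up to maxPoolSize = 20)
# plus headroom for web/non-batch traffic. Values are illustrative.
spring.datasource.hikari.maximum-pool-size=25
# Fail fast if the pool is exhausted instead of stalling chunk threads.
spring.datasource.hikari.connection-timeout=30000
```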

⚙️ Recommended Tuning Patterns

Use these starting suggestions and iterate based on metrics:

  • CPU-bound:
    corePoolSize = numberOfCores
    maxPoolSize  = numberOfCores + 2
    queueCapacity = corePoolSize × 1
  • I/O-bound:
    corePoolSize = numberOfCores × 2
    maxPoolSize  = numberOfCores × 4
    queueCapacity = corePoolSize × 3
  • Mixed:
    corePoolSize = numberOfCores
    maxPoolSize  = numberOfCores × 2
    queueCapacity = corePoolSize × 2
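These starting points can be encoded as a small helper so the numbers track the machine the job actually runs on. A sketch (the suggest method and profile labels are this article's conventions, not a Spring API):

```java
public class PoolSizing {

    // Returns {corePoolSize, maxPoolSize, queueCapacity} using the starting
    // suggestions above. "cpu", "io", and "mixed" are this article's labels.
    static int[] suggest(int cores, String profile) {
        switch (profile) {
            case "cpu":  return new int[]{cores, cores + 2, cores};          // queue = core × 1
            case "io":   return new int[]{cores * 2, cores * 4, cores * 6};  // queue = core × 3
            default:     return new int[]{cores, cores * 2, cores * 2};      // mixed: queue = core × 2
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int[] s = suggest(cores, "io");
        System.out.printf("core=%d max=%d queue=%d%n", s[0], s[1], s[2]);
    }
}
```

Treat the result as a first guess to refine with metrics, not a final answer.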

🧪 How to Verify Parallelism — Quick checks

log.info("Processing {} on {}", item.getId(), Thread.currentThread().getName());

If multiple distinct thread names appear in the logs, the step is executing in parallel. If you see only one thread name, your executor or throttleLimit configuration is not taking effect.



🔀 When to prefer Partitioning over Multi-threaded Step

  • When input can be split by key (date ranges, ID ranges)
  • When you need horizontal scalability across nodes
  • When isolation of resources per partition reduces contention
  • When readers are expensive and easier to duplicate per partition

ASCII overview:

Master Step
  |
  +-- Partition 1 (Step on thread/process) -> reads slice A -> processes -> writes
  +-- Partition 2 (Step on thread/process) -> reads slice B -> processes -> writes
  ...
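The slicing itself is plain arithmetic: a Spring Batch Partitioner implementation would compute ranges like these and place each one into a per-partition ExecutionContext (e.g. as minValue/maxValue keys for a range-bound reader). A standalone sketch of the range computation (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class IdRangeSlicer {

    // Splits the inclusive ID range [minId, maxId] into gridSize contiguous
    // slices; each {start, end} pair would seed one partition's ExecutionContext.
    static List<long[]> slice(long minId, long maxId, int gridSize) {
        List<long[]> ranges = new ArrayList<>();
        long span = maxId - minId + 1;
        long size = (span + gridSize - 1) / gridSize;  // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new long[]{start, Math.min(start + size - 1, maxId)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : slice(1, 100, 4)) {
            System.out.println(r[0] + " - " + r[1]);
        }
        // prints: 1 - 25, 26 - 50, 51 - 75, 76 - 100 (one range per line)
    }
}
```

Because each slice touches disjoint rows, partitions avoid the shared-state concerns that dominate the multi-threaded step.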

✅ Best Practices Checklist

  • Prefer stateless processors and writers that use transactional resources provided per thread.
  • Use partitioning when input can be sliced or when scaling across machines is required.
  • Make side-effects idempotent or use an outbox pattern.
  • Tune chunk size and pool size incrementally and measure.
  • Ensure datasource and external services can handle concurrency.
  • Use metrics (timers, DB pool usage, queue size) to guide tuning.