Spring Batch — Multi-Threaded Step (Parallel Processing Deep Dive)

Processing large volumes quickly often requires parallelism. Spring Batch supports parallel execution via multi-threaded steps (TaskExecutor), partitioning, and split/flow. Choosing and implementing the correct approach requires understanding thread-safety, transaction boundaries, chunk semantics, and where shared state can cause subtle bugs.


📌 What you’ll learn

  • Difference: multi-threaded step vs partitioning
  • How to configure a multi-threaded step with TaskExecutor
  • Thread-safety rules for readers, processors, and writers
  • Transactional and chunk implications
  • Performance tuning: chunk size, pool size, and commit intervals
  • Common pitfalls and best practices for production

🧠 Multi-threaded Step vs Partitioning — Quick Comparison

| Aspect | Multi-threaded Step | Partitioning |
| --- | --- | --- |
| Concurrency model | Multiple threads process the same step (shared Step instance) | Master creates partitions; each partition runs a separate step (can be remote) |
| Use case | Medium datasets where reader/writer are thread-safe | Large datasets, heavy processing, or when data can be split by key |
| Isolation | Shared memory; must handle thread-safety | Each partition can have isolated resources & transactions |
| Complexity | Lower to configure | Higher; can scale across machines |

🔧 Configure a Multi-threaded Step (TaskExecutor)

The simplest way to run a step with multiple threads is to provide a TaskExecutor to the step builder. The example below uses ThreadPoolTaskExecutor.

@Bean
public Step multiThreadedStep() {
  return stepBuilderFactory.get("multiThreadedStep")
    .<Input, Output>chunk(50)
    .reader(threadSafeReader())
    .processor(threadSafeProcessor())
    .writer(threadSafeWriter())
    .taskExecutor(taskExecutor())
    .throttleLimit(10)   // maximum concurrently running tasks
    .build();
}

@Bean
public TaskExecutor taskExecutor() {
  ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
  exec.setCorePoolSize(10);
  exec.setMaxPoolSize(20);
  exec.setQueueCapacity(50);
  exec.setThreadNamePrefix("batch-thread-");
  exec.initialize();
  return exec;
}

Note: throttleLimit(int) caps how many chunks may execute concurrently (the default is 4, so without it most of the pool goes unused). Set it in line with your TaskExecutor pool size; in Spring Batch 5 it is deprecated in favor of bounding the TaskExecutor itself.
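The reader is usually the hardest piece to make thread-safe: stateful readers such as FlatFileItemReader keep a cursor and must not be shared across threads as-is. One sketch, using Spring Batch's SynchronizedItemStreamReader to serialize read() calls while processing and writing stay parallel (the Input type, field names, and file path are placeholders, not from this article):

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
public SynchronizedItemStreamReader<Input> threadSafeReader() {
  // FlatFileItemReader is stateful (it tracks the current line) and therefore
  // not safe to share between chunk threads.
  FlatFileItemReader<Input> delegate = new FlatFileItemReaderBuilder<Input>()
      .name("inputReader")
      .resource(new FileSystemResource("data/input.csv"))  // placeholder path
      .delimited()
      .names("id", "value")                                // placeholder fields
      .targetType(Input.class)
      .build();

  // The wrapper synchronizes read() (and open/update/close), so each thread
  // receives a distinct item while the delegate's internal state stays consistent.
  SynchronizedItemStreamReader<Input> reader = new SynchronizedItemStreamReader<>();
  reader.setDelegate(delegate);
  return reader;
}
```

Reads become a serialized section, so this pays off most when processing and writing dominate each chunk's cost.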


🧩 Deep Dive: ThreadPoolTaskExecutor configuration explained

Below is a production-grade, line-by-line explanation of the ThreadPoolTaskExecutor bean and how each property affects runtime behavior, queuing, and scaling.

@Bean
public TaskExecutor taskExecutor() {
  ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
  exec.setCorePoolSize(10);
  exec.setMaxPoolSize(20);
  exec.setQueueCapacity(50);
  exec.setThreadNamePrefix("batch-thread-");
  exec.initialize();
  return exec;
}

๐Ÿ” 1) Core Pool Size — setCorePoolSize(10)

- The core pool size is the number of threads that are kept alive and ready to process tasks even when idle. - When the step starts, Spring will immediately use up to 10 threads to process chunks in parallel. - These threads are long-lived — they reduce startup latency and keep throughput steady.

๐Ÿ” 2) Max Pool Size — setMaxPoolSize(20)

- This is the absolute ceiling of concurrent threads the executor may create. - Extra threads are created only when the queue is full and all core threads are busy. - Execution order for capacity expansion:

1) Use up to corePoolSize threads
2) If core threads busy → tasks are placed in the queue
3) If queue fills → create extra threads up to maxPoolSize
4) If queue is full and max threads reached → tasks are rejected (TaskRejectedException)
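ThreadPoolTaskExecutor delegates to java.util.concurrent.ThreadPoolExecutor, so the four-step expansion order above can be observed directly with the JDK class. A minimal sketch, scaled down to core=2, max=4, queue=2 (the class and method names are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {

    // Saturates a pool shaped like the bean above (scaled down) and returns
    // {poolSize, queuedTasks, rejectedCount} at the moment of saturation.
    static int[] saturate() throws InterruptedException {
        ThreadPoolExecutor exec = new ThreadPoolExecutor(
                2, 4, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try { release.await(); } catch (InterruptedException ignored) { }
        };

        // Tasks 1-2 occupy the core threads; tasks 3-4 wait in the queue;
        // tasks 5-6 force extra threads up to maxPoolSize.
        for (int i = 0; i < 6; i++) {
            exec.execute(blocker);
        }

        int rejected = 0;
        try {
            exec.execute(blocker);  // queue full AND max threads busy -> rejected
        } catch (RejectedExecutionException e) {
            rejected = 1;
        }

        int[] result = {exec.getPoolSize(), exec.getQueue().size(), rejected};
        release.countDown();
        exec.shutdown();
        exec.awaitTermination(10, TimeUnit.SECONDS);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = saturate();
        System.out.printf("pool=%d queued=%d rejected=%d%n", r[0], r[1], r[2]);
        // prints: pool=4 queued=2 rejected=1
    }
}
```

Spring's ThreadPoolTaskExecutor wraps the seventh task's RejectedExecutionException in a TaskRejectedException, which is what surfaces in a saturated step.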

๐Ÿ” 3) Queue Capacity — setQueueCapacity(50)

- The queue holds tasks waiting to be executed when all core threads are busy. - A large queue means the pool is less likely to create extra threads (maxPoolSize may never be reached). - A small queue forces the executor to create threads more aggressively, which can cause thread churn.

Guidance:

  • If your workload is IO-bound, a larger queue helps smooth bursts (e.g., corePoolSize × 3).
  • If your workload is CPU-bound, keep the queue smaller to avoid long waiting delays (e.g., corePoolSize × 1).
  • If queue fills and tasks are rejected, you'll see TaskRejectedException — treat that as a signal to increase capacity or throttle job submissions.

๐Ÿ” 4) Thread Name Prefix — setThreadNamePrefix("batch-thread-")

- Gives each thread a readable name such as batch-thread-1, which is invaluable for debugging and log correlation. - Use meaningful prefixes in production for easier searching/filtering in logs and thread dumps.

๐Ÿ” 5) initialize() — eager initialization

- Calling initialize() prepares the underlying ThreadPoolExecutor and thread factory beforehand. - Without it, initialization may be lazy at first task submission — which can make timing and monitoring inconsistent.


⚙️ Runtime Behavior Summary

Incoming task (chunk)
│
├── If core threads (10) available → execute immediately
│
├── Else → push into queue (up to 50 waiting chunks)
│
├── If queue is full → create new threads (up to 20 max)
│
└── If max threads used AND queue full → reject task → possible Step failure

Key takeaway: queueCapacity largely controls whether maxPoolSize is ever used. Tune both together — they interact in non-obvious ways.


⚠ Important Spring Batch Considerations

  • Transactions: Each chunk is committed in the thread that handled it. More threads mean more concurrent transactions — ensure your DB can handle that.
  • Connection pool sizing: size HikariCP (or your pool) so it can serve every concurrent chunk transaction:
    spring.datasource.hikari.maximum-pool-size >= number of concurrent chunk threads (up to maxPoolSize), plus headroom for non-batch usage
  • Idempotency: Because chunks may be retried/rolled back and reprocessed, make external side-effects idempotent (upserts, outbox).
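A hedged application.properties sketch for the executor above (maxPoolSize = 20); the exact numbers depend on your non-batch connection usage and the database's connection limits:

```properties
# Cover every concurrent chunk transaction (up to maxPoolSize = 20)
# plus headroom for web/non-batch traffic. Values are illustrative.
spring.datasource.hikari.maximum-pool-size=25
# Fail fast if the pool is exhausted instead of stalling chunk threads.
spring.datasource.hikari.connection-timeout=30000
```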

⚙️ Recommended Tuning Patterns

Use these starting suggestions and iterate based on metrics:

  • CPU-bound:
    corePoolSize = numberOfCores
    maxPoolSize  = numberOfCores + 2
    queueCapacity = corePoolSize × 1
  • I/O-bound:
    corePoolSize = numberOfCores × 2
    maxPoolSize  = numberOfCores × 4
    queueCapacity = corePoolSize × 3
  • Mixed:
    corePoolSize = numberOfCores
    maxPoolSize  = numberOfCores × 2
    queueCapacity = corePoolSize × 2
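These starting points can be encoded as a small helper so the numbers track the machine the job actually runs on. A sketch (the suggest method and profile labels are this article's conventions, not a Spring API):

```java
public class PoolSizing {

    // Returns {corePoolSize, maxPoolSize, queueCapacity} using the starting
    // suggestions above. "cpu", "io", and "mixed" are this article's labels.
    static int[] suggest(int cores, String profile) {
        switch (profile) {
            case "cpu":  return new int[]{cores, cores + 2, cores};          // queue = core × 1
            case "io":   return new int[]{cores * 2, cores * 4, cores * 6};  // queue = core × 3
            default:     return new int[]{cores, cores * 2, cores * 2};      // mixed: queue = core × 2
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int[] s = suggest(cores, "io");
        System.out.printf("core=%d max=%d queue=%d%n", s[0], s[1], s[2]);
    }
}
```

Treat the result as a first guess to refine with metrics, not a final answer.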

🧪 How to Verify Parallelism — Quick checks

log.info("Processing {} on {}", item.getId(), Thread.currentThread().getName());

If multiple distinct thread names appear in the logs, the step is executing in parallel. If you see only one thread name, your executor or throttleLimit configuration is not taking effect.



🔀 When to prefer Partitioning over Multi-threaded Step

  • When input can be split by key (date ranges, ID ranges)
  • When you need horizontal scalability across nodes
  • When isolation of resources per partition reduces contention
  • When readers are expensive and easier to duplicate per partition

ASCII overview:

Master Step
  |
  +-- Partition 1 (Step on thread/process) -> reads slice A -> processes -> writes
  +-- Partition 2 (Step on thread/process) -> reads slice B -> processes -> writes
  ...
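The slicing itself is plain arithmetic: a Spring Batch Partitioner implementation would compute ranges like these and place each one into a per-partition ExecutionContext (e.g. as minValue/maxValue keys for a range-bound reader). A standalone sketch of the range computation (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class IdRangeSlicer {

    // Splits the inclusive ID range [minId, maxId] into gridSize contiguous
    // slices; each {start, end} pair would seed one partition's ExecutionContext.
    static List<long[]> slice(long minId, long maxId, int gridSize) {
        List<long[]> ranges = new ArrayList<>();
        long span = maxId - minId + 1;
        long size = (span + gridSize - 1) / gridSize;  // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new long[]{start, Math.min(start + size - 1, maxId)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : slice(1, 100, 4)) {
            System.out.println(r[0] + " - " + r[1]);
        }
        // prints: 1 - 25, 26 - 50, 51 - 75, 76 - 100 (one range per line)
    }
}
```

Because each slice touches disjoint rows, partitions avoid the shared-state concerns that dominate the multi-threaded step.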

✅ Best Practices Checklist

  • Prefer stateless processors and writers that use transactional resources provided per thread.
  • Use partitioning when input can be sliced or when scaling across machines is required.
  • Make side-effects idempotent or use an outbox pattern.
  • Tune chunk size and pool size incrementally and measure.
  • Ensure datasource and external services can handle concurrency.
  • Use metrics (timers, DB pool usage, queue size) to guide tuning.