Spring Batch — Multi-Threaded Step (Parallel Processing Deep Dive)
Processing large volumes quickly often requires parallelism. Spring Batch supports parallel execution via multi-threaded steps (TaskExecutor), partitioning, and split/flow. Choosing and implementing the correct approach requires understanding thread-safety, transaction boundaries, chunk semantics, and where shared state can cause subtle bugs.
📚 What you’ll learn
- Difference: multi-threaded step vs partitioning
- How to configure a multi-threaded step with TaskExecutor
- Thread-safety rules for readers, processors, and writers
- Transactional and chunk implications
- Performance tuning: chunk size, pool size, and commit intervals
- Common pitfalls and best practices for production
🧠 Multi-threaded Step vs Partitioning — Quick Comparison
| Aspect | Multi-threaded Step | Partitioning |
|---|---|---|
| Concurrency model | Multiple threads process the same step (shared Step instance) | Master creates partitions; each partition runs a separate step (can be remote) |
| Use case | Medium datasets where reader/writer are thread-safe | Large datasets, heavy processing, or when data can be split by key |
| Isolation | Shared memory, must handle thread-safety | Each partition can have isolated resources & transactions |
| Complexity | Lower to configure | Higher; can scale across machines |
🔧 Configure a Multi-threaded Step (TaskExecutor)
The simplest way to run a step with multiple threads is to provide a TaskExecutor to the step builder. The example below uses ThreadPoolTaskExecutor.
@Bean
public Step multiThreadedStep() {
    return stepBuilderFactory.get("multiThreadedStep")
            .<Input, Output>chunk(50)
            .reader(threadSafeReader())
            .processor(threadSafeProcessor())
            .writer(threadSafeWriter())
            .taskExecutor(taskExecutor())
            .throttleLimit(10) // maximum concurrently running chunk executions
            .build();
}
@Bean
public TaskExecutor taskExecutor() {
ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
exec.setCorePoolSize(10);
exec.setMaxPoolSize(20);
exec.setQueueCapacity(50);
exec.setThreadNamePrefix("batch-thread-");
exec.initialize();
return exec;
}
Note: throttleLimit(int) caps how many chunk executions run concurrently. It defaults to 4, so set it explicitly when your pool is larger, or the extra threads will sit idle. (In Spring Batch 5 it is deprecated in favor of bounding concurrency through the TaskExecutor itself.)
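The threadSafeReader() referenced above matters: most stateful readers (file readers, JDBC cursor readers) are not safe to share across threads, and Spring Batch provides SynchronizedItemStreamReader to wrap them. Conceptually that wrapper just serializes read() calls, which can be sketched in plain Java (UnsafeReader/SafeReader are illustrative names, not Spring API):

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

// Stand-in for a stateful, non-thread-safe reader (e.g., a cursor or file reader).
class UnsafeReader {
    private final Iterator<Integer> it;
    UnsafeReader(List<Integer> items) { this.it = items.iterator(); }
    Integer read() { return it.hasNext() ? it.next() : null; } // unsafe if called concurrently
}

// What SynchronizedItemStreamReader does conceptually: serialize access to read().
class SafeReader {
    private final UnsafeReader delegate;
    SafeReader(UnsafeReader delegate) { this.delegate = delegate; }
    synchronized Integer read() { return delegate.read(); }
}

public class SafeReaderDemo {
    public static void main(String[] args) throws Exception {
        SafeReader reader = new SafeReader(
                new UnsafeReader(IntStream.range(0, 1000).boxed().toList()));
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            pool.submit(() -> {
                Integer item;
                while ((item = reader.read()) != null) seen.add(item);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(seen.size()); // every item read exactly once -> 1000
    }
}
```

Note that synchronizing the reader makes reads sequential; the parallel win comes from processing and writing happening concurrently after the read.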
🧩 Deep Dive: ThreadPoolTaskExecutor configuration explained
Below is a production-grade, line-by-line explanation of the ThreadPoolTaskExecutor bean and how each property affects runtime behavior, queuing, and scaling.
@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
    exec.setCorePoolSize(10);
    exec.setMaxPoolSize(20);
    exec.setQueueCapacity(50);
    exec.setThreadNamePrefix("batch-thread-");
    exec.initialize();
    return exec;
}
🔹 1) Core Pool Size — setCorePoolSize(10)
- The core pool size is the number of threads kept alive and ready to process tasks, even when idle.
- When the step starts, Spring will immediately use up to 10 threads to process chunks in parallel.
- These threads are long-lived — they reduce startup latency and keep throughput steady.
🔹 2) Max Pool Size — setMaxPoolSize(20)
- This is the absolute ceiling of concurrent threads the executor may create.
- Extra threads are created only when the queue is full and all core threads are busy.
- Execution order for capacity expansion:
1) If a core thread is free → the task executes immediately
2) If core threads are busy → tasks are placed in the queue
3) If the queue fills → create extra threads up to maxPoolSize
4) If the queue is full and max threads are reached → tasks are rejected (TaskRejectedException)
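This core → queue → max → reject sequence can be observed directly with the plain java.util.concurrent executor that ThreadPoolTaskExecutor wraps; the sizes below are shrunk (core=2, max=4, queue=2) so the arithmetic is visible:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    public static void main(String[] args) throws Exception {
        // core=2, max=4, queue capacity=2: at most 4 + 2 = 6 blocked tasks can be held.
        ThreadPoolExecutor exec = new ThreadPoolExecutor(
                2, 4, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0, rejected = 0;
        for (int i = 0; i < 8; i++) {
            try {
                // Each task blocks until released, keeping every thread occupied.
                exec.execute(() -> {
                    try { release.await(); } catch (InterruptedException ignored) {}
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++; // queue full AND max pool size reached
            }
        }
        System.out.println(accepted + " accepted, " + rejected + " rejected");
        release.countDown();
        exec.shutdown();
    }
}
```

Tasks 1-2 take the core threads, 3-4 fill the queue, 5-6 trigger extra threads up to the max, and 7-8 are rejected, matching steps 1) through 4) above.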
🔹 3) Queue Capacity — setQueueCapacity(50)
- The queue holds tasks waiting to be executed when all core threads are busy.
- A large queue means the pool is less likely to create extra threads (maxPoolSize may never be reached).
- A small queue forces the executor to create threads more aggressively, which can cause thread churn.
Guidance:
- If your workload is IO-bound, a larger queue helps smooth bursts (e.g., corePoolSize × 3).
- If your workload is CPU-bound, keep the queue smaller to avoid long waiting delays (e.g., corePoolSize × 1).
- If the queue fills and tasks are rejected, you'll see TaskRejectedException — treat that as a signal to increase capacity or throttle job submissions.
🔹 4) Thread Name Prefix — setThreadNamePrefix("batch-thread-")
- Gives each thread a readable name such as batch-thread-1, which is invaluable for debugging and log correlation.
- Use meaningful prefixes in production for easier searching/filtering in logs and thread dumps.
🔹 5) initialize() — eager initialization
- Calling initialize() builds the underlying ThreadPoolExecutor and thread factory up front.
- Without it, initialization happens lazily at first task submission, which can make timing and monitoring inconsistent. (When the executor is declared as a Spring @Bean, the container calls initialize() for you via afterPropertiesSet(), but the explicit call is harmless.)
⚙️ Runtime Behavior Summary
Incoming task (chunk)
│
├── If core threads (10) available → execute immediately
│
├── Else → push into queue (up to 50 waiting chunks)
│
├── If queue is full → create new threads (up to 20 max)
│
└── If max threads used AND queue full → reject task → possible Step failure
Key takeaway: queueCapacity largely controls whether maxPoolSize is ever used. Tune both together — they interact in non-obvious ways.
⚠ Important Spring Batch Considerations
- Transactions: Each chunk is committed in the thread that handled it. More threads mean more concurrent transactions — ensure your DB can handle that.
- Connection pool sizing: Set HikariCP (or your pool) max size to at least the maximum number of concurrent writer threads, e.g. spring.datasource.hikari.maximum-pool-size ≥ maxPoolSize (or slightly higher), otherwise threads will block waiting for connections.
- Idempotency: Because chunks may be retried/rolled back and reprocessed, make external side-effects idempotent (upserts, outbox).
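A hedged illustration of why upserts make chunk retries safe: replaying the same chunk against an insert-or-replace write leaves the store unchanged, whereas a blind insert would duplicate rows. The Map below is a stand-in for a real table with a unique key, not a real ItemWriter:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IdempotentWriteDemo {
    // Upsert: insert-or-replace keyed by id, so replaying a chunk is harmless.
    static void upsertChunk(Map<Integer, String> store,
                            List<Map.Entry<Integer, String>> chunk) {
        for (Map.Entry<Integer, String> e : chunk) {
            store.put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> store = new LinkedHashMap<>();
        List<Map.Entry<Integer, String>> chunk = List.of(
                Map.entry(1, "alice"), Map.entry(2, "bob"));
        upsertChunk(store, chunk); // first attempt
        upsertChunk(store, chunk); // retry after a rollback: identical end state
        System.out.println(store.size()); // 2, not 4
    }
}
```

In SQL terms this corresponds to INSERT ... ON CONFLICT DO UPDATE (or MERGE), keyed on a business identifier.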
⚙️ Recommended Tuning Patterns
Use these starting suggestions and iterate based on metrics:
- CPU-bound:
corePoolSize = numberOfCores
maxPoolSize = numberOfCores + 2
queueCapacity = corePoolSize × 1
- I/O-bound:
corePoolSize = numberOfCores × 2
maxPoolSize = numberOfCores × 4
queueCapacity = corePoolSize × 3
- Mixed:
corePoolSize = numberOfCores
maxPoolSize = numberOfCores × 2
queueCapacity = corePoolSize × 2
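These rules of thumb can be derived at startup from Runtime.availableProcessors(); the multipliers below simply encode the table above and are starting points to measure against, not guarantees:

```java
public class PoolSizing {
    record Sizing(int core, int max, int queue) {}

    // Multipliers mirror the CPU-bound / I/O-bound / mixed suggestions above.
    static Sizing cpuBound(int cores) { return new Sizing(cores, cores + 2, cores); }
    static Sizing ioBound(int cores)  { return new Sizing(cores * 2, cores * 4, cores * 2 * 3); }
    static Sizing mixed(int cores)    { return new Sizing(cores, cores * 2, cores * 2); }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + " cpuBound=" + cpuBound(cores)
                + " ioBound=" + ioBound(cores)
                + " mixed=" + mixed(cores));
    }
}
```

Feed the chosen Sizing into setCorePoolSize / setMaxPoolSize / setQueueCapacity, then iterate based on queue depth and rejection metrics.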
🧪 How to Verify Parallelism — Quick checks
log.info("Processing {} on {}", item.getId(), Thread.currentThread().getName());
If multiple distinct thread names appear in logs, the step is executing in parallel. If you see only one thread name, your executor or throttleLimit is not effective.
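The same check can be scripted outside of Spring Batch: submit work to a pool and count distinct thread names. More than one distinct name confirms work is actually spread across threads (this sketch uses a plain ExecutorService, not a Step):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelismCheck {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Set<String> names = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try { start.await(); } catch (InterruptedException ignored) {}
                // Each blocked task pins one worker, so 4 distinct names are recorded.
                names.add(Thread.currentThread().getName());
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(names.size() > 1); // true -> genuinely parallel threads
    }
}
```

In a real job, grep the logs for the batch-thread- prefix instead and count the distinct suffixes.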
🚀 When to prefer Partitioning over a Multi-threaded Step
- When input can be split by key (date ranges, ID ranges)
- When you need horizontal scalability across nodes
- When isolation of resources per partition reduces contention
- When readers are expensive and easier to duplicate per partition
Master Step
|
+-- Partition 1 (Step on thread/process) -> reads slice A -> processes -> writes
+-- Partition 2 (Step on thread/process) -> reads slice B -> processes -> writes
...
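The heart of the diagram above is the split itself. In Spring Batch a Partitioner returns one ExecutionContext per partition (with e.g. minValue/maxValue keys); the range arithmetic it typically performs, modeled loosely on the ColumnRangePartitioner sample, can be sketched standalone:

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitDemo {
    record Range(long min, long max) {}

    // Split [minId, maxId] into gridSize contiguous, non-overlapping slices.
    static List<Range> split(long minId, long maxId, int gridSize) {
        long targetSize = (maxId - minId) / gridSize + 1;
        List<Range> ranges = new ArrayList<>();
        long start = minId;
        while (start <= maxId) {
            long end = Math.min(start + targetSize - 1, maxId);
            ranges.add(new Range(start, end));
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Four slices covering 1..100 exactly once: [1,25] [26,50] [51,75] [76,100]
        System.out.println(split(1, 100, 4));
    }
}
```

Each Range would become one partition's ExecutionContext, and each worker step's reader would be bound to its slice (e.g. WHERE id BETWEEN :min AND :max).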
✅ Best Practices Checklist
- Prefer stateless processors and writers that use transactional resources provided per thread.
- Use partitioning when input can be sliced or when scaling across machines is required.
- Make side-effects idempotent or use an outbox pattern.
- Tune chunk size and pool size incrementally and measure.
- Ensure datasource and external services can handle concurrency.
- Use metrics (timers, DB pool usage, queue size) to guide tuning.