Spring Batch — Import CSV to Database (Complete Guide)
Importing CSV files into a database is one of the most common batch tasks. This guide shows a practical, production-ready approach using Spring Batch: how to read CSVs, validate records, skip faulty rows, write to DB efficiently, and design for restarts and observability.
In short: use a FlatFileItemReader to read the file, an ItemProcessor for validation, and a JdbcBatchItemWriter (or JpaItemWriter) for the writes. Add faultTolerant() with skip rules or a SkipPolicy, tune the chunk size, and make sure the DB connection pool is at least as large as the number of concurrent writers.
Why this guide is different
- Focus on real-world pitfalls (validation, duplicates, partial failures)
- Production advice: retries, skip listeners, idempotency
- Configuration and tuning: chunk size, DB pool, throughput
- Complete runnable snippets (drop into Spring Boot)
Maven dependencies
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
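If you plan to run the integration tests shown later with JobLauncherTestUtils, you will also need the spring-batch-test artifact in test scope; with Spring Boot's dependency management no version is required:
<dependency>
    <groupId>org.springframework.batch</groupId>
    <artifactId>spring-batch-test</artifactId>
    <scope>test</scope>
</dependency>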
application.properties (example)
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.username=sa
spring.datasource.password=
spring.h2.console.enabled=true

# Spring Batch
spring.batch.job.enabled=false

# Hikari tuning (example)
spring.datasource.hikari.maximum-pool-size=30
Sample CSV (employees.csv)
id,name,email,department,salary
1,John Doe,john@example.com,Engineering,72000
2,Jane Smith,jane@example.com,HR,50000
3,Broken,invalid-email,Sales,45000
4,Sam Blue,sam@example.com,Engineering,80000
Entity (POJO)
public class Employee {
private Long id;
private String name;
private String email;
private String department;
private BigDecimal salary;
// getters & setters
}
FlatFileItemReader — robust setup
Use FlatFileItemReader with a LineMapper (tokenizer + field set mapper). Add @StepScope so job parameters, such as the file path, can be injected.
@Bean
@StepScope
public FlatFileItemReader<Employee> reader(@Value("#{jobParameters['file']}") String file) {
return new FlatFileItemReaderBuilder<Employee>()
.name("employeeReader")
.resource(new FileSystemResource(file))
.linesToSkip(1)
.delimited()
.names("id","name","email","department","salary")
.fieldSetMapper(fieldSet -> {
Employee e = new Employee();
e.setId(fieldSet.readLong("id"));
e.setName(fieldSet.readString("name"));
e.setEmail(fieldSet.readString("email"));
e.setDepartment(fieldSet.readString("department"));
e.setSalary(fieldSet.readBigDecimal("salary"));
return e;
})
.build();
}
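If Employee follows JavaBean conventions (getters and setters matching the column names), the hand-written FieldSetMapper can be dropped in favor of the builder's targetType shortcut, which configures a BeanWrapperFieldSetMapper for you. A shorter variant of the same bean:
@Bean
@StepScope
public FlatFileItemReader<Employee> reader(@Value("#{jobParameters['file']}") String file) {
    return new FlatFileItemReaderBuilder<Employee>()
            .name("employeeReader")
            .resource(new FileSystemResource(file))
            .linesToSkip(1)
            .delimited()
            .names("id", "name", "email", "department", "salary")
            .targetType(Employee.class) // BeanWrapperFieldSetMapper under the hood
            .build();
}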
Validation in ItemProcessor
Validate business rules in the processor. Throw an exception for invalid data so SkipPolicy or skip configuration can handle it.
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {
@Override
public Employee process(Employee emp) throws Exception {
if (emp.getEmail() == null || !emp.getEmail().contains("@")) {
throw new IllegalArgumentException("Invalid email: " + emp.getEmail());
}
if (emp.getSalary() == null || emp.getSalary().doubleValue() < 0) {
throw new IllegalArgumentException("Invalid salary");
}
// trim / normalize
emp.setName(emp.getName().trim());
return emp;
}
}
JdbcBatchItemWriter — efficient batch writes
JdbcBatchItemWriter is fast for bulk inserts. Use parameterized SQL and map fields from the item.
@Bean
public JdbcBatchItemWriter<Employee> writer(DataSource ds) {
return new JdbcBatchItemWriterBuilder<Employee>()
.dataSource(ds)
.sql("INSERT INTO employee (id, name, email, department, salary) VALUES (:id, :name, :email, :department, :salary)")
.beanMapped()
.build();
}
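The writer assumes an employee table already exists. With the embedded H2 datasource above, Spring Boot executes a schema.sql found on the classpath at startup, so a minimal DDL is enough; the column types below are assumptions, adjust them to your data:
CREATE TABLE IF NOT EXISTS employee (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255),
    department VARCHAR(100),
    salary DECIMAL(12, 2)
);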
Job configuration — chunk + fault tolerance
Use a chunk-oriented step, tune the chunk size (commit interval), and enable fault tolerance with skips. Also attach a SkipListener to log and audit skipped rows.
@Bean
public Job importJob(JobRepository jobRepo, Step importStep) {
return new JobBuilder("importJob", jobRepo)
.start(importStep)
.build();
}
@Bean
public Step importStep(JobRepository jobRepo, PlatformTransactionManager txManager,
FlatFileItemReader<Employee> reader,
EmployeeProcessor processor,
JdbcBatchItemWriter<Employee> writer) {
return new StepBuilder("importStep", jobRepo)
.<Employee, Employee>chunk(100, txManager)
.reader(reader)
.processor(processor)
.writer(writer)
.faultTolerant()
.skip(IllegalArgumentException.class)
.skipLimit(50)
.listener(new LoggingSkipListener())
.build();
}
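Skips deal with bad records; transient infrastructure errors (deadlocks, dropped connections) are usually worth retrying before the chunk is failed. A sketch of the same step with retries enabled, assuming Spring's TransientDataAccessException is a reasonable signal for your driver and exception-translation setup:
@Bean
public Step importStepWithRetry(JobRepository jobRepo, PlatformTransactionManager txManager,
                                FlatFileItemReader<Employee> reader,
                                EmployeeProcessor processor,
                                JdbcBatchItemWriter<Employee> writer) {
    return new StepBuilder("importStep", jobRepo)
            .<Employee, Employee>chunk(100, txManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .faultTolerant()
            // retry transient DB failures a few times before giving up on the chunk
            .retry(TransientDataAccessException.class)
            .retryLimit(3)
            // still skip rows that fail validation
            .skip(IllegalArgumentException.class)
            .skipLimit(50)
            .listener(new LoggingSkipListener())
            .build();
}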
SkipListener — audit skipped records
public class LoggingSkipListener implements SkipListener<Employee, Employee> {
@Override
public void onSkipInRead(Throwable t) { /* log read errors */ }
@Override
public void onSkipInWrite(Employee item, Throwable t) { /* log write errors */ }
@Override
public void onSkipInProcess(Employee item, Throwable t) {
// persist to error table or write to failed CSV for reprocessing
System.err.println("Skipped: " + item + " because " + t.getMessage());
}
}
Transactional behavior & restartability
- Each chunk is committed in its own transaction. If a write fails, that chunk rolls back.
- To support job restarts, rely on the Spring Batch metadata (JobRepository / execution context).
- Avoid in-processor state that cannot be reconstructed on restart; if you need such state, keep it in the step's ExecutionContext, as in the sketch below.
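A minimal sketch of restart-safe processor state, assuming you want a running count to survive a restart (the class and the processedCount key are illustrative, not part of Spring Batch):
public class CountingProcessor implements ItemProcessor<Employee, Employee>, ItemStream {

    private static final String COUNT_KEY = "processedCount"; // illustrative key name
    private long processedCount;

    @Override
    public void open(ExecutionContext executionContext) {
        // restore state on a restart; defaults to 0 on a fresh run
        processedCount = executionContext.getLong(COUNT_KEY, 0L);
    }

    @Override
    public void update(ExecutionContext executionContext) {
        // persisted with the step metadata at every chunk commit
        executionContext.putLong(COUNT_KEY, processedCount);
    }

    @Override
    public void close() {
        // nothing to release
    }

    @Override
    public Employee process(Employee emp) {
        processedCount++;
        return emp;
    }
}
The step builder registers the reader, processor, and writer as streams if they implement ItemStream; if your setup does not pick this up automatically, add .stream(processor) to the step.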
Idempotency: make writes safe
If a write could be attempted multiple times (retries, restarts), ensure idempotency. Use DB upserts or a unique constraint + insert-if-not-exists logic.
-- Example Postgres upsert (for JdbcBatchItemWriter)
INSERT INTO employee (id, name, email, department, salary)
VALUES (:id, :name, :email, :department, :salary)
ON CONFLICT (id) DO UPDATE SET
    name = EXCLUDED.name,
    email = EXCLUDED.email,
    department = EXCLUDED.department,
    salary = EXCLUDED.salary;
Performance tuning checklist
- Chunk size: start with 100, test 50/200/500 — larger reduces commits but increases rollback cost
- JDBC batch size: ensure your writer batches SQL statements (JdbcBatchItemWriter does by default); driver-level settings matter too, see the example after this list
- DB pool: maxPoolSize >= concurrent writer threads
- Monitoring: measure job duration, DB latency, connection usage, GC pauses
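Most of the JDBC-batching and pool-size tuning is configuration rather than code. A couple of illustrative properties (the values and the MySQL URL are examples, not recommendations; rewriteBatchedStatements is specific to MySQL Connector/J):
# pool sized to cover all concurrent writer threads (example value)
spring.datasource.hikari.maximum-pool-size=30
# MySQL Connector/J only: rewrites the writer's batched INSERTs into multi-row statements
# spring.datasource.url=jdbc:mysql://localhost:3306/batchdb?rewriteBatchedStatements=true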
Observability: instrumenting with Micrometer
// inside EmployeeProcessor: inject the registry via the constructor so the timer exists
// before process() runs (a field initializer would see a not-yet-injected registry)
private final Timer processTimer;

public EmployeeProcessor(MeterRegistry registry) {
    this.processTimer = registry.timer("batch.employee.process.time");
}

@Override
public Employee process(Employee e) throws Exception {
    return processTimer.record(() -> {
        // validation and enrichment
        return e;
    });
}
Testing & local debugging
# Trigger job via REST (example)
curl -s "http://localhost:8080/jobs/import?file=/tmp/employees.csv"
# Use JobLauncherTestUtils for integration tests
JobExecution exec = jobLauncherTestUtils.launchJob(new JobParametersBuilder()
.addString("file", "/tmp/test.csv")
.toJobParameters());
assertEquals(BatchStatus.COMPLETED, exec.getStatus());
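For context, that fragment typically lives in a test class annotated with @SpringBatchTest, which registers the JobLauncherTestUtils bean. A sketch, assuming JUnit 5 and that the import job is the only Job bean in the test context:
@SpringBatchTest
@SpringBootTest
class ImportJobIntegrationTest {

    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Test
    void importsCsvSuccessfully() throws Exception {
        JobExecution exec = jobLauncherTestUtils.launchJob(new JobParametersBuilder()
                .addString("file", "src/test/resources/employees.csv") // path is illustrative
                .addLong("time", System.currentTimeMillis())
                .toJobParameters());
        assertEquals(BatchStatus.COMPLETED, exec.getStatus());
    }
}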
Troubleshooting common issues
| Symptom | Possible cause | Fix |
|---|---|---|
| Many skipped rows | Validation rules too strict or bad CSV | Log failed rows, relax/adjust validation, provide sample to users |
| Job slow / DB connections exhausted | Too many concurrent writers vs pool size | Increase Hikari maxPoolSize or reduce parallelism |
| Duplicates after restart | Writes not idempotent | Use upsert or unique constraints + dedupe logic |
Real-world pattern: Failed-items dead-letter table
Persist failed rows into an error table with failure reason and job metadata. This allows analysts to inspect and reprocess only problematic items.
CREATE TABLE employee_failed (
    id BIGINT,
    payload CLOB,
    error_message VARCHAR(1000),
    job_name VARCHAR(200),
    failed_at TIMESTAMP
);
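A SkipListener variant that feeds this table could look like the following; the JdbcTemplate wiring, the class name, and using toString() as the payload are assumptions for illustration:
public class DeadLetterSkipListener implements SkipListener<Employee, Employee> {

    private final JdbcTemplate jdbcTemplate;
    private final String jobName;

    public DeadLetterSkipListener(JdbcTemplate jdbcTemplate, String jobName) {
        this.jdbcTemplate = jdbcTemplate;
        this.jobName = jobName;
    }

    @Override
    public void onSkipInRead(Throwable t) {
        insert(null, "unreadable line", t);
    }

    @Override
    public void onSkipInProcess(Employee item, Throwable t) {
        insert(item.getId(), item.toString(), t);
    }

    @Override
    public void onSkipInWrite(Employee item, Throwable t) {
        insert(item.getId(), item.toString(), t);
    }

    private void insert(Long id, String payload, Throwable t) {
        // one row per skipped item, with the failure reason and the owning job
        jdbcTemplate.update(
            "INSERT INTO employee_failed (id, payload, error_message, job_name, failed_at) VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)",
            id, payload, t.getMessage(), jobName);
    }
}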
Sample REST controller to trigger job
@RestController
@RequestMapping("/jobs")
public class JobLauncherController {
@Autowired JobLauncher jobLauncher;
@Autowired Job importJob;
@GetMapping("/import-csv")
public String importCsv(@RequestParam String file) throws Exception {
JobParameters params = new JobParametersBuilder()
.addString("file", file)
.addLong("time", System.currentTimeMillis())
.toJobParameters();
jobLauncher.run(importJob, params);
return "Job launched";
}
}
FAQ
- Is FlatFileItemReader thread-safe? No. For parallelism, prefer partitioning (one step-scoped reader per partition) instead of sharing one reader across the threads of a multi-threaded step.
- How many records per second? Depends on DB, network, and chunk size — benchmark! Use Micrometer counters.
- Should I use JdbcBatchItemWriter or JpaItemWriter? JdbcBatchItemWriter is usually faster for bulk inserts; JPA can be used if you need entity lifecycle features.
Conclusion
This pattern (reader → processor → writer) with robust validation, skip and audit handling, idempotent writes, and sensible tuning will make CSV import jobs reliable and maintainable in production. All of the snippets above fit into a single Spring Boot project (pom, application class, sample CSV), so you can assemble it and run the job locally.