Spring Batch — Import CSV to Database (Complete Guide)

Importing CSV files into a database is one of the most common batch tasks. This guide shows a practical, production-ready approach using Spring Batch: how to read CSVs, validate records, skip faulty rows, write to DB efficiently, and design for restarts and observability.

In short: use a FlatFileItemReader to read the CSV, an ItemProcessor for validation, and a JdbcBatchItemWriter (or JpaItemWriter) for the writes. Enable faultTolerant() with skip rules or a SkipPolicy, tune the chunk size, and keep the DB connection pool at least as large as the number of concurrent writers.

Why this guide is different

  • Focus on real-world pitfalls (validation, duplicates, partial failures)
  • Production advice: retries, skip listeners, idempotency
  • Configuration and tuning: chunk size, DB pool, throughput
  • Complete runnable snippets (drop into Spring Boot)

Maven dependencies

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
  <groupId>com.h2database</groupId>
  <artifactId>h2</artifactId>
  <scope>runtime</scope>
</dependency>

application.properties (example)

spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.username=sa
spring.datasource.password=
spring.h2.console.enabled=true

# Spring Batch
spring.batch.job.enabled=false

# Hikari tuning (example)
spring.datasource.hikari.maximum-pool-size=30

Sample CSV (employees.csv)

id,name,email,department,salary
1,John Doe,john@example.com,Engineering,72000
2,Jane Smith,jane@example.com,HR,50000
3,Broken,invalid-email,Sales,45000
4,Sam Blue,sam@example.com,Engineering,80000

Entity (POJO)

public class Employee {
  private Long id;
  private String name;
  private String email;
  private String department;
  private BigDecimal salary;
  // getters & setters
}

FlatFileItemReader — robust setup

Use FlatFileItemReader with a LineMapper (tokenizer + field set mapper). Add @StepScope to inject job parameters like file path.

@Bean
@StepScope
public FlatFileItemReader<Employee> reader(@Value("#{jobParameters['file']}") String file) {
  return new FlatFileItemReaderBuilder<Employee>()
    .name("employeeReader")
    .resource(new FileSystemResource(file))
    .linesToSkip(1)
    .delimited()
    .names("id","name","email","department","salary")
    .fieldSetMapper(fieldSet -> {
      Employee e = new Employee();
      e.setId(fieldSet.readLong("id"));
      e.setName(fieldSet.readString("name"));
      e.setEmail(fieldSet.readString("email"));
      e.setDepartment(fieldSet.readString("department"));
      e.setSalary(fieldSet.readBigDecimal("salary"));
      return e;
    })
    .build();
}

Validation in ItemProcessor

Validate business rules in the processor and throw an exception for invalid data so the skip configuration (or a custom SkipPolicy) can handle it. Register the processor as a Spring bean (for example with @Component or a @Bean method) so it can be injected into the step configuration below.

public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {

  @Override
  public Employee process(Employee emp) throws Exception {
    if (emp.getEmail() == null || !emp.getEmail().contains("@")) {
      throw new IllegalArgumentException("Invalid email: " + emp.getEmail());
    }
    if (emp.getSalary() == null || emp.getSalary().doubleValue() < 0) {
      throw new IllegalArgumentException("Invalid salary");
    }
    // trim / normalize
    emp.setName(emp.getName().trim());
    return emp;
  }
}

JdbcBatchItemWriter — efficient batch writes

JdbcBatchItemWriter is fast for bulk inserts. Use parameterized SQL and map fields from the item.

@Bean
public JdbcBatchItemWriter<Employee> writer(DataSource ds) {
  return new JdbcBatchItemWriterBuilder<Employee>()
    .dataSource(ds)
    .sql("INSERT INTO employee (id, name, email, department, salary) VALUES (:id, :name, :email, :department, :salary)")
    .beanMapped()
    .build();
}

Job configuration — chunk + fault tolerance

Use chunk-oriented step, tune chunk size (commit interval), and enable fault tolerance with skips. Also attach a SkipListener to log & audit skipped rows.

@Bean
public Job importJob(JobRepository jobRepo, Step importStep) {
  return new JobBuilder("importJob", jobRepo)
    .start(importStep)
    .build();
}

@Bean
public Step importStep(JobRepository jobRepo, PlatformTransactionManager txManager,
                       FlatFileItemReader<Employee> reader,
                       EmployeeProcessor processor,
                       JdbcBatchItemWriter<Employee> writer) {

  return new StepBuilder("importStep", jobRepo)
    .<Employee, Employee>chunk(100, txManager)
    .reader(reader)
    .processor(processor)
    .writer(writer)
    .faultTolerant()
      .skip(IllegalArgumentException.class)
      .skipLimit(50)
      .listener(new LoggingSkipListener())
    .build();
}

SkipListener — audit skipped records

public class LoggingSkipListener implements SkipListener<Employee, Employee> {
  @Override
  public void onSkipInRead(Throwable t) { /* log read errors */ }
  @Override
  public void onSkipInWrite(Employee item, Throwable t) { /* log write errors */ }
  @Override
  public void onSkipInProcess(Employee item, Throwable t) {
    // persist to error table or write to failed CSV for reprocessing
    System.err.println("Skipped: " + item + " because " + t.getMessage());
  }
}

Transactional behavior & restartability

  • Each chunk is processed in its own transaction; if a write fails, that chunk rolls back.
  • To support job restarts, rely on Spring Batch metadata (JobRepository / execution context).
  • Avoid in-processor state that cannot be reconstructed on restart; keep restart-relevant state in the step's ExecutionContext instead (see the sketch below).
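
If the processor needs to carry something across a restart (a running count, a last-seen key), the step's ExecutionContext is the place for it. A minimal sketch; the CountingProcessor name and the processed.count key are illustrative, not part of the code above:

public class CountingProcessor implements ItemProcessor<Employee, Employee>, StepExecutionListener {

  private StepExecution stepExecution;

  @Override
  public void beforeStep(StepExecution stepExecution) {
    // Capture the step execution so its ExecutionContext is reachable from process()
    this.stepExecution = stepExecution;
  }

  @Override
  public Employee process(Employee emp) {
    ExecutionContext ctx = stepExecution.getExecutionContext();
    long processed = ctx.getLong("processed.count", 0L) + 1;
    ctx.putLong("processed.count", processed); // persisted with the Batch metadata at each chunk commit
    return emp;
  }
}

Set it as the step's processor (and register it with .listener(...) if it is not picked up as a StepExecutionListener automatically); after a restart, getLong returns the value from the last committed chunk.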

Idempotency: make writes safe

If a write could be attempted multiple times (retries, restarts), ensure idempotency. Use DB upserts or a unique constraint + insert-if-not-exists logic.

-- Example Postgres upsert (for JdbcBatchItemWriter)
INSERT INTO employee (id, name, email, department, salary)
VALUES (:id, :name, :email, :department, :salary)
ON CONFLICT (id) DO UPDATE SET
  name = EXCLUDED.name,
  email = EXCLUDED.email,
  department = EXCLUDED.department,
  salary = EXCLUDED.salary;

Performance tuning checklist

  • Chunk size: start with 100, test 50/200/500 — larger chunks mean fewer commits but more work rolled back on failure (see the sketch after this list)
  • JDBC batch size: ensure your writer batches SQL statements (JdbcBatchItemWriter does by default)
  • DB pool: Hikari maximumPoolSize >= number of concurrent writer threads
  • Monitoring: measure job duration, DB latency, connection usage, GC pauses
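
The commit interval is easy to externalize so the values above can be benchmarked without recompiling. A sketch of the same importStep bean from earlier, assuming a custom property named import.chunk-size (the property name is an arbitrary choice, not something Spring defines):

@Bean
public Step importStep(JobRepository jobRepo, PlatformTransactionManager txManager,
                       FlatFileItemReader<Employee> reader,
                       EmployeeProcessor processor,
                       JdbcBatchItemWriter<Employee> writer,
                       @Value("${import.chunk-size:100}") int chunkSize) {

  return new StepBuilder("importStep", jobRepo)
    .<Employee, Employee>chunk(chunkSize, txManager)   // commit interval now tunable per environment
    .reader(reader)
    .processor(processor)
    .writer(writer)
    .faultTolerant()
      .skip(IllegalArgumentException.class)
      .skipLimit(50)
      .listener(new LoggingSkipListener())
    .build();
}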

Observability: instrumenting with Micrometer

Time the processor with a Micrometer Timer. Inject the MeterRegistry through the constructor; a field initializer that reads an @Autowired field runs before injection and would throw a NullPointerException.

public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {

  private final Timer processTimer;

  public EmployeeProcessor(MeterRegistry registry) {
    this.processTimer = registry.timer("batch.employee.process.time");
  }

  @Override
  public Employee process(Employee e) {
    return processTimer.record(() -> {
      // validation and enrichment (as in the processor above)
      return e;
    });
  }
}

Testing & local debugging

# Trigger job via REST (example)
curl -s "http://localhost:8080/jobs/import?file=/tmp/employees.csv"

# Use JobLauncherTestUtils for integration tests
JobExecution exec = jobLauncherTestUtils.launchJob(new JobParametersBuilder()
  .addString("file", "/tmp/test.csv")
  .toJobParameters());
assertEquals(BatchStatus.COMPLETED, exec.getStatus());
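
For a fuller integration test, @SpringBatchTest wires a JobLauncherTestUtils into the application context. The class below is a sketch: it assumes spring-batch-test and spring-boot-starter-test on the test classpath, a single Job bean in the context, and a test CSV at src/test/resources/employees.csv.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.batch.test.context.SpringBatchTest;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBatchTest
@SpringBootTest
class ImportJobIntegrationTest {

  @Autowired
  private JobLauncherTestUtils jobLauncherTestUtils;

  @Test
  void importsCsvAndSkipsInvalidRows() throws Exception {
    JobParameters params = new JobParametersBuilder()
      .addString("file", "src/test/resources/employees.csv") // hypothetical test fixture
      .addLong("time", System.currentTimeMillis())           // keeps job parameters unique per run
      .toJobParameters();

    JobExecution execution = jobLauncherTestUtils.launchJob(params);

    assertEquals(BatchStatus.COMPLETED, execution.getStatus());
  }
}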

Troubleshooting common issues

  • Symptom: many skipped rows. Likely cause: validation rules too strict or a malformed CSV. Fix: log failed rows, relax/adjust validation, provide a sample file to users.
  • Symptom: job slow or DB connections exhausted. Likely cause: too many concurrent writers for the pool size. Fix: increase Hikari maximumPoolSize or reduce parallelism.
  • Symptom: duplicates after restart. Likely cause: writes are not idempotent. Fix: use upserts or unique constraints plus dedupe logic.

Real-world pattern: Failed-items dead-letter table

Persist failed rows into an error table with failure reason and job metadata. This allows analysts to inspect and reprocess only problematic items.

CREATE TABLE employee_failed (
  id BIGINT,
  payload CLOB,
  error_message VARCHAR(1000),
  job_name VARCHAR(200),
  failed_at TIMESTAMP
);
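
A skip listener can feed this table directly. The sketch below is one way to do it; the DeadLetterSkipListener name, the injected JdbcTemplate, and the jobName constructor argument are assumptions about your wiring, not part of the code above.

import java.sql.Timestamp;

import org.springframework.batch.core.SkipListener;
import org.springframework.jdbc.core.JdbcTemplate;

public class DeadLetterSkipListener implements SkipListener<Employee, Employee> {

  private final JdbcTemplate jdbcTemplate;
  private final String jobName;

  public DeadLetterSkipListener(JdbcTemplate jdbcTemplate, String jobName) {
    this.jdbcTemplate = jdbcTemplate;
    this.jobName = jobName;
  }

  @Override
  public void onSkipInProcess(Employee item, Throwable t) {
    // Persist the failed row with its error so it can be inspected and reprocessed later.
    // The payload is the mapped item's toString(); capturing the raw CSV line would need onSkipInRead.
    jdbcTemplate.update(
      "INSERT INTO employee_failed (id, payload, error_message, job_name, failed_at) VALUES (?, ?, ?, ?, ?)",
      item.getId(), String.valueOf(item), t.getMessage(), jobName,
      new Timestamp(System.currentTimeMillis()));
  }
}

Register it on the fault-tolerant step with .listener(...), alongside or instead of the LoggingSkipListener shown earlier.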

Sample REST controller to trigger job

@RestController
@RequestMapping("/jobs")
public class JobLauncherController {

  @Autowired JobLauncher jobLauncher;
  @Autowired Job importJob;

  @GetMapping("/import-csv")
  public String importCsv(@RequestParam String file) throws Exception {
    JobParameters params = new JobParametersBuilder()
      .addString("file", file)
      .addLong("time", System.currentTimeMillis())
      .toJobParameters();
    jobLauncher.run(importJob, params);
    return "Job launched";
  }
}

FAQ

  • Is FlatFileItemReader thread-safe? No. In a multi-threaded step, wrap it in a SynchronizedItemStreamReader; with partitioning, give each partition its own step-scoped reader.
  • How many records per second? Depends on DB, network, and chunk size — benchmark! Use Micrometer counters.
  • Should I use JdbcBatchItemWriter or JpaItemWriter? JdbcBatchItemWriter is usually faster for bulk inserts; JPA can be used if you need entity lifecycle features.
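
If you do take the JPA route, the writer bean is small. A sketch, assuming Employee is mapped as a JPA @Entity and spring-boot-starter-data-jpa is on the classpath:

@Bean
public JpaItemWriter<Employee> jpaWriter(EntityManagerFactory entityManagerFactory) {
  // Writes each item through the EntityManager: slower than JdbcBatchItemWriter for plain
  // inserts, but entity lifecycle callbacks and cascades apply.
  return new JpaItemWriterBuilder<Employee>()
    .entityManagerFactory(entityManagerFactory)
    .build();
}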

Conclusion

This pattern (reader → processor → writer) with robust validation, skip-and-audit handling, idempotent writes, and sensible tuning makes CSV import jobs reliable and maintainable in production.