Spring Batch Hibernate Example

Rajagopal ParthaSarathiMarch 15th, 2018Last Updated: February 13th, 2019

4 910 6 minutes read

This article is a tutorial about Spring Batch with Hibernate. We will use Spring Boot to speed our development process.

1. Introduction

Spring Batch is a lightweight, scale-able and comprehensive batch framework to handle data at massive scale. Spring Batch builds upon the spring framework to provide intuitive and easy configuration for executing batch applications. Spring Batch provides reusable functions essential for processing large volumes of records, including cross-cutting concerns such as logging/tracing, transaction management, job processing statistics, job restart, skip and resource management.

Spring Batch has a layered architecture consisting of three components:

Application – Contains custom code written by developers.
Batch Core – Classes to launch and control batch job.
Batch Infrastructure – Reusable code for common functionalities needed by core and Application.

Let us dive into spring batch with a simple example of reading persons from a CSV file and loading them into embedded HSQL Database. Since we use the embedded database, data will not be persisted across sessions.

2. Technologies Used

Java 1.8.101 (1.8.x will do fine)
Gradle 4.4.1 (4.x will do fine)
IntelliJ Idea (Any Java IDE would work)
Rest will be part of the Gradle configuration.

3. Spring Batch Project

Spring Boot Starters provides more than 30 starters to ease the dependency management for your project. The easiest way to generate a Spring Boot project is via Spring starter tool with the steps below:

Navigate to https://start.spring.io/.
Select Gradle Project with Java and Spring Boot version 2.0.0.
Add Batch, JPA and HSqlDB in the “search for dependencies”.
Enter the group name as com.JCG and artifact as SpringBatchHibernate.
Click the Generate Project button.

A Gradle Project will be generated. If you prefer Maven, use Maven instead of Gradle before generating the project. Import the project into your Java IDE.

3.1 Gradle File

Below we can see the generated build file for our project.

build.gradle

buildscript {
ext {
springBootVersion = '2.0.0.RELEASE'
}
repositories {
mavenCentral()
}
dependencies {
classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
}
}
apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'idea'
apply plugin: 'org.springframework.boot'
apply plugin: 'io.spring.dependency-management'
group = 'com.JCG'
version = '0.0.1-SNAPSHOT'
sourceCompatibility = 1.8
repositories {
mavenCentral()
}
dependencies {
compile('org.springframework.boot:spring-boot-starter-batch')
compile('org.springframework.boot:spring-boot-starter-data-jpa')
runtime('org.hsqldb:hsqldb')
compile "org.projectlombok:lombok:1.16.8"
testCompile('org.springframework.boot:spring-boot-starter-test')
testCompile('org.springframework.batch:spring-batch-test')
}

Spring Boot Version 2.0 is specified in line 3.
Idea plugin has been applied to support Idea IDE in line 14.
Lines 23-29 declare the dependencies needed for the project with each downloading the latest version from spring.io.
Line 27 declares the Lombok dependency which is used to reduce typing boilerplate code.

3.2 Data file

Create a sample file sample-data.csv.
It consists of two columns – First Name and Last Name.
The file should be in the path src/main/resources.

Sample CSV

FirstName,LastName
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
John,Doe

Line1 indicates the header for the CSV file. It will be ignored by spring batch while reading the file.

3.3 Spring Batch Configuration

Below we will cover the Java configuration for Spring Boot, Batch and Hibernate. We will discuss each part of the configuration below.

Application Class

package  com.JCG;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

We specify our application as the springboot application in Line 6. It takes care of all the Auto Configuration magic. Spring boot works on the philosophy of convention over configuration. It provides sensible defaults and allows overriding with the appropriate configuration.
Line 10 starts our application with the configuration specified in below section.

Batch Configuration

package com.JCG.config;

import com.JCG.model.Person;
import com.JCG.model.PersonRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.*;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JpaItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

import javax.persistence.EntityManagerFactory;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    EntityManagerFactory emf;

    @Autowired
    PersonRepository personRepository;

    private static final Logger log = LoggerFactory.getLogger(BatchConfiguration.class);

    @Bean
    public FlatFileItemReader reader() {
        FlatFileItemReader reader = new FlatFileItemReader<>();
        reader.setResource(new ClassPathResource("sample-data.csv"));
        reader.setLinesToSkip(1);

        DefaultLineMapper lineMapper = new DefaultLineMapper<>();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames("firstName", "lastName");

        BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Person.class);

        lineMapper.setFieldSetMapper(fieldSetMapper);
        lineMapper.setLineTokenizer(tokenizer);
        reader.setLineMapper(lineMapper);

        return reader;
    }


    @Bean
    public JpaItemWriter writer() {
        JpaItemWriter writer = new JpaItemWriter();
        writer.setEntityManagerFactory(emf);
        return writer;
    }

    @Bean
    public ItemProcessor<Person, Person> processor() {
        return (item) -> {
            item.concatenateName();
            return item;
        };
    }

    @Bean
    public Job importUserJob(JobExecutionListener listener) {
        return jobBuilderFactory.get("importUserJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .flow(step1())
                .end()
                .build();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .<Person, Person>chunk(10)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .build();
    }

    @Bean
    public JobExecutionListener listener() {
        return new JobExecutionListener() {


            @Override
            public void beforeJob(JobExecution jobExecution) {
                /**
                 * As of now empty but can add some before job conditions
                 */
            }

            @Override
            public void afterJob(JobExecution jobExecution) {
                if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
                    log.info("!!! JOB FINISHED! Time to verify the results");
                    personRepository.findAll().
                            forEach(person -> log.info("Found <" + person + "> in the database."));
                }
            }
        };
    }
}

Lines 25 indicates that it is a configuration class and should be picked up by spring boot to wire up the beans and dependencies. Line 26 is used to enable batch support for our application. Spring defines a Job which contains multiple Step to be executed. In our example, we use only a single step for our importUserJob. We use a JobExecutionListener to track the job execution which we will cover below. A Step could be a TaskletStep(contains a single function for execution) or Step which includes a Reader, Processor and Writer. In the above example, We have used Step.

3.3.1 Reader

Lines 42-60 include our reader configuration. We use FlatFileItemReader to read from our CSV file. The advantage of using an inbuilt reader is that it handles application failures gracefully and will support restarts. It can also skip lines during errors with a configurable skip limit.

It needs the following parameters to successfully read the file line by line.

Resource – The application reads from a class path resource as specified in line 45. We skip the header line by specifying setLinesToSkip.
Line Mapper – This is used to map a line read from the file into a representation usable by our application. We use DefaultLineMapper from Spring Infrastructure. This, in turn, uses two classes to map the line to our model Person. It uses a LineTokenizer to split one single line into tokens based on the criteria specified and a FieldSetMapper to map the tokens into a fieldset usable by our application.
- Line Tokenizer – We use DelimitedLineTokenizer to tokenize the lines by splitting with a comma. By default, the comma is used as the tokenizer. We also specify the token names to match the fields of our model class.
- FieldSetMapper – Here we are using BeanWrapperFieldSetMapper to map the data to a bean by its property names. The exact field names are specified in the tokenizer which will be used.
Line Mapper is mapped to the reader in line 57.

Reader reads the items in the chunk(10) which is specified by the chunk config in line 91.

3.3.2 Processor

Spring does not offer an inbuilt processor and is usually left to the custom implementation. Here, We are using a lambda function to transform the incoming Person object. We call the concatenateName function to concatenate the first name and last name. We return the modified item to the writer. Processor does its execution one item at a time.

3.3.3 Writer

Here, we are using JpaItemWriter to write the model object into the database. JPA uses hibernate as the persistence provider to persist the data. The writer just needs the model to be written to the database. It aggregates the items received from the processor and flushes the data.

3.3.4 Listener

JobExecutionListener offers the methods beforeJob to execute before the job starts and afterJob which executes after job has been completed. Generally, these methods are used to collect various job metrics and sometimes initialize constants. Here, we use afterJob to check whether the data got persisted. We use a repository method findAll to fetch all the persons from our database and display it.

3.4 Model/Hibernate configuration

application.properties

spring.jpa.hibernate.ddl-auto=create-drop
spring.jpa.show-sql=true

Here we specified that tables should be created before use and destroyed when the application terminates. Also, we have specified configuration to show SQL ran by hibernate in the console for debugging. Rest of the configuration of wiring Datasource to hibernate and then in turn to JPA EntityManagerfactory is handled by JpaRepositoriesAutoConfiguration and HibernateJpaAutoConfiguration.

Model Class(Person)

package com.JCG.model;

import lombok.*;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Transient;

@Entity
@Getter
@Setter
@NoArgsConstructor
@RequiredArgsConstructor
@ToString(exclude={"firstName","lastName"})
public class Person {

    @Id
    @GeneratedValue
    private int id;

    @Transient
    @NonNull
    private String lastName;

    @Transient
    @NonNull
    private String firstName;

    @NonNull
    private String name;

    public void concatenateName(){
        this.setName(this.firstName+" "+this.lastName);
    }
}

A model class should be annotated with Entity to be utilized by spring container. We have used Lombok annotations to generate getter, setter and Constructor from our fields. Fields firstName and lastName are annotated as Transient to indicate that these fields should not be persisted to the database. There is an id field which is annotated to generate the hibernate sequence while saving to the database.

Repository Class(PersonRepository)

package com.JCG.model;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface PersonRepository extends JpaRepository<Person,Integer> {
}

This is just a repository implementation of Spring JPA repository. For a detailed example refer JPA Repository example.

4. Summary

Run the Application class from a Java IDE. Output similar to the below screenshot will be displayed. In this example, we saw a simple way to configure a Spring Batch Project Application.