
Spring Batch Introduction Example

In this post, we feature a comprehensive Spring Batch Introduction article. Many enterprise applications need bulk processing to perform business operations, typically time-based events or complex business rules applied across very large data sets. Batch processing is used to handle these workloads efficiently. In this post, we will look at Spring Batch as a solution for these batch processing needs.

1. Spring Batch Introduction

Spring Batch is a lightweight, comprehensive batch framework that builds upon the POJO-based development approach. Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, and job and resource management. Spring Batch is designed to work in conjunction with various commercial and open-source schedulers such as Quartz, Tivoli, Control-M, etc.

Spring Batch follows a layered architecture with three major components: Application, Batch Core, and Batch Infrastructure. Application is the client code written by developers to achieve the intended functionality. The Batch Core contains the core runtime classes necessary to launch and control a batch job, while the Batch Infrastructure contains common services needed by both the Batch Core and the Application.

Let’s start with a simple batch processing use case in the next section. Before that, we will look at the stack involved in creating the example. We will use Maven to manage the build and dependencies, with Java 8 as the programming language. All the dependencies required for the example are listed in Maven’s pom.xml, given below.

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.1.7.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.jcg</groupId>
    <artifactId>springBatch</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>springBatch</name>
    <description>Demo project for Spring Batch</description>

    <properties>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.hsqldb</groupId>
            <artifactId>hsqldb</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>
  • This Maven configuration specifies Spring Boot Starter Parent, version 2.1.7.RELEASE, as the parent. All the other Spring dependencies inherit their versions from the parent.
  • The Java version is specified as 1.8 for the project.
  • Spring Batch, the topic of our example, is specified as a dependency.
  • Spring Batch requires job metadata, such as start and end times, to be saved in a persistent store. For this purpose, HSQLDB is specified as a dependency. It is an embedded database which holds the information and is destroyed when the application exits. Spring Batch auto-creates the required tables for maintaining the job information; a note on controlling this follows the list.
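
With an embedded database such as HSQLDB, Spring Boot initializes the batch metadata schema automatically, so the example needs no extra configuration. As an illustration only, if you later switch to an external database, schema creation can be controlled with a property in application.properties:

application.properties

# Create the Spring Batch metadata tables on startup. The default is "embedded",
# which initializes the schema only for embedded databases such as HSQLDB.
spring.batch.initialize-schema=always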

2. Batch Example

A typical Spring Batch Job involves a Reader, a Writer, and optionally a Processor. A Processor comes into play when we need to apply business rules to the data read. Alternatively, a step can be implemented as a Tasklet, which we will delve into in Section 3.

In this section, we will consume a movie JSON dataset and write it to a CSV file. We will first look at the Movie entity, which helps to understand the JSON structure.

Movie.java

package com.jcg.springBatch.entity;

import java.util.List;

public class Movie {

    private String title;

    private long year;

    private List<String> cast;

    private List<String> genres;

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public void setYear(long year) {
        this.year = year;
    }

    public void setCast(List<String> cast) {
        this.cast = cast;
    }

    public List<String> getGenres() {
        return genres;
    }

    public void setGenres(List<String> genres) {
        this.genres = genres;
    }

}
  • The Movie class has four fields:
    • Title – holds the movie name.
    • Year – the year in which the movie was released.
    • Cast – the actors in the movie.
    • Genres – the genres of the movie, such as Action, Comedy, and Thriller.
  • The movies.json file is a public dataset obtained from GitHub; a sample record is shown below.
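
To make the mapping concrete, a representative record from the dataset looks roughly like the snippet below (abridged; the values are illustrative, with the genres taken from the sample output later in this post):

movies.json (sample record)

{
  "title": "Capture of Boer Battery by British",
  "year": 1900,
  "cast": [],
  "genres": ["Short", "Documentary"]
}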

We will create a Spring Boot application capable of running the Spring Batch job. Our job is going to read all the movies and output a CSV file containing each movie and its corresponding genres.

Application.java

package com.jcg.springBatch;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}
  • This is a typical Spring Boot application setup where we annotate the class to enable Spring Boot.
  • Spring Boot takes an opinionated view of the Spring platform and third-party libraries. Most Spring Boot applications need very little Spring configuration, reducing development time.

In the sections below, we will see the various steps involved in configuring the batch job. We are going to break the Java class BatchConfiguration into various snippets for easier understanding.

BatchConfiguration.java

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    JobBuilderFactory jobBuilderFactory;

    @Autowired
    StepBuilderFactory stepBuilderFactory;
}
  • The class is annotated with @Configuration to ensure it is processed by Spring as a source of bean definitions. Previously these were XML files, but now Spring Boot favors Java configuration.
  • The other annotation, @EnableBatchProcessing, enables Spring Batch features and provides a base configuration for setting up batch jobs.
  • We have two builders autowired (the imports needed by the full class are sketched after this list):
    • JobBuilderFactory – used to build the movie Job. In Spring Batch, a Job is the top-level abstraction; it represents the business functionality which needs to be achieved.
    • StepBuilderFactory – used to build the steps involved in the Job. A Job can contain multiple steps, with each step fulfilling a particular task. For our simple Job, we have only one step.
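
For reference, a complete BatchConfiguration class assembled from the snippets in this post would need imports along these lines (a sketch covering the types used below; your IDE will resolve the exact list):

BatchConfiguration.java (imports)

package com.jcg.springBatch;

import java.io.File;
import java.net.MalformedURLException;
import java.util.Date;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.batch.item.json.JacksonJsonObjectReader;
import org.springframework.batch.item.json.JsonItemReader;
import org.springframework.batch.item.json.builder.JsonItemReaderBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;
import org.springframework.core.io.UrlResource;

import com.jcg.springBatch.entity.Movie;
import com.jcg.springBatch.entity.MovieGenre;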

A step is where all the action begins. As indicated at the top of this section, a chunk-oriented step contains three components: ItemReader, ItemProcessor, and ItemWriter. Spring provides out-of-the-box readers and writers for various file formats. Considering our JSON dataset, we will look at the JsonItemReader below.

ItemReader

@Bean
public JsonItemReader<Movie> jsonItemReader() throws MalformedURLException {
    return new JsonItemReaderBuilder<Movie>()
            .jsonObjectReader(new JacksonJsonObjectReader<>(Movie.class))
            .resource(new UrlResource(
                    "https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json"))
            .name("movieJsonItemReader")
            .build();
}
  • Spring follows the builder pattern, where we provide the various pieces of input required to build the entire object.
  • We load the JSON data from the URL by specifying a UrlResource as input (a local-file alternative is sketched after this list).
  • We also specify the Movie entity as the type to which the data has to be transformed.
  • The rest of the configuration just provides a suitable name for the reader.
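
If you prefer to avoid the network dependency, the same builder can read a downloaded copy of the dataset from disk. The bean below is a sketch; the local path is hypothetical:

Local ItemReader (optional)

@Bean
public JsonItemReader<Movie> localJsonItemReader() {
    return new JsonItemReaderBuilder<Movie>()
            .jsonObjectReader(new JacksonJsonObjectReader<>(Movie.class))
            // Hypothetical local copy of movies.json downloaded beforehand
            .resource(new FileSystemResource("data/movies.json"))
            .name("movieJsonItemReader")
            .build();
}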

Once the reader reads the data, the data is available to be consumed by the subsequent components in the step. In our step, we have a custom processor which processes the data from the reader.

ItemProcessor

@Bean
public ItemProcessor<Movie, MovieGenre> movieListItemProcessor() {
    return movie -> new MovieGenre(movie.getTitle(), movie.getGenres().toString());
}
  • The processor is written as an inline lambda.
  • It takes in each Movie and converts it to another entity, MovieGenre, which has two fields:
    • Title – the movie name.
    • Genre – the string representation of the genres list (for example, [Short, Documentary]) instead of a List.
  • The MovieGenre class is listed below and is self-explanatory.

MovieGenre.java

package com.jcg.springBatch.entity;

public class MovieGenre {

    private String title;

    private String genre;

    public MovieGenre(String title, String genre) {
        this.title = title;
        this.genre = genre;
    }

    public String getTitle() {
        return title;
    }

    public String getGenre() {
        return genre;
    }
}

Now we come to the final component in the step – ItemWriter.

ItemWriter

@Bean
public FlatFileItemWriter<MovieGenre> movieGenreWriter() {
    return new FlatFileItemWriterBuilder<MovieGenre>()
            .name("movieGenreWriter")
            .resource(new FileSystemResource("out/movies.csv"))
            .delimited()
            .delimiter(",")
            .names(new String[]{"title", "genre"})
            .build();
}
  • We use FlatFileItemWriter to write the output to a CSV file, which is specified as the resource.
  • We specify the delimiter to be used within a line – it can be a space or any other character. Since this is a CSV, a comma is specified as the delimiter.
  • The column names to be extracted from the entity are supplied via the names argument.

All of these components are bean definitions in the configuration class. Now, a Step definition is what glues them all together.

MovieStep

    
@Bean
public Step movieStep() throws MalformedURLException {
    return stepBuilderFactory
            .get("movieStep")
            .<Movie, MovieGenre>chunk(10)
            .reader(jsonItemReader())
            .processor(movieListItemProcessor())
            .writer(movieGenreWriter())
            .build();
}
  • Spring Batch processes the records (items) in chunks. We specify a chunk size of 10, which instructs the ItemReader to read 10 records at a time.
  • The input (reader) and output (writer) types are specified explicitly in the step via <Movie, MovieGenre>.
  • Items are fed to the processor one by one, but the processor's output is aggregated and sent to the writer in chunks of the specified size.

The final component is the MovieJob, which is explained below.

MovieJob

@Bean
public Job movieJob(Step movieStep) {
    return jobBuilderFactory.get("movieJob")
            .incrementer(new RunIdIncrementer())
            .flow(movieStep)
            .end()
            .build();
}
  • A Spring Batch Job can run multiple times. To differentiate each run of the job, Spring provides a RunIdIncrementer, which increments the run id every time the job is run.
  • flow() wraps the movieStep into an execution flow; more complex flows involving multiple steps and conditional transitions can also be provided.

Now, to execute the job, run the Application class, and a CSV file similar to the one below is generated.
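
Since the pom.xml configures the spring-boot-maven-plugin, the application can also be launched from the project root with the plugin's run goal:

mvn spring-boot:run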

movies.csv

After Dark in Central Park,[]
Boarding School Girls' Pajama Parade,[]
Buffalo Bill's Wild West Parad,[]
Caught,[]
Clowns Spinning Hats,[]
Capture of Boer Battery by British,[Short, Documentary]
The Enchanted Drawing,[]
Feeding Sea Lions,[]
....

But this does not give any information about the records in the file. To specify column headings, FlatFileItemWriter has a header callback, which can be specified as .headerCallback(writer -> writer.write("Movie Title,Movie Genres")). This writes the header of the file before any of the records are written.
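
For illustration, here is the movieGenreWriter bean from earlier with the header callback wired in:

ItemWriter with header

@Bean
public FlatFileItemWriter<MovieGenre> movieGenreWriter() {
    return new FlatFileItemWriterBuilder<MovieGenre>()
            .name("movieGenreWriter")
            .resource(new FileSystemResource("out/movies.csv"))
            // Writes the header line before any item is written
            .headerCallback(writer -> writer.write("Movie Title,Movie Genres"))
            .delimited()
            .delimiter(",")
            .names(new String[]{"title", "genre"})
            .build();
}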

2.1 Listener

In the previous section, we saw the batch processing capability of Spring Batch. But after the job completed, we did not get any statistics about the job or the step. Spring provides listener interfaces with which we can hook into the lifecycle of the job. We will see the example of a StepExecutionListener, which is executed before and after the step.

Listener

@Bean
public StepExecutionListener movieStepListener() {
    return new StepExecutionListener() {

        @Override
        public void beforeStep(StepExecution stepExecution) {
            stepExecution.getExecutionContext().put("start", new Date().getTime());
            System.out.println("Step name:" + stepExecution.getStepName() + " Started");
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            long elapsed = new Date().getTime()
                    - stepExecution.getExecutionContext().getLong("start");
            System.out.println("Step name:" + stepExecution.getStepName()
                    + " Ended. Running time is " + elapsed + " milliseconds.");
            System.out.println("Read Count:" + stepExecution.getReadCount() +
                    " Write Count:" + stepExecution.getWriteCount());
            return ExitStatus.COMPLETED;
        }
    };
}
  • In the beforeStep method, we obtain the step name and log it to the console.
  • We store the start time in the step's ExecutionContext, which is similar to a map: it takes a string key and any object as the value.
  • In the afterStep method, we log the running time using the start time stored in the ExecutionContext.
  • We log the read and write record counts for the step, which was the original intention of adding the listener.

We have just defined the listener but have not yet associated it with the created step. Below, we see how to associate the listener with the movieStep.

Listener to Step

@Bean
public Step movieStep() throws MalformedURLException {
    return stepBuilderFactory
            .get("movieStep")
            .listener(movieStepListener())
            .<Movie, MovieGenre>chunk(10)
            .reader(jsonItemReader())
            .processor(movieListItemProcessor())
            .writer(movieGenreWriter())
            .build();
}

This is just one listener; Spring provides other listeners similar to it. For example, there is a JobExecutionListener, which executes before and after the entire job and has its own ExecutionContext for storing job-related information.
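
As a minimal sketch (not part of the original example), a job-level listener could be defined with the JobExecutionListenerSupport convenience class and attached to the job via .listener(movieJobListener()) on the JobBuilder:

JobListener

@Bean
public JobExecutionListener movieJobListener() {
    return new JobExecutionListenerSupport() {
        @Override
        public void afterJob(JobExecution jobExecution) {
            // Runs once after the whole job finishes, whatever the outcome
            System.out.println("Job " + jobExecution.getJobInstance().getJobName()
                    + " finished with status: " + jobExecution.getStatus());
        }
    };
}

Running the job produces the following output.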

Logs

2019-08-31 15:11:06.163  INFO 24381 --- [           main] o.s.b.a.b.JobLauncherCommandLineRunner   : Running default command line with: []
2019-08-31 15:11:06.214  INFO 24381 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=movieJob]] launched with the following parameters: [{run.id=1}]
2019-08-31 15:11:06.226  INFO 24381 --- [           main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [movieStep]
Step name:movieStep Started
Step name:movieStep Ended. Running time is 3340 milliseconds.
Read Count:28795 Write Count:28795
2019-08-31 15:11:09.572  INFO 24381 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=movieJob]] completed with the following parameters: [{run.id=1}] and the following status: [COMPLETED]
2019-08-31 15:11:09.575  INFO 24381 --- [       Thread-5] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown initiated...
2019-08-31 15:11:09.577  INFO 24381 --- [       Thread-5] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown completed.

3. Tasklet

In this section, we will see another form of Spring Batch step – the Tasklet step. This comes in handy when the flow does not fit the pattern of reader, processor, and writer. A Tasklet is a single task executed within a step, with the same safety guarantees of restartability and fault tolerance.

ListStep

@Bean
public Step listStep() {
    return stepBuilderFactory.get("listStep")
            .tasklet((stepContribution, chunkContext) -> {
                Resource directory = new FileSystemResource("out");
                System.out.println(directory.getFile() + " directory is available");
                for (File file : directory.getFile().listFiles()) {
                    System.out.println(file.getName() + " is available");
                }
                return RepeatStatus.FINISHED;
            }).build();
}
  • A simple Tasklet step named listStep is created.
  • The tasklet lambda takes two parameters – StepContribution and ChunkContext:
    • StepContribution buffers updates (such as read/write counts and the exit status) to be applied to the step execution.
    • ChunkContext is similar, but provides context around the chunk being processed.
  • This step looks at the output directory and lists all the files inside it.

Job Definition

  
@Bean
public Job movieJob(Step movieStep, Step listStep) {
    return jobBuilderFactory.get("movieJob")
            .incrementer(new RunIdIncrementer())
            .flow(movieStep)
            .next(listStep)
            .end()
            .build();
}

We wire the listStep into the movieJob in the above code snippet, chaining the steps in sequence. The tasklet then verifies the creation of the output CSV file in the out directory.

Logs

...
2019-08-31 15:12:07.472  INFO 24390 --- [           main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [listStep]
out directory is available
movies.csv is available
2019-08-31 15:12:07.473  INFO 24390 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=movieJob]] completed with the following parameters: [{run.id=1}] and the following status: [COMPLETED]
2019-08-31 15:12:07.476  INFO 24390 --- [       Thread-5] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown initiated...
2019-08-31 15:12:07.478  INFO 24390 --- [       Thread-5] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown completed.

4. Download the Source Code

You can download the full source code of this example here: Spring Batch Introduction Example

Rajagopal ParthaSarathi

Rajagopal works in the software industry solving enterprise-scale problems for customers across geographies, specializing in distributed platforms. He holds a master's in computer science with a focus on cloud computing from Illinois Institute of Technology. His current interests include data science and distributed computing.