Spring Batch ETL Job Example
Through this article, we are going to demonstrate an ETL use case leveraging the advantages of Spring Batch, A typical batch program generally reads a large number of records from a database, file, or queue, processes the data in some fashion, and then writes back data in a modified form.
The main advantage of batch applications is that they do not require any manual intervention. As a result, they can be scheduled to run at times when resources aren’t being utilized.
As an example, We will look at an ETL tool which runs in batch mode to calculate the financial stock market prices (Open, Low, High, Close). Huge financial stock market trades logs need to be parsed on a daily basis to fetch the required useful information. The input files are extracted and processed to obtain the required information, and the output data gets loaded to a CSV files. This whole process is carried out in batch mode.
1. Project Environment
- Spring Boot 1.3.3.RELEASE
- Apache Maven 3.0.5
- JDK 1.8
- Eclipse 4.4 (Luna)
2. Project Structure
3. Dependencies
We have the following dependencies inside our below POM file.
pom.xml:
<?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>org.springframework</groupId> <artifactId>springbatch-example-code</artifactId> <version>0.1.0</version> <parent> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-parent</artifactId> <version>1.3.3.RELEASE</version> </parent> <properties> <java.version>1.8</java.version> </properties> <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-batch</artifactId> </dependency> </dependencies> <build> <plugins> <plugin> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-maven-plugin</artifactId> </plugin> </plugins> </build> </project>
4. Reader
FxMarketEventReader
is an ItemReader
that reads lines from input CSV file trades.csv which defined by setResource(Resource)
, then skip the file header at the start of a file using setLinesToSkip(int linesToSkip)
, after that map each line to an item FxMarketEvent
using setLineMapper(LineMapper<T> lineMapper).
FxMarketEventReader.java:
package com.fx.batch.reader; import org.springframework.batch.item.file.FlatFileItemReader; import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper; import org.springframework.batch.item.file.mapping.DefaultLineMapper; import org.springframework.batch.item.file.transform.DelimitedLineTokenizer; import org.springframework.core.io.ClassPathResource; import com.fx.batch.model.FxMarketEvent; /** * The Class FxMarketEventReader. * * @author ashraf */ public class FxMarketEventReader extends FlatFileItemReader<FxMarketEvent> { public FxMarketEventReader() { //Set input file this.setResource(new ClassPathResource("trades.csv")); //Skip the file header line this.setLinesToSkip(1); //Line is mapped to item (FxMarketEvent) using setLineMapper(LineMapper) this.setLineMapper(new DefaultLineMapper<FxMarketEvent>() { { setLineTokenizer(new DelimitedLineTokenizer() { { setNames(new String[] { "stock", "time", "price", "shares" }); } }); setFieldSetMapper(new BeanWrapperFieldSetMapper<FxMarketEvent>() { { setTargetType(FxMarketEvent.class); } }); } }); } }
trades.csv:
stock,time,price,shares JHX,09:30:00.00,57,95 JNJ,09:30:00.00,91.14,548 OPK,09:30:00.00,8.3,300 OPK,09:30:00.00,8.3,63 OMC,09:30:00.00,74.53,100 OMC,09:30:00.00,74.53,24 TWTR,09:30:00.00,64.89,100 TWTR,09:30:00.00,64.89,25 TWTR,09:30:00.00,64.89,245 TWTR,09:30:00.00,64.89,55 USB,09:30:00.00,39.71,400 USB,09:30:00.00,39.71,359 USB,09:30:00.00,39.71,41 USB,09:30:00.00,39.71,259 USB,09:30:00.00,39.71,100 VALE,09:30:00.00,14.88,900 VALE,09:30:00.00,14.88,1000 VALE,09:30:00.00,14.88,100 VALE,09:30:00.00,14.88,1000 VALE,09:30:00.00,14.88,260 VALE,09:30:00.00,14.88,100 BSBR,09:30:00.00,5.87,1100 BSBR,09:30:00.00,5.87,800 BRK.B,09:30:00.00,118.35,422
5. Processor
FxMarketEventProcessor
is an ItemProcessor
, takes FxMarketEvent
as an input and converts it to Trade
as an output. Although, It’s possible to return the same or different type than the one provided, returning null indicates that the item should not be continued to be processed.
FxMarketEventProcessor.java:
package com.fx.batch.processor; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.batch.item.ItemProcessor; import com.fx.batch.model.FxMarketEvent; import com.fx.batch.model.Trade; /** * The Class FxMarketEventProcessor. * * @author ashraf */ public class FxMarketEventProcessor implements ItemProcessor<FxMarketEvent, Trade> { private static final Logger log = LoggerFactory.getLogger(FxMarketEventProcessor.class); @Override public Trade process(final FxMarketEvent fxMarketEvent) throws Exception { final String stock = fxMarketEvent.getStock(); final String time = fxMarketEvent.getTime(); final double price = Double.valueOf(fxMarketEvent.getPrice()); final long shares = Long.valueOf(fxMarketEvent.getShares()); final Trade trade = new Trade(stock, time, price, shares); log.trace("Converting (" + fxMarketEvent + ") into (" + trade + ")"); return trade; } }
6. Writer
StockPriceAggregator
is an ItemWriter
that aggregates the trading day stocks prices to calculates the Open
, Low
, High
and Close
for each stock, then update the FxMarketPricesStore
.
StockPriceAggregator.java:
package com.fx.batch.writer; import java.util.List; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.batch.item.ItemWriter; import org.springframework.beans.factory.annotation.Autowired; import com.fx.batch.model.FxMarketPricesStore; import com.fx.batch.model.StockPriceDetails; import com.fx.batch.model.Trade; /** * The Class StockPriceAggregator. * * @author ashraf */ public class StockPriceAggregator implements ItemWriter<Trade> { @Autowired private FxMarketPricesStore fxMarketPricesStore; private static final Logger log = LoggerFactory.getLogger(StockPriceAggregator.class); @Override public void write(List<? extends Trade> trades) throws Exception { trades.forEach(t -> { if (fxMarketPricesStore.containsKey(t.getStock())) { double tradePrice = t.getPrice(); StockPriceDetails priceDetails = fxMarketPricesStore.get(t.getStock()); // Set highest price if (tradePrice > priceDetails.getHigh()) { priceDetails.setHigh(tradePrice); } // Set lowest price if (tradePrice < priceDetails.getLow()) { priceDetails.setLow(tradePrice); } // Set close price priceDetails.setClose(tradePrice); } else { log.trace("Adding new stock {}", t.getStock()); fxMarketPricesStore.put(t.getStock(), new StockPriceDetails(t.getStock(), t.getPrice(), t.getPrice(), t.getPrice(), t.getPrice())); } }); } }
7. Listener
JobCompletionNotificationListener
is a JobExecutionListener
that provides a callback function afterJob(JobExecution jobExecution)
to load the stocks prices into CSV file prices.csv
after the ETL Job
completion.
JobCompletionNotificationListener.java:
package com.fx.batch.listener; import java.io.BufferedWriter; import java.nio.file.Files; import java.nio.file.Path; import java.nio.file.Paths; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.batch.core.BatchStatus; import org.springframework.batch.core.JobExecution; import org.springframework.batch.core.listener.JobExecutionListenerSupport; import org.springframework.beans.factory.annotation.Autowired; import com.fx.batch.model.FxMarketPricesStore; import com.fx.batch.model.StockPriceDetails; /** * The Class JobCompletionNotificationListener * * @author ashraf */ public class JobCompletionNotificationListener extends JobExecutionListenerSupport { private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class); private static final String HEADER = "stock,open,close,low,high"; private static final String LINE_DILM = ","; @Autowired private FxMarketPricesStore fxMarketPricesStore; @Override public void afterJob(JobExecution jobExecution) { if (jobExecution.getStatus() == BatchStatus.COMPLETED) { log.trace("Loading the results into file"); Path path = Paths.get("prices.csv"); try (BufferedWriter fileWriter = Files.newBufferedWriter(path)) { fileWriter.write(HEADER); fileWriter.newLine(); for (StockPriceDetails pd : fxMarketPricesStore.values()) { fileWriter.write(new StringBuilder().append(pd.getStock()) .append(LINE_DILM).append(pd.getOpen()) .append(LINE_DILM).append(pd.getClose()) .append(LINE_DILM).append(pd.getLow()) .append(LINE_DILM).append(pd.getHigh()).toString()); fileWriter.newLine(); } } catch (Exception e) { log.error("Fetal error: error occurred while writing {} file", path.getFileName()); } } } }
8. Configuring and Running a Job
8.1. Job Java Configuration
Batch application java based configuration has two main components the @EnableBatchConfiguration
annotation and two builders (JobBuilderFactory
, StepBuilderFactory
).
The @EnableBatchConfiguration
provides a base configuration for building batch jobs. Within this base configuration, an instance of StepScope
and JobScope
so your beans inside steps can have @Scope("step")
and @Scope("job")
respectively. Also, there is a number of beans made available to be autowired:
JobRepository
JobLauncher
JobRegistry
PlatformTransactionManager
JobBuilderFactory
StepBuilderFactory
BatchConfiguration.java:
package com.fx.batch; import org.springframework.batch.core.Job; import org.springframework.batch.core.JobExecutionListener; import org.springframework.batch.core.Step; import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing; import org.springframework.batch.core.configuration.annotation.JobBuilderFactory; import org.springframework.batch.core.configuration.annotation.StepBuilderFactory; import org.springframework.batch.core.launch.support.RunIdIncrementer; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import com.fx.batch.listener.JobCompletionNotificationListener; import com.fx.batch.model.FxMarketEvent; import com.fx.batch.model.FxMarketPricesStore; import com.fx.batch.model.Trade; import com.fx.batch.processor.FxMarketEventProcessor; import com.fx.batch.reader.FxMarketEventReader; import com.fx.batch.writer.StockPriceAggregator; /** * The Class BatchConfiguration. * * @author ashraf */ @Configuration @EnableBatchProcessing public class BatchConfiguration { @Autowired public JobBuilderFactory jobBuilderFactory; @Autowired public StepBuilderFactory stepBuilderFactory; @Bean public FxMarketPricesStore fxMarketPricesStore() { return new FxMarketPricesStore(); } // FxMarketEventReader (Reader) @Bean public FxMarketEventReader fxMarketEventReader() { return new FxMarketEventReader(); } // FxMarketEventProcessor (Processor) @Bean public FxMarketEventProcessor fxMarketEventProcessor() { return new FxMarketEventProcessor(); } // StockPriceAggregator (Writer) @Bean public StockPriceAggregator stockPriceAggregator() { return new StockPriceAggregator(); } // JobCompletionNotificationListener (File loader) @Bean public JobExecutionListener listener() { return new JobCompletionNotificationListener(); } // Configure job step @Bean public Job fxMarketPricesETLJob() { return jobBuilderFactory.get("FxMarket Prices ETL Job").incrementer(new RunIdIncrementer()).listener(listener()) .flow(etlStep()).end().build(); } @Bean public Step etlStep() { return stepBuilderFactory.get("Extract -> Transform -> Aggregate -> Load").<FxMarketEvent, Trade> chunk(10000) .reader(fxMarketEventReader()).processor(fxMarketEventProcessor()) .writer(stockPriceAggregator()) .build(); } }
8.2. Running a Job
Launching a batch job requires two things: the Job
to be launched and a JobLauncher
. For example, if launching a job from the command line, a new JVM will be instantiated for each Job, and thus every job will have its own JobLauncher
.
prices.csv:
stock,open,close,low,high CVCO,68.4,68.7,68.4,68.7 SCS,15.77,15.58,15.58,15.836 SCU,26.11,26.11,26.11,26.11 BBD,12.21,12.18,12.1599,12.26 BBG,26.72,26.17,26.07,26.98 BBF,12.46,12.39,12.39,12.46 BBH,87.97,88.19,87.81,88.76 SCON,2.15,2.15,2.15,2.15 SCX,14.57,14.57,14.57,14.57 BBK,13.78,13.76,13.76,13.78 SCOK,1.16,1.16,1.16,1.16 SCZ,50.6,50.54,50.5,50.84 STPZ,52.88,52.9,52.84,52.9 JIVE,11.16,11.2,11.16,11.24 BBL,61.35,61.27,61.25,61.37 BBN,19.06,19.0503,19.05,19.06 SDD,12.14,12.14,12.14,12.14 TWTC,30.58,30.32,30.29,30.58 BBT,37.11,36.96,36.91,37.18 SCOR,28.47,28.445,28.21,28.79 CEAI,0.298,0.298,0.298,0.298 BBW,7.59,7.59,7.59,7.59 BBY,39.75,40.24,39.61,40.3 BBX,15.62,15.6,15.6,15.62 FNLC,17.12,17.49,17.12,17.49
9. Download the Source Code
This was an example to show how to create an ETL Spring Batch Job.
You can download the full source code of this example here: SpringBatchExampleCode.zip