
Parse an Apache logfile with regular expressions

This example shows how to parse an entry from an Apache logfile. Note that the code below tokenizes the entry with java.util.StringTokenizer rather than with the java.util.regex classes. We followed these steps:

  • We created an interface, LogExample, with two constants: NUM_FIELDS, a static final int holding the number of fields to be found, and logEntryLine, a static final String holding the log entry to be parsed.
  • We also created an implementation of the interface, Apache, that builds a StringTokenizer from logEntryLine and calls the countTokens() API method of StringTokenizer to report how many times nextToken() can be called, with the current delimiters, before it throws an exception.
  • It then calls the nextToken() API method of StringTokenizer to return the next token, and the nextToken(String delim) API method to get the next token after switching to the delimiters that surround each field of the log entry,

as described in the code snippet below.

package com.javacodegeeks.snippets.core;

import java.util.StringTokenizer;

/**
 * Parse an Apache log file with StringTokenizer
 */
public class Apache implements LogExample {

    public static void main(String[] args) {

        StringTokenizer matcher = new StringTokenizer(logEntryLine);

        // countTokens() only counts with the current (whitespace) delimiters,
        // so it reports 19 here and cannot be checked against NUM_FIELDS.
        System.out.println("tokens = " + matcher.countTokens());

        System.out.println("Hostname: " + matcher.nextToken());

        // StringTokenizer makes you ask for tokens in order to skip them:
        matcher.nextToken(); // eat the first "-"
        matcher.nextToken(); // eat the second "-"

        matcher.nextToken("["); // eat the space before "["
        System.out.println("Date/Time: " + matcher.nextToken("[]"));

        matcher.nextToken("\""); // eat "] " up to the opening quote
        System.out.println("Request: " + matcher.nextToken());

        // A new delimiter set stays in effect for the following nextToken() calls.
        System.out.println("Response: " + matcher.nextToken("\" "));
        System.out.println("ByteCount: " + matcher.nextToken());
        System.out.println("Referer: " + matcher.nextToken());

        matcher.nextToken("\""); // eat the space between the two quoted fields
        System.out.println("User-Agent: " + matcher.nextToken());
    }
}
/**
 * Common fields for Apache Log demo.
 */
interface LogExample {

    /**
     * The number of fields that must be found.
     */
    public static final int NUM_FIELDS = 9;
    /**
     * The sample log entry to be parsed.
     */
    public static final String logEntryLine = "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] \"GET /java/javaResources.html HTTP/1.0\" 200 10450 \"-\" \"Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)\"";
}

Output:

tokens = 19
Hostname: 123.45.67.89
Date/Time: 27/Oct/2000:09:27:09 -0400
Request: GET /java/javaResources.html HTTP/1.0
Response: 200
ByteCount: 10450
Referer: -
User-Agent: Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)
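
Two details of StringTokenizer decide what each call returns: a token scan stops at the delimiter without consuming it, and a delimiter set passed to nextToken(String delim) stays in effect for later calls. The following is a minimal sketch of that behaviour on a made-up input line; the class name DelimiterSwitchSketch and the method quotedField are hypothetical names chosen for illustration.

```java
import java.util.StringTokenizer;

public class DelimiterSwitchSketch {

    /** Returns the first quoted field that follows one unquoted word (illustrative helper). */
    static String quotedField(String line) {
        StringTokenizer st = new StringTokenizer(line);
        st.nextToken();        // first word; the scan stops at the space after it
        st.nextToken("\"");    // returns " ", because a space is no longer a delimiter
        return st.nextToken(); // delimiter set is still "\"": returns the quoted text
    }

    public static void main(String[] args) {
        System.out.println(quotedField("a \"b c\" d")); // prints: b c
    }
}
```

The throwaway call in the middle is what keeps the leading space and the opening quote out of the field that is returned.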

 
This was an example of how to parse an Apache logfile with regular expressions in Java.
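
For comparison, the same entry can also be matched with the java.util.regex classes. The sketch below is not the code used above: the class name LogRegexSketch, the method parse and the pattern itself are assumptions written for the layout of the sample entry (host, two skipped fields, bracketed date, quoted request, status, byte count, quoted referer, quoted user agent), and other log formats would need a different pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegexSketch {

    // Assumed pattern for the sample entry's layout; one capture group per reported field.
    static final Pattern LOG_PATTERN = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-) \"([^\"]*)\" \"([^\"]*)\"$");

    // Same sample entry as in the example above, split across lines for readability.
    static final String SAMPLE = "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] "
        + "\"GET /java/javaResources.html HTTP/1.0\" 200 10450 \"-\" "
        + "\"Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)\"";

    /** Returns the seven captured fields, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match, so start at 1
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] labels = {"Hostname", "Date/Time", "Request", "Response",
                           "ByteCount", "Referer", "User-Agent"};
        String[] fields = parse(SAMPLE);
        if (fields == null) {
            System.err.println("Bad log entry: " + SAMPLE);
            return;
        }
        for (int i = 0; i < fields.length; i++) {
            System.out.println(labels[i] + ": " + fields[i]);
        }
    }
}
```

With a pattern, a malformed line is rejected as a whole by matches() instead of producing shifted fields, which is the main practical difference from the tokenizer approach.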

Byron Kiourtzoglou

Byron is a master software engineer working in the IT and Telecom domains. He is an applications developer in a wide variety of applications/services. He is currently acting as the team leader and technical architect for a proprietary service creation and integration platform for both the IT and Telecom industries, in addition to an in-house big data real-time analytics solution. He is always fascinated by SOA, middleware services and mobile development. Byron is co-founder and Executive Editor at Java Code Geeks.