Word frequency count example

This example demonstrates how to count how often each word occurs in a file. In short, to count the frequency of words in a file you should:

  • Create a new FileInputStream with a given String path by opening a connection to a file.
  • Get the FileChannel object associated with the FileInputStream, with getChannel() API method of FileInputStream.
  • Get the current size of this channel’s file, using size() API method of FileChannel.
  • Create a MappedByteBuffer, using map(MapMode mode, long position, long size) API method of FileChannel that maps a region of this channel’s file directly into memory.
  • Convert the byte buffer to a character buffer. Get a Charset for a specified charset name, using forName(String charsetName) API method of Charset, and then a new CharsetDecoder, using newDecoder() API method of Charset. Then use decode(ByteBuffer in) API method of CharsetDecoder to decode the remaining content of the input byte buffer into a newly-allocated character buffer.
  • Create a line pattern and a word-break pattern, by compiling given String regular expressions to a Pattern, using compile(String regex) and compile(String regex, int flags) API methods of Pattern.
  • Match the line pattern to the buffer, using matcher(CharSequence input) API method of Pattern.
  • For each line, get the line using find() and group() API methods of the Matcher created for the line pattern, then get the array of words in the line using split(CharSequence input) API method of the word-break Pattern.
  • Then, for each non-empty word, put it in a TreeMap with a count of 1, or increase its count if the word is already there.
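
The pattern and counting steps above can be tried on an in-memory string, without touching a file. The following is a minimal sketch, not the article's own listing: the class name WordCountSketch and the helper countWords are hypothetical names chosen for illustration, and Map.merge is used as a shorthand for the "insert or increment" step.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCountSketch {

    // Line pattern: with MULTILINE, $ matches at the end of every line
    private static final Pattern LINE = Pattern.compile(".*$", Pattern.MULTILINE);

    // Word-break pattern: a single punctuation or whitespace character
    private static final Pattern WORD_BREAK = Pattern.compile("[\\p{Punct}\\s]");

    // Hypothetical helper: counts the words in any character sequence
    public static Map<String, Integer> countWords(CharSequence text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher lineM = LINE.matcher(text);
        while (lineM.find()) {
            // Split the current line on the word-break characters
            for (String word : WORD_BREAK.split(lineM.group())) {
                // Adjacent separators produce empty strings, so skip them
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {cat=1, dog=1, end=1, the=3}
        System.out.println(countWords("the cat, the dog\nthe end."));
    }
}
```

Because the word-break pattern matches one separator at a time, a comma followed by a space yields an empty string between them, which is why the length check is needed.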

Let’s take a look at the code snippet that follows:

package com.javacodegeeks.snippets.core;
import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFreq {

    public static void main(String[] args) throws Exception {

        String filePath = "C:/Users/nikos7/Desktop/file.odt";

        // try-with-resources closes the stream and its channel when done
        try (FileInputStream in = new FileInputStream(filePath)) {

            // Map the file to a read-only byte buffer
            FileChannel filech = in.getChannel();
            int fileLen = (int) filech.size();
            MappedByteBuffer buf = filech.map(FileChannel.MapMode.READ_ONLY, 0, fileLen);

            // Convert to character buffer
            Charset chars = Charset.forName("ISO-8859-1");
            CharsetDecoder dec = chars.newDecoder();
            CharBuffer charBuf = dec.decode(buf);

            // Create line pattern
            Pattern linePatt = Pattern.compile(".*$", Pattern.MULTILINE);

            // Create word-break pattern: one punctuation or whitespace character
            Pattern wordBrkPatt = Pattern.compile("[\\p{Punct}\\s]");

            // Match line pattern to buffer
            Matcher lineM = linePatt.matcher(charBuf);

            // TreeMap keeps the words sorted by their natural String order
            Map<String, Integer> m = new TreeMap<>();

            // For each line
            while (lineM.find()) {

                // Get line
                CharSequence lineSeq = lineM.group();

                // Get array of words on line
                String[] words = wordBrkPatt.split(lineSeq);

                // For each word, skipping the empty strings left by adjacent separators
                for (String word : words) {
                    if (word.length() > 0) {
                        Integer frequency = m.get(word);
                        m.put(word, frequency == null ? 1 : frequency + 1);
                    }
                }
            }

            System.out.println(m);
        }
    }
}

Output:

WordPress=2, Working=1, Your=3, You’ll=1, a=136, able=1, about=8, above=2, absolutely=1, absurd=1, accept=.....
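
The order of the output above follows from TreeMap sorting its keys by natural String order, which compares character codes, so capitalized words such as "Your" come before lowercase ones such as "a". As a hedged illustration only (the class OrderingSketch and its count helper are hypothetical and not part of the original snippet), the sketch below contrasts that default with a TreeMap built on String.CASE_INSENSITIVE_ORDER, which also folds "Your" and "your" into one entry.

```java
import java.util.Map;
import java.util.TreeMap;

public class OrderingSketch {

    // Hypothetical helper: counts words into a TreeMap with the chosen ordering
    public static Map<String, Integer> count(String[] words, boolean ignoreCase) {
        Map<String, Integer> counts;
        if (ignoreCase) {
            // Keys that differ only by case are treated as the same key
            counts = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        } else {
            // Natural order: uppercase letters sort before lowercase ones
            counts = new TreeMap<>();
        }
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"a", "Your", "able", "your", "A"};
        System.out.println(count(words, false)); // {A=1, Your=1, a=1, able=1, your=1}
        System.out.println(count(words, true));  // {a=2, able=1, Your=2}
    }
}
```

With the case-insensitive comparator the map keeps whichever spelling was inserted first as the key, so the printed casing depends on the input order.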

 
This was an example of how to count the frequency of words in a file in Java.
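
The mapping and decoding steps can also be isolated and exercised on their own. The sketch below makes a few assumptions that are not in the original: the class MapAndDecodeSketch and the helper readMapped are hypothetical names, a temporary file stands in for the real input, and StandardCharsets.ISO_8859_1 replaces the Charset.forName lookup. It also checks the file size before mapping, since a single map call cannot cover more than Integer.MAX_VALUE bytes.

```java
import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MapAndDecodeSketch {

    // Hypothetical helper: maps a file read-only and decodes it as ISO-8859-1
    public static String readMapped(Path path) throws Exception {
        try (FileInputStream in = new FileInputStream(path.toFile());
             FileChannel channel = in.getChannel()) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                // One mapping is limited to Integer.MAX_VALUE bytes
                throw new IllegalArgumentException("File too large for a single mapping");
            }
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // decode(ByteBuffer) is a method of CharsetDecoder
            CharBuffer chars = StandardCharsets.ISO_8859_1.newDecoder().decode(buf);
            return chars.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        // Temporary file used only for this demonstration
        Path tmp = Files.createTempFile("wordfreq", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello mapped world".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readMapped(tmp));
    }
}
```

ISO-8859-1 maps every byte to exactly one character, so decoding never fails, but a file saved in another encoding such as UTF-8 would need the matching charset to produce correct words.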

Ilias Tsagklis

Ilias is a software developer turned online entrepreneur. He is co-founder and Executive Editor at Java Code Geeks.