Implementing a fast 'split' function on a stream of characters in Java

Question

I was writing a piece of code (a custom EntityProcessor for Solr, but that isn't too relevant) designed to break lines of input based on a string delimiter (toMatch below). Characters are read from a stream and then passed to addChar. The assumption is that each character comes in a stream and look-ahead isn't allowed - the break needs to happen after the last character of a match is passed in. I wrote two implementations below addChar and addChar2. The first is a simple, dumb, keep a buffer and compare it each iteration, the second is a simple state machine and the latter is slightly faster. I wanted to see if anyone had any better ideas on how to optimize this code. Thanks!

import java.util.Iterator;
import java.util.LinkedList;

public class CharStreamMatcher {
    protected char[] toMatch = null;


    public MagicCharStreamMatcher(char[] match){
        toMatch = match;
        for(int i=0;i<match.length;i++) li.add((char)0);
    }

    public void clear(){
        correspondences.clear();
    }

    LinkedList<Character> li = new LinkedList<Character>();
    public boolean addChar2(char c){
        li.addLast(c);
        li.removeFirst();
        int i=0;
        for(Character ch : li){
            if(ch.charValue() != toMatch[i++])
                return false;
        }
        return true;
    }


    private class MutableInteger{
        public int i;
        public MutableInteger(int i){
            this.i = i;
        }
    }

    LinkedList<MutableInteger> correspondences = new LinkedList<MutableInteger>();
    public boolean addChar(char c){
        boolean result = false; 

        if(c == toMatch[0])
            correspondences.add(new MutableInteger(-1));

        Iterator<MutableInteger> it = correspondences.iterator();
        while(it.hasNext()){
            MutableInteger mi = it.next();
            mi.i++;

            // check the match
            if(c != toMatch[mi.i]){
                it.remove();
            }
            // are we done? 
            else if(mi.i == toMatch.length-1){
                result = true;
                it.remove();
            }
        }

        return result;
    }   
}

Depends on the statistics of toMatch. Knuth-Morris-Pratt might be an improvement if the delimiters are fairly long. — Peter Taylor, Feb 18 '11 at 22:35
if performance is important you can't be adding one char at a time. Also prefer explicit int for loops to java 5 sugared iterator loops. — Ron, Feb 23 '11 at 1:50
A code style comment: your code will be more readable if you put the attribute declarations at the start of the class ... before the constructors and methods. That's where people expect to find them. — Stephen C, Mar 26 '11 at 6:33

Mark Bessey · Answer 1 · 2011-03-28 20:32:17Z

There are more-sophisticated algorithms you can use, depending on the input and the search string. Even keeping the "one character at a time" interface though, there are optimizations to the implementation. Someone linked to KMP in a comment to your question - that's a good way to go if you can do some initial setup and buffer an arbitrary amount of the incoming stream.

I think you're on the right track with a state machine, but the implementation in addChar2 looks kind of heavyweight, with the MutableInteger class and a linked list, and an iterator-driven loop, and all. If you want this to be fast, you'll likely want to use primitive types (ints and arrays of ints) rather than OO wrappers around the primitive types.

asked	3 years ago
viewed	796 times
active	3 years ago

current community

your communities

more stack exchange communities

Implementing a fast 'split' function on a stream of characters in Java

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged java or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Implementing a fast 'split' function on a stream of characters in Java

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged java or ask your own question.

Related

Hot Network Questions