Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I was writing a piece of code (a custom EntityProcessor for Solr, but that isn't too relevant) designed to break lines of input based on a string delimiter (toMatch below). Characters are read from a stream and then passed to addChar. The assumption is that each character comes in a stream and look-ahead isn't allowed - the break needs to happen after the last character of a match is passed in. I wrote two implementations below addChar and addChar2. The first is a simple, dumb, keep a buffer and compare it each iteration, the second is a simple state machine and the latter is slightly faster. I wanted to see if anyone had any better ideas on how to optimize this code. Thanks!

import java.util.Iterator;
import java.util.LinkedList;

public class CharStreamMatcher {
    protected char[] toMatch = null;


    public MagicCharStreamMatcher(char[] match){
        toMatch = match;
        for(int i=0;i<match.length;i++) li.add((char)0);
    }

    public void clear(){
        correspondences.clear();
    }

    LinkedList<Character> li = new LinkedList<Character>();
    public boolean addChar2(char c){
        li.addLast(c);
        li.removeFirst();
        int i=0;
        for(Character ch : li){
            if(ch.charValue() != toMatch[i++])
                return false;
        }
        return true;
    }


    private class MutableInteger{
        public int i;
        public MutableInteger(int i){
            this.i = i;
        }
    }

    LinkedList<MutableInteger> correspondences = new LinkedList<MutableInteger>();
    public boolean addChar(char c){
        boolean result = false; 

        if(c == toMatch[0])
            correspondences.add(new MutableInteger(-1));

        Iterator<MutableInteger> it = correspondences.iterator();
        while(it.hasNext()){
            MutableInteger mi = it.next();
            mi.i++;

            // check the match
            if(c != toMatch[mi.i]){
                it.remove();
            }
            // are we done? 
            else if(mi.i == toMatch.length-1){
                result = true;
                it.remove();
            }
        }

        return result;
    }   
}
share|improve this question
1  
Depends on the statistics of toMatch. Knuth-Morris-Pratt might be an improvement if the delimiters are fairly long. –  Peter Taylor Feb 18 '11 at 22:35
1  
if performance is important you can't be adding one char at a time. Also prefer explicit int for loops to java 5 sugared iterator loops. –  Ron Feb 23 '11 at 1:50
2  
A code style comment: your code will be more readable if you put the attribute declarations at the start of the class ... before the constructors and methods. That's where people expect to find them. –  Stephen C Mar 26 '11 at 6:33

1 Answer 1

There are more-sophisticated algorithms you can use, depending on the input and the search string. Even keeping the "one character at a time" interface though, there are optimizations to the implementation. Someone linked to KMP in a comment to your question - that's a good way to go if you can do some initial setup and buffer an arbitrary amount of the incoming stream.

I think you're on the right track with a state machine, but the implementation in addChar2 looks kind of heavyweight, with the MutableInteger class and a linked list, and an iterator-driven loop, and all. If you want this to be fast, you'll likely want to use primitive types (ints and arrays of ints) rather than OO wrappers around the primitive types.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.