Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I have written up a data structure in Java that supports the following features:

  • get(i)
  • set(i, e)
  • add(e)
  • remove(e)
  • contains(e)
  • size

Elements must remain in order and iteration needs to be as fast as possible. The code does not require thread safety.

I am looking for ways to improve the speed further. It is critical to reduce times to as small as possible because this code will be invoked hundreds of thousands of times per second. I believe it's possible to optimize my stream() method!

This is my current solution which seems to work:

import java.util.Iterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public final class Indexer<E> implements Iterable<E> {

    private final int minIndex;
    private final Object[] arr;

    private int size = 0;
    private int highest;

    public Indexer(int minIndex, int capacity) {
        this.minIndex = highest = minIndex;
        arr = new Object[capacity];
    }

    public Indexer(int capacity) {
        this(0, capacity);
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        return (E) arr[index];
    }

    @SuppressWarnings("unchecked")
    public E set(int index, E element) {
        Object last = arr[index];
        arr[index] = element;
        if (last == null && element != null) {
            size++;
            if (highest < index)
                highest = index;
        } else if (last != null && element == null) {
            size--;
            if (highest == index)
                highest--;
        }
        return (E) last;
    }

    public int add(E element) {
        int index = nextIndex();
        set(index, element);
        return index;
    }

    public void remove(E element) {
        for (int i = minIndex; i <= highest; i++) {
            if (element == arr[i]) {
                set(i, null);
                return;
            }
        }
    }

    public boolean contains(E element) {
        for (E e : this) {
            if (element == e)
                return true;
        }
        return false;
    }

    public int size() {
        return size;
    }

    public int nextIndex() {
        for (int i = minIndex; i < arr.length; i++) {
            if (null == arr[i])
                return i;
        }
        throw new IllegalStateException("Out of indices!");
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {

            private int pointer = minIndex;

            @Override
            public boolean hasNext() {
                return size > 0 && pointer <= highest;
            }

            @Override
            @SuppressWarnings("unchecked")
            public E next() {
                return (E) arr[pointer++];
            }

        };
    }

    public Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

}
share|improve this question
1  
One quick and easy change is to change all of your operations with "--" to "+= -1". This works because it is easier for computers to perform addition than subtraction on the architecture level. If you are calling many thousands of times per second, you'll notice an improvement. –  Evan Bechtol Jun 29 at 18:21
6  
@EvanBechtol You'll want to check that the Java compiler doesn't already do that automatically, and that it doesn't convert += -1 to -= 1 or just --. Remember that the JIT does a lot of things very well, including squeeze every bit of speed out of the program once it's called enough. –  QPaysTaxes Jun 29 at 18:23
4  
What does "Elements must remain in order" mean? Because if you add 1 2 3 4 5, then remove 3, then add 6, currently your order becomes 1 2 6 4 5. Is that intended? Also, can you explain the purpose of minIndex? In what situation would you create an Indexer with a minIndex that is not 0? Lastly, which operations will be performed hundreds of thousands of times per second? Is it add, remove, contains, get, or iterating? –  JS1 Jun 29 at 18:40
1  
@Jire If performance is critical, then whoop out your profiler and measure where the time is consumed. It is the only way to go about it, measure before and after to see if your changes actually improved your code or made it worse. But be aware of the caveats of writing benchmarks in Java. –  Emily L. Jun 30 at 8:44
1  
@Jire this is pretty much a simple implementation of ArrayList<T> without automatic resizing. What is the use case that keeps you from using the standard Java implementation? –  OliverS Jun 30 at 12:00

3 Answers 3

Wow, this is pretty nice code. I've only got a few little nitpicks.

NB: I have not benchmarked the code before and after making these changes. If you'd like, I can, but as of right now, these are untested changes.

NB2: Most of these are just conforming to convention, since after a bit of research I've discovered that the JIT makes things scary fast after a few thousand iterations by putting in every optimization I'd never think of.

  1. In remove(E) and contains(E), you compare equality with == instead of equals. This is a bad idea, because sometimes people want to specify their way of comparing objects. If they don't, this will default to Object's implementation, which is just ==. The exception, of course, is when you're dealing with either primitives or null.

  2. It's standard for remove to return the element it just removed, so you'll want to change the internals of the for a little bit:

    if (element.equals(arr[i])) {
        set(i, null);
        return element;
    }
    

    and change the method signature to match.

  3. I'd recommend adding a remove(int) as well, which does basically the same thing as remove(E) but doesn't have to loop through. It would look something like this:

    public E remove(int index) {
        E ret = get(index);
        set(index, null);
        return ret;
    }
    
  4. Continuing with the trend of "I think your methods are bad and here's how I think you should do them instead", I think your lack of indexOf is bad and here's how I think you could implement it:

    public int indexOf(E element) {
        for (int i = minIndex; i < size; i++) {
            if (element.equals(arr[i]))
                return i;
        }
        return -1;
    }
    
  5. In set(int, E) I'd recommend renaming last to original, or it might be confused with the last element of the array.

  6. In iterator()'s anonymous class's next(), you should add a check to make sure that you're not overstepping the array's bounds, even if all you end up doing is throwing a different error. corsiKa suggested using an IllegalStateException, which makes a good bit of sense. The most important thing is that, from the name of the exception alone, you can figure out what went wrong, though not necessarily the details.

That's about it! Well-written code all. If you give me some test cases, I can benchmark the version with my tips and the version without.

share|improve this answer
    
What are your thoughts on the IllegalStateException instead of perhaps IndexOutOfBoundsException? –  corsiKa Jun 30 at 6:13
    
@corsiKa Yeah, that would work. I'm adding it in. –  QPaysTaxes Jun 30 at 6:15

I'm not sure about the use of minIndex. Let's say I create an instance new Indexer<E>(5,20); it implies that the indexes 0,1,2,3,4 will never be used? Therefore it seems like a useless use of memory, though I might misunderstand what you're achieving with this. If it is the case, consider that your code might use comments in order to explain why you use the minIndex!

There's a bug with the minIndex too. You check for this index when you add and remove from your data structure but not when you set. Meaning that if I wanted, I could do this :

Indexer<Integer> indexer = new Indexer<>(2,5);
indexer.set(0,2);
//My value wasn't removed because it's index is before the minIndex
indexer.remove(2);

In the set method, the variable name last is confusing, I thought it implied the last value of your Object[]. I think naming it previous might be better. Also, it is confusing as why your set returns the previous value. You should comment your methods to explain what they do and what they return. Because in this case I might think it would return the new value. The same problem applies to your add method, it isn't intuitive to understand that the int returned is the new index. Comments should make this a lot clearer :)

You should use equals over == in your comparisons. == compared the reference value whereas equals will compare using the equals override of the type you are using, which leads to less errors.

What happens if I have multiple times the same element. Is it intended that you remove only the first instance that is found?

share|improve this answer
    
I was writing my answer when @QPaysTaxes posted his, so some points might be similar.. –  TopinFrassi Jun 29 at 18:57
3  
Aw, dang, I missed the minIndex bug! No fair. –  QPaysTaxes Jun 29 at 18:58
1  
That'll learn you to post your answer before mine! ;) –  TopinFrassi Jun 29 at 19:00

Speed concerns

Many of your operations are \$O(n)\$ and could be made faster. You didn't say which operations were the most commonly used operations, so I'm not sure if you care about the following.

add(), nextIndex()

These are both \$O(n)\$ operations, because they both need to search the array for a free slot. If you kept a set or list of free slots, this could be faster. To retain the behavior that the lowest numbered free slot is used next, you need to use an ordered set/list. So for example, if you used a TreeSet to track your free slots, you could have \$O(\log n)\$ adds.

remove(), contains()

These are both currently \$O(n)\$, because you search the array for the element. You could make these \$O(1)\$ by keeping a HashMap of elements mapped to indices, for elements added to the array.

MinIndex and capacity

I don't like the way that minIndex and capacity are used. If I create an Indexer with minIndex=100 and capacity=100, I would expect that I could use indices [100..199]. But in the current implementation, I get a useless Indexer with no usable slots. Right now, capacity acts more like a maxIndex so it is somewhat misnamed. Another thing is that if minIndex is not 0, you are currently allocating an array where some of the slots will never be used.

If the minIndex concept is to be kept, I would suggest that the Indexer support indices in the range [minIndex .. minIndex+capacity-1]. You should then allocate an array of size capacity, and then subtract minIndex from your index whenever you access the array.

Then again, if minIndex is only ever going to be 1, it would be faster to just keep your code as is. But you should probably allocate an array of size minIndex+capacity instead of just capacity, or else document what capacity actually means.

share|improve this answer
    
Iteration is the most used operation but add and contains are common as well. –  Jire Jun 29 at 21:21
    
Might be worth noting that your suggestions to use TreeSet and HashMap will probably reduce performance somewhat for small \$n\$ due to a larger constant factor (cache misses in a tree structure, and all that...). –  Emily L. Jun 30 at 8:41
    
Thanks for your answer, your solutions will help me in the future but in this case I am not concerned about add or contains performance. My #1 concern (which is what's being used so much) is iteration. Iteration needs to be as fast as possible which means I need to track the highest index. During iteration I would actually prefer not to deal with null. –  Jire Jul 1 at 2:05
    
@Jire How big is your typical array and how many nulls do you normally have in the array? There are ways you could skip over the nulls quickly during iteration, but it would cause your add and remove functions to be slower. But if there are very few nulls, you might want to just have your next() function skip over any null entries. –  JS1 Jul 1 at 6:01
    
@JS1 Typically 2000-10000, typically half nulls or more. –  Jire Jul 1 at 17:04

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.