
I am writing a program that reads a text file passed as an argument to the main method, extracts all the unique words from the file, and prints them to the console one per line. I am having trouble storing the tokens in a String array as each line is read from the Scanner (code below).

There are a couple of things I see that are wrong or could be written more efficiently:

1) tokens is initialized with a fixed size of 100. This is an obvious constraint. I thought about using a dynamic array such as ArrayList or Vector, but ultimately decided to use a simple String array and expand it when needed (i.e. create a new array double the size of the original, using a conditional statement that checks whether tokens is full while the scanner still has more lines).

2) I am not sure if simply passing input.hasNextLine() as the test statement in the for loop makes sense. I basically want to keep looping until input has reached EOF.

3) I want the regex in split to catch all punctuation, whitespace, and digits. I'm not 100% sure it's written correctly.

4) The line in question is tokens[index] = token[index]; I'm not sure this is correct. I want the tokens from each line to be added to tokens.

    public static void main(String[] arg) throws FileNotFoundException {
        File textFile = new File(arg[0]);
        String[] tokens = new String[100];

        try {
            Scanner input = new Scanner(textFile);

            for (int index = 0; input.hasNextLine(); index++) {
                String[] token = input.nextLine().split("[.,;']+\\d +\\s");
                tokens[index] = token[index];
            }
            for (String token : tokens) {
                System.out.println(token);
            }
            input.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
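
The array-doubling idea from point 1 could look roughly like the sketch below. The class name `GrowableTokens`, its method names, and the starting capacity of 4 are assumptions made for illustration; they are not part of the program above.

```java
import java.util.Arrays;

// Hypothetical helper: a String array that doubles in size when it fills up.
public class GrowableTokens {
    private String[] tokens = new String[4]; // small starting size, chosen only for the demo
    private int size = 0;                    // number of slots actually in use

    public void add(String token) {
        // When every slot is used, copy the contents into an array twice as large.
        if (size == tokens.length) {
            tokens = Arrays.copyOf(tokens, tokens.length * 2);
        }
        tokens[size++] = token;
    }

    // Returns only the filled part of the array, without trailing nulls.
    public String[] toArray() {
        return Arrays.copyOf(tokens, size);
    }

    public int capacity() {
        return tokens.length;
    }

    public static void main(String[] args) {
        GrowableTokens g = new GrowableTokens();
        for (String w : "a b c d e".split(" ")) {
            g.add(w);
        }
        // prints: [a, b, c, d, e] capacity=8
        System.out.println(Arrays.toString(g.toArray()) + " capacity=" + g.capacity());
    }
}
```

Tracking `size` separately from the array length is what avoids printing null entries for unused slots.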
1) Use a for loop only when you know in advance how many times the loop will loop. For your present loop this condition isn't met -- so use a while loop. Any tutorial on looping will tell you this right away -- consider going through one. 2) Why did you decide not to use an ArrayList again? Your logic seemed to be about keeping it simple, well what you're doing isn't simple at all. 3) you should know on inspection that tokens[index] = tokens[index] isn't right. You're assigning a variable to itself. –  Hovercraft Full Of Eels 39 mins ago
Also imagine that you have 99 lines in your file. Eventually after looping through the file, then index will become 99 right? You split your line into tokens, then try to access the array using index which is 99 - see a problem here? –  Scary Wombat 37 mins ago
    
@LF Hernandez can you post your file content plz? –  Kick Buttowski 34 mins ago
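
The indexing problem raised in the comments can be reproduced with a small sketch. It mimics `tokens[index] = token[index]` on three made-up lines and splits on whitespace only; the class name `IndexMismatch`, the method `pickedTokens`, and the sample lines are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of using the line counter as a token index.
public class IndexMismatch {
    // For line number `index`, only the token at position `index` is looked up,
    // which is what tokens[index] = token[index] does in the question.
    public static List<String> pickedTokens(List<String> lines) {
        List<String> picked = new ArrayList<>();
        for (int index = 0; index < lines.size(); index++) {
            String[] token = lines.get(index).split("\\s+");
            if (index < token.length) {
                picked.add(token[index]);
            } else {
                // The original code would throw here instead of recording a marker.
                picked.add("<ArrayIndexOutOfBoundsException>");
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("one two three", "four five six", "seven eight");
        // prints: [one, five, <ArrayIndexOutOfBoundsException>]
        System.out.println(pickedTokens(lines));
    }
}
```

Only one token per line is ever stored, and the lookup fails as soon as the line number reaches the number of tokens on that line.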

1 Answer

There are several errors in the code; I'll try to cover all of them:

  1. Change tokens to an ArrayList; there is no reason not to.
  2. You need two loops: a) over the lines in the file and b) over the tokens in each line.
  3. The regex is very specific about what must appear between tokens: one or more punctuation characters, then exactly one digit, then one or more spaces, then one more whitespace character.

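    // imports needed at the top of the file
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.ArrayList;
    import java.util.Scanner;

    public static void main(String[] arg) throws FileNotFoundException {
        File textFile = new File(arg[0]);
        ArrayList<String> tokens = new ArrayList<String>();

        try {
            Scanner input = new Scanner(textFile);

            // outer loop: one iteration per line of the file
            while (input.hasNextLine()) {
                // split on runs of , ; : " . and whitespace
                String[] lineTokens = input.nextLine().split("[,;:\"\\.\\s]+");
                // inner loop: one iteration per token on the line
                for (String token : lineTokens) {
                    tokens.add(token);
                }
            }
            for (String token : tokens) {
                System.out.println(token);
            }
            input.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }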

The regex can be improved, but that depends on your data, so I can't anticipate all the cases you need to handle.
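
Since the question asks for unique words and for a separator that covers punctuation, whitespace, and digits, one possible variation is sketched below. It is an illustration under assumptions, not a drop-in fix: the class name `UniqueWords`, the method `uniqueWords`, and the sample lines are made up, and the pattern uses Java's `\p{Punct}` class, which covers ASCII punctuation only.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: collect unique words in first-seen order.
public class UniqueWords {
    // One or more ASCII punctuation characters, whitespace characters, or digits.
    private static final String SEPARATORS = "[\\p{Punct}\\s\\d]+";

    public static Set<String> uniqueWords(List<String> lines) {
        // LinkedHashSet drops duplicates and keeps insertion order.
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            for (String token : line.split(SEPARATORS)) {
                // split yields an empty first token when a line starts with a separator
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello, world! 42 times.", "  hello again; world");
        // prints Hello, world, times, hello, again (one per line)
        for (String word : uniqueWords(lines)) {
            System.out.println(word);
        }
    }
}
```

The comparison is case-sensitive, so "Hello" and "hello" count as different words; lower-casing each token before adding it would merge them. Because the apostrophe is punctuation, a word like "don't" is split into "don" and "t" with this pattern.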
