I have a piece of code that loops through a char array (a string) to detect words. If it finds a character in A-Z or a-z, or an _ (underscore), it adds that character to a char array. Because these are words, I need to put each one into a string that I can pass to another function to check, after which it can be discarded. This is my function:

char wholeProgramStr2[20000];
char wordToCheck[100] ="";

IdentiferFinder(char *tmp){
    //find the identifiers
    int count = 0;
    int i;
    for (i = 0; i < strlen(tmp); ++i){
        Ascii = toascii(tmp[i]);
        if ((Ascii >= 65 && Ascii <= 90) || (Ascii >= 97 && Ascii <= 122) || (Ascii == 95))
        {
            wordToCheck[i] = tmp[i];
            count++;
            printf("%c",wordToCheck[i]); 
        }
        else {
            if (count != 0){
            printf("\n");
        }
            count = 0;
        }
    }
    printf("\n");
}

At the moment I can see all of the words, because it prints them out on separate lines.

wholeProgramStr2 holds all the lines of the file, and it is what gets passed as the *tmp argument.

Thank you.

Never compare against magic numbers. Use isalpha() and '_'. –  unwind Nov 12 '14 at 14:31
    
Your question is similar to this one, which is duplicate of that. Adapt my answer to your needs here. –  Basile Starynkevitch Nov 12 '14 at 14:33
    
You currently fill the wordToCheck array already (but remove the ="" initializer, global variables are zero initialized thus you will get a NUL terminated string in the array, given that tmp is not too long.) It's a global variable, you can access it from other functions. Please clarify what you want or what the problem is. –  Jite Nov 12 '14 at 14:34
    
What the hell is Ascii = toascii(tmp[i]);? Where is the Ascii defined? What's its type? –  EOF Nov 12 '14 at 14:35
    
Can you explain what delimiters you are using to parse words from wholeProgramStr2? Usually spaces, tabs, etc. are used as delimiters for this type of parsing. Is this what you are doing? –  ryyker Nov 12 '14 at 14:42

2 Answers

Accepted answer (3 votes)

You describe breaking apart a big string into little strings (words).
Assuming you are using normal delimiters to parse, such as spaces, tabs or newlines:

Here is a three step approach:
First, get information about your source string (word count and longest word).
Second, create your target array dynamically to fit your size needs.
Third, loop on strtok() to populate your target array of strings (char **).

(A fourth step would be to free the memory you allocated, which you will need to do.)
Hint: the prototype could look like this:
// void Free2DCharArray(char **a, int numWords);

Code example:

#include <stdlib.h>   //calloc, free
#include <string.h>   //strlen, strcpy, strtok

void FindWords(char **words, char *source);
void GetStringParams(char *source, int *longest, int *wordCount);
char ** Create2DCharArray(char **a, int numWords, int maxWordLen);
#define DELIM " \n\t"

int main(void)
{
    int longestWord = 0, WordCount = 0;
    char **words={0};
    char string[]="this is a bunch of test words";

    //Get number of words, and longest word, use in allocating memory
    GetStringParams(string, &longestWord, &WordCount);

    //create array of strings with information from source string
    words = Create2DCharArray(words, WordCount, longestWord);

    //populate array of strings with words
    FindWords(words, string);

    //Do not forget to free words (left for you to do)
    return 0;   
}

void GetStringParams(char *source, int *longest, int *wordCount)
{
    char *tok;
    int Len = 0, KeepLen = 0;
    //tokenize a copy: strtok() overwrites delimiters with NUL,
    //and source must stay intact for FindWords()
    char *cpyString = calloc(strlen(source)+1, 1);
    if(!cpyString) return;
    strcpy(cpyString, source);
    tok=strtok(cpyString, DELIM);
    while(tok)
    {
        (*wordCount)++;
        Len = (int)strlen(tok);
        if(Len > KeepLen) KeepLen = Len;
        tok = strtok(NULL, DELIM);
    }
    *longest = KeepLen;
    free(cpyString);//the copy is no longer needed
}

void FindWords(char **words, char *source)             
{
    char *tok;
    int i=-1;

    tok = strtok(source, DELIM);
    while(tok)
    {
        strcpy(words[++i], tok);
        tok = strtok(NULL, DELIM);
    }
}

char ** Create2DCharArray(char **a, int numWords, int maxWordLen)
{
    int i;
    a = calloc(numWords, sizeof(char *));
    if(!a) return a;
    for(i=0;i<numWords;i++)
    {
        a[i] = calloc(maxWordLen + 1, 1);       
    }
    return a;
}
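
The cleanup step that the example leaves to the reader could be sketched as follows. This is only one possible implementation of the Free2DCharArray prototype hinted at above, and it assumes the array was built the way Create2DCharArray builds it (one calloc per word, plus one for the pointer array).

```c
#include <stdlib.h>

/* Sketch: release an array of numWords strings allocated row by row.
   Assumes each a[i] came from calloc()/malloc() or is NULL. */
void Free2DCharArray(char **a, int numWords)
{
    int i;
    if (!a) return;
    for (i = 0; i < numWords; i++)
    {
        free(a[i]);   /* free(NULL) is a no-op, so failed rows are safe */
    }
    free(a);
}
```

In main() it would be called as Free2DCharArray(words, WordCount); once the words are no longer needed.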

If your goal is to look for words in an array of chars, you probably want to first find a valid sequence of characters (and you seem to be trying to do that), and once you've found one, run that secondary check to see whether it is a real word. If it is indeed a word, you may then decide to keep it for further use.

The advantage of this approach is that you don't need to keep a large buffer of potential words, you only need a fixed one, of size matching the largest word in your dictionary. In fact, you might not even need a buffer, but just a pointer sliding along the char array, pointing at the start of a possible word, and an int (though a byte might suffice) to keep track of the length of that word.

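#include <ctype.h>   // isalpha
#include <stddef.h>  // NULL

// structure to store a word match in array
typedef struct token_s {
  int length;
  const char *data;
} token_t;

// finds the first word in tmp[0..len-1]; leaves *to untouched if there is none
void nextToken(const char *tmp, int len, token_t *to){
  const char *start = NULL;  // const, because it points into tmp
  while (len){
    if (start) {
      // search for end of current word
      if (!isalpha((unsigned char)*tmp)) {
        to->data = start;
        to->length = (int)(tmp - start);
        return;
      }
    } else { 
      // search for beginning of next word
      if (isalpha((unsigned char)*tmp))
        start = tmp;
    }
    tmp++;
    len--;
  } // while
  if (start) {
    // the word runs up to the end of the array
    to->data = start;
    to->length = (int)(tmp - start);
  }
}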

Simply pass:

  • the start of your char array, or to->data + to->length + 1 if it's not beyond the end of the array
  • the remaining length of the char array to scan
  • a pointer to a zeroed token_t

to each call to nextToken, and check the token's content to know if it found a candidate; if it didn't, you know that the array has been scanned entirely.

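void scanArray(const char *tmp, int len){
  while (len > 0){
    token_t to;
    int consumed;
    to.data = NULL;
    to.length = 0;
    nextToken(tmp, len, &to);
    if (to.data) {
      // process token here...

      // resume right after the token; it may start later than tmp
      // when separators precede it, so measure from to.data
      consumed = (int)(to.data - tmp) + to.length;
      tmp += consumed;
      len -= consumed;
    } else break;
  } // while
}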

I used isalpha to test for valid characters, but you'll want to replace that by a function of your own. And you'll have to insert your own code for that secondary checking in the body of scanArray.
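
As a rough illustration of that replacement, here is a minimal sketch that accepts letters and underscore (the character set from the question) and copies each match into a NUL-terminated buffer, so it can be handed to a checking function as an ordinary string. The names isWordChar and nextWord are hypothetical and not part of the code above; words longer than the buffer are truncated, which is an assumption you may want to change.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical replacement for isalpha(): a letter or an underscore. */
static int isWordChar(char c)
{
    return isalpha((unsigned char)c) || c == '_';
}

/* Copy the next word found at or after *pos into out (NUL terminated,
   truncated to outSize - 1 characters) and advance *pos past it.
   Returns 1 if a word was found, 0 when the string is exhausted. */
int nextWord(const char **pos, char *out, size_t outSize)
{
    const char *p = *pos;
    size_t n = 0;

    if (outSize == 0) return 0;

    while (*p && !isWordChar(*p)) p++;        /* skip separators */
    if (!*p) { *pos = p; return 0; }

    while (*p && isWordChar(*p))
    {
        if (n + 1 < outSize) out[n++] = *p;   /* drop characters that do not fit */
        p++;
    }
    out[n] = '\0';
    *pos = p;
    return 1;
}

/* Possible use with the buffer from the question:
 *
 *     const char *p = wholeProgramStr2;
 *     while (nextWord(&p, wordToCheck, sizeof wordToCheck))
 *         checkWord(wordToCheck);    // checkWord is your own function
 */
```

Because the buffer is overwritten on every call, each word is effectively discarded once it has been checked.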
