Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I am trying to write a program which will count the occurrence of words in a paragraph.

The logic I am following : I am using a linked list for the purpose. And I am searching sequentially - if new word encountered adding the word in the list, but if word already exist in the list increase its count flag.

Code:

    //case insensitive string matching
int strcicmp(char const *a, char const *b)
{
    int d;
    for(;;a++,b++)
    {
        d=tolower(*a)-tolower(*b);
        if(d!=0 || !*a)
            return d;
    }
}

//declare the linked list structure to store distinct words and their count
typedef struct node
{
    char *word;
    int count;
    struct node *next;
} node;

node *ptr, *newnode, *first=NULL, *last=NULL;

void insertnewword(char *ch)
{
    newnode=(node*)malloc(sizeof(node));
    if(newnode == NULL)
    {
        printf("\nMemory is not allocated\n");
        exit(0);
    }
    else
    {
        newnode->word=ch;
        newnode->count=1;
        newnode->next=NULL;
    }           

    if(first==last && last==NULL)
    {
        first=last=newnode;
        first->next=NULL;
        last->next=NULL;
    }   
    else
    {
        last->next=newnode;     
        last=newnode;
        last->next=NULL;            
    }   
}

void processword(char *ch)
{   
    int found=0;
    //if word is already in the list, increase the count
    for(ptr=first;ptr!=NULL;ptr=ptr->next)          
        if(strcicmp(ptr->word, ch) == 0)
        {
            ptr->count += 1;
            found=1;
            break;
        }

    //if it's a new word, add the word to the list
    if(!found)
        insertnewword(ch);  
}

int main()
{
    const char *delimiters=" ~`!@#$%^&*()_-+={[}]:;<,>.?/|\\\'\"\t\n\r";
    char *ch, *str; 
    str=(char*)malloc(sizeof(char));
    ch=(char*)malloc(sizeof(char));

    //get the original string
    scanf("%[^\n]%*c", str);
    //fgets(str, 500, stdin);

    //get the tokenized string
    ch=strtok(str,delimiters);
    while(ch!=NULL)
    {
        //a, an, the should not be counted
        if(strcicmp("a", ch)!=0 && strcicmp("an", ch)!=0 && strcicmp("the", ch)!=0)
            processword(ch);        

        ch=strtok(NULL,delimiters);
    }

    //print the word and it's occurrence count
    for(ptr=first; ptr!=NULL; ptr=ptr->next)
        printf("%s\t\t%d\n",ptr->word,ptr->count);
    return 0;
}

this seem to be working fine for few number of words, but if word count is more than 6-7, this program is encountering some problem.

I can always implement any other logic for the same problem, but I want to know the issue with this implementation.

Thanks in advance

share|improve this question

closed as off-topic by Emily L., BenVlodgi, janos, Kid Diamond, 200_success Sep 2 '14 at 8:52

This question appears to be off-topic. The users who voted to close gave this specific reason:

  • "Questions containing broken code or asking for advice about code not yet written are off-topic, as the code is not ready for review. Such questions may be suitable for Stack Overflow or Programmers. After the question has been edited to contain working code, we will consider reopening it." – Emily L., BenVlodgi, janos, Kid Diamond, 200_success
If this question can be reworded to fit the rules in the help center, please edit the question.

1  
Hello and welcome to CodeReview! We review and comment on working code that you have written or maintain. Your question contains broken code and that is unfortunately off-topic for this SX site. Please try one of the other SX sites for help such as StackOverflow or Programmers. –  Emily L. Sep 2 '14 at 8:07
    
Ohh ok, thanx for letting me know :) –  Diptarag Sep 2 '14 at 8:21
2  
@Diptarag: Once you fix the bug I point out in my answer (strlen) then code should work as expected and can be re-opened. –  Loki Astari Sep 2 '14 at 14:35

1 Answer 1

up vote 1 down vote accepted

The well hidden bug.

This allocates strings of length 1.

str=(char*)malloc(sizeof(char));
ch=(char*)malloc(sizeof(char));

This is enough to hold the terminating '\0' but nothing else. You probably want to hold a longer string try.

str = (char*)malloc(sizeof(char) * 500 );
ch  = (char*)malloc(sizeof(char) * 500 );

You should also check to see if they actually worked and returned a good value. One of the major problems with C code is forgetting to check the result of a call to make sure it actually worked.

Scanning. This fails if the input is an empty line. (it will not remove the '\n' on empty lines.

scanf("%[^\n]%*c", str);

Also you should probably put the maximum size of the string encoded into the format specifier.

scanf("%500[^\n]", str);
scanf("%*c");            // Now read and throw away the newline character.
                         // Thus you strip the newline even if the read fails.
                         // PS you may want to check the read worked.

Code Review:

Avoid global variables.

node *ptr, *newnode, *first=NULL, *last=NULL;

There is no need for any of these variables. The variables ptr and newnode should be local varaibles declared inside functions. The variables first and last describe a list. It would be best if you wrapped these up into a structure (as they represent a single entity (a list)). Your code is currently written to support a single list. If you pass this as a parameter to your functions you can support multiple lists in your application (it also makes your code much easier to read.

typedef struct List
{
     node*  first;
     node*  last;
} List;

int main()
{
    List   wordList = { NULL, NULL};
    // STUFF
                //  pass word list as a parameter.
       processword(&wordList, ch); 
}

Indide:

void insertnewword(char *ch)

As mentioned above newnode should be a local function variable (not a global) and you should pass the list you are adding the node onto as the first parameter so that you don't need to keep first and last as global variables.

In this condition we have already asserted that first == last

    first->next=NULL;
    last->next=NULL;

So this double assignment of NULL is redundant. Also When you create the node you have already set it up to be NULL. So there is no point in setting it to NULL again.

On the other side of the else you have the same issue.

    last->next=NULL;            

At this point last points at the new node. Which had its next value set to NULL during construction.

Another trick you could look up is using a sentinel node. Basically you always put one fake node into the list. Then the code for handling insert/delete become much easier to write because you don't need to check if the list is NULL (because there is always at least one node).

typedef struct List
{
     node   sentinel;
     node*  first;
     node*  last;
} List;

void initList(List* list)
{
    list->sentinel = {NULL, -1, NULL};

    list->first = &list->sentinel;
    list->last  = list->first;
}
void destroyList(List* list)
{
    node* loop;
    node* next;
    for(loop = list->head->next;loop != NULL; loop = next)
    {
         next = loop=>next;
         free(loop);
    }
}
void insertnewword(List* list, char* ch)
{
    node*  newnode = (node*)malloc(sizeof(node));
    if (newnode == NULL)
    {  /* ERROR */ exit(0);
    }
    newnode->word  = ch;
    newnode->count = 1;
    newnode->next  = null;

    list->last->next = newnode;
    last->last       = newnode;
}

In:

void processword(char *ch)

As mentioned above ptr should be a local function variable (not a global) and you should pass the list you are adding the node onto as the first parameter so that you don't need to keep first as global variables.

This is not wrong.

for(ptr=first;ptr!=NULL;ptr=ptr->next)          
    if(strcicmp(ptr->word, ch) == 0)
    {}

But I would always put the {} braces around sub statement groups. It becomes too easy for people to accidentally add more code to the for loop without thinking about it and suddenly what you expect to be in the loop is not actually part of the loop. Also it keeps your style consistent if you always have the {} rather than only sometimes having them.

Same comment about {} here.

if(!found)
    insertnewword(ch);  

A lot of poeple use the uneven style.

if(!found) {
    insertnewword(ch);
}

This saves some space. Personally I still use the even style.

if(!found)
{
    insertnewword(ch);
}

But to be honest I am starting to like the uneven style and may change. But for now the matching open and close above each other appeal to me more.

share|improve this answer
    
Thanks a lot for your insight! Learnt many things. Wish I could give you more than one upvote. Really appreciate your effort. –  Diptarag Sep 3 '14 at 10:21

Not the answer you're looking for? Browse other questions tagged or ask your own question.