Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I want to extract some values in i-commas(") from lines like this:

<P k="9,0,1" vt="191" v="100.99936" z="" />

Example:

getCharVal ( cp, char "k=\"", 9)

where cp is a pointer the line above should return "9,0,1"

The function:

#define  ENDTAG   "/>"
#define ICOMMAS   '"'
char * getCharVal ( char *cp, char *att, size_t s)
{
  char * val;
  char * endTagP;
  if (cp == NULL)return NULL;
  cp = strstr(cp, att)+strlen(att);
  if (cp == NULL) return NULL;
  char * endP = strchr(cp, ICOMMAS);
  if (endP == NULL) return NULL;
  endTagP = strstr(cp, ENDTAG);
  if (endTagP == NULL) return NULL;
  if (endP > endTagP) return NULL;
  size_t valsize    = endP - cp ;
  if (valsize > s) return NULL;
  val = malloc(valsize + 1);
  memcpy (val, cp, valsize);
  val[valsize]='\0';
  cp = endTagP;
  return val;

}

This code is ugly. Could anyone give me hints on how to write it better?

share|improve this question
    
Use a regular expression! (seriously, don't!) –  Mat's Mug Dec 19 '13 at 3:04

3 Answers 3

up vote 4 down vote accepted

I've modified your code to make it more normal C99, including adding const to parameters that your function does not change, improving (to my taste) the variable names, moving variable definition to the point of their first use and using strndup to duplicate the tag (not universally available but easily written).

char *getCharVal(const char *ch, const char *att, size_t size)
{
    if (!ch) {
        return NULL;
    }
    ch = strstr(ch, att);
    if (!ch) {
        return NULL;
    }
    ch += strlen(att);

    char *end = strchr(ch, '"');
    if (!end) {
        return NULL;
    }
    char *endTag = strstr(ch, ENDTAG);
    if (!endTag) {
        return NULL;
    }
    if (end > endTag) {
        return NULL;
    }
    size_t valSize = end - ch;
    if (valSize > size) {
        return NULL;
    }
    return strndup(ch, valSize);
}

Note also that your

cp = strstr(cp, att)+strlen(att);
if (cp == NULL) return NULL;

will never fail because even if att is not found you have added its length to the NULL pointer that strstr would return.

A beneficial change would be to omit the check for size (although perhaps your application is such that this check is more necessary than it appears).

Finally, I think your interface is ugly. It would be much cleaner to pass the the pure attribute "k". However, I can see that passing "k=\"" is an optimisation that makes the function much simpler to implement.

share|improve this answer
    
Didn't notice: cp = strstr(cp, att)+strlen(att);. Goo eye. –  Fiddling Bits Dec 19 '13 at 4:07
    
Do you also recommend the change it and use standard xml libraries? –  MaMu Dec 19 '13 at 9:16
    
Size is checked because it is stream and sometimes comes something wrong. –  MaMu Dec 19 '13 at 9:25
1  
Depends. If if you only ever want to extract "k" from your stream then using your own function might be sensible. But if you need to extract more and/or your XML can be badly formed a library is likely to make the job much easier. –  William Morris Dec 19 '13 at 11:52

Work on the algorithm first before you think about error checking. You know you'll input information correctly. The basic algorithm follows:

char *getCharVal(char *cp, char *att, size_t s)
{
    char *val, *pChStart, *pChEnd;
    size_t valLen;

    pChStart = strstr(cp, att);      // Find value substring in source string
    pChStart += strlen(att);         // Go to beginning of value substring
    pChEnd = strstr(pChStart, "\""); // Find end of value substring
    valLen = pChEnd - pChStart;      // Calculate length of value substring

    val = malloc(valLen * sizeof(char) + 1); // Allocate memory for value substring
    strncpy(val, pChStart, valLen);          // Copy value in source string to value substring copy
    val[valLen] = '\0';                      // NULL terminate value substring copy

    return val;
}

Once you get the algorithm down, you can add error checking. You know you'll be checking for NULL a lot, so instead of using a lot of if statements in the source code, which adds clutter and makes things less readable, add a macro to "hide" it:

#define CHECK_NULL(string) (if((string) == NULL) return NULL)

With this, you can check for NULLs in a less obtrusive way:

 CHECK_NULL(pChStart == strstr(cp, att));

Your code with the CHECK_NULL macro is much more readable:

char * getCharVal ( char *cp, char *att, size_t s)
{
    char * val;
    char * endTagP;
    char * endP;

    CHECK_NULL(cp);
    CHECK_NULL(strstr(cp, att)+strlen(att));
    CHECK_NULL(endP = strchr(cp, ICOMMAS));
    CHECK_NULL(strstr(cp, ENDTAG));

    size_t valsize = endP - cp ;
    if (valsize > s) return NULL;

    CHECK_NULL(val = malloc(valsize + 1));
    memcpy (val, cp, valsize);
    val[valsize]='\0';
    cp = endTagP;

    return val;
}
share|improve this answer
1  
Note that sizeof(char) is 1 by definition, malloc should not be cast, the OP is correct in using memcpy instead of your strncpy (he knows the length) and that multiple vars are usually best not declared on the same line but are often better defined at the point of first use. Encouraging the use of macros is also not something I would do and encouraging their use to hide error handling is arguably wrong. –  William Morris Dec 19 '13 at 1:34
1  
@WilliamMorris OP tagged this C, so variables at top is convention. Using old compiler... I forgot to take the cast that it requires out. Chars are not always one byte. The macro is simple and it's obvious what it does. –  Fiddling Bits Dec 19 '13 at 1:41
1  
sizeof(char) is 1. By definition. The macro call is only "obvious" here because the definition is right above it. In normal use it will not be, and so without looking up the definition there is no way to know what happens if its "check" turns out false. And what happens when you have another use case where the return needs to be -1 instead of NULL? Another macro with a similar name? Use of macros is discouraged for good reasons. –  William Morris Dec 19 '13 at 2:21
1  
@WilliamMorris stackoverflow.com/questions/20684056/… You are correct sir. My apologies. –  Fiddling Bits Dec 19 '13 at 14:54
1  
@Fiddling Bits i really like your solution. I really didn't know to whom I should give "accepted". I decided for Morris because of his further explanation. Anyhow big thanks and I'll use some of you ideas too. –  MaMu Dec 20 '13 at 10:14

The easiest code to maintain is code that doesn't exist. For parsing XML, use an XML-parsing library, preferably with XPath support. Sometimes it might be justified to whip up your own code to extract values from XML, but it clearly does not make sense in this case. Not only is the code ugly by your own admission, low-level string- and memory-manipulation also distracts you from thinking about the big picture. Furthermore, your code will be less robust than a proper XML parser, and it will fail if the text is encoded in a semantically equivalent variant (for example, if a value is escaped).

share|improve this answer
    
Definitely use a tried and true library, but as a exercise or for fun, try to write these yourself. You'll become a better programmer for the effort. –  Fiddling Bits Dec 19 '13 at 1:11

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.