I've built a C decoder program. The length of an encoded message will be given and the message itself in the following line. All the characters of the message will be in uppercase letters. The task is to print out the decoded message.

Encoding: A->B, B->C...Y->Z, Z->A (well, you get the idea)

Here's what I did:

#include <stdio.h>


main()
{
    int size;
    scanf("%d", &size);

    char str[size];
    scanf("%s", str);

    for (int i=0; i<size; i++)
    {
        if (str[i]!='Z')
        {
            str[i] -= 1;
        }

        else
        {
            str[i] -= 25;
        }

        printf("%c", str[i]);
    }

    return 0;
}

Is this efficient enough? Could this be done in a simpler and more efficient way? I'm a beginner in C, so it would be really nice if you could help me out. I'm only 13, by the way.

Also, should I use gets() instead of scanf() to take the string as input?

Note that some of the things in my code are not valid C (e.g. not using int() before main(), variable declaration within loop). But, my compiler ignores them. I'm a lazy person; I like to type less. I'm well-aware of the fact, nonetheless.

share|improve this question
    
This will work with contiguously defined character sets for A to Z only and it's guaranteed that 'Z' - 'A' is 25. – πάντα ῥεῖ 20 hours ago
    
@πάνταῥεῖ Comments are for seeking clarification. Please put all critiques in answers. – 200_success 20 hours ago
    
@200_success So far I couldn't make that code working at all. So I didn't answer yet. – πάντα ῥεῖ 20 hours ago
1  
@πάνταῥεῖ The code does work in C99. – 200_success 17 hours ago
1  
None of the code in the question is invalid in C — there is no need to modify the code. – Jonathan Leffler 16 hours ago
up vote 34 down vote accepted

That's not a bad program, despite the number of suggestions I'm about to make. I do recognize you're a learner, and a young learner at that.

There are several levels at which we can analyze this program. One is checking the current implementation. Another is considering whether it is portable. We can question whether the interface is good — should humans be made to count? And there are alternative algorithms that could be used.

Some of the points I'm about to make are related to style. Although you can write programs in C in all sorts of ways, some of them are easier to read than others. What you show is not bad, but it isn't wholly consistent (and consistency is very important — and hard!).

Analyzing the current implementation — line by line.

#include<stdio.h>

Generally, put a space between #include and the header name, whether a standard header like <stdio.h> or your own header like "caesar.h".

main()

You should always specify the return type of every function, including main(). You're using either C99 or C11, so you are required to specify the return type. (The oldest version of the standard, C90, wasn't quite so fussy.). Further, the return type of main() should be int, though if you work with Microsoft compilers, they allow void too. The standard is quite clear; int is expected. It's often a good idea to say explicitly 'no command line arguments' by writing int main(void), but in practice, int main() also works fine almost all the time. (You have to be doing weird things for it to matter.)

{
    int size;
    scanf("%d", &size);

One of the painful lessons you'll learn coding C is that a lot of your effort is spent error checking. In particular, it is important to check input functions because they're one of the places where things can go wrong and have bad consequences on the rest of the program. The scanf() function returns the number of successful conversions; it can also return 0 indicating no successful conversion, and EOF indicating that there wasn't any data to read. You should also check that the input value is plausible: negative numbers, zero, even one are dubious, and so are huge numbers (e.g. more than 1024).

    char str[size];

This is a VLA — variable length array. They're very useful and have been a part of C since the C99 standard (though technically they're an optional feature in C11 — they were mandatory in C99). You've allocated enough space for size characters, but strings are terminated with a null byte, '\0', and you need to allow space for that. You should probably use char str[size+1];.

    scanf("%s", str);

Again, you should check that you got some data read. Note that with "%s", scanf() first skips any white space (blanks, tabs, newlines) and then reads one word — a sequence of non-blanks. You've not limited the size of the input. If the user said 10 characters but typed 20 characters before the newline or first blank or tab, you'd have problems. If the array size was fixed at, say, 1024 bytes in total, you could use "%1023s" to limit the input to 1023 non-blanks and the terminating null byte. With a variable length array, it's harder. (This is a common oversight in programs, even those by people with lots of experience.)

    for (int i=0; i<size; i++)

This loop is fine, except that many people prefer a bit more space around the operators. What you've written is self-consistent; that's good.

    {
        if (str[i]!='Z'){
            str[i] -= 1;
        }

You're assuming that the user is obedient and does exactly what you want. Unfortunately, users are seldom obedient and rarely do exactly what you want. If the user types 'abracadabra' instead of 'ABRACADABRA' as you expect, or if they type '@Wonderful2CU', you are not going to get the result you expect. You can handle this in various ways. The simple one, which you've currently chosen, is to ignore the issue — it is sometimes termed GIGO: Garbage In, Garbage Out. You could decide that you'll convert lower-case letters to upper-case and then decode them. You could decide to not touch non-letters. You could decide to complain about non-letters. On the whole, it would probably be best to handle lower-case like upper-case and not to change non-letters, but you might decide differently.

Many people would put a space before the {. Others (myself included) would put the { on the next line, like you did after the else below. Here is a case of inconsistency. Either you should use 1TBS (One True Brace Style), which is more or less what you're using in the if, or you should use Allman style, which is more or less what you're using in the else. See Wikipedia on Indent Style for more information (and many more styles).

        else
        {
            str[i] -= 25;
        }

You could simply translate 'Z' to 'A', rather than subtracting a magic number — str[i] = 'A'; would work well.

        printf("%c", str[i]);
    }

These lines are fine. However, it is good practice to end lines of output with a newline. You could sensibly add putchar('\n'); to add a newline.

    return 0;
}

I like to see the explicit return 0; at the end of main(); there are others who aren't convinced. With C99 and later, if you omit the explicit return 0;, then if the main() function — and only the main() function; it doesn't apply to any other functions — 'falls off the end' without any return, it is equivalent to doing return 0;.

Oops!

One other problem, pointed out by NowIGetToLearnWhatAHeadIs in comments is that your decoding steps are not an exact reverse of your encoding steps.

Original: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Encoded:  BCDEFGHIJKLMNOPQRSTUVWXYZA
Decoded:  ABCDEFGHIJKLMNOPQRSTUVWXA@

Minor cheating here; there was also a stray character after the @ because of the issues with string length.

Clearly, the decoded information is not the same as the original. Instead of Z being special, it is A that is special; we need to subtract 1 from everything else and either add 25 or map to Z when the letter is A.

Revising the current implementation

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    int size;
    if (scanf("%d", &size) != 1)
    {
        fprintf(stderr, "Failed to read an integer\n");
        return 1;
    }
    if (size < 2 || size > 1024)
    {
        fprintf(stderr, "Size %d is not in the range 2..1024\n", size);
        return 1;
    }

    char str[size + 1];
    char fmt[10];
    snprintf(fmt, sizeof(fmt), "%%%ds", size);

    if (scanf(fmt, str) != 1)
    {
        fprintf(stderr, "Failed to read a string\n");
        return 1;
    }

    for (int i = 0; i < size; i++)
    {
        if (toupper((unsigned char)str[i]) == 'A')
        {
            str[i] = 'Z';
        }
        else if (isalpha((unsigned char)str[i]))
        {
            str[i] -= 1;
        }
        printf("%c", str[i]);
    }
    putchar('\n');

    return 0;
}

I used #include <ctype.h> to provide macros isalpha() and toupper(). One mildly irksome problem is that plain char can be a signed or unsigned type, and the macros from <ctype.h> expect an unsigned char converted to an int. Putting the (unsigned char) cast into those calls protects you from the sadistic user who types à or ÿ into your program. (There is more to handling accented characters than that, but it is an adequate safety precaution for now.)

Note that this 'decodes' Aa to ZZ; that can be fixed if it matters to you.

I also use snprintf() to create an appropriate format string to ensure that the buffer won't be overflowed. If the size given is 100, it generates %100s (which is the correct size to use since the array is 101 characters long — the difference by one is a nuisance).

Portability

You can skip this section if you wish — it matters to people like me who have to make the software they write work on lots of different machines. If you only have one machine type to work with, you can largely ignore it.

The original code used -= 25 and -= 1 to map the characters. As was noted in the comments, this assumes that the character code for Z is 25 larger than the character code for A. Now, in practice, that is valid for most of the character sets used in the world — and, in particular, it is accurate for Unicode. However, there are machines — IBM mainframes for example — where the code set used is called EBCDIC, where the code for A is 193, for I is 201, for J is 209, for R is 217, for S is 226, and for Z is 233. There are gaps in those ranges to cause confusion.

If you decided to worry about it, you should devise an alternative way of mapping the characters.

At some point, you'll note that the original Caesar cipher used a shift of 3, not 1, and you'd upgrade your code to handle any shift between 1 and 25 (there's not much point in shifting by 0 or 26; nothing changes). That complicates the checking for wraparound at the ends of the alphabet. Again, that isn't an immediate problem.

Interface design

You currently require the human to know how long the message is. I don't know about you, but I know I don't know how many characters are in the first sentence of this paragraph. Sure, I could count, but computers are much better at counting. You can work around this problem in various ways. One simple one is to provide a fixed size but large buffer (e.g. 1024 bytes) and simply read the data into that. You would probably use the fgets() function for the job instead of scanf(). Then you might read more than one line of input in a run of the program, too, using a loop. The revised program will look a bit simpler:

#include <stdio.h>
#include <ctype.h>
#include <string.h>

int main(void)
{
    char str[1024];

    while (fgets(str, sizeof(str), stdin) != 0)
    {
        int size = strlen(str);    
        for (int i = 0; i < size; i++)
        {
            if (toupper((unsigned char)str[i]) == 'A')
            {
                str[i] = 'Z';
            }
            else if (isalpha((unsigned char)str[i]))
            {
                str[i] -= 1;
            }
            printf("%c", str[i]);
        }
    }

    return 0;
}

This simply reads standard input until it encounters EOF. You might run your program as ./decode-caesar < encoded.txt, or you might type your encoded message by hand, in which case you'd indicate EOF by typing a control character at the start of a line — it would be Control-D on Unix systems or Control-Z on Windows systems (unless you've changed the default setting for EOF on Unix).

Note that fgets() preserves newlines and the revised algorithm doesn't change newlines so there's no need to print an extra newline at the end of the output.

Alternative designs

The programs so far have read a whole word or line at a time, and then processed each character in turn. Another way of doing it would be to simply read and process each character in turn. This is simpler still:

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    int c;

    while ((c = getchar()) != EOF)
    {
        if (toupper(c) == 'A')
            c = 'Z';
        else if (isalpha(c))
            c -= 1;
        putchar(c);
    }

    return 0;
}

Note that I used int c; — that's because getchar() returns an int, not just a char. It has to return every possible char value and a separate EOF value, which means that it can't return just a char; it must return an int. A side-effect of that is that all characters in c will be in the range 0..255 (positive) because getchar() returns the character value converted to unsigned char. This means that it is safe to drop the casts in the calls to toupper and isalpha.

I've opted not to use braces around the single-line actions after the if and else clause. This is another style issue. There are those who argue that you should always use braces after if or else (and there are languages such as Perl that insist on braces there), partly on the grounds that if you ever add another statement to the else clause, you might forget to add the necessary braces. I'm not convinced by that argument, but maybe some programmers are careless enough that 'always use braces' really helps prevent bugs.

Wrapping up

There are lots of little details that could be explained and discussed. There are still caveats that could be discussed. However, they're minor enough that you don't need to worry about them at this stage, I think.

You asked about efficiency. The code you wrote is reasonably efficient. There are no gross inefficiencies in it. It is clean, and that is often more important than efficient. Before worrying about efficiency, measure whether there is a performance problem. With this code, there won't be a performance problem — none of the programs is inefficient — for the plausible sizes of input.

If you wanted to be more efficient, you'd print the whole word or line in one operation, rather than using printf("%c", str[i]) to print each character one at a time. You could replace printf("%c", str[i]) with putchar(str[i]) and it would be 'more efficient', but whether you'd be able to measure it is more debatable.

Michael Jackson (no, not the pop-star — but maybe he's before your time) had two rules for optimization (improving the efficiency of code):

  • First rule of optimization: Don't do it.
  • Second rule of optimization (for experts only): Don't do it yet.

Testing

Oh, and it is important to test code. I didn't run your code (or my code) on any data until @NowIGetToLearnWhatAHeadIs pointed out the error in the decoding. That was lazy of me — copying your mistake into my code. Testing is very important!

Fixing the mapping of 'a' to 'Z'

I've decided I'm not keen on the sloppy way my code maps 'a' to 'Z'. It is easy enough to fix. This variant of the third program handles it. It contains an assertion to ensure that the code is not used successfully on an EBCDIC machine.

#include <assert.h>
#include <ctype.h>
#include <stdio.h>

int main(void)
{
    int c;

    assert('Z' - 'A' == 25 && 'z' - 'a' == 25);

    while ((c = getchar()) != EOF)
    {
        if (toupper(c) == 'A')
            c += 25;
        else if (isalpha(c))
            c -= 1;
        putchar(c);
    }

    return 0;
}

Sample input:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
09@?

Sample output:

ZABCDEFGHIJKLMNOPQRSTUVWXY
zabcdefghijklmnopqrstuvwxy
09@?

That's cleaner!

Using C11, you could use a 'static assert' instead of a run-time assertion. This would prevent the program compiling:

static_assert('Z' - 'A' == 25 && 'z' - 'a' == 25,
              "Alphabet should be contiguous but isn't");
share|improve this answer
    
Your implementations decode "BZ" to "AA". For example, ideone.com/Oh93oE. This seems wrong. – NowIGetToLearnWhatAHeadIs 13 hours ago
    
@NowIGetToLearnWhatAHeadIs : the decoding is supposed to be the same as in the question. I'll test compile and check, but I think it would be the same with the original code. Thanks for the heads up. – Jonathan Leffler 13 hours ago
    
I think that is an error in Soha Farhin Pine's code though. I think the decoding algorithm is supposed to invert the encoding algorithm described in the question. Specifically, I think B is supposed to decode to A, A is supposed to decode to Z and Z is supposed to decode to Y (since A encodes to B, Y encodes to Z, and Z encodes to A). – NowIGetToLearnWhatAHeadIs 13 hours ago
    
@NowIGetToLearnWhatAHeadIs: You're right — and I'm only marginally relieved that the decoding error was because I copied the wrong algorithm from the question. My bad for not actually testing the code. All three programs now produce the correct A..Z from B..ZA as input. Thanks for spotting that. I should have been worried about subtract 25 and subtract 1; that isn't quite right — but my senses were dulled by … oh, whatever (it is New Year's Day still). – Jonathan Leffler 13 hours ago
    
@JonathanLeffler Really, really helpful! Thank you a gazillion times! Some of the things you pointed out (e.g. the formatting, or invalid C) were either typos or written because my compiler ignores them. I've not yet finished---nearly at the end---and I have to say, this is a fabulous answer! It must've taken you a lot of time and effort to type all this. I'm really grateful to you for your help. – Soha Farhin Pine 5 hours ago

This reminds me of some old times---doing programming practice and contest with my school, colleagues.

Back to the answer, first of all, main() must return an int value. Some compilers will also accept void.

int main()

Then you cannot use a variable size array, since it must be know in compile time. In your case, you should do:

#include <memory>

....
//char str[size];
char* str = malloc(size + 1);  // also make sure size > 0, and +1 for NULL terminates a string

Inside the for loop, if you are sure all input characters range from A to Z...

str[i] = (str[i] - 'A' + 25) % 26 + 'A';

In the end, don't forget to free the string.

free(str);

I did not try compiling all this, so there might be any error.

share|improve this answer
    
The malloc() doesn't seem to fix the issue: ideone.com/mD5wVW – πάντα ῥεῖ 18 hours ago
2  
How archaic a version of C do you use if variable length arrays are not available to you? – Jonathan Leffler 18 hours ago
2  
[…continuation…] . I'm not convinced that a malloc() solution is the best alternative — I'd go with a fixed large-size (1024 or even 4096) array rather than go with malloc(), especially when the OP is a self-identified youngster. Yes, they'll learn malloc() soon enough (next week probably, if not this week), but it complicates things a bit unnecessarily, especially if you error check the allocation as you should. – Jonathan Leffler 16 hours ago
1  
My fault then, I did not learn C99 before switch to C++, then working in CPP and C mix-up environment, mostly in windows. BTW, to use big size array in stack is not a good idea. – Tiger Hwang 15 hours ago
2  
An array of 1 KiB is not a big stack allocation on any desktop machine these days. If you're working in an embedded environment, or perhaps if you're working with threads, some caution is required, but for single-threaded programs, even Windows provides 1 MiB of stack space by default; it is usually 8 MiB on Unix-based systems. Using just 1 KiB of space is not a serious memory stressor on either of those. – Jonathan Leffler 13 hours ago

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.