Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

Assume the built-in function Ascii(c), where c is a one character string, will return an integer containing the ASCII value of that character. Write a function that accepts a string, uses Ascii(c) and additional processing to produce a one way hash value from the string, and returns the hash value as a string. You may assume the initial string is not empty and contains several characters.

//one way hash using ascii
static string hashViaAscii(String S)
{
    int hashVal = 0;     

    for (int k = 0; k < S.Length; k++)
        hashVal += Ascii(S[k]);  //gets Ascii value, add them all

    return ((hashVal % HASH_TABLE_SIZE).ToString());  //return sum % table size as string
}

I don't think is a good approach. How can this be improved?

share|improve this question
2  
I just rolled back that edit because it made the question off-topic for the site. You need code for a code review. Plus since you already have answers, you shouldn't be substantively changing the question – Ben Aaronson yesterday
2  
It's trivial to construct an input for any given output. So this is not a one-way function. – CodesInChaos yesterday

I'm confused by the phrase “one way hash value” and so is probably whoever specified this requirement. In cryptography, a one-way hash function has a well-defined meaning.

Let \$\Sigma\$ be an alphabet, for example, \$\Sigma=\{\mathtt{0}, \mathtt{1}\}\$. A family of functions \$(f_k)_{k\in\mathbb{N}}\$ with $$f_k: \Sigma^k \times \Sigma^* \to \Sigma^k$$ is a family of one-way hash functions if and only if for each \$k\in\mathbb{N}\$, the \$f_k\$ is computable by a polynomial-time algorithm and for every probabilistic polynomial-time algorithm \$\mathcal{A}\$, given random \$K\in\Sigma^k\$ and \$h\in\Sigma^k\$, the output \$m^*=\mathcal{A}(K,h)\$ will satisfy \$f_k(K,m^*)=h\$ only with probability negligible in \$k\$.

\$K\$ is called key or salt. It should be chosen randomly but need not be kept secret.

It is unknown whether such functions exist at all but there are some functions for which it hasn't been proved yet that they are not one-way functions.

If your function should compute such a cryptographic hash, I recommend that you look into one of the existing cryptographic hash function even though I doubt that you're supposed to implement one of these. (Although returning the result as a string would be meaningful in this context because for \$k>64\$, you usually cannot return the hash as an integer.) MD5 and SHA1 are two classical examples of cryptographic hash functions that are reasonably straight-forward to implement. The collision resistance (which is a stronger requirement than the one-way property) of both of them is known to be broken in theory. (Both are also only specified for fixed \$k\$ which means that the above definition doesn't really apply.)

Since I see the word “hash table” in your code, I suspect that instead of a cryptographic hash function, they wanted to see a “simple” hash function that is fast to compute and has low probability of collisions but is not collision-resistant against malicious attackers. Again, have a look at algorithms other people already invented.

  • The Fowler-Noll-Vo hash function gives good hashes and is easy to implement. It can also be parametrized on the length of the result. I would recommend the FNV-1a variant as your go-to hash function unless you have reasons to chose differently.
  • Jenkins' one-at-a-time hash function is equally easy to implement and will probably be faster because it only uses additions and bit-wise operations as opposed to multiplications.
  • Weinberger's PJW hash function is known for its use in the UNIX ELF format and for being mentioned in the Red Dragon Book.
  • The lose-lose hash function is the one you've actually implemented. It's name should give you a hint. To your rescue, it was once used as an example by Kernighan and Ritchie.

There are numerous other hash functions and the literature / internet is full of comparisons between them. As you can imagine, while there are clearly better and less good functions available, there is also a lot of personal preference involved in deciding which function to use.

Even if you're planning to use the hash as index for a hash table, don't take the hash modulo the table size in the hash function itself. This will needlessly limit its usefulness. Whoever uses the function should be responsible for wrapping the computed hash into the table size.

share|improve this answer
    
Wow thanks a lot – david yesterday
3  
While the collision resistance of MD5 and SHA1 is broken, they're still pre-image resistant and thus one-way functions. – CodesInChaos yesterday
    
@CodesInChaos Alright, updated the answer. – 5gon12eder yesterday

How can this be improved?

  • The statement of the problem says that Ascii takes a "one character string". You give it a character, not a one-character string.

  • The whole notion of the function Ascii is dumb. A character in the ASCII range is already a number representing its ASCII code. The function is completely unnecessary.

  • The statement of the problem says that the function returns a string. You do that, but boy, is that dumb. Why on earth would you return a number as a string when you could return it as a number???

  • It's not the 1970's anymore. Stop thinking that strings contain ASCII. They contain Unicode.

  • "You may assume the initial string is not empty and contains several characters." Whoever posed the problem forgot that strings can be null.

  • I am thinking based on the incredible badness of the way the question is phrased, that whoever is posing you this question may not be an expert on C#. Consider taking a course from someone who is an expert on the subject matter.

  • Who said anything about a hash table size in the statement of the problem? This comes out of nowhere.

  • Good hash functions produce different values when given inputs are the same characters in a different order. Your hash function produces the same output for "12345", "54321", "51234", and so on. I made that mistake in a hash function once; the function was used to hash zip codes, and so of course most of them hashed to the same buckets. That was bad.

  • C# uses PascalCase for function names and magic constants. SHOUTY_SNAKE_CASE is for C programmers and Java programmers.

  • C# uses lowercase for formal parameters.

share|improve this answer
    
Well I do come from C background. Question says that Ascii function returns integer, but the hash function returns a value as string. How would I modify it to have a good hash function? – david yesterday
2  
Some interesting points raised, but why phrase them in such a combative, insulting way? "You do that, but boy, is that dumb" ... "incredible badness of the way the question is phrased" -- this kind of stuff is what drives a lot of clever, decent people away from the entire field. – Desty yesterday
    
@Desty: The person asking the question lacks the experience to know when they are getting a substandard education from inexpert teachers who pose poor questions. Bluntly stating that is a kindness. – Eric Lippert yesterday
1  
@EricLippert While that is as true as it gets, I disagree that they couldn't have been phrased in a less insulting way while keeping the net effect the same. You could use something less raw than "badness" and losing the extra punctuation. – Tamoghna Chowdhury 22 hours ago
1  
@TamoghnaChowdhury: I encourage you to read paulgraham.com/disagree.html and spend your time above level DH2. Do you have a substantive critique, or are you just policing tone? – Eric Lippert 21 hours ago

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.