5
\$\begingroup\$

I made this project in my free time to try to crack a 4x hashed md5 with a couple of hints but I'm looking for ways I can optimize this algo because it's pretty slow and I know you could probs use threading or some other ways to increase speed.

If you can even point out some minor ways to increase performance I would be eternally grateful.

I was told I could try re-using the memory for the strings but I'm not sure how to do that. I was also told to not generate the wordlist in advance and try as I go but I dont think the algorithm I used supports that?

Here's the branch to review (more info in the readme): https://github.com/TransmissionsDev/crack_yearn_md5/tree/v1

Here's the main.rs file:

#[cfg(debug_assertions)]
use num_format::{Locale, ToFormattedString};

fn hash_md5(input: String) -> String {
    format!("{:x}", md5::compute(input.into_bytes()))
}

fn main() {
    let goal_hash = "dbba1bfe930d953cabcc03d7b6ab05e";

    let length = 17;

    let alphabet = "bdeilmostu-"
        .split("")
        .map(String::from)
        .filter(|letter| !letter.is_empty())
        .collect::<Vec<String>>();

    let mut words = alphabet.clone();

    for phase in 1..length {
        let mut temp: Vec<String> = Vec::new();

        #[cfg(not(debug_assertions))]
        let loopable_words = words.iter();

        #[cfg(debug_assertions)]
        let loopable_words = words.iter().enumerate();

        for data in loopable_words {
            #[cfg(not(debug_assertions))]
            let word = data;

            #[cfg(debug_assertions)]
            let word = data.1;
            #[cfg(debug_assertions)]
            let index = data.0;

            for letter in alphabet.iter() {
                let new_word = format!("{}{}", word, letter);

                temp.push(new_word);
            }

            #[cfg(debug_assertions)]
            if index != 0 && ((index % 1000000) == 0) {
                println!(
                    "Completed phase {}/{}'s sub-phase {}/{}",
                    phase,
                    length,
                    index.to_formatted_string(&Locale::en),
                    words.len().to_formatted_string(&Locale::en),
                );
            }
        }

        words = temp;
    }

    #[cfg(debug_assertions)]
    let word_list_length = words.len();
    #[cfg(debug_assertions)]
    println!("\n\nLength of word list: {}\n\n", word_list_length);

    #[cfg(not(debug_assertions))]
    let loopable_words = words.iter();

    #[cfg(debug_assertions)]
    let loopable_words = words.iter().enumerate();

    for data in loopable_words {
        #[cfg(not(debug_assertions))]
        let word = data;

        #[cfg(debug_assertions)]
        let word = data.1;
        #[cfg(debug_assertions)]
        let attempts = data.0;

        let merged = format!(
            "{}{}",
            word, "........................................................!1"
        );

        let hash = hash_md5(hash_md5(hash_md5(hash_md5(merged.clone()))));

        #[cfg(debug_assertions)]
        println!(
            "Attempts: {}/{}, Hash: {}, Text: {}",
            attempts, word_list_length, hash, merged
        );

        if hash == goal_hash {
            println!("We found the password: {}", merged);

            return;
        }
    }
}
\$\endgroup\$
1

1 Answer 1

2
\$\begingroup\$

First, I would change the approach fundamentally: I'd consider the plaintext as a 17-digit number in "base 11" (because there are 11 "letters" in the alphabeth). Then the loop becomes a simple iteration through a range from 0 to 1117-1. On top of that, threading will be easy to add using rayon.

As you were told, we should re-use memory for strings and not generate the wordlist in advance.

To re-use memory for strings, we treat them as long-living containers and call .clear() whenever we need to reuse them for another purpose.

This part in your code

for letter in alphabet.iter() {
    let new_word = format!("{}{}", word, letter);

    temp.push(new_word);
}

is very inefficient as it's calling format!, which is complex machinery, just to append a letter. It would be better to have at least String::push there, but we will take another approach below.


On 4 cores, the all-in-one is roughly as fast as just your non-parallelized first pass. For parallelism, I did parallel fold after Range<u64>. So each thread has a little bit of its own state there.

Make sure you add rayon = "0.6" in the Cargo manifest.

Here is the code:

#[cfg(debug_assertions)]
use num_format::{Locale, ToFormattedString};
use std::str;
use std::io::Write;
use rayon::prelude::*;

static PASS_LENGTH: u32 = 17;

#[derive(Default)]
struct State {
    plaintext: Vec<u8>,
    intermediate: Vec<u8>,
}

impl State {
    fn new() -> Self {
        let mut result = State::default();
        for _ in 0 .. PASS_LENGTH {
            result.plaintext.push(b"x"[0]);
        }
        result.plaintext.extend(b"........................................................!1".iter().copied());
        result
    }
}

fn main() {
    let goal_hash = "dbba1bfe930d953cabcc03d7b6ab05e";

    let num_of_md5 = 4;
    let alphabet = "bdeilmostu-";
    let alphabet_len = alphabet.len() as u64;

    let total = alphabet_len.pow(PASS_LENGTH);

    (0 .. total).into_par_iter().fold(
        || State::new(),
        |mut state, num| {
            for i in 0 .. PASS_LENGTH {
                let letter = ((num / alphabet_len.pow(i)) % alphabet_len) as usize;
                state.plaintext[i as usize] = alphabet.as_bytes()[letter];
            }

            state.intermediate.clear();
            write!(&mut state.intermediate, "{:x}", md5::compute(&state.plaintext[..]));
            for _ in 1 .. num_of_md5 {
                let digest = md5::compute(&state.intermediate[..]);
                state.intermediate.clear();
                write!(&mut state.intermediate, "{:x}", digest);
            }
            let result = &state.intermediate[..];

            if result == goal_hash.as_bytes() {
                println!("We found the password: {}", str::from_utf8(&state.plaintext[..]).unwrap());
            }

            if num % 1_000_000 == 0 && num != 0 {
                #[cfg(debug_assertions)]
                println!(
                    "Attempts: {}/{}, Text: {}",
                    num, total, str::from_utf8(&state.plaintext[..]).unwrap()
                );
            }

            state
        }
    ).for_each(|_| {});
}
```
\$\endgroup\$
1
  • 1
    \$\begingroup\$ @TransmissionsDev sure thing. It would be interesting to improve the conversion from number to plaintext probably through smarter division or add-1 with carry.Further, I'm not sure how you'd like to use this program, but it's possible to adapt it for distributed computation or even the cloud. \$\endgroup\$ Commented Dec 12, 2020 at 20:00

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.