There have been already many similar questions to this one (e.g. this and this, just to point two), but I did not find any for clojure.
The goal is to compress a string in a RLE-like way: i.e. each character "run" is replaced by the character and the count of the run. Some examples:
aaaabbccc --> a4b2c3
b --> b1
aaaaaaaaaaaa --> a9a3
Some remarks:
- For now, I do not care about the possible optimisation that runs with the length of 1 can omit the count. (E.g. the output could be
b
instead ofb1
.) - If there are more than
n > 9
characters in a run, then it is split up into two or more runs, like9 + 9 + 9 + ... + k = n
, wherek <= 9
. - For now, I do not care if the "compressed" string will be longer than the original one (this is the case e.g. for an input containing only single characters, like:
abcde --> a1b1c1d1e1
).
That said, please see my implementation below:
Source
(defn countrepetitions [s]
(if (empty? s) 0
(loop [acts (vec s) ch (first acts) cnt 1]
(if (= ch (first(rest acts)))
(recur (rest acts) ch (inc cnt))
cnt))))
(defn mycompress [s]
(loop [acts s encoded "" cnt1 (countrepetitions acts)]
(if (= 0 cnt1)
encoded
(let [cnt2 (if (< cnt1 10) cnt1 9)]
(recur (subs acts cnt2) (str encoded (first acts) cnt2) (countrepetitions (subs acts cnt2)))))))
Tests
(deftest a-test
(testing "compressor"
(is (= "" (mycompress "")))
(is (= "a2" (mycompress "aa")))
(is (= "a1b8a1c7a3" (mycompress "abbbbbbbbacccccccaaa")))
(is (= "b9b3" (mycompress "bbbbbbbbbbbb")))))
Questions
In particular, I would be interested in the following, but any other remark/suggestion is also welcome:
- Is there a more efficient way (either in terms of readability or speed) to implement
countrepetitions
, or maybe leave it out completely? - Are there any other tests worth adding? Or can you find a "counter-example" (i.e. a test case, for which the implementation does not work correctly)?
(map (juxt first count) (partition-by identity "abbbbbbbbacccccccaaa"))
– Shlomi Jan 17 at 19:38(apply str (flatten (map (juxt first count) (apply concat (map #(partition-all 9 %) (partition-by identity s)))))))
. Definitely shorter then my original attempt :) – Attilio Jan 19 at 19:10