This program forms the reducer of a Hadoop MapReduce job. It reads tab-delimited data from stdin, such as

foo    1
foo    1
bar    1

and outputs

foo    2
bar    1

Any suggestions for improvements?

(use '[clojure.string :only [split]])
(def reducer (atom {}))

(defn update-map [map key]
  (merge-with + map {key 1}))

(doseq [line (line-seq (java.io.BufferedReader. *in*))]
  (let [k (first (split line #"\t"))]
    (swap! reducer update-map k)))

(doseq [kv @reducer]
  (println (format "%s\t%s" (first kv) (second kv))))

Probably a bit too late to help the OP, but in case anyone else stumbles upon this question, here's a nice, succinct way of doing it using the frequencies function:

(doseq [[word freq] (frequencies
                      (map
                        #(re-find #"^[^\t]+" %) ;; just get the first non-tab characters
                        (line-seq (java.io.BufferedReader. *in*))))]
  (println (str word "\t" freq)))

Why don't you use reduce instead of the first doseq? Something along these lines (untested, typed directly here):

(def response
  (reduce (fn [counts line]
            ;; the key is the text before the first tab
            (let [k (first (split line #"\t"))]
              (update-map counts k)))
          {}
          (line-seq (java.io.BufferedReader. *in*))))

(doseq [kv response]
  (println (format "%s\t%s" (first kv) (second kv))))

Then you won't need the atom either.
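
A minimal sketch of the same idea as a pure function, so it can be tried at a REPL without stdin. The name count-keys is hypothetical, and (fnil inc 0) is swapped in here for the question's update-map; update assumes Clojure 1.7 or later.

```clojure
(require '[clojure.string :as str])

(defn count-keys
  "Counts how often each key (the text before the first tab) occurs in lines."
  [lines]
  (reduce (fn [counts line]
            (let [k (first (str/split line #"\t"))]
              ;; fnil supplies 0 when the key has not been seen yet
              (update counts k (fnil inc 0))))
          {}
          lines))

;; In the reducer itself the lines would come from stdin:
;; (count-keys (line-seq (java.io.BufferedReader. *in*)))
```

Because the function takes any sequence of lines, it can be checked with a literal vector such as ["foo\t1" "foo\t1" "bar\t1"] before wiring it to stdin.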


Can the input contain numbers other than 1? Like:

foo 1
foo 3
bar 10

If so, then:

(use '[clojure.string :only [split]])

(def parsed-input
  (for [line (line-seq (java.io.BufferedReader. *in*))
        :let [[k v] (split line #"\t")]]
    {k (Double/parseDouble v)}))

(def table (apply merge-with + {} parsed-input))

(doseq [[k v] table]
  (println (str k "\t" v)))

Outputs:

bar     10.0
foo     4.0

If the values are always 1, frequencies will do, as suggested.
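
If the values are always whole numbers, a variant that parses with Long/parseLong keeps the totals as integers, so the output reads 4 rather than 4.0. This is a sketch under that assumption, not a drop-in replacement; sum-by-key is a hypothetical name, and update assumes Clojure 1.7 or later.

```clojure
(require '[clojure.string :as str])

(defn sum-by-key
  "Sums the integer value of each tab-delimited key/value line, per key."
  [lines]
  (reduce (fn [totals line]
            (let [[k v] (str/split line #"\t")]
              ;; (fnil + 0) treats a missing total as 0 before adding
              (update totals k (fnil + 0) (Long/parseLong v))))
          {}
          lines))

;; Printing the totals from stdin, as in the code above:
;; (doseq [[k v] (sum-by-key (line-seq (java.io.BufferedReader. *in*)))]
;;   (println (str k "\t" v)))
```

Long/parseLong throws a NumberFormatException on a value such as 1.5, so the Double version above remains the safer choice when fractional values are possible.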

