Translating CFRM algorithm from Java to Clojure, + improving performance

Question

Counterfactual Regret Minimization is an algorithm that can be used to find the Nash Equilibrium for games of incomplete information. I have tried to adapt the exercise from here:

http://cs.gettysburg.edu/~tneller/modelai/2013/cfr/index.html

to Clojure. You can see the original RPSTrainer.java, my first functional version of the algorithm rps.clj and finally a version that I tried to tweak for performance rps_tweak.clj all at:

https://gist.github.com/luxbock/da22767ef16af6ebc5dc

Here is the tweaked version:

(ns cfr.rps-tweak
  (:require [clojure.core.matrix :as m]
            [primitive-math :as pm]))

(set! *warn-on-reflection* true)
(m/set-current-implementation :vectorz)

(defn create-utility-fn
  "Given a payoff-matrix creates a utility function for the game. The utility
  function accepts two strategy vectors as its arguments and returns the utility
  for the first player in question."
  [m]
  (fn ^double [sa sb]
    (let [prob-m
          (m/compute-matrix
           (map m/ecount [sa sb])
           #(pm/* ^double (m/mget sa %1) ^double (m/mget sb %2)))]
      (m/ereduce + (m/mul prob-m m)))))

(defn regret
  "Given a utility function and three strategy vectors, returns the regret for
  player having played his strategy `sa' instead of `sb' against his opponents `so'"
  [uf sa sb so]
  (pm/- ^double (uf sb so) ^double (uf sa so)))

(defn action-profile [n]
  "An action profile is the list of pure strategies available to a player."
  (map #(m/mset (repeat n 0) % 1) (range n)))

(defn regret-profile
  "Given a utility function and strategies for both players, this function
  returns the regret for all the pure-strategies the first player could have
  played, including the strategy he did play."
  [uf sa so]
  (map #(regret uf sa % so) (action-profile (m/ecount sa))))

(defn normalise-strategies
  [nsum strat]
  (if (pm/> ^double nsum 0.0)
    (map #(pm/div ^double % ^double nsum) strat)
    (repeat (m/ecount strat) (pm/div (m/ecount strat)))))

(defn new-strategy
  "Creates a new strategy based on the regrets experienced by the player."
  [rgr]
  (let [n     (m/ecount rgr)
        strat (map #(if (pos? (m/mget rgr %)) (m/mget rgr %) 0) (range n))
        nsum  (reduce + strat)]
    (normalise-strategies nsum strat)))

(defn cumulative-probabilities
  "Takes a collection of probabilities (that sum up to one) and turns it into a
  sequence of cumulative probabilities."
  [coll]
  (reduce #(conj %1 (+ %2 (last %1))) [0] coll))

(defn choose-action
  "Given a strategy vector, chooses an action to play based on its probability."
  [^doubles strat]
  (let [cp    (cumulative-probabilities strat)
        r     (rand)
        index (pm/dec ^long (m/ecount (take-while #(pm/> ^double r ^double %) cp)))]
    (m/mset (repeat (m/ecount strat) 0) index 1)))

(defn avarage-strategy
  "Given a vector where each index maps to how often a certain strategy has been
  played, returns the frequency of each strategy as a part of the total."
  [ssum]
  (let [nsum (reduce + ssum)]
    (normalise-strategies nsum ssum)))

(defn cfrm-be
  "Given a utility function, number of iterations and a strategy for the
  opponent, performs the Counterfactual Regret Minimization algorithm to find
  the best response to the strategy in question."
  [uf n sb]
  (let [n (int n)]
    (loop [i     (int 0)
           reg-a (m/array [0 0 0])
           ssum  (m/array [0 0 0])]
      (if (pm/== i n)
        (avarage-strategy ssum)
        (let [strat-a (choose-action (new-strategy reg-a))
              strat-b sb]
          (recur (pm/inc i)
                 (m/add reg-a (regret-profile uf strat-a strat-b))
                 (m/add ssum strat-a)))))))

(defn cfrm-ne
  "Given a utility function and a number of iterations to perform, performs the
  Counterfactual Regret Minimization algorithm to find an approximation of the
  Nash Equilibrium for the game."
  [uf n]
  (let [n (int n)]
    (loop [i      (int 0)
           reg-a  (m/array [0 0 0])
           reg-b  (m/array [0 0 0])
           ssum   (m/array [0 0 0])]
      (if (pm/== i n)
        (avarage-strategy ssum)
        (let [strat-a (choose-action (new-strategy reg-a))
              strat-b (choose-action (new-strategy reg-b))]
          (recur (pm/inc i)
                 (m/add reg-a (regret-profile uf strat-a strat-b))
                 (m/add reg-b (regret-profile uf strat-b strat-a))
                 (m/add ssum strat-a)))))))

(comment

  (def rps
    (create-utility-fn [[0, -1, 1]
                        [1, 0, -1]
                        [-1, 1, 0]]))

  (cfrm-ne rps 100000)

)

The tweaked version performs about 3.5x faster than rps.clj, but it's still two orders of magnitude away from the original Java implementation. This is not that surprising given that the two versions are doing very different things. Still I wonder are there any other improvements I could make for speed without having to write Java in Clojure, in which case I would probably just write Java in Java and call it from Clojure? If I were to build an application that relied on the performance of an algorithm such as the one above, would I be better off doing the performance critical things in Java and then just using Clojure as glue code for the rest?

EDIT: I made two significant improvements at some cost to the flexibility of the program:

The utility function is now a simple lookup from the payoff matrix, which works because choose-action always returns a pure strategy. Creating a probability matrix is therefore unnecessary.
As a result the arguments to the utility-function can simply be indices of the payoff matrix instead of arrays.

These two changes gave me a speedup roughly by a factor of 10x.

Using mutable arrays to store the regrets and strategy-sum in the main loop also performs just a tiny bit faster, but the difference is surprisingly miniscule:

cfr.rps-moar> (bench (cfrm-ne rps 100000))
WARNING: Final GC required 2.974887513309308 % of runtime

Evaluation count : 60 in 60 samples of 1 calls.
             Execution time mean : 1.285734 sec
    Execution time std-deviation : 16.892914 ms
   Execution time lower quantile : 1.263259 sec ( 2.5%)
   Execution time upper quantile : 1.321674 sec (97.5%)
                   Overhead used : 1.982317 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe   1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

vs.

cfr.rps-muta> (bench (cfrm-ne rps 100000))
WARNING: Final GC required 3.27423659048163 % of runtime
Evaluation count : 60 in 60 samples of 1 calls.
             Execution time mean : 1.217206 sec
    Execution time std-deviation : 7.875121 ms
   Execution time lower quantile : 1.203429 sec ( 2.5%)
   Execution time upper quantile : 1.236352 sec (97.5%)
                   Overhead used : 1.968409 ns

Found 4 outliers in 60 samples (6.6667 %)
    low-severe   4 (6.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Here is the mutable version:

(ns cfr.rps-muta
  (:require [clojure.core.matrix :as m]
            [primitive-math :as pm]))

(set! *warn-on-reflection* true)
(m/set-current-implementation :vectorz)

(defn create-utility-fn
  "Given a payoff-matrix creates a utility function for the game. The utility
  function accepts two strategy vectors as its arguments and returns the utility
  for the first player in question."
  [m]
  (fn ^double [sa sb]
    (m/mget m sa sb)))

(defn regret
  "Given a utility function and three strategy vectors, returns the regret for
  player having played his strategy `sa' instead of `sb' against his opponents `so'"
  [uf sa sb so]
  (pm/- ^double (uf sb so) ^double (uf sa so)))

(defn regret-profile
  "Given a utility function and strategies for both players, this function
  returns the regret for all the pure-strategies the first player could have
  played, including the strategy he did play."
  [uf sa so ns]
  (map #(regret uf sa % so) (range ns)))

(defn normalise-strategies
  [nsum strat]
  (if (pm/> ^double nsum 0.0)
    (map #(pm/div ^double % ^double nsum) strat)
    (repeat (m/ecount strat) (pm/div (m/ecount strat)))))

(defn new-strategy
  "Creates a new strategy based on the regrets experienced by the player."
  [rgr]
  (let [n     (m/ecount rgr)
        strat (map #(if (pos? (m/mget rgr %)) (m/mget rgr %) 0) (range n))
        nsum  (reduce + strat)]
    (normalise-strategies nsum strat)))

(defn cumulative-probabilities
  "Takes a collection of probabilities (that sum up to one) and turns it into a
  sequence of cumulative probabilities."
  [coll]
  (reduce #(conj %1 (+ %2 (last %1))) [0] coll))

(defn choose-action
  "Given a strategy vector, chooses an action to play based on its probability."
  [^doubles strat]
  (let [cp    (cumulative-probabilities strat)
        r     (rand)]
    (pm/dec ^long (m/ecount (take-while #(pm/> ^double r ^double %) cp)))))

(defn avarage-strategy
  "Given a vector where each index maps to how often a certain strategy has been
  played, returns the frequency of each strategy as a part of the total."
  [ssum]
  (let [nsum (reduce + ssum)]
    (normalise-strategies nsum ssum)))

(defn cfrm-be
  "Given a utility function, number of iterations and a strategy for the
  opponent, performs the Counterfactual Regret Minimization algorithm to find
  the best response to the strategy in question."
  [m n sb]
  (let [n         (int n)
        uf        (create-utility-fn m)
        reg-a     (m/array [0.0 0.0 0.0])
        ssum      (m/array [0.0 0.0 0.0])
        [sca scb] (m/shape m)]
    (loop [i (int 0)]
      (if (pm/== i n)
        (avarage-strategy ssum)
        (let [strat-a (choose-action (new-strategy reg-a))
              strat-b sb]
          (m/add! reg-a (regret-profile uf strat-a strat-b sca))
          (m/add! ssum strat-a)
          (recur (pm/inc i)))))))

(defn cfrm-ne
  "Given a utility function and a number of iterations to perform, performs the
  Counterfactual Regret Minimization algorithm to find an approximation of the
  Nash Equilibrium for the game."
  [m n]
  (let [n         (int n)
        uf        (create-utility-fn m)
        reg-a     (m/array [0.0 0.0 0.0])
        reg-b     (m/array [0.0 0.0 0.0])
        ssum      (m/array [0.0 0.0 0.0])
        [sca scb] (m/shape m)]
    (loop [i (int 0)]
      (if (pm/== i n)
        (avarage-strategy ssum)
        (let [strat-a (choose-action (new-strategy reg-a))
              strat-b (choose-action (new-strategy reg-b))]
          (m/add! reg-a (regret-profile uf strat-a strat-b sca))
          (m/add! reg-b (regret-profile uf strat-b strat-a scb))
          (m/add! ssum strat-a)
          (recur (pm/inc i)))))))

(comment

  (def rps
    [[0, -1, 1]
     [1, 0, -1]
     [-1, 1, 0]])

  (cfrm-ne rps 100000)

)

I can't think of anything more I could do. It still runs about 10x slower than the original Java implementation, but I think my version is more general and nicer to read.

I don't speak Clojure, but I have to thank you a lot for the very interesting article! I'm reading up on this subject a lot now and checking out the Java code, and doing some exercises. — Simon André Forsberg, May 9 at 20:28

asked	1 month ago
viewed	123 times

current community

your communities

more stack exchange communities

Translating CFRM algorithm from Java to Clojure, + improving performance

Your Answer

Browse other questions tagged java algorithm performance clojure or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Translating CFRM algorithm from Java to Clojure, + improving performance

Know someone who can answer? Share a link to this question via email, Google+, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Browse other questions tagged java algorithm performance clojure or ask your own question.

Related

Hot Network Questions