Counterfactual Regret Minimization is an algorithm that can be used to find the Nash Equilibrium for games of incomplete information. I have tried to adapt the exercise from here:
http://cs.gettysburg.edu/~tneller/modelai/2013/cfr/index.html
to Clojure. You can see the original RPSTrainer.java
, my first functional version of the algorithm rps.clj
and finally a version that I tried to tweak for performance rps_tweak.clj
all at:
https://gist.github.com/luxbock/da22767ef16af6ebc5dc
Here is the tweaked version:
(ns cfr.rps-tweak
(:require [clojure.core.matrix :as m]
[primitive-math :as pm]))
(set! *warn-on-reflection* true)
(m/set-current-implementation :vectorz)
(defn create-utility-fn
"Given a payoff-matrix creates a utility function for the game. The utility
function accepts two strategy vectors as its arguments and returns the utility
for the first player in question."
[m]
(fn ^double [sa sb]
(let [prob-m
(m/compute-matrix
(map m/ecount [sa sb])
#(pm/* ^double (m/mget sa %1) ^double (m/mget sb %2)))]
(m/ereduce + (m/mul prob-m m)))))
(defn regret
"Given a utility function and three strategy vectors, returns the regret for
player having played his strategy `sa' instead of `sb' against his opponents `so'"
[uf sa sb so]
(pm/- ^double (uf sb so) ^double (uf sa so)))
(defn action-profile [n]
"An action profile is the list of pure strategies available to a player."
(map #(m/mset (repeat n 0) % 1) (range n)))
(defn regret-profile
"Given a utility function and strategies for both players, this function
returns the regret for all the pure-strategies the first player could have
played, including the strategy he did play."
[uf sa so]
(map #(regret uf sa % so) (action-profile (m/ecount sa))))
(defn normalise-strategies
[nsum strat]
(if (pm/> ^double nsum 0.0)
(map #(pm/div ^double % ^double nsum) strat)
(repeat (m/ecount strat) (pm/div (m/ecount strat)))))
(defn new-strategy
"Creates a new strategy based on the regrets experienced by the player."
[rgr]
(let [n (m/ecount rgr)
strat (map #(if (pos? (m/mget rgr %)) (m/mget rgr %) 0) (range n))
nsum (reduce + strat)]
(normalise-strategies nsum strat)))
(defn cumulative-probabilities
"Takes a collection of probabilities (that sum up to one) and turns it into a
sequence of cumulative probabilities."
[coll]
(reduce #(conj %1 (+ %2 (last %1))) [0] coll))
(defn choose-action
"Given a strategy vector, chooses an action to play based on its probability."
[^doubles strat]
(let [cp (cumulative-probabilities strat)
r (rand)
index (pm/dec ^long (m/ecount (take-while #(pm/> ^double r ^double %) cp)))]
(m/mset (repeat (m/ecount strat) 0) index 1)))
(defn avarage-strategy
"Given a vector where each index maps to how often a certain strategy has been
played, returns the frequency of each strategy as a part of the total."
[ssum]
(let [nsum (reduce + ssum)]
(normalise-strategies nsum ssum)))
(defn cfrm-be
"Given a utility function, number of iterations and a strategy for the
opponent, performs the Counterfactual Regret Minimization algorithm to find
the best response to the strategy in question."
[uf n sb]
(let [n (int n)]
(loop [i (int 0)
reg-a (m/array [0 0 0])
ssum (m/array [0 0 0])]
(if (pm/== i n)
(avarage-strategy ssum)
(let [strat-a (choose-action (new-strategy reg-a))
strat-b sb]
(recur (pm/inc i)
(m/add reg-a (regret-profile uf strat-a strat-b))
(m/add ssum strat-a)))))))
(defn cfrm-ne
"Given a utility function and a number of iterations to perform, performs the
Counterfactual Regret Minimization algorithm to find an approximation of the
Nash Equilibrium for the game."
[uf n]
(let [n (int n)]
(loop [i (int 0)
reg-a (m/array [0 0 0])
reg-b (m/array [0 0 0])
ssum (m/array [0 0 0])]
(if (pm/== i n)
(avarage-strategy ssum)
(let [strat-a (choose-action (new-strategy reg-a))
strat-b (choose-action (new-strategy reg-b))]
(recur (pm/inc i)
(m/add reg-a (regret-profile uf strat-a strat-b))
(m/add reg-b (regret-profile uf strat-b strat-a))
(m/add ssum strat-a)))))))
(comment
(def rps
(create-utility-fn [[0, -1, 1]
[1, 0, -1]
[-1, 1, 0]]))
(cfrm-ne rps 100000)
)
The tweaked version performs about 3.5x faster than rps.clj
, but it's still two orders of magnitude away from the original Java implementation. This is not that surprising given that the two versions are doing very different things. Still I wonder are there any other improvements I could make for speed without having to write Java in Clojure, in which case I would probably just write Java in Java and call it from Clojure? If I were to build an application that relied on the performance of an algorithm such as the one above, would I be better off doing the performance critical things in Java and then just using Clojure as glue code for the rest?
EDIT: I made two significant improvements at some cost to the flexibility of the program:
- The utility function is now a simple lookup from the payoff matrix, which works because
choose-action
always returns a pure strategy. Creating a probability matrix is therefore unnecessary. - As a result the arguments to the utility-function can simply be indices of the payoff matrix instead of arrays.
These two changes gave me a speedup roughly by a factor of 10x.
Using mutable arrays to store the regrets and strategy-sum in the main loop also performs just a tiny bit faster, but the difference is surprisingly miniscule:
cfr.rps-moar> (bench (cfrm-ne rps 100000))
WARNING: Final GC required 2.974887513309308 % of runtime
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 1.285734 sec
Execution time std-deviation : 16.892914 ms
Execution time lower quantile : 1.263259 sec ( 2.5%)
Execution time upper quantile : 1.321674 sec (97.5%)
Overhead used : 1.982317 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
vs.
cfr.rps-muta> (bench (cfrm-ne rps 100000))
WARNING: Final GC required 3.27423659048163 % of runtime
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 1.217206 sec
Execution time std-deviation : 7.875121 ms
Execution time lower quantile : 1.203429 sec ( 2.5%)
Execution time upper quantile : 1.236352 sec (97.5%)
Overhead used : 1.968409 ns
Found 4 outliers in 60 samples (6.6667 %)
low-severe 4 (6.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Here is the mutable version:
(ns cfr.rps-muta
(:require [clojure.core.matrix :as m]
[primitive-math :as pm]))
(set! *warn-on-reflection* true)
(m/set-current-implementation :vectorz)
(defn create-utility-fn
"Given a payoff-matrix creates a utility function for the game. The utility
function accepts two strategy vectors as its arguments and returns the utility
for the first player in question."
[m]
(fn ^double [sa sb]
(m/mget m sa sb)))
(defn regret
"Given a utility function and three strategy vectors, returns the regret for
player having played his strategy `sa' instead of `sb' against his opponents `so'"
[uf sa sb so]
(pm/- ^double (uf sb so) ^double (uf sa so)))
(defn regret-profile
"Given a utility function and strategies for both players, this function
returns the regret for all the pure-strategies the first player could have
played, including the strategy he did play."
[uf sa so ns]
(map #(regret uf sa % so) (range ns)))
(defn normalise-strategies
[nsum strat]
(if (pm/> ^double nsum 0.0)
(map #(pm/div ^double % ^double nsum) strat)
(repeat (m/ecount strat) (pm/div (m/ecount strat)))))
(defn new-strategy
"Creates a new strategy based on the regrets experienced by the player."
[rgr]
(let [n (m/ecount rgr)
strat (map #(if (pos? (m/mget rgr %)) (m/mget rgr %) 0) (range n))
nsum (reduce + strat)]
(normalise-strategies nsum strat)))
(defn cumulative-probabilities
"Takes a collection of probabilities (that sum up to one) and turns it into a
sequence of cumulative probabilities."
[coll]
(reduce #(conj %1 (+ %2 (last %1))) [0] coll))
(defn choose-action
"Given a strategy vector, chooses an action to play based on its probability."
[^doubles strat]
(let [cp (cumulative-probabilities strat)
r (rand)]
(pm/dec ^long (m/ecount (take-while #(pm/> ^double r ^double %) cp)))))
(defn avarage-strategy
"Given a vector where each index maps to how often a certain strategy has been
played, returns the frequency of each strategy as a part of the total."
[ssum]
(let [nsum (reduce + ssum)]
(normalise-strategies nsum ssum)))
(defn cfrm-be
"Given a utility function, number of iterations and a strategy for the
opponent, performs the Counterfactual Regret Minimization algorithm to find
the best response to the strategy in question."
[m n sb]
(let [n (int n)
uf (create-utility-fn m)
reg-a (m/array [0.0 0.0 0.0])
ssum (m/array [0.0 0.0 0.0])
[sca scb] (m/shape m)]
(loop [i (int 0)]
(if (pm/== i n)
(avarage-strategy ssum)
(let [strat-a (choose-action (new-strategy reg-a))
strat-b sb]
(m/add! reg-a (regret-profile uf strat-a strat-b sca))
(m/add! ssum strat-a)
(recur (pm/inc i)))))))
(defn cfrm-ne
"Given a utility function and a number of iterations to perform, performs the
Counterfactual Regret Minimization algorithm to find an approximation of the
Nash Equilibrium for the game."
[m n]
(let [n (int n)
uf (create-utility-fn m)
reg-a (m/array [0.0 0.0 0.0])
reg-b (m/array [0.0 0.0 0.0])
ssum (m/array [0.0 0.0 0.0])
[sca scb] (m/shape m)]
(loop [i (int 0)]
(if (pm/== i n)
(avarage-strategy ssum)
(let [strat-a (choose-action (new-strategy reg-a))
strat-b (choose-action (new-strategy reg-b))]
(m/add! reg-a (regret-profile uf strat-a strat-b sca))
(m/add! reg-b (regret-profile uf strat-b strat-a scb))
(m/add! ssum strat-a)
(recur (pm/inc i)))))))
(comment
(def rps
[[0, -1, 1]
[1, 0, -1]
[-1, 1, 0]])
(cfrm-ne rps 100000)
)
I can't think of anything more I could do. It still runs about 10x slower than the original Java implementation, but I think my version is more general and nicer to read.