# Download Competitive Markov Decision Processes by Jerzy Filar, Koos Vrieze PDF

… Passing to the limit yields $v = v_\beta(f^*)$. Thus $f^*$ achieves the upper bound $v$ componentwise. We have now proved both that $f^*$ is an optimal strategy and that the solution of the optimality equation is unique and equals the value vector of the discounted process $\Gamma_\beta$.

These methods are discussed in considerable detail in many books. … 1 supplies both the value vector and an optimal control.
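The sketch below shows how a value vector and an optimal control can be computed. It runs successive approximation (value iteration) on the discounted optimality equation $v(s) = \max_a \big[ r(s,a) + \beta \sum_{s'} p(s' \mid s,a)\, v(s') \big]$. It is a minimal sketch under stated assumptions. The two-state model, its rewards and transitions, and the names `bellman_backup` and `value_iteration` are all hypothetical. Value iteration is one standard method and is not necessarily the one the excerpt refers to.

```python
from typing import List, Tuple

# Hypothetical data layout (not the book's notation):
#   rewards[s][a]     = immediate reward r(s, a)
#   transitions[s][a] = probabilities p(s' | s, a) over all states s'
Rewards = List[List[float]]
Transitions = List[List[List[float]]]


def bellman_backup(v: List[float], rewards: Rewards, transitions: Transitions,
                   beta: float) -> Tuple[List[float], List[int]]:
    """One application of the discounted optimality operator
    max_a [ r(s,a) + beta * sum_s' p(s'|s,a) v(s') ], plus the maximizing action per state."""
    new_v: List[float] = []
    greedy: List[int] = []
    for s, actions in enumerate(rewards):
        best_val, best_a = float("-inf"), -1
        for a, r in enumerate(actions):
            q = r + beta * sum(p * w for p, w in zip(transitions[s][a], v))
            if q > best_val:
                best_val, best_a = q, a
        new_v.append(best_val)
        greedy.append(best_a)
    return new_v, greedy


def value_iteration(rewards: Rewards, transitions: Transitions, beta: float,
                    tol: float = 1e-10, max_iter: int = 100_000) -> Tuple[List[float], List[int]]:
    """Successive approximation: the operator is a beta-contraction in the sup norm,
    so the iterates converge to its unique fixed point, the value vector v_beta."""
    v = [0.0] * len(rewards)
    for _ in range(max_iter):
        new_v, _ = bellman_backup(v, rewards, transitions, beta)
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break
    # Read off a deterministic stationary control that is greedy w.r.t. the final iterate
    _, policy = bellman_backup(v, rewards, transitions, beta)
    return v, policy


# Hypothetical two-state model (state 1 -> index 0, state 2 -> index 1):
# state 1 can stay (reward 1) or move to state 2 (reward 0);
# state 2 has a single absorbing action with reward 2.
rewards = [[1.0, 0.0], [2.0]]
transitions = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]
v, f_star = value_iteration(rewards, transitions, beta=0.9)
print([round(x, 6) for x in v], f_star)  # [18.0, 20.0] [1, 0]
```

Here $v_\beta = (18, 20)$. Moving to state 2 earns $0.9 \cdot 20 = 18$, while staying earns only $1 + 0.9 \cdot 18 = 17.2$, so the greedy control selects the second action in state 1.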

Note that the cardinality of this last class is finite and is given by

$$|F_D| = \prod_{s=1}^{N} m(s).$$

[Figure: the two-state example process (state 1, state 2).]

Note that every stationary control in this process is of the form $f_p = ((p, 1-p), (1))$. That is, $F_S = \{f_p \mid p \in [0,1]\}$. Of course, there are only two deterministic controls, that is, $F_D = \{f_1, f_0\}$. The class of Markov controls in this example can be represented as

$$F_M = \{\pi = (f_0, f_1, \ldots, f_t, \ldots) \mid f_t = f_{p_t} \text{ for some } p_t \in [0,1] \text{ and } t = 0, 1, \ldots\}.$$

## 6 Behavior and Markov Strategies

… for every $t$.
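The formula $|F_D| = \prod_s m(s)$ says that a deterministic stationary control is one action choice per state. Equivalently, it is an element of the Cartesian product of the action sets. The following is a minimal sketch that assumes actions are indexed from 0 in each state. The helpers `deterministic_controls` and `stationary_control` are hypothetical names used only for illustration.

```python
from itertools import product
from math import prod
from typing import List, Tuple


def deterministic_controls(m: List[int]) -> List[Tuple[int, ...]]:
    """All deterministic stationary controls when state s offers m[s] actions
    (indexed 0..m[s]-1): one action choice per state, i.e. a Cartesian product."""
    return list(product(*(range(k) for k in m)))


def stationary_control(p: float) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """f_p = ((p, 1 - p), (1)) from the two-state example: a randomized choice
    between the two actions of state 1, and the single action of state 2."""
    return ((p, 1.0 - p), (1.0,))


m = [2, 1]                      # m(1) = 2, m(2) = 1 in the two-state example
F_D = deterministic_controls(m)
assert len(F_D) == prod(m)      # |F_D| = prod_s m(s)
print(F_D)                      # [(0, 0), (1, 0)]
# The two deterministic controls are the extreme cases p = 1 and p = 0 of f_p
print(stationary_control(1.0), stationary_control(0.0))
# ((1.0, 0.0), (1.0,)) ((0.0, 1.0), (1.0,))
```

By contrast, $F_S$ is a continuum parametrized by $p \in [0,1]$. A Markov control in $F_M$ is a sequence of such parameters $p_0, p_1, \ldots$, one for each decision epoch.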

## 2. Markov Decision Processes: The Noncompetitive Case

The reader is invited to verify that this limiting average model is indeed irreducible.

$$
\begin{aligned}
&\cdots\; 3x_{22}] = 0,\\
&x_{11} + x_{12} + x_{21} + x_{22} = 1,\\
&x_{11},\, x_{12},\, x_{21},\, x_{22} \ge 0.
\end{aligned}
$$

Our next goal is to show that the set $X$ is the "frequency space" of $\Gamma_\alpha$, which in turn is in one-to-one correspondence with the space $F_S$ of stationary strategies. We shall need the following technical result.

Let $\Gamma_\alpha$ be an irreducible AMD model and let $X$ be the corresponding polyhedral set defined by (i)–(iii).
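The correspondence between $X$ and $F_S$ can be checked numerically. The following is a minimal sketch that assumes the standard form of the constraints:

- (i) the balance equations $\sum_{s,a} (\delta(s,s') - p(s' \mid s,a))\, x_{sa} = 0$ for every state $s'$;
- (ii) $\sum_{s,a} x_{sa} = 1$;
- (iii) $x_{sa} \ge 0$.

Only (ii) and (iii) are fully visible in the excerpt, so the form of (i) is an assumption. The sketch also assumes the usual maps between the two spaces. The forward map is $x_{sa} = \pi_s(f)\, f(s,a)$, where $\pi(f)$ is the stationary distribution of the chain induced by $f$. The inverse map is $f(s,a) = x_{sa} / \sum_{a'} x_{sa'}$. The two-state model and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def stationary_distribution(P: Matrix) -> List[float]:
    """Unique solution of pi P = pi, sum(pi) = 1 for an irreducible chain P."""
    n = len(P)
    # Row i encodes sum_j pi_j P[j][i] - pi_i = 0; one balance row is redundant,
    # so the last one is replaced by the normalization sum_i pi_i = 1.
    a = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    return solve(a, [0.0] * (n - 1) + [1.0])


def frequencies(transitions, f):
    """x_{sa} = pi_s(f) * f(s, a): long-run state-action frequencies of f."""
    n = len(f)
    P = [[sum(f[s][a] * transitions[s][a][t] for a in range(len(f[s])))
          for t in range(n)] for s in range(n)]
    pi = stationary_distribution(P)
    return [[pi[s] * f[s][a] for a in range(len(f[s]))] for s in range(n)]


def strategy_from_frequencies(x):
    """Inverse map: f(s, a) = x_{sa} / sum_a x_{sa} (row sums are positive when irreducible)."""
    return [[xa / sum(row) for xa in row] for row in x]


def in_frequency_polyhedron(x, transitions, tol=1e-9):
    """Check the assumed constraints (i) balance, (ii) normalization, (iii) nonnegativity."""
    n = len(x)
    balance = all(
        abs(sum(((1.0 if s == t else 0.0) - transitions[s][a][t]) * x[s][a]
                for s in range(n) for a in range(len(x[s])))) < tol
        for t in range(n))
    normalized = abs(sum(map(sum, x)) - 1.0) < tol
    nonnegative = all(v >= -tol for row in x for v in row)
    return balance and normalized and nonnegative


# Hypothetical irreducible model with two states and two actions per state,
# matching the variables x11, x12, x21, x22 (indices shifted to start at 0).
transitions = [
    [[0.5, 0.5], [0.0, 1.0]],  # state 1: actions 1 and 2
    [[1.0, 0.0], [0.5, 0.5]],  # state 2: actions 1 and 2
]
f = [[0.5, 0.5], [0.25, 0.75]]
x = frequencies(transitions, f)
print(in_frequency_polyhedron(x, transitions))  # True
print(strategy_from_frequencies(x))             # recovers f up to rounding
```

Every stationary strategy in this hypothetical model moves between the two states with probability at least 0.5. Each induced chain is therefore irreducible, and the row sums $\pi_s(f)$ used by the inverse map are strictly positive.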