jmdp.solvers
Class PolicyIterationSolver<S extends State,A extends Action>

java.lang.Object
  extended by jmdp.solvers.Solver<S,A>
      extended by jmdp.solvers.AbstractInfiniteSolver<S,A>
          extended by jmdp.solvers.AbstractDiscountedSolver<S,A>
              extended by jmdp.solvers.PolicyIterationSolver<S,A>

public class PolicyIterationSolver<S extends State,A extends Action>
extends AbstractDiscountedSolver<S,A>

This class is one of the default solvers included in the jmdp package. It extends Solver and should only be used on infinite-horizon problems. The solver's objective function is the discounted cost. The result is a deterministic optimal policy for the given problem structure.

Author:
Andres Sarmiento, German Riano - Universidad de Los Andes

Field Summary
protected  double epsilon
           
protected  boolean errorBounds
           
protected  boolean gaussSeidel
           
protected  int iterations
           
protected  java.util.List<S> localStates
           
protected  long processTime
           
 
Fields inherited from class jmdp.solvers.AbstractDiscountedSolver
discountFactor, interestRate
 
Fields inherited from class jmdp.solvers.Solver
policy, printProcessTime, printValueFunction, problem, solved, valueFunction
 
Constructor Summary
PolicyIterationSolver(DTMDP<S,A> problem, double discountFactor)
          The constructor method exclusively receives a problem of the type InfiniteMDP because this solver is only designed to work on infinite horizon problems.
PolicyIterationSolver(DTMDP<S,A> problem, double discountFactor, boolean setModifiedPolicy)
          The constructor method exclusively receives a problem of the type InfiniteMDP because this solver is only designed to work on infinite horizon problems.
 
Method Summary
 jmp.SparseRowColumnMatrix buildMatrix(DecisionRule<S,A> localPolicy)
          This method is, until now, only used by the PolicyIterationSolver.
protected  double future(S i, A a, double discountF)
           
protected  double future(S i, A a, double discountF, ValueFunction<S> vf)
          Expected value of valueFunction for the current state and a specified action.
 double getIncreasingFactor()
           
 double getInitialIterations()
           
 int getIterations()
           
 long getProcessTime()
           
 void setIncreasingFactor(double increasingFactor)
          Sets the increasing factor of the maximum iterations of the Modified policy iteration method.
 void setInitialIterations(double initialIterations)
          Sets maximum iterations for the first run of the modified policy iteration.
 void setModifiedPolicy(boolean val)
          Activates the modified policy iteration algorithm.
 Solution<S,A> solve()
          Policy Iteration is a solver method that is always convergent in a finite number of iterations.
protected  ValueFunction<S> solveMatrix(DecisionRule<S,A> localDecisionRule)
          This method is used by the PolicyIterationSolver to solve the linear system of equations to determine the value functions of each state for a given policy.
protected  ValueFunction<S> solveMatrixModified(DecisionRule<S,A> localDecisionRule)
          This method is used by the PolicyIterationSolver to solve the linear system of equations to determine the value functions of each state for a given policy.
 java.lang.String toString()
          The sub classes must return the Solver name.
 
Methods inherited from class jmdp.solvers.AbstractDiscountedSolver
getInterestRate, setDiscountFactor, setInterestRate
 
Methods inherited from class jmdp.solvers.AbstractInfiniteSolver
getDiscreteProblem, getProblem, printSolution
 
Methods inherited from class jmdp.solvers.Solver
getOptimalPolicy, getOptimalValueFunction, getValueFunction, isSolved, printSolution, setPrintProcessTime, setPrintValueFunction
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

localStates

protected java.util.List<S extends State> localStates

iterations

protected int iterations

processTime

protected long processTime

epsilon

protected double epsilon

gaussSeidel

protected boolean gaussSeidel

errorBounds

protected boolean errorBounds
Constructor Detail

PolicyIterationSolver

public PolicyIterationSolver(DTMDP<S,A> problem,
                             double discountFactor)
The constructor method exclusively receives a problem of the type InfiniteMDP because this solver is only designed to work on infinite horizon problems. So far, this solver only handles the discounted objective function. Other objective functions are still under development.

Parameters:
problem - the structure of the problem of type InfiniteMDP
discountFactor - the factor by which a reward received in the next period is worth less than the same reward received in the present period.

PolicyIterationSolver

public PolicyIterationSolver(DTMDP<S,A> problem,
                             double discountFactor,
                             boolean setModifiedPolicy)
The constructor method exclusively receives a problem of the type InfiniteMDP because this solver is only designed to work on infinite horizon problems. So far, this solver only handles the discounted objective function. Other objective functions are still under development.

Parameters:
problem - the structure of the problem of type InfiniteMDP
discountFactor - the factor by which a reward received in the next period is worth less than the same reward received in the present period.
Method Detail

getIncreasingFactor

public double getIncreasingFactor()
Returns:
increasing factor of the maximum iterations.

setIncreasingFactor

public void setIncreasingFactor(double increasingFactor)
Sets the increasing factor of the maximum iterations of the modified policy iteration method. The first iterations are a rough approximation to the real value functions and need not be exhaustive, but the last iterations must refine the value functions in order to get better precision. The increasing factor determines how quickly the maximum number of inner iterations grows from one policy iteration to the next. Faster growth will be more precise but computationally more expensive.

Parameters:
increasingFactor - greater than 1. Determines the growth of the maximum iterations.
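
The growth described above can be pictured with a small sketch. It assumes that the cap on inner iterations is simply multiplied by the increasing factor after each round and rounded up; the source does not state the exact update rule, and the class and method names below are hypothetical, not part of jmdp.

```java
// Hypothetical illustration only; this is not the jmdp implementation.
class IterationScheduleSketch {

    /**
     * Returns the assumed cap on inner iterations for each round:
     * round 0 uses initialIterations, and every later round multiplies
     * the previous cap by increasingFactor (rounded up to an integer).
     */
    static int[] schedule(double initialIterations, double increasingFactor, int rounds) {
        int[] caps = new int[rounds];
        double cap = initialIterations;
        for (int k = 0; k < rounds; k++) {
            caps[k] = (int) Math.ceil(cap);
            cap *= increasingFactor; // assumed geometric growth
        }
        return caps;
    }

    public static void main(String[] args) {
        // Four rounds starting at 5 inner iterations and doubling each round.
        System.out.println(java.util.Arrays.toString(schedule(5, 2.0, 4)));
    }
}
```

Under this assumption, a factor close to 1 keeps the early rounds cheap for longer, while a larger factor reaches near-exact policy evaluation sooner at a higher cost per round.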

getInitialIterations

public double getInitialIterations()
Returns:
initial maximum iterations of the modified policy iteration algorithm

setInitialIterations

public void setInitialIterations(double initialIterations)
Sets maximum iterations for the first run of the modified policy iteration.

Parameters:
initialIterations - maximum number of iterations for the first run of the modified policy iteration.

solve

public Solution<S,A> solve()
Policy Iteration is a solver method that always converges in a finite number of iterations. The algorithm has to solve a linear system with as many equations as there are states, so when there are too many states it is advisable to use other solvers. The advantage of Policy Iteration is that the result is the true optimal solution and not an approximation, as in other common methods. The method starts with a policy and solves the system of linear equations for the value functions of that policy. With these values it looks for a better policy. It then solves for the value functions again and looks for a better policy. If the new policy is equal to the last policy tried, it stops; otherwise it keeps improving the policy and updating the value functions.

Specified by:
solve in class Solver<S extends State,A extends Action>
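
The loop described above (evaluate the current policy, improve it, stop when it no longer changes) can be sketched on a tiny two-state problem. This is a minimal, self-contained illustration that does not use the jmdp classes: the array layout p[action][state][nextState], the cost table, the Gaussian elimination routine and every name below are assumptions made for the example, and costs are minimized.

```java
import java.util.Arrays;

// Hypothetical sketch of policy iteration; not the jmdp implementation.
class PolicyIterationSketch {

    // Example data: action 0 stays in the current state, action 1 moves to the other one.
    static final double[][][] EXAMPLE_P = {
        {{1, 0}, {0, 1}},   // p[0][i][j]: action 0
        {{0, 1}, {1, 0}}    // p[1][i][j]: action 1
    };
    // cost[i][a]: one-period cost of taking action a in state i.
    static final double[][] EXAMPLE_COST = {{1, 5}, {10, 2}};

    /** Policy evaluation: solves (I - beta * P_pi) v = c_pi by Gauss-Jordan elimination. */
    static double[] evaluate(double[][][] p, double[][] cost, int[] policy, double beta) {
        int n = policy.length;
        double[][] m = new double[n][n + 1];          // augmented matrix [I - beta*P | c]
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                m[i][j] = (i == j ? 1.0 : 0.0) - beta * p[policy[i]][i][j];
            }
            m[i][n] = cost[i][policy[i]];
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;                          // partial pivoting
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            }
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = m[r][col] / m[col][col];
                for (int k = col; k <= n; k++) m[r][k] -= f * m[col][k];
            }
        }
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = m[i][n] / m[i][i];
        return v;
    }

    /** One-period cost plus discounted expected value of taking action a in state i. */
    static double q(double[][][] p, double[][] cost, double[] v, int i, int a, double beta) {
        double expected = 0.0;
        for (int j = 0; j < v.length; j++) expected += p[a][i][j] * v[j];
        return cost[i][a] + beta * expected;
    }

    /** Alternates evaluation and improvement until the policy stops changing. */
    static int[] solve(double[][][] p, double[][] cost, double beta) {
        int[] policy = new int[cost.length];          // start with action 0 in every state
        while (true) {
            double[] v = evaluate(p, cost, policy, beta);
            int[] improved = policy.clone();
            for (int i = 0; i < policy.length; i++) {
                double best = q(p, cost, v, i, policy[i], beta);
                for (int a = 0; a < p.length; a++) {
                    double qa = q(p, cost, v, i, a, beta);
                    if (qa < best - 1e-12) {          // switch only on strict improvement
                        best = qa;
                        improved[i] = a;
                    }
                }
            }
            if (Arrays.equals(improved, policy)) return policy;
            policy = improved;
        }
    }

    public static void main(String[] args) {
        int[] policy = solve(EXAMPLE_P, EXAMPLE_COST, 0.9);
        double[] v = evaluate(EXAMPLE_P, EXAMPLE_COST, policy, 0.9);
        System.out.printf("policy=%s v=[%.2f, %.2f]%n", Arrays.toString(policy), v[0], v[1]);
    }
}
```

For this example data and a discount factor of 0.9, a hand calculation gives the policy "stay in state 0, move out of state 1" with values 10 and 11. The dense elimination step also shows why the cost grows quickly with the number of states.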

buildMatrix

public jmp.SparseRowColumnMatrix buildMatrix(DecisionRule<S,A> localPolicy)
This method is currently only used by the PolicyIterationSolver. It builds the probability transition matrix for a specified policy. The solver then transforms this matrix and uses it to solve for the value function of each state. The method is declared public because it may be very helpful for determining performance measures of the system, but this feature has not yet been developed.

Parameters:
localPolicy - the policy under which the probability matrix is to be built.
Returns:
the (Discrete time) probability matrix.
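
As a sketch of the transformation mentioned above, the standard policy evaluation equations for the discounted cost can be written as follows. The symbols are assumptions introduced here, not names from the source: P_\pi is the transition matrix of policy \pi, c_\pi its vector of one-period costs, \beta the discount factor and v_\pi the value function. The source does not spell out the exact form the solver uses.

```latex
v_\pi = c_\pi + \beta P_\pi v_\pi
\quad\Longrightarrow\quad
(I - \beta P_\pi)\, v_\pi = c_\pi
```

With 0 \le \beta < 1 and P_\pi a stochastic matrix, I - \beta P_\pi is invertible, so the system has a unique solution.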

solveMatrix

protected ValueFunction<S> solveMatrix(DecisionRule<S,A> localDecisionRule)
This method is used by the PolicyIterationSolver to solve the linear system of equations to determine the value functions of each state for a given policy.

Returns:
a DenseVector (type defined in the JMP package documentation) with the value function of each state. The index of each state is the one determined in the localStates ArrayList, declared as static.

solveMatrixModified

protected ValueFunction<S> solveMatrixModified(DecisionRule<S,A> localDecisionRule)
This method is used by the PolicyIterationSolver to solve the linear system of equations to determine the value functions of each state for a given policy.

Returns:
a DenseVector (type defined in the JMP package documentation) with the value function of each state. The index of each state is the one determined in the localStates ArrayList, declared as static.

future

protected final double future(S i,
                              A a,
                              double discountF,
                              ValueFunction<S> vf)
                       throws java.lang.NullPointerException
Expected value of valueFunction for the current state and a specified action.

Parameters:
discountF - the rate for discounting from one period to the next, that is, how much less one unit of reward is worth when received in the next period instead of the present period.
Throws:
java.lang.NullPointerException
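
A minimal sketch of the quantity described above, assuming the expectation is taken over the transition probabilities of the chosen action and then multiplied by the discount factor. The array-based signature and class name are hypothetical; the actual method works on the jmdp State, Action and ValueFunction types, and the source does not confirm where the discount is applied.

```java
// Hypothetical illustration only; not the jmdp implementation.
class FutureValueSketch {

    /**
     * Discounted expected value of the next state.
     *
     * @param transitionRow probabilities of reaching each state j from the
     *                      current state under the chosen action
     * @param valueFunction current value estimate of each state j
     * @param discountF     discount factor applied to next-period values
     */
    static double future(double[] transitionRow, double[] valueFunction, double discountF) {
        double expected = 0.0;
        for (int j = 0; j < transitionRow.length; j++) {
            expected += transitionRow[j] * valueFunction[j];
        }
        return discountF * expected;
    }

    public static void main(String[] args) {
        // 0.5 * (0.25 * 10 + 0.75 * 20) = 8.75
        System.out.println(future(new double[]{0.25, 0.75}, new double[]{10, 20}, 0.5));
    }
}
```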

future

protected double future(S i,
                        A a,
                        double discountF)

setModifiedPolicy

public void setModifiedPolicy(boolean val)
Activates the modified policy iteration algorithm.


toString

public java.lang.String toString()
Description copied from class: Solver
The sub classes must return the Solver name.

Specified by:
toString in class Solver<S extends State,A extends Action>
See Also:
Object.toString()

getProcessTime

public final long getProcessTime()
Specified by:
getProcessTime in class Solver<S extends State,A extends Action>
Returns:
Returns the processTime.

getIterations

public final int getIterations()
Specified by:
getIterations in class AbstractInfiniteSolver<S extends State,A extends Action>
Returns:
Returns the iterations.