jmarkov.jmdp.solvers
Class PolicyIterationSolver<S extends State,A extends Action>

java.lang.Object
  extended by jmarkov.jmdp.solvers.Solver<S,A>
      extended by jmarkov.jmdp.solvers.AbstractInfiniteSolver<S,A>
          extended by jmarkov.jmdp.solvers.AbstractDiscountedSolver<S,A>
              extended by jmarkov.jmdp.solvers.PolicyIterationSolver<S,A>
Type Parameters:
S - States class.
A - Actions class.
All Implemented Interfaces:
JMarkovElement

public class PolicyIterationSolver<S extends State,A extends Action>
extends AbstractDiscountedSolver<S,A>

This class solves infinite horizon discounted problems using the policy iteration algorithm. It extends Solver and should only be used on infinite horizon problems. The objective function the solver uses is the discounted cost. The result is a deterministic optimal policy for the given structure.

Policy iteration always converges in a finite number of iterations. The algorithm has to solve a linear system with as many equations as there are states. When there are too many states, it is advisable to use other solvers or the modified policy iteration (by using the second constructor). The advantage of policy iteration is that the result is the true optimal solution and not an approximation, as in other common methods.

The method starts with a policy and solves the system of linear equations for the value functions of that policy. With these values it looks for a better policy. It then solves for the value functions again and looks for a better policy. If the new policy is equal to the last policy tried, it stops; otherwise it keeps improving the policy and updating the value functions.
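The loop described above can be sketched in plain Java. This is only an illustration and not the jMDP implementation: the class name PolicyIterationSketch, the array layout (p[s][a][t] for transition probabilities, cost[s][a] for immediate costs) and the Gaussian-elimination helper are all assumptions made for the example.

```java
/** Illustrative policy iteration for a small discounted-cost MDP (not jMDP code). */
final class PolicyIterationSketch {

    /** Solves A x = b by Gaussian elimination with partial pivoting. */
    static double[] solveLinear(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][n + 1]; // augmented matrix [A | b]
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            }
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) { // back substitution
            double s = m[i][n];
            for (int c = i + 1; c < n; c++) s -= m[i][c] * x[c];
            x[i] = s / m[i][i];
        }
        return x;
    }

    /** Policy evaluation: solves (I - beta * P_pi) v = c_pi for the given policy. */
    static double[] evaluate(double[][][] p, double[][] cost, int[] policy, double beta) {
        int n = policy.length;
        double[][] a = new double[n][n];
        double[] b = new double[n];
        for (int s = 0; s < n; s++) {
            for (int t = 0; t < n; t++) {
                a[s][t] = (s == t ? 1.0 : 0.0) - beta * p[s][policy[s]][t];
            }
            b[s] = cost[s][policy[s]];
        }
        return solveLinear(a, b);
    }

    /** One-step lookahead cost of taking action a in state s, then following v. */
    static double q(double[][][] p, double[][] cost, double[] v, int s, int a, double beta) {
        double sum = cost[s][a];
        for (int t = 0; t < v.length; t++) sum += beta * p[s][a][t] * v[t];
        return sum;
    }

    /** Alternates evaluation and improvement until the policy stops changing. */
    static int[] solve(double[][][] p, double[][] cost, double beta) {
        int n = cost.length;
        int[] policy = new int[n]; // arbitrary start: action 0 in every state
        while (true) {
            double[] v = evaluate(p, cost, policy, beta);
            boolean changed = false;
            for (int s = 0; s < n; s++) {
                int best = policy[s];
                double bestQ = q(p, cost, v, s, best, beta);
                for (int a = 0; a < cost[s].length; a++) {
                    double qa = q(p, cost, v, s, a, beta);
                    // Switch only on strict improvement so ties cannot cause cycling.
                    if (qa < bestQ - 1e-12) { best = a; bestQ = qa; }
                }
                if (best != policy[s]) { policy[s] = best; changed = true; }
            }
            if (!changed) return policy;
        }
    }

    public static void main(String[] args) {
        // Two states, two actions; action 1 in state 0 moves to the cost-free state 1.
        double[][][] p = {{{1, 0}, {0, 1}}, {{0, 1}, {0, 1}}};
        double[][] cost = {{1, 3}, {5, 0}};
        System.out.println(java.util.Arrays.toString(solve(p, cost, 0.9)));
    }
}
```

The linear solve inside evaluate is the step whose cost grows with the number of states, which is why the description recommends other solvers or the modified variant for large problems.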

Author:
Andres Sarmiento, Germán Riaño - Universidad de Los Andes

Field Summary
protected  long iterations
          Used to store the number of iterations
protected  long processTime
          Used to store process time
 
Fields inherited from class jmarkov.jmdp.solvers.AbstractDiscountedSolver
discountFactor
 
Fields inherited from class jmarkov.jmdp.solvers.Solver
policy, printProcessTime, printValueFunction, problem, solved, valueFunction
 
Constructor Summary
PolicyIterationSolver(DTMDP<S,A> problem, double discountFactor)
          The constructor method exclusively receives a problem of the type InfiniteMDP because this solver is only designed to work on infinite horizon problems.
PolicyIterationSolver(DTMDP<S,A> problem, double discountFactor, boolean setModifiedPolicy)
          The constructor method exclusively receives a problem of the type InfiniteMDP because this solver is only designed to work on infinite horizon problems.
 
Method Summary
 java.lang.String description()
          This method returns a complete verbal description of this element.
 double getIncreasingFactor()
           
 double getInitialIterations()
           
 long getIterations()
           
 long getProcessTime()
           
 java.lang.String label()
          The sub classes must return the Solver name.
 void setIncreasingFactor(double increasingFactor)
          Sets the increasing factor of the maximum iterations of the Modified policy iteration method.
 void setInitialIterations(int initialIterations)
          Sets maximum iterations for the first run of the modified policy iteration.
 void setModifiedPolicy(boolean val)
          Activates the modified policy iteration algorithm.
 Solution<S,A> solve()
          Called to solve the problem.
protected  ValueFunction<S> solveMatrix()
          This method is used by the PolicyIterationSolver to solve the linear system of equations to determine the value functions of each state for a given policy.
protected  ValueFunction<S> solveMatrixModified(DecisionRule<S,A> localDecisionRule)
          This method is used by the PolicyIterationSolver to solve the linear system of equations to determine the value functions of each state for a given policy.
 
Methods inherited from class jmarkov.jmdp.solvers.AbstractDiscountedSolver
future, future, getInterestRate, setDiscountFactor, setInterestRate
 
Methods inherited from class jmarkov.jmdp.solvers.AbstractInfiniteSolver
getDiscreteProblem, getProblem, printSolution
 
Methods inherited from class jmarkov.jmdp.solvers.Solver
getOptimalPolicy, getOptimalValueFunction, getValueFunction, isSolved, printSolution, setPrintProcessTime, setPrintValueFunction, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface jmarkov.basic.JMarkovElement
equals
 

Field Detail

iterations

protected long iterations
Used to store the number of iterations


processTime

protected long processTime
Used to store process time

Constructor Detail

PolicyIterationSolver

public PolicyIterationSolver(DTMDP<S,A> problem,
                             double discountFactor)
The constructor method exclusively receives a problem of the type InfiniteMDP because this solver is only designed to work on infinite horizon problems. This solver solves the discounted objective function problem.

Parameters:
problem - the structure of the problem of type InfiniteMDP
discountFactor - represents how much less a reward is worth when it is received in the next period instead of the present period.
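As a rough illustration of what the discount factor means, the sketch below computes the present value of an amount received some periods ahead. The class and method names are hypothetical, and the relation discountFactor = 1 / (1 + interestRate) is the usual textbook convention, assumed here rather than taken from the jMDP source.

```java
/** Illustrative discounting helpers (hypothetical names, not part of jMDP). */
final class DiscountSketch {

    /** Value today of an amount received periodsAhead periods from now. */
    static double presentValue(double amount, double discountFactor, int periodsAhead) {
        return amount * Math.pow(discountFactor, periodsAhead);
    }

    /** Textbook conversion from a per-period interest rate to a discount factor (assumed). */
    static double discountFactorFromInterestRate(double interestRate) {
        return 1.0 / (1.0 + interestRate);
    }
}
```

With a discount factor of 0.9, for example, an amount of 100 received one period ahead is valued at about 90 today.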

PolicyIterationSolver

public PolicyIterationSolver(DTMDP<S,A> problem,
                             double discountFactor,
                             boolean setModifiedPolicy)
The constructor method exclusively receives a problem of the type InfiniteMDP because this solver is only designed to work on infinite horizon problems. This solver solves the discounted objective function problem.

Parameters:
problem - the structure of the problem of type InfiniteMDP
discountFactor - represents how much less a reward is worth when it is received in the next period instead of the present period.
setModifiedPolicy - true if the modified policy iteration is to be used.
Method Detail

getIncreasingFactor

public double getIncreasingFactor()
Returns:
increasing factor of the maximum iterations.

setIncreasingFactor

public void setIncreasingFactor(double increasingFactor)
Sets the increasing factor of the maximum iterations of the modified policy iteration method. The first iterations are a rough approximation to the real value functions and need not be exhaustive, but the last iterations must refine the value functions to get better precision. The increasing factor determines how the maximum number of value-function iterations grows from one policy iteration to the next. Faster growth is more precise but computationally more expensive.

Parameters:
increasingFactor - greater than 1. Determines max iterations growth.
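One way to picture the growing iteration cap is the sketch below. It is an assumption-laden illustration, not the jMDP code: the geometric schedule (initial cap times increasingFactor raised to the policy-iteration index) and the names ModifiedEvaluationSketch, maxSweeps and approximate are all hypothetical. The value function of a fixed policy is approximated by repeated sweeps of v = c + beta * P v instead of an exact linear solve.

```java
/** Illustrative approximate policy evaluation with a growing sweep cap (not jMDP code). */
final class ModifiedEvaluationSketch {

    /** Hypothetical schedule: cap for outer iteration k is initial * factor^k, rounded up. */
    static int maxSweeps(int initialIterations, double increasingFactor, int outerIteration) {
        return (int) Math.ceil(initialIterations * Math.pow(increasingFactor, outerIteration));
    }

    /** Applies v <- c + beta * P v for a fixed policy, sweeps times, starting from v0. */
    static double[] approximate(double[][] p, double[] c, double beta, double[] v0, int sweeps) {
        double[] v = v0.clone();
        for (int k = 0; k < sweeps; k++) {
            double[] next = new double[v.length];
            for (int s = 0; s < v.length; s++) {
                double sum = c[s];
                for (int t = 0; t < v.length; t++) sum += beta * p[s][t] * v[t];
                next[s] = sum;
            }
            v = next;
        }
        return v;
    }
}
```

Under this assumed schedule, early policy iterations spend few sweeps on a policy that is likely to be replaced, while later ones spend more to approach the exact value function.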

getInitialIterations

public double getInitialIterations()
Returns:
initial maximum iterations of the modified policy iteration algorithm.

setInitialIterations

public void setInitialIterations(int initialIterations)
Sets maximum iterations for the first run of the modified policy iteration.

Parameters:
initialIterations - maximum iterations for the first run of the modified policy iteration.

solve

public Solution<S,A> solve()
                                                 throws SolverException
Description copied from class: Solver
Called to solve the problem. This method MUST write the local variables policy and valueFunction.

Specified by:
solve in class Solver<S extends State,A extends Action>
Returns:
The solution object that contains the policy and value function.
Throws:
SolverException - This exception is thrown if the solver cannot find a solution for some reason.

solveMatrix

protected ValueFunction<S> solveMatrix()
                                              throws SolverException
This method is used by the PolicyIterationSolver to solve the linear system of equations to determine the value functions of each state for a given policy.

Returns:
a DenseVector (type defined in the JMP package documentation) with the value functions for each state. The indices of the states are the same as those determined in the localStates ArrayList.
Throws:
SolverException

solveMatrixModified

protected ValueFunction<S> solveMatrixModified(DecisionRule<S,A> localDecisionRule)
This method is used by the PolicyIterationSolver to solve the linear system of equations to determine the value functions of each state for a given policy.

Returns:
a DenseVector (type defined in the JMP package documentation) with the value functions for each state. The indices of the states are the same as those determined in the localStates ArrayList, which is declared as static.

setModifiedPolicy

public void setModifiedPolicy(boolean val)
Activates the modified policy iteration algorithm.

Parameters:
val - True if the modified policy iteration is to be used.

description

public java.lang.String description()
Description copied from interface: JMarkovElement
This method returns a complete verbal description of this element. This description may contain multiple text rows.

Specified by:
description in interface JMarkovElement
Overrides:
description in class Solver<S extends State,A extends Action>
Returns:
A String describing this element.
See Also:
JMarkovElement.label()

label

public java.lang.String label()
Description copied from class: Solver
The sub classes must return the Solver name.

Specified by:
label in interface JMarkovElement
Specified by:
label in class Solver<S extends State,A extends Action>
Returns:
A String label.
See Also:
Solver.toString()

getProcessTime

public final long getProcessTime()
Specified by:
getProcessTime in class Solver<S extends State,A extends Action>
Returns:
Returns the processTime.

getIterations

public final long getIterations()
Specified by:
getIterations in class AbstractInfiniteSolver<S extends State,A extends Action>
Returns:
Returns the iterations.