jmdp.solvers
Class ValueIterationSolver<S extends State,A extends Action>

java.lang.Object
  extended by jmdp.solvers.Solver<S,A>
      extended by jmdp.solvers.AbstractInfiniteSolver<S,A>
          extended by jmdp.solvers.AbstractDiscountedSolver<S,A>
              extended by jmdp.solvers.ValueIterationSolver<S,A>

public class ValueIterationSolver<S extends State,A extends Action>
extends AbstractDiscountedSolver<S,A>

This class is one of the default solvers included in the jmdp package. It extends Solver (through AbstractDiscountedSolver) and should only be used on INFINITE horizon problems. Given a problem structure, it returns an optimal policy.

Author:
Andres Sarmiento, German Riano - Universidad de Los Andes

Field Summary
protected  boolean average
           
protected  A bestAction
           
protected  double epsilon
           
protected  boolean errorBounds
           
protected  boolean gaussSeidel
           
protected  double initVal
           
protected  int iterations
           
protected  boolean modifiedAverage
           
protected  long processTime
           
 
Fields inherited from class jmdp.solvers.AbstractDiscountedSolver
discountFactor, interestRate
 
Fields inherited from class jmdp.solvers.Solver
policy, printProcessTime, printValueFunction, problem, solved, valueFunction
 
Constructor Summary
ValueIterationSolver(CTMDP<S,A> problem, double discountFactor)
          This constructor method exclusively receives a problem of the type CTMDP because this solver is only designed to work on infinite horizon problems.
ValueIterationSolver(DTMDP<S,A> problem, double interestRate)
          This constructor method exclusively receives a problem of the type DTMDP because this solver is only designed to work on infinite horizon problems.
 
Method Summary
protected  double bestAction(S i)
          Sets the best action to take in state i, in the field bestAction.
protected  double computeNoErrorBounds()
           
protected  double computeWithErrorBounds()
           
protected  double future(S i, A a, double discountF)
           
protected  double future(S i, A a, double discountF, ValueFunction<S> vf)
          Expected value of valueFunction for the current state and a specified action.
 int getIterations()
           
 long getProcessTime()
           
protected  void init()
          Initializes the valueFunction for all the states.
 void setEpsilon(double epsilon)
          Value Iteration is a solver method that is theoretically convergent only after infinite iterations.
 void setGaussSeidel(boolean val)
          The GaussSeidel modification of the ValueIteration method is a change that is guaranteed to perform at least as well as the method without the modification.
 void setInitVal(double val)
          All the states have an initial valueFunction that by default is 1.
 Solution<S,A> solve()
          Called to solve the problem.
 java.lang.String toString()
          Subclasses must return the Solver name.
 void useErrorBounds(boolean val)
          The ErrorBounds modification to the ValueIteration method is a change that is guaranteed to perform at least as well as the method without the modification.
 
Methods inherited from class jmdp.solvers.AbstractDiscountedSolver
getInterestRate, setDiscountFactor, setInterestRate
 
Methods inherited from class jmdp.solvers.AbstractInfiniteSolver
getDiscreteProblem, getProblem, printSolution
 
Methods inherited from class jmdp.solvers.Solver
getOptimalPolicy, getOptimalValueFunction, getValueFunction, isSolved, printSolution, setPrintProcessTime, setPrintValueFunction
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

epsilon

protected double epsilon

initVal

protected double initVal

gaussSeidel

protected boolean gaussSeidel

errorBounds

protected boolean errorBounds

average

protected boolean average

processTime

protected long processTime

iterations

protected int iterations

bestAction

protected A bestAction

modifiedAverage

protected boolean modifiedAverage
Constructor Detail

ValueIterationSolver

public ValueIterationSolver(DTMDP<S,A> problem,
                            double interestRate)
This constructor method exclusively receives a problem of the type DTMDP because this solver is only designed to work on infinite horizon problems. At present, this solver only solves the discounted objective function problem. Other objective functions are still under development.

Parameters:
problem - the structure of the problem of type DTMDP
interestRate - the interest rate per period. It determines how much less a reward is worth when it is received in the next period instead of the present one.

ValueIterationSolver

public ValueIterationSolver(CTMDP<S,A> problem,
                            double discountFactor)
This constructor method exclusively receives a problem of the type CTMDP because this solver is only designed to work on infinite horizon problems. At present, this solver only solves the discounted objective function problem. Other objective functions are still under development.

Parameters:
problem - the structure of the problem of type CTMDP
discountFactor - the discount factor. It determines how much less a reward is worth when it is received in the next period instead of the present one.
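
The two constructors take an interest rate and a discount factor respectively. The page does not state how jmdp relates the two, so the following is only a sketch of the usual per-period compounding convention, in which the discount factor equals 1 / (1 + interest rate). The class DiscountConversionSketch and its methods are hypothetical and are not part of jmdp.

```java
// Hypothetical helper, not part of jmdp. Assumes per-period compounding.
class DiscountConversionSketch {

    // Value today of one unit of reward received one period from now.
    static double toDiscountFactor(double interestRate) {
        if (interestRate <= -1.0) {
            throw new IllegalArgumentException("interestRate must be greater than -1");
        }
        return 1.0 / (1.0 + interestRate);
    }

    // Inverse conversion: the interest rate implied by a discount factor.
    static double toInterestRate(double discountFactor) {
        if (discountFactor <= 0.0) {
            throw new IllegalArgumentException("discountFactor must be positive");
        }
        return 1.0 / discountFactor - 1.0;
    }

    public static void main(String[] args) {
        double alpha = toDiscountFactor(0.25);
        System.out.println("discount factor for a 25% interest rate: " + alpha);
        System.out.println("interest rate recovered from it: " + toInterestRate(alpha));
    }
}
```

A continuous-time model may instead discount with an exponential term, so the conversion above should be checked against the jmdp source before relying on it.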
Method Detail

setEpsilon

public void setEpsilon(double epsilon)
Value Iteration is a solver method that is theoretically convergent only after infinite iterations. Because this is impossible in practice, the solver stops when the difference between successive iterations is at most epsilon. The smaller epsilon is, the closer the result will be to the actual optimum, but the longer the problem will take to solve. The default value of epsilon is 0.0001.

Parameters:
epsilon - maximum difference between iterations.
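
The stopping rule described above can be illustrated with a minimal, self-contained sketch. It does not use the jmdp classes: the array-based model (reward[s][a] and prob[s][a][t]), the class name ValueIterationSketch, and the choice to maximize rewards are all assumptions made for illustration.

```java
// Hypothetical stand-alone sketch, not the jmdp implementation.
class ValueIterationSketch {

    // reward[s][a]: immediate reward of action a in state s.
    // prob[s][a][t]: probability of moving from s to t under action a.
    // Iterates until the largest change in any state is at most epsilon.
    static double[] solve(double[][] reward, double[][][] prob,
                          double alpha, double epsilon, double initVal) {
        int n = reward.length;
        double[] v = new double[n];
        java.util.Arrays.fill(v, initVal);
        double diff;
        do {
            double[] next = new double[n];
            diff = 0.0;
            for (int s = 0; s < n; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < reward[s].length; a++) {
                    double q = reward[s][a];
                    for (int t = 0; t < n; t++) {
                        q += alpha * prob[s][a][t] * v[t];
                    }
                    best = Math.max(best, q);
                }
                next[s] = best;
                diff = Math.max(diff, Math.abs(best - v[s]));
            }
            v = next;
        } while (diff > epsilon);
        return v;
    }

    public static void main(String[] args) {
        // State 0: stay (reward 1) or move to state 1 (reward 0). State 1: stay (reward 3).
        double[][] reward = {{1.0, 0.0}, {3.0}};
        double[][][] prob = {{{1.0, 0.0}, {0.0, 1.0}}, {{0.0, 1.0}}};
        double[] v = solve(reward, prob, 0.5, 1e-4, 1.0);
        System.out.println(java.util.Arrays.toString(v));
    }
}
```

For this small model the exact optimum is 3 in state 0 and 6 in state 1, so shrinking epsilon moves the printed values closer to those numbers at the cost of more sweeps.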

setInitVal

public void setInitVal(double val)
All the states have an initial valueFunction that by default is 1. The solver will converge faster if the initial valueFunction is closer to the optimum. It is a healthy practice to set the initial value to a value that is a bad estimate of the optimum.

Parameters:
val - initial valueFunction for all states.
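
The effect of the initial value on convergence speed can be seen in a deliberately tiny sketch. It assumes a single state with a fixed reward r per period and discount factor alpha, so each sweep is v = r + alpha * v and the optimum is r / (1 - alpha). The class InitialValueSketch is hypothetical and unrelated to the jmdp code.

```java
// Hypothetical one-state illustration, not part of jmdp.
class InitialValueSketch {

    // Counts the sweeps of v <- r + alpha * v needed until the change
    // between two successive sweeps is at most epsilon.
    static int sweepsUntilConverged(double r, double alpha,
                                    double initVal, double epsilon) {
        double v = initVal;
        int sweeps = 0;
        double diff;
        do {
            double next = r + alpha * v;
            diff = Math.abs(next - v);
            v = next;
            sweeps++;
        } while (diff > epsilon);
        return sweeps;
    }

    public static void main(String[] args) {
        // With r = 1 and alpha = 0.9 the optimum is 10.
        System.out.println("from 1.0: " + sweepsUntilConverged(1.0, 0.9, 1.0, 1e-4));
        System.out.println("from 9.9: " + sweepsUntilConverged(1.0, 0.9, 9.9, 1e-4));
    }
}
```

Starting near the optimum needs noticeably fewer sweeps here, because the first change is already small and shrinks by the factor alpha on every sweep.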

setGaussSeidel

public void setGaussSeidel(boolean val)
The GaussSeidel modification of the ValueIteration method is a change that is guaranteed to perform at least as well as the method without the modification. In many problems, especially those with many states, the modification can yield a significant improvement. By default it is set to true. It provides no significant improvement if used jointly with the ErrorBounds modification.

Parameters:
val - sets whether or not the GaussSeidel modification will be used.
See Also:
useErrorBounds(boolean)
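
As a rough illustration of the idea, a Gauss-Seidel sweep reuses values that were already updated earlier in the same sweep, while the plain method reads only the values from the previous sweep. The sketch below is an assumption about the general technique, not a copy of the jmdp code; the array-based model, the class GaussSeidelSketch, and the maximization of rewards are hypothetical.

```java
// Hypothetical stand-alone sketch, not the jmdp implementation.
class GaussSeidelSketch {

    // Runs value iteration on v in place and returns the number of sweeps.
    // reward[s][a] and prob[s][a][t] describe the model as plain arrays.
    static int solve(double[][] reward, double[][][] prob, double alpha,
                     double epsilon, boolean gaussSeidel, double[] v) {
        int n = v.length;
        int sweeps = 0;
        double diff;
        do {
            // Gauss-Seidel reads the array it is writing to; the plain
            // method reads a frozen copy taken before the sweep starts.
            double[] source = gaussSeidel ? v : v.clone();
            diff = 0.0;
            for (int s = 0; s < n; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < reward[s].length; a++) {
                    double q = reward[s][a];
                    for (int t = 0; t < n; t++) {
                        q += alpha * prob[s][a][t] * source[t];
                    }
                    best = Math.max(best, q);
                }
                diff = Math.max(diff, Math.abs(best - v[s]));
                v[s] = best;
            }
            sweeps++;
        } while (diff > epsilon);
        return sweeps;
    }

    public static void main(String[] args) {
        // Two states that alternate: 0 -> 1 (reward 1) and 1 -> 0 (reward 0).
        double[][] reward = {{1.0}, {0.0}};
        double[][][] prob = {{{0.0, 1.0}}, {{1.0, 0.0}}};
        int plain = solve(reward, prob, 0.9, 1e-4, false, new double[2]);
        int gs = solve(reward, prob, 0.9, 1e-4, true, new double[2]);
        System.out.println("plain sweeps: " + plain + ", Gauss-Seidel sweeps: " + gs);
    }
}
```

In this cyclic example the in-place update lets information travel around the whole cycle in one sweep, so the Gauss-Seidel variant stops after roughly half as many sweeps.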

useErrorBounds

public void useErrorBounds(boolean val)
The ErrorBounds modification to the ValueIteration method is a change that is guaranteed to perform at least as well as the method without the modification. In many problems, especially those with many states, the modification can yield a significant improvement. This method modifies the iterations and the stopping criterion. In each iteration it builds upper and lower bounds for the optimum and stops when the bounds are at most delta apart, regardless of where the actual valueFunction is. The bounds converge faster than the actual valueFunction. By default it is set to false.

Parameters:
val - sets whether or not to use the ErrorBounds modification.
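
The page does not say which bounds jmdp builds. One standard choice for discounted problems is the MacQueen-style bounds, sketched below under that assumption: if lo and hi are the smallest and largest change of any state in a sweep, the optimum lies between next + alpha/(1-alpha)*lo and next + alpha/(1-alpha)*hi. The class ErrorBoundsSketch, the array-based model, and the maximization of rewards are hypothetical.

```java
// Hypothetical stand-alone sketch, not the jmdp implementation.
class ErrorBoundsSketch {

    // Iterates until the upper and lower bounds on the optimum are at most
    // delta apart, then returns the midpoint of the two bounds.
    static double[] solve(double[][] reward, double[][][] prob,
                          double alpha, double delta) {
        int n = reward.length;
        double[] v = new double[n];
        double scale = alpha / (1.0 - alpha);
        while (true) {
            double[] next = new double[n];
            double lo = Double.POSITIVE_INFINITY;
            double hi = Double.NEGATIVE_INFINITY;
            for (int s = 0; s < n; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < reward[s].length; a++) {
                    double q = reward[s][a];
                    for (int t = 0; t < n; t++) {
                        q += alpha * prob[s][a][t] * v[t];
                    }
                    best = Math.max(best, q);
                }
                next[s] = best;
                lo = Math.min(lo, best - v[s]);
                hi = Math.max(hi, best - v[s]);
            }
            if (scale * (hi - lo) <= delta) {
                // Bounds are close enough: report their midpoint.
                for (int s = 0; s < n; s++) {
                    next[s] += scale * (lo + hi) / 2.0;
                }
                return next;
            }
            v = next;
        }
    }

    public static void main(String[] args) {
        // State 0: stay (reward 1) or move to state 1 (reward 0). State 1: stay (reward 3).
        double[][] reward = {{1.0, 0.0}, {3.0}};
        double[][][] prob = {{{1.0, 0.0}, {0.0, 1.0}}, {{0.0, 1.0}}};
        System.out.println(java.util.Arrays.toString(solve(reward, prob, 0.5, 1e-6)));
    }
}
```

Returning the midpoint keeps the reported value within delta / 2 of the optimum, even when the plain iterates themselves are still far from it.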

solve

public Solution<S,A> solve()
Description copied from class: Solver
Called to solve the problem. This method MUST write the fields policy and valueFunction.

Specified by:
solve in class Solver<S extends State,A extends Action>
Returns:
a Solution with the optimal policy and value function.

init

protected void init()
Initializes the valueFunction for all the states.


computeNoErrorBounds

protected double computeNoErrorBounds()

computeWithErrorBounds

protected double computeWithErrorBounds()

future

protected double future(S i,
                        A a,
                        double discountF,
                        ValueFunction<S> vf)
Expected value of valueFunction for the current state and a specified action.

Parameters:
discountF - the rate for discounting from one period to the next, that is, how much less one unit of reward is worth when it is received in the next period instead of the present one.
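
A plausible reading of this method is a discounted expectation: the value function of each reachable state is weighted by its transition probability, and the sum is multiplied by discountF. The sketch below assumes that reading and replaces the state, action, and ValueFunction types with plain arrays; the class FutureValueSketch is hypothetical.

```java
// Hypothetical stand-alone sketch, not the jmdp implementation.
class FutureValueSketch {

    // transitionProb[j]: probability of reaching state j from the current
    // state under the chosen action. valueFunction[j]: current value of j.
    static double future(double[] transitionProb, double[] valueFunction,
                         double discountF) {
        double expected = 0.0;
        for (int j = 0; j < transitionProb.length; j++) {
            expected += transitionProb[j] * valueFunction[j];
        }
        return discountF * expected;
    }

    public static void main(String[] args) {
        double[] p = {0.25, 0.75};
        double[] v = {4.0, 8.0};
        System.out.println(future(p, v, 0.5));
    }
}
```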

future

protected double future(S i,
                        A a,
                        double discountF)

bestAction

protected double bestAction(S i)
Sets the best action to take in state i, in the field bestAction.

Parameters:
i - state for which the best action is being determined
Returns:
the new ValueFunction for this state.
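
The pattern described here, returning the new value while recording the chosen action as a side effect, might look like the following sketch. It assumes rewards are maximized (flip the comparison for a cost-minimizing formulation) and uses action indices and plain arrays instead of the jmdp types; the class BestActionSketch is hypothetical.

```java
// Hypothetical stand-alone sketch, not the jmdp implementation.
class BestActionSketch {

    // Index of the best action found by the last call to bestAction(...).
    int bestAction = -1;

    // reward[a]: immediate reward of action a in the current state.
    // prob[a][t]: probability of reaching state t under action a.
    // Returns the new value of the current state and stores the maximizing
    // action index in the field bestAction.
    double bestAction(double[] reward, double[][] prob, double[] v, double alpha) {
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < reward.length; a++) {
            double q = reward[a];
            for (int t = 0; t < v.length; t++) {
                q += alpha * prob[a][t] * v[t];
            }
            if (q > bestValue) {
                bestValue = q;
                bestAction = a;
            }
        }
        return bestValue;
    }

    public static void main(String[] args) {
        BestActionSketch sketch = new BestActionSketch();
        double[] reward = {1.0, 0.0};
        double[][] prob = {{1.0, 0.0}, {0.0, 1.0}};
        double[] v = {1.5, 4.5};
        double value = sketch.bestAction(reward, prob, v, 0.5);
        System.out.println("value " + value + ", action " + sketch.bestAction);
    }
}
```

Because the chosen action lives in an instance field, a caller has to read it right after the call, before the method is invoked for another state.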

getProcessTime

public final long getProcessTime()
Specified by:
getProcessTime in class Solver<S extends State,A extends Action>
Returns:
Returns the processTime.

getIterations

public final int getIterations()
Specified by:
getIterations in class AbstractInfiniteSolver<S extends State,A extends Action>
Returns:
Returns the iterations.

toString

public java.lang.String toString()
Description copied from class: Solver
Subclasses must return the Solver name.

Specified by:
toString in class Solver<S extends State,A extends Action>
See Also:
Object.toString()