Creating and Working with Objects

As we said in section [*], each plugin is a blueprint from which models are created, much like a class in the object-oriented paradigm. Following this analogy, we will call each model created from a specific plugin an object. MTK allows you to create and work with several objects at the same time, whether or not they come from the same plugin. This is very useful when we want to test the same type of model, with different parameters, on the same problem.

That said, it is time to create our HMM model. In MTK, any object can be created with the new command:

<object_name> = new <plugin_name>( <parameters> )

Example:

MTK:3> game_model = new hmm( 2, 2 )
hmm object was successfully created with name game_model
Parameters were initialized with random values.
Model error tolerance is epsilon = 0.000010

We have chosen to create an HMM with two states and two symbols for obvious reasons: there are only two coins, so we will use one state to represent each; and there are only two possible outcomes, heads or tails, so we need only two symbols.

After creating our HMM model, we need to load into MTK the observations collected, stored in the trace.txt file, which we will use to adjust the parameters of our model. MTK has two plugins that represent an array of elements and can be used to work with observation samples: the intvalue plugin and the floatvalue plugin. The first handles only integer-valued data, and the second, floating-point data. Since our observations are composed only of $0$'s and $1$'s, we will use the intvalue plugin. That said, let's create and load our trace file into MTK.

Example:

MTK:4> trace = new intvalue( )
MTK:5> trace.load( "trace.txt" )

A plugin usually implements some display function, which shows, on screen, some of the object's attributes. Specifically, the intvalue plugin has a display option called stats, which calculates and shows some statistics of the object.

Example:

MTK:6> trace.display( stats )
'stats' at 'trace'

Minimum:  0
Maximum:  1
Mean:     0.48
Variance: 0.24985
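
These statistics can be reproduced outside MTK. The sketch below is plain Python, not MTK code, and the choice of the sample variance (dividing by $n-1$) is an assumption about MTK's convention:

```python
# Hypothetical re-computation of the intvalue 'stats' display in plain
# Python. Whether MTK divides by n or by n - 1 for the variance is an
# assumption here; the sample variance (n - 1) is used below.
def trace_stats(obs):
    n = len(obs)
    mean = sum(obs) / n
    variance = sum((x - mean) ** 2 for x in obs) / (n - 1)
    return {"min": min(obs), "max": max(obs),
            "mean": mean, "variance": variance}

toy = [0, 1, 1, 0, 1, 0, 0, 0]
print(trace_stats(toy))  # mean is 0.375 for this toy trace
```

Note that for a binary trace of $1000$ observations with mean $0.48$, the sample variance is $0.48 \cdot 0.52 \cdot 1000/999 \approx 0.24985$, which is consistent with the display above.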

We are now ready to adjust our HMM model parameters. This can be accomplished with the training command of the hmm plugin. It estimates the model parameters by maximizing the likelihood of a sample, given as input, using the Baum-Welch algorithm [em_tut]. To obtain information on this, or any other method of any plugin, you can use the help command:

help <plugin_name>.<method_name>

Example:

MTK:7> help hmm.training
Estimates the model parameters by maximizing the likelihood of a
given observation sample. In case this sample is composed of
incomplete data (observations only) the Baum-Welch algorithm is
used. Multiple observation samples can also be used.
    Usages: training( <it>, [<thr>], <object1> [, ... ] )
            training( <object1>, <object2> )
    Where:
        <it>         - number of iterations to perform in the
                       Baum-Welch algorithm.
        <thr>        - log-likelihood threshold to stop training.
        <object1>    - the object containing the observations.
        <object2>    - the object containing the states path.

As is well known, the reestimation equations used by the Baum-Welch algorithm yield values of the HMM parameters that correspond to a local maximum of the likelihood function [rabiner], but they do not guarantee that this maximum is also the global maximum. Therefore, the initial estimates of these parameters, chosen by the user, usually play an important role in the estimation. By default, MTK initializes the parameters of an hmm object with random values. Hence, in order to try to get a better estimate, we will change these initial values before training our model.
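
To make the reestimation concrete, the function below implements one Baum-Welch pass (scaled forward-backward plus the standard reestimation formulas) for a discrete HMM. It is a didactic plain-Python sketch of the algorithm that the training command iterates, not MTK's implementation; the names pi, A and B mirror the hmm attributes.

```python
import math

# One Baum-Welch reestimation pass for a discrete HMM (didactic sketch,
# not MTK code). pi: initial state distribution, A: state transitions,
# B: symbol emissions, obs: list of symbol indices.
def baum_welch_step(pi, A, B, obs):
    N, T = len(pi), len(obs)
    # Scaled forward pass: scale[t] keeps alpha from underflowing.
    alpha = [[0.0] * N for _ in range(T)]
    scale = [0.0] * T
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    scale[0] = sum(alpha[0])
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = B[j][obs[t]] * sum(alpha[t-1][i] * A[i][j]
                                             for i in range(N))
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    # Scaled backward pass, reusing the forward scale factors.
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j]
                             for j in range(N)) / scale[t+1]
    # gamma[t][i] = P(state i at time t | obs).
    gamma = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(N)]
        s = sum(g)
        gamma.append([x / s for x in g])
    # xi_sum[i][j] = expected number of i -> j transitions.
    xi_sum = [[0.0] * N for _ in range(N)]
    for t in range(T - 1):
        x = [[alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j]
              for j in range(N)] for i in range(N)]
        s = sum(map(sum, x))
        for i in range(N):
            for j in range(N):
                xi_sum[i][j] += x[i][j] / s
    # Reestimated parameters (log-likelihood is that of the INPUT model).
    new_pi = gamma[0][:]
    new_A = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(N)]
             for i in range(N)]
    M = len(B[0])
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(M)]
             for i in range(N)]
    log_likelihood = sum(math.log(c) for c in scale)
    return new_pi, new_A, new_B, log_likelihood
```

A full training run repeats this step, stopping after a fixed number of iterations or once the log-likelihood improvement falls below a threshold; each pass is guaranteed not to decrease the likelihood.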

Every attribute of every plugin can be edited by the user at any time. What we want to do is change the values of the hmm attributes pi, A and B, which correspond, respectively, to the initial state distribution, the state transition probability matrix, and the symbol emission probability matrix. We know a bit about the symbol emission probabilities. Recall that, despite not telling us the exact bias of each coin, the dealer told us that it is significant. The problem is: what does he mean by "significant"? Let's take a guess and say each coin shows its favored face with probability $0.6$. Since we don't know anything about the way the parrot chooses the coins, let's assume that he is totally unbiased. With these assumptions made, it is time to change the desired attributes of our model. This can normally be accomplished using the following syntax:

<object_name>.<attribute_name>[<index>] = <value>

Example:

MTK:8> game_model.pi[0] = 0.5; game_model.pi[1] = 0.5;

MTK:9> game_model.A[0][0] = 0.5; game_model.A[0][1] = 0.5;
       game_model.A[1][0] = 0.5; game_model.A[1][1] = 0.5;

MTK:10> game_model.B[0][0] = 0.6; game_model.B[0][1] = 0.4;
        game_model.B[1][0] = 0.4; game_model.B[1][1] = 0.6;

After setting the initial parameter values, and reading the help information above, on the training method, estimating the parameters of our model should be straightforward. We will use $1000$ iterations.

Example:

MTK:11> game_model.training( 1000, trace )
#  iteration       log-likelihood           likelihood
           0    -6.9314718056e+02    9.3326361852e-302
           1    -6.9237923384e+02    2.0114968500e-301
           2    -6.9233799381e+02    2.0961853190e-301

         ...          ...                   ...

         998    -6.9073270968e+02    1.0437481465e-300
         999    -6.9073257832e+02    1.0438852618e-300
        1000    -6.9073244705e+02    1.0440222957e-300

Note that the log-likelihood at iteration $0$ is exactly $T \ln(1/2)$, where $T$ is the number of observations: with a uniform $A$ and uniform $\pi$, our guessed $B$ makes heads and tails equally likely at every round, whatever the history.

Let's take a look at the estimated parameters. This can easily be done using the hmm's display function:

Example:

MTK:12> game_model.display( all )
'all' at 'game_model'

Number of states:  2
Number of symbols: 2

Initial state distribution:
 [ 0.00000e+00 1.00000e+00 ]

State transition probabilities:
 [ 4.72127e-01 5.27873e-01 ]
 [ 5.54976e-01 4.45024e-01 ]

Symbol observation probabilities:
 [ 8.86456e-01 1.13544e-01 ]
 [ 1.33456e-01 8.66544e-01 ]

Now that we have our model at hand, it is a good idea to save it in a file, so we can use it in the future. This can be done with the save command, available to most plugins:

Example:

MTK:13> game_model.save( "game_model_parameters.txt", "all" )
Hmm was successfully saved in file game_model_parameters.txt.

So, it's finally time to make some money! To accomplish this, we will try to forecast the next outcome of the game, given the last $3$, and, observing the forecasted values, decide whether it is time to bet our money or to hold off a bit more. Fortunately, the hmm plugin has a function, called forecast, that does just that! Given some previous sample, it calculates the probability of each symbol at each future time step. Thus, as a result, we will have the probability, based on our model, of showing heads or tails in the next round.

Say the most recent $3$ outcomes were all heads, and are stored in the recent_outcome.txt file. First, we load them into MTK, using a new intvalue object, and then use the forecast function:

Example:

MTK:14> recent_obs = new intvalue()
MTK:15> recent_obs.load( "recent_outcome.txt" )
Intvalue was successfully loaded.

MTK:16> game_model.forecast( 1, recent_obs )
(Time Step 1): distribution: [ 4.9768404831e-01 5.0231595169e-01 ];
               most probable symbol: 1;
               entropy (in bits): 9.9998452377e-01

As we can see, given these most recent $3$ observations, there is a slightly bigger chance of showing tails than heads in the next round. If you are risk tolerant, then it might be time to bet some money!
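
A one-step forecast like this can be understood, and reproduced outside MTK, with the forward (filtering) recursion: condition the state distribution on the observed history, advance it one step through $A$, and marginalize over the emission probabilities. The sketch below is plain Python under that interpretation, not MTK's code:

```python
# One-step-ahead symbol forecast for a discrete HMM (plain-Python
# sketch, outside MTK). pi, A, B play the roles of the hmm attributes.
def forecast_next(pi, A, B, obs):
    N, M = len(pi), len(B[0])
    # Filtering: P(state | observations so far), renormalized each step.
    state = [pi[i] * B[i][obs[0]] for i in range(N)]
    s = sum(state)
    state = [x / s for x in state]
    for o in obs[1:]:
        state = [B[j][o] * sum(state[i] * A[i][j] for i in range(N))
                 for j in range(N)]
        s = sum(state)
        state = [x / s for x in state]
    # Prediction: one transition step, then marginalize the emissions.
    nxt = [sum(state[i] * A[i][j] for i in range(N)) for j in range(N)]
    return [sum(nxt[j] * B[j][k] for j in range(N)) for k in range(M)]

# With the parameters estimated above (rounded as displayed) and three
# observed heads, the result is close to MTK's forecast output.
pi = [0.0, 1.0]
A = [[0.472127, 0.527873], [0.554976, 0.445024]]
B = [[0.886456, 0.113544], [0.133456, 0.866544]]
print(forecast_next(pi, A, B, [0, 0, 0]))  # roughly [0.4977, 0.5023]
```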

The hmm plugin also has another method that can be used for forecasting, called symb_sum_dist, which calculates the probability distribution of the sum of the symbol values emitted in a time window of size $F$, given some previous history. With it, we can calculate the probability of, for example, showing one tail, two tails, etc., in the next $5$ rounds, given the $3$ most recent outcomes.

Example:

MTK:17> game_model.symb_sum_dist( recent_obs, 5 )
# sum(Obs) P[sum(Obs) after 5 time steps]
     0            3.0514042169e-02
     1            1.6531242933e-01
     2            3.3398035017e-01
     3            3.1275828032e-01
     4            1.3558484787e-01
     5            2.1850050151e-02

# estimated average of sum(Obs): 2.4231376128e+00

By looking at the result above, we notice that, over the next five consecutive rounds, the overall chances of the casino winning are better than the chances of a player winning.
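
A distribution like this can be reproduced with a small dynamic program: filter the state distribution from the history, as in one-step forecasting, then propagate a joint distribution over (current state, running sum of symbol values) for $F$ steps. The sketch below is plain Python, not MTK code, and assumes each symbol's value equals its index, as in our $0$/$1$ trace:

```python
# Distribution of the sum of symbol values over the next F steps of a
# discrete HMM (plain-Python sketch, outside MTK). Assumes the value of
# each symbol is its index.
def symb_sum_dist(pi, A, B, obs, F):
    N, M = len(pi), len(B[0])
    # Filter P(state | history), renormalizing at each step.
    state = [pi[i] * B[i][obs[0]] for i in range(N)]
    s = sum(state)
    state = [x / s for x in state]
    for o in obs[1:]:
        state = [B[j][o] * sum(state[i] * A[i][j] for i in range(N))
                 for j in range(N)]
        s = sum(state)
        state = [x / s for x in state]
    # joint[j][v] = P(in state j, symbols emitted so far sum to v).
    top = F * (M - 1)
    joint = [[state[j]] + [0.0] * top for j in range(N)]
    for _ in range(F):
        new = [[0.0] * (top + 1) for _ in range(N)]
        for i in range(N):
            for v, p in enumerate(joint[i]):
                if p == 0.0:
                    continue
                for j in range(N):
                    for k in range(M):
                        if v + k <= top:
                            new[j][v + k] += p * A[i][j] * B[j][k]
        joint = new
    return [sum(joint[j][v] for j in range(N)) for v in range(top + 1)]
```

With the parameters estimated above and three observed heads, this gives a first entry of about $0.0305$, in line with MTK's output, and summing $v \cdot P[v]$ over the distribution reproduces the printed average of about $2.423$.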

This concludes our short tutorial. By now, you should feel more comfortable using MTK and exploring its features. In the next two sections, we describe in detail all of MTK's main commands and every available plugin.

Guilherme Dutra Gonzaga Jaime 2010-10-27