Markov Chains in Base R

markov-chains
stats
weather
Author

Kent Orr

Published

December 11, 2023

Markov Chains offer a way to understand sequential data in which the future state depends only on the current state, not on the sequence of events that preceded it. This concept, known as the Markov property, was originally applied to text, predicting which letter will follow a given letter. The idea is simple: given a state, can you predict the next state, and the state after that? If you are given the letter “e”, what is the probability that the next letter will be “y”? And two letters from now, what is that probability?

In this blog post, I’ll dive into the basics of Markov Chains and demonstrate their application using real-world weather data. The data for our exploration is sourced from the National Centers for Environmental Information (NCEI) Climate Data Online (CDO) website. You can grab the specific file I pulled to follow along here.

library(data.table)

x <- fread(here::here("posts/markov-chains/weather.csv"))

x[,desc := fcase(SNOW > 0, "snow"
                 , PRCP > 0.1, "rain"
                 , default = "clear")]
x = x[NAME == "ATHENS 6.0 NW, OH US"]
x$desc[1:5]
[1] "rain"  "rain"  "clear" "clear" "clear"

Our focus is on the weather conditions in Athens, Ohio. We categorize each day’s weather as ‘clear’, ‘rain’, or ‘snow’, based on precipitation (PRCP) and snowfall (SNOW) measurements. This simplification helps us to create a basic model of weather transitions.

Throughout this post, we will use Base R to construct and analyze a Markov Chain. Our goal is to understand the probabilities of transitioning from one type of weather to another. This journey will not only introduce the concept of Markov Chains but also illustrate the power of R in handling and analyzing complex datasets.

The Transition Matrix

A transition matrix, often referred to as a stochastic matrix or probability matrix, represents the probabilities of transitioning from one state to another in a Markov Chain. The matrix is square, meaning it has the same number of rows and columns. Each row and column corresponds to a possible state in the system being modeled. Using our weather conditions, your states would be “clear”, “rain”, and “snow”, and your matrix would be 3x3.

Here’s that matrix, initialized:

trans_matrix <- matrix(0, nrow = 3, ncol = 3
                       , dimnames = list(c("clear", "rain", "snow")
                                         , c("clear", "rain", "snow")))
trans_matrix
      clear rain snow
clear     0    0    0
rain      0    0    0
snow      0    0    0

The elements of the matrix are the transition probabilities. An element \(P_{ij}\) represents the probability of moving from state \(i\) to state \(j\). These probabilities are non-negative and typically presented as decimals between 0 and 1. The indices \(i\) and \(j\) follow the same convention as data.frame indexing, where df[i, j] means df[row, column]. So you can read a transition matrix along the rows as \(i\), the initial state, while the columns indicate \(j\), the future state. In our matrix, the value in row 2, column 1 is the probability of a rainy day being followed by a clear day.

The sum of the probabilities in each row of a transition matrix is 1, as they represent all possible outcomes from a given state. This characteristic reflects the total probability theorem, which states that the sum of the probabilities of all possible outcomes must equal 1.
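That property is easy to check numerically. Here’s a minimal sketch using a small hand-built 2-state matrix (the values are illustrative, not fit from the weather data):

```r
# A toy 2-state transition matrix; the probabilities are made up
# purely for illustration
P <- matrix(c(0.8, 0.2,
              0.4, 0.6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("clear", "rain"), c("clear", "rain")))

# Every row should sum to 1: all outcomes from a given state
rowSums(P)
stopifnot(all(abs(rowSums(P) - 1) < 1e-12))
```

If a row doesn’t sum to 1, the matrix isn’t a valid (row-stochastic) transition matrix, and downstream calculations like matrix powers will drift away from real probabilities.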

Generating a Transition Matrix

When given a series of states, we want to create a transition matrix that will help us calculate probabilities of events \(n\) steps in the future. To do so, we will sum the total times that a transition occurs using that [i,j] organization and then normalize (convert to probabilities) for each row.

states = x$desc
n = length(states)
for (i in 1:(n-1)) {
  trans_matrix[states[i], states[i+1]] = trans_matrix[states[i], states[i+1]] + 1 
}
trans_matrix
      clear rain snow
clear   406  106   12
rain    105   52    1
snow     13    0    3

Now that we have counts, we can find the probability of transitions by generating row-wise percentages.

prob_matrix = trans_matrix

for (i in 1:nrow(trans_matrix)) {
  i_sum = sum(trans_matrix[i, ])
  for (j in 1:ncol(trans_matrix)) {
    prob_matrix[i, j] = trans_matrix[i, j] / i_sum
  }
}

trans_matrix = prob_matrix
trans_matrix
          clear      rain        snow
clear 0.7748092 0.2022901 0.022900763
rain  0.6645570 0.3291139 0.006329114
snow  0.8125000 0.0000000 0.187500000

Living here in SE Ohio, this transition matrix definitely aligns with anecdotal observation.

Applying the Transition Matrix

There are quite a few uses for the transition matrix, but let’s start simple. Today is a clear day. What is the probability that it’s going to rain tomorrow? We can simply look that information up in our transition matrix. Row 1, column 2, shows us the probability is ~20%, or 1 in 5. What is the probability of a rainy day converting to a clear day? Row 2, column 1 shows about 2 in 3, or ~66%. So by basic probability rules, we can pretty easily calculate a 3-day forecast, finding the probability that we will have a clear day, a rainy day, and another clear day.

trans_matrix["clear", "rain"] * trans_matrix["rain", "clear"]
[1] 0.1344333

We can see there’s an ~13% chance of clear, rain, clear according to our transition matrix.
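More generally, the probability of any specific sequence of states is the product of the step-by-step transition probabilities. A small helper (a sketch, not part of the original analysis; the matrix below rounds the fitted weather probabilities) makes that explicit:

```r
# Probability of observing a specific sequence of states, given a
# row-stochastic transition matrix `P`
path_prob <- function(P, path) {
  p <- 1
  for (i in seq_len(length(path) - 1)) {
    p <- p * P[path[i], path[i + 1]]
  }
  p
}

# Illustrative probabilities, rounded from the matrix fit above
P <- matrix(c(0.775,  0.202, 0.023,
              0.665,  0.329, 0.006,
              0.8125, 0,     0.1875),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("clear", "rain", "snow"),
                            c("clear", "rain", "snow")))

path_prob(P, c("clear", "rain", "clear"))  # ~0.134, matching above
```

Note this is the probability of that exact path; a chain of \(m\) days multiplies \(m - 1\) transition probabilities together.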

But what if we just want the outcome? What is the most likely state in 3 days depending on the current state? To do so, we use the power of the matrix to calculate the state probabilities after \(k\) days.

# the normal `^` operator applies to each element of the matrix, while
# expm's `%^%` is true matrix exponentiation: matrix %*% matrix %*% matrix ...
expm::`%^%`(trans_matrix, 3)
          clear      rain       snow
clear 0.7510673 0.2258945 0.02303822
rain  0.7487424 0.2293364 0.02192121
snow  0.7587144 0.2122592 0.02902640

It’s worth noting that the higher your \(k\), the closer each row gets to the chain’s overall long-run probabilities, regardless of the starting state. So Markov chains are most useful for near-term calculations.
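You can see that convergence directly by raising the matrix to a high power. This sketch uses plain base R repeated multiplication instead of expm, with illustrative probabilities rounded from the weather matrix above:

```r
# Illustrative probabilities, rounded from the matrix fit above
P <- matrix(c(0.775,  0.202, 0.023,
              0.665,  0.329, 0.006,
              0.8125, 0,     0.1875),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("clear", "rain", "snow"),
                            c("clear", "rain", "snow")))

# Repeated matrix multiplication: P %*% P %*% ... (k times)
mat_pow <- function(P, k) Reduce(`%*%`, replicate(k, P, simplify = FALSE))

P50 <- mat_pow(P, 50)
# After many steps every row is (nearly) identical: the weather 50 days
# out no longer depends on today's state
round(P50, 4)
```

Those identical rows are the chain’s long-run distribution, which is why a 50-day “forecast” from a Markov chain tells you nothing beyond the overall climatology.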

Applying to Game Theory

Let’s say we want to use Markov Chains to give us an edge in some form of competition. Let’s say you’re playing Monopoly and you have limited cash. You know the square that you’re on, and you want to calculate the risk of using the money on hand to buy a lesser property as you circle the board vs. holding on to your cash for a few turns and then purchasing a big property like the Boardwalk.

properties <- c("GO", "Mediterranean Avenue", "Community Chest 1", "Baltic Avenue", "Income Tax", "Reading Railroad", "Oriental Avenue", "Chance 1", "Vermont Avenue", "Connecticut Avenue", "Jail/Just Visiting", 
                "St. Charles Place", "Electric Company", "States Avenue", "Virginia Avenue", "Pennsylvania Railroad", "St. James Place", "Community Chest 2", "Tennessee Avenue", "New York Avenue", "Free Parking", 
                "Kentucky Avenue", "Chance 2", "Indiana Avenue", "Illinois Avenue", "B&O Railroad", "Atlantic Avenue", "Ventnor Avenue", "Water Works", "Marvin Gardens", "Go to Jail", 
                "Pacific Avenue", "North Carolina Avenue", "Community Chest 3", "Pennsylvania Avenue", "Short Line", "Chance 3", "Park Place", "Luxury Tax", "Boardwalk")

We’ll start by simulating 10,000 rolls of the dice in Monopoly and recording the transitions that occur. For simplicity’s sake, we will ignore the rules about going to jail.

n = length(properties)
marker = 1L
landing = character(10001)  # preallocate rather than growing in the loop
landing[1] = properties[1]
set.seed(100)
for (i in 1:10000) {
  d1 = sample(1:6, 1)
  d2 = sample(1:6, 1)
  roll = d1 + d2
  marker = (marker + roll - 1) %% n + 1
  landing[i + 1] = properties[marker]
}

ggplot2::ggplot(data = NULL, ggplot2::aes(x = landing)) + 
  ggplot2::geom_bar() + 
  ggplot2::coord_flip()

From here, we can build our transition matrix, which will be quite a bit larger than our initial example of clear, rain, and snow.

trans_matrix <- matrix(0, n, n, dimnames = list(properties, properties))
n = length(landing)
for (i in 1:(n-1)) {
  trans_matrix[landing[i], landing[i+1]] = trans_matrix[landing[i], landing[i+1]] + 1
}

trans_matrix = sweep(trans_matrix, 1, rowSums(trans_matrix), FUN = "/")
trans_matrix |> 
  as.data.table(keep.rownames = "i") |>
  melt(variable.name = "i2") |> 
  _[,i := ordered(i, levels = properties)] |>
  _[,i2 := ordered(i2, levels = properties)] |> 
  ggplot(aes(y = i, x = i2, fill = value)) + 
  geom_tile() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))

So, let’s put our Markov chains into practice. What first roll of the dice is most likely to land you on the Boardwalk?

lapply(1:8, \(roll) {
  x = expm::`%^%`(trans_matrix, roll)
  x = x[3:12,ncol(x)] |> as.data.table(keep.rownames = "start")
  setnames(x, c("starting_roll", "boardwalk_prob"))
  x[,roll:=roll]
  x
}) |>
  rbindlist() |>
  ggplot(aes(y = ordered(starting_roll, levels = properties), x = as.character(roll+1), fill = boardwalk_prob, label = scales::percent(boardwalk_prob, .1))) + 
  geom_tile() + 
  geom_text(color = "white") +
  theme_classic() +
  theme(legend.position = "none") +
  labs(x = "Dice Rolls", y = "Starting Position", title = "Probability of Landing on Boardwalk"
       , fill = NULL)

So, based on our simulation (which doesn’t account for Go to Jail or Chance / Community Chest), your probability of landing on Boardwalk decreases as you move from either the tail end of a possible initial roll (12, St. Charles) or the head (2, Community Chest) toward Chance. In our simulation, if you can get to Connecticut Avenue or farther, your chances of landing on Boardwalk are best.