Software Transactional Memory

Dennis J. McWherter, Jr. bio photo By Dennis J. McWherter, Jr. Comment

Software Transactional Memory (or STM) is a useful concept for constructing concurrent software. The original proposal of STM by Shavit and Touitou was derived by adapting a hardware-based transactional memory system which aimed to make for conceptually simpler and safer concurrent code. Since that time, STM has come a long way being used to describe composable concurrent computations and other useful concepts. As you can tell, this has been a heavily researched topic in the literature. However, it seems that for the vast majority of people, locking and raw concurrency are still the norm when developing a parallel system. Through some explanation and examples, I€™ll help you understand why STM is really useful and how it can make complex systems more robust and performant.

NOTE: Though there are several existing implementations of STM, we will focus on using the GHC Haskell compiler€™s implementation of STM for our examples in this post. As a result, I assume some familiarity in working with monads though you don€™t need to be an expert with them.

Similarly, I expect familiarity with common locking techniques and problems associated with them.

Why Software Transactional Memory?

The first question we always tend to ask is why? Particularly, why should you spend the time to learn a new concept when you already understand how to use mutexes and semaphores to control execution of your application? The short answer is that, in general, managing concurrency is difficult. Not only is it difficult, but it€™s extremely important for any modern software system. If you notice the latest line of computers being manufactured, the speeds of the cores are staying the same but the number of them is growing. This means that you need to use many slower cores more effectively rather than a single fast core.

But just because something is difficult doesn€™t mean we don€™t try. If you look at the linux kernel or many other stable software solutions in existence today many of them use locks. That said, concurrency bugs are being fixed all of the time in these systems. Why is that? When I say concurrency is difficult, I mean it€™s really difficult. What€™s more, something like the linux kernel is a large system. Arguably some of the best developers in the world (i.e. linux kernel devs/maintainers) find issues with their subsystems on an almost daily basis. When dealing with locks you have problems such as deadlock (i.e. everything freezes waiting on resources) and what I will call €œover-serialization€ of your system (i.e. your threads start running serially due to overlocking to ensure data integrity). These problems ignore the fact that it€™s also an error-prone process. That is, if you forget to lock a particular critical section, memory corruption can occur and will become incredibly difficult to track down and debug.

The root cause of this difficulty is in the size of the system. For small, trivial systems, it is possible to create a locking solution which is likely faster than STM. However, scaling this system into something more complex (as we see with the linux kernel) becomes exponentially more difficult and will likely result in a suboptimal implementation.

As you may have guessed, STM solves many of these issues. Though there is overhead associated with STM (see around 11:00min), it can likely be ignored as we scale to larger systems. Similarly, this scaling of systems is likely inevitable.

How does it work?

Now that I€™ve convinced you that STM is a worthwhile concept in modern software engineering, how does it work? Well, if you know anything about databases and transactions, it€™s probably going to feel very intuitive to you. If not, don€™t worry: pictures follow.

But first, a we€™ll go through a quick overview of STM. The first thing we must be aware of is that any proper implementation of STM has the following two properties:

  • Atomic. That is to say, grouped sets of operations executed on memory occur together or not at all. For more information on atomicity, see this wikipedia article.
  • Serializable. Practically, this means that the current state of our data structure can be reconstructed by simply replaying our transaction log in sequential order from start to end.

These properties are general to all STM implementations. That said, we will be using the implementation presented in the original paper I linked to above as the basis for our brief high-level sketch to explore how software transactional memory works:

  1. Initiate a transaction to occur on a dataset
  2. Acquire ownership of affected data addresses on specified dataset
  3. If successful, log the old values, compute new values, then update dataset
  4. Otherwise fail transaction and help existing owner of address complete

For deeper details and insights, I suggest you read that paper. However, I think this provides us a foundation (albeit, a little hand-wavy) to begin our discussion of how this works (and what€™s so cool about it).

Let€™s first consider the diagram below:

image

This is the simplest use-case of STM. In summary, we have a single shared dataset between two threads. What abstraction we use to interact with our dataset (i.e. array, binary tree, hash map, etc.) is not very important. What is important is the fact that this shared dataset (like everything in computing) is composed of a set of unique addresses. In this case thread 1 and thread 2 access disjoint subsets of these addresses and can make their updates concurrently.

Though this is a simple example, it€™s implications are quite profound. In a typical locking scenario, one would have to lock on the granularity of the entire dataset and, thus, only one of these updates could occur at a time (making our application serial). Subtle yet powerful! 

The next diagram shows an example of contention between our two threads:

image

This situation is a little more interesting. What will happen is first T1 will acquire ownership on the addresses A1. Then, T2 will attempt to gain ownership of the same addresses. However, since T1 already owns those addresses, it will fail. When it fails, it will try to help T1 complete its transaction sooner so that it can start. After T1 finishes its transaction (thus, releasing its ownership on A1), T2 can retry its transaction on data at A1.

As you can see, STM is a pretty nifty little concept. It avoids deadlock and complicated locking issues while allowing your program to be highly concurrent. As I noted before, I am hiding some important details like the fact that the authors rely on hardware primitives such as Load_Link/Store_Conditional. Even so, these details are not as important as obtaining the general high-level understanding of STM so that you can use it in your own applications. However, if you€™re interested in those nitty-gritty details, I will again suggest you read the original paper.

Enough Talk. Show me how it can make my life easier!

You€™re right. It€™s an interesting idea, but without any practical application it doesn€™t seem so useful. For this section (as I noted above), I will be using Haskell with GHC€™s implementation of STM. Though the previous section should give you an idea of how this is implemented, you should not assume an identical translation (i.e. these come from different sources). Instead, you should be aware of the overall benefits and general concepts which do carry over and are not artifacts of implementation details. Let€™s begin.

The Banking Bandit. Assume you€™re setting up a new bank account with a local bank. Similarly, suppose there is a bandit on the loose. This bandit keeps taking money from people€™s bank accounts right as they make withdrawals and is causing $100€²s in overdraft fees. As a protection measure, banks have now stopped allowing anyone to withdraw more money than exists in the account. If you haven€™t guessed yet, we€™re the bank charged with this task. Let€™s get right to the code:

{-# LANGUAGE DeriveDataTypeable #-}
module Main where
import Control.Concurrent
import Control.Concurrent.STM.TVar
import Control.Exception
import Control.Monad.State.Lazy
import Control.Monad.STM
import Data.Typeable
import System.IO
import System.Random

data BankException = InsufficientFunds deriving (Show, Typeable)
instance Exception BankException

type Account = TVar Int

prompt :: String -> IO ()
prompt str = putStr str >> hFlush stdout

main :: IO ()
main = do
  prompt "Initial account balance: "
  initial <- getInt
  if initial < 0 then
    putStrLn "Must start with a balance >= 0."
  else do
    acct <- atomically $ openAccount initial
    forkIO $ banditThread acct
    forever $ do
      prompt "Enter adjustment (positive for deposit, negative for withdrawal): "
      adjust <- getInt
      failed <- atomically $ adjustAccount acct adjust `catchSTM` handleBankException
      when failed $ putStrLn "Could not complete transaction: insufficient funds."
      showBalance acct
  where getInt = getLine >>= (\x -> return (read x :: Int))
        showBalance acct = do
          balance <- atomically $ accountBalance acct
          putStrLn $ "Current balance: " ++ (show balance)

openAccount :: Int -> STM Account
openAccount = newTVar

accountBalance :: Account -> STM Int
accountBalance = readTVar

adjustAccount :: Account -> Int -> STM Bool
adjustAccount acct adjustment = do
  curVal <- readTVar acct
  let newVal = curVal + adjustment
  writeTVar acct newVal
  if newVal < 0 then
    throwSTM InsufficientFunds -- Could alternatively retry here
  else
    return False

handleBankException :: BankException -> STM Bool
handleBankException _ = return True

threadDelaySeconds :: Int -> IO ()
threadDelaySeconds = threadDelay . (*1000000)

banditThread :: Account -> IO ()
banditThread acct = do
  delayGen <- newStdGen
  takeGen <- newStdGen
  evalStateT beABandit (delayGen, takeGen)
  where
    beABandit :: StateT (StdGen, StdGen) IO ()
    beABandit = forever $ do
      (delayGen, takeGen) <- get
      nextDelayGen <- liftIO $ randomDelay (1, 5) delayGen
      (tval, nextTakeGen) <- liftIO $ randomTake takeGen
      put (nextDelayGen, nextTakeGen)
      failed <- liftIO $ atomically $ adjustAccount acct ((-1) * tval) `catchSTM` handleBankException
      liftIO $ when failed $ putStrLn "\nBandit tried to steal more money than is in account!"
    randomTake = return . randomR (1, 20)
    randomDelay intval gen = do
      (val, gen) <- return $ randomR intval gen
      threadDelaySeconds val
      return gen

The first thing you will notice about this code is that it doesn’t look much different than normal Haskell code. Similarly, Haskell’s type system makes it very clear where STM is used. For us, this is great news! It demonstrates that using STM turns out to be minimally invasive when using it in our Haskell code and feels very natural. Now, rather than explain every little usage detail of the STM library in Haskell (read: the docs are here and a good primer is here), I’ll give a brief overview of what’s going on.

In short, I’ve set out to simulate the situation I have described above. We have a main thread which simulates a primitive bank terminal and a banditThread which randomly decreases the value of our account over time. The bandit will continue to draw as much money as the system will let him (but we’ll never overdraw!). In the case of an overdraft by either you or the bandit, we throw an “InsufficientFunds” exception and abort the transaction entirely (i.e. the transaction is rolled back and not committed). To show how this works, you’ll notice I wrote the transaction in an interesting way. That is, I write the TVar before I throw the exception. However, since these actions are all part of the same transaction and the entire transaction is rolled back, none of the actions take effect on our final state. Pretty neat, huh?

As you can see, writing STM code in Haskell is pretty straightforward. In fact, there is very little surprise or difference in how we use STM as opposed to any other monad. Above all, we never have to concern ourselves with potentially dangerous locking techniques that make us worry about deadlock or other race conditions. What€™s more, the implementation is practical and accessible. We can simply and easily use it in our everyday applications with little extra effort other than modifying some types. Best of all, we get the safety of STM which provides us the ability to more rapidly build concurrent applications correctly!

comments powered by Disqus