# Software Transactional Memory

Software Transactional Memory (or STM) is a useful concept for constructing concurrent software. The original proposal of STM by Shavit and Touitou adapted a hardware-based transactional memory system, with the aim of making concurrent code conceptually simpler and safer. Since then, STM has come a long way, being used to describe composable concurrent computations among other useful concepts. As you can tell, this has been a heavily researched topic in the literature. However, for the vast majority of people, locking and raw concurrency are still the norm when developing a parallel system. Through some explanation and examples, I'll help you understand why STM is really useful and how it can make complex systems more robust and performant.

NOTE: Though there are several existing implementations of STM, we will focus on the GHC Haskell compiler's implementation of STM for the examples in this post. As a result, I assume some familiarity with monads, though you don't need to be an expert with them.

Similarly, I expect familiarity with common locking techniques and the problems associated with them.

## Why Software Transactional Memory?

The first question we always tend to ask is: why? In particular, why should you spend the time learning a new concept when you already understand how to use mutexes and semaphores to control the execution of your application? The short answer is that, in general, managing concurrency is difficult. Not only is it difficult, but it's extremely important for any modern software system. If you look at the latest line of computers being manufactured, core speeds are staying the same while the number of cores is growing. This means you need to use many slower cores effectively rather than relying on a single fast core.

But just because something is difficult doesn't mean we don't try. The Linux kernel and many other stable software systems in existence today use locks, and yet concurrency bugs are being fixed in them all the time. Why is that? When I say concurrency is difficult, I mean it's really difficult. What's more, something like the Linux kernel is a large system, and arguably some of the best developers in the world (i.e. the kernel developers and maintainers) find issues with their subsystems on an almost daily basis. With locks you face problems such as deadlock (everything freezes waiting on resources) and what I will call over-serialization of your system (your threads end up running serially due to over-locking to ensure data integrity). On top of this, locking is an error-prone process: if you forget to lock a particular critical section, memory corruption can occur that is incredibly difficult to track down and debug.

The root cause of this difficulty is the size of the system. For a small, trivial system, it is possible to create a locking solution that is likely faster than STM. However, scaling that system into something more complex (as we see with the Linux kernel) becomes dramatically more difficult and will likely result in a suboptimal implementation.

As you may have guessed, STM addresses many of these issues. Though there is overhead associated with STM (see around the 11:00 mark), it can often be ignored as we scale to larger systems, and that scaling is all but inevitable.

## How does it work?

Now that Ive convinced you that STM is a worthwhile concept in modern software engineering, how does it work? Well, if you know anything about databases and transactions, its probably going to feel very intuitive to you. If not, dont worry: pictures follow.

But first, a well go through a quick overview of STM. The first thing we must be aware of is that any proper implementation of STM has the following two properties:

• Atomic. That is to say, grouped sets of operations on memory occur together or not at all. For more information on atomicity, see this Wikipedia article.
• Serializable. Practically, this means that the current state of our data structure can be reconstructed by simply replaying our transaction log sequentially from start to end.
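To make the atomicity property concrete, here is a minimal sketch in GHC Haskell (the account names and amounts are mine, chosen just for illustration). Everything inside a single `atomically` block commits together or not at all, so no other thread can ever observe the money missing from both accounts:

```haskell
import Control.Concurrent.STM

-- A transfer is atomic: the debit and the credit commit together,
-- or neither does.
main :: IO ()
main = do
  from <- newTVarIO (100 :: Int)
  to   <- newTVarIO 0
  atomically $ do
    modifyTVar' from (subtract 30)
    modifyTVar' to   (+ 30)
  (a, b) <- atomically $ (,) <$> readTVar from <*> readTVar to
  print (a, b)  -- (70,30)
```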

These properties are common to all STM implementations. That said, we will use the implementation presented in the original paper linked above as the basis for a brief high-level sketch of how software transactional memory works:

1. Initiate a transaction on a dataset
2. Acquire ownership of the affected addresses in that dataset
3. If successful, log the old values, compute the new values, then update the dataset
4. Otherwise, fail the transaction and help the existing owner of the addresses complete

For deeper details and insights, I suggest reading that paper. However, I think this gives us a foundation (albeit a little hand-wavy) to begin our discussion of how this works (and what's so cool about it).
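The four steps above can be sketched in code. The toy below is my own drastic simplification, not the paper's algorithm: a single `IORef Bool` stands in for ownership of the whole dataset, whereas the real design tracks ownership per address and actively "helps" the current owner rather than just failing:

```haskell
import Control.Monad (zipWithM_)
import Data.IORef

-- Step 1: a transaction is an attempt to apply `compute` to the dataset.
tryTransaction :: IORef Bool -> [IORef Int] -> ([Int] -> [Int]) -> IO Bool
tryTransaction owner addrs compute = do
  -- Step 2: try to acquire ownership (atomically flip the flag).
  acquired <- atomicModifyIORef' owner (\owned -> (True, not owned))
  if not acquired
    then return False                           -- step 4: fail (a real impl helps the owner)
    else do
      olds <- mapM readIORef addrs              -- step 3: log the old values...
      zipWithM_ writeIORef addrs (compute olds) -- ...compute and write the new ones
      writeIORef owner False                    -- release ownership
      return True

main :: IO ()
main = do
  owner <- newIORef False
  addrs <- mapM newIORef [1, 2, 3]
  ok    <- tryTransaction owner addrs (map (* 10))
  vals  <- mapM readIORef addrs
  print (ok, vals)  -- (True,[10,20,30])
```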

Lets first consider the diagram below:

This is the simplest use case of STM. In summary, we have a single dataset shared between two threads. The abstraction we use to interact with our dataset (i.e. array, binary tree, hash map, etc.) is not very important. What is important is that this shared dataset (like everything in computing) is composed of a set of unique addresses. In this case, thread 1 and thread 2 access disjoint subsets of those addresses and can make their updates concurrently.

Though this is a simple example, its implications are quite profound. In a typical locking scenario, one would have to lock at the granularity of the entire dataset, and thus only one of these updates could occur at a time (making our application serial). Subtle yet powerful!
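In GHC's STM the disjoint-address case looks like this (a minimal sketch; the two `TVar`s and the `MVar`-based synchronization are my own scaffolding). The two transactions touch different `TVar`s, so they never conflict and neither one blocks the other:

```haskell
import Control.Concurrent
import Control.Concurrent.STM

-- Two threads update disjoint TVars: their transactions can commit
-- concurrently with no whole-dataset lock.
main :: IO ()
main = do
  x <- newTVarIO (0 :: Int)
  y <- newTVarIO (0 :: Int)
  done <- newEmptyMVar
  _ <- forkIO $ atomically (modifyTVar' x (+ 1)) >> putMVar done ()
  _ <- forkIO $ atomically (modifyTVar' y (+ 2)) >> putMVar done ()
  takeMVar done >> takeMVar done   -- wait for both updates
  vals <- atomically $ (,) <$> readTVar x <*> readTVar y
  print vals  -- (1,2)
```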

The next diagram shows an example of contention between our two threads:

This situation is a little more interesting. First, T1 acquires ownership of the addresses A1. Then T2 attempts to gain ownership of the same addresses. Since T1 already owns them, T2's attempt fails. On failure, T2 tries to help T1 complete its transaction sooner so that T2 itself can start. After T1 finishes its transaction (thus releasing its ownership of A1), T2 can retry its transaction on the data at A1.
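GHC resolves contention differently than the paper does: instead of "helping" the owner, a conflicting transaction is simply aborted and re-executed at commit time. The upshot for you as a user is the same, as this small stress test of mine shows. Ten threads hammer the same `TVar`, and no increment is ever lost:

```haskell
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad (replicateM, replicateM_)

-- Ten threads all contend on the SAME TVar. Conflicting transactions
-- are transparently re-run, so the final count is always exact.
main :: IO ()
main = do
  counter <- newTVarIO (0 :: Int)
  dones <- replicateM 10 $ do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ 1000 (atomically (modifyTVar' counter (+ 1)))
      putMVar done ()
    return done
  mapM_ takeMVar dones
  atomically (readTVar counter) >>= print  -- 10000
```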

As you can see, STM is a pretty nifty little concept. It avoids deadlock and complicated locking issues while allowing your program to be highly concurrent. As I noted before, I am hiding some important details, like the fact that the authors rely on hardware primitives such as Load-Linked/Store-Conditional. Even so, these details are less important than the general high-level understanding of STM that lets you use it in your own applications. If you're interested in those nitty-gritty details, I will again suggest reading the original paper.

## Enough Talk. Show me how it can make my life easier!

Youre right. Its an interesting idea, but without any practical application it doesnt seem so useful. For this section (as I noted above), I will be using Haskell with GHCs implementation of STM. Though the previous section should give you an idea of how this is implemented, you should not assume an identical translation (i.e. these come from different sources). Instead, you should be aware of the overall benefits and general concepts which do carry over and are not artifacts of implementation details. Lets begin.

**The Banking Bandit.** Assume you're setting up a new bank account with a local bank. Similarly, suppose there is a bandit on the loose. This bandit keeps taking money from people's bank accounts right as they make withdrawals and is causing $100s in overdraft fees. As a protective measure, banks have now stopped allowing anyone to withdraw more money than exists in the account. If you haven't guessed yet, we're the bank charged with this task. Let's get right to the code:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
module Main where

import Control.Concurrent
import Control.Concurrent.STM.TVar
import Control.Exception
import Control.Monad.State.Lazy
import Control.Monad.STM
import Data.Typeable
import System.IO
import System.Random

data BankException = InsufficientFunds
    deriving (Show, Typeable)

instance Exception BankException

type Account = TVar Int

prompt :: String -> IO ()
prompt str = putStr str >> hFlush stdout

main :: IO ()
main = do
    prompt "Initial account balance: "
    initial <- getInt
    if initial < 0
        then putStrLn "Must start with a balance >= 0."
        else do
            acct <- atomically $ openAccount initial
            _ <- forkIO $ banditThread acct
            forever $ do
                prompt "Enter adjustment (positive for deposit, negative for withdrawal): "
                adjust <- getInt
                failed <- atomically $ adjustAccount acct adjust `catchSTM` handleBankException
                when failed $ putStrLn "Could not complete transaction: insufficient funds."
                showBalance acct
  where
    getInt = getLine >>= (\x -> return (read x :: Int))
    showBalance acct = do
        balance <- atomically $ accountBalance acct
        putStrLn $ "Current balance: " ++ show balance

openAccount :: Int -> STM Account
openAccount = newTVar

accountBalance :: Account -> STM Int
accountBalance = readTVar

adjustAccount :: Account -> Int -> STM Bool
adjustAccount acct adjustment = do
    curVal <- readTVar acct
    let newVal = curVal + adjustment
    writeTVar acct newVal
    if newVal < 0
        then throwSTM InsufficientFunds -- Could alternatively retry here
        else return False

handleBankException :: BankException -> STM Bool
handleBankException _ = return True

threadDelaySeconds :: Int -> IO ()
threadDelaySeconds = threadDelay . (* 1000000)

banditThread :: Account -> IO ()
banditThread acct = do
    delayGen <- newStdGen
    takeGen  <- newStdGen
    evalStateT beABandit (delayGen, takeGen)
  where
    beABandit :: StateT (StdGen, StdGen) IO ()
    beABandit = forever $ do
        (delayGen, takeGen) <- get
        nextDelayGen        <- liftIO $ randomDelay (1, 5) delayGen
        (tval, nextTakeGen) <- liftIO $ randomTake takeGen
        put (nextDelayGen, nextTakeGen)
        failed <- liftIO $ atomically $ adjustAccount acct ((-1) * tval) `catchSTM` handleBankException
        liftIO $ when failed $ putStrLn "\nBandit tried to steal more money than is in account!"

    randomTake = return . randomR (1, 20)

    randomDelay intval gen = do
        let (val, gen') = randomR intval gen
        threadDelaySeconds val
        return gen'
```
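The comment in `adjustAccount` mentions that we could `retry` instead of throwing. Here is what that variant might look like (a sketch of mine, not part of the original listing): `retry` blocks the transaction until one of the `TVar`s it has read changes, at which point GHC re-runs it. An overdraw would then simply wait for a deposit instead of failing:

```haskell
import Control.Concurrent.STM

type Account = TVar Int

-- Variant of adjustAccount: instead of throwing on overdraw, block
-- until the account changes (e.g. a deposit arrives), then re-run.
adjustAccountBlocking :: Account -> Int -> STM ()
adjustAccountBlocking acct adjustment = do
    curVal <- readTVar acct
    let newVal = curVal + adjustment
    if newVal < 0
        then retry                    -- sleep until acct changes, then try again
        else writeTVar acct newVal

main :: IO ()
main = do
    acct <- newTVarIO 50
    atomically $ adjustAccountBlocking acct (-20)  -- succeeds immediately
    atomically (readTVar acct) >>= print  -- 30
```

This is often the more idiomatic choice in GHC STM, since it turns "insufficient funds" from an error into an ordinary blocking condition.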
(tval, nextTakeGen) <- liftIO $randomTake takeGen put (nextDelayGen, nextTakeGen) failed <- liftIO$ atomically $adjustAccount acct ((-1) * tval) catchSTM handleBankException liftIO$ when failed $putStrLn "\nBandit tried to steal more money than is in account!" randomTake = return . randomR (1, 20) randomDelay intval gen = do (val, gen) <- return$ randomR intval gen