Initial Forays into Transactional Memory Programming Models
Welcome again to the TM blog! This time your host is Yossi Levanoni. I’m the dev lead on the TM project.
It was great to see the high-quality questions we’ve gotten in response to Dana’s previous (and our altogether first) post on the blog and I’m looking forward to many more interesting discussions. As Dana said, TM is not at a product development stage yet; it is currently an incubation and as such our dialogue with you—dear reader—is going to be less prescriptive (e.g., “this is how you use API XYZ”) and more a two-way conversation (e.g., “what kind of TM would be useful for you?” What value do you place on feature X?”) We will also discuss the semantics and implementation techniques of the TM systems we’re incubating (but remember that we are not an open source shop!).
I would like to devote this post to a discussion on TM Programming Models. We would try to see what kind of questions are pertinent in such a discussion by examining a few choices (and alternatives) used by Tim Harris from MSR and his colleagues working on the Bartok STM. A little bit of background: Tim and colleagues have been researching the topic and have published some seminal papers on TM. Some of Tim’s research on TM has been conducted before joining Microsoft. Then, Tim and Simon Peyton-Jones and others have implemented TM for the Haskell programming language. Finally this work culminated in an implementation of TM for the Bartok CLR which is described in this paper. Our group (Parallel Computing Platform) continues to work closely with the MSR folks including Tim, Simon and Martin Abadi, an expert on memory models their formal definition and verification.
Putting the first thing first, we have to define our scope of investigation: What is TM? Is it just an implementation technique for achieving thread-safety? Is it a class of programming models that is independent of particular implementation strategies? Maybe it’s both? What about hardware transactions, are they part of the game? Lock elision? What about concurrency-safety annotations? These things are obviously so interrelated from an end-result perspective that we will discuss them all in this blog, in the same way that Microsoft considers them all “fair game” in order to improve the mechanisms developers have in their toolbox to manage shared state. There are going to be two unifying themes that will encompass most discussions though:
1. Automating thread-safety concerns (what we want to achieve).
2. Executing code speculatively or tentatively (what the technology does at its core).
While this post confines itself to the STM model of Bartok we will definitely consider the broader picture on this blog, both in terms of programming models, and in terms of implementation.
The Bartok STM Programming Model
Without further ado let’s start considering the Bartok STM programming model. The central idea of this excellent work is that you can write:
atomic { <statements> }
The <statements> within the atomic block would be totally isolated from the effects of other atomic blocks executing concurrently. Great, what does that actually mean and where are the gotchas? Here are a few:
User-Initiated Abort
In the database world, both the programmer and the system have a way to specify that the current work being done against the database (the transaction) should be cancelled and its effects rolled back. The user has the ability to instruct the system to abort the transaction and the system has the prerogative to abort the transaction from under the user. What are the equivalents of these options when automatic isolation for shared memory is considered? Let’s first consider user-initiated abort.
In the Bartok TM programming model the user does have the option to abort a transaction. This is done by letting an exception escape the boundary of the atomic block. As commentators to the previous post have astutely noticed, this feature has some radical implications, not just for concurrency but for single-threaded execution as well. This feature indeed changes the semantics of single-threaded execution in the sense that things will get rolled back (automagically) if the transaction is aborted. This is a very powerful feature that holds a promise to drastically improve the reliability of code under error conditions.
Of course, nothing is for free and providing user-initiated abort comes with a price, both in terms of performance and implementation complexity but also in terms of the complexity of the programming model (which is the focus of this post). Consider this piece of pseudo-code:
atomic {
FormatHardDrive ();
}
FormatHardDrive cannot be rolled back once executed, but we have given the user the ability to abort the transaction, which we can’t retract. One option is to simply forbid operations like “FormatHardDrive” inside an atomic block; if the programmer tries to do something like that then the program will either fail to compile or raise an exception at runtime. In this particular case suppose that the implementation does allow any kind of operation inside an atomic block and the user wrote the code as:
atomic {
FormatHardDrive ();
throw new Exception();
}
Now we’re really trying to have our cake and eat it, too! FormatHardDrive cannot be rolled back, so presumably the user has the intent of not rolling back ever after executing this operation. On the other hand, the user also deterministically asked for the transaction to roll back. This contradiction between operations that signal to the system that the transaction shouldn’t roll back and explicit user-initiated abort could be solved only in so many ways, each leading to an interesting programming model in its own right. We will explore these options in future blog posts but as a quick tip of the hat I would mention the work that our Intel colleagues have been doing on irrevocable actions. (Digression: we have a little bit of an issue with the term “irrevocable transactions” because the outcome of a transaction, in a traditional transaction system, cannot be decided while the transaction is executing. Anyway, the concept is clear and useful: this thing, which is not a transaction in the usual sense, becomes irrevocable.)
System-Initiated Abort
As a somewhat related but separate issue we have the question of whether it is desirable to have a programming model where system “problems” (e.g., inability to commit due to contention) are exposed to the user. If we think of atomic blocks as a sort of lock replacement, then one analogy to the lock world would be the Monitor.TryEnter API. This API allows the user to take action if a lock cannot be acquired within a given deadline.
In the Bartok STM programming model there really isn’t a way to “try” executing a transaction---control flow resumes at the point following the atomic block only after the transaction has successfully completed. This completion event is always consistent, no matter whether the transaction has committed or aborted---both events are consistent, which means that all completed transactions in the system can be ordered serially with respect to each other. The research community uses the term serializability for this property.
“Purists” would typically object to API’s such as Monitor.TryEnter or TryAtomic on the grounds that it promotes user-level backoff and retry, which should be better left to the system. The “realists” would argue that sometimes there are other things to do, if the lock cannot be taken. The “purists” would argue back then that the application should express the availability of work using scheduling constraints rather than synchronization constraints. This argument also gives rise to thoughts about higher level constructs, such as a construct that lets the system choose between different alternative transactions. This way if the system experiences contention executing one alternative it can choose another alternative and try it out.
As with other considerations, we seek successful precedents in two domains to guide us. These domains are:
1. The lock-based programming domain
2. The database transactions domain.
We have discussed the former, what does the latter has to say on this? In the database world there is the assumption that since the system is distributed, something can always go awry and therefore the user must always be ready to deal with failures. This is a big burden for the application developer that is many times dealt with by simply turning around and presenting the error to the end user. One has to wonder whether the same considerations apply in shared memory programming, where we have the working assumption that storage (caches and RAM) and communication paths (data flow through the system) are much more reliable.
Different Forms of “abort” and “commit” Statements
In the Bartok model the only way to commit a transaction is by reaching the right curly brace of an atomic block. People often ask whether it makes sense to offer a “commit” statement that commits the work done so far and continue from that point in a separate transaction. The analogy to monitor based programming is the ability to briefly release and reacquire a lock somewhere (dynamically) inside a lock (<obj>) {} statement. This is obviously a dangerous pattern and a source of bugs, but it is also essential for coding coordination patterns with Monitor.Wait and Monitor.PulseAll (the reader is reminded that when Wait is called, the target monitor lock is released and then reacquired after the monitor is pulsed by another thread).
Similarly, when considering an explicit “abort”, in the Bartok model a transaction may be aborted only by letting an exception escape beyond the right curly brace of an atomic block. Again we could ask whether we should allow explicit statements to abort a transaction either in addition or instead of abort-by-exception. It is useful to consider the implications of decisions here to call-based composition of code. The Bartok model offers a remarkably strong proposition in terms of composability: a caller is always in control of the work a callee is doing and its inclusion in the outcome of the transaction:
1. The caller can always discard the changes done by the callee by wrapping the callee in a nested transaction and aborting it if necessary.
2. The caller is also in full control of exceptional situations. It can always catch exceptions from the callee and if it knows how to handle them, it can decide to still commit.
The following code snippet illustrates:
void Caller() {
atomic {
try {
Callee();
}
catch (MyException e) {
// Do something alternative here
}
If (IDontLikeTheStateOfTheWorldNow()) {
throw new VetoException()
}
}
}
The caller sometimes accepts and sometimes rejects success of the callee. The caller sometimes accepts and sometimes rejects failure of the callee. The caller is in control. This allows a great deal of call-based composition.
Coordination Constructs.
The atomic block as presented thus far provides isolation and atomicity but it doesn’t offer much help with coordination of concurrent activities. i.e., how would you implement a blocking queue using transactions? Harris et al promote the retry statement as the central way to coordinate threads involved in transactions. The semantics are dead-simple: if you hit a retry statement then the transaction rolls back and re-executes. e.g., consider this example:
public class SingleCellQueue<T> : where T : class {
T m_item;
public void T Get() {
atomic {
T temp = m_item;
if (temp == null) retry;
m_item = null;
return temp;
}
}
public void T Put(T item) {
atomic {
if (temp != null) retry;
m_item = item;
}
}
}
Suppose a thread calls Get() and m_item is null. In this case the retry statement would be encountered, the transaction will roll-back and re-execute. At this point I can imagine visible unease in the crowd: is it just going to spin and re-execute, eating up my CPU and killing my battery? Well, hopefully not---an implementation can be infinitely dumb or infinitely smart about how to implement retry. But still, let’s recognize that the developer doesn’t tell the system when would be a good time to pulse waiters, unlike condition variables and monitors. Harris et al also explain in their paper on composable transactions how retry and orElse can be combined together to achieve composability of coordination, which is not possible with explicit condition variables. So on one hand this is a simplification for the developer, but on the other hand it’s costly it for the system to implement.
Another property of retry is that it rolls back anything done so far in the transaction, so it is not possible to externalize to the outside world why you’re waiting and what it is that you wish be done by parallel activities, which is a pattern common with condition variables.
What are some other approaches to coordination? One possibility is what I would term “abstinence”. According to a majority of people I have sampled blocking is to be discouraged and replaced with various dependency/constrained scheduling schemes for tasks (e.g. the .NET asynchronous programming model). The core question is whether you own the thread or not. If you don’t own the thread, then blocking should be avoided. If you do own the thread, then you need to ask yourself “why am I owning the thread instead of using cooperative task scheduling?” In other words, cooperative task scheduling, with the ability to describe and enforce task dependencies is the “wave of the future”. So maybe this means that providing a blocking construct for atomic blocks is a second-level concern and maybe as such, a solution which is least impactful for the system should be preferred, at least initially.
Another option is to define transactional counterparts to condition variables that you explicitly wait and pulse. Condition variables have the semantics of “committing” when you call Wait and thus they bring the question of unconditional commit to the forefront again.
Conclusion
In this post we have started skimming the surface of possible programming models for a managed language TM system. There are many other considerations that we will cover in the future. My goal was to provide a brief “insider” look at the factors that we are looking into as a preparation to more concrete questions about the specific choices we are making. Please feel free to follow-up with questions or comments on any of the above topics or others.
Comments
Anonymous
October 31, 2008
You seem to have assumed away the basic problems of STM. How are developers supposed to decide what atomic{} needs to be wrapped around, and how are they going to be any better off than using lock{} as a result (modulo deadlocks, which are pretty trivial to fix given the stack traces)?Anonymous
November 01, 2008
This isn't just the basic problem of STM -- it's the basic problem of programming: what are the invariants that hold on data, and where do they hold? Like locks, "atomic" needs to be wrapped around any region of code that accesses shared data, either a read access where we depend on getting an invariant-consistent view of multiple variables, or a write access where we temporarily violate (but then re-establish) invariants across shared variables. I'm sorry we don't really know how to make this easier (but then I don't think anyone does.) I do think you sell the advantages of TM over lock-based programming a little short. In my experience, maintaining lock orders in large programs can get very complicated. And in a lot of other situations you have to invent total orders on objects in order to have a lock order. (I have a set of bank accounts, each protected by a distinct lock, and I want to do a transfer from account A to account B -- which one do I lock first?) And there's another problem associated with locking that goes away with transactions: keeping track of what lock protects what data? That problem goes away with transactions, since you don't name locks explicitly. So programming with TM is still like programming with locks in many ways -- but without a lot of the complications.Anonymous
November 03, 2008
Hello Barrkel, It was my intention to elaborate on what we perceive as the advantages of TM only after we have described what TM is about to the less knowledgeable reader. So I wasn’t assuming this away as much as I was as saving it for later... You’re kind of jumping ahead of me. Having said that, I agree with the points Dave lists and we will elaborate on these and others. I’ll just list some additional buckets of value associated with TM and leave it at that for now: scalability through optimistic concurrency control, enhanced atomicity correctness (one of the points we are apparently in disagreement about) and enhanced reliability through failure atomicity. Not all advantages hold equally well in different embodiments of the technology. (Of course we’ll also have a hard look at the limitations of the technology, not only the benefits.)Anonymous
November 05, 2008
Re: "We will explore these options in future blog posts but as a quick tip of the hat I would mention the work that our Intel colleagues have been doing on irrevocable actions [4]" Must some link be here?Anonymous
November 06, 2008
I think that first thing one have to say wrt advantages of TM over locking it that 80% of lock-based code is not susceptible to dead-locks in any way shape or form. If code holds at most one lock at a time... well, what's the problem? Yes, there is 20% (or less) of involved situations, like classical example with bank accounts. And there are 2 time-proved simple solutions: ordered locking and hierarchical locking. Example with bank accounts can be solved trivially with ordered locking. Also it's possible to not hold read-locks, if we will allow more than 2 versions of object to coexist and writers will be just atomically replacing object with new version. If above doesn't help, or just gets too complicated, only here TM comes into play with it's non-susceptibility to dead-locks (live-locks aside for a moment). In involved situation TM can help indeed. But what's the problem with concurrent queue? Regarding lock-based queues and "scalability through optimistic concurrency control". Well, all STM systems I've seen to date a kind of trying to bring the whole system to it's knees by hardly saturating coherence interconnects at the first opportunity. All those global timestamps, global lists of in-flight transactions, global descriptor tables etc. Yeah, there is theoretical possibility that "transaction on this end of the queue can run concurrently with transaction on that end of the queue, because they work on different data". Will it take place? I mean real parallel work and real scalability, i.e. no modifications of global state, no reading of global state, only local processing. Especially taking into account that most current STM implementations are lock-based. I.e. if thread inside of transaction will be descheduled or will be waiting for HDD access (10 ms)... Doesn't look superior to lock-based solutions... Just another set of caveats :)Anonymous
November 06, 2008
Btw, I've described some of my thoughts about how it's probably possible to make STM really scalable (or at least get it close to performance of fine-grained locking) here: http://software.intel.com/en-us/forums/whatif-alpha-software/topic/60873/reply/64582/ and here: http://software.intel.com/en-us/forums/whatif-alpha-software/topic/61536/Anonymous
November 06, 2008
The comment has been removedAnonymous
November 06, 2008
Here is another interesting moment regarding semantics and features STM can provide - partial commits. Assume we are iterating over lengthy linked-list, probably doing some processing for every item, or just searching for some node. Further assume we don't need total global atomicity/consistency wrt whole list (otherwise probably it will be 100% livelock), we need only local atomicity/consistency. I.e., for example, when we are processing node, we definitely don't want to process already removed node; but we don't require that no nodes was inserted into beginning of the list. With locks we can do following. Lock node, process node, lock next node, unlock current node, process next node, lock next after next node and so on. But we can mimic it with "atomic {}" syntax, because per-node transactions are overlapping!Anonymous
November 08, 2008
The comment has been removedAnonymous
November 29, 2008
Very intriguing... I wonder how transactions in C# interface with low level system code written in C++? Say I have some utilities I wrote in C++ that are called from C#. Today I use locks in C# to avoid problems. What would be the means to do the same in atomic world? Would I need to change my C++ code?