A guide to writing real-life thread-safe code

When writing code that will be used by multiple threads simultaneously, it is important to make sure that the code is thread-safe so that your application functions properly. In this article, I will explain thread-safety by showing a piece of real-life thread-unsafe code and then walking through ways to make it thread-safe.

How do we know if code is thread-safe?

We can tell that code is thread-safe if it only uses and updates shared resources in a way that guarantees correct execution when multiple threads run it at the same time. A shared resource could be a counter variable, a list, or anything else that more than one thread can reach.

What does code look like when it’s not thread-safe?

In this example, I won't show the conventional counter++ example found in textbooks. Instead, I'll show an example that is more relatable and can happen to anyone writing production code.

This example is in C#, but regardless of the programming language you’re working with, the concept remains the same.
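The original snippet was embedded as an image that hasn't survived; below is a minimal sketch reconstructed from the walkthrough that follows. The Product and HelperClass types and the storeProducts list are assumptions inferred from the names the article uses later, and the commented line numbers match the ones the scenario refers to.

```csharp
using System.Collections.Generic;

// Placeholder types inferred from the article's names.
public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public static class HelperClass
{
    public static string GetToken() => "token";  // stand-in for real auth
    public static void UploadProductDetails(string token, Product product)
    {
        /* stand-in for a slow network upload */
    }
}

public class Store
{
    // Shared resource: every caller of GetOrAddProduct reads and writes this list.
    private readonly List<string> storeProducts = new List<string>();

    public void GetOrAddProduct(Product product)
    {
        if (storeProducts.Contains(product.Name))     // "line 12": does the product already exist?
        {
            return;
        }

        var token = HelperClass.GetToken();               // "line 16": get a token
        HelperClass.UploadProductDetails(token, product); // "line 17": slow upload
        storeProducts.Add(product.Name);                  // "line 18": record the product
    }
}
```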

The code above looks fine when you're in a single-threaded environment. However, in a multithreaded or distributed environment where multiple processes will be calling your code simultaneously, it can actually be very dangerous, and I'll explain why.

In the case where we have 3 processes calling GetOrAddProduct simultaneously, the scenario described below could happen:

  1. Process A & Process C want to get or add Product A to the list.
  2. Process B wants to get or add Product B to the list.
  3. All three processes start simultaneously.
  4. Process B gets to line 12 and sees that Product B doesn't exist. It jumps to line 16, gets a token, and goes to line 17 to upload the product. The upload takes a while, so while Process B is still uploading…
  5. Process A gets to line 12 and sees that Product A doesn't exist. It jumps to line 16, gets a token, and goes to line 17 to upload the product. The upload takes a while, so while Process A is still uploading…
  6. Process B finishes, adds Product B to the list and exits the method.
  7. Process C gets to line 12 and sees that Product A doesn't exist. It jumps to line 16, gets a token, and goes to line 17 to upload the product. The upload takes a while, so while Process C is still uploading…
  8. Process A finishes, adds Product A to the list and exits the method.
  9. Process C finishes, adds Product A to the list and exits the method.

In this scenario, two things have gone wrong:

  1. Product A has been uploaded twice (or the second upload threw an Exception, depending on how your upload logic is set up).
  2. Product A has been added to the list twice, so the size of the list is three instead of two.

This is called a race condition. In scenarios like this, we might be tempted to replace

storeProducts.Add(product.Name);

on line 18 with

if (!storeProducts.Contains(product.Name))
{
  storeProducts.Add(product.Name);
}

however, this doesn't really fix things. The check-then-add is itself a race: two processes can both pass the Contains check before either of them adds. It also isn't scalable, because the check only works when the shared state happens to be a list we can query. Imagine we had something like the code snippet below:
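The snippet referred to here was also an image in the original post; a sketch of the idea, with storeRevenue as an assumed field, might look like this:

```csharp
// Sketch: update shared revenue, then record the product.
// Unlike the list, there is no Contains-style check we can run against
// storeRevenue to know whether this product's price was already added.
storeRevenue += product.Price;
storeProducts.Add(product.Name);
```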

In the code snippet above, we update the store revenue before adding the product to our list, and there is no direct way of checking whether a price has already been added to the overall revenue. This could be a disaster. Imagine a customer's product worth $400,000,000.00 gets added twice. Audio money? Now, that's a problem.

A more scalable solution

The more scalable solution is to make the code thread-safe by adding synchronization to the part that isn't. Synchronization protects access to shared resources: a process that owns the lock can access the protected shared resource, and a process that does not own the lock cannot, and must wait.

In our previous example, since Process B gets to the area of the unsafe code first, it acquires the lock and keeps executing. When Process B is done executing, it should release the lock for other processes. If Process A or C tries to acquire the lock when Process B is not done, it would have to wait.

There are a number of lockable synchronization primitives, but I'll explain two:

Mutex

Mutex is short for MUTual EXclusion. A mutex can be owned by only one thread at a time. If we had to use a mutex to fix our code, it would look like this:
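The fixed snippet was an image in the original post; below is a sketch of how it might look. The storeProducts list and the HelperClass methods are the assumed names the article uses elsewhere.

```csharp
using System.Collections.Generic;
using System.Threading;

public class Store
{
    private static readonly Mutex mutex = new Mutex();
    private readonly List<string> storeProducts = new List<string>();

    public void GetOrAddProduct(Product product)
    {
        mutex.WaitOne();            // acquire the lock; blocks until no one else owns it
        try
        {
            if (storeProducts.Contains(product.Name))
            {
                return;             // the finally block still runs before we exit
            }

            var token = HelperClass.GetToken();
            HelperClass.UploadProductDetails(token, product);
            storeProducts.Add(product.Name);
        }
        finally
        {
            mutex.ReleaseMutex();   // always release, even if the upload throws
        }
    }
}
```

As a side note, for purely in-process locking, C#'s built-in lock statement achieves the same mutual exclusion with less ceremony; a Mutex is usually reserved for cross-process scenarios.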

The code is wrapped in a try-finally block because, regardless of what happens in our code, we want the code in the finally block to execute. If we do not wrap this in a try-finally block and HelperClass.UploadProductDetails(token, product) throws an exception, the lock is never released, which can cause a deadlock, another concurrency problem. In basic terms, a deadlock means that processes waiting for a particular resource are blocked indefinitely.

Semaphores

A semaphore is a generalization of a mutex: instead of allowing only one owner at a time, it lets a configurable number of processes into the protected section simultaneously.
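The original snippet was an image; a sketch of the idea (the class and method names here are placeholders) might look like this:

```csharp
using System.Threading;

public class UploadThrottle
{
    // initialCount = 3 slots available now, maximumCount = 3 slots total.
    private static readonly Semaphore semaphore = new Semaphore(3, 3);

    public void DoProtectedWork()
    {
        semaphore.WaitOne();       // take a slot; blocks if all 3 are in use
        try
        {
            // ... access the shared resource ...
        }
        finally
        {
            semaphore.Release();   // return the slot
        }
    }
}
```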

In the code snippet above, we created a semaphore with an initial count of 3 and a maximum count of 3. This means that at any time, at most three processes can be inside the protected part of the code simultaneously. (The initial count is the number of entries available right away; with an initial count of 1 and a maximum of 3, only one process could enter until Release had been called enough times.) For our example, we would create a semaphore with a maximum count of 1, which behaves like a mutex.

Conclusion

There are other ways to write thread-safe code in distributed or multithreaded environments, such as C#'s lock statement, the thread-safe collections in System.Collections.Concurrent, and distributed locks for multi-machine scenarios. If you'd like to know more, concurrency problems and thread-safety are covered extensively in the .NET threading documentation.
