Ruby & the Multicore CPU: Part Three - Processes and Threads
You can see all these code examples on github
Multiple processes
The simplest form of parallel processing is to simply create more processes. The OS will deal with spreading the processing across your available CPUs, and you don’t have to worry about shared memory, deadlocks, livelocks, or any of that stuff.
If you want to create separate processes from within your Ruby code, you can use `Process.fork`. This creates a copy of your current process (minus any other running threads) and executes the provided block in the child.
fork_example.rb
```ruby
4.times { |i|
  Process.fork {
    puts "Process.pid: #{Process.pid} Num: #{i}"
  }
  puts "Process.pid: #{Process.pid} Num: #{i}"
}
```
```
Process.pid: 88899 Num: 0
Process.pid: 88899 Num: 1
Process.pid: 88914 Num: 0
Process.pid: 88899 Num: 2
Process.pid: 88915 Num: 1
Process.pid: 88899 Num: 3
Process.pid: 88916 Num: 2
Process.pid: 88917 Num: 3
```
In fact, if you just want to execute some other arbitrary code on your system, you can use `Process.spawn` or `Kernel.system`.
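As a quick illustration of the spawn approach (this snippet is my own sketch, not from the examples repo — the command it runs is just a placeholder): `Process.spawn` starts a command asynchronously and returns its pid, and `Process.wait2` collects the exit status.

```ruby
# Process.spawn does not block: it returns the child's pid immediately.
pid = Process.spawn("echo hello from a child process")

# Process.wait2 blocks until that child exits and returns [pid, status].
finished_pid, status = Process.wait2(pid)
puts "child #{finished_pid} exited with status #{status.exitstatus}"
```

`Kernel.system` is the blocking equivalent: it runs the command, waits, and returns true or false based on the exit status.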
Processes, though, are the heavyweight option: they are slow to create, context switching between them is expensive, and they use the most memory. Before Ruby 3.0, however, multiple processes were the only mechanism that could make use of more than one CPU core.
Unicorn is an example of concurrent and parallel processing using separate processes. I believe that Spring also forks processes to create instances of Rails that sit waiting to process your tests and rails commands, so you don't have to wait for the VM to spin up.
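The Unicorn-style pattern can be sketched in a few lines (my own illustration, with arbitrary worker counts and busy work). Note that forked children get a copy-on-write view of memory: they cannot write results back into the parent's variables, so real code would hand results back through pipes, files, or a job queue.

```ruby
# Fork a few worker processes; each does CPU-bound work independently,
# so the OS can schedule them across separate cores.
pids = 4.times.map do |i|
  Process.fork do
    sum = (1..2_000_000).sum # CPU-bound busy work, local to this child
    puts "worker #{i} (pid #{Process.pid}) computed #{sum}"
  end
end

# Block until every child has exited; returns [[pid, status], ...].
statuses = Process.waitall
puts "all #{statuses.size} workers finished"
```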
Threads & Mutexes
```ruby
websites = ['google.com', 'reddit.com', 'news.bbc.co.uk', 'youtube.com', 'github.com']

threads = []
websites.each do |site|
  threads << Thread.new { Webcrawler.parse(site) }
end
threads.each(&:join)
```
This simple example illustrates the power of the Ruby thread. Instead of making an HTTP request to each site in turn, waiting for a response, then parsing it, you can make all 5 requests at once and start parsing as soon as the first response is received.
However, what does `Webcrawler.parse` do? Is it thread-safe? It's a class method. Does that method write to any class variables? What happens if two threads write to those variables at the same time?
MRI ruby threads, whilst not useful for increasing the performance of CPU-bound tasks, can help to maximize the performance of IO-bound tasks. They are not easy to use safely, though.
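To see the IO-bound benefit concretely, here is a small timing sketch of my own, using `sleep` to stand in for a blocking HTTP request. Run sequentially, five 0.2-second waits would take about a second; threaded, they overlap, because MRI releases the GIL while a thread is blocked.

```ruby
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Each thread "waits on IO" for 0.2s; the sleeps run concurrently.
threads = 5.times.map { Thread.new { sleep 0.2 } }
threads.each(&:join)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format("elapsed: %.2fs", elapsed) # well under the 1s a sequential run would take
```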
Let’s have a look at some more examples, and do a little quiz.
Have a look at the following code and predict what will happen. What will the output look like? What will the last value printed be?
```ruby
class Counter
  def initialize
    @countup = 0
  end

  def increment
    @countup += 1
  end

  def countup
    @countup
  end
end

countup = Counter.new
threads = []
8000.times do
  threads << Thread.new { countup.increment; sleep 1 }
end
threads.map(&:join)

puts countup.countup
```
Try running the code and see if you were correct. Then run the code 10 more times. Is the result always the same?
Here is my result running on ruby 2.7.0p0:

```
8000
```
You might expect that 8000 threads would be enough opportunities for a threading problem to occur. So is this code OK? I’ve run this a bunch of times, and the final output is always 8000.
Is that what you expected? Why did it work OK? Luck? Is += an atomic operation? Did the GIL save us?
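We can actually probe the "is `+=` atomic?" question by peeking at the bytecode — this is my own aside, and it is MRI-specific, since `RubyVM` doesn't exist on JRuby. The disassembly shows that `@countup += 1` compiles to a separate read, add, and write, so in principle another thread could run between the read and the write.

```ruby
# MRI-only: disassemble `@countup += 1` to show it is not one instruction.
# The output includes getinstancevariable (read), an addition (opt_plus),
# and setinstancevariable (write) as distinct steps.
puts RubyVM::InstructionSequence.compile("@countup += 1").disasm
```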
Also note that even though each of the 8000 threads includes a `sleep 1` instruction, and the GIL is in effect because this is ruby 2.7.0, the program did not take 8000 seconds to complete. This is because a thread will yield on blocking IO and sleep instructions, allowing all the other threads to be created and yield to their own sleep instructions in turn.
OK. Next example. We’ll change the implementation of the countup object.
```ruby
StructCounter = Struct.new(:countup) do
  def increment
    self.countup += 1
  end
end

countup = StructCounter.new(0)
threads = []
8000.times do
  threads << Thread.new { countup.increment; sleep 1 }
end
threads.map(&:join)

puts countup.countup
```
Here, we’ve only changed the object doing the counting. The thread loop is identical. Should this change make any difference? Again, I encourage you to think about what result you expect to see, then run the code and see what you get.
Here is the output I got, again on ruby 2.7.0p0:

```
7974
```
So, this time, threading has caused a race condition. Any idea what caused it? 26 errors out of 8000 threads is quite a small error rate, just over 0.3%, which could easily be missed without a good test.
My theory as to why the first example seems to work and the second doesn't is that a Struct object takes more CPU cycles to access and modify than the original class. This would mean that each thread is around for a longer time, allowing greater opportunity for a race condition to occur.1
The first example never exhibited a problem whenever I tested it. However, neither example is actually thread-safe. I’ve no reason to believe that the underlying MRI implementation of Class vs Struct makes the former any more thread safe. So, you might write code like that and only ever have a problem every now and then in production, with no way to track down what is occurring.
I think this highlights how treacherous writing threaded code can be, but let’s finish with an egregious example and crank up the time that the counter object is spending executing the threaded code.
```ruby
class UnsafeCounter
  attr_accessor :countup

  def initialize
    @countup = 0
  end

  def increment
    temp = @countup
    sleep 0.1
    @countup = temp + 1
  end
end

countup = UnsafeCounter.new
threads = []
8000.times do
  threads << Thread.new { countup.increment; sleep 1 }
end
threads.map(&:join)

puts countup.countup
```
In this example, we’re adding a sleep period inside the counter object, to simulate some significant amount of processing that you might want to do on some piece of data.
The result this time is:

```
21
```
This is a disaster. Almost all the threads are created and read the data before any have had time to modify the result and save the value back to the shared variable.
So, how do we fix it?
```ruby
class SaferCounter
  attr_accessor :countup

  def initialize
    @countup = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize do
      temp = @countup
      sleep 0.1
      @countup = temp + 1
    end
  end
end
```
We can create a mutually exclusive lock and synchronize the code, so that the reading, modifying and writing of the shared variable can only be done by one thread at a time. All other threads have to wait their turn.
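Under the hood, `Mutex#synchronize` is roughly equivalent to the manual lock/unlock pattern below (my own sketch). The `ensure` block is the important part: the lock is released even if the critical section raises, so one failing thread can't leave every other thread blocked forever.

```ruby
lock = Mutex.new
count = 0

begin
  lock.lock        # block here until no other thread holds the lock
  count += 1       # critical section: only one thread at a time runs this
ensure
  lock.unlock      # always release, even if the critical section raised
end

puts count
```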
This time, the result is:

```
8000
```
…if you have the patience to wait the 13m20s for the program to execute (8000 threads × 0.1s inside the lock = 800s). Since there is a sleep within each synchronised section, and the mutex means that only one thread at a time can execute it, the whole thing takes ages! This highlights MRI's shortcomings when it comes to processing threads that are CPU-bound.
OK, one last concept to demonstrate.
We’ll simulate an IO bottleneck by adding another wait period in the counter. Let’s pretend that that code is simulating waiting for a response from an http service or a user input. This input is not shared, and therefore doesn’t need to be within the synchronised bit of code. Also, let’s adjust the timings of the sleep calls, so that our test is a bit more bearable.
io_simulation_counter_example.rb
```ruby
class IOCounter
  attr_accessor :countup

  def initialize
    @countup = 0
    @lock = Mutex.new
  end

  def increment
    sleep 0.1 # simulated IO wait, outside the lock
    @lock.synchronize do
      temp = @countup
      sleep 0.001
      @countup = temp + 1
    end
  end
end

countup = IOCounter.new
threads = []
8000.times do
  threads << Thread.new { countup.increment; sleep 1 }
end
threads.map(&:join)

puts countup.countup
```
Here is a representative example of the results with some timing information.
```
⚙[~/blog]>posts/ruby_concurrency ●> rbenv local 2.7.0
⚙[~/blog]>posts/ruby_concurrency ●> time ruby io_simulation_counter.rb
8000
ruby test2.rb 0.59s user 12.19s system 80% cpu 15.782 total
⚙[~/blog]>posts/ruby_concurrency ●> rbenv local jruby-9.2.10.0
⚙[~/blog]>posts/ruby_concurrency ●> time ruby io_simulation_counter.rb
8000
ruby test2.rb 10.44s user 5.57s system 106% cpu 15.019 total
```
Even in a little test like this, JRuby manages to take advantage of native threads to run on more than 1 CPU and execute faster.
For a nicely thorough look at multi-threading, see this datanorris description
Conclusion
Threads allow your software to make more efficient use of IO, and can increase throughput on a single CPU core. Multiple processes allow you to utilize more than one CPU core at once. A combination of both allows for both concurrent and parallel computation, and the most effective use of the CPU resources available to you.
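That hybrid pattern can be sketched in a few lines (my own illustration — the 2×3 sizes are arbitrary, just to keep the example quick): fork a handful of processes so the OS can use multiple cores, and inside each process run several threads that overlap their (simulated) IO waits.

```ruby
# Two worker processes, each running three IO-ish threads.
pids = 2.times.map do
  Process.fork do
    threads = 3.times.map do |t|
      Thread.new do
        sleep 0.1 # stands in for a blocking IO call
        puts "process #{Process.pid} thread #{t} done"
      end
    end
    threads.each(&:join)
  end
end

statuses = Process.waitall
puts "#{statuses.size} worker processes finished"
```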
This post has gone on longer than I expected, so I’ll leave a look at the ruby community’s most common approaches to make best use of concurrent and parallel processing to the next post.
- Introduction
- Concurrency vs. Parallelism
- Processes
- Threads
- Fibers
- Synchronicity
- Ruby up to 2.7
- MRI Ruby
- JRuby
- Rubinius
- TruffleRuby
- Current Concurrency Paradigms
- Queues and Jobs
- Communicating Sequential Processes
- Actor Model
- Reactor Model
- Ruby 3.0 Concurrency and Parallelism
Original image & license information
---
Is this true? Let’s do a quick check.
```ruby
require 'benchmark'

counter = Counter.new
structcounter = StructCounter.new(0)

Benchmark.bmbm do |b|
  b.report('counter') do
    1_000_000.times { counter.increment }
  end
  b.report('struct') do
    1_000_000.times { structcounter.increment }
  end
end
```
```
Rehearsal -------------------------------------------
counter   0.059883   0.000218   0.060101 (  0.060268)
struct    0.094432   0.000519   0.094951 (  0.095450)
---------------------------------- total: 0.155052sec

              user     system      total        real
counter   0.059973   0.000259   0.060232 (  0.060450)
struct    0.091966   0.000209   0.092175 (  0.092393)
```
So… maybe? ¯\\\_(ツ)\_/¯
Concurrency is hard. ↩