Ruby & the Multicore CPU: Part Three - Processes and Threads
You can see all these code examples on github
Multiple processes
The simplest form of parallel processing is to simply create more processes. The OS will deal with spreading the processing across your available CPUs, and you don’t have to worry about shared memory, deadlocks, livelocks, or any of that stuff.
If you want to create separate processes from within your Ruby code, you can use `Process.fork`. This creates a copy of your current process (minus any other running threads) and executes the provided block in the child.
fork_example.rb
```ruby
4.times { |i|
  Process.fork {
    puts "Process.pid: #{Process.pid} Num: #{i}"
  }
  puts "Process.pid: #{Process.pid} Num: #{i}"
}
```
```
Process.pid: 88899 Num: 0
Process.pid: 88899 Num: 1
Process.pid: 88914 Num: 0
Process.pid: 88899 Num: 2
Process.pid: 88915 Num: 1
Process.pid: 88899 Num: 3
Process.pid: 88916 Num: 2
Process.pid: 88917 Num: 3
```
In fact, if you just want to execute some other arbitrary code on your system, you can use `Process.spawn` or `Kernel.system`.
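As a quick illustration of the spawn approach (this snippet is my own sketch, not from the examples repo — the command it runs is just a placeholder): `Process.spawn` starts a command asynchronously and returns its pid, and `Process.wait2` collects the exit status.

```ruby
# Process.spawn does not block: it returns the child's pid immediately.
pid = Process.spawn("echo hello from a child process")

# Process.wait2 blocks until that child exits and returns [pid, status].
finished_pid, status = Process.wait2(pid)
puts "child #{finished_pid} exited with status #{status.exitstatus}"
```

`Kernel.system` is the blocking equivalent: it runs the command, waits, and returns true or false based on the exit status.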
Processes, though, are the heavyweight option: they are slow to create, context switching between them is expensive, and they use the most memory. Before Ruby 3.0, however, multiple processes were the only mechanism that could make use of more than one CPU core.
Unicorn is an example of concurrent and parallel processing using separate processes. I believe that Spring also forks processes to create instances of Rails that sit waiting to process your tests and rails commands, so you don't have to wait for the VM to spin up.
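The Unicorn-style pattern can be sketched in a few lines (my own illustration, with arbitrary worker counts and busy work). Note that forked children get a copy-on-write view of memory: they cannot write results back into the parent's variables, so real code would hand results back through pipes, files, or a job queue.

```ruby
# Fork a few worker processes; each does CPU-bound work independently,
# so the OS can schedule them across separate cores.
pids = 4.times.map do |i|
  Process.fork do
    sum = (1..2_000_000).sum # CPU-bound busy work, local to this child
    puts "worker #{i} (pid #{Process.pid}) computed #{sum}"
  end
end

# Block until every child has exited; returns [[pid, status], ...].
statuses = Process.waitall
puts "all #{statuses.size} workers finished"
```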
Threads & Mutexes
```ruby
websites = ['google.com', 'reddit.com', 'news.bbc.co.uk', 'youtube.com', 'github.com']

threads = []
websites.each do |site|
  threads << Thread.new { Webcrawler.parse(site) }
end
threads.each(&:join)
```
This simple example illustrates the power of the Ruby thread. Instead of making an HTTP request to each site in turn, waiting for a response, then parsing it, you can make all 5 requests at once and start parsing as soon as the first response is received.
However, what does `Webcrawler.parse` do? Is it thread-safe? It's a class method. Does that method write to any class variables? What happens if two threads write to those variables at the same time?
MRI ruby threads, whilst not useful for increasing the performance of CPU-bound tasks, can help to maximize the performance of IO-bound tasks. They are not easy to use safely, though.
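To see the IO-bound benefit concretely, here is a small timing sketch of my own, using `sleep` to stand in for a blocking HTTP request. Run sequentially, five 0.2-second waits would take about a second; threaded, they overlap, because MRI releases the GIL while a thread is blocked.

```ruby
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Each thread "waits on IO" for 0.2s; the sleeps run concurrently.
threads = 5.times.map { Thread.new { sleep 0.2 } }
threads.each(&:join)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format("elapsed: %.2fs", elapsed) # well under the 1s a sequential run would take
```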
Let’s have a look at some more examples, and do a little quiz.
Have a look at the following code and predict what will happen. What will the output look like? What will the last value printed be?
```ruby
class Counter
  def initialize
    @countup = 0
  end

  def increment
    @countup += 1
  end

  def countup
    @countup
  end
end

countup = Counter.new
threads = []
8000.times do
  threads << Thread.new { countup.increment; sleep 1 }
end
threads.map(&:join)

puts countup.countup
```
Try running the code and see if you were correct. Then run the code 10 more times. Is the result always the same?
Here is my result running on ruby 2.7.0p0:

```
8000
```
You might expect that 8000 threads would be enough opportunities for a threading problem to occur. So is this code OK? I’ve run this a bunch of times, and the final output is always 8000.
Is that what you expected? Why did it work OK? Luck? Is += an atomic operation? Did the GIL save us?
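We can actually probe the "is `+=` atomic?" question by peeking at the bytecode — this is my own aside, and it is MRI-specific, since `RubyVM` doesn't exist on JRuby. The disassembly shows that `@countup += 1` compiles to a separate read, add, and write, so in principle another thread could run between the read and the write.

```ruby
# MRI-only: disassemble `@countup += 1` to show it is not one instruction.
# The output includes getinstancevariable (read), an addition (opt_plus),
# and setinstancevariable (write) as distinct steps.
puts RubyVM::InstructionSequence.compile("@countup += 1").disasm
```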
Also note that even though each of the 8000 threads includes a `sleep 1` instruction, and the GIL is in effect because this is ruby 2.7.0, the program did not take 8000 seconds to complete. This is because a thread will yield on blocking IO and sleep instructions, allowing all the other threads to be created and yield to their own sleep instructions in turn.
OK. Next example. We’ll change the implementation of the countup object.
```ruby
StructCounter = Struct.new(:countup) do
  def increment
    self.countup += 1
  end
end

countup = StructCounter.new(0)
threads = []
8000.times do
  threads << Thread.new { countup.increment; sleep 1 }
end
threads.map(&:join)

puts countup.countup
```
Here, we’ve only changed the object doing the counting. The thread loop is identical. Should this change make any difference? Again, I encourage you to think about what result you expect to see, then run the code and see what you get.
Here is the output I got, again on ruby 2.7.0p0:

```
7974
```
So, this time, threading has caused a race condition. Any idea what caused it? 26 errors out of 8000 threads is quite a small error rate, just over 0.3%, which could easily be missed without a good test.
My theory as to why the first example seems to work and the second doesn't is that a Struct object takes more CPU cycles to access and modify than the original class. This would mean that each thread is around for a longer time, allowing greater opportunity for a race condition to occur.1
The first example never exhibited a problem whenever I tested it. However, neither example is actually thread-safe. I’ve no reason to believe that the underlying MRI implementation of Class vs Struct makes the former any more thread safe. So, you might write code like that and only ever have a problem every now and then in production, with no way to track down what is occurring.
I think this highlights how treacherous writing threaded code can be, but let’s finish with an egregious example and crank up the time that the counter object is spending executing the threaded code.
```ruby
class UnsafeCounter
  attr_accessor :countup

  def initialize
    @countup = 0
  end

  def increment
    temp = @countup
    sleep 0.1
    @countup = temp + 1
  end
end

countup = UnsafeCounter.new
threads = []
8000.times do
  threads << Thread.new { countup.increment; sleep 1 }
end
threads.map(&:join)

puts countup.countup
```
In this example, we’re adding a sleep period inside the counter object, to simulate some significant amount of processing that you might want to do on some piece of data.
The result this time is:

```
21
```
This is a disaster. Almost all the threads are created and read the data before any have had time to modify the result and save the value back to the shared variable.
So, how do we fix it?
```ruby
class SaferCounter
  attr_accessor :countup

  def initialize
    @countup = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize do
      temp = @countup
      sleep 0.1
      @countup = temp + 1
    end
  end
end
```
We can create a mutually exclusive lock and synchronize the code, so that the reading, modifying and writing of the shared variable can only be done by one thread at a time. All other threads have to wait their turn.
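Under the hood, `Mutex#synchronize` is roughly equivalent to the manual lock/unlock pattern below (my own sketch). The `ensure` block is the important part: the lock is released even if the critical section raises, so one failing thread can't leave every other thread blocked forever.

```ruby
lock = Mutex.new
count = 0

begin
  lock.lock        # block here until no other thread holds the lock
  count += 1       # critical section: only one thread at a time runs this
ensure
  lock.unlock      # always release, even if the critical section raised
end

puts count
```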
This time, the result is:

```
8000
```
…if you have the patience to wait the 13m20s for the program to execute (8000 threads × 0.1s inside the lock = 800s). Since there is a sleep within each synchronised section, and the mutex means that only one thread at a time can execute it, the whole thing takes ages! This highlights MRI's shortcomings when it comes to processing threads that are CPU-bound.
OK, one last concept to demonstrate.
We’ll simulate an IO bottleneck by adding another wait period in the counter. Let’s pretend that that code is simulating waiting for a response from an http service or a user input. This input is not shared, and therefore doesn’t need to be within the synchronised bit of code. Also, let’s adjust the timings of the sleep calls, so that our test is a bit more bearable.
io_simulation_counter_example.rb
```ruby
class IOCounter
  attr_accessor :countup

  def initialize
    @countup = 0
    @lock = Mutex.new
  end

  def increment
    sleep 0.1 # simulated IO wait, outside the lock
    @lock.synchronize do
      temp = @countup
      sleep 0.001
      @countup = temp + 1
    end
  end
end

countup = IOCounter.new
threads = []
8000.times do
  threads << Thread.new { countup.increment; sleep 1 }
end
threads.map(&:join)

puts countup.countup
```
Here is a representative example of the results with some timing information.
```
⚙[~/blog]>posts/ruby_concurrency ●> rbenv local 2.7.0
⚙[~/blog]>posts/ruby_concurrency ●> time ruby io_simulation_counter.rb
8000
ruby test2.rb 0.59s user 12.19s system 80% cpu 15.782 total
⚙[~/blog]>posts/ruby_concurrency ●> rbenv local jruby-9.2.10.0
⚙[~/blog]>posts/ruby_concurrency ●> time ruby io_simulation_counter.rb
8000
ruby test2.rb 10.44s user 5.57s system 106% cpu 15.019 total
```
Even in a little test like this, JRuby manages to take advantage of native threads to run on more than 1 CPU and execute faster.
For a nicely thorough look at multi-threading, see this datanorris description
Conclusion
Threads allow your software to make more efficient use of IO, and can increase throughput on a single CPU core. Multiple processes allow you to utilize more than one CPU core at once. A combination of both allows for both concurrent and parallel computation, and the most effective use of the CPU resources available to you.
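That hybrid pattern can be sketched in a few lines (my own illustration — the 2×3 sizes are arbitrary, just to keep the example quick): fork a handful of processes so the OS can use multiple cores, and inside each process run several threads that overlap their (simulated) IO waits.

```ruby
# Two worker processes, each running three IO-ish threads.
pids = 2.times.map do
  Process.fork do
    threads = 3.times.map do |t|
      Thread.new do
        sleep 0.1 # stands in for a blocking IO call
        puts "process #{Process.pid} thread #{t} done"
      end
    end
    threads.each(&:join)
  end
end

statuses = Process.waitall
puts "#{statuses.size} worker processes finished"
```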
This post has gone on longer than I expected, so I’ll leave a look at the ruby community’s most common approaches to make best use of concurrent and parallel processing to the next post.
- Introduction
- Concurrency vs. Parallelism
- Processes
- Threads
- Fibers
- Synchronicity
- Ruby up to 2.7
- MRI Ruby
- JRuby
- Rubinius
- TruffleRuby
- Current Concurrency Paradigms
- Queues and Jobs
- Communicating Sequential Processes
- Actor Model
- Reactor Model
- Ruby 3.0 Concurrency and Parallelism
Original image & license information
---
Is this true? Let’s do a quick check.
```ruby
require 'benchmark'

counter = Counter.new
structcounter = StructCounter.new(0)

Benchmark.bmbm do |b|
  b.report('counter') do
    1_000_000.times { counter.increment }
  end
  b.report('struct') do
    1_000_000.times { structcounter.increment }
  end
end
```
```
Rehearsal -------------------------------------------
counter   0.059883   0.000218   0.060101 (  0.060268)
struct    0.094432   0.000519   0.094951 (  0.095450)
---------------------------------- total: 0.155052sec

              user     system      total        real
counter   0.059973   0.000259   0.060232 (  0.060450)
struct    0.091966   0.000209   0.092175 (  0.092393)
```
So… maybe? ¯\\\_(ツ)\_/¯
Concurrency is hard. ↩