Question For Every Ruby (on Rails) Developer - Is it pass by value or pass by reference?

Before you begin

Before you answer this question, consider whether you have ever thought about how parameters are passed around in Ruby. If you have used Ruby for a long time without giving it a thought, settle on an answer in your mind before you read further.

People saying pass by value - you are wrong.

People saying pass by reference - you are also wrong.

Then? Read on.

Ruby and its essence

In Ruby everything is an object. Even an integer, a primitive value in many other languages, is an object. Variables are nothing but references to objects' memory locations. Consider the following:

a = { :message => 'success' }

Here a is a variable that points to a memory location in Ruby's object space. You can see that by doing this:

2.7.0 :039 > a.object_id

 => 320 

Remember that every object has an object_id, so a variable always refers to something with an id, even when its value is nil.

2.7.0 :040 > a = nil

2.7.0 :041 > a.object_id

 => 8

For objects that we create, the object_id is not fixed: it is assigned at runtime, so the same program can produce different ids on different runs of the Ruby process. But certain objects, such as false, true, nil and integers up to a limit, always have the same object_id. You may ask why. MRI handles these values specially so that their object_id never changes. I will cover that in another article, because it would take this discussion somewhere else. For now, remember this: the object_id is different for every object that we create, except for false, true, nil and some integers.
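Here is a small sketch of that rule, assuming MRI. The helper same_id? is a name made up for this illustration and is not part of Ruby itself.

```ruby
# Illustrative helper (not from Ruby's standard library): reports whether
# two references share one object_id.
def same_id?(left, right)
  left.object_id == right.object_id
end

puts same_id?(nil, nil)     # nil is one shared object
puts same_id?(true, true)   # so is true
puts same_id?(42, 42)       # small integers behave the same way in MRI

# Two separately created strings with equal content are distinct objects,
# so each one gets its own object_id.
puts same_id?(String.new('hello'), String.new('hello'))
```

The first three lines should print true and the last one false, because each String.new call allocates a fresh object.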

Heap, stack, what?

If you know how application (i.e., process) memory is allocated, you know that two kinds of memory are used: the stack and the heap. The stack is used for method calls: values are pushed onto it during method invocation and popped after the method returns (I am keeping it in very simple terms). The heap is used for bigger objects: typically a pointer is stored on the stack while the memory itself is allocated on the heap. A C program is a good way to explain this.

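#include <stdlib.h>
int main() {
  int a = 42;                      /* the value lives on the stack */
  int *p;                          /* the pointer itself lives on the stack */
  p = (int*)malloc(sizeof(int));   /* the block it points to lives on the heap */
  free(p);                         /* release the heap block */
  return 0;
}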

This program would approximately create a memory space like this:


The stack gets the variables a and p, in that order. Since a contains an integer value, it is stored on the stack. But p is a pointer: it holds the address of a memory block on the heap, and that block contains the actual data.

This is how it typically happens in a C program.

Let's consider how this differs in Ruby. The memory space of a Ruby process is organized a little differently. It consists of a heap that can be further divided into two types. Please remember that I am talking about MRI (the Ruby implementation written in C); other implementations of Ruby manage memory differently.

You can see that there's a managed heap and an unmanaged heap. It helps to understand what happens inside the managed heap, so let's focus on it for a moment. Also notice that no stack is being used here. You will understand why in a little bit.

Ruby's managed heap consists of pages and each page consists of slots of 40 bytes each. It's easier to explain this using a diagram:

Pages have a fixed size, usually 16 KiB, and each page contains slots of 40 bytes. So a page contains 408 or 409 slots (why the count differs in some cases is for another post). So let's see how pages look.

A page can be full, partially used or totally free. Slots are used when objects are allocated. So when you say `a = 'hello'`, a free slot is found and the value is stored in that slot. The slot is now occupied.
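If you want to peek at pages and slots yourself, MRI exposes counters through GC.stat. The following is only a rough sketch: heap_snapshot is a hypothetical helper, and I am assuming the keys heap_allocated_pages, heap_live_slots and heap_free_slots are present on your Ruby version.

```ruby
# Hypothetical helper: picks a few slot-related counters out of GC.stat.
# Looking the keys up on the returned Hash yields nil rather than raising
# if a key happens to be missing on some Ruby version.
def heap_snapshot
  stats = GC.stat
  {
    pages: stats[:heap_allocated_pages],
    live_slots: stats[:heap_live_slots],
    free_slots: stats[:heap_free_slots]
  }
end

before = heap_snapshot
# Keep references to the new strings so they are not collected right away.
strings = Array.new(10_000) { |i| "hello #{i}" }
after = heap_snapshot

puts "allocated #{strings.size} strings"
puts "before: #{before}"
puts "after:  #{after}"
```

The exact numbers depend on the Ruby version and on when the garbage collector runs, so treat them as a rough indication rather than a precise count.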

So let's consider this Ruby program:

a = 'hello'
b = {}
c = { message: 'hi' }
d = User.new(email: '', password: 'somethingrandom')

All these objects are stored in Ruby's managed heap. So the variable a could point to an object_id of, let's say, 40. b could point to the next free slot at 80, c to the free slot at 160, and d to 240.

You may ask how an entire object of the User class fits within 40 bytes. The answer is that it is not stored entirely in the managed heap. Rather, memory is allocated in the unmanaged heap and that memory location is stored in the slot. The slot thus acts as a reference to the data in the unmanaged heap.

So the rule of thumb is this: if the object fits within the 40-byte slot, it is stored there. Otherwise memory is allocated for it in the unmanaged heap and the data is stored there, while the slot keeps the reference.

A quick note: a hash is stored in a slot, and that slot contains references to other slots for its key-value pairs. Similarly, small arrays are stored in the slot itself; otherwise the unmanaged heap is used.

So that is how Ruby's memory allocation works, in a nutshell.
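One way to observe this rule of thumb is ObjectSpace.memsize_of from the objspace standard library. This is a sketch assuming MRI; string_footprint is a hypothetical helper, and the exact byte counts vary between Ruby versions.

```ruby
require 'objspace'

# Hypothetical helper: the number of bytes MRI attributes to a string of the
# given length, as reported by ObjectSpace.memsize_of.
def string_footprint(length)
  ObjectSpace.memsize_of('x' * length)
end

puts string_footprint(5)        # a short string that fits in its slot
puts string_footprint(10_000)   # a long string: the slot plus a separately allocated buffer
```

The long string should report a much larger size than the short one, because its characters cannot fit inside a single slot.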

Reference or Value?

So when you call a method in Ruby, the parameters are copied from the source variables, but what gets copied are ultimately references to the slots. We can show it this way.

2.7.0 :042 > a = { :message => 'success' }
2.7.0 :043 > a.object_id
 => 340 
2.7.0 :044 > def test(val)
2.7.0 :045 >   puts val.object_id
2.7.0 :046 > end
 => :test 
2.7.0 :047 > test(a)
340
 => nil 

You can see that the parameter inside the method call also contains the same object_id as the outside variable that was passed to it. Both the parameter and the outside variable point to the same slot.
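Another way to check this, without comparing ids by hand, is equal?, which tests object identity rather than content. The method name same_object? below is made up for this sketch.

```ruby
# Hypothetical method: checks whether its parameter is the very same object
# as the given original. equal? compares identity, not content.
def same_object?(param, original)
  param.equal?(original)
end

a = { message: 'success' }
lookalike = { message: 'success' }

puts same_object?(a, a)          # true: the parameter and a refer to one object
puts a == lookalike              # true: the two hashes have equal content
puts same_object?(lookalike, a)  # false: equal content, but a different object
```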

Does this mean pass by reference? Actually no. 


2.7.0 :052 > a = { :message => 'success' }

2.7.0 :053 > a.object_id

 => 340 

2.7.0 :054 > def test(val)

2.7.0 :055 >   val = { :message => 'another value' }

2.7.0 :056 >   puts val.object_id

2.7.0 :057 > end

 => :test 

2.7.0 :058 > test(a)

360

 => nil 

When you reassign the parameter inside the method, a new object is allocated in a new slot. So the object_id changes to that of the next free slot (here 360 instead of the original 340). What does that mean? When the parameter is reassigned, it is pointed at a new slot, and the old slot (340) is never changed.

So pass by value? Actually no.

Consider this,

2.7.0 :059 > a = { :message => 'success' }

2.7.0 :060 > def test(val)

2.7.0 :061 >   val[:added] = 'yes'

2.7.0 :062 > end

 => :test 

2.7.0 :063 > a

 => {:message=>"success"} 

2.7.0 :064 > test(a)

 => "yes" 

2.7.0 :065 > a

 => {:message=>"success", :added=>"yes"}

You can see that the original hash changes when the parameter is modified. So why does this happen?

The answer is simple: during method invocation, Ruby copies the slot's object_id into the parameter. The parameter is a new variable, not a reference to the original variable. But since the new variable points to the same page slot, any modification you make to that object is also visible through the original variable. When you reassign the parameter to a new slot (or object), the original variable is not affected, because only the parameter's object_id changes, not the original variable's.
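The same distinction shows up with strings, where << modifies the object in place and += builds a new one. A minimal sketch, with method names invented for the illustration:

```ruby
# Hypothetical methods contrasting mutation with reassignment.
def append_in_place(text)
  text << ' world'    # mutates the object that the caller also refers to
end

def append_by_reassignment(text)
  text += ' world'    # builds a new String and rebinds only the local parameter
  text
end

greeting = String.new('hello')

append_by_reassignment(greeting)
puts greeting         # hello

append_in_place(greeting)
puts greeting         # hello world
```

Only the in-place version changes what the caller sees, which is the behaviour described above.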

So this is exactly what happens in Ruby. So what exactly is it? Some call it pass by reference value, others call it pass by object reference. But I call it pass by object_id. That makes it easier to remember.

Remember: It's pass by object_id 
