Functional enumerators in Ruby

Ruby is a beautifully designed object-oriented programming language with heritage in Smalltalk, Perl, and Lisp among others. People are attracted to many different features in Ruby: almost everything in Ruby is an object, we have terse constructs for iterating through collections, and it's friendly to beginners and experts alike. One of the reasons people love programming in Ruby is the Enumerable module. While developers can loop through collections using more traditional constructs like for and while loops, idiomatic Ruby typically uses the methods found in Enumerable.

Enumerable adds a lot of methods to the class where it's included, and all of these methods are based on the each method. I've read (and also written) a lot of iterative code using each when there's a more elegant, explicit, and performant method that I could have used. Using the Array and Range classes as examples, I'll start with a brief overview of each and map and conclude with an overview of reduce.
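To make the dependence on each concrete, here is a minimal sketch. Playlist is a hypothetical class invented for this illustration (it is not part of Ruby or any library). It defines only each and includes Enumerable, and map, select, and reduce then work without any further code:

```ruby
# Hypothetical class for illustration: it only knows how to yield its songs.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # Enumerable builds its other methods on top of this one.
  def each
    return enum_for(:each) unless block_given?

    @songs.each { |song| yield song }
  end
end

playlist = Playlist.new("intro", "verse", "chorus")

upcased = playlist.map(&:upcase)                                 # ["INTRO", "VERSE", "CHORUS"]
long    = playlist.select { |song| song.length > 5 }             # ["chorus"]
total   = playlist.reduce(0) { |sum, song| sum + song.length }   # 16
```

Array and Range follow the same pattern: each supplies its own each, and Enumerable provides the rest.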

What makes some of the iterators functional?

One of the many principles of functional programming is higher order functions. For a function to be a "higher order" function, it needs to either receive a function as an argument or return a function as the return value. The main idea is using functions as values.

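In JavaScript we call these "anonymous functions" while in Ruby we call them lambdas and procs. In most functional languages I've surveyed and explored, they're just called lambdas. It's outside the scope of this post to go into detail on lambdas and procs in Ruby but there are a ton of great resources online that explain what they are and how to use them. The important point for this post is that Enumerable's methods take a block, which is Ruby's lightweight syntax for passing an anonymous function to a method. A method that receives a block is receiving a function as an argument, and that is what makes it a higher order function.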
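As a small sketch of both halves of that definition, the snippet below passes a function to map and also returns one from a method. The names increment, adder, and add_ten are made up for this example:

```ruby
# A lambda is a function stored in a variable.
increment = ->(n) { n + 1 }

# Receiving a function: & hands the lambda to map as its block.
incremented = (0..5).map(&increment)   # [1, 2, 3, 4, 5, 6]

# Returning a function: adder is a hypothetical helper that builds a new lambda.
def adder(amount)
  ->(n) { n + amount }
end

add_ten = adder(10)
shifted = [1, 2, 3].map(&add_ten)      # [11, 12, 13]
```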

Now that we understand why some of Enumerable's iterators are functional, let's look at how to use them in our code.

Iterating with each

As I mentioned, Ruby allows developers to iterate through collections with traditional, imperative-style constructs like while and for. For example, using a Range with for:

$ irb
irb(main):001:0> for n in 0..5
irb(main):002:1>   puts n + 1
irb(main):003:1> end
1
2
3
4
5
6
=> 0..5

We have a range of integers from 0 to 5 and we print each number plus 1 to the console. This works just fine. This is how we would get the same result using each:

$ irb
irb(main):001:0> (0..5).each do |n|
irb(main):002:1*   puts n + 1
irb(main):003:1> end
1
2
3
4
5
6
=> 0..5

This does the exact same thing as the code with for: we print the values of every element in the collection incremented by 1. So why do many Rubyists say using each is better than for? Personally, I think it has to do with the mindset of the developer. Using each makes me think more about what I want the computer to do and less about how I want it to perform the operation. Thinking about the result I want creates a layer of abstraction so I can focus on the bigger picture of my current task.

This additional layer of abstraction can be shown by using another one of Enumerable's methods: map.

Using Enumerable's map

We can print each value of an array with each, but what if we wanted to save these incremented values into a new array? We know we should use each to iterate over the collection instead of for, so let's take a look at one approach to saving the incremented values using each:

$ irb
irb(main):001:0> incremented = []
=> []
irb(main):002:0> (0..5).each do |n|
irb(main):003:1*   incremented << n + 1
irb(main):004:1> end
=> 0..5
irb(main):005:0> incremented
=> [1, 2, 3, 4, 5, 6]

This works. We start with a variable to hold our new collection (incremented), iterate through each number in the range, and append each incremented number to the new array. However, in writing this example I found myself caught in the details of performing each step of the task instead of simply requesting the result I wanted: an array of incremented integers. From the world of functional programming, Enumerable provides map (also aliased as collect) to better manage this:

$ irb
irb(main):001:0> incremented = (0..5).map do |n|
irb(main):002:1*   n + 1
irb(main):003:1> end
=> [1, 2, 3, 4, 5, 6]

The result of calling map on the collection is exactly what we're looking for. In this case we save it to the incremented variable. Also, because map is part of Ruby's core library, we benefit from whatever tuning it has received without any extra work on our part. I don't care how the computer creates this new Array, I just care that each value is incremented by 1.

map also offers a more succinct way of expressing this operation, using Symbol#to_proc and Integer's succ method (you can also use next if you prefer):

$ irb
irb(main):001:0> incremented = (0..5).map(&:succ)
=> [1, 2, 3, 4, 5, 6]
irb(main):002:0> %w(foo bar baz).map(&:capitalize)
=> ["Foo", "Bar", "Baz"]

In case you haven't seen it yet, you can use the unary & operator with a symbol as the method name to call on each member of the collection. I won't go into the details here but you can check out the Symbol#to_proc documentation. (Keep in mind this only works if every element in the collection responds to the method. Otherwise you'll get a NoMethodError.)
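As a rough illustration of what the shorthand stands for, the three calls below produce the same result. The last part shows the NoMethodError case, using a made-up mixed array:

```ruby
words = %w(foo bar baz)

# Three equivalent ways of capitalizing each word.
with_block  = words.map { |word| word.capitalize }
with_symbol = words.map(&:capitalize)
with_proc   = words.map(&:capitalize.to_proc)   # roughly what & does with a symbol

# An element that doesn't respond to the method raises NoMethodError.
error = begin
  [1, "two"].map(&:capitalize)
rescue NoMethodError => e
  e
end
```

Here error holds the exception raised when capitalize was sent to the integer 1.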

This is pretty great! We can take a collection and easily create another collection with some modified values. In the same vein as map, Enumerable provides select, reject, and others that build a new collection from an existing one by applying the same block to each of its elements. Its relative find works the same way but returns only the first matching element rather than a collection.
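A quick sketch of a few of these methods on the same 0..5 range. Note that select and reject return arrays, while find returns a single element (or nil when nothing matches):

```ruby
numbers = (0..5)

evens      = numbers.select(&:even?)        # [0, 2, 4]
odds       = numbers.reject(&:even?)        # [1, 3, 5]
first_big  = numbers.find { |n| n > 3 }     # 4
none_found = numbers.find { |n| n > 10 }    # nil
```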

Sometimes we don't want a new collection, though. Instead, we want a single value back. Ruby again borrows from the functional programming world to provide reduce (aliased as inject): we take a collection and reduce all of its values into a single value. Let's take a look at it.

Using reduce

I'll start with an example:

$ irb
irb(main):001:0> (0..5).reduce { |accumulator, n| accumulator + n }
=> 15
irb(main):002:0> (0..5).reduce(0) { |accumulator, n| accumulator + n }
=> 15
irb(main):003:0> (0..5).reduce(&:+)
=> 15
irb(main):004:0> (0..5).reduce(5, &:+)
=> 20

Here we take the same range we've been using, 0..5, and add all of its numbers together to produce a single integer: 15. Let's look into the first example with the block a bit more: (0..5).reduce { |accumulator, n| accumulator + n }.

In this form, reduce doesn't take any arguments except the block. The block must declare an accumulator variable (I usually call it acc but opted for the more descriptive accumulator name in this case) and a variable for the member of the collection for each iteration, n in this case. Because no initial value is given, reduce uses the first element of the collection (0 here) as the initial value of accumulator and starts iterating from the second element. On each iteration, reduce sets the value of accumulator to the result of the block, in this case adding the current value of accumulator to n. This happens until all the values in the collection have been reached.

In the second form of the method (when 0 is passed to reduce), 0 is the initial value for accumulator, which is why the result is the same as the example above it.

The last two forms use the more terse (and I'd argue more readable) Symbol#to_proc syntax, one without explicitly setting the accumulator's initial value and one that sets the initial value to 5.
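To see what happens on each step, here is a hand-written sketch of the same idea. my_reduce is a hypothetical helper written for this post, not how Ruby implements reduce internally. It only mirrors the behavior described above, including seeding the accumulator with the first element when no initial value is passed:

```ruby
# Hypothetical stand-in for reduce, written with each to expose the steps.
def my_reduce(collection, *initial)
  items = collection.to_a.dup   # copy so the caller's collection is untouched
  # Without an initial value, the first element seeds the accumulator.
  accumulator = initial.empty? ? items.shift : initial.first
  items.each do |n|
    accumulator = yield(accumulator, n)
  end
  accumulator
end

sum        = my_reduce(0..5) { |acc, n| acc + n }      # 15
sum_from_5 = my_reduce(0..5, 5) { |acc, n| acc + n }   # 20

# Recording the block arguments of the real reduce shows the same progression.
steps = []
(0..5).reduce { |acc, n| steps << [acc, n]; acc + n }
# steps is [[0, 1], [1, 2], [3, 3], [6, 4], [10, 5]]
```

The first pair in steps is [0, 1]: the block is never called with the first element as n, because that element became the starting accumulator.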

Here's another way to get the same result using each:

$ irb
irb(main):001:0> accumulator = 0
=> 0
irb(main):002:0> result = (0..5).each do |n|
irb(main):003:1*   accumulator = accumulator + n
irb(main):004:1> end
=> 0..5
irb(main):005:0> accumulator
=> 15

As with our example for using map instead of each (when appropriate), using reduce frees us from telling the computer how to do the calculation. We tell it what we want in the end: the sum of every integer in the collection.

One more thing: Lazy Enumeration

Using a dynamically typed language like Ruby, we can create arrays with values of mixed types. We also encourage duck typing over checking an object's type. However, this language feature and programming best practice can create some unexpected consequences. For example, the + method is defined for both Strings and Integers but they can't be mixed. If we have an array of both Integers and Strings and call reduce on the array with +, we get a TypeError:

$ irb
irb(main):001:0> mixed = [1, 2, "foo", 3, 4, "bar", 5]
=> [1, 2, "foo", 3, 4, "bar", 5]
irb(main):002:0> mixed.reduce(&:+)
TypeError: String can't be coerced into Fixnum
        from (irb):6:in `+'
        from (irb):6:in `each'
        from (irb):6:in `reduce'
        from (irb):6
        from /Users/cmoel/.rubies/ruby-2.2.3/bin/irb:11:in `<main>'

This happens because + is defined for both Integers and Strings but it has a different (and incompatible) meaning for each class: addition for Integer and concatenation for String. While the solution to this problem is completely dependent on the needs of the application, we'll say we do not need the String values and will only keep the Integer values. We can do this with select:

$ irb
irb(main):001:0> mixed = [1, 2, "foo", 3, 4, "bar", 5]
=> [1, 2, "foo", 3, 4, "bar", 5]
irb(main):002:0> mixed.select{ |n| n.is_a?(Integer) }
=> [1, 2, 3, 4, 5]

Now we can do the same reduce we did previously:

$ irb
irb(main):001:0> mixed = [1, 2, "foo", 3, 4, "bar", 5]
=> [1, 2, "foo", 3, 4, "bar", 5]
irb(main):002:0> mixed.select{ |n| n.is_a?(Integer) }.reduce(&:+)
=> 15

There's a problem here, though. Each time we use one of the Enumerable functions, we fully traverse the collection. In the last example, we run through the data twice: select traverses mixed and builds an intermediate array, and reduce then traverses that array. If mixed were a very large collection, those intermediate arrays could use a lot of memory, and every step would have to finish before the next one could start. To address this, Ruby 2.0 introduced Enumerator::Lazy. I won't give a full explanation in this post, but the gist is to call lazy on the collection before using any Enumerable methods:

$ irb
irb(main):001:0> mixed = [1, 2, "foo", 3, 4, "bar", 5]
=> [1, 2, "foo", 3, 4, "bar", 5]
irb(main):002:0> mixed.lazy.select{ |n| n.is_a?(Integer) }.reduce(&:+)
=> 15

Removing the call to reduce, we can see what we're actually building:

$ irb
irb(main):001:0> mixed = [1, 2, "foo", 3, 4, "bar", 5]
=> [1, 2, "foo", 3, 4, "bar", 5]
irb(main):002:0> mixed.lazy.select{ |n| n.is_a?(Integer) }
=> #<Enumerator::Lazy: #<Enumerator::Lazy: [1, 2, "foo", 3, 4, "bar", 5]>:select>

Calling reduce forces our Enumerator::Lazy to evaluate and perform the reduction. If we simply wanted a new array, we could call to_a on the Enumerator::Lazy to force evaluation. Check out the docs for more details. Pat Shaughnessy has also written a very detailed blog post on Enumerator::Lazy that I wholeheartedly recommend.
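One way to observe the difference is to record the order in which the blocks run. In this sketch (eager_log and lazy_log are names made up for the illustration), the eager chain finishes select for the whole range before map starts, while the lazy chain passes each element through the whole chain before moving to the next one:

```ruby
eager_log = []
eager = (1..4).select { |n| eager_log << "select #{n}"; n.even? }
              .map    { |n| eager_log << "map #{n}"; n * 10 }
# eager_log: select 1, select 2, select 3, select 4, map 2, map 4

lazy_log = []
lazy = (1..4).lazy
             .select { |n| lazy_log << "select #{n}"; n.even? }
             .map    { |n| lazy_log << "map #{n}"; n * 10 }
             .to_a
# lazy_log: select 1, select 2, map 2, select 3, select 4, map 4

# Because elements are pulled one at a time, a lazy chain can even
# start from an infinite range and stop once it has enough results.
first_evens = (1..Float::INFINITY).lazy.select(&:even?).first(3)   # [2, 4, 6]
```

Both chains produce [20, 40]; only the order of the work differs.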

Conclusion

Ruby allows developers to write clear, concise code following object-oriented principles and using procedural or functional principles as well. To me, using the more functionally-inspired methods from Enumerable makes the developer's intent clearer without getting stuck in the details. It also allows developers to think at a higher level about what they want instead of telling the computer how to create what they want, making for more declarative code.

What do you think? Are there any Enumerable methods you tend to reach for over others? Which ones are your favorites?

Categories: Software Development | Tags: Ruby, Functional programming

Christopher Moeller

Christopher is a self-taught developer and has been working with Ruby and Ruby on Rails since 2010. He enjoys building the simple, elegant solution to the current problem he's solving and has recently picked up functional programming, mostly with the Elixir programming language.
