Simple Recommendation Module in Elixir

Recommendation module in Elixir

The first little elixir module I wanted to write was one which implemented the Jaccard index. There will only be three methods in the module.

  1. Calculate the index between 2 sets. Arity 2: Set, Set.
  2. Calculate the distance between the 2 sets. Arity 2: Set, Set.
  3. Return a set from a list of sets that is most similar to the set in the first argument. Arity 2: Set, List of sets.

Before we get started, check out this guide on the Elixir’s page to create your first project. Work through it to understand the structure of an Elixir project.

Let’s get started

We will start by writing some tests to practice TDD. First, we have to create our base sets.

# Initialise sets for testing
defp s1 do
   Enum.into(["a", "b", "c"], HashSet.new)
end

defp s2 do
  Enum.into(["a", "b"], HashSet.new)
end

defp s3 do
  Enum.into(["a", "c"], HashSet.new)
end

Method 1: Calculate the index between 2 sets. Arity 2: Set, Set.

Next, we will use these sets to test our index method. The first test will comapre s1 (set 1) to itself. By understanding how the Jaccard index works, we will return a value of 3 / 3. How did we derive this value?

Sets A and B have 3 items each.

Between these 2 sets, there are 3 similar items in both the sets:
Set A intersect Set B = 3

In these 2 sets, there are 3 unique items:
Set A union Set B = 3

Therefore the result should be 3 / 3

Using this method of thinking, We will create 3 tests.

test "index test 1" do
  result = Jaccard.index s1, s1
  assert result == 3 / 3
end

test "index test 2" do
  result = Jaccard.index s1, s2
  assert result == 2 / 3
end

test "index test 3" do
  result = Jaccard.index s2, s3
  assert result == 1 / 3
end

Run mix test to ensure our tests are working.

$ mix test

  1) test index test 1 (JaccardTest)
     test/jaccard_test.exs:17
     ** (UndefinedFunctionError) undefined function: Jaccard.index/2
     stacktrace:
       (jaccard) Jaccard.index(#HashSet<["a", "b", "c"]>, #HashSet<["a", "b", "c"]>)
       test/jaccard_test.exs:18



  2) test index test 3 (JaccardTest)
     test/jaccard_test.exs:27
     ** (UndefinedFunctionError) undefined function: Jaccard.index/2
     stacktrace:
       (jaccard) Jaccard.index(#HashSet<["a", "b"]>, #HashSet<["a", "c"]>)
       test/jaccard_test.exs:28



  3) test index test 2 (JaccardTest)
     test/jaccard_test.exs:22
     ** (UndefinedFunctionError) undefined function: Jaccard.index/2
     stacktrace:
       (jaccard) Jaccard.index(#HashSet<["a", "b", "c"]>, #HashSet<["a", "b"]>)
       test/jaccard_test.exs:23



Finished in 0.04 seconds (0.04s on load, 0.00s on tests)
3 tests, 3 failures

Randomized with seed 669435

We have our working tests! Let’s write some code to pass these 3 tests. We wrote the specifications of our first method when thinking about our tests.

Intersection of Set A and B divided by union of Set A and Set B

def index(a, b) do
  Set.intersection(a, b).size / Set.union(a, b).size
end

Run mix test to check the tests pass.

$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app
...

Finished in 0.2 seconds (0.1s on load, 0.01s on tests)
3 tests, 0 failures

Randomized with seed 899184

That’s great! We have just written our first Elixir module. Although there is only one method, it will work with all other Elixir projects if they import this module. That’s an achievement. Pat yourself on your back.

Method 2: Calculate the distance between the 2 sets. Arity 2: Set, Set.

Now, onwards with our second method. You may feel excited and rush forth to write this method but we should follow the practice of TDD.

Before writing tests for finding the distance between 2 sets. We should know what the term distance represents. The Jaccard distance, measures dissimilarity between sets by subtracting the Jaccard coefficient from 1 or by dividing the difference of the sizes of the union and the intersection of 2 sets by the size of the union

Alright, let’s just do that. Assert that the distance between the 2 sets is equals to 1 - the index from the same 2 sets.

test "distance 1" do
  assert (Jaccard.distance s1, s1) == (1.0 - Jaccard.index s1, s1)
end

test "distance 2" do
  assert (Jaccard.distance s1, s2) == (1.0 - Jaccard.index s1, s2)
end

test "distance 3" do
  assert (Jaccard.distance s2, s3) == (1.0 - Jaccard.index s2, s3)
end

We have to ensure our tests fail again.

$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app


  1) test distance 2 (JaccardTest)
     test/jaccard_test.exs:36
     ** (UndefinedFunctionError) undefined function: Jaccard.distance/2
     stacktrace:
       (jaccard) Jaccard.distance(#HashSet<["a", "b", "c"]>, #HashSet<["a", "b"]>)
       test/jaccard_test.exs:37



  2) test distance 3 (JaccardTest)
     test/jaccard_test.exs:40
     ** (UndefinedFunctionError) undefined function: Jaccard.distance/2
     stacktrace:
       (jaccard) Jaccard.distance(#HashSet<["a", "b"]>, #HashSet<["a", "c"]>)
       test/jaccard_test.exs:41

.

  3) test distance 1 (JaccardTest)
     test/jaccard_test.exs:32
     ** (UndefinedFunctionError) undefined function: Jaccard.distance/2
     stacktrace:
       (jaccard) Jaccard.distance(#HashSet<["a", "b", "c"]>, #HashSet<["a", "b", "c"]>)
       test/jaccard_test.exs:33

..

Finished in 0.05 seconds (0.05s on load, 0.00s on tests)
6 tests, 3 failures

Randomized with seed 342134

This method is pretty straight forward. Just subtract the index of the 2 sets from 1.

def distance(a, b) do
  1.0 - index a, b
end

This should pass all our tests. Why do we do it this way? It ensures that any new code we write, will pass all previous tests as well as the new tests.

$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app
......

Finished in 0.06 seconds (0.06s on load, 0.00s on tests)
6 tests, 0 failures

Randomized with seed 155573

Method 3: Return a set from a list of sets that is most similar to the set in the first argument. Arity 2: Set, List of sets.

Similar to the previous 2 methods, we write our tests first. We know what the result should be.

test "closest_to 1" do
  assert Jaccard.closest_to(s2, [s3, s1]) == s1
end

test "closest_to 2" do
  assert Jaccard.closest_to(s3, [s2, s1]) == s1
end

test "closest_to 3" do
  assert Jaccard.closest_to(s1, [s3, s2, s1]) == s1
end

Running our test suite again to make sure the new tests are failing.

$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app


  1) test closest_to 1 (JaccardTest)
     test/jaccard_test.exs:44
     ** (UndefinedFunctionError) undefined function: Jaccard.closest_to/2
     stacktrace:
       (jaccard) Jaccard.closest_to(#HashSet<["a", "b"]>, [#HashSet<["a", "c"]>, #HashSet<["a", "b", "c"]>])
       test/jaccard_test.exs:45

....

  2) test closest_to 2 (JaccardTest)
     test/jaccard_test.exs:48
     ** (UndefinedFunctionError) undefined function: Jaccard.closest_to/2
     stacktrace:
       (jaccard) Jaccard.closest_to(#HashSet<["a", "c"]>, [#HashSet<["a", "b"]>, #HashSet<["a", "b", "c"]>])
       test/jaccard_test.exs:49

.

  3) test closest_to 3 (JaccardTest)
     test/jaccard_test.exs:52
     ** (UndefinedFunctionError) undefined function: Jaccard.closest_to/2
     stacktrace:
       (jaccard) Jaccard.closest_to(#HashSet<["a", "b", "c"]>, [#HashSet<["a", "c"]>, #HashSet<["a", "b"]>, #HashSet<["a", "b", "c"]>])
       test/jaccard_test.exs:53

.

Finished in 0.07 seconds (0.07s on load, 0.00s on tests)
9 tests, 3 failures

Randomized with seed 591730

For this method, we need some way to loop through the list of sets. Picking them off one by one to compare them with the first set. There is no for or foreach loop in Elixir. We have to use recursion because of Elixir’s Immunability.

We have to use a combination of concepts.

With list manipulation’s | operator, we can split our list into head and tail sections and process them independently.

Pattern matching the arity and contents of the arguments. The 3rd helper method is executed when the list is empty, thus returning the selected set.

Naming the private methods with a _ prefix helps us identify which methods are helper methods and not accessible from outside the module.

def closest_to(a, [head | tail]) do
  _closest_to a, tail, head
end

defp _closest_to(a, [head | tail], selected) do
  if (distance(a, selected) > distance(a, head)) do
    _closest_to a, tail, head
  else
    _closest_to a, tail, selected
  end
end

defp _closest_to(_, [], selected) do
  selected
end

Running the test suite again to check for regression and acceptance after our new method is written.

$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app
.........

Finished in 0.07 seconds (0.07s on load, 0.00s on tests)
9 tests, 0 failures

Randomized with seed 909675

You are awesome! We have written 9 tests and 3 methods in Elixir with a basic understanding of functional programming.

Take aways

Our completed test suite.

defmodule JaccardTest do
  use ExUnit.Case, async: true

  # Initialise sets for testing
  defp s1 do
     Enum.into(["a", "b", "c"], HashSet.new)
  end

  defp s2 do
    Enum.into(["a", "b"], HashSet.new)
  end

  defp s3 do
    Enum.into(["a", "c"], HashSet.new)
  end

  test "index test 1" do
    result = Jaccard.index s1, s1
    assert result == 3 / 3
  end

  test "index test 2" do
    result = Jaccard.index s1, s2
    assert result == 2 / 3
  end

  test "index test 3" do
    result = Jaccard.index s2, s3
    assert result == 1 / 3
  end

  test "distance 1" do
    assert (Jaccard.distance s1, s1) == (1.0 - Jaccard.index s1, s1)
  end

  test "distance 2" do
    assert (Jaccard.distance s1, s2) == (1.0 - Jaccard.index s1, s2)
  end

  test "distance 3" do
    assert (Jaccard.distance s2, s3) == (1.0 - Jaccard.index s2, s3)
  end

  test "closest_to 1" do
    assert Jaccard.closest_to(s2, [s3, s1]) == s1
  end

  test "closest_to 2" do
    assert Jaccard.closest_to(s3, [s2, s1]) == s1
  end

  test "closest_to 3" do
    assert Jaccard.closest_to(s1, [s3, s2, s1]) == s1
  end
end

As well as our basic Jaccard module.

defmodule Jaccard do
  def index(a, b) do
    Set.intersection(a, b).size / Set.union(a, b).size
  end

  def distance(a, b) do
    1.0 - index a, b
  end

  def closest_to(a, [head | tail]) do
    _closest_to a, tail, head
  end

  defp _closest_to(a, [head | tail], selected) do
    if (distance(a, selected) > distance(a, head)) do
      _closest_to a, tail, head
    else
      _closest_to a, tail, selected
    end
  end

  defp _closest_to(_, [], selected) do
    selected
  end
end

Writing code in a functional programming paradigm helps developers write code that is short, fast, and maintainable. In this example, we used pattern matching, list manipulation and recursion to compute our result.

Yes, it might be confusing at first. Although once you understand it, the imperative way looks clunky.

If you have never tried functional programming, give Elixir a go and it will change how you think about programming.

Stanley Tan
@stnly