The first little elixir module I wanted to write was one which implemented the Jaccard index. There will only be three methods in the module.
Before we get started, check out this guide on the Elixir’s page to create your first project. Work through it to understand the structure of an Elixir project.
We will start by writing some tests to practice TDD. First, we have to create our base sets.
# Initialise sets for testing
defp s1 do
Enum.into(["a", "b", "c"], HashSet.new)
end
defp s2 do
Enum.into(["a", "b"], HashSet.new)
end
defp s3 do
Enum.into(["a", "c"], HashSet.new)
end
Next, we will use these sets to test our index
method. The first test will comapre s1 (set 1) to itself. By understanding how the Jaccard index works, we will return a value of 3 / 3. How did we derive this value?
Sets A and B have 3 items each.
Between these 2 sets, there are 3 similar items in both the sets:
Set A intersect Set B = 3
In these 2 sets, there are 3 unique items:
Set A union Set B = 3
Therefore the result should be 3 / 3
Using this method of thinking, We will create 3 tests.
test "index test 1" do
result = Jaccard.index s1, s1
assert result == 3 / 3
end
test "index test 2" do
result = Jaccard.index s1, s2
assert result == 2 / 3
end
test "index test 3" do
result = Jaccard.index s2, s3
assert result == 1 / 3
end
Run mix test
to ensure our tests are working.
$ mix test
1) test index test 1 (JaccardTest)
test/jaccard_test.exs:17
** (UndefinedFunctionError) undefined function: Jaccard.index/2
stacktrace:
(jaccard) Jaccard.index(#HashSet<["a", "b", "c"]>, #HashSet<["a", "b", "c"]>)
test/jaccard_test.exs:18
2) test index test 3 (JaccardTest)
test/jaccard_test.exs:27
** (UndefinedFunctionError) undefined function: Jaccard.index/2
stacktrace:
(jaccard) Jaccard.index(#HashSet<["a", "b"]>, #HashSet<["a", "c"]>)
test/jaccard_test.exs:28
3) test index test 2 (JaccardTest)
test/jaccard_test.exs:22
** (UndefinedFunctionError) undefined function: Jaccard.index/2
stacktrace:
(jaccard) Jaccard.index(#HashSet<["a", "b", "c"]>, #HashSet<["a", "b"]>)
test/jaccard_test.exs:23
Finished in 0.04 seconds (0.04s on load, 0.00s on tests)
3 tests, 3 failures
Randomized with seed 669435
We have our working tests! Let’s write some code to pass these 3 tests. We wrote the specifications of our first method when thinking about our tests.
Intersection of Set A and B divided by union of Set A and Set B
def index(a, b) do
Set.intersection(a, b).size / Set.union(a, b).size
end
Run mix test
to check the tests pass.
$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app
...
Finished in 0.2 seconds (0.1s on load, 0.01s on tests)
3 tests, 0 failures
Randomized with seed 899184
That’s great! We have just written our first Elixir module. Although there is only one method, it will work with all other Elixir projects if they import this module. That’s an achievement. Pat yourself on your back.
Now, onwards with our second method. You may feel excited and rush forth to write this method but we should follow the practice of TDD.
Before writing tests for finding the distance between 2 sets. We should know what the term distance represents. The Jaccard distance, measures dissimilarity between sets by subtracting the Jaccard coefficient from 1 or by dividing the difference of the sizes of the union and the intersection of 2 sets by the size of the union
Alright, let’s just do that. Assert that the distance between the 2 sets is equals to 1 - the index from the same 2 sets.
test "distance 1" do
assert (Jaccard.distance s1, s1) == (1.0 - Jaccard.index s1, s1)
end
test "distance 2" do
assert (Jaccard.distance s1, s2) == (1.0 - Jaccard.index s1, s2)
end
test "distance 3" do
assert (Jaccard.distance s2, s3) == (1.0 - Jaccard.index s2, s3)
end
We have to ensure our tests fail again.
$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app
1) test distance 2 (JaccardTest)
test/jaccard_test.exs:36
** (UndefinedFunctionError) undefined function: Jaccard.distance/2
stacktrace:
(jaccard) Jaccard.distance(#HashSet<["a", "b", "c"]>, #HashSet<["a", "b"]>)
test/jaccard_test.exs:37
2) test distance 3 (JaccardTest)
test/jaccard_test.exs:40
** (UndefinedFunctionError) undefined function: Jaccard.distance/2
stacktrace:
(jaccard) Jaccard.distance(#HashSet<["a", "b"]>, #HashSet<["a", "c"]>)
test/jaccard_test.exs:41
.
3) test distance 1 (JaccardTest)
test/jaccard_test.exs:32
** (UndefinedFunctionError) undefined function: Jaccard.distance/2
stacktrace:
(jaccard) Jaccard.distance(#HashSet<["a", "b", "c"]>, #HashSet<["a", "b", "c"]>)
test/jaccard_test.exs:33
..
Finished in 0.05 seconds (0.05s on load, 0.00s on tests)
6 tests, 3 failures
Randomized with seed 342134
This method is pretty straight forward. Just subtract the index of the 2 sets from 1.
def distance(a, b) do
1.0 - index a, b
end
This should pass all our tests. Why do we do it this way? It ensures that any new code we write, will pass all previous tests as well as the new tests.
$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app
......
Finished in 0.06 seconds (0.06s on load, 0.00s on tests)
6 tests, 0 failures
Randomized with seed 155573
Similar to the previous 2 methods, we write our tests first. We know what the result should be.
test "closest_to 1" do
assert Jaccard.closest_to(s2, [s3, s1]) == s1
end
test "closest_to 2" do
assert Jaccard.closest_to(s3, [s2, s1]) == s1
end
test "closest_to 3" do
assert Jaccard.closest_to(s1, [s3, s2, s1]) == s1
end
Running our test suite again to make sure the new tests are failing.
$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app
1) test closest_to 1 (JaccardTest)
test/jaccard_test.exs:44
** (UndefinedFunctionError) undefined function: Jaccard.closest_to/2
stacktrace:
(jaccard) Jaccard.closest_to(#HashSet<["a", "b"]>, [#HashSet<["a", "c"]>, #HashSet<["a", "b", "c"]>])
test/jaccard_test.exs:45
....
2) test closest_to 2 (JaccardTest)
test/jaccard_test.exs:48
** (UndefinedFunctionError) undefined function: Jaccard.closest_to/2
stacktrace:
(jaccard) Jaccard.closest_to(#HashSet<["a", "c"]>, [#HashSet<["a", "b"]>, #HashSet<["a", "b", "c"]>])
test/jaccard_test.exs:49
.
3) test closest_to 3 (JaccardTest)
test/jaccard_test.exs:52
** (UndefinedFunctionError) undefined function: Jaccard.closest_to/2
stacktrace:
(jaccard) Jaccard.closest_to(#HashSet<["a", "b", "c"]>, [#HashSet<["a", "c"]>, #HashSet<["a", "b"]>, #HashSet<["a", "b", "c"]>])
test/jaccard_test.exs:53
.
Finished in 0.07 seconds (0.07s on load, 0.00s on tests)
9 tests, 3 failures
Randomized with seed 591730
For this method, we need some way to loop through the list of sets. Picking them off one by one to compare them with the first set. There is no for
or foreach
loop in Elixir. We have to use recursion because of Elixir’s Immunability.
We have to use a combination of concepts.
With list manipulation’s |
operator, we can split our list into head and tail sections and process them independently.
Pattern matching the arity and contents of the arguments. The 3rd helper method is executed when the list is empty, thus returning the selected set.
Naming the private methods with a _
prefix helps us identify which methods are helper methods and not accessible from outside the module.
def closest_to(a, [head | tail]) do
_closest_to a, tail, head
end
defp _closest_to(a, [head | tail], selected) do
if (distance(a, selected) > distance(a, head)) do
_closest_to a, tail, head
else
_closest_to a, tail, selected
end
end
defp _closest_to(_, [], selected) do
selected
end
Running the test suite again to check for regression and acceptance after our new method is written.
$ mix test
Compiled lib/jaccard.ex
Generated jaccard.app
.........
Finished in 0.07 seconds (0.07s on load, 0.00s on tests)
9 tests, 0 failures
Randomized with seed 909675
You are awesome! We have written 9 tests and 3 methods in Elixir with a basic understanding of functional programming.
Our completed test suite.
defmodule JaccardTest do
use ExUnit.Case, async: true
# Initialise sets for testing
defp s1 do
Enum.into(["a", "b", "c"], HashSet.new)
end
defp s2 do
Enum.into(["a", "b"], HashSet.new)
end
defp s3 do
Enum.into(["a", "c"], HashSet.new)
end
test "index test 1" do
result = Jaccard.index s1, s1
assert result == 3 / 3
end
test "index test 2" do
result = Jaccard.index s1, s2
assert result == 2 / 3
end
test "index test 3" do
result = Jaccard.index s2, s3
assert result == 1 / 3
end
test "distance 1" do
assert (Jaccard.distance s1, s1) == (1.0 - Jaccard.index s1, s1)
end
test "distance 2" do
assert (Jaccard.distance s1, s2) == (1.0 - Jaccard.index s1, s2)
end
test "distance 3" do
assert (Jaccard.distance s2, s3) == (1.0 - Jaccard.index s2, s3)
end
test "closest_to 1" do
assert Jaccard.closest_to(s2, [s3, s1]) == s1
end
test "closest_to 2" do
assert Jaccard.closest_to(s3, [s2, s1]) == s1
end
test "closest_to 3" do
assert Jaccard.closest_to(s1, [s3, s2, s1]) == s1
end
end
As well as our basic Jaccard module.
defmodule Jaccard do
def index(a, b) do
Set.intersection(a, b).size / Set.union(a, b).size
end
def distance(a, b) do
1.0 - index a, b
end
def closest_to(a, [head | tail]) do
_closest_to a, tail, head
end
defp _closest_to(a, [head | tail], selected) do
if (distance(a, selected) > distance(a, head)) do
_closest_to a, tail, head
else
_closest_to a, tail, selected
end
end
defp _closest_to(_, [], selected) do
selected
end
end
Writing code in a functional programming paradigm helps developers write code that is short, fast, and maintainable. In this example, we used pattern matching, list manipulation and recursion to compute our result.
Yes, it might be confusing at first. Although once you understand it, the imperative way looks clunky.
If you have never tried functional programming, give Elixir a go and it will change how you think about programming.
Published on 19 Sep 2014 by Stanley Tan