Generating Combinations

Overview

This article presents a class which allows a DotNet application to process all possible combinations of items selected from a set. The class is coded in Visual Basic, but it can of course be easily translated into C# if desired. The class also demonstrates the use of Iterator Functions and implementation of the IEnumerable(Of T) interface (actually the derived interface “IReadOnlyCollection(Of T)”).

Introduction

I have an application that needs to test all possible combinations of a set of numbers, and I decided to write a class that would do this.

For those who don’t remember them from maths class, combinations are the different ways you can pick a given number of items from a larger set of items. The order that the items are picked doesn’t matter (if it did matter, we would be talking about permutations rather than combinations). There are two types of combination:

  1. Where each item can only be picked once. This is like picking numbered balls for a lottery drawing. You might have 50 balls numbered 1 through 50; you pick out 6 balls and they will each have a different number.
  2. Where each item can be picked multiple times. You might have 16 teams in a football league; you ask 6 people which team they support and you may find that more than one person supports a particular team.

There is more maths to cover when we look at calculating how many possible combinations there are, but I’ll leave that for later.

Framing the problem

So how should I frame the problem in a way that a class can solve it? I need the user of the class to provide:

  • The number of items to pick (an Integer)
  • The number of items to pick from (an Integer)
  • Whether repetitions are allowed (a Boolean)

I can have the user supply these values to the constructor (Sub New) of the class and then save them as class level fields. Here is the class with its constructor:

Public Class  Combinations
  Private choose, total As Integer, repetition As  Boolean
 
  Sub New(pick As  Integer, pickFrom As Integer, allowRepetition As  Boolean)
    If pick <= 0 Then Throw  New ArgumentException( _
      "Number of items to pick must be a positive integer", "pick")
    If pickFrom <= 0 Then Throw  New ArgumentException( _
      "Number of items to pick from must be a positive integer", "pickFrom")
    If Not  allowRepetition And  pick > pickFrom Then  Throw New  ArgumentException( _
      "Number of items to pick must not exceed the number of items to pick from", "pick")
 
    repetition = allowRepetition
    choose = pick
    total = pickFrom
  End Sub
End Class

Note that the constructor validates the two integers. They must both be greater than zero, and if repetition is not allowed, the number of items to pick can’t be greater than the number of items to pick from. If those conditions aren't met, an ArgumentException is thrown.

Presenting the result

A combination can be presented as an array of Integer. If I am picking three items from a set of five then the array will have three elements with each element being a number from zero through four (five possible values). What if the items I want to pick from are not a set of consecutive numbers starting at zero? No problem, I can always store the items to pick from in an array or a list; each item can be referred to using its index and the index numbers are a set of consecutive numbers starting at zero.

I need to return not just one combination, but all possible combinations. I could create a list or an array of arrays to do that, but an alternative approach is suggested by how the class will be used. The user needs to process all possible combinations and so is almost certain to use a For loop to iterate through the combinations. Any class that implements the IEnumerable interface can be used in a For Each loop, so my first thought was to implement IEnumerable(Of Integer()).

After studying the interfaces in System.Collections.Generic I decided that ReadOnlyCollection(Of Integer()) was more appropriate. It is derived from IEnumerable(Of T) and it adds a Count property to return the number of items in the collection. It is possible to calculate the number of possible combinations, and it may be useful to provide a Count property to return that value. There is an extension method of IEnumerable called Count that does the same thing, but it does so by iterating through the entire collection to count the number of items and that could take much longer than just calculating it.

Note that IReadOnlyCollection(Of T) was added in version 4.5 of the .Net Framework. Projects targeting earlier versions (version 2.0 or later) can use IEnumerable(Of T) instead. The only difference is that the Count property will not be automatically added to the class and must be added manually (without the Implements IReadOnlyCollection(Of Integer()).Count).

If I add the statement “Implements IReadOnlyCollection(Of Integer())” immediately after my Class statement, Visual Studio adds three new methods to the class so that it now looks like this:

Public Class  Combinations
  Implements IReadOnlyCollection(Of Integer())
 
  Private choose, total As Integer, repetition As  Boolean
 
  Sub New(pick As  Integer, pickFrom As Integer, allowRepetition As  Boolean)
    If pick <= 0 Then Throw  New ArgumentException( _
      "Number of items to pick must be a positive integer", "pick")
    If pickFrom <= 0 Then Throw  New ArgumentException( _
      "Number of items to pick from must be a positive integer", "pickFrom")
    If Not  allowRepetition And  pick > pickFrom Then  Throw New  ArgumentException( _
      "Number of items to pick must not exceed the number of items to pick from", "pick")
 
    repetition = allowRepetition
    choose = pick
    total = pickFrom
  End Sub
 
  Public Function  GetEnumerator() As  IEnumerator(Of Integer()) _
    Implements IEnumerable(Of Integer()).GetEnumerator
  End Function
 
  Public ReadOnly  Property Count As Integer  Implements IReadOnlyCollection(Of Integer()).Count
    Get
    End Get
  End Property
 
  Public Function  GetEnumerator1() As  IEnumerator Implements IEnumerable.GetEnumerator
  End Function
End Class

Adding the code

The three methods that were added when I Implemented IReadOnlyCollection(Of Integer()) are the three methods that the IReadOnlyCollection interface requires. I’ll cover each one in turn.

The Count property

This returns an integer that contains the total number of items in the collection, in this case the total number of combinations. These numbers can get very big, so it is quite possible that the number will be larger than the maximum value of an Integer (about 2 billion). I would like to be able to return the number of items in a Long, but Count must return an Integer. The way I handle this is to have a LongCount property that returns the number as a Long as well as a Count property that returns the number as an Integer (or throws an exception if the number is too large).

The formula for the number of combinations of r items chosen without repetition from a set of n items is

n! / (r! * (n-r)!)

The “!” indicates a factorial (3! = 1 * 2 * 3 = 6; 5! = 1 * 2 * 3 * 4 * 5 = 120). Factorials can get very large very quickly; 20! Is that largest factorial that will fit in a Long and 22! Is the largest factorial that can fit in a Double without losing precision. Fortunately it is possible to simplify the calculation to something that is easier to code:

(r+1) \ 1 * (r+2) \ 2 * (r+3) \ 3 ..……. n \ (n-r)

The formula for the number of combinations of r items chosen with repetition from a set of n items is

(n+r-1)! / (r! * (n-1)!)

This can be simplified for coding purposes to:

(r+1) \ 1 * (r+2) \ 2 * (r+3) \ 3 ..……. (n+r-1) \ (n-1)

The interesting thing about these sequences is that there is never a remainder in any of the division operations, so I can use integer division (the “\ operator) without losing precision. Here is the code for the Count and LongCount properties.

  Public ReadOnly  Property Count As Integer  Implements IReadOnlyCollection(Of Integer).Count
    Get
      Return CInt(LongCount)
    End Get
  End Property
 
  Public ReadOnly  Property LongCount As Long
    Get
      Dim result As Long  = 1
      For iter As Long  = 1 To  total - If(repetition, 1, choose)
        result *= choose + iter
        result \= iter
      Next
      Return result
    End Get
  End Property

It is possible to further optimise this code. If you write out the calculations that are being done, you will see that there can be long sequences of number that appear in both the numerator and the denominator and those sequences could be skipped to save time. I have chosen not to do so in order to avoid making the code more complex. It is likely that the optimisation would save at most a few milliseconds on an operation that will typically do done only once.

The GetEnumerator1 function

In addition to the generic function IEnumerable(Of T).GetEnumerator, we are also required (for compatibility) to implement the non-generic function IEnumerable.GetEnumerator. Visual Studio added GetEnumerator1 to implement the non-generic version, and all we need to do is have it call the generic version.

  Private Function  GetEnumerator1() As  IEnumerator Implements IEnumerable.GetEnumerator
    Return GetEnumerator()
  End Function

Note that as I have no need to ever call this function, I have changed it from Public to Private.

The GetEnumerator function

This is where the real work is done. GetEnumerator must return an object that implements IEnumerator and is able to sequentially select each item in the collection. I don’t need to create my own class that implements IEnumerator, there are two easy ways to create such an object.

  1. Create every item in the collection and store them in an array (or anything that implements IEnumerable), then call GetEnumerator on the array. I want to avoid trying to store the entire collection in memory as it might be too big, so I chose not to use this method.
  2. Use an Iterator function. This is a special type of function used for iterating through a collection. Each time you call it, it returns the next item. The Yield statement is used to return the item and save the current state of the function; the next time the function is called, processing resumes immediately after the Yield statement. When the function exits, that signals that the iteration is complete. This is the method I have used, as it only requires one item to be in memory at a time.

Here is the code for GetEnumerator:

  Public Iterator Function GetEnumerator() As IEnumerator(Of Integer()) _
    Implements IEnumerable(Of Integer()).GetEnumerator
 
    Dim value(choose - 1) As Integer
    If Not  repetition Then  value = Enumerable.Range(0, choose).ToArray
    Dim index As Integer  = choose - 1
 
    Do
      Yield value
      'Exit if iteration is complete
      If value(0) = total - If(repetition, 1, choose) Then  Exit Do
 
      'Find the rightmost element that can be incremented and increment it
      Dim oldIndex As Integer  = index
      Do While  value(index) = total - If(repetition, 1, choose - index)
        index -= 1
      Loop 
      value(index) += 1
 
      'If index changed, reset all elements to the right of the new index
      If index <> oldIndex Then
        Do
          index += 1
          value(index) = value(index - 1) + If(repetition, 0, 1)
        Loop Until  index = choose - 1
      End If
    Loop
  End Function

The local variable “value” is the current item, it is an array of Integer with one element for each number that is to be picked. If repetition is allowed, value starts out with all elements equal to zero. If repetition is not allowed, value starts out with elements in the sequence 0, 1, 2, 3,…. (provided by the Enumerable.Range method). The local variable “index” is the index (into the value array) of the element that is to be changed next; it is initialised to the last element.

I am going to provide the items in numerical sequence. For example, when choosing three items from a set of four without repetition, I will return:

0, 1, 2
0, 1, 3
0, 2, 3
1, 2, 3

And when choosing two items from a set of three with repetition, I will return:

0, 0
0, 1
0, 2
1, 1
1, 2
2, 2

The rest of the function is an infinite loop (there is an Exit Do statement to provide a way out) and the first thing the loop does is to yield the current “value”. The Iterator function pauses at this point and continues at the next statement when the next value is required.

The statement after the Yield checks whether the last item has been returned and if so it exits the main loop and the iteration is compete. When using repetition, the last item is the only one where the first element of the array is the maximum possible number (total – 1). When not using repetition, the last item is the only one where the first element of the array is the difference between the number of items to select from and the number of items to be selected (total – choose).

The next paragraph finds the rightmost element that is not at its maximum value and increments it. When repetition is allowed, the maximum value is (total – 1), when repetition is not allowed, the maximum value is (total -1) for the rightmost element, (total – 2) for the second from the right, and so on.

The final paragraph checks if the current index was changed (because the current element had reached its maximum value) and if so resets all element to the right of the new index. When repetition is allowed, the elements are reset to the same value as the current one, When repetition is not allowed, each element is one more than the one on its left.

Note that although the Count and LongCount properties can’t return numbers higher than Integer.MaxValue and Long.MaxValue respectively, there is no limit on the total number of items that can be returned (other than the fact that very large numbers will take a long time to process).

The complete code

Here is the complete code for the class.

Option Explicit On
Option Infer Off
Option Strict On
 
''' <summary>Represents a set of possible combinations of integers</summary>
Public Class  Combinations
  Implements IReadOnlyCollection(Of Integer())
 
  Private choose, total As Integer, repetition As  Boolean
 
  ''' <summary>Create a new Combinations object</summary>
  ''' <param name="pick">Number of items to be picked</param>
  ''' <param name="pickfrom">Number of items to pick from (0 to pick - 1)</param>
  ''' <param name="allowRepetition">True if items may be picked more than once</param>
  Sub New(pick As  Integer, pickFrom As Integer, allowRepetition As  Boolean)
    If pick <= 0 Then Throw  New ArgumentException( _
      "Number of items to pick must be a positive integer", "pick")
    If pickFrom <= 0 Then Throw  New ArgumentException( _
      "Number of items to pick from must be a positive integer", "pickFrom")
    If Not  allowRepetition And  pick > pickFrom Then  Throw New  ArgumentException( _
      "Number of items to pick must not exceed the number of items to pick from", "pick")
 
    repetition = allowRepetition
    choose = pick
    total = pickFrom
  End Sub
 
  ''' <summary>Returns the number of possible values</summary>
  ''' <returns>Calculated number of possible values</returns>
  ''' <exception cref="OverflowException">The number of possible values exceeds Integer.MaxValue</exception>
  Public ReadOnly  Property Count As Integer  Implements IReadOnlyCollection(Of Integer()).Count
    Get
      Return CInt(LongCount)
    End Get
  End Property
 
  ''' <summary>Returns the number of possible values</summary>
  ''' <returns>Calculated number of possible values</returns>
  ''' <exception cref="OverflowException">The number of possible values exceeds or approaches Long.MaxValue</exception>
  ''' <remarks>This replaces the LongCount extension method of IEnumerable which iterates through the entire collection</remarks>
  Public ReadOnly  Property LongCount As Long
    Get
      Dim result As Long  = 1
      For iter As Long  = 1 To  total - If(repetition, 1, choose)
        result *= choose + iter
        result \= iter
      Next
      Return result
    End Get
  End Property
 
  ''' <summary>Gets the enumerator that can be used to select each possible value</summary>
  Public Iterator Function GetEnumerator() As IEnumerator(Of Integer()) _
    Implements IEnumerable(Of Integer()).GetEnumerator
 
    Dim value(choose - 1) As Integer
    If Not  repetition Then  value = Enumerable.Range(0, choose).ToArray
    Dim index As Integer  = choose - 1
 
    Do
      Yield value
      'Exit if iteration is complete
      If value(0) = total - If(repetition, 1, choose) Then  Exit Do
 
      'Find the rightmost element that can be incremented and increment it
      Dim oldIndex As Integer  = index
      Do While  value(index) = total - If(repetition, 1, choose - index)
        index -= 1
      Loop
      value(index) += 1
 
      'If index changed, reset all elements to the right of the new index
      If index <> oldIndex Then
        Do
          index += 1
          value(index) = value(index - 1) + If(repetition, 0, 1)
        Loop Until  index = choose - 1
      End If
    Loop
  End Function
 
  ''' <summary>Non-generic method required for compatibility</summary>
  Private Function  GetEnumerator1() As  IEnumerator Implements IEnumerable.GetEnumerator
    Return GetEnumerator()
  End Function
End Class

Example of use

Here is some code that illustrates how the Combinations class might be used. It simply displays all the possible combinations of three colours chosen without repetition from a palette of Red, Blue, Black, Pink and Green.

Dim colours As New  List(Of String) From {"Red", "Blue",  "Black", "Pink", "Green"}
Dim display As New  List(Of String)
 
For Each  combo In  New Combinations(3, 5, False)
  Dim line(combo.Length - 1) As String
  For i As Integer  = 0 To  combo.Length - 1
    line(i) = colours(combo(i))
  Next
  display.Add(String.join(", ", line))
Next
 
MessageBox.Show(String.Join(vbCrLf, display))

The resulting MessageBox displays:

Red, Blue, Black

Red, Blue, Pink

Red, Blue, Green

Red, Black, Pink

Red, Black, Green

Red, Pink, Green

Blue Black, Pink

Blue, Black, Green

Blue Pink, Green

Black, Pink, Green