How to generate N choose K combinations in a distributed way in Spark .NET?

Robert Hogue 96 Reputation points
2020-08-13T17:20:33.797+00:00

I have a project that involves a very large number of combinations, C(100,20), with only a small amount of work done for each combination.

I am using Spark .NET with Visual Studio (see the setup here): https://video2.skills-academy.com/en-us/dotnet/spark/tutorials/get-started

Spark .NET has a DataFrame with SQL-style commands. I assume I need a SQL-style command to create the N choose K combinations, plus a user-defined worker function to process each combination.

The question is what does the code look like using Spark .NET with a DataFrame? If a DataFrame doesn't support an N choose K option, are there other options to keep the generation of the combinations distributed?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. Robert Hogue 96 Reputation points
    2020-08-14T23:42:41.49+00:00

    My basic question was answered in the dotnet/spark GitHub issue tracker:

    https://github.com/dotnet/spark/issues/627

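    By using a cross join on two DataFrames, I was able to create the combinations. This may not be the best way, and perhaps others will follow up with a better solution.

    For N choose K, that means cross-joining K copies of the N-element set (K - 1 joins). A plain cross join yields every ordered tuple, including repeats, so the rows also need to be filtered (for example, to strictly increasing elements) so that each combination appears once.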
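
    The cross-join approach above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked GitHub issue: the column names `c0`, `c1`, ..., the app name, and the tiny values of `n` and `k` are all assumptions chosen for the example, and the ordering filter is one way (among others) to drop permutations and repeated elements.

    ```csharp
    using Microsoft.Spark.Sql;
    using static Microsoft.Spark.Sql.Functions;

    class Program
    {
        static void Main()
        {
            // Hypothetical app name; any name works.
            SparkSession spark = SparkSession.Builder()
                .AppName("n-choose-k-sketch")
                .GetOrCreate();

            // Small illustrative values (assumed, not from the question).
            const int n = 5;
            const int k = 3;

            // Start with one column "c0" holding the items 0..n-1.
            DataFrame combos = spark.Range(0, n).WithColumnRenamed("id", "c0");

            // Add one column per additional element: k - 1 cross joins in total.
            for (int i = 1; i < k; i++)
            {
                DataFrame next = spark.Range(0, n).WithColumnRenamed("id", $"c{i}");

                // Keep only rows where the new element is larger than the
                // previous one, so each combination appears exactly once.
                combos = combos
                    .CrossJoin(next)
                    .Filter(Col($"c{i - 1}") < Col($"c{i}"));
            }

            // Each remaining row is one combination (c0 < c1 < ... < c{k-1}).
            combos.Show();

            spark.Stop();
        }
    }
    ```

    Filtering after every join, rather than once at the end, keeps the intermediate DataFrames from growing to N^K rows. Even so, C(100,20) is on the order of 5 × 10^20 rows, so materializing every combination this way is only realistic for much smaller N and K.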

