LINQ/Lambda Group by multiple properties (VB.NET)

Introduction

There are often cases when data needs to be grouped by multiple properties for reporting, this part of the series presents simple examples for grouping using both LINQ and lambda syntax style of writing group statements along with going from anonymous to strong typed result sets. More examples will be added over time from the existing examples in the GitHub repository which contains source for both this and part 1 of this series.

Many of the code samples presented are demonstrated using classes with mocked data or with Entity Framework 6 code first with an existing database. Data scripts are located in the root of the Visual Studio solution available in a GitHub repository with code samples for part 1 of the series. 

Part 1 of the series

Requirements 

Suggested reading

Query vs. Method Syntax in LINQ

There are two different types of syntax that you can use to query with LINQ: query syntax and method syntax. In the code samples both types are used, as one becomes familiar with with writing queries one style lends itself better than the other.

Group by one property and multiple properties

The difference between grouping one property verses multiple properties is specifying multiple properties in an anonymous array as shown in the examples below. In figure 1 (using LINQ), a single property defines the group while in figure 2 (using Lambda) a anonymous array specifies which properties to group by. 

The following screenshot shows two Lambda style queries, the first a single property, City while the second groups on City and Country.

Besides grouping a count is set as a property .PersonCount which is many time needed in reports. A similar requirement may be to not include groups with one object, in this case a person object which can be done with a where predicate e.g.

Dim groupResults = personList.
        GroupBy(Function(person) New  With {Key person.City, Key person.Country}).
        Where(Function(grp) grp.Count() > 1).ToList()

Then, as in the prior examples shown in the screenshots adding a select clause.

Dim groupResults = personList.GroupBy(Function(person) New  With {Key person.City, Key person.Country}).
    Where(Function(grp) grp.Count() > 1).
    Select(Function(grp) New  PersonGroup With  {
              .City = grp.Key.City,
              .Country = grp.Key.Country, .List = grp.ToList()
              }).ToList()

Additionally, the last query written in LINQ syntax which some developers favor although it may also be dependent on the actual query complexity.

Dim groupResults = (
        From person In  personList
        Group By personCityCountry = New  With {
            Key .City = person.City,
            Key .Country = person.Country} Into group = Group
        Where group.Count() > 1
        Select New  PersonGroup With  {
            .City = personCityCountry.City,
            .Country = personCityCountry.Country,
            .List = group.ToList()
        }).ToList()

Both of the above examples use mocked data, nothing read from a database or text file.

GroupJoin

A GroupJoin is in SQL what is called a “Left Outer JOIN” while a Join in SQL refer to “Inner Join” (see join definition). In short, a GroupJoin will do a link between 2 entities even if the right side of the link has nothing to link to. In contrast, the Join will link 2 entities only if both entities contain a link between them.

The GroupJoin operator performs the same task as Join operator except that GroupJoin returns a result in group based on specified group key. The GroupJoin operator joins two sequences based on key and groups the result by matching key and then returns the collection of grouped result and key. The following example performs a group join on customer/order relations by customer identifier which has been setup in the SQL-Server database.

Results are placed into the following container.

Public Class  CustomerOrder
    Public Property  Customer As  Customer
    Public Property  Order As  IEnumerable(Of Order)
End Class

Code

Public Function  GroupJoinCustomersWithCompanyNameStartsWith(
    context As  NorthWindContext,
    startsWithValue As  String) As List(Of CustomerOrder)
 
    context.Configuration.LazyLoadingEnabled = True
 
    '
    ' Note to use Include the following Import is needed
    '   Imports System.Data.Entity 
    '
    Dim results As List(Of CustomerOrder) = context.CustomerIncludes().
            Where(Function(customer) customer.CompanyName.StartsWith(startsWithValue)).
            GroupJoin(context.Orders, Function(c) c.CustomerIdentifier,
                      Function(order) order.CustomerIdentifier,
                      Function(customer, order) New  With {
                         Key .Customer = customer,
                         Key .Order = order}).
            Select(Function(customerData) New  CustomerOrder With  {
                      .Customer = customerData.Customer,
                      .Order = customerData.Order}).
            ToList()
 
    Return results
 
End Function

Using the following to run.

Imports NorthWindEntityLibrary
Imports NorthWindEntityLibrary.Classes
Imports NorthWindEntityLibrary.Models
 
Module Module1
    Private operation As New  NorthOperations
    Sub Main()
 
        Dim standardColor = Console.ForegroundColor
 
        Using context = New  NorthWindContext
 
            Dim results As List(Of CustomerOrder) =
                    operation.GroupJoinCustomersWithCompanyNameStartsWith(context, "A")
 
            For Each  current As  CustomerOrder In  results
 
                Console.ForegroundColor = ConsoleColor.Green
                Console.WriteLine($"{current.Customer.CompanyName} - {current.Customer.Contact.FullName}")
                Console.ForegroundColor = standardColor
 
                For Each  order As  Order In  current.Order
 
                    Console.ForegroundColor = ConsoleColor.White
                    Console.WriteLine($"   {order.OrderID}, {order.OrderDate.Value.ToShortDateString()}")
                    Console.ForegroundColor = standardColor
 
                    For Each  orderDetail As  OrderDetail In  order.OrderDetails
                        Console.WriteLine($"      {orderDetail.Product.ProductName}, {orderDetail.Quantity}")
                    Next
                Next
 
                Console.WriteLine()
 
            Next
 
        End Using
 
    End Sub
 
End Module

Results

Working with readonly properties

There is a possibility that Entity Framework will throw an exception because the readonly property is not part of the model, in these cases an Expression builder may be needed or use .ToList off the DbSet the query is executed againsts. This will not happen when working with a Plain Old CLR Objects (POCO) class.

Statistics

If statistics are required e.g. minimum, maximum, count, sum etc then setup predicates in the GroupBy, in the following example min, max and count are returned by age of a person. The following example reads data from a SQL-Server database using Entity Framework 6.

Sub MinMaxCountForAgeProperty()
    Using context As  New PeopleContext
        Dim people = context.People.ToList()
        Dim query = people.GroupBy(
            Function(person) Math.Floor(person.Age),
            Function(person) person.Age,
            Function(baseAge, ages) New  With {
                      .Key = baseAge,
                      .Count = ages.Count(),
                      .Min = ages.Min(),
                      .Max = ages.Max()}
            )
 
 
        Dim output As New  System.Text.StringBuilder
        ' Iterate over each anonymous type.
        For Each  result In  query
            output.AppendLine(vbCrLf & "Age group: "  & result.Key)
            output.AppendLine("Number of people in this age group: " & result.Count)
            output.AppendLine("Minimum age: " & result.Min)
            output.AppendLine("Maximum age: " & result.Max)
        Next
 
        ' Display the output.
        Console.WriteLine(output.ToString)
    End Using
 
End Sub

Multiple tables multiple property grouping

Using the what has been presented so far the following performs a group by on three tables, Customer, Orders and Order Details. The new aspect is using three From clauses, one for each table.

To ensure there are proper relations e.g. order for each customer and customer for an order in the From a Where condition uses DefaultIfEmpty method.

Important: In the group sum there can be no null values on either (in this case) UnitPrice or Quantity properties. Since Entity Framework code first is used in the provided database, Order details table its possible to have a null value for UnitPrice so the the type was changed from Decimal to a nullable Decimal e.g. Decimal?

The query will look extremely complex (do not worry there is another one more complex than this) if time has not been spent learning the basics and practicing. It's extremely important as a query becomes complex to have decent names for property names and variables within a query as even an hour later the query can be difficult to figure out let along months later.

Public Sub  ThreeTablesGrouping(north As NorthWindContext)
 
    Dim query = From order In north.Orders
                From customer In  north.Customers.
                    Where(Function(c) CBool(c.CustomerIdentifier = order.CustomerIdentifier)).DefaultIfEmpty()
                From orderDetail In  north.OrderDetails.Where(Function(d) d.OrderID = order.OrderID).DefaultIfEmpty()
                Group New  With {
                Key .order = order,
                Key .customer = customer,
                Key .details = orderDetail
                } By GroupKey = New  With {Key order.OrderDate.Value.Year, Key customer.CompanyName
                } Into group = Group
                Select New  With {
                    Key .Company = GroupKey.CompanyName,
                    Key .OrderYear = GroupKey.Year,
                    Key .Amount = group.Sum(Function(e) e.details.UnitPrice * e.details.Quantity)
                }
 
    Dim queryResults = query.ToList()
    Console.WriteLine()
 
End Sub

As with any anonymous result set this one can be done strongly typed.

IEqualityComparer / IEnumerable(Of IGrouping(Of String, Product))

When a distinct results is needed for grouping, IEqualityComparer(Of In T) Interface against a list can solve this. The following example has a list of products which has duplicate entries by Make and Model.

Namespace Classes
 
    Public Class  Products
        Public Shared  Function List() As List(Of Product)
            Dim products As New  List(Of Product) From {
                New Product With {.Id = 1, .Make = "Samsung", .Model = "Galaxy S3"},
                New Product With {.Id = 2, .Make = "Samsung", .Model = "Galaxy S4"},
                New Product With {.Id = 3, .Make = "Samsung", .Model = "Galaxy S5"},
                New Product With {.Id = 4, .Make = "Apple", .Model = "iPhone 5"},
                New Product With {.Id = 5, .Make = "Apple", .Model = "iPhone 6"},
                New Product With {.Id = 6, .Make = "Apple", .Model = "iPhone 6"},
                New Product With {.Id = 7, .Make = "Apple", .Model = "iPhone 6"},
                New Product With {.Id = 8, .Make = "HTC", .Model = "Sensation"},
                New Product With {.Id = 9, .Make = "HTC", .Model = "Desire"},
                New Product With {.Id = 11, .Make = "HTC", .Model = "Desire"},
                New Product With {.Id = 12, .Make = "Nokia", .Model = "Lumia 735"},
                New Product With {.Id = 13, .Make = "Nokia", .Model = "Lumia 930"},
                New Product With {.Id = 14, .Make = "Nokia", .Model = "Lumia 930"},
                New Product With {.Id = 15, .Make = "Sony", .Model = "Xperia Z3"}
            }
 
            Return products
 
        End Function
 
    End Class
End Namespace

ProductComparer class provides methods that carry out value comparisons which allow greater control of results in a consistent manner.

Namespace Classes
    ''' <summary>
    ''' Comparer on Make and Model properties
    ''' </summary>
    Public Class  ProductComparer
        Implements IEqualityComparer(Of Product)
 
        Public Shadows  Function Equals(p1 As Product, p2 As Product) As Boolean  _
            Implements IEqualityComparer(Of Product).Equals
 
            If ReferenceEquals(p1, p2) Then
                Return True
            End If
 
            If ReferenceEquals(p1, Nothing) OrElse  ReferenceEquals(p2, Nothing) Then
                Return False
            End If
 
            Return p1.Make = p2.Make AndAlso p1.Model = p2.Model
 
        End Function
        Public Shadows  Function GetHashCode(product As Product) As Integer  _
            Implements IEqualityComparer(Of Product).GetHashCode
 
            If ReferenceEquals(product, Nothing) Then
                Return 0
            End If
 
            Dim hashProductName As Integer  = If(product.Make Is  Nothing, 0, product.Make.GetHashCode())
            Dim hashProductCode As Integer  = product.Model.GetHashCode()
 
            Return hashProductName Xor hashProductCode
 
        End Function
    End Class
End Namespace

In the following code the base source list is presented followed by results from the comparer then a lambda group by is used to group products.

Module Module1
    ''' <summary>
    ''' Example to first get distinct products in one query
    ''' followed by grouping on make property in a second query.
    ''' </summary>
    Sub Main()
        Dim productList = Products.List()
 
        Console.WriteLine("Original list")
 
        productList.ForEach(Sub(product) Console.WriteLine(product))
 
        Console.WriteLine()
        Console.WriteLine("ProductComparer results")
 
        Dim productsQuery As IEnumerable(Of Product) =
                productList.Distinct(New ProductComparer).
                OrderBy(Function(product) product.Make)
 
 
        For Each  product As  Product In  productsQuery
            Console.WriteLine(product)
        Next
 
        Console.WriteLine()
        Console.WriteLine("Group by make")
 
        Dim grouped As IEnumerable(Of IGrouping(Of String, Product)) =
                productsQuery.GroupBy(Function(product) product.Make)
 
        For Each  grouping As  IGrouping(Of String, Product) In  grouped
            Console.WriteLine(grouping.Key)
            For Each  product As  Product In  grouping
                Console.WriteLine($"   {product}")
            Next
        Next
 
        Console.ReadLine()
 
    End Sub
 
End Module

Results

Extremely complex group by queries

Extremely complex grouping for business requirements can also be done once the basics have been mastered and there are other options dependent on willingness to think outside of LINQ and Lambda, especially when the work can be performed with conventional SQL.

Example, a database table is know to have duplicate rows of data where the duplicates are by multiple columns in a table which requires a business user to make decisions on which data is valid and which data is not valid along with the possibilities to merge data then delete extra rows.

In the following example a customer table has duplicates based on the following columns, CompanyName, ContactName, ContactTitle, Address, City and postal code. To get duplicates the following SELECT will work were the data may be stored in a DataTable or List(Of T).

Note that in a proper database schema there would be 

  • Proper constraints to guard against duplicates
  • Unit testing done on applications to interact with the database
  • The data should be normalized, it's not to keep the exercise easy to follow.
SELECT A.*
FROM Customers A
INNER JOIN
    (
    SELECT
        CompanyName,
        ContactName,
        ContactTitle,
        Address,
        City,
        PostalCode
    FROM
        Customers
    GROUP BY
        CompanyName,
        ContactName,
        ContactTitle,
        Address,
        City,
        PostalCode
    HAVING COUNT(*) > 1
    ) B
ON
A.CompanyName = B.CompanyName AND
A.ContactName = B.ContactName AND
A.ContactTitle = B.ContactTitle AND
A.Address = B.Address AND
A.City = B.City AND
A.PostalCode = B.PostalCode
ORDER BY
    A.CompanyName

Data presented (less controls to perform deletes and merging).

To perform the same operation with grouping is a two step process to keep code easy to read.

Using a DataTable for returning results to a control e.g. DataGridView in this case.

Step 1
Read data from the database followed by grouping data on the exact same column as per above. The variable duplicates is an anonymous type.

Step 2
Iterate results from step 1 into a pre defined DataTable.

Returning duplicates in a List of a class which inherits from a base class to represent a single customer.

Namespace Classes
    ''' <summary>
    ''' Provides a column to mark a row in a DataGridView
    ''' to act upon
    ''' </summary>
    Public Class  CustomerRigger
        Inherits Customer
 
        Public Property  Process() As  Boolean
 
    End Class
End Namespace

Customer class

Namespace Classes
    Public Class  Customer
        Public Property  Identifier As  Integer
        Public Property  CompanyName As  String
        Public Property  ContactName As  String
        Public Property  ContactTitle As  String
        Public Property  Address As  String
        Public Property  City As  String
        Public Property  PostalCode As  String
        Public Property  Exists As  Boolean
        Public ReadOnly  Property ItemArray() As String()
            Get
                Return {CStr(Identifier).PadLeft(3, "0"c), CompanyName, ContactName, ContactTitle, Address, City, PostalCode}
            End Get
        End Property
        Public Overrides  Function ToString() As String
            Return $"'{CompanyName}' '{ContactName}' '{ContactTitle}' '{Address}' '{City}' '{PostalCode}'"
        End Function
    End Class
End Namespace

The following code differs from the first example with a DataTable is to use a LINQ statement to create the strong type results.

Public Function  GetCustomerDuplicatesAsList() As List(Of CustomerRigger)
    Dim dataOperations As New  DataOperations
 
    Dim dt As DataTable = dataOperations.ReadCustomersFromDatabase()
 
    Dim duplicates As IEnumerable(Of IEnumerable(Of CustomerRigger)) = From dataRow In  dt.AsEnumerable() Select item = New With  {
            Key .Identifier = dataRow.Field(Of Integer)("Identifier"),
            Key .CompanyName = dataRow.Field(Of String)("CompanyName"),
            Key .ContactName = dataRow.Field(Of String)("ContactName"),
            Key .ContactTitle = dataRow.Field(Of String)("ContactTitle"),
            Key .Street = dataRow.Field(Of String)("Address"),
            Key .City = dataRow.Field(Of String)("City"),
            Key .PostalCode = dataRow.Field(Of String)("PostalCode")}
group temp = item By Key = New  With {
                Key .CompanyName = item.CompanyName,
                Key .ContactName = item.ContactName,
                Key .ContactTitle = item.ContactTitle}
            Into Group Where Group.Count() > 1 Select  Group.Select(Function(g) New  CustomerRigger With  {
                                     .Identifier = g.Identifier,
                                     .CompanyName = g.CompanyName,
                                     .ContactName = g.ContactName,
                                     .ContactTitle = g.ContactTitle,
                                     .Address = g.Street,
                                     .City = g.City,
                                     .PostalCode = g.PostalCode
                              })
 
 
    Return (From item In duplicates From row In item Select row).ToList()
 
End Function

Special note (from Microsoft documentation)
The Key keyword enables you to specify behavior for properties of anonymous types. Only properties you designate as key properties participate in tests of equality between anonymous type instances, or calculation of hash code values. The values of key properties cannot be changed.

Tips

Analysers recommendations

When a piece of code is greyed out this is an indication that the code is never coded or utilized. Most of the time this is true although there are some cases Visual Studio does not analyze code correctly. This may be frustrating when writing Group By logic. In the following snippet note the shadow properties which works as expected.

Removing the anonymous properties

Clearly Visual Studio did not make the correct recommendation so when using standard analysers and third analysers don't always trust their recommendations.

Code folding

Code folding is taking a single line of code and "folding" a long line.

Without folding as shown in figure 1 below is inherently harder to read then in figure 2. Consider writing figure 1 today then need to modify figure 1 six or more months down the road verses the code in figure 2. 

Figure 1

Dim groupResults = (
        From person In  personList Group By personCityCountry = New With  {Key .City = person.City, Key .Country = person.Country} Into group = Group
        Where group.Count() > 1 Select  New PersonGroup With {.City = personCityCountry.City, .Country = personCityCountry.Country, .List = group.ToList()}).ToList()

Figure 2 

Dim groupResults = (
        From person In  personList
        Group By personCityCountry = New  With {
            Key .City = person.City,
            Key .Country = person.Country} Into group = Group
        Where group.Count() > 1
        Select New  PersonGroup With  {
            .City = personCityCountry.City,
            .Country = personCityCountry.Country,
            .List = group.ToList()
        }).ToList()

Know complete results of queries

A common mistake when writing any code which interacts with any data is knowing the results beforehand. When working with LINQ and Lambda against a database first write SQL statements, review the results then write the code quires, run the queries in a unit test method and validate results against SQL. By first writing SQL a deeper knowledge of data will be when there are unexpected results in code. This also ties into optimization of queries, write a bad SQL statement is no different than writing a bad Entity Framework query that when viewing a statement looks great yet Entity Framework takes the code and writes good or bad underlying SQL dependent on a developers understanding of their data. 

Debugging

Take time to learn how to properly debug code which includes using Visual Studio breakpoints, breakpoint conditions, data tipslocal window, watch and quick watch windows. Depending on the edition of Visual Studio include learning to work with IntelliTrace which should only be turned on when needed.

Summary

Basic to advance grouping methods have been presented although not ever possible situation have been covered. By using what has been presented in this and part 1 of the series provides what is needed to write statements for most situations. One thing to be cognitive of is that if a query is written poorly the performance will suffer. There are countless post on the Internet about grouping poor performance where 95 percent of the time the query needs optimization and since each query is different so will be what needs to be done performance wise. Learn to use IntelliTrace in a debug session to assist with optimizing queries.

See also

Introduction to Grouping with LINQ/Lambda (VB.NET)

Source code

The following GitHub repository  contains all code samples presented along with extra code samples all guaranteed to work.