Join operators are used to define relationships within sequences in LINQ queries. From a SQL and relational point of view, almost every query requires joining one or more tables. In LINQ, a set of join operators is defined to implement this behavior.
Join
The first operator of this group is of course the Join method, which is defined by the following signatures:
Join requires a set of four generic types. The T type represents the type of the outer source sequence, and the U type describes the type of the inner source sequence. The predicates outerKeySelector and innerKeySelector define how to extract the identifying keys from the outer and inner source sequence items, respectively. These keys are both of type K, and their equivalence defines the join condition. The resultSelector predicate defines what to project into the result sequence, which will be an implementation of IEnumerable<V>. V is the last generic type needed by the operator, and it defines the type of each single item in the join result sequence. The second overload of the method has an additional custom equality comparer, used to compare the keys. If the comparer argument is NULL or if the first overload of the method is invoked, a default key comparer (EqualityComparer<TKey>.Default) will be used.
Here is an example that will make the use of Join more clear. Think about our customers, with their orders and products. In Listing 4-19, a query joins orders with their corresponding products.
Listing 4-19: The Join operator used to map orders with products
var expr =
customers
.SelectMany(c => c.Orders)
.Join( products,
o => o.IdProduct,
p => p.IdProduct,
(o, p) => new {o.Month, o.Shipped, p.IdProduct, p.Price });
The following is the result of the query:
In this example, orders represents the outer sequence and products is the inner sequence. The o and p used in lambda expressions are of type Order and Product, respectively. Internally, the operator collects the elements of the inner sequence into a hash table, using their keys extracted with innerKeySelector. It then enumerates the outer sequence and maps its elements, based on the Key value extracted with outerKeySelector, to the hash table of items. Because of its implementation, the Join operator result sequence keeps the order of the outer sequence first, and then uses the order of the inner sequence for each outer sequence element.
From an SQL point of view, the example in Listing 4-19 can be thought of as an inner equijoin somewhat like the following SQL query:
If you want to translate the SQL syntax into the Join operator syntax, you can think about the columns selection in SQL as the resultSelector predicate, while the equality condition on IdProduct columns (of orders and products) corresponds to the pair of innerKeySelector and outerKeySelector predicates.
The Join operator has a corresponding LINQ syntax, which is shown in Listing 4-20.
Listing 4-20: The Join operator query expression syntax
var expr =
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new {o.Month, o.Shipped, p.IdProduct, p.Price };
|
Important |
The order of items to relate (o.IdProduct equals p.IdProduct) in LINQ query syntax must have the outer sequence first and the inner sequence after; otherwise, the LINQ query will not compile. This requirement is different from standard SQL queries, in which item ordering does not matter.
|
GroupJoin
In cases in which you need to define something similar to a LEFT OUTER JOIN or a RIGHT OUTER JOIN, you need to use the GroupJoin operator. Its signatures are quite similar to the Join operator:
The only difference is the definition of the resultSelector predicate. It requires an instance of IEnumerable<U>, instead of a single object of type U, because it projects a hierarchical result of type IEnumerable<V>, made of a selection of each item extracted from the inner sequence joined with a group of items, of type U, extracted from the outer sequence.
As a result of this behavior, the output is not a flattened outer equijoin, which would be produced by using the Join operator, but a hierarchical sequence of items. Nevertheless, you can define queries using GroupJoin with results equivalent to the Join operator, whenever the mapping is a one-to-one relationship. In case of the absence of a corresponding element group in the inner sequence, the GroupJoin operator extracts the outer sequence element paired with an empty sequence (Count = 0). In Listing 4-21, you can see an example of this operator.
Listing 4-21: The GroupJoin operator used to map products with orders, if present
var expr =
products
.GroupJoin(
customers.SelectMany(c => c.Orders),
p => p.IdProduct,
o => o.IdProduct,
(p, orders) => new { p.IdProduct, Orders = orders });
foreach(var item in expr) {
Console.WriteLine("Product: {0}", item.IdProduct);
foreach (var order in item.Orders) {
Console.WriteLine(" {0}", order); }}
The following is the result of Listing 4-21:
You can see that products 4 and 6 have no mapping orders, but the query returns them nonetheless. You can think about this operator like a SELECT … FOR XML AUTO query in Transact-SQL in Microsoft SQL Server 2000 and 2005. In fact, it returns results hierarchically grouped like a set of XML nodes nested within their parent nodes, similar to the default result of a FOR XML AUTO query.
In a query expression, the GroupJoin operator is defined as a join … into … clause. The query expression shown in Listing 4-22 is equivalent to Listing 4-21.
Listing 4-22: A query expression with a join into clause
var customersOrders =
from c in customers
from o in c.Orders
select o;
var expr =
from p in products
join o in customersOrders
on p.IdProduct equals o.IdProduct
into orders
select new { p.IdProduct, Orders = orders };
In this example, we first define an expression called customersOrders to extract the flat list of orders. (This expression still uses the SelectMany operator.) We could also define a single query expression, nesting the customersOrders expression within the main query. This approach is shown in Listing 4-23.
Listing 4-23: The query expression of Listing 4-22 in its compact version
var expr =
from p in products
join o in (
from c in customers
from o in c.Orders
select o
) on p.IdProduct equals o.IdProduct
into orders
select new { p.IdProduct, Orders = orders };