Linq Query Operators(2)

by the3factory 3/13/2008 7:25:00 AM

Grouping Operators

Now you have seen how to select, filter, and order sequences of items. Sometimes when querying contents, you also need to group results based on specific criteria. To realize content groupings, you use a grouping operator.

The GroupBy operator, also called a grouping operator, is the only operator of this family and provides the following overloads:

public static IEnumerable<IGrouping<K, T>> GroupBy<T, K>(
this IEnumerable<T> source, Func<T, K> keySelector);
public static IEnumerable<IGrouping<K, T>> GroupBy<T, K>(
this IEnumerable<T> source, Func<T, K> keySelector,
IEqualityComparer<K> comparer);
public static IEnumerable<IGrouping<K, E>> GroupBy<T, K, E>(
this IEnumerable<T> source, Func<T, K> keySelector,
Func<T, E> elementSelector);
public static IEnumerable<IGrouping<K, E>> GroupBy<T, K, E>(
this IEnumerable<T> source, Func<T, K> keySelector,
Func<T, E> elementSelector, IEqualityComparer<K> comparer);

The first two overloads return IEnumerable<IGrouping<K, T>>, and the two that take an elementSelector return IEnumerable<IGrouping<K, E>>. The IGrouping<K, T> generic interface extends IEnumerable<T> and adds a Key property of type K that is shared by every item in the group:

public interface IGrouping<K, T> : IEnumerable<T> {
K Key { get; }
}

From a practical point of view, a type that implements this generic interface is simply a typed enumeration with one identifying Key shared by all of its items. All the GroupBy methods work on a source sequence as usual: they call the keySelector function to extract the Key value from each item, and then group the results by the distinct Key values. The elementSelector argument, if present, defines a function that maps each element of the source sequence to the corresponding element of the resulting group. If you do not specify the elementSelector, elements are mapped directly from the source to the destination. (You will see an example of a custom elementSelector later in the chapter, in Listing 4-18.)

The GroupBy method selects pairs of keys and items for each item in source, using the keySelector function and, if present, the elementSelector function. Then it yields a sequence of IGrouping<K, T> objects, where each group consists of a sequence of items with a common Key value. The last optional argument you can pass to the method is a custom comparer, which is used to compare key values and therefore determines group membership. If no custom comparer is provided, EqualityComparer<K>.Default is used. Groups are yielded in the order in which their keys first occur in the source, and the items within each group keep their source order. Listing 4-16 shows an example of using the GroupBy operator.

Listing 4-16: The GroupBy operator used to group customers by Country

var expr = customers.GroupBy(c => c.Country);
foreach(IGrouping<Countries, Customer> customerGroup in expr) {
    Console.WriteLine("Country: {0}", customerGroup.Key);
    foreach(var item in customerGroup) {
        Console.WriteLine(item);
    }
}

As Listing 4-16 shows, you enumerate the groups first, reading the Key of each one, and then iterate over the items contained within each group. Every group is an instance of a type that implements IGrouping<Countries, Customer>, because we are using the default elementSelector that directly projects the source Customer instances into the result. In query expressions, the GroupBy operator can be defined using the group by syntax, which is shown in Listing 4-17.

Listing 4-17: A query expression with a group by syntax

var expr =
    from c in customers
    group c by c.Country;
foreach(IGrouping<Countries, Customer> customerGroup in expr) {
    Console.WriteLine("Country: {0}", customerGroup.Key);
    foreach(var item in customerGroup) {
        Console.WriteLine(item);
    }
}

The code defined in Listing 4-17 is semantically equivalent to the code shown in Listing 4-16.

Listing 4-18 is another example of grouping, this time with a custom elementSelector.

Listing 4-18: The GroupBy operator used to group customer names by Country

var expr =
    customers
    .GroupBy(c => c.Country, c => c.Name);
foreach(IGrouping<Countries, string> customerGroup in expr) {
    Console.WriteLine("Country: {0}", customerGroup.Key);
    foreach(var item in customerGroup) {
        Console.WriteLine("  {0}", item);
    }
}

Here is the result of this code:

Country: Italy
  Paolo
  Marco
Country: USA
  James
  Frank

In this last example, each group is an instance of a class that implements IGrouping<Countries, string>, because the elementSelector function projects only the customers' names (of type string) into the output sequence.
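None of the listings above uses the overload that takes a comparer, so here is a minimal sketch of it. The Customer class below is a hypothetical, cut-down type that stores Country as a string (the chapter's own samples use a Countries enumeration), and StringComparer.OrdinalIgnoreCase stands in for a custom IEqualityComparer<K>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified type: Country is a string here, not the chapter's enum.
public class Customer {
    public string Name;
    public string Country;
}

public static class GroupByComparerDemo {
    // Groups customer names by country, treating "it" and "IT" as the same key.
    public static List<string> Describe(IEnumerable<Customer> customers) {
        var groups = customers.GroupBy(
            c => c.Country,                     // keySelector
            c => c.Name,                        // elementSelector
            StringComparer.OrdinalIgnoreCase);  // comparer that decides group membership
        return groups
            .Select(g => g.Key + ": " + String.Join(", ", g.ToArray()))
            .ToList();
    }

    public static List<Customer> SampleCustomers() {
        return new List<Customer> {
            new Customer { Name = "Paolo", Country = "IT" },
            new Customer { Name = "James", Country = "us" },
            new Customer { Name = "Marco", Country = "it" },
            new Customer { Name = "Frank", Country = "US" },
        };
    }

    public static void Main() {
        foreach (var line in Describe(SampleCustomers())) {
            Console.WriteLine(line);
        }
        // IT: Paolo, Marco
        // us: James, Frank
    }
}
```

Each group reports the key as it was first encountered in the source, which is why the output shows "IT" and "us" rather than a normalized spelling.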

Join Operators

Join operators are used to define relationships between sequences in LINQ queries. From a SQL and relational point of view, almost every query requires joining two or more tables. In LINQ, a set of join operators is defined to implement this behavior.

Join

The first operator of this group is of course the Join method, which is defined by the following signatures:

public static IEnumerable<V> Join<T, U, K, V>(
this IEnumerable<T> outer,
IEnumerable<U> inner,
Func<T, K> outerKeySelector,
Func<U, K> innerKeySelector,
Func<T, U, V> resultSelector);
public static IEnumerable<V> Join<T, U, K, V>(
this IEnumerable<T> outer,
IEnumerable<U> inner,
Func<T, K> outerKeySelector,
Func<U, K> innerKeySelector,
Func<T, U, V> resultSelector,
IEqualityComparer<K> comparer);

Join requires a set of four generic types. The T type represents the type of the outer source sequence, and the U type describes the type of the inner source sequence. The outerKeySelector and innerKeySelector functions define how to extract the identifying keys from the outer and inner source sequence items, respectively. These keys are both of type K, and their equality defines the join condition. The resultSelector function defines what to project into the result sequence, which will be an implementation of IEnumerable<V>. V is the last generic type needed by the operator, and it defines the type of each single item in the join result sequence. The second overload of the method has an additional custom equality comparer, used to compare the keys. If the comparer argument is null or if the first overload of the method is invoked, a default key comparer (EqualityComparer<K>.Default) will be used.

Here is an example that will make the use of Join clearer. Think about our customers, with their orders and products. In Listing 4-19, a query joins orders with their corresponding products.

Listing 4-19: The Join operator used to map orders with products

var expr =
    customers
    .SelectMany(c => c.Orders)
    .Join(products,
          o => o.IdProduct,
          p => p.IdProduct,
          (o, p) => new { o.Month, o.Shipped, p.IdProduct, p.Price });

The following is the result of the query:

{Month=January, Shipped=False, IdProduct=1, Price=10}
{Month=May, Shipped=True, IdProduct=2, Price=20}
{Month=July, Shipped=False, IdProduct=1, Price=10}
{Month=December, Shipped=True, IdProduct=3, Price=30}
{Month=January, Shipped=True, IdProduct=3, Price=30}
{Month=July, Shipped=False, IdProduct=4, Price=40}

In this example, orders represents the outer sequence and products is the inner sequence. The o and p used in lambda expressions are of type Order and Product, respectively. Internally, the operator collects the elements of the inner sequence into a hash table, using their keys extracted with innerKeySelector. It then enumerates the outer sequence and maps its elements, based on the Key value extracted with outerKeySelector, to the hash table of items. Because of its implementation, the Join operator result sequence keeps the order of the outer sequence first, and then uses the order of the inner sequence for each outer sequence element.
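The two steps just described (index the inner sequence by key, then stream the outer sequence and probe the index) can be sketched as follows. This is an illustrative re-implementation under my own assumptions, not the framework's source code: HashJoin is a hypothetical name, ToLookup stands in for the internal hash table, and the sample data is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinSketch {
    // Hypothetical helper that mimics an inner equijoin with the default key comparer.
    public static IEnumerable<V> HashJoin<T, U, K, V>(
        IEnumerable<T> outer,
        IEnumerable<U> inner,
        Func<T, K> outerKeySelector,
        Func<U, K> innerKeySelector,
        Func<T, U, V> resultSelector) {
        // Step 1: collect the inner elements by key; inner order is kept within each key.
        ILookup<K, U> lookup = inner.ToLookup(innerKeySelector);
        // Step 2: enumerate the outer sequence in order and probe the lookup.
        foreach (T outerItem in outer) {
            // A key with no matches yields an empty sequence, so unmatched outer items vanish.
            foreach (U innerItem in lookup[outerKeySelector(outerItem)]) {
                yield return resultSelector(outerItem, innerItem);
            }
        }
    }

    public static string Run() {
        int[] orderedProductIds = { 1, 2, 1, 3, 9 };   // invented data; 9 has no product
        var products = new[] {
            new { IdProduct = 1, Price = 10 },
            new { IdProduct = 2, Price = 20 },
            new { IdProduct = 3, Price = 30 },
        };
        var joined = HashJoin(
            orderedProductIds, products,
            id => id, p => p.IdProduct,
            (id, p) => id + ":" + p.Price);
        return String.Join(" ", joined.ToArray());
    }

    public static void Main() {
        Console.WriteLine(Run());   // 1:10 2:20 1:10 3:30
    }
}
```

The output follows the order of the outer sequence, which matches the ordering behavior described above.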

From an SQL point of view, the example in Listing 4-19 can be thought of as an inner equijoin somewhat like the following SQL query:

SELECT     o.Month, o.Shipped, p.IdProduct, p.Price
FROM       Orders AS o
INNER JOIN Products AS p
ON   o.IdProduct = p.IdProduct

If you want to translate the SQL syntax into the Join operator syntax, you can think of the column selection in SQL as the resultSelector function, while the equality condition on the IdProduct columns (of orders and products) corresponds to the pair of outerKeySelector and innerKeySelector functions.

The Join operator has a corresponding LINQ syntax, which is shown in Listing 4-20.

Listing 4-20: The Join operator query expression syntax

var expr =
    from c in customers
    from o in c.Orders
    join p in products
         on o.IdProduct equals p.IdProduct
    select new { o.Month, o.Shipped, p.IdProduct, p.Price };

Important 

The order of items to relate (o.IdProduct equals p.IdProduct) in LINQ query syntax must have the outer sequence first and the inner sequence after; otherwise, the LINQ query will not compile. This requirement is different from standard SQL queries, in which item ordering does not matter.

GroupJoin

In cases in which you need to define something similar to a LEFT OUTER JOIN or a RIGHT OUTER JOIN, you need to use the GroupJoin operator. Its signatures are quite similar to the Join operator:

public static IEnumerable<V> GroupJoin<T, U, K, V>(
this IEnumerable<T> outer,
IEnumerable<U> inner,
Func<T, K> outerKeySelector,
Func<U, K> innerKeySelector,
Func<T, IEnumerable<U>, V> resultSelector);
public static IEnumerable<V> GroupJoin<T, U, K, V>(
this IEnumerable<T> outer,
IEnumerable<U> inner,
Func<T, K> outerKeySelector,
Func<U, K> innerKeySelector,
Func<T, IEnumerable<U>, V> resultSelector,
IEqualityComparer<K> comparer);

The only difference is the definition of the resultSelector function. It takes an IEnumerable<U> instead of a single object of type U, because it projects a hierarchical result: each item of the resulting IEnumerable<V> is built from one item of the outer sequence together with the group of matching items, of type U, extracted from the inner sequence.

As a result of this behavior, the output is not a flattened inner equijoin, which is what the Join operator produces, but a hierarchical sequence of items. Nevertheless, you can define queries using GroupJoin with results equivalent to the Join operator, whenever the mapping is a one-to-one relationship. If the inner sequence contains no element that matches an outer element, the GroupJoin operator still yields that outer element, paired with an empty sequence (Count = 0). In Listing 4-21, you can see an example of this operator.

Listing 4-21: The GroupJoin operator used to map products with orders, if present

var expr =
    products
    .GroupJoin(
        customers.SelectMany(c => c.Orders),
        p => p.IdProduct,
        o => o.IdProduct,
        (p, orders) => new { p.IdProduct, Orders = orders });
foreach(var item in expr) {
    Console.WriteLine("Product: {0}", item.IdProduct);
    foreach (var order in item.Orders) {
        Console.WriteLine("    {0}", order);
    }
}

The following is the result of Listing 4-21:

Product: 1
    3 - False - January - 1
    10 - False - July - 1
Product: 2
    5 - True - May - 2
Product: 3
    20 - True - December - 3
    10 - True - January - 3
Product: 4
Product: 5
    20 - False - July - 5
Product: 6

You can see that products 4 and 6 have no matching orders, but the query returns them nonetheless. You can think of this operator as similar to a SELECT ... FOR XML AUTO query in Transact-SQL in Microsoft SQL Server 2000 and 2005. In fact, it returns results hierarchically grouped like a set of XML nodes nested within their parent nodes, similar to the default result of a FOR XML AUTO query.

In a query expression, the GroupJoin operator is defined as a join into clause. The query expression shown in Listing 4-22 is equivalent to Listing 4-21.

Listing 4-22: A query expression with a join into clause

var customersOrders =
    from c in customers
    from o in c.Orders
    select o;
var expr =
    from p in products
    join o in customersOrders
         on p.IdProduct equals o.IdProduct
         into orders
    select new { p.IdProduct, Orders = orders };

In this example, we first define an expression called customersOrders to extract the flat list of orders. (This expression still uses the SelectMany operator.) We could also define a single query expression, nesting the customersOrders expression within the main query. This approach is shown in Listing 4-23.

Listing 4-23: The query expression of Listing 4-22 in its compact version

var expr =
    from p in products
    join o in (
             from c in customers
             from o in c.Orders
             select o
         ) on p.IdProduct equals o.IdProduct
         into orders
    select new { p.IdProduct, Orders = orders };
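The section opened by relating GroupJoin to a LEFT OUTER JOIN. One common way to get flat, SQL-style left outer join rows out of a join into clause is to follow it with a second from over DefaultIfEmpty. The sketch below assumes invented anonymous-type data rather than the chapter's Customer, Order, and Product samples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeftOuterJoinDemo {
    public static List<string> Run() {
        // Invented sample data; product 2 deliberately has no orders.
        var products = new[] {
            new { IdProduct = 1 }, new { IdProduct = 2 }, new { IdProduct = 3 },
        };
        var orders = new[] {
            new { IdProduct = 1, Month = "January" },
            new { IdProduct = 1, Month = "July" },
            new { IdProduct = 3, Month = "December" },
        };

        var rows =
            from p in products
            join o in orders
                 on p.IdProduct equals o.IdProduct
                 into productOrders
            // DefaultIfEmpty supplies a single null when a product has no orders,
            // so the product still produces one flattened row.
            from po in productOrders.DefaultIfEmpty()
            select p.IdProduct + " - " + (po == null ? "(no order)" : po.Month);

        return rows.ToList();
    }

    public static void Main() {
        foreach (var row in Run()) {
            Console.WriteLine(row);
        }
        // 1 - January
        // 1 - July
        // 2 - (no order)
        // 3 - December
    }
}
```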

Set Operators

Our journey through LINQ operators continues with a group of methods that are used to handle sets of data, applying common set operations (union, intersect, and except) and selecting unique occurrences of items (distinct).

Distinct

Imagine that you want to extract all products that are mapped to orders, avoiding duplicates. This requirement could be solved in standard SQL using a DISTINCT clause within a JOIN query. LINQ provides a Distinct operator, a member of the set operators. Its signature is quite simple. It requires just a source sequence, from which all the distinct occurrences of items will be yielded. An example of the operator is shown in Listing 4-24.

public static IEnumerable<T> Distinct<T>(
this IEnumerable<T> source);

Listing 4-24: The Distinct operator applied to the list of products used in orders

var expr =
    customers
    .SelectMany(c => c.Orders)
    .Join(products,
          o => o.IdProduct,
          p => p.IdProduct,
          (o, p) => p)
    .Distinct();

Distinct does not have an equivalent query expression clause; hence, as we did in Listing 4-15, we can apply this operator to the result of a query expression, as shown in Listing 4-25.

Listing 4-25: The Distinct operator applied to a query expression

var expr =
    (from c in customers
     from o in c.Orders
     join p in products
          on o.IdProduct equals p.IdProduct
     select p
    ).Distinct();

By default, Distinct compares and identifies elements using their GetHashCode and Equals methods because, internally, it uses a default comparer of type EqualityComparer<T>.Default. We can, if necessary, override our type behavior to change the Distinct result, or we can just use the second overload of the Distinct method.

public static IEnumerable<T> Distinct<T>(
this IEnumerable<T> source,
IEqualityComparer<T> comparer);

This last overload accepts a comparer argument, available to provide a custom comparer for instances of type T.

Note 

We will see an example of how to compare reference types in the Union operator examples in Listing 4-26.

Union, Intersect, and Except

Within the set operators group, three more operators are useful for classic set operations. They are Union, Intersect, and Except, and they share a similar definition:

public static IEnumerable<T> Union<T>(
this IEnumerable<T> first,
IEnumerable<T> second);
public static IEnumerable<T> Union<T>(
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer);
public static IEnumerable<T> Intersect<T>(
this IEnumerable<T> first,
IEnumerable<T> second);
public static IEnumerable<T> Intersect<T>(
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer);
public static IEnumerable<T> Except<T>(
this IEnumerable<T> first,
IEnumerable<T> second);
public static IEnumerable<T> Except<T>(
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer);

The Union operator yields the first sequence elements and the second sequence elements, skipping duplicates. For instance, in Listing 4-26, you can see how to merge the orders of the second customer with the orders of the third.

Listing 4-26: The Union operator applied to the second and third customer orders

var expr = customers[1].Orders.Union(customers[2].Orders);

As with the Distinct operator, in Union, Intersect, and Except, the elements are compared by using the GetHashCode and Equals methods in the first overload, or by using a custom comparer in the second overload. Here is the result of Listing 4-26:

10 - False - July - 1
20 - True - December - 3
20 - True - December - 3

The result might seem unexpected because we have two rows that appear to be the same. However, if you look at the initialization code used in all of our examples, each order is a different instance of the Order reference type. Even if the second order of the second customer is semantically equal to the first order of the third customer, they are distinct objects: by default they compare as unequal and have their own hash codes. You can see this effect in the following code, where the second order of customers[1] and the only order of customers[2] are the two semantically equivalent Order instances:

customers[1] = new Customer {Name = "Marco", City = "Torino",
    Country = Countries.Italy, Orders = new Order[] {
        new Order {Quantity = 10, IdProduct = 1,
            Shipped = false, Month = "July"},
        new Order {Quantity = 20, IdProduct = 3,
            Shipped = true, Month = "December"}}};
customers[2] = new Customer {Name = "James", City = "Dallas",
    Country = Countries.USA, Orders = new Order[] {
        new Order {Quantity = 20, IdProduct = 3,
            Shipped = true, Month = "December"}}};

We have not defined a value type semantic for our Order reference type. To get the expected result, we can implement a value type semantic by overriding the GetHashCode and Equals implementations of the type to be compared. In this situation, it might be useful to do that, as you can see in this new Order implementation:

public class Order {
    public int Quantity;
    public bool Shipped;
    public string Month;
    public int IdProduct;

    public override string ToString() {
        return String.Format("{0} - {1} - {2} - {3}",
            this.Quantity, this.Shipped, this.Month, this.IdProduct);
    }

    public override bool Equals(object obj) {
        if (!(obj is Order))
            return false;
        else {
            Order o = (Order)obj;
            return(o.IdProduct == this.IdProduct &&
                   o.Month == this.Month &&
                   o.Quantity == this.Quantity &&
                   o.Shipped == this.Shipped);
        }
    }

    public override int GetHashCode() {
        return String.Format("{0}|{1}|{2}|{3}", this.IdProduct,
            this.Month, this.Quantity, this.Shipped).GetHashCode();
    }
}

Another way to get the correct result is to use the second overload of the Union method, providing a custom comparer for the Order type. A final way to get the expected distinct behavior is to define the Order type as a value type, using struct instead of class in its declaration. Be aware, however, that it is not always possible to define a struct, because sometimes you need an object-oriented infrastructure based on type inheritance, which structs do not support.

// Using struct instead of class, we get a value type
public struct Order {
    public int Quantity;
    public bool Shipped;
    public string Month;
    public int IdProduct;
}

Remember that an anonymous type is defined as a reference type with a value type semantic. In other words, all anonymous types are defined as a class with an override of GetHashCode and Equals written by the compiler.
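The custom comparer option mentioned above is not shown in the chapter's code, so here is one possible sketch. OrderComparer is a hypothetical name, and the Order class is a cut-down copy of the original sample type without the Equals and GetHashCode overrides, so that the effect of the comparer is visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down copy of the sample type, deliberately without Equals/GetHashCode overrides.
public class Order {
    public int Quantity;
    public bool Shipped;
    public string Month;
    public int IdProduct;
}

// Hypothetical comparer that gives Order a value-type semantic from the outside.
public class OrderComparer : IEqualityComparer<Order> {
    public bool Equals(Order x, Order y) {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.IdProduct == y.IdProduct && x.Month == y.Month
            && x.Quantity == y.Quantity && x.Shipped == y.Shipped;
    }

    public int GetHashCode(Order o) {
        if (o == null) return 0;
        // Equal orders must produce equal hash codes, so use the same four fields.
        return o.IdProduct ^ (o.Quantity << 8)
             ^ (o.Month ?? "").GetHashCode() ^ (o.Shipped ? 1 : 0);
    }
}

public static class UnionComparerDemo {
    public static Order[] SecondCustomerOrders() {
        return new[] {
            new Order { Quantity = 10, IdProduct = 1, Shipped = false, Month = "July" },
            new Order { Quantity = 20, IdProduct = 3, Shipped = true, Month = "December" },
        };
    }

    public static Order[] ThirdCustomerOrders() {
        return new[] {
            new Order { Quantity = 20, IdProduct = 3, Shipped = true, Month = "December" },
        };
    }

    public static void Main() {
        var first = SecondCustomerOrders();
        var second = ThirdCustomerOrders();
        Console.WriteLine(first.Union(second).Count());                       // 3
        Console.WriteLine(first.Union(second, new OrderComparer()).Count());  // 2
    }
}
```

The same comparer instance can be passed to the second overloads of Distinct, Intersect, and Except.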

In Listing 4-27, you can find an example of using Intersect and Except.

Listing 4-27: The Intersect and Except operators applied to the second and third customer orders

var expr1 = customers[1].Orders.Intersect(customers[2].Orders);
var expr2 = customers[1].Orders.Except(customers[2].Orders);

The Intersect operator yields only the elements that occur in both sequences, and the Except operator yields all the elements in the first sequence that are not present in the second sequence. Once again, there are no compact clauses to define set operators in query expressions, but we can apply them to LINQ query results, as in Listing 4-28.

Listing 4-28: Set operators applied to query expressions

var expr =
    (from c in customers
     from o in c.Orders
     where c.Country == Countries.Italy
     select o
    ).Intersect(
        from c in customers
        from o in c.Orders
        where c.Country == Countries.USA
        select o);
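Because the Order-based listings depend on how equality is defined, the behavior of the three operators is easier to see on plain integers. The arrays below are invented for illustration; note that all three operators also drop duplicates that occur within a single input sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetOperatorsDemo {
    // Formats a sequence as "a, b, c" for display.
    public static string Show(IEnumerable<int> items) {
        return String.Join(", ", items.Select(i => i.ToString()).ToArray());
    }

    public static void Main() {
        int[] first = { 1, 2, 2, 3, 4 };
        int[] second = { 3, 4, 4, 5 };

        Console.WriteLine(Show(first.Union(second)));      // 1, 2, 3, 4, 5
        Console.WriteLine(Show(first.Intersect(second)));  // 3, 4
        Console.WriteLine(Show(first.Except(second)));     // 1, 2
        Console.WriteLine(Show(second.Except(first)));     // 5
    }
}
```

Except is not symmetric: the last two lines show that swapping the sequences changes the result.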

Aggregate Operators

At times, you need to make some aggregations over sequences to make calculations on source items. To accomplish this, LINQ provides the family of aggregate operators that implement the most common aggregate functions: Count, LongCount, Sum, Min, Max, Average, and Aggregate. Many of these operators are simple to use because their behavior is easy to understand.

Count and LongCount

Imagine that you want to list all customers, each one followed by the number of orders the customer has placed. Listing 4-29 shows a query that does this with the Count operator.

Listing 4-29: The Count operator applied to customer orders

var expr =
    from c in customers
    select new { c.Name, c.City, c.Country, OrdersCount = c.Orders.Count() };

The Count operator provides a couple of signatures, as does the LongCount operator:

public static int Count<T>(
this IEnumerable<T> source);
public static int Count<T>(
this IEnumerable<T> source,
Func<T, bool> predicate);
public static long LongCount<T>(
this IEnumerable<T> source);
public static long LongCount<T>(
this IEnumerable<T> source,
Func<T, bool> predicate);

Listing 4-29 uses the first and simplest signature, which just counts the items in the source sequence. The second overload accepts a predicate, which must not be null, that filters the items to count. The LongCount variations simply return a long instead of an int.
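As a small illustration of the predicate overloads, the following sketch counts only the quantities that meet a threshold. The Quantities array is invented data, and NullPredicateThrows is a hypothetical helper that shows what happens when the predicate is null.

```csharp
using System;
using System.Linq;

public static class CountDemo {
    // Invented order quantities, not the chapter's sample data.
    public static readonly int[] Quantities = { 3, 5, 10, 20, 10, 20 };

    // Returns true if Count rejects a null predicate with ArgumentNullException.
    public static bool NullPredicateThrows() {
        try {
            Func<int, bool> predicate = null;
            Quantities.Count(predicate);
            return false;
        } catch (ArgumentNullException) {
            return true;
        }
    }

    public static void Main() {
        Console.WriteLine(Quantities.Count());                  // 6
        Console.WriteLine(Quantities.Count(q => q >= 10));      // 4
        Console.WriteLine(Quantities.LongCount(q => q >= 10));  // 4 (typed as long)
        Console.WriteLine(NullPredicateThrows());               // True
    }
}
```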

Sum

The Sum operator requires more attention because it has multiple definitions:

public static Numeric Sum(
this IEnumerable<Numeric> source);
public static Numeric Sum<T>(
this IEnumerable<T> source,
Func<T, Numeric> selector);

We used Numeric in the syntax to generalize the return type of the Sum operator. In practice, it has many definitions, one for each of the main Numeric types: int, int?, long, long?, float, float?, double, double?, decimal, and decimal?.

Important 

As you probably know, in C# 2.0 and later, the question mark that appears after a value type name (T?) defines a nullable type (Nullable<T>) of this type. For instance, int? means Nullable<System.Int32>.

The first implementation sums the source sequence items, assuming that the items are all the same numeric type, and returns the result. In the case of an empty source sequence, zero is returned. In the case of nullable types, null items are skipped; the return type is nullable, but the result is zero rather than null when there is nothing to add. This implementation can be used when the items can be summed directly. For example, we can sum an array of integers as in this code:

int[] values = { 1, 3, 9, 29 };
int   total  = values.Sum();
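The nullable overloads are worth a quick check, because the return type alone does not tell you what happens to null items. The sketch below uses invented arrays and a hypothetical Total helper that simply forwards to Sum.

```csharp
using System;
using System.Linq;

public static class NullableSumDemo {
    // Hypothetical helper: forwards to the int? overload of Sum.
    public static int? Total(int?[] values) {
        return values.Sum();
    }

    public static void Main() {
        int?[] withNulls = { 1, null, 3 };
        int?[] onlyNulls = { null, null };
        int?[] empty = { };

        Console.WriteLine(Total(withNulls));           // 4 (the null item is skipped)
        Console.WriteLine(Total(onlyNulls).HasValue);  // True
        Console.WriteLine(Total(onlyNulls));           // 0
        Console.WriteLine(Total(empty));               // 0
    }
}
```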

When the sequence is not made up of simple Numeric types, we need to extract values to be summed from each item in the source sequence. To do that, we can use the second overload, which accepts a selector argument. You can see an example of this syntax in Listing 4-30.

Listing 4-30: The Sum operator applied to customer orders

var customersOrders =
    from c in customers
    from o in c.Orders
    join p in products
         on o.IdProduct equals p.IdProduct
    select new { c.Name, OrderAmount = o.Quantity * p.Price };
var expr =
    from c in customers
    join o in customersOrders
         on c.Name equals o.Name
         into customersWithOrders
    select new { c.Name,
                 TotalAmount = customersWithOrders.Sum(o => o.OrderAmount) };

In Listing 4-30, we join customers with the customersOrders sequence, returning for each customer the total amount of that customer's orders, calculated with the Sum operator. As usual, we can collapse the previous code using nested queries, which is the approach shown in Listing 4-31.

Listing 4-31: The Sum operator applied to customer orders, with a nested query

var expr =
    from c in customers
    join o in (
             from c in customers
             from o in c.Orders
             join p in products
                  on o.IdProduct equals p.IdProduct
             select new { c.Name, OrderAmount = o.Quantity * p.Price }
         ) on c.Name equals o.Name
         into customersWithOrders
    select new { c.Name,
                 TotalAmount = customersWithOrders.Sum(o => o.OrderAmount) };

Min and Max

Within the set of aggregate operators, Min and Max calculate the minimum and maximum values of the source sequence, respectively. Both of these extension methods provide a rich set of overloads:

public static Numeric Min/Max(
this IEnumerable<Numeric> source);
public static T Min<T>/Max<T>(
this IEnumerable<T> source);
public static Numeric Min<T>/Max<T>(
this IEnumerable<T> source,
Func<T, Numeric> selector);
public static S Min<T, S>/Max<T, S>(
this IEnumerable<T> source,
Func<T, S> selector);

The first signature, as in the Sum operator, provides many definitions for the main numeric types (int, int?, long, long?, float, float?, double, double?, decimal, and decimal?), and it computes the minimum or maximum value on an arithmetic basis, using the elements of the source sequence. This signature is useful when the source elements are numbers by themselves, as in Listing 4-32.

Listing 4-32: The Min operator applied to order quantities

var expr =
    (from c in customers
     from o in c.Orders
     select o.Quantity
    ).Min();

The second signature computes the minimum or maximum value of the source elements regardless of their type. The comparison is made using the IComparable<T> interface implementation, if supported by the source elements, or the nongeneric IComparable interface implementation. If the source type T does not implement either of these interfaces, an ArgumentException error will be thrown, with an Exception.Message equal to “At least one object must implement IComparable.” To examine this situation, take a look at Listing 4-33, in which the resulting anonymous type does not implement either of the interfaces required by the Min operator.

Listing 4-33: The Min operator applied to wrong types (thereby throwing an ArgumentException)

var expr =
    (from c in customers
     from o in c.Orders
     select new { o.Quantity }
    ).Min();

In the case of an empty source sequence, or of a sequence that contains only null values, the result will be null whenever the Numeric type is a nullable type; for a non-nullable type, an empty source causes an InvalidOperationException to be thrown. Passing a null source argument always throws an ArgumentNullException. The selector function, available in the last two signatures, defines how to extract values from the source sequence elements. For instance, you can use these overloads to avoid errors related to missing interface implementations (IComparable<T>/IComparable), as in Listing 4-34.

Listing 4-34: The Max operator applied to custom types, with a value selector

var expr =
    (from c in customers
     from o in c.Orders
     select new { o.Quantity }
    ).Max(o => o.Quantity);
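The edge cases of Min and Max (empty sequences, sequences of nulls, and a null source) are easy to confuse, so the following sketch probes each one. Try is a hypothetical helper that reports either the result or the name of the exception type; the arrays are invented.

```csharp
using System;
using System.Linq;

public static class MinMaxEdgeCases {
    // Hypothetical helper: runs the query and reports its result or exception type.
    public static string Try(Func<object> query) {
        try {
            object result = query();
            return result == null ? "null" : result.ToString();
        } catch (Exception ex) {
            return ex.GetType().Name;
        }
    }

    public static void Main() {
        int[] empty = { };
        int?[] emptyNullable = { };
        int?[] onlyNulls = { null, null };
        int[] values = { 3, 10, 5 };
        int[] missing = null;

        Console.WriteLine(Try(() => values.Max()));         // 10
        Console.WriteLine(Try(() => empty.Min()));          // InvalidOperationException
        Console.WriteLine(Try(() => emptyNullable.Min()));  // null
        Console.WriteLine(Try(() => onlyNulls.Max()));      // null
        Console.WriteLine(Try(() => missing.Min()));        // ArgumentNullException
    }
}
```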
