Average
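The Average operator calculates the arithmetic average of a set of values, extracted from a source sequence. Like the previous operators, this function works with the source elements themselves or with values extracted using a selector function:
public static Result Average(this IEnumerable<Numeric> source);
public static Result Average<T>(this IEnumerable<T> source, Func<T, Numeric> selector);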
The Numeric type can be int, int?, long, long?, float, float?, double, double?, decimal, or decimal?. The Result type always reflects the “nullability” of the numeric type. When the Numeric type is int or long, the Result type is double. When the Numeric type is int? or long?, the Result type is double?. Otherwise, the Numeric and Result types are the same.
When the sum of the values used to compute the arithmetic average overflows the type used internally to accumulate it (long for int and long values and their nullable versions, decimal for decimal values), an OverflowException error is thrown; sums of float and double values do not throw. Because of its definition, the Average operator’s first signature can be invoked only on a Numeric sequence. If you want to invoke it on a sequence of any other type, you need to provide a selector function. In Listing 4-35, you can see an example of both of the overloads.
Listing 4-35: Both Average operator signatures applied to product prices
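// First overload: average of a sequence of numeric values
var expr =
(from p in products
select p.Price
).Average();
// Second overload: average of the values extracted by a selector
var expr2 =
(from p in products
select new { p.Price }
).Average(p => p.Price);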
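The following self-contained sketch exercises both Average overloads on small hypothetical arrays (stand-ins for the chapter's products collection) and shows the result types described earlier. It also shows what happens with an empty source: a nullable source yields null, while a non-nullable one throws an InvalidOperationException.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the chapter's data.
var prices = new[] { 10m, 20m, 30m };
var products = new[] {
    new { Name = "Product 1", Price = 10m },
    new { Name = "Product 2", Price = 30m },
};
var quantities = new[] { 1, 2, 4 };

decimal averagePrice = prices.Average();                   // decimal in, decimal out
decimal averageSelected = products.Average(p => p.Price);  // selector overload
double averageQuantity = quantities.Average();             // int in, double out

// Nullable source: an empty sequence produces null.
double? averageOfNone = new int?[0].Average();

// Non-nullable source: an empty sequence has no average, so Average throws.
bool threwOnEmpty = false;
try { new int[0].Average(); }
catch (InvalidOperationException) { threwOnEmpty = true; }

Console.WriteLine($"{averagePrice} {averageSelected} {averageQuantity} {averageOfNone == null} {threwOnEmpty}");
```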
The second signature is useful when you are defining a query in which the average is just one of the results to extract. An example is shown in Listing 4-36, where we extract all customers and their average order amounts.
Listing 4-36: Customers and their average order amounts
var expr =
from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name
into customersWithOrders
select new { c.Name,
AverageAmount = customersWithOrders.Average(o =>
o.OrderAmount) };
The results will be similar to the following:
Aggregate
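The last operator in this set is Aggregate. Take a look at its definition:
public static T Aggregate<T>(this IEnumerable<T> source, Func<T, T, T> func);
public static U Aggregate<T, U>(this IEnumerable<T> source, U seed, Func<U, T, U> func);
public static V Aggregate<T, U, V>(this IEnumerable<T> source, U seed, Func<U, T, U> func, Func<U, V> resultSelector);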
This operator repeatedly invokes the func function, storing the result in an accumulator. Every step calls the function with the current accumulator value as the first argument, starting from seed, and with the current element within the source sequence as the second argument. At the end of the iteration, the operator returns the final accumulator value.
The only difference between the first two signatures is that the second requires an explicit value for the seed of type U. The first signature uses the first element in the source sequence as the seed and infers the seed type from the source sequence itself; for this reason, it throws an InvalidOperationException error when the source sequence is empty. The third signature looks like the second, but it requires a resultSelector function, which is applied to the final accumulator value to produce the result.
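To see the three signatures side by side, here is a minimal sketch on a hypothetical array of strings. It shows that the seedless form keeps the element type, that an explicit seed can change the accumulator type, and that resultSelector runs once on the final accumulator.

```csharp
using System;
using System.Linq;

var words = new[] { "LINQ", "to", "Objects" };

// First signature: no seed; the first element ("LINQ") starts the accumulator and T = string.
string joined = words.Aggregate((acc, w) => acc + " " + w);

// Second signature: explicit seed of type U = int, different from T = string.
int totalLength = words.Aggregate(0, (acc, w) => acc + w.Length);

// Third signature: like the second, plus a resultSelector applied to the final accumulator.
string summary = words.Aggregate(0, (acc, w) => acc + w.Length, acc => acc + " characters");

// With a seed, an empty source simply returns the seed...
int emptyWithSeed = new string[0].Aggregate(0, (acc, w) => acc + w.Length);

// ...without a seed there is no first element to start from, so Aggregate throws.
bool threwOnEmpty = false;
try { new string[0].Aggregate((acc, w) => acc + w); }
catch (InvalidOperationException) { threwOnEmpty = true; }

Console.WriteLine($"{joined} | {totalLength} | {summary} | {emptyWithSeed} | {threwOnEmpty}");
```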
In Listing 4-37, we use the Aggregate operator to extract the most expensive order for each customer.
Listing 4-37: Customers and their most expensive orders
var expr =
from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, o.IdProduct,
OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name
into orders
select new { c.Name,
MaxOrderAmount =
orders
.Aggregate((t, s) => t.OrderAmount > s.OrderAmount ?
t : s)
.OrderAmount };
As you can see, the function called by the Aggregate operator compares the OrderAmount property of each order placed by the current customer and keeps the more expensive one in the accumulator. At the end of each customer aggregation, the accumulator will contain the most expensive order, and its OrderAmount property will be projected into the final result, coupled with the customer Name property. The following is the output from this query:
In Listing 4-38, you can see another sample of aggregation. This example calculates the total ordered amount for each product.
Listing 4-38: Products and their ordered amounts
var expr =
from p in products
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { p.IdProduct, OrderAmount = o.Quantity * p.Price }
) on p.IdProduct equals o.IdProduct
into orders
select new { p.IdProduct,
TotalOrderedAmount =
orders
.Aggregate(0m, (a, o) => a + o.OrderAmount) };
Here is the output of this query:
In this second sample, the aggregate function uses an accumulator of Decimal type. It is initialized to zero (seed = 0m) and accumulates the OrderAmount values for every step. The result of this function will also be a Decimal type.
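The running maximum used in Listing 4-37 and the seeded running total used in Listing 4-38 can be checked against the dedicated Max and Sum operators. This sketch uses a hypothetical array of order amounts rather than the chapter's customers and products.

```csharp
using System;
using System.Linq;

// Hypothetical order amounts for one customer.
var amounts = new[] { 120m, 450m, 80m };

// Running maximum, as in Listing 4-37 (no seed: the first amount starts the accumulator).
decimal maxViaAggregate = amounts.Aggregate((best, a) => a > best ? a : best);

// Running total with an explicit decimal seed, as in Listing 4-38.
decimal sumViaAggregate = amounts.Aggregate(0m, (total, a) => total + a);

// The dedicated operators compute the same values and state the intent directly.
decimal maxViaMax = amounts.Max();
decimal sumViaSum = amounts.Sum();

Console.WriteLine($"{maxViaAggregate} = {maxViaMax}, {sumViaAggregate} = {sumViaSum}");
```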
Both of the previous examples could also be defined by invoking the Max or Sum operators, respectively. They are shown in this section to help you learn about the Aggregate operator’s behavior. In general, keep in mind that the Aggregate operator is useful whenever there are no specific aggregation operators available; otherwise, you should use an operator such as Min, Max, Sum, and so on. For instance, consider the example in Listing 4-39.
Listing 4-39: Customers and their most expensive orders paired with the month of execution
var expr =
from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, o.IdProduct, o.Month,
OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name into orders
select new { c.Name,
MaxOrder =
orders
.Aggregate( new { Amount = 0m, Month = String.Empty },
(t, s) => t.Amount > s.OrderAmount
? t
: new { Amount = s.OrderAmount,
Month = s.Month })};
The result of Listing 4-39 is shown here:
In this example, the Aggregate operator returns an instance of an anonymous type, projected as the MaxOrder property: a tuple composed of the amount and month of the most expensive order made by each customer. The Aggregate operator used here cannot be replaced by any of the other predefined aggregate operators because of its specific behavior and result type.
Note
For further information about anonymous types, refer to Chapter 2, “C# Language Features,” or Chapter 3, “Microsoft Visual Basic 9.0 Language Features.”
Using only the standard aggregate operators, you would need to call two different aggregators to produce a similar result, which requires scanning the source sequence twice: once to get the max amount and once to get its month. Be sure to pay attention to the seed definition, which declares the resulting anonymous type that is also returned by the aggregation function.
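As a sketch of the difference, the following code computes the most expensive order and its month for a hypothetical array of orders, first with a single Aggregate call shaped like Listing 4-39 and then with Max followed by First, which scans the source twice.

```csharp
using System;
using System.Linq;

// Hypothetical orders of a single customer.
var orders = new[] {
    new { Month = "January", OrderAmount = 300m },
    new { Month = "March", OrderAmount = 900m },
    new { Month = "May", OrderAmount = 450m },
};

// One scan: the seed declares the anonymous type { Amount, Month } that the
// aggregation function also returns, as in Listing 4-39.
var maxOrder = orders.Aggregate(
    new { Amount = 0m, Month = string.Empty },
    (t, s) => t.Amount > s.OrderAmount
        ? t
        : new { Amount = s.OrderAmount, Month = s.Month });

// Two scans: one to find the max amount, one to find the month it belongs to.
decimal maxAmount = orders.Max(o => o.OrderAmount);
string maxMonth = orders.First(o => o.OrderAmount == maxAmount).Month;

Console.WriteLine($"{maxOrder.Amount} {maxOrder.Month} / {maxAmount} {maxMonth}");
```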
Generation Operators
When working with data by applying aggregates, arithmetic operations, and mathematical functions, sometimes you need to also iterate over numbers or item collections. For example, think about a query that needs to extract orders placed for a particular set of years, between 2000 and 2007, or a query that needs to repeat the same operation over the same data. The generation operators are useful for operations such as these.
Range
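The first operator of this set is Range. It is a simple static method of the Enumerable class (not an extension method) that yields a sequence of count consecutive integer numbers, starting from start, as shown in its signature:
public static IEnumerable<int> Range(int start, int count);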
The code in Listing 4-40 illustrates a means to filter orders for the years between 2005 and 2007.
Important
Please note that in the following example, a where condition would be more appropriate because we are iterating orders many times. The example in Listing 4-40 is provided only for demonstration and is not the best solution for the specific query.
Listing 4-40: A set of years generated by the Range operator, used to filter orders
var expr =
Enumerable.Range(2005, 3)
.SelectMany(x => (from o in orders
where o.Year == x
select new { o.Year, o.Amount }));
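The following sketch, built on a hypothetical orders array with the Year and Amount properties used in Listing 4-40, makes two points: the second argument of Range is a count rather than an upper bound, and the single-pass where condition recommended in the Important note returns the same items as the Range-based query.

```csharp
using System;
using System.Linq;

// The second argument of Range is a count, not an upper bound: 2005, 2006, 2007.
int[] years = Enumerable.Range(2005, 3).ToArray();

// Hypothetical orders with the properties used in Listing 4-40.
var orders = new[] {
    new { Year = 2004, Amount = 100m },
    new { Year = 2006, Amount = 250m },
    new { Year = 2007, Amount = 75m },
};

// Range + SelectMany, as in Listing 4-40: one pass over orders for each year.
var viaRange = years
    .SelectMany(x => from o in orders where o.Year == x select new { o.Year, o.Amount })
    .ToArray();

// A single pass with a where condition.
var viaWhere = (from o in orders
                where o.Year >= 2005 && o.Year <= 2007
                select new { o.Year, o.Amount }).ToArray();

Console.WriteLine($"{string.Join(",", years)} {viaRange.Length} {viaWhere.Length}");
```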
The Range operator can also be used to implement classical mathematical operations such as square, power, factorial, and so on. Listing 4-41 shows an example of using Range and Aggregate to calculate the factorial of a number.
Listing 4-41: A factorial of a number using the Range operator
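static int Factorial(int number) {
// Multiply 1 * 2 * ... * number; the seed 1 makes Factorial(0) return 1.
// An int result overflows for number > 12.
return Enumerable.Range(1, number)
.Aggregate(1, (product, n) => product * n);
}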
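Other classical operations follow the same pattern. This sketch uses hypothetical helper names (Factorial, Power, SumOfSquares) and switches the factorial to long, which postpones overflow from 13! (for int) to 21!.

```csharp
using System;
using System.Linq;

// Product of 1..n; the seed 1L makes Factorial(0) == 1.
static long Factorial(int n) =>
    Enumerable.Range(1, n).Aggregate(1L, (product, k) => product * k);

// Integer power: multiply by the base once for each of the `exponent` generated numbers.
static long Power(long b, int exponent) =>
    Enumerable.Range(0, exponent).Aggregate(1L, (product, _) => product * b);

// Sum of the squares of 1..n.
static int SumOfSquares(int n) =>
    Enumerable.Range(1, n).Sum(k => k * k);

Console.WriteLine($"{Factorial(5)} {Factorial(0)} {Power(2, 10)} {SumOfSquares(3)}");
```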
Repeat
Another generation operator is Repeat, which returns a sequence that contains count occurrences of element:
public static IEnumerable<T> Repeat<T>(T element, int count);
When element is an instance of a reference type, each repetition returns a reference to the same instance, not a copy of it.
The Repeat operator is useful for initializing enumerations (using the same element for all instances) or for repeating the same query many times. In Listing 4-42, we repeat the customer name selection two times.
Listing 4-42: The Repeat operator, used to repeat the same query many times
var expr =
Enumerable.Repeat( (from c in customers
select c.Name), 2)
.SelectMany(x => x);
Please note that in this example, Repeat returns a sequence of sequences, formed by two lists of customer names. For this reason, we used SelectMany to get a flat list of names.
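The reference-sharing behavior and the flattening with SelectMany can be verified with a short sketch; the list of names is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Repeating a reference-type instance: every element is the same object.
var buckets = Enumerable.Repeat(new List<string>(), 3).ToArray();
buckets[0].Add("shared");
bool sameInstance = ReferenceEquals(buckets[0], buckets[2]);
int countSeenThroughLast = buckets[2].Count;   // 1: the item added through buckets[0]

// Repeating a query, as in Listing 4-42: SelectMany flattens the two result sequences.
var names = new List<string> { "Paolo", "Marco", "James" };   // hypothetical names
var query = from n in names where n.StartsWith("M") select n;
string[] twice = Enumerable.Repeat(query, 2).SelectMany(x => x).ToArray();

Console.WriteLine($"{sameInstance} {countSeenThroughLast} {string.Join(",", twice)}");
```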
Empty
The last of the generation operators is Empty, which is used to create an empty enumeration of a particular type T:
public static IEnumerable<T> Empty<T>();
This operation can be useful to initialize empty sequences.
Listing 4-43 provides an example that uses Empty to initialize an empty enumeration of Customer.
Listing 4-43: The Empty operator used to initialize an empty set of customers
IEnumerable<Customer> customers = Enumerable.Empty<Customer>();
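A common use is returning an empty sequence instead of null, so that callers can apply query operators without null checks. In this sketch, the dictionary and the OrdersOf helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lookup of order IDs by customer name.
var ordersByCustomer = new Dictionary<string, List<int>> {
    ["Paolo"] = new List<int> { 1, 2 },
};

// Unknown customers get an empty sequence rather than null.
IEnumerable<int> OrdersOf(string name) =>
    ordersByCustomer.TryGetValue(name, out var list) ? list : Enumerable.Empty<int>();

int knownCount = OrdersOf("Paolo").Count();
int unknownCount = OrdersOf("Nobody").Count();

Console.WriteLine($"{knownCount} {unknownCount}");
```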
Quantifier Operators
Imagine that you need to check for the existence of elements within a sequence, based on conditions or selection rules. First you select items with Restriction operators, and then you use aggregate operators such as Count to determine whether any item that verifies the condition exists. There is, however, a set of operators, called quantifiers, specifically used to check for existence conditions over sequences.
Any
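The first operator we will describe in this group is the Any method, which evaluates a predicate and returns a Boolean result:
public static bool Any<T>(this IEnumerable<T> source);
public static bool Any<T>(this IEnumerable<T> source, Func<T, bool> predicate);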
As you can see from the method’s signatures, the method has an overload that requires only the source sequence, without a predicate. This method returns true when at least one element in the source sequence exists or false if the source sequence is empty. To optimize its execution, Any returns as soon as a result is available. In Listing 4-44, you can see an example that determines whether there is any order of product one (IdProduct == 1) within all the customer orders.
Listing 4-44: The Any operator applied to all customer orders to check orders of IdProduct == 1
bool result =
(from c in customers
from o in c.Orders
select o)
.Any(o => o.IdProduct == 1);
result = Enumerable.Empty<Order>().Any(o => o.IdProduct == 1);
In this example, the operator evaluates items only until the first order matching the condition (IdProduct == 1) is found. The second example in Listing 4-44 illustrates a trivial example of the Any operator with a false result, using the Empty operator described earlier.
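The early exit can be observed by counting predicate calls. This sketch uses a hypothetical array of product IDs in place of the customers' orders.

```csharp
using System;
using System.Linq;

// Hypothetical product IDs of a customer's orders.
int[] orderedProducts = { 5, 1, 8, 1, 3 };

// Count how many times the predicate runs: Any stops at the first match.
int calls = 0;
bool hasProductOne = orderedProducts.Any(id => { calls++; return id == 1; });

// The parameterless overload only checks whether the sequence has any element.
bool hasOrders = orderedProducts.Any();
bool emptyHasOrders = new int[0].Any();

Console.WriteLine($"{hasProductOne} after {calls} calls; {hasOrders} {emptyHasOrders}");
```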
All
When you want to determine whether all of the items of a sequence verify a filtering condition, you can use the All operator. It returns a true result only if the condition is verified by all the elements in the source sequence:
public static bool All<T>(this IEnumerable<T> source, Func<T, bool> predicate);
For instance, in Listing 4-45 we determine whether every order has a positive quantity.
Listing 4-45: The All operator applied to all customer orders to check the quantity
bool result =
(from c in customers
from o in c.Orders
select o)
.All(o => o.Quantity > 0);
result = Enumerable.Empty<Order>().All(o => o.Quantity > 0);
Important
The All operator applied to an empty sequence always returns true. The internal operator implementation in LINQ to Objects enumerates the source sequence items and returns false as soon as it finds an element that does not verify the predicate. If the sequence is empty, the predicate is never called, and the true value is returned.
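The following sketch, on a hypothetical array of quantities, shows the early exit, the true result on an empty sequence, and the equivalence between All(p) and "no element for which p is false".

```csharp
using System;
using System.Linq;

// Hypothetical order quantities; the second one is not positive.
int[] quantities = { 3, 0, 5, 7 };

int calls = 0;
bool allPositive = quantities.All(q => { calls++; return q > 0; });   // stops at 0

// All on an empty sequence never calls the predicate and returns true.
bool emptyAllPositive = new int[0].All(q => q > 0);

// All(p) gives the same answer as !Any(x => !p(x)).
bool viaAny = !quantities.Any(q => !(q > 0));

Console.WriteLine($"{allPositive} after {calls} calls; {emptyAllPositive} {viaAny}");
```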
Contains
The last quantifier operator is the Contains extension method, which determines whether a source sequence contains a specific item value:
public static bool Contains<T>(this IEnumerable<T> source, T value);
public static bool Contains<T>(this IEnumerable<T> source, T value, IEqualityComparer<T> comparer);
In the LINQ to Objects implementation, the overload without a comparer calls the Contains method of ICollection<T> when the source sequence implements that interface. In the other cases, Contains enumerates the items in source until it finds a match, comparing each one with the given value of type T by using the custom comparer passed to the second overload, if provided, or EqualityComparer<T>.Default otherwise.
In Listing 4-46, you can see an example of the Contains method as it is used to check for the existence of a specific order within the collection of orders of a customer.
Listing 4-46: The Contains operator applied to the first customer’s orders
var orderOfProductOne = new Order {Quantity = 3, IdProduct = 1,
Shipped = false, Month = "January"};
bool result = customers[0].Orders.Contains(orderOfProductOne);
Because of this behavior, the Contains method invoked in Listing 4-46 returns true only if you use the same instance of Order as the value to compare. To look for an equivalent order in the sequence, you need either a custom comparer or value-type semantics for the Order type (a reference type that overrides the GetHashCode and Equals methods, or a value type, as we have already seen).
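The difference between value semantics and reference semantics can be seen without defining an Order class. In this sketch, anonymous types stand in for a type with value-based equality (the compiler generates Equals and GetHashCode for them), arrays stand in for a reference type without overrides, and StringComparer.OrdinalIgnoreCase plays the role of a custom comparer; all the data is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Anonymous types compare by value, so a new instance with the same values is found.
var orders = new[] {
    new { IdProduct = 1, Quantity = 3 },
    new { IdProduct = 2, Quantity = 5 },
}.AsEnumerable();
bool foundEquivalent = orders.Contains(new { IdProduct = 1, Quantity = 3 });

// Arrays keep reference semantics: only the very same instance is found.
int[] first = { 1, 3 };
IEnumerable<int[]> pairs = new[] { first, new[] { 2, 5 } };
bool foundCopy = pairs.Contains(new[] { 1, 3 });
bool foundSame = pairs.Contains(first);

// The second overload lets the caller decide what "equal" means.
IEnumerable<string> cities = new[] { "Torino", "Dallas" };
bool foundIgnoringCase = cities.Contains("DALLAS", StringComparer.OrdinalIgnoreCase);

Console.WriteLine($"{foundEquivalent} {foundCopy} {foundSame} {foundIgnoringCase}");
```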
Partitioning Operators
Selection and filtering operations sometimes need to be applied only to a subset of the elements of the source sequence. For instance, you might need to extract only the first N elements that verify a condition. You can use the Where and Select operators with the zero-based index argument of their predicate, but this approach is not always useful and intuitive. It is better to have specific operators for these kinds of operations because they are performed quite frequently.
A set of partitioning operators is provided to satisfy these needs. Take and TakeWhile select the first N items or the first items that verify a predicate, respectively. Skip and SkipWhile complement the Take and TakeWhile operators, skipping the first N items or the first items that validate a predicate.
Take
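We will start with the Take and TakeWhile family. Here is the signature of Take:
public static IEnumerable<T> Take<T>(this IEnumerable<T> source, int count);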
The Take operator requires a count argument that represents the number of items to take from the source sequence. Negative values of count determine an empty result; values over the sequence size return the full source sequence. This method is useful for all queries in which you need the top N items. For instance, you could use this method to select the top N customers based on their order amount, as shown in Listing 4-47.
Listing 4-47: The Take operator, applied to extract the two top customers ordered by order amount
var topTwoCustomers =
(from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name
into customersWithOrders
let TotalAmount = customersWithOrders.Sum(o => o.OrderAmount)
orderby TotalAmount descending
select new { c.Name, TotalAmount }
).Take(2);
As you can see, the Take operator clause is quite simple, while the whole query is more complex. The query contains several of the basic elements and operators we have previously discussed. The let clause, in addition to Take, is the only clause that we have not already seen in action. The let keyword is useful to define an alias for a value or for a variable representing a formula. In this sample, we need to use the sum of all order amounts for each customer as a value to project into the resulting anonymous type. At the same time, the same value is used as a sorting condition. Therefore, we defined an alias named TotalAmount to avoid duplicate formulas.
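Here is a reduced version of Listing 4-47 on hypothetical in-memory customers, where each customer directly holds its order amounts, followed by the edge cases of the count argument.

```csharp
using System;
using System.Linq;

// Hypothetical customers with their order amounts.
var customers = new[] {
    new { Name = "Paolo", Orders = new[] { 100m, 250m } },
    new { Name = "Marco", Orders = new[] { 900m } },
    new { Name = "James", Orders = new[] { 50m, 20m } },
};

// let defines TotalAmount once and reuses it for sorting and projection.
var topTwo = (from c in customers
              let TotalAmount = c.Orders.Sum()
              orderby TotalAmount descending
              select new { c.Name, TotalAmount }).Take(2).ToArray();

// A negative count gives an empty result; a count past the end returns every element.
int takenNegative = customers.Take(-1).Count();
int takenTooMany = customers.Take(10).Count();

Console.WriteLine($"{topTwo[0].Name}, {topTwo[1].Name}; {takenNegative} {takenTooMany}");
```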
TakeWhile
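The TakeWhile operator works like the Take operator, but it checks a predicate to extract items instead of using a counter. Here are the method’s signatures:
public static IEnumerable<T> TakeWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate);
public static IEnumerable<T> TakeWhile<T>(this IEnumerable<T> source, Func<T, int, bool> predicate);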
There are two overloads of the method. The first requires a predicate that will be evaluated on each source sequence item. The method enumerates the source sequence and yields items while the predicate is true; it stops the enumeration as soon as the predicate result becomes false, or when the end of the source is reached. In the second overload, the predicate also receives the zero-based index of the current item within the source sequence, so the condition can depend on the position of the item as well as on its value.
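A short sketch on a hypothetical integer array shows both overloads, including the index argument received by the second one.

```csharp
using System;
using System.Linq;

int[] values = { 10, 20, 5, 30 };

// First overload: stops at the first false result (5), so 30 is never examined.
int[] whileAtLeastTen = values.TakeWhile(v => v >= 10).ToArray();

// Second overload: the predicate also receives each item's zero-based index.
int[] firstThreeBelow25 = values.TakeWhile((v, index) => index < 3 && v < 25).ToArray();

Console.WriteLine($"{string.Join(",", whileAtLeastTen)} | {string.Join(",", firstThreeBelow25)}");
```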
Imagine that you want to identify your top customers, generating a list that makes up a minimum aggregate amount of orders. The problem looks similar to the one we solved with the Take operator in Listing 4-47, but we do not know how many customers we need to examine. TakeWhile can solve the problem by using a predicate that calculates the aggregate amount and uses that number to stop the enumeration when the target is reached. The resulting query is shown in Listing 4-48.
Listing 4-48: The TakeWhile operator, applied to extract the top customers that form 80 percent of all orders
// globalAmount is the total amount for all the orders
var limitAmount = globalAmount * 0.8m;
var aggregated = 0m;
var topCustomers =
(from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name
into customersWithOrders
let TotalAmount = customersWithOrders.Sum(o => o.OrderAmount)
orderby TotalAmount descending
select new { c.Name, TotalAmount }
)
.TakeWhile( X => {
bool result = aggregated < limitAmount;
aggregated += X.TotalAmount;
return result;
} );
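Because the predicate in Listing 4-48 updates the captured aggregated variable, the query returns the expected customers only the first time it is enumerated. The following sketch reproduces the effect on a hypothetical array of customer totals, already sorted in descending order, and shows one side-effect-free alternative based on the index overload of TakeWhile. That alternative recomputes a prefix sum for every item, so it does quadratic work and suits only short sequences.

```csharp
using System;
using System.Linq;

// Hypothetical customer totals, already sorted in descending order.
decimal[] totals = { 500m, 300m, 150m, 50m };
decimal limitAmount = totals.Sum() * 0.8m;   // 800

// Stateful predicate in the style of Listing 4-48.
decimal aggregated = 0m;
var topCustomers = totals.TakeWhile(x => {
    bool result = aggregated < limitAmount;
    aggregated += x;
    return result;
});
decimal[] firstRun = topCustomers.ToArray();    // 500, 300
decimal[] secondRun = topCustomers.ToArray();   // empty: aggregated was never reset

// Side-effect-free variant: the amount accumulated before item i is
// recomputed from the index, so every enumeration gives the same answer.
var repeatable = totals.TakeWhile((x, i) => totals.Take(i).Sum() < limitAmount);
decimal[] again = repeatable.ToArray();         // 500, 300

Console.WriteLine($"{firstRun.Length} {secondRun.Length} {again.Length}");
```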
Skip and SkipWhile
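The Skip and SkipWhile signatures are very similar to those for Take and TakeWhile:
public static IEnumerable<T> Skip<T>(this IEnumerable<T> source, int count);
public static IEnumerable<T> SkipWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate);
public static IEnumerable<T> SkipWhile<T>(this IEnumerable<T> source, Func<T, int, bool> predicate);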
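As we mentioned previously, these operators complement the Take and TakeWhile couple. In fact, concatenating the items returned by Take with the items returned by Skip for the same count (for instance, customers.Take(3).Concat(customers.Skip(3))) returns the full sequence of customers, and the same holds for TakeWhile and SkipWhile called with the same side-effect-free predicate.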
The only point of interest is that SkipWhile skips the source sequence items while the predicate evaluates to true and starts yielding items as soon as the predicate result is false; from that point on, the predicate is no longer evaluated, and all the remaining items are yielded.
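This sketch checks the complement property on a hypothetical integer array and shows that SkipWhile stops testing the predicate once it has returned false, so a later item that would satisfy it (3) is still yielded.

```csharp
using System;
using System.Linq;

int[] values = { 1, 2, 7, 3, 8 };

// Take(n) followed by Skip(n) rebuilds the whole sequence...
bool takeSkip = values.Take(2).Concat(values.Skip(2)).SequenceEqual(values);

// ...and so do TakeWhile and SkipWhile with the same side-effect-free predicate.
bool takeSkipWhile = values.TakeWhile(v => v < 5)
    .Concat(values.SkipWhile(v => v < 5))
    .SequenceEqual(values);

// Once the predicate is false (at 7), SkipWhile yields everything else, including 3.
int[] afterSkipWhile = values.SkipWhile(v => v < 5).ToArray();

Console.WriteLine($"{takeSkip} {takeSkipWhile} {string.Join(",", afterSkipWhile)}");
```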
Element Operators
Element operators are defined to work with single items of a sequence: they extract a specific element by position or by using a predicate, or they return a default value in case of missing elements.
First
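We will start with the First method, which extracts the first element in the sequence by using a predicate or a positional rule:
public static T First<T>(this IEnumerable<T> source);
public static T First<T>(this IEnumerable<T> source, Func<T, bool> predicate);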
The first overload returns the first element in the source sequence, and the second overload uses a predicate to identify the first element to return. If there are no elements that verify the predicate or there are no elements at all in the source sequence, the operator will throw an InvalidOperationException error. Listing 4-49 shows an example of the First operator.
Listing 4-49: The First operator, used to select the first American customer
var item = customers.First(c => c.Country == Countries.USA);
Of course, this example could be defined by using the Where and Take operators. However, the First method expresses the intention of the query more clearly, returns an element instead of a sequence, and guarantees a single (partial) scan of the source sequence.
FirstOrDefault
If you need to find the first element only if it exists, without any exception in case of failure, you can use the FirstOrDefault method. This method works like First, but if there are no elements that verify the predicate or if the source sequence is empty, it returns a default value:
public static T FirstOrDefault<T>(this IEnumerable<T> source);
public static T FirstOrDefault<T>(this IEnumerable<T> source, Func<T, bool> predicate);
The default value returned is default(T), which is null for reference types and nullable types and the zero value for other value types (0 for int, for example). If no predicate argument is provided, the method returns the first element of the source if it exists. An example is shown in Listing 4-50.
Listing 4-50: Examples of the FirstOrDefault operator syntax
var item = customers.FirstOrDefault(c => c.City == "Las Vegas");
Console.WriteLine(item == null ? "null" : item.ToString()); // returns null
IEnumerable<Customer> emptyCustomers = Enumerable.Empty<Customer>();
item = emptyCustomers.FirstOrDefault(c => c.City == "Las Vegas");
Console.WriteLine(item == null ? "null" : item.ToString()); // returns null
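With value types, default(T) is not null, which can make a missing element look like real data. This sketch on a hypothetical array of quantities contrasts First and FirstOrDefault and shows one way to make "not found" explicit by projecting to a nullable type.

```csharp
using System;
using System.Linq;

int[] quantities = { 4, 8, 15 };

int firstOdd = quantities.First(q => q % 2 != 0);

// First throws when nothing matches...
bool threw = false;
try { quantities.First(q => q > 100); }
catch (InvalidOperationException) { threw = true; }

// ...while FirstOrDefault returns default(int), that is, 0.
int missing = quantities.FirstOrDefault(q => q > 100);

// Projecting to int? makes "not found" (null) distinguishable from a real 0.
int? missingNullable = quantities.Select(q => (int?)q).FirstOrDefault(q => q > 100);

Console.WriteLine($"{firstOdd} {threw} {missing} {missingNullable == null}");
```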
Last and LastOrDefault
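The Last and LastOrDefault operators are the counterparts of First and FirstOrDefault, with signatures and behaviors that mirror them:
public static T Last<T>(this IEnumerable<T> source);
public static T Last<T>(this IEnumerable<T> source, Func<T, bool> predicate);
public static T LastOrDefault<T>(this IEnumerable<T> source);
public static T LastOrDefault<T>(this IEnumerable<T> source, Func<T, bool> predicate);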
These methods work like First and FirstOrDefault. The only difference is that they select the last element in source instead of the first.
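A quick sketch with a hypothetical array of city names shows the mirrored behavior, including the exception thrown by Last when nothing matches.

```csharp
using System;
using System.Linq;

// Hypothetical city names.
string[] cities = { "Torino", "Dallas", "Seattle", "Brescia" };

string lastCity = cities.Last();
string lastSixLetters = cities.Last(c => c.Length == 6);          // Torino and Dallas match
string? noMatch = cities.LastOrDefault(c => c.StartsWith("Z"));   // default(string)

// Like First, Last throws when no element matches.
bool threw = false;
try { cities.Last(c => c.StartsWith("Z")); }
catch (InvalidOperationException) { threw = true; }

Console.WriteLine($"{lastCity} {lastSixLetters} {noMatch == null} {threw}");
```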
Single
Whenever you need to select a specific and unique item from a source sequence, you can use the operators Single or SingleOrDefault. Here are the signatures of Single:
public static T Single<T>(this IEnumerable<T> source);
public static T Single<T>(this IEnumerable<T> source, Func<T, bool> predicate);
If no predicate is provided, Single returns the only element of the source sequence; if the source sequence is empty or contains more than one item, an InvalidOperationException error is thrown. If a predicate is provided, Single returns the only element that verifies it, and it throws an InvalidOperationException error if there are no matching elements or if there is more than one match in the source. You can see some examples in Listing 4-51.
Listing 4-51: Examples of the Single operator syntax
// returns Product 1
var item = products.Single(p => p.IdProduct == 1);
Console.WriteLine(item == null ? "null" : item.ToString());
// InvalidOperationException
item = products.Single();
Console.WriteLine(item == null ? "null" : item.ToString());
// InvalidOperationException
IEnumerable<Product> emptyProducts = Enumerable.Empty<Product>();
item = emptyProducts.Single(p => p.IdProduct == 1);
Console.WriteLine(item == null ? "null" : item.ToString());
SingleOrDefault
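The SingleOrDefault operator provides a default result value in the case of an empty sequence or no matching elements in source. Its signatures are like those for Single:
public static T SingleOrDefault<T>(this IEnumerable<T> source);
public static T SingleOrDefault<T>(this IEnumerable<T> source, Func<T, bool> predicate);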
The default value returned by this method is default(T), as in the FirstOrDefault and LastOrDefault extension methods.
Important
The default value is returned only if no elements match the predicate. An InvalidOperationException error is thrown when the source sequence contains more than one matching item.
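The following sketch, on a hypothetical array of IDs, summarizes when Single and SingleOrDefault return a value and when they throw.

```csharp
using System;
using System.Linq;

int[] ids = { 1, 2, 3 };

int two = ids.Single(i => i == 2);              // exactly one match
int none = ids.SingleOrDefault(i => i > 10);    // no match: default(int)

// More than one match throws, even with SingleOrDefault.
bool threwOnMany = false;
try { ids.SingleOrDefault(i => i > 1); }
catch (InvalidOperationException) { threwOnMany = true; }

// Single without a predicate throws on an empty source as well.
bool threwOnEmpty = false;
try { new int[0].Single(); }
catch (InvalidOperationException) { threwOnEmpty = true; }

// SingleOrDefault without a predicate returns default(T) for an empty source.
int fromEmpty = new int[0].SingleOrDefault();

Console.WriteLine($"{two} {none} {threwOnMany} {threwOnEmpty} {fromEmpty}");
```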
ElementAt and ElementAtOrDefault
Whenever you need to extract a specific item from a sequence based on its position, you can use the ElementAt or ElementAtOrDefault method:
public static T ElementAt<T>(this IEnumerable<T> source, int index);
public static T ElementAtOrDefault<T>(this IEnumerable<T> source, int index);
The ElementAt method requires an index argument that represents the position of the element to extract. The index is zero based; therefore, you need to provide a value of 2 to extract the third element. When the value of index is negative or greater than or equal to the number of elements in the source sequence, an ArgumentOutOfRangeException error is thrown. The ElementAtOrDefault method differs from ElementAt because, for such an out-of-range index, it returns a default value (default(T), which is null for reference types and nullable types) instead of throwing. Listing 4-52 shows some examples of how to use these operators.
Listing 4-52: Examples of the ElementAt and ElementAtOrDefault operator syntax
// returns the third product (index 2)
var item = products.ElementAt(2);
Console.WriteLine(item == null ? "null" : item.ToString());
// returns null
item = Enumerable.Empty<Product>().ElementAtOrDefault(6);
Console.WriteLine(item == null ? "null" : item.ToString());
// returns null
item = products.ElementAtOrDefault(6);
Console.WriteLine(item == null ? "null" : item.ToString());
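The boundary case worth remembering is an index equal to the number of elements, which is already out of range. This sketch uses a hypothetical list of product names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical product names; indexes are zero based.
IEnumerable<string> names = new List<string> { "Product 1", "Product 2", "Product 3" };

string third = names.ElementAt(2);
string? pastEnd = names.ElementAtOrDefault(3);    // index == count is out of range
string? negative = names.ElementAtOrDefault(-1);

// ElementAt throws for the same out-of-range index.
bool threw = false;
try { names.ElementAt(3); }
catch (ArgumentOutOfRangeException) { threw = true; }

Console.WriteLine($"{third} {pastEnd == null} {negative == null} {threw}");
```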
DefaultIfEmpty
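DefaultIfEmpty returns a default element for an empty sequence:
public static IEnumerable<T> DefaultIfEmpty<T>(this IEnumerable<T> source);
public static IEnumerable<T> DefaultIfEmpty<T>(this IEnumerable<T> source, T defaultValue);
If the source sequence contains at least one item, DefaultIfEmpty returns the source items unchanged. If the source is empty, it returns a sequence that contains a single element: default(T) for the first overload, or defaultValue if you use the second overload of the method.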
Defining a specific default value can be helpful in many circumstances. For instance, imagine that the Customer class exposes a public static property named Empty that returns an empty instance of a Customer.
This is sometimes useful, especially when unit testing code. Another situation is a query that uses GroupJoin to implement a left outer join: the nulls that would otherwise appear for unmatched elements can be replaced by a default value chosen by the query author.
In Listing 4-53, you can see how to use DefaultIfEmpty, both with default(T) and with a custom default value such as Customer.Empty.
Listing 4-53: Example of the DefaultIfEmpty operator syntax, both with default(T) and a custom default value
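// With a nonempty source, the customers are returned unchanged
var expr = customers.DefaultIfEmpty();
// With an empty source, the result contains the single element Customer.Empty
var emptyCustomers = Enumerable.Empty<Customer>();
IEnumerable<Customer> customersEmpty =
emptyCustomers.DefaultIfEmpty(Customer.Empty);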
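The following sketch shows the one-element result for an empty source and the left outer join pattern mentioned above: join ... into (which compiles to GroupJoin) followed by DefaultIfEmpty with a custom default, so that a customer without orders gets an Amount of 0 instead of a null order. The customers and orders here are hypothetical anonymous-type arrays, not the chapter's classes.

```csharp
using System;
using System.Linq;

// DefaultIfEmpty returns a one-element sequence for an empty source.
string?[] withNull = Enumerable.Empty<string>().DefaultIfEmpty().ToArray();
string[] withCustom = Enumerable.Empty<string>().DefaultIfEmpty("(none)").ToArray();
string[] unchanged = new[] { "a", "b" }.DefaultIfEmpty("(none)").ToArray();

// Hypothetical data: Marco has no orders.
var customers = new[] { new { Id = 1, Name = "Paolo" }, new { Id = 2, Name = "Marco" } };
var orders = new[] { new { CustomerId = 1, Amount = 100m } };

// Left outer join with a custom default value instead of null.
var leftJoin = (from c in customers
                join o in orders on c.Id equals o.CustomerId into customerOrders
                from order in customerOrders.DefaultIfEmpty(new { CustomerId = c.Id, Amount = 0m })
                select new { c.Name, order.Amount }).ToArray();

foreach (var row in leftJoin) Console.WriteLine($"{row.Name}: {row.Amount}");
```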
Other Operators
To complete our coverage of LINQ query operators, we describe a few final extension methods in this section.
Concat
The first one is the concatenation operator, named Concat. As its name suggests, it simply appends a sequence to another, as we can see from its signature:
public static IEnumerable<T> Concat<T>(this IEnumerable<T> first, IEnumerable<T> second);
The only requirement for Concat arguments is that they enumerate the same type T. We can use this method to append any IEnumerable<T> sequence to another of the same type. Listing 4-54 shows an example of customer concatenation.
Listing 4-54: The Concat operator, used to concatenate Italian customers with customers from the United States
var italianCustomers =
from c in customers
where c.Country == Countries.Italy
select c;
var americanCustomers =
from c in customers
where c.Country == Countries.USA
select c;
var expr = italianCustomers.Concat(americanCustomers);
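Concat keeps the order of both sequences and does not remove duplicates. The sketch below contrasts it with the Union set operator on hypothetical name arrays in which one name appears in both sequences.

```csharp
using System;
using System.Linq;

// Hypothetical customer names; "Marco" appears in both sequences.
string[] italianCustomers = { "Paolo", "Marco" };
string[] americanCustomers = { "James", "Marco" };

// Concat: all items of the first sequence, then all items of the second.
string[] concatenated = italianCustomers.Concat(americanCustomers).ToArray();

// Union, a set operator, drops the repeated "Marco".
string[] union = italianCustomers.Union(americanCustomers).ToArray();

Console.WriteLine($"{string.Join(",", concatenated)} | {string.Join(",", union)}");
```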
SequenceEqual
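Another useful operator is the equality operator, which corresponds to the SequenceEqual extension method:
public static bool SequenceEqual<T>(this IEnumerable<T> first, IEnumerable<T> second);
public static bool SequenceEqual<T>(this IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer);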
This method compares each item in the first sequence with each corresponding item in the second sequence. If the two sequences have exactly the same number of items with equal items in every position, the two sequences are considered equal. Remember the possible issues of reference type semantics in this kind of comparison. You can consider overriding GetHashCode and Equals to drive the result of this operator, or you can use the second method overload, providing a custom implementation of IEqualityComparer<T>.
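The following sketch, on hypothetical data, shows that position and length both matter, that a custom IEqualityComparer<T> (here StringComparer.OrdinalIgnoreCase) changes the notion of equality, and that reference types without Equals overrides, such as arrays, compare by identity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<string> names = new[] { "Paolo", "Marco" };

bool same = names.SequenceEqual(new[] { "Paolo", "Marco" });
bool reordered = names.SequenceEqual(new[] { "Marco", "Paolo" });         // position matters
bool longer = names.SequenceEqual(new[] { "Paolo", "Marco", "James" });   // length matters
bool ignoringCase = names.SequenceEqual(new[] { "PAOLO", "marco" }, StringComparer.OrdinalIgnoreCase);

// Arrays use reference equality, so equal-looking but distinct instances differ.
IEnumerable<int[]> left = new[] { new[] { 1, 2 } };
bool arraysEqual = left.SequenceEqual(new[] { new[] { 1, 2 } });

Console.WriteLine($"{same} {reordered} {longer} {ignoringCase} {arraysEqual}");
```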