The cost and inconvenience of a communication protocol dictate how you should use the medium. You communicate differently by phone, fax, letter, and email. Think back to the last time you ordered from a catalog. When you order by phone, you engage in a question-and-answer session with the sales staff:
"Can I have your first item?"
"Item number 123-456."
"How many would you like?"
"Three."
This conversation continues until the sales staff has your entire order, your billing address, your credit-card information, your shipping address, and any other information necessary to complete the transaction. It's comforting on the phone to have this back-and-forth discussion. You never give long soliloquies with no feedback. You never endure long periods of silence wondering if the salesperson is still there.
Contrast that with ordering by fax. You fill out the entire form and fax the completed document to the company. One document, one transaction. You do not fill out one product line, fax it, add your address, fax again, add your credit card number, and fax again.
This illustrates the common pitfalls of a poorly defined web method interface. Whether you use a web service or .NET Remoting, you must remember that the most expensive part of the operation is transferring objects between distant machines. You must stop creating remote APIs that simply repackage the same interfaces you use locally. It works, but it reeks of inefficiency: it's like using the phone-call approach to process your catalog request via fax. Your application waits for the network every time it makes a round trip to pass another piece of information through the pipe. The more granular the API, the higher the percentage of time your application spends waiting for data to return from the server.
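To put rough numbers on that claim, here is a minimal C# sketch of a simple latency-plus-bandwidth cost model. The class name `RoundTripCost` and the figures in the usage note (50 ms per round trip, about 1,000 bytes per millisecond) are illustrative assumptions, not measurements. Real networks add serialization and server-processing time on top.

```csharp
// Rough cost model (assumed, not measured): each round trip pays a
// fixed latency, and the payload pays transfer time proportional to size.
public static class RoundTripCost
{
    // roundTrips: number of request/response pairs
    // totalBytes: payload summed over all trips
    // latencyMs:  assumed latency per round trip, in milliseconds
    // bytesPerMs: assumed throughput, in bytes per millisecond
    public static double EstimateMs(int roundTrips, long totalBytes,
                                    double latencyMs, double bytesPerMs)
    {
        return roundTrips * latencyMs + totalBytes / bytesPerMs;
    }
}
```

With those assumed figures, setting ten 100-byte fields in ten separate calls costs `EstimateMs(10, 1000, 50, 1000)`, or 501 ms, while sending the same 1,000 bytes in one call costs 51 ms. The payload is identical; only the number of round trips changed.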
Instead, design web-based interfaces around serializing documents or sets of objects between client and server. Your remote communications should work like the order form you fax to the catalog company: The client machine should be able to work for extended periods without contacting the server. Then, when all the information needed to complete the transaction is filled in, the client can send the entire document to the server. The server's responses work the same way: When the server sends information to the client, the client receives everything it needs to complete the tasks at hand.
Sticking with the customer order metaphor, we'll design a customer order-processing system that consists of a central server and desktop clients that access information via web services. One class in the system is the Customer class. If you ignore transport issues, the Customer class might look something like this, allowing client code to retrieve or modify the name, shipping address, and account information:
public class Customer
{
    // Parameterless constructor, needed by XML serialization.
    public Customer( )
    {
    }

    // Properties to access and modify customer fields:
    public string Name { get; set; }
    public Address shippingAddr { get; set; }
    public Account creditCardInfo { get; set; }
}
The Customer class does not contain the kind of API that should be called remotely. Using a remote Customer object generates excessive traffic between the client and the server, with one round trip per operation:
// create customer on the server. (round trip)
Customer c = new Server.Customer( );
// round trip to set the name.
c.Name = dlg.Name.Text;
// round trip to set the address.
c.shippingAddr = dlg.Addr;
// round trip to set the credit card.
c.creditCardInfo = dlg.credit;
Instead, you would create a local Customer object and transfer the Customer to the server after all the fields have been set:
// create customer on the client.
Customer c = new Customer( );
// set the local name.
c.Name = dlg.Name.Text;
// set the local address.
c.shippingAddr = dlg.Addr;
// set the local credit card.
c.creditCardInfo = dlg.credit;
// send the finished object to the server. (one trip)
Server.AddCustomer( c );
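The difference in round trips can be made concrete with a small sketch. Everything here is hypothetical: `CountingServer` stands in for the remote server and simply counts calls, `RemoteCustomerProxy` plays the role of `Server.Customer`, and `CustomerRecord` uses strings in place of the `Address` and `Account` types.

```csharp
// Hypothetical stand-in for a remote server. Each public method
// represents one network round trip, so the class simply counts calls.
public class CountingServer
{
    private int nextId = 1;

    public int RoundTrips { get; private set; }
    public CustomerRecord LastAdded { get; private set; }

    // Fine-grained API: one call to create, one call per field.
    public int CreateCustomer() { RoundTrips++; return nextId++; }
    public void SetField(int id, string field, string value) { RoundTrips++; }

    // Coarse-grained API: the whole object arrives in a single call.
    public void AddCustomer(CustomerRecord c) { RoundTrips++; LastAdded = c; }
}

// Plain data object that the client fills in locally.
public class CustomerRecord
{
    public string Name { get; set; } = "";
    public string ShippingAddr { get; set; } = "";
    public string CreditCardInfo { get; set; } = "";
}

// Chatty proxy: every property assignment is forwarded to the server.
public class RemoteCustomerProxy
{
    private readonly CountingServer server;
    private readonly int id;

    public RemoteCustomerProxy(CountingServer server)
    {
        this.server = server;
        id = server.CreateCustomer();   // round trip to create
    }

    public string Name { set { server.SetField(id, nameof(Name), value); } }
    public string ShippingAddr { set { server.SetField(id, nameof(ShippingAddr), value); } }
    public string CreditCardInfo { set { server.SetField(id, nameof(CreditCardInfo), value); } }
}
```

Filling in three fields through the proxy costs four round trips (one to create the object, one per property); filling in a local `CustomerRecord` and passing it to `AddCustomer` costs one.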
The customer example illustrates an obvious and simple technique: Transfer entire objects back and forth between client and server. But to write efficient programs, you need to extend that simple example to include the right set of related objects. Making remote invocations to set a single property of an object is too fine a granularity. But one customer might not be the right granularity for transactions between the client and server, either.
To extend this example into the real-world design issues you'll encounter in your programs, we'll make a few assumptions about the system. This software system supports a major online vendor with more than 1 million customers. Imagine that it is a major catalog ordering house and that each customer has placed, on average, 15 orders in the last year. Each telephone operator uses one machine during the shift and must look up or create a customer record whenever he or she answers the phone. Your design task is to determine the most efficient set of objects to transfer between client machines and the server.
You can begin by eliminating some obvious choices. Retrieving every customer and every order is clearly prohibitive: 1 million customers and 15 million order records are just too much data to bring to each client. You've simply traded one bottleneck for another. Instead of constantly bombarding your server with every possible data update, you send the server a request for more than 15 million objects. Sure, it's only one transaction, but it's a very inefficient transaction.
Instead, consider how you can best retrieve a set of objects that approximates the data an operator will need for the next several minutes. An operator answers the phone and interacts with one customer. During the call, that operator might add, remove, or change orders, or modify the customer's account information. The obvious choice is to retrieve one customer, along with all orders that customer has placed. The server method would be something like this:
public OrderData FindOrders( string customerName )
{
    // Search for the customer by name.
    // Find all orders by that customer.
}
Or is that right? Orders that have been shipped and received by the customer are almost certainly not needed at the client machine. A better answer is to retrieve only the open orders for the requested customer. The server method would change to something like this:
public OrderData FindOpenOrders( string customerName )
{
    // Search for the customer by name.
    // Find all orders by that customer.
    // Filter out those that have already
    // been received.
}
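Here is one way to picture the two queries, as a minimal sketch over an in-memory list. The `Order` record, its `Received` flag, and `OrderStore` are assumptions made for illustration, and `List<Order>` stands in for the `OrderData` type above; a real server would query a database instead.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical order record; Received marks orders the customer already has.
public record Order(int Id, string CustomerName, string Item, bool Received);

// In-memory stand-in for the server's order store (illustration only).
public class OrderStore
{
    private readonly List<Order> orders;

    public OrderStore(IEnumerable<Order> orders) { this.orders = orders.ToList(); }

    // Every order for one customer: the larger payload.
    public List<Order> FindOrders(string customerName) =>
        orders.Where(o => o.CustomerName == customerName).ToList();

    // Only orders that are still open: the smaller payload.
    public List<Order> FindOpenOrders(string customerName) =>
        FindOrders(customerName).Where(o => !o.Received).ToList();
}
```

`FindOpenOrders` reuses `FindOrders` and then drops received orders, so the filtering happens on the server and closed orders never cross the wire.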
You are still making the client machine send a new request for each customer phone call. Can you optimize this communication channel further than bundling orders into the customer download? We'll add a few more assumptions about the business processes to give you some more ideas. Suppose that the call center is partitioned so that each working team receives calls from only one area code. Now you can modify your design to optimize the communication quite a bit more.
Each operator would retrieve the updated customer and order information for that one area code at the start of the shift. After each call, the client application would push the modified data back to the server, and the server would respond with all changes since the last time this client machine asked for data. The end result is that after every phone call, the operator sends any changes made and retrieves all changes made by any other operator in the same work group. This design means that there is one transaction per phone call, and each operator should always have the right set of data available when he or she answers a call. Now the server contains two methods that would look something like this:
public CustomerSet RetrieveCustomerData(
    AreaCode theAreaCode )
{
    // Find all customers for a given area code.
    // For each customer in that area code:
    //   Find all orders by that customer.
    //   Filter out those that have already
    //   been received.
    // Return the result.
}
public CustomerSet UpdateCustomer( CustomerData updates,
    DateTime lastUpdate, AreaCode theAreaCode )
{
    // First, save any updates, marking each update
    // with the current time.
    // Next, get the updates:
    // Find all customers for a given area code.
    // For each customer in that area code:
    //   Find all orders by that customer that have been
    //   updated since the last time. Add those to the result.
    // Return the result.
}
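The update protocol can be sketched in memory, assuming a hypothetical `CustomerState` record (an opaque `Data` string stands in for the customer and open-order details) and a plain `string` area code in place of `AreaCode`. The server stamps every saved update with its own clock, which is injected as a `Func<DateTime>` so the behavior is easy to test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical snapshot of one customer plus open orders; Data is opaque.
public record CustomerState(string Id, string AreaCode, string Data, DateTime LastModified);

public class SyncServer
{
    private readonly Dictionary<string, CustomerState> customers = new();
    private readonly Func<DateTime> clock;   // injected so tests control time

    public SyncServer(Func<DateTime> clock) { this.clock = clock; }

    // Start of shift: every customer in one area code.
    public List<CustomerState> RetrieveCustomerData(string areaCode) =>
        customers.Values.Where(c => c.AreaCode == areaCode).ToList();

    // After each call: save the client's changes, then return everything
    // in the area code that changed since the client's last sync.
    public List<CustomerState> UpdateCustomer(
        IEnumerable<CustomerState> updates, DateTime lastUpdate, string areaCode)
    {
        DateTime now = clock();
        foreach (CustomerState u in updates)
        {
            // Mark each update with the server's current time.
            customers[u.Id] = u with { LastModified = now };
        }
        return customers.Values
            .Where(c => c.AreaCode == areaCode && c.LastModified > lastUpdate)
            .ToList();
    }
}
```

The caller's own updates come back in the result, because they were just stamped with a time later than `lastUpdate`. A production version would also need to return the server's timestamp so the client can pass it back as `lastUpdate` on the next call instead of relying on its own clock.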
But you might still be wasting some bandwidth. This design works best when every known customer calls every day. That's probably not true. If it is, your company has customer service problems that are far outside the scope of a software program.
How can you further limit the size of each transaction without increasing the number of transactions or slowing the service rep's response to a customer? You can make some assumptions about which customers in the database are going to call. Suppose you track some statistics and find that customers who go six months without ordering are very unlikely to order again. So you stop retrieving those customers and their orders at the beginning of the day, which shrinks the initial transaction. You also find that a customer who calls shortly after placing an order is usually inquiring about that order. So you modify the list of orders sent down to the client to include only the last order rather than all orders. This does not change the signatures of the server methods, but it does shrink the packets sent back to the client.
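The two pruning rules can be sketched as a single LINQ query. The `Order` record, the `StartOfShiftFilter` class, and the six-month default parameter are hypothetical names chosen to mirror the assumptions in the text; a real server would push this filtering into its database query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical order record with the date it was placed.
public record Order(int Id, string CustomerId, DateTime Placed);

public static class StartOfShiftFilter
{
    // Returns, per active customer, only that customer's most recent order.
    // A customer is active if that most recent order falls within the window.
    public static Dictionary<string, Order> LatestOrdersForActiveCustomers(
        IEnumerable<Order> orders, DateTime today, int inactiveMonths = 6)
    {
        DateTime cutoff = today.AddMonths(-inactiveMonths);
        return orders
            .GroupBy(o => o.CustomerId)
            .Select(g => g.OrderByDescending(o => o.Placed).First())
            .Where(latest => latest.Placed >= cutoff)
            .ToDictionary(latest => latest.CustomerId);
    }
}
```

Grouping by customer and keeping the newest order applies the "last order only" rule; the `Where` clause then applies the six-month inactivity rule to that newest order.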
This hypothetical discussion focused on getting you to think about the communication between remote machines: You want to minimize both the frequency and the size of the transactions sent between machines. Those two goals are at odds, and you need to make trade-offs between them. You should end up somewhere between the two extremes, but err toward fewer, larger transactions.