Eager Loading with Entity Framework and NHibernate

At CB Richard Ellis, Rich Alger and I wrote and maintain an internal ordering system that uses NHibernate. It was our first experience with NHibernate, but it was a huge success. Much of our initial focus was to just get things working, not to optimize it, since NHibernate has a bit of a steep learning curve. Now that we have more experience with it, we’re revisiting some of our strategies to improve them. My approach is this:
  • Set up a SQL Trace on the dev database
  • Load up the dev deployment of the ASP.NET app
  • Load a page I want to optimize
  • Count the number of queries going to the database as a result of loading that page
  • Optimize queries till I’m making the fewest db trips possible
Some pages were sending hundreds of queries! We had a lot of the SELECT N+1 problem going on, and some of those were pretty easy to fix. In general, we are trying to improve our loading strategies, to eagerly load everything we know we’ll need the FIRST time around, and to avoid the use of lazy loading, since it requires additional trips to the database. Lazy loading is great in dynamic scenarios, but if you already know you’re going to need something, don’t wait. Get it the first time around.

Which brings me to my current point. Eager loading with NH versus EF. Ayende recently asked what EFv4 can do that NH can’t. In a subsequent post he replies to the many responses he received. I have an ongoing discussion with him in the comments regarding eager fetching.

Suppose you had these tables:
User <1---*> User_x_Role <*---1> Roles <1---*> Role_x_Permission <*---1> Permission
(The reason for the associative tables is that I hate many-to-many relationships. They are evil).
These tables set up a basic security model. Certain operations and resources require that you have a specific permission. Permissions are assigned to roles, and roles are assigned to users. A user’s permissions, then, are inherited from the roles he is assigned to. Permissions are not directly assignable to users. You must use a Role.

These tables would result in a model like this:
UserThe User class
RoleThe Role class
PermissionThe Permission class
User.RolesAn IList<Role> that contains the Roles assigned to the User
Role.PermissionsAn IList<Permission> that contains the permissions assigned to the Role
Role.UsersAn IList<User> that contains all of the Users that are assigned this Role
Permission.RolesAn IList<Role> that contains all of the Roles are that are assigned this Permission
Checking if a User has a Permission may look like:

someUser.Roles.Permissions.Contains(somePermission);

So when fetching a user from the database, we may want to eagerly fetch his roles, and the permissions of those roles. With EF, I can do this:

using (MyEntities context = new MyEntities())
{
	int userID = 6;
	var qUser =
		from u in context.User.Include("Roles.Permissions")
		where u.ID == userID
		select u;
	User user = qUser.FirstOrDefault();

	// Roles is already populated
	int numRoles = user.Roles.Count;

	// As are the Permissions of each Role
	int numPermissions = user.Roles.First().Permissions.Count;
}

Note the use of ObjectQuery<T>.Include(string path). Note that it’s chainable. If you had some other related entities or collections of User that you wanted to eagerly fetch you could do this:
from u in context.User
	.Include("Roles.Permissions")
	.Include("Orders.OrderDetails")
	.Include("Employer")
	.Include("ContactInfo")
select u;
Basically, you have complete control over what data will be eagerly fetched, and you’ll get back the exact object graph you want, with all of the collections populated, and related entities loaded. If you think that’s impressive, fire up SQL Profiler and do a trace on your database. All of the data for all of those entities and collections is retrieved in one query. It may be full of subselects, joins, unions, etc, but it’s one query. One trip to the database, to issue one query, and you have all the data you want. Because of the ease of use of ObjectQuery<T>Include(string path), it’s easy to make methods on your entity classes that give you a LOT of flexibility when it comes to getting the data you want:
		public  static IList<Affiliate> GetPage(int pageIndex, int pageSize,
			string where, string orderBy, out int rawRecordCount, params string[] includes)
		{
			// Start with a basic ObjectQuery<T> that would just return ALL rows (no criteria or sorting)
			ObjectQuery<Affiliate> query = Jam2DataLayer.Instance.Context.Affiliate;

			// Apply the where-clause if they provided one.
			if (!String.IsNullOrEmpty(where))
				query = query.Where(where);

			// Apply the orderby-clause if they provided one
			if (!String.IsNullOrEmpty(orderBy))
				query = query.OrderBy(orderBy);
			else
				// If no sort-by clause was provided, sort by the ID desc (show newest records first)
				// (There has to be sorting applied for paging to work with Skip() and Take())
				query = query.OrderByDescending(ent => ent.ID);

			// Execute the query at this point, but only to retrieve a count of the records it returns.
			// We want to know the number of records our query WOULD return, if it weren't paged.
			rawRecordCount = query.Count();

			// Set up any eager loading that was specified
			foreach (string include in includes)
				query = query.Include(include);

			// Get results just for the page we're interested in.
			query = query.Skip(pageIndex * pageSize).Take(pageSize);

			// Execute the query, storing the results in a List<T>
			List<Affiliate> list = query.ToList();
			return list;
		}


		// Example usage: Get the first 10 active Affiliate objects, with their User entity eagerly loaded
		int totalRecords;
		IList<Affiliate> activeAffiliates = Affiliate.GetPage(0, 10, "it.Active == true", null, out totalRecords, "User");
Definitely a handy method to have on each of your entity classes (note: in an upcoming post, I’ll be showing my T4 templates that use the SFS Plugin for Visual Studio 2008 to generate just such code for each of your entity classes).

Now, can this be done in NHibernate? Honestly, I don’t know yet. I’ve been trying. My first attempt went something like this:
public virtual IList<T> GetPage<T>(int pageIndex, int pageSize, params string[] eagerFetches)
{
	ICriteria crit = CurrentSession.CreateCriteria<T>();
	crit.SetFirstResult(pageIndex * pageSize);
	if (pageSize > 0)
		crit.SetMaxResults(pageSize);

	// Now set up the eager fetching for whatever the caller specified
	foreach(string entity in eagerFetches)
		crit.SetFetchMode(entity, FetchMode.Eager);

	return crit.List<T>();
}
That seemed to work! GetPage<User>(0, 10, “Roles”) DID indeed fetch 10 users, with their Roles collections already populated. Then I tried “Roles.Permissions”. No dice! The data that comes back is correct, but it issues 10 separate queries. It’s still lazy loading. And, when you start to include multiple collections for the root entity, as in GetPage<User>(0, 10, “Roles”, “Orders”), you start getting Cartesian products, producing millions of rows sometimes. NH doesn’t sort it out. This question on StackOverflow has an answer that quotes from the NHibernate In Action book: From “NHibernate in Action”, page 225: NHibernate currently limits you to fetching just one collection eagerly.

Perhaps that is still the case. I mentioned this on Ayende’s blog, suggesting that perhaps this is an area where EF has the features in tow, and NH is still lacking. He told me that FetchMode.Eager is the same as FetchMode.Join, which would be the reason for the cartesian products. He said to look at FetchMode.Subselect, and Future queries. Well FetchMode.Subselect can only be set in the mapping file, it can’t be set arbitrarily on an ICriteria (the NHibernate.FetchMode enum does not have a field named Subselect), so that’s out. I want to leave my mappings as lazy, and specify eager only when I want it, on a per query basis. As for the Future<T> queries, I haven’t been able to get them to do what I want either. I might be able to get individual lists back, that all contain the related entities I’m interested in, by delaying the query to the database, storing the queries in an IMultiCriteria, but that isn’t going to give me a connected object graph, just disparate lists. I’ll keep poking around with different ways to hopefully accomplish the same thing in NH. For now, hopefully the discussion stays alive in the comments on his blog. I’d love to find a way to do this in NH, as it’s a powerful feature. I have a feeling that even if we conjure something up, it’s not going to be the same as what EF does, issuing only a single query to the database. Time will tell.



This entry was posted in Code and tagged , , , . Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.

3 Comments

  1. Matthys
    Posted January 22, 2010 at 1:12 am | Permalink

    Have you tried accessing the the collections after loading them through the future queries – ie load and then discard the disparate lists, and then just use the associations as you would normally do. As long as the entity you access the relation through has been loaded in the same session, I would expect that the related entities would be available through the session cache. In other words, you don’t have to use the disparate lists, after any entity has been loaded in the session, it should be available whichever way you access it.

  2. Posted January 25, 2010 at 1:02 am | Permalink

    I have not tried that, I didn’t think about how the session cache would prevent a trip to the database. “Load and then discard” sounds like a good idea, I’ll give it a try this week, and will update my progress here.

  3. Anand
    Posted May 20, 2010 at 11:46 am | Permalink

    When you say EF does it in one query It brings them all back in single result set and tries and works through them to get the object graph.
    I have to bring one user role
    I have 2 roles assigned to the user
    I have 5 permissions available per role
    assume that there are two columns in each table
    the result set is going to have 10 rows and 10 columns with a total of 100 data elements and most of it being repeated data. I may be wrong in the actual number but it is quite large. How would this be efficient way to do eager loading? Should it not batch the call and loop through the result sets to populate the object graph instead to make it efficient?
    I do not know if NHibernate handles it well as I am still trying to understand both of them.

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>