The Entity Framework (v1 and v4) Deal Breaker: TPT Inheritance

9/10/2010 Update #3: It appears that MS has deleted the bug on Microsoft Connect. Here is a cached version indexed from Google where you can see it has 34 upvotes, and 22 users saying they can reproduce. I blogged about this in January 2010, and the bug was opened on 3/17/2010. MS didn’t respond until 6/29/2010, when they said:
Sorry for not responding here. We are aware of performance issues with TPT hierarchies. We are currently investigating solutions and expect to make improvements in this area in a future release. This is definitely at the top of our query/perf optimizations list. I know this isn’t a great answer, and is not something that helps you today. Will keep you posted as we find out more, however. Hope this helps. Entity Framework Team
Apparently “keep you posted” means “we’ll delete your posts”. I don’t want to jump to any conclusions, but it is certainly very odd. Especially when you consider that they have mentioned the problem on their own blog (though they don’t nearly describe how bad it actually is).


3/17/2010 Update #2: I have submitted this problem as a bug on Microsoft Connect. If you’ve been bitten by this bug, vote it up (click the “I can too” link indicating that you can reproduce the bug), and leave some helpful comments.


3/17/2010 Update: I’ve put together a quick sample program that illustrates this problem, if you want to try it for yourself.
Download EFInheritanceTest.zip.
The data project has the included SQL scripts to create a simple EFTest database, with a base class table, some sub class tables, and a bit of dummy data.


I’ll keep the intro short and sweet. Everyone says you’ll eventually get burned by Microsoft data access strategies, because they eventually get abandoned. Well, my turn finally came. And it’s not that Entity Framework is being abandoned, it’s just that it DOESN’T WORK.
</intro>

Regarding inheritance mapping, EF advertises that it offers you 3 solutions:
  • TPT: Table per type
  • TPH: Table per hierarchy
  • TPC: Table per concrete class
I’ll be focusing on TPT: Table per type.

Let’s say I have a line of business application where users answer a bunch of Yes/No questions, and depending on their answers, fill out a series of forms, each form having a unique set of fields that we want to capture end-user data for.

Q: Did you open a bank account?
Yes
Name of bank:
Date account opened:
Purpose of account:
etc.

EVERY form has a common set of fields as well: The end-user that the particular instance belongs to, the date it was created, whatever. Now, when I have a User object, and I want to see all the forms he’s filled out, I don’t want to have to access 20 different collections of the User object, one collection for each form type:
User user = User.GetByKey(20);
user.NewAccountForms;
user.TaxForms;
user.ReplacedLimbsForms;
user.CentralServicesForms;

I want an IList<FormBase> or something similar. This means I need to use inheritance, and each form needs to be derived from my FormBase class (actually named FormInstance in this case).

If you need this type of design, or think that your software at some point in the future will require this type of design, DO NOT use the Entity Framework. The Table-Per-Type inheritance is 100% unusable. If you feel that one of the other forms of inheritance suits you, well, you may be crazy, but just the same….tread lightly. (And please don’t have 50 different discriminator values in a table that’s 200 columns wide…the SQL gods will strike you down).

So what’s wrong with EF’s TPT inheritance? The SQL that EF generates. With some simple testing (2 or 3 subclasses) everything may appear to be fine. In fact, you may even be impressed that EF manages to fetch all the data you need in a single trip, indeed, in a single query. Wow, their SQL generation must be top notch… But once you start adding some more subclasses, surprise! Your application grinds to a halt. So how bad is it? I decided to find out definitively.

Results

The setup is very basic. Here’s the Linq query I’m executing against my model:
	var q =
		from insts in context.FormInstances
		select insts;
	
	List<FormInstances> instances = q.ToList();
Very basic. Give me the FormInstances. FormInstance is the base class. I started with just a single derived class, ran my test 500 times, averaging the time it took to execute my simple query. Then I added a 2nd derived class, a 3rd one, etc etc. The base class has less than 10 fields. Each derived class (to keep things consistent) is exactly the same, with 3 simple fields. Also, the tables are empty. Yes, empty. No data! So the performance numbers here strictly represent the time it takes EF to generate its abhorrent SQL, and the amount of time it takes SQL Server to grind it out.

Most of these are averaged over 500 executions. Some of the later ones were fewer, because it started to take over an hour to execute.
Num Derived Classes Avg ms for query generation and execution Lines of SQL Generated
1 16 80
2 35 106
3 67 152
4 119 214
5 203 292
6 340 386
7 523 496
8 784 622
9 1137 764
10 1608 922
11 2198 1096
12 2958 1286
13 3900 1492
14 4828 1686
15 5962 1892
20 17412 3302
25 41206 5112
30 108807 7862
So what happens when your application has 30 subclasses? It takes 108 seconds to generate and execute the SQL to retrieve the records in them! That’s 3.6 seconds PER FORM! And that’s when the query returns ZERO RECORDS, because the tables are EMPTY. Why does it take so long? Well, look at the 7,862 lines of SQL it generates, to start with (I won’t post it here; I’ll spare you that).
Here’s what it looks like graphed:

Ouch! That sure doesn’t look linear folks… And guess what, my “I got burned” experience doesn’t have 30 form tables. It has 89. Based on the trend outlined above, it would take 97 minutes to generate and execute the SQL to select the data from those 89 tables. Funny, because 30 SQL statements querying those tables, even with WHERE fk IN(SELECT id FROM BaseTable) clauses on each one only take 132ms on the same computer…And it returns the same data.

Pragmatists vs Idealists

Joel Spolsky has some great words (scroll down to “The Two Forces at Microsoft”) regarding two developer camps at Microsoft; the Pragmatists, and the Idealists. Windows XP was a product of the Pragmatists; Vista, of the Idealists. Enough said. While the SQL generated by EF in cases such as these may call to the inner relational mathematician idealist in us, the result is impossible to read or easily understand, takes a long time to be generated, and even longer to execute. In other words, it’s 100% unusable.

Now, as a matter of research, I have to create this same test in NHibernate, to see what the generated SQL looks like, and to see how well it performs. I’ll update this post when I have that data. Until then, if you decide to use EF’s inheritance “features”, here is a list of movies that are all 97 minutes long, so that your end users will have something to do while your site grates off your SQL Server’s face from the comfort of their browsers.






This entry was posted in Code and tagged , , , , , , . Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.

18 Comments

  1. Posted March 3, 2010 at 10:02 am | Permalink

    Great post! Unfortunately I’ve run into the same problem… But let’s try to clarify something here: Where exactly is the bottleneck? I don’t disagree with your numbers, but for the time under “Avg ms for query generation and execution”, how much of that is actually spent in the CLR generating the query and materializing the object, and how much is spent in SQL?

    Using a CLR profiler and SQL Server Profiler, it seems to me as though the real bottleneck is in the time it takes for EF to generate the query and materialize the subsequent object(s).

    Example for querying an entity with about 5 or 6 subclasses and several associations:

    TSource System.Linq.Queryable.FirstOrDefault(IQueryable) – 2,303 ms
    Number of SQL lines in query – 1084
    Time taken to execute query in SQL – 111 ms

    So it takes just over 2 seconds to bring back my object. Despite the fact that the generated SQL is an ugly, outer join filled mess, it really isn’t all that bad performance wise (relatively speaking, I can deal with 111 ms queries). But whatever EF is doing behind the scenes is causing the real slow down.

    This is with EFv1 – so I wonder how EFv2 compares. I haven’t installed VS2010 yet because I haven’t seen any evidence that it fixes the problem.

  2. Posted March 8, 2010 at 12:34 pm | Permalink

    @DK: I did do a little differentiation initially, splitting the measurements between the time it takes EF to generate SQL and the time it takes SQL Server to execute it, and in the simpler cases, SQL Server does execute the sql quickly enough. However, in some of my later cases, where you have 30+ subclasses, even SQL server takes 30+ seconds to execute your queries, even when no data is coming back. It’s just a mess all around. It’s not fixed in EFv2 or EFv4 either. Because the problem affected both EF sql generation, and SQL Server execution times, I just combined the times together to make a simpler case.

  3. Adler Catalin
    Posted March 17, 2010 at 2:01 am | Permalink

    I’t not a problem of time. It’s a problem of sql complexity & scalability.
    We use TPT with 3 derived tables, the execution plan is huge and contains joins with missing join predicade warning.

    That kinda of select cannot be used in production environment. Ii is tottaly unmanageble.

    We’ll try and select from the base table the old fashioned way (sqlcommand) :(

    i was thinking about sql entities and the oftype operator, but it didnt help. :(
    any other ideea?

  4. Posted March 17, 2010 at 10:03 am | Permalink

    OfType should actually produce pretty decent sequel if you can use it. The problem with OfType is that you can only select objects of a single subclass at a time, like this:

    var q =
    from subclass in context.BaseClass.OfType<SubClass>()
    select subclass;

    That would return all instances of SubClass, and the SQL would be reasonably simple. The problem is http://samscode.com/wp-admin/#comments-formthat that isn’t terribly useful. You can only fetch instances of 1 subclass at a time, and you have to know the type ahead of time (at compile time). You’re not really getting any of the advantages of inheritance or polymorphism.

    And unfortunately no, I don’t really have any other ideas on how to fix this (aside from my initial advice: if you need to use inheritance, don’t use EF). I think I’ll head over to Microsoft Connect and try to submit a bug report regarding the performance and the complexity of the SQL in these scenarios. Maybe we can get them to acknowledge the problem.

  5. Posted March 17, 2010 at 10:37 am | Permalink

    I submitted this problem as a bug on Microsoft Connect.
    https://connect.microsoft.com/data/feedback/details/542660/tpt-table-per-type-inheritance-performance
    Go vote it up if you’ve been plagued by this problem.

  6. Mike Hellem
    Posted March 23, 2010 at 12:55 am | Permalink

    I’m interested in this problem as i’m working on an EF project using inheritance so I ran your test with VS2010 RC (quad core, 10gb ram, 64 bit win7). I don’t have the time right now to investigate the results but thought i’d post them here for you. Feel free to email me.
    ——————————————————————————–

    With the simple model (1 base class, 3 subclasses)….
    ——————————————————————————–

    *** Query 1:
    *** 64 lines of generated SQL, 6 selects, 2 joins, 3 unions
    *** Generated by EF in 306 ms, Executed by SQL Server in 90 ms

    Query results for all shapes: 3 shapes returned.
    Color: Black
    EFInheritanceTest.Data.Model.Triangle

    Color: Green
    EFInheritanceTest.Data.Model.Circle

    Color: Red
    EFInheritanceTest.Data.Model.Square

    *** Query 2:
    *** 52 lines of generated SQL, 5 selects, 2 joins, 3 unions
    *** Generated by EF in 10 ms, Executed by SQL Server in 7 ms

    Query results for the 1 shape:
    Color: Red
    EFInheritanceTest.Data.Model.Square

    *** Query 3:
    *** 9 lines of generated SQL, 1 selects, 2 joins, 1 unions
    *** Generated by EF in 53 ms, Executed by SQL Server in 31 ms

    Query results for the 1 square:
    Color: Red
    EFInheritanceTest.Data.Model.Square

    ——————————————————————————–

    With the larger model (1 base class, 12 subclasses)….
    ——————————————————————————–

    *** Query 1:
    *** 658 lines of generated SQL, 24 selects, 2 joins, 12 unions
    *** Generated by EF in 655 ms, Executed by SQL Server in 396 ms

    Query results for all shapes: 3 shapes returned.
    Color: Black
    EFInheritanceTest.Data.Model_MoreShapes.Triangle

    Color: Green
    EFInheritanceTest.Data.Model_MoreShapes.Circle

    Color: Red
    EFInheritanceTest.Data.Model_MoreShapes.Square

    *** Query 2:
    *** 637 lines of generated SQL, 23 selects, 2 joins, 12 unions
    *** Generated by EF in 192 ms, Executed by SQL Server in 187 ms

    Query results for the 1 shape:
    Color: Black
    EFInheritanceTest.Data.Model_MoreShapes.Triangle

    *** Query 3:
    *** 9 lines of generated SQL, 1 selects, 2 joins, 1 unions
    *** Generated by EF in 69 ms, Executed by SQL Server in 31 ms

    Query results for the 1 square:
    Color: Red
    EFInheritanceTest.Data.Model_MoreShapes.Square

  7. Posted March 24, 2010 at 11:02 am | Permalink

    Thanks for posting those results.

    *** Query 1:
    *** 658 lines of generated SQL, 24 selects, 2 joins, 12 unions
    *** Generated by EF in 655 ms, Executed by SQL Server in 396 ms

    That’s just insane. This is exactly what I’m talking about. That’s for a simple base class, with 12 simple subclasses, and a query that only returns 3 records.

    How many subclasses do you have in your EF project that’s using inheritance? Have you ran into any problems yet?

  8. Posted August 24, 2010 at 6:20 pm | Permalink

    Hi Sam!

    Thanks for sharing your analysis…

    Definitely EF needs an urgent tuning if it wants to be considered a serious player in real world applications. That’s why in our team we call it: “lentity” which in Spanish resembles the word “lento” (slow).

    Thanks,

    Julio César.

  9. Posted August 24, 2010 at 10:52 pm | Permalink

    Hi Sam,

    More results about the EF test in my box:

    Intel Core Duo 2 Ghz
    3GB RAM
    Windows XP Professional SP2
    SQL Server 2005 SP2 Developer Edition

    ——————————————————————————–

    With the simple model (1 base class, 3 subclasses)….
    ——————————————————————————–

    *** Query 1:
    *** 64 lines of generated SQL, 6 selects, 2 joins, 3 unions
    *** Generated by EF in 418 ms, Executed by SQL Server in 149 ms

    Query results for all shapes: 3 shapes returned.
    Color: Black
    EFInheritanceTest.Data.Model.Triangle

    Color: Green
    EFInheritanceTest.Data.Model.Circle

    Color: Red
    EFInheritanceTest.Data.Model.Square

    *** Query 2:
    *** 52 lines of generated SQL, 5 selects, 2 joins, 3 unions
    *** Generated by EF in 11 ms, Executed by SQL Server in 12 ms

    Query results for the 1 shape:
    Color: Red
    EFInheritanceTest.Data.Model.Square

    *** Query 3:
    *** 9 lines of generated SQL, 1 selects, 2 joins, 1 unions
    *** Generated by EF in 58 ms, Executed by SQL Server in 36 ms

    Query results for the 1 square:
    Color: Red
    EFInheritanceTest.Data.Model.Square

    ——————————————————————————–

    With the larger model (1 base class, 12 subclasses)….
    ——————————————————————————–

    *** Query 1:
    *** 658 lines of generated SQL, 24 selects, 2 joins, 12 unions
    *** Generated by EF in 846 ms, Executed by SQL Server in 510 ms

    Query results for all shapes: 3 shapes returned.
    Color: Black
    EFInheritanceTest.Data.Model_MoreShapes.Triangle

    Color: Green
    EFInheritanceTest.Data.Model_MoreShapes.Circle

    Color: Red
    EFInheritanceTest.Data.Model_MoreShapes.Square

    *** Query 2:
    *** 637 lines of generated SQL, 23 selects, 2 joins, 12 unions
    *** Generated by EF in 324 ms, Executed by SQL Server in 324 ms

    Query results for the 1 shape:
    Color: Black
    EFInheritanceTest.Data.Model_MoreShapes.Triangle

    *** Query 3:
    *** 9 lines of generated SQL, 1 selects, 2 joins, 1 unions
    *** Generated by EF in 79 ms, Executed by SQL Server in 37 ms

    Query results for the 1 square:
    Color: Red
    EFInheritanceTest.Data.Model_MoreShapes.Square

    Press any key to continue . . .

  10. Ilia
    Posted September 10, 2010 at 9:00 am | Permalink

    It’s look like MS deleted bug from MS Connect…
    No bug report – no problem)

  11. Amtiskaw
    Posted November 30, 2010 at 11:16 am | Permalink

    FYI, I have submitted a feedback item to the Improvement Program on Connect that provides details of this incident and suggests preventing Microsoft employees from deleting valid items without providing a reason, based on published guidelines on what is considered unacceptable in bug reports. Find it, and vote on it, here: https://connect.microsoft.com/Connect/feedback/details/625776/prevent-microsoft-from-deleting-feedback-items-without-explanation

    Of course, they might well delete or ignore this item as well, but it’s worth a shot, right?

  12. Jo
    Posted March 2, 2011 at 8:59 pm | Permalink

    LOL…Had the same problem but with database projects and generation of the delta scripts via the schema comparison.

    They acknowledged there was a problem than closed.

  13. Posted November 3, 2011 at 9:04 pm | Permalink

    This is madness. I fully agree with your post, and in fact I always pull it up when I want to convince people *not* to use the EF at the moment? It’s just not very good at some really important things.

    So, it seems Microsoft have deleted your Connect issue? Really? If that’s true, it’s madness. I’ve heard no news about this problem being addressed, which is quite disconcerting. An update from yourself on what you now know would be much appreciated.

  14. Posted November 3, 2011 at 9:05 pm | Permalink

    Okay, so I just noticed your Update #3. Not sure how I missed that. Very odd indeed. Have you heard any news about it since?

  15. Aaron
    Posted November 24, 2011 at 2:19 pm | Permalink

    I have tested this scenario with 19 derived tables using TPT in Entity Framework 4.2 The time to compile and execute the query went from 8,457ms to 3,334ms.

  16. Leo
    Posted December 18, 2013 at 6:01 am | Permalink

    Sam, searching for some guidance on TPT with EF I ended up on your blog. I think it would be nice to have an update on how this works with the latest (6.0.2) version of EF.

  17. matt
    Posted June 9, 2014 at 7:20 am | Permalink

    @Leo – me too. I am keen to see if there are any workarounds in EF 6. My query for the baseclass (I have 3 subclasses) seems to be very large as per posts here. Thought this would be resolved by now?

  18. Posted September 23, 2014 at 9:13 pm | Permalink

    Please let me know if you’re looking for a article writer for
    your site. You have some really great articles and I feel I would be a good asset.
    If you ever want to take some of the load off, I’d love to write some content for your blog in exchange for a link back to mine.
    Please send me an email if interested. Kudos!

2 Trackbacks

  1. [...] Performance By Sam | Published: September 10, 2010 Just a quick post today. In January 2010, I wrote about the problem of Table-Per-Type (TPT) inheritance performance with Entity Framework (both v1 and v4). I had opened up a bug on Microsoft Connect in March, and MS finally got around to [...]

  2. [...] Hey, what’s with the “new”? Well, if you are using any sort of entity inheritance, your child entities will use the new keyword to mask any references from the parent class. This will come up if you are using either of the inheritance schemes available in EF4, which are table-per-hierarchy (TPH) and table-per-type (TPT). If you are using TPT, be aware that there are significant performance considerations. [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>