Gabriel Sassone | 29 Sep 2010 11:31
Picon

Data oriented programming

Hi everyone,
    just wondering if anyone of you has some experiences with "Data-oriented programming" stuff.
To push the limits of the PS3 you must use SPUs very well, and to do this I found that the "data-oriented programming" is really useful and let you create a very scalable framework.
There are some papers around (the most know is probably the one from Sony):
 
 
I've experienced in small scale (on PS3) and found it really intriguing and performance wise.
Also it is possible to use this mentality to improve performances on X360 and PC, almost all platforms!
 
What are your experiences with that?
 
Thanks!
   Gabriel
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Tony Albrecht | 30 Sep 2010 02:47
Picon

Re: Data oriented programming

Hi Gabriel,

     A lot of the work I do is related to optimisation of code or at least producing code that has to perform well, so I use a lot of the concepts mentioned in that paper extensively (and I should too, I wrote it :) ). The key thing to consider is your data - how much of it you access and in what order and where it goes. I find that just reordering data into contiguous chunks has a positive effect, regardless of the state of the code using it. Once the data is fixed, massaging the code that uses it into a better state ( SIMD becomes easier on contig data) improves performance even more.

Taking an existing code base and retrofitting DOD concepts into it can create some pretty nasty code, but designing with data in mind right from the start can produce maintainable, readable, extensible code that runs fast.

-Tony 

On 29 September 2010 19:01, Gabriel Sassone <gsassone7 <at> gmail.com> wrote:
Hi everyone,
    just wondering if anyone of you has some experiences with "Data-oriented programming" stuff.
To push the limits of the PS3 you must use SPUs very well, and to do this I found that the "data-oriented programming" is really useful and let you create a very scalable framework.
There are some papers around (the most know is probably the one from Sony):
 
 
I've experienced in small scale (on PS3) and found it really intriguing and performance wise.
Also it is possible to use this mentality to improve performances on X360 and PC, almost all platforms!
 
What are your experiences with that?
 
Thanks!
   Gabriel

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com


_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Mike Acton | 30 Sep 2010 03:47
Picon
Gravatar

Re: Data oriented programming

I'll add my two cents to what Tony said: I personally think the most important thing to keep in mind is that fundamentally the goal of any program (at any level of the hierarchy - from application to function to single instruction) is to transform data from one form into another. Understanding that data being transformed is the key to doing that well. The reality is that it's always a mix of mental models, but the real *problem* in practice is that not only is there no real understanding of the data being transformed, but it's ignored *completely* - as though "data" is some abstract concept that can be ignored.


Mike.

On Wed, Sep 29, 2010 at 5:47 PM, Tony Albrecht <tony.albrecht0 <at> gmail.com> wrote:
Hi Gabriel,
     A lot of the work I do is related to optimisation of code or at least producing code that has to perform well, so I use a lot of the concepts mentioned in that paper extensively (and I should too, I wrote it :) ). The key thing to consider is your data - how much of it you access and in what order and where it goes. I find that just reordering data into contiguous chunks has a positive effect, regardless of the state of the code using it. Once the data is fixed, massaging the code that uses it into a better state ( SIMD becomes easier on contig data) improves performance even more.

Taking an existing code base and retrofitting DOD concepts into it can create some pretty nasty code, but designing with data in mind right from the start can produce maintainable, readable, extensible code that runs fast.

-Tony 

On 29 September 2010 19:01, Gabriel Sassone <gsassone7 <at> gmail.com> wrote:
Hi everyone,
    just wondering if anyone of you has some experiences with "Data-oriented programming" stuff.
To push the limits of the PS3 you must use SPUs very well, and to do this I found that the "data-oriented programming" is really useful and let you create a very scalable framework.
There are some papers around (the most know is probably the one from Sony):
 
 
I've experienced in small scale (on PS3) and found it really intriguing and performance wise.
Also it is possible to use this mentality to improve performances on X360 and PC, almost all platforms!
 
What are your experiences with that?
 
Thanks!
   Gabriel

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com



_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com


_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Mat Noguchi | 30 Sep 2010 18:49

Re: Data oriented programming

Incidentally, Rico Mariani posted this earlier this week:

 

http://blogs.msdn.com/b/ricom/archive/2010/09/27/less-loosely-coupled-than-meets-the-eye.aspx

 

I find it particularly funny since he’s chief something in Visual Studio and has significant experience with managed code, but he has a similar perspective on performance even from the managed code perspective.

 

MSN

 

From: sweng-gamedev-bounces <at> lists.midnightryder.com [mailto:sweng-gamedev-bounces <at> lists.midnightryder.com] On Behalf Of Mike Acton
Sent: Wednesday, September 29, 2010 6:47 PM
To: sweng-gamedev <at> midnightryder.com
Subject: Re: [Sweng-Gamedev] Data oriented programming

 

I'll add my two cents to what Tony said: I personally think the most important thing to keep in mind is that fundamentally the goal of any program (at any level of the hierarchy - from application to function to single instruction) is to transform data from one form into another. Understanding that data being transformed is the key to doing that well. The reality is that it's always a mix of mental models, but the real *problem* in practice is that not only is there no real understanding of the data being transformed, but it's ignored *completely* - as though "data" is some abstract concept that can be ignored.

 

Mike.

On Wed, Sep 29, 2010 at 5:47 PM, Tony Albrecht <tony.albrecht0 <at> gmail.com> wrote:

Hi Gabriel,

     A lot of the work I do is related to optimisation of code or at least producing code that has to perform well, so I use a lot of the concepts mentioned in that paper extensively (and I should too, I wrote it :) ). The key thing to consider is your data - how much of it you access and in what order and where it goes. I find that just reordering data into contiguous chunks has a positive effect, regardless of the state of the code using it. Once the data is fixed, massaging the code that uses it into a better state ( SIMD becomes easier on contig data) improves performance even more.

 

Taking an existing code base and retrofitting DOD concepts into it can create some pretty nasty code, but designing with data in mind right from the start can produce maintainable, readable, extensible code that runs fast.

 

-Tony 

On 29 September 2010 19:01, Gabriel Sassone <gsassone7 <at> gmail.com> wrote:

Hi everyone,

    just wondering if anyone of you has some experiences with "Data-oriented programming" stuff.

To push the limits of the PS3 you must use SPUs very well, and to do this I found that the "data-oriented programming" is really useful and let you create a very scalable framework.

There are some papers around (the most know is probably the one from Sony):

 

 

I've experienced in small scale (on PS3) and found it really intriguing and performance wise.

Also it is possible to use this mentality to improve performances on X360 and PC, almost all platforms!

 

What are your experiences with that?

 

Thanks!

   Gabriel

 

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

 


_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

 

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Richard | 30 Sep 2010 21:32
Favicon
Gravatar

Re: Data oriented programming


In article <D7F4D9E414140644BF099886C28C6636350A0B198D <at> bngexchange01.bungie.bng.local>,
    Mat Noguchi <matthewn <at> bungie.com> writes:

> Incidentally, Rico Mariani posted this earlier this week:
> 
> http://blogs.msdn.com/b/ricom/archive/2010/09/27/less-loosely-coupled-than-me
ets-the-eye.aspx

This is why I've been recommending that performance tests be part of
your CI build.  As soon as you have two components that each use 2/3
of the L2 cache, you'll know from your performance tests.

FitNesse is a nice way, but certainly not the only way, to orchestrate
these end-to-end tests.  <http://www.fitnesse.org>
--

-- 
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
 <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

      Legalize Adulthood! <http://legalizeadulthood.wordpress.com>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Fabian Giesen | 30 Sep 2010 22:22
Picon

Re: Data oriented programming

On 9/30/2010 12:32 PM, Richard wrote:
>
> In article<D7F4D9E414140644BF099886C28C6636350A0B198D <at> bngexchange01.bungie.bng.local>,
>      Mat Noguchi<matthewn <at> bungie.com>  writes:
>
>> Incidentally, Rico Mariani posted this earlier this week:
>>
>> http://blogs.msdn.com/b/ricom/archive/2010/09/27/less-loosely-coupled-than-me
> ets-the-eye.aspx
>
> This is why I've been recommending that performance tests be part of
> your CI build.  As soon as you have two components that each use 2/3
> of the L2 cache, you'll know from your performance tests.

Your cache usage patterns (usually) depend on architectural decisions, 
not implementation details. Any sort of integration testing (even if 
you're doing CI) is too late a stage to detect architectural problems. 
By that point you're already in trouble.

The whole point is that no development methodology will do the thinking 
for you. You simply *will not* get good results if you break your system 
down into little pieces and forget about the big picture. Details matter.

That's not arguing for Waterfall. Unexpected issues will bite you in 
unexpected ways, and you will have to iterate your design. But if you 
want high performance, you absolutely, positively have to start with a 
clear picture of the data flows you're going to have. "Module X and Y 
both use 2/3 of the L2 cache" is a completely useless value. How do they 
use it? What are they using it for? Cache usage isn't additive. If they 
both work on largely the same data, two modules that "use 2/3 of the L2 
cache" taken together might "use 3/4 of the L2 cache". Are they both one 
large loop going through a linear dataset, or are they randomly 
traversing linked data structures? Are they writing to large output 
buffers that don't get read for a while? (In which case you can use 
streaming stores/evictions to kick them out of the cache as soon as 
you're done). And so on.

None of that needs a detailed implementation to answer. But all of it 
needs a good idea of the layout (and size) of your data structures, and 
a knowledge of how they're used. "Module A magically transforms widgets 
to gadgets" isn't good enough. If you don't know enough about the 
behavior of your program to make reasonable back-of-the-envelope 
calculations about how much memory you're touching (and where), you 
don't know enough to make it run fast, period.

-Fabian
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Richard | 30 Sep 2010 22:36
Favicon
Gravatar

Re: Data oriented programming


In article <4CA4F186.4000701 <at> gmx.net>,
    Fabian Giesen <ryg <at> gmx.net> writes:

> On 9/30/2010 12:32 PM, Richard wrote:
> >
> > In article<D7F4D9E414140644BF099886C28C6636350A0B198D <at> bngexchange01.bungie.
bng.local>,
> >      Mat Noguchi<matthewn <at> bungie.com>  writes:
> >
> >> Incidentally, Rico Mariani posted this earlier this week:
> >>
> >> http://blogs.msdn.com/b/ricom/archive/2010/09/27/less-loosely-coupled-than
-me
> > ets-the-eye.aspx
> >
> > This is why I've been recommending that performance tests be part of
> > your CI build.  As soon as you have two components that each use 2/3
> > of the L2 cache, you'll know from your performance tests.
> 
> Your cache usage patterns (usually) depend on architectural decisions, 
> not implementation details. Any sort of integration testing (even if 
> you're doing CI) is too late a stage to detect architectural problems. 
> By that point you're already in trouble.

Not that I disagree, but if you're in trouble isn't it best to know as
soon as possible?  That's all I'm asserting.
--

-- 
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
 <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

      Legalize Adulthood! <http://legalizeadulthood.wordpress.com>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Fabian Giesen | 30 Sep 2010 22:58
Picon

Re: Data oriented programming

On 9/30/2010 1:36 PM, Richard wrote:
>> Your cache usage patterns (usually) depend on architectural decisions,
>> not implementation details. Any sort of integration testing (even if
>> you're doing CI) is too late a stage to detect architectural problems.
>> By that point you're already in trouble.
>
> Not that I disagree, but if you're in trouble isn't it best to know as
> soon as possible?  That's all I'm asserting.

Yes, it is best to know as soon as possible, and integration testing is 
most emphatically not "as soon as possible" for architectural decisions. 
Back-of-the-envelope calculations during the design stage are. Consider 
them unit tests for your design if you're hell-bent on viewing 
everything as an agile progress.

-Fabian
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Richard | 30 Sep 2010 23:06
Favicon
Gravatar

Re: Data oriented programming


In article <4CA4FA04.6010603 <at> gmx.net>,
    Fabian Giesen <ryg <at> gmx.net> writes:

> > Not that I disagree, but if you're in trouble isn't it best to know as
> > soon as possible?  That's all I'm asserting.
> 
> Yes, it is best to know as soon as possible, and integration testing is 
> most emphatically not "as soon as possible" for architectural decisions. 
> Back-of-the-envelope calculations during the design stage are.

... but of course those calculations are done by error-prone humans
and can be done wrong.

And just because you decide on an architecture in the beginning
doesn't mean that other people on your team are adding things
consistent with that architecture.

There's nothing wrong with planning things up front.

CI tests ensure that reality is consistent with that plan.

Designs are propositions and postulates, not measured data.
--

-- 
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
 <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

      Legalize Adulthood! <http://legalizeadulthood.wordpress.com>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Mat Noguchi | 30 Sep 2010 23:13

Re: Data oriented programming

I think a great way to synthesize agile vs. planning is that if you are going to plan, you should also plan on
validating that plan. And if you are going to be agile, realize that the fundamental property of agile is a
learning process. Which means experimentation and reflection, not just experimentation.

MSN
-----Original Message-----
From: sweng-gamedev-bounces <at> lists.midnightryder.com
[mailto:sweng-gamedev-bounces <at> lists.midnightryder.com] On Behalf Of Richard
Sent: Thursday, September 30, 2010 2:07 PM
To: sweng-gamedev <at> midnightryder.com
Subject: Re: [Sweng-Gamedev] Data oriented programming

In article <4CA4FA04.6010603 <at> gmx.net>,
    Fabian Giesen <ryg <at> gmx.net> writes:

> > Not that I disagree, but if you're in trouble isn't it best to know as
> > soon as possible?  That's all I'm asserting.
> 
> Yes, it is best to know as soon as possible, and integration testing is 
> most emphatically not "as soon as possible" for architectural decisions. 
> Back-of-the-envelope calculations during the design stage are.

... but of course those calculations are done by error-prone humans
and can be done wrong.

And just because you decide on an architecture in the beginning
doesn't mean that other people on your team are adding things
consistent with that architecture.

There's nothing wrong with planning things up front.

CI tests ensure that reality is consistent with that plan.

Designs are propositions and postulates, not measured data.
--

-- 
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
 <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

      Legalize Adulthood! <http://legalizeadulthood.wordpress.com>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Jon Watte | 6 Oct 2010 00:03
Picon
Gravatar

Re: Data oriented programming

There are times when back-of-the-envelope calculations turn out to be wrong.

Or, perhaps more commonly, when implementation bugs cause systems to not behave the way you planned them.
CI builds catching performance problems as soon as they are checked into the code base seems quite preferrable to customers and reviewers catching bugs when you ship :-)
I don't see why you think adding performance tests to a CI tester is wrong? We have it, and have used it, and it's caught things for a long time. Even things that we planned well, but ended up having either unexpected side effects, or implementation problems.

Sincerely,

jw


--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.



On Thu, Sep 30, 2010 at 1:58 PM, Fabian Giesen <ryg <at> gmx.net> wrote:
On 9/30/2010 1:36 PM, Richard wrote:
Your cache usage patterns (usually) depend on architectural decisions,
not implementation details. Any sort of integration testing (even if
you're doing CI) is too late a stage to detect architectural problems.
By that point you're already in trouble.

Not that I disagree, but if you're in trouble isn't it best to know as
soon as possible?  That's all I'm asserting.

Yes, it is best to know as soon as possible, and integration testing is most emphatically not "as soon as possible" for architectural decisions. Back-of-the-envelope calculations during the design stage are. Consider them unit tests for your design if you're hell-bent on viewing everything as an agile progress.

-Fabian

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Fabian Giesen | 6 Oct 2010 00:33
Picon

Re: Data oriented programming

On 10/5/2010 3:03 PM, Jon Watte wrote:
> There are times when back-of-the-envelope calculations turn out to be wrong.
> Or, perhaps more commonly, when implementation bugs cause systems to not
> behave the way you planned them.
> CI builds catching performance problems as soon as they are checked into
> the code base seems quite preferrable to customers and reviewers
> catching bugs when you ship :-)
> I don't see why you think adding performance tests to a CI tester is
> wrong? We have it, and have used it, and it's caught things for a long
> time. Even things that we planned well, but ended up having either
> unexpected side effects, or implementation problems.

I didn't mean to suggest that you shouldn't test performance during 
testing, just that it shouldn't be your "first line of defense". 
Obviously you can't fix problems before you notice them, but a lot of 
these problems are entirely avoidable if you think things through early 
enough.

-Fabian
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Gregory Junker | 6 Oct 2010 05:48

Re: Data oriented programming

> 
> I didn't mean to suggest that you shouldn't test performance during
> testing, just that it shouldn't be your "first line of defense".
> Obviously you can't fix problems before you notice them, but a lot of
> these problems are entirely avoidable if you think things through early
> enough.

Which can easily lead to "analysis paralysis"...

Greg 

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Tony Albrecht | 6 Oct 2010 06:12
Picon

Re: Data oriented programming

So you are recommending that you shouldn't think about the performance impact of your code before you write it?


I think you should. If you are working on code which is likely to have a performance impact on a system, then you should think carefully about the layout of data and code. Smart decisions at the start will save you a lot of time in the long run. I've worked on systems that are incredibly difficult to optimise due to uniformly slow code where a week spent fiddling around during the design of the system would have saved months down the track.

-Tony

On 6 October 2010 14:18, Gregory Junker <gjunker <at> dayark.com> wrote:
>
> I didn't mean to suggest that you shouldn't test performance during
> testing, just that it shouldn't be your "first line of defense".
> Obviously you can't fix problems before you notice them, but a lot of
> these problems are entirely avoidable if you think things through early
> enough.

Which can easily lead to "analysis paralysis"...

Greg

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Richard Fabian | 6 Oct 2010 11:52
Picon
Gravatar

Re: Data oriented programming

I'm with Tony on this one: Too many times "just write it and we'll optimise it later" has come to bite.


So many times it's been "just add a system to do this", and then three years later, we're not using it the way it was originally intended due to performance issues. For example, scripting languages that get too slow, get compiled to code, then get optimised further by doing trickery, then some random guy walks up and says, "so why not do this stuff in code anyway" and gets bashed for pointing out the painful but obvious.

Don't trap yourself in thinking that something that seems easy to use, can ever be made fast enough to USE. If you wait until your performance slot (say one year before release), then you're in for an almighty headache as you have to do awful things like change the way in which you even approach certain problems. Take some visibility algorithms for example, how many different ones are there? Portals, PVS, octree, BSP, blah blah blah..... don't think they're all there because someone thought "ooh, i'lll try doing it this way." No, they did it because in different circumstances, different algorithms do better than others, either due to layout of your levels, or layout of the code using it, or simply the access patterns into the structure. For anyone who's actually done a complete conversion of visibilty system from one to another, changing data structures and access techniques throughout a codebase: I feel your pain.

There are many different ways to lay out your data, some are performant, some are easy to manipulate... Not many are both.

Apply natural selection: If you don't care about performance from day 1, then manipulability wins over performance. Thus ends the tale.


On 6 October 2010 05:12, Tony Albrecht <tony.albrecht0 <at> gmail.com> wrote:
So you are recommending that you shouldn't think about the performance impact of your code before you write it?

I think you should. If you are working on code which is likely to have a performance impact on a system, then you should think carefully about the layout of data and code. Smart decisions at the start will save you a lot of time in the long run. I've worked on systems that are incredibly difficult to optimise due to uniformly slow code where a week spent fiddling around during the design of the system would have saved months down the track.

-Tony


On 6 October 2010 14:18, Gregory Junker <gjunker <at> dayark.com> wrote:
>
> I didn't mean to suggest that you shouldn't test performance during
> testing, just that it shouldn't be your "first line of defense".
> Obviously you can't fix problems before you notice them, but a lot of
> these problems are entirely avoidable if you think things through early
> enough.

Which can easily lead to "analysis paralysis"...

Greg


_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com




--
fabs();
Just because the world is full of people that think just like you, doesn't mean the other ones can't be right.
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Mat Noguchi | 6 Oct 2010 18:53

Re: Data oriented programming

I think you overestimate anyone’s ability to predict the actual difficulties of shipping and underestimate the ability of programmers to bash a system into submission. Unless you are talking about the mean and not possibilities.

 

MSN

From: sweng-gamedev-bounces <at> lists.midnightryder.com [mailto:sweng-gamedev-bounces <at> lists.midnightryder.com] On Behalf Of Richard Fabian
Sent: Wednesday, October 06, 2010 2:53 AM
To: sweng-gamedev <at> midnightryder.com
Subject: Re: [Sweng-Gamedev] Data oriented programming

 

I'm with Tony on this one: Too many times "just write it and we'll optimise it later" has come to bite.

 

So many times it's been "just add a system to do this", and then three years later, we're not using it the way it was originally intended due to performance issues. For example, scripting languages that get too slow, get compiled to code, then get optimised further by doing trickery, then some random guy walks up and says, "so why not do this stuff in code anyway" and gets bashed for pointing out the painful but obvious.

 

Don't trap yourself in thinking that something that seems easy to use, can ever be made fast enough to USE. If you wait until your performance slot (say one year before release), then you're in for an almighty headache as you have to do awful things like change the way in which you even approach certain problems. Take some visibility algorithms for example, how many different ones are there? Portals, PVS, octree, BSP, blah blah blah..... don't think they're all there because someone thought "ooh, i'lll try doing it this way." No, they did it because in different circumstances, different algorithms do better than others, either due to layout of your levels, or layout of the code using it, or simply the access patterns into the structure. For anyone who's actually done a complete conversion of visibilty system from one to another, changing data structures and access techniques throughout a codebase: I feel your pain.

 

There are many different ways to lay out your data, some are performant, some are easy to manipulate... Not many are both.

 

Apply natural selection: If you don't care about performance from day 1, then manipulability wins over performance. Thus ends the tale.

 

 

On 6 October 2010 05:12, Tony Albrecht <tony.albrecht0 <at> gmail.com> wrote:

So you are recommending that you shouldn't think about the performance impact of your code before you write it?

 

I think you should. If you are working on code which is likely to have a performance impact on a system, then you should think carefully about the layout of data and code. Smart decisions at the start will save you a lot of time in the long run. I've worked on systems that are incredibly difficult to optimise due to uniformly slow code where a week spent fiddling around during the design of the system would have saved months down the track.

 

-Tony

 

On 6 October 2010 14:18, Gregory Junker <gjunker <at> dayark.com> wrote:

>
> I didn't mean to suggest that you shouldn't test performance during
> testing, just that it shouldn't be your "first line of defense".
> Obviously you can't fix problems before you notice them, but a lot of
> these problems are entirely avoidable if you think things through early
> enough.

Which can easily lead to "analysis paralysis"...

Greg

 


_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com




--
fabs();
Just because the world is full of people that think just like you, doesn't mean the other ones can't be right.

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Fabian Giesen | 6 Oct 2010 20:53
Picon

Re: Data oriented programming

On 06.10.2010 09:53, Mat Noguchi wrote:
> I think you overestimate anyone’s ability to predict the actual
> difficulties of shipping and underestimate the ability of programmers to
> bash a system into submission. Unless you are talking about the mean and
> not possibilities.

Why do so many people on this list insist on turning this into a 
dichotomy? It's not. I completely agree that no amount of planning ahead 
will prevent you from having to change important parts of your codebase 
late in the project because your assumptions turned out to be wrong. I 
also agree trying to get everything perfect the first time round is a 
fool's errand. (And I doubt anyone with software development experience 
disagrees).

But I'm somewhat baffled that my suggestion of, in effect, "think before 
you type" is actually being met with resistance. Just because I can't 
fully predict the performance implications of a system in advance 
doesn't mean it's a waste of my time to think about it at all. And, in 
my experience anyway, *not* thinking about it in advance is a waste of 
mine and other people's time some months down the road.

Same thing with the testing. Again, it's not a dichotomy. Just because I 
think that testing is too late in the process to easily fix performance 
problems doesn't mean that performance testing is worthless, it just 
makes it a bad choice for your first line of defense.

-Fabian
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Mat Noguchi | 6 Oct 2010 21:02

Re: Data oriented programming

I was responding specifically to this:

> Apply natural selection: If you don't care about performance from day 1, 
> then manipulability wins over performance. Thus ends the tale.

Which was stated by Richard Fabian, not you.

MSN
-----Original Message-----
From: sweng-gamedev-bounces <at> lists.midnightryder.com
[mailto:sweng-gamedev-bounces <at> lists.midnightryder.com] On Behalf Of Fabian Giesen
Sent: Wednesday, October 06, 2010 11:53 AM
To: sweng-gamedev <at> lists.midnightryder.com
Subject: Re: [Sweng-Gamedev] Data oriented programming

On 06.10.2010 09:53, Mat Noguchi wrote:
> I think you overestimate anyone's ability to predict the actual
> difficulties of shipping and underestimate the ability of programmers to
> bash a system into submission. Unless you are talking about the mean and
> not possibilities.

Why do so many people on this list insist on turning this into a 
dichotomy? It's not. I completely agree that no amount of planning ahead 
will prevent you from having to change important parts of your codebase 
late in the project because your assumptions turned out to be wrong. I 
also agree trying to get everything perfect the first time round is a 
fool's errand. (And I doubt anyone with software development experience 
disagrees).

But I'm somewhat baffled that my suggestion of, in effect, "think before 
you type" is actually being met with resistance. Just because I can't 
fully predict the performance implications of a system in advance 
doesn't mean it's a waste of my time to think about it at all. And, in 
my experience anyway, *not* thinking about it in advance is a waste of 
mine and other people's time some months down the road.

Same thing with the testing. Again, it's not a dichotomy. Just because I 
think that testing is too late in the process to easily fix performance 
problems doesn't mean that performance testing is worthless, it just 
makes it a bad choice for your first line of defense.

-Fabian
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Richard | 7 Oct 2010 00:40
Favicon
Gravatar

Re: Data oriented programming


In article <4CACC58F.6070201 <at> gmx.net>,
    Fabian Giesen <ryg <at> gmx.net> writes:

> Why do so many people on this list insist on turning this into a 
> dichotomy?  [...]

Ironically when I read your initial response to my suggestion of
having CI performance tests, this was my first thought.

Why does it have to be either/or, why can't it be both?

Why can't we think ahead but validate our thinking with CI testing?

> But I'm somewhat baffled that my suggestion of, in effect, "think before 
> you type" is actually being met with resistance.

I don't perceive anyone as actually having expressed resistance to the
idea of thinking ahead.

I don't know if you feel this way or not, but I commonly see people
expressing the idea that agile development is incompatible with
thinking ahead or that "true" agile development somehow prohibits
thinking ahead.  Nothing could be further from the truth.  Maybe they
get this idea from the agile manifesto's statement that it values
"responding to change over following a plan".  It doesn't say "do not
plan"; the agile manifesto was created in response to an environment
where people were drowning in specifications (i.e. plans) and yet
still building the wrong thing based on customer feedback.

In a gaming studio, "responding to change over following a plan" would
mean to me things like adjusting game play based on test play groups,
even if your original plan/architecture/whatever had something
different and it means ditching parts or all of your original
plan/architecture/whatever.

Maybe a game like "The Daedalus Encounter" probably wouldn't have been
such a piece of shit if they had responded to change instead of following
their plan.
--

-- 
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
 <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

      Legalize Adulthood! <http://legalizeadulthood.wordpress.com>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Richard | 7 Oct 2010 00:41
Favicon
Gravatar

Re: Data oriented programming


In article <4CACC58F.6070201 <at> gmx.net>,
    Fabian Giesen <ryg <at> gmx.net> writes:

> Why do so many people on this list insist on turning this into a 
> dichotomy?  [...]

Ironically when I read your initial response to my suggestion of
having CI performance tests, this was my first thought.

Why does it have to be either/or, why can't it be both?

Why can't we think ahead but validate our thinking with CI testing?

> But I'm somewhat baffled that my suggestion of, in effect, "think before 
> you type" is actually being met with resistance.

I don't perceive anyone as actually having expressed resistance to the
idea of thinking ahead.

I don't know if you feel this way or not, but I commonly see people
expressing the idea that agile development is incompatible with
thinking ahead or that "true" agile development somehow prohibits
thinking ahead.  Nothing could be further from the truth.  Maybe they
get this idea from the agile manifesto's statement that it values
"responding to change over following a plan".  It doesn't say "do not
plan"; the agile manifesto was created in response to an environment
where people were drowning in specifications (i.e. plans) and yet
still building the wrong thing based on customer feedback.

In a gaming studio, "responding to change over following a plan" would
mean to me things like adjusting game play based on test play groups,
even if your original plan/architecture/whatever had something
different and it means ditching parts or all of your original
plan/architecture/whatever.

Maybe a game like "The Daedalus Encounter" probably wouldn't have been
such a piece of shit if they had responded to change instead of following
their plan.
--

-- 
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
 <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

      Legalize Adulthood! <http://legalizeadulthood.wordpress.com>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Richard | 7 Oct 2010 00:42
Favicon
Gravatar

Re: Data oriented programming

My apologies for the duplicate send.
--

-- 
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
 <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

      Legalize Adulthood! <http://legalizeadulthood.wordpress.com>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Richard Fabian | 6 Oct 2010 22:44
Picon
Gravatar

Re: Data oriented programming

I'm not sure I actually understand what you've written there (especially the bit about "mean and not possibilities",) but I think my point is orthogonal to shipping issues. Normally, a game has either good performance or bad, and that doesn't really affect whether or not it ships. Normally it only affects what it is shipped with.


Bashing a system into submission is something I have done a number of times, but sometimes we've had to go after smaller fish to bash as the real large fish is just too hard to make good. If you don't consider the difficulty of future optimisations, then you run the risk of having to redo part of the project.

To me, it sounds like you're happy ripping systems out to replace them, and that would make sense if you're working on the same kind of code base game after game, things change, but everything kinda stays the same. That's great, you can throw away whole ways of working and introduce new stuff and see how it works, but when you're doing multiple projects in the same studio without much overlapping tech or at least without much overlapping requirements, your estimated schedules are wild and inaccurate because each project is like you've never done a project like it before. Putting performance estimations off and relying on late in the game optimisation will get you a game, but not one that is great. Something will suffer.

One last point, performance is one of those things that fuels it's own fire. Like asset build optimisations or iteration time reduction. Every time you make one of these things better, it makes the whole studio run faster.


On 6 October 2010 17:53, Mat Noguchi <matthewn <at> bungie.com> wrote:

I think you overestimate anyone’s ability to predict the actual difficulties of shipping and underestimate the ability of programmers to bash a system into submission. Unless you are talking about the mean and not possibilities.

 

MSN



--
fabs();
Just because the world is full of people that think just like you, doesn't mean the other ones can't be right.
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Mat Noguchi | 7 Oct 2010 00:33

Re: Data oriented programming

I think my statement was terse and subtle enough to allow for pretty much any interpretation (apparently mostly negative). So let’s expand:

 

I interpreted your post as “you need to design your system with <x> in mind up front.” If that’s not the case then feel free to ignore my initial reply.

 

Although based on your reply, I don’t think my retort is even necessary, other than to say it’s not that easy to understand the performance impact of what you do; it’s much more obvious to understand performance from the perspective of what you shouldn’t do. It’s a very subtle distinction, and doesn’t really clash with anything you said.

 

[And I would also argue that simply “planning early on for performance” isn’t really sufficient unless the “plan” is: don’t ever write code that will make performance crappy.]

 

MSN

From: sweng-gamedev-bounces <at> lists.midnightryder.com [mailto:sweng-gamedev-bounces <at> lists.midnightryder.com] On Behalf Of Richard Fabian
Sent: Wednesday, October 06, 2010 1:45 PM
To: sweng-gamedev <at> midnightryder.com
Subject: Re: [Sweng-Gamedev] Data oriented programming

 

I'm not sure I actually understand what you've written there (especially the bit about "mean and not possibilities",) but I think my point is orthogonal to shipping issues. Normally, a game has either good performance or bad, and that doesn't really affect whether or not it ships. Normally it only affects what it is shipped with.

 

Bashing a system into submission is something I have done a number of times, but sometimes we've had to go after smaller fish to bash as the real large fish is just too hard to make good. If you don't consider the difficulty of future optimisations, then you run the risk of having to redo part of the project.

 

To me, it sounds like you're happy ripping systems out to replace them, and that would make sense if you're working on the same kind of code base game after game, things change, but everything kinda stays the same. That's great, you can throw away whole ways of working and introduce new stuff and see how it works, but when you're doing multiple projects in the same studio without much overlapping tech or at least without much overlapping requirements, your estimated schedules are wild and inaccurate because each project is like you've never done a project like it before. Putting performance estimations off and relying on late in the game optimisation will get you a game, but not one that is great. Something will suffer.

 

One last point, performance is one of those things that fuels it's own fire. Like asset build optimisations or iteration time reduction. Every time you make one of these things better, it makes the whole studio run faster.

 

 

On 6 October 2010 17:53, Mat Noguchi <matthewn <at> bungie.com> wrote:

I think you overestimate anyone’s ability to predict the actual difficulties of shipping and underestimate the ability of programmers to bash a system into submission. Unless you are talking about the mean and not possibilities.

 

MSN

 


--
fabs();
Just because the world is full of people that think just like you, doesn't mean the other ones can't be right.

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Jon Watte | 10 Oct 2010 22:39
Picon
Gravatar

Re: Data oriented programming

There are a few differing points of view here.

Clearly, you can't super-optimize all code you write up front, because you don't know that that's going to be the part that matters.

Clearly, *some* code will almost always matter -- the core of a CPU skinning loop, say, or broad-phase collision detection. Will the performance of a script interpreter matter? If 95% of the heavy lifting is done in the rendering and simulation subsystems anyway, then probably not. (Not to mention that you *should* be using Lua or Python or Mono or some other already-debugged scripting facility :-)

You can't always know which will end up mattering, and you can't assume that everything will matter, or you'll end up wasting a lot of expensive programmer time. Thus, you take a good stab at guessing where it's worth it, and move on. A CI performance tester (which I think is what this thread is about?) is a good safety net for when you get it wrong.

Then there's the question of "bashing it into shape" as a phase of shipping. If what you're shipping is a game, on physical (or near-physical) media, then spending a few months to make it great, once you know that the design of the game is solid, makes a lot of sense.
If what you're shipping is a platform or a service, you don't have that opportunity.
I'm working at a place where we synthesize a very dynamic web site, a semi-real-time game server back-end, and a end-user-installable 3D client application. We ship 50 times a day. This allows us to react to changes in the market and tune for customers with incredible speed! If we were to stop and not ship anything new for six months, we'd probably be dead.
Thus, we take a guess at performance up front, and then let CI performance tests, as well as measurements collected from actual customer installs, tell us what areas we need to optimize.

Sincerely,

jw

--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.



On Wed, Oct 6, 2010 at 2:52 AM, Richard Fabian <raspo1 <at> gmail.com> wrote:
I'm with Tony on this one: Too many times "just write it and we'll optimise it later" has come to bite.

So many times it's been "just add a system to do this", and then three years later, we're not using it the way it was originally intended due to performance issues. For example, scripting languages that get too slow, get compiled to code, then get optimised further by doing trickery, then some random guy walks up and says, "so why not do this stuff in code anyway" and gets bashed for pointing out the painful but obvious.

Don't trap yourself in thinking that something that seems easy to use, can ever be made fast enough to USE. If you wait until your performance slot (say one year before release), then you're in for an almighty headache as you have to do awful things like change the way in which you even approach certain problems. Take some visibility algorithms for example, how many different ones are there? Portals, PVS, octree, BSP, blah blah blah..... don't think they're all there because someone thought "ooh, i'lll try doing it this way." No, they did it because in different circumstances, different algorithms do better than others, either due to layout of your levels, or layout of the code using it, or simply the access patterns into the structure. For anyone who's actually done a complete conversion of visibilty system from one to another, changing data structures and access techniques throughout a codebase: I feel your pain.

There are many different ways to lay out your data, some are performant, some are easy to manipulate... Not many are both.

Apply natural selection: If you don't care about performance from day 1, then manipulability wins over performance. Thus ends the tale.


On 6 October 2010 05:12, Tony Albrecht <tony.albrecht0 <at> gmail.com> wrote:
So you are recommending that you shouldn't think about the performance impact of your code before you write it?

I think you should. If you are working on code which is likely to have a performance impact on a system, then you should think carefully about the layout of data and code. Smart decisions at the start will save you a lot of time in the long run. I've worked on systems that are incredibly difficult to optimise due to uniformly slow code where a week spent fiddling around during the design of the system would have saved months down the track.

-Tony


On 6 October 2010 14:18, Gregory Junker <gjunker <at> dayark.com> wrote:
>
> I didn't mean to suggest that you shouldn't test performance during
> testing, just that it shouldn't be your "first line of defense".
> Obviously you can't fix problems before you notice them, but a lot of
> these problems are entirely avoidable if you think things through early
> enough.

Which can easily lead to "analysis paralysis"...

Greg


_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com




--
fabs();
Just because the world is full of people that think just like you, doesn't mean the other ones can't be right.

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com


_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Veikko Eeva | 11 Oct 2010 20:40
Picon
Picon
Favicon

Re: Data oriented programming


Jon Watte wrote:
> There are a few differing points of view here.
> Clearly, you can't super-optimize all code you write up front, because
> you don't know that that's going to be the part that matters.
>[...]

This isn't about games, but I couldn't resist on providing a link to a 
Velocity 2010 conference presentation notes by Theo Schlossnagle of 
Scalable Internet Architectures fame about, well, Scalable
Internet Architectures: 
http://assets.en.oreilly.com/1/event/44/Scalable%20Internet%20Architectures%20Presentation%202.pdf

After the intial hand-waving, the presentation is much about 
understanding the problem in the context of performance and scalability 
trade-offs.

Cheers,
Veikko E.

P.s.
Velocity conference 2010 Slides/Notes
http://www.royans.net/arch/all-velocity-conference-2010-slidesnotes/

Continuous deployments may not be for everyone: Culture
http://www.royans.net/arch/continuous-deployments-may-not-be-for-everyone-culture/
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

OvermindDL1 | 16 Oct 2010 04:11
Picon

Re: Data oriented programming

On Mon, Oct 11, 2010 at 12:40 PM, Veikko Eeva <veikko.eeva <at> iki.fi> wrote:
>
> Jon Watte wrote:
>>
>> There are a few differing points of view here.
>> Clearly, you can't super-optimize all code you write up front, because
>> you don't know that that's going to be the part that matters.
>> [...]
>
> This isn't about games, but I couldn't resist on providing a link to a
> Velocity 2010 conference presentation notes by Theo Schlossnagle of Scalable
> Internet Architectures fame about, well, Scalable
> Internet Architectures:
> http://assets.en.oreilly.com/1/event/44/Scalable%20Internet%20Architectures%20Presentation%202.pdf
>
> After the intial hand-waving, the presentation is much about understanding
> the problem in the context of performance and scalability trade-offs.
>
>
> Cheers,
> Veikko E.
>
>
> P.s.
> Velocity conference 2010 Slides/Notes
> http://www.royans.net/arch/all-velocity-conference-2010-slidesnotes/
>
> Continuous deployments may not be for everyone: Culture
> http://www.royans.net/arch/continuous-deployments-may-not-be-for-everyone-culture/

Continuing this line of questioning, this is actually my use-case.

I have a multi-dimensional array, say typedef myType[SIZE][SIZE][SIZE]
myTypeArr;, the SIZE is set right now but I can change it to whatever
allows best access.  Right now to fit in the L2 cache properly on most
CPU's I have it SIZE set to 8, which as I recall lets it be about a
24k block.

myType itself, in essence, is a discriminated union, you can think of
it as a tag and a data, like this:
struct myType {
  size_unsigned short tag;
  void *data;
}
(In reality is reserves enough internal space to hold the data of the
'tag' for most types, others have a void* at the end of the allocated
area for additional data, where the most accessed is in the main area,
not through the pointer, I think this was 31bytes in size, maybe 63,
would have to look, but that can be changed as well to a completely
different way if necessary.)

The tag either references a built-in type, or it 0 it references a
scripted type (which I usually later plan to make into internal types
for speed reason, but the scripted type will remain for third-party
extensions).  There are two 'major' passes which are called the most
often, each of which access two different types of data inside the
data element, so I could easily create a function to work over all of
these in quick turn.  However, there is a lot more sporadic access,
for example, say tag type 5 needs to have a certain function run over
it when a certain event happens, or more simply, every one whole
second.  It is not just tag type 5, but a lot of tag types that will
need to perform actions every once in a while, and usually on *all* of
a certain tag type within bounds in the overall structure.  I figured
that the brute-force method would be to have a pointer for each
possible tag type, and as a tag is allocated just have the main
structure keep the 'coords' of it with a pointer to that index of the
tag type in another allocated structure, something like this:
struct myTypeArr {
  myType arr[SIZE][SIZE][SIZE]; // where mytype is a tag and an index into:

  void **internalTags[TAGCOUNT];
}
and access it like:
myTypeArr a;
a.internalTags[a.arr[3][6][1].tag][a.arr[3][6][1].idx].data // or
.location or whatever
Or something nasty like that.  problem is, that only uses memory for
an item when it exists and is marked zero otherwise, but this will
still eat a *LOT* more memory then otherwise due to the multiple void*
indirections, and of course each pointer is going to be using 4/8
bytes of memory each, although since the tag size of 'mostly' known
(very few have variable sizes), that could simplify memory allocation
somewhat.

The overall structure as-is (although it can be pretty easily changed,
the back-end is pretty well abstracted out from the front) is an
octree, the octree can currently handle a max size of 32bits x 32bits
x 32bits, although reduced a little based on the above SIZE constant
(if size is 8 then the 32 bits is reduced to 29 bits or so).
Basically, I need to index into a *massive*, not always loaded array
that is 32bits x 32bits x 32bits in size *just* for indexing, not
including data, and the data per cell may be variable size, but in
general nearby chunks tend to be the same.  So it is abstracted out as
an octree, if all blocks in a level of the octree are all the same
then the lower levels are deleted from memory and it is given as just
a single block representing all blocks in the upper level, so on and
so forth.

The way the data is generated comes from a generator function.  Upon
first access of a data block, its 'chunk' is loaded, and the chunk is
loaded by giving the location and dimension of the chunk to a
generator function that fills it up with data (dynamically generated,
scripted, who knows, the function is a black-box).  When the data
chunk is no longer needed for a time period then it is unloaded from
memory by being serialized out to disk (whether by just writing a file
with a special filename based on location and block size, or being
written to a DB, or, how it is done now, being written to a
distributed filesystem using Sector/Sphere, all subject to change
based on if a better method is found).  Some chunks may almost always
be loaded into memory due to rapid access on them, potentially growing
large to to a large number of chunks being loaded.  Currently I am
writing it into Sector/Sphere so I can distribute it, but thus far I
have only distributed the file reading/writing of chunks and not of
processing.

I am guessing that I could distribute the processing and let each
Sector/Sphere slave work on entire chunks at a time and just do a
giant switch on the tag and work on each block inside the chunks, a
couple problems with that though, the primary two being that blocks
can affect other nearby blocks (including nearby ones in nearby
chunks), could probably work around this by either overlapping chunks
and having 'out of range' chunks be read-only, but still affect the
nearby internal ones to the chunk, that would duplicate processing
though, and I would have to make sure that data access was only
affected by chunks of a very limited range while also ensuring that
they only propagate outwards and not inwards, or have all affects
propogate outwards in another pass to handle such things, thus slowing
it down by causing another access pass, although that could be
mitigated by just running it concurrently with the next 'tick' pass
inside it, but would could potentially double memory usage (which I
guess is easier to access then faster CPU usage...).

So, I want to minimize latency in the tick processing, including
minimizing reading from the drive, should I try to figure out a way to
maximize concurrent processing on this 6-core server, or perhaps
should I go ahead and scale out to Sector/Sphere to let slave
computers do a lot of the processing.

Hmm, an idea, I could do all of the processing of the nearby nodes to
the players on the 6-core server itself, and do the further nodes on
the slaves, I do not need it to be deterministic (although it would be
*VERY* nice if it were)...

I could just use Sector/Sphere for storage and create multiple client
server to work on the data in tandom, say in specific in-game
geographical areas, and have clients connect to the server that
handles the area closest to them, perhaps be connected the the
multiple nearest servers at the same time for quick access...

This might be the best bet as it will not be handling just 'one' world
either, hmm...

I still need to figure out how to optimize the primary server work
though, is the best method just to work on a block by block basis
while switching based on the tag, or should that try to be optimized
somehow to minimize branches in the functions?  Perhaps just a master
array list and go over those?  But I do not want to update every block
in every chunk in every tick.  For example, most things will only need
to update near every tick in only the chunks that are near players,
further away ones can update slower if necessary.  Optimally
everything would update every tick, but that will no doubt be
infeasible when it gets to a larger dataset.  Thoughts?  It seems like
a problem that should be very simple to use such a data-oriented
pattern, except that there are going to be a rather large amount of
tag types (a short should handle it well, but it is not altogether
impossible for me to need to increase it to a uint32 later), each of
which may operate differently.

Hmm, another idea, most blocks will require no internal state if I
split their small amount of stated into new block types, but that will
quickly balloon the block type count, and there will still be a
comparatively small amount of blocks that do require state, very few
with rather large state.
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Jon Watte | 17 Oct 2010 17:36
Picon
Gravatar

Re: Data oriented programming

tl;dr:


Either, you walk the data in memory order, and switch on type, or you group the data in type order, and do all types separately.

In the first case, you get branch mis-predictions, which can be expensive, and you also get higher I-cache pressure and even D-cache pressure because you have to keep a bunch of different "subsystems" (type handlers?) in memory.

In the second case, you have to map xyz->object (and object->xyz) in some way other than the three-dimensional array, such as a sparse array implemented as a hash table. That will cause more cache misses when locating the item for any particular element, but it will let you walk elements of the same type using linear memory access, and have less I and D cache pressure and better branch prediction.

Which one is faster depends on whether you do mostly "update all of type X" or do mostly "find element by xyz" operations per frame. You'll simply have to come up with realistic workloads and profile it both ways.

Sincerely,

jw


--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.



On Fri, Oct 15, 2010 at 7:11 PM, OvermindDL1 <overminddl1 <at> gmail.com> wrote:
On Mon, Oct 11, 2010 at 12:40 PM, Veikko Eeva <veikko.eeva <at> iki.fi> wrote:
>
> Jon Watte wrote:
>>
>> There are a few differing points of view here.
>> Clearly, you can't super-optimize all code you write up front, because
>> you don't know that that's going to be the part that matters.
>> [...]
>
> This isn't about games, but I couldn't resist on providing a link to a
> Velocity 2010 conference presentation notes by Theo Schlossnagle of Scalable
> Internet Architectures fame about, well, Scalable
> Internet Architectures:
> http://assets.en.oreilly.com/1/event/44/Scalable%20Internet%20Architectures%20Presentation%202.pdf
>
> After the intial hand-waving, the presentation is much about understanding
> the problem in the context of performance and scalability trade-offs.
>
>
> Cheers,
> Veikko E.
>
>
> P.s.
> Velocity conference 2010 Slides/Notes
> http://www.royans.net/arch/all-velocity-conference-2010-slidesnotes/
>
> Continuous deployments may not be for everyone: Culture
> http://www.royans.net/arch/continuous-deployments-may-not-be-for-everyone-culture/


Continuing this line of questioning, this is actually my use-case.

I have a multi-dimensional array, say typedef myType[SIZE][SIZE][SIZE]
myTypeArr;, the SIZE is set right now but I can change it to whatever
allows best access.  Right now to fit in the L2 cache properly on most
CPU's I have it SIZE set to 8, which as I recall lets it be about a
24k block.

myType itself, in essence, is a discriminated union, you can think of
it as a tag and a data, like this:
struct myType {
 size_unsigned short tag;
 void *data;
}
(In reality is reserves enough internal space to hold the data of the
'tag' for most types, others have a void* at the end of the allocated
area for additional data, where the most accessed is in the main area,
not through the pointer, I think this was 31bytes in size, maybe 63,
would have to look, but that can be changed as well to a completely
different way if necessary.)

The tag either references a built-in type, or it 0 it references a
scripted type (which I usually later plan to make into internal types
for speed reason, but the scripted type will remain for third-party
extensions).  There are two 'major' passes which are called the most
often, each of which access two different types of data inside the
data element, so I could easily create a function to work over all of
these in quick turn.  However, there is a lot more sporadic access,
for example, say tag type 5 needs to have a certain function run over
it when a certain event happens, or more simply, every one whole
second.  It is not just tag type 5, but a lot of tag types that will
need to perform actions every once in a while, and usually on *all* of
a certain tag type within bounds in the overall structure.  I figured
that the brute-force method would be to have a pointer for each
possible tag type, and as a tag is allocated just have the main
structure keep the 'coords' of it with a pointer to that index of the
tag type in another allocated structure, something like this:
struct myTypeArr {
 myType arr[SIZE][SIZE][SIZE]; // where mytype is a tag and an index into:

 void **internalTags[TAGCOUNT];
}
and access it like:
myTypeArr a;
a.internalTags[a.arr[3][6][1].tag][a.arr[3][6][1].idx].data // or
.location or whatever
Or something nasty like that.  problem is, that only uses memory for
an item when it exists and is marked zero otherwise, but this will
still eat a *LOT* more memory then otherwise due to the multiple void*
indirections, and of course each pointer is going to be using 4/8
bytes of memory each, although since the tag size of 'mostly' known
(very few have variable sizes), that could simplify memory allocation
somewhat.


The overall structure as-is (although it can be pretty easily changed,
the back-end is pretty well abstracted out from the front) is an
octree, the octree can currently handle a max size of 32bits x 32bits
x 32bits, although reduced a little based on the above SIZE constant
(if size is 8 then the 32 bits is reduced to 29 bits or so).
Basically, I need to index into a *massive*, not always loaded array
that is 32bits x 32bits x 32bits in size *just* for indexing, not
including data, and the data per cell may be variable size, but in
general nearby chunks tend to be the same.  So it is abstracted out as
an octree, if all blocks in a level of the octree are all the same
then the lower levels are deleted from memory and it is given as just
a single block representing all blocks in the upper level, so on and
so forth.

The way the data is generated comes from a generator function.  Upon
first access of a data block, its 'chunk' is loaded, and the chunk is
loaded by giving the location and dimension of the chunk to a
generator function that fills it up with data (dynamically generated,
scripted, who knows, the function is a black-box).  When the data
chunk is no longer needed for a time period then it is unloaded from
memory by being serialized out to disk (whether by just writing a file
with a special filename based on location and block size, or being
written to a DB, or, how it is done now, being written to a
distributed filesystem using Sector/Sphere, all subject to change
based on if a better method is found).  Some chunks may almost always
be loaded into memory due to rapid access on them, potentially growing
large to to a large number of chunks being loaded.  Currently I am
writing it into Sector/Sphere so I can distribute it, but thus far I
have only distributed the file reading/writing of chunks and not of
processing.

I am guessing that I could distribute the processing and let each
Sector/Sphere slave work on entire chunks at a time and just do a
giant switch on the tag and work on each block inside the chunks, a
couple problems with that though, the primary two being that blocks
can affect other nearby blocks (including nearby ones in nearby
chunks), could probably work around this by either overlapping chunks
and having 'out of range' chunks be read-only, but still affect the
nearby internal ones to the chunk, that would duplicate processing
though, and I would have to make sure that data access was only
affected by chunks of a very limited range while also ensuring that
they only propagate outwards and not inwards, or have all affects
propogate outwards in another pass to handle such things, thus slowing
it down by causing another access pass, although that could be
mitigated by just running it concurrently with the next 'tick' pass
inside it, but would could potentially double memory usage (which I
guess is easier to access then faster CPU usage...).

So, I want to minimize latency in the tick processing, including
minimizing reading from the drive, should I try to figure out a way to
maximize concurrent processing on this 6-core server, or perhaps
should I go ahead and scale out to Sector/Sphere to let slave
computers do a lot of the processing.

Hmm, an idea, I could do all of the processing of the nearby nodes to
the players on the 6-core server itself, and do the further nodes on
the slaves, I do not need it to be deterministic (although it would be
*VERY* nice if it were)...

I could just use Sector/Sphere for storage and create multiple client
server to work on the data in tandom, say in specific in-game
geographical areas, and have clients connect to the server that
handles the area closest to them, perhaps be connected the the
multiple nearest servers at the same time for quick access...

This might be the best bet as it will not be handling just 'one' world
either, hmm...

I still need to figure out how to optimize the primary server work
though, is the best method just to work on a block by block basis
while switching based on the tag, or should that try to be optimized
somehow to minimize branches in the functions?  Perhaps just a master
array list and go over those?  But I do not want to update every block
in every chunk in every tick.  For example, most things will only need
to update near every tick in only the chunks that are near players,
further away ones can update slower if necessary.  Optimally
everything would update every tick, but that will no doubt be
infeasible when it gets to a larger dataset.  Thoughts?  It seems like
a problem that should be very simple to use such a data-oriented
pattern, except that there are going to be a rather large amount of
tag types (a short should handle it well, but it is not altogether
impossible for me to need to increase it to a uint32 later), each of
which may operate differently.

Hmm, another idea, most blocks will require no internal state if I
split their small amount of stated into new block types, but that will
quickly balloon the block type count, and there will still be a
comparatively small amount of blocks that do require state, very few
with rather large state.

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
OvermindDL1 | 18 Oct 2010 01:45
Picon

Re: Data oriented programming

On Sun, Oct 17, 2010 at 9:36 AM, Jon Watte <jwatte <at> gmail.com> wrote:
> tl;dr:
> Either, you walk the data in memory order, and switch on type, or you group
> the data in type order, and do all types separately.
> In the first case, you get branch mis-predictions, which can be expensive,
> and you also get higher I-cache pressure and even D-cache pressure because
> you have to keep a bunch of different "subsystems" (type handlers?) in
> memory.
> In the second case, you have to map xyz->object (and object->xyz) in some
> way other than the three-dimensional array, such as a sparse array
> implemented as a hash table. That will cause more cache misses when locating
> the item for any particular element, but it will let you walk elements of
> the same type using linear memory access, and have less I and D cache
> pressure and better branch prediction.
> Which one is faster depends on whether you do mostly "update all of type X"
> or do mostly "find element by xyz" operations per frame. You'll simply have
> to come up with realistic workloads and profile it both ways.
> Sincerely,

That is what I figured, problem is I am going to be doing a lot of
operations all on single types within a large area, and also going to
be getting everything in range to send to the client often as well as
their view area changes and on how blocks can affect other nearby
blocks...  Basically create both systems I guess and benchmark, how
fun...
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Jon Watte | 19 Oct 2010 22:36
Picon
Gravatar

Re: Data oriented programming

Basically create both systems I guess and benchmark, how
fun.

Or create one, and use it until it becomes a blocking problem, and then try to benchmark the other to see if it will be any better.
No project has infinite time and resources, and pretending that they do is directly hurtful to the success of the project.

If this is the biggest risk in your entire project, then creating both systems and measuring them may make sense. If this is just a small risk, or a perfectionist wanting to avoid wasting 5% of the CPU, then just implement the simplest thing that could possibly work, write tests that prove that it's functionally correct, and don't re-visit the code until BOTH conditions are true:
1) You are not reaching your target frame rate on your target hardware
2) A profiler tells you that this system is the lowest hanging fruit for performance

It's OK to plan up front, and even compare the two approaches. If you know that obviously one of the factors will dominate, planning says that's what you should implement for. If you think it'll be a toss-up, then just pick the quickest one to implement, and go with that.

Sincerely,

jw


--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.



On Sun, Oct 17, 2010 at 4:45 PM, OvermindDL1 <overminddl1 <at> gmail.com> wrote:
On Sun, Oct 17, 2010 at 9:36 AM, Jon Watte <jwatte <at> gmail.com> wrote:
> tl;dr:
> Either, you walk the data in memory order, and switch on type, or you group
> the data in type order, and do all types separately.
> In the first case, you get branch mis-predictions, which can be expensive,
> and you also get higher I-cache pressure and even D-cache pressure because
> you have to keep a bunch of different "subsystems" (type handlers?) in
> memory.
> In the second case, you have to map xyz->object (and object->xyz) in some
> way other than the three-dimensional array, such as a sparse array
> implemented as a hash table. That will cause more cache misses when locating
> the item for any particular element, but it will let you walk elements of
> the same type using linear memory access, and have less I and D cache
> pressure and better branch prediction.
> Which one is faster depends on whether you do mostly "update all of type X"
> or do mostly "find element by xyz" operations per frame. You'll simply have
> to come up with realistic workloads and profile it both ways.
> Sincerely,

That is what I figured, problem is I am going to be doing a lot of
operations all on single types within a large area, and also going to
be getting everything in range to send to the client often as well as
their view area changes and on how blocks can affect other nearby
blocks...  Basically create both systems I guess and benchmark, how
fun...

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Marc OMorain | 21 Oct 2010 10:56
Favicon

Re: Data oriented programming

don't re-visit the code until BOTH conditions are true:
1) You are not reaching your target frame rate on your target hardware
2) A profiler tells you that this system is the lowest hanging fruit for performance


This advice should be engraved in stone. Very well said, Sir.
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Richard Fabian | 21 Oct 2010 11:22
Picon
Gravatar

Re: Data oriented programming

I'm not sure I agree though. Low hanging fruit is easy, but sometimes it's better to chop the tree down to get all the fruit at once.

Right now, concentrating on the input and output data and then writing just what's necessary to transform the data seems like a silver axe.

On 21 October 2010 09:56, Marc OMorain <marc.omorain <at> kore.net> wrote:
don't re-visit the code until BOTH conditions are true:
1) You are not reaching your target frame rate on your target hardware
2) A profiler tells you that this system is the lowest hanging fruit for performance


This advice should be engraved in stone. Very well said, Sir.

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com




--
fabs();
Just because the world is full of people that think just like you, doesn't mean the other ones can't be right.
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Gregory Junker | 6 Oct 2010 22:48

Re: Data oriented programming


So you are recommending that you shouldn't think about the performance
impact of your code before you write it?

I don't see how you could possibly have read that into my comments.

Greg

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Gregory Junker | 6 Oct 2010 22:51

Re: Data oriented programming

Sorry, thanks to Outlook assuming everyone wants to write HTML in their
emails, this isn't clear; edited for who said what. 

> Tony said: 
> So you are recommending that you shouldn't think about the performance
> impact of your code before you write it?
> 

I said:

I don't see how you could possibly have read that into my comments.

Greg

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Tony Albrecht | 7 Oct 2010 00:47
Picon

Re: Data oriented programming

Fabian wrote:
> I didn't mean to suggest that you shouldn't test performance during testing, just that it shouldn't be 
> your "first line of defense". Obviously you can't fix problems before you notice them, but a lot of these 
> problems are entirely avoidable if you think things through early enough.

Then Gregory wrote:
> Which can easily lead to "analysis paralysis"...

To which I replied:
> So you are recommending that you shouldn't think about the performance
> impact of your code before you write it?

And the Gregory replied:
> I don't see how you could possibly have read that into my comments.

Your terse comment lead me to think that you didn't like the idea of considering performance before implementation as it could lead to analysis paralysis. If you were just making a statement about analysis paralysis then, I agree, yes it can. But the fear of analysis paralysis (analysis paralysis paralysis?) is more likely to have a detrimental long term effect on your code base than the former. 


-Tony




On 7 October 2010 07:21, Gregory Junker <gjunker <at> dayark.com> wrote:
Sorry, thanks to Outlook assuming everyone wants to write HTML in their
emails, this isn't clear; edited for who said what.

> Tony said:
> So you are recommending that you shouldn't think about the performance
> impact of your code before you write it?
>

I said:

I don't see how you could possibly have read that into my comments.

Greg

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Gregory Junker | 7 Oct 2010 04:44

Re: Data oriented programming


From: sweng-gamedev-bounces <at> lists.midnightryder.com
[mailto:sweng-gamedev-bounces <at> lists.midnightryder.com] On Behalf Of Tony
Albrecht
Sent: Wednesday, October 06, 2010 3:47 PM
To: sweng-gamedev <at> midnightryder.com
Subject: Re: [Sweng-Gamedev] Data oriented programming

> Your terse comment lead me to think that you didn't like the 
> idea of considering performance before implementation as it 
> could lead to analysis paralysis. If you were just making a 
> statement about analysis paralysis then, I agree, yes it can. 
> But the fear of analysis paralysis (analysis paralysis paralysis?) 
> is more likely to have a detrimental long term effect on your code 
> base than the former. 

> -Tony

My point is that at some point you have to start writing code. Too often I
see engineers afraid to write a single line because they have not thought
through (and therefore designed for) every single possibility in advance.
Nothing we talk about on this list rises to the same level of coding for
space shuttle or Mars missions, so while considering algorithmic complexity
or not making the obvious performance mistakes during implementation are
Good Things (as has been implied previously), a product needs to ship, and
IMO it really shouldn't go much beyond that before you start writing code. 

The fact is that pretty much everyone here has already done before, whatever
it is they are embarking upon, so experience goes a long way towards
encouraging things that improve performance, even without a ton of thought
up front.

I suppose that wasn't exactly conveyed by a one-liner... 

Greg

_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Gabriel Sassone | 7 Oct 2010 11:54
Picon

Re: Data oriented programming

Personally I think that it's difficult to find a balance between writing code, knowing performances from both a large and small scale, think what you have NOT to do...it's a matter of experience and open mind/study.
I don't like the "coding" paralysis that many software engineering have (me included until some time ago), because having "fear" of writing code because it's not too much clean/with good performance can lead to writing nothing (that is far worst).
Also I don't like the pattern mentality, because many times it's more a matter of context and circumstances/experience in the code you write.
So for me, I started giving me a "decision time", after which I'll start writing code. Then I'm writing down what can be improved, both from the performances that from the interface design, and features: in that way I'm finding a balance between cowboy coding and code paralysis.
 
Just my 50c, even if I'm not experienced as you, I wanted only to hear your experiences with Data Oriented Programming.
 
By the way, thanks anybody for the replies!
 
    Gabriel Sassone
 
 
 
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
OvermindDL1 | 8 Oct 2010 03:11
Picon

Re: Data oriented programming

On Thu, Oct 7, 2010 at 3:54 AM, Gabriel Sassone <gsassone7 <at> gmail.com> wrote:
> /* snip */

Slightly back on to topic, data oriented programming, I know what most
of it means anyway and I do use it in some specific areas, but I am
curious, a common use-case for a game:

Class hierarchy:
  GameObject
    Actor
      Pawn
        PlayerPawn
        Vehicle
          Wheeled
          Tracked
          Flying
      Controller
        PlayerController
        AIController
          MonsterController

I am not certain how to properly get functionality over to a
data-oriented model.  Just for the sake of ease-of-use argument, the
best way might be a component system?  But it seems that no matter how
I think of data usage, sometimes I will need, for example, the
positions with the force descriptions, or I might need positions with
the renderer, or I might need the force descriptions with the input to
each model (and how do you model an input system based on events, such
as you have a list of events and based on what they are they need to
modify the structures of one thing per event, with possible multiple
events per thing, or maybe zero).

I have done quite a bit of looking on Google for any articles or
tutorials that anyone may have done to convert a traditionally
C++/Java-style OO hierarchy into a data-oriented or so setup for
maximal speed, does anyone know of one?
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Phill Djonov | 8 Oct 2010 10:37
Favicon

Re: Data oriented programming

Go with components and limit cross-component communication to clearly
defined phases:

TickAnims();
MoveAnimDataToPhysics();
TickPhysics();
MovePhysicsDataToAnimation();
TouchUpAnimsForThingsPhysicsConstrained();

Etc.

Obviously, a lot of the move phases go away if you can find a
cache-friendly format that works well for most members of a group of
related subsystems, though often the cost of copying data around is
well worth the savings you can achieve when a complex computation is
tuned *just* right.

YMMV.

Phill

On Thu, Oct 7, 2010 at 7:11 PM, OvermindDL1 <overminddl1 <at> gmail.com> wrote:
> On Thu, Oct 7, 2010 at 3:54 AM, Gabriel Sassone <gsassone7 <at> gmail.com> wrote:
>> /* snip */
>
> Slightly back on to topic, data oriented programming, I know what most
> of it means anyway and I do use it in some specific areas, but I am
> curious, a common use-case for a game:
>
> Class hierarchy:
>  GameObject
>    Actor
>      Pawn
>        PlayerPawn
>        Vehicle
>          Wheeled
>          Tracked
>          Flying
>      Controller
>        PlayerController
>        AIController
>          MonsterController
>
> I am not certain how to properly get functionality over to a
> data-oriented model.  Just for the sake of ease-of-use argument, the
> best way might be a component system?  But it seems that no matter how
> I think of data usage, sometimes I will need, for example, the
> positions with the force descriptions, or I might need positions with
> the renderer, or I might need the force descriptions with the input to
> each model (and how do you model an input system based on events, such
> as you have a list of events and based on what they are they need to
> modify the structures of one thing per event, with possible multiple
> events per thing, or maybe zero).
>
> I have done quite a bit of looking on Google for any articles or
> tutorials that anyone may have done to convert a traditionally
> C++/Java-style OO hierarchy into a data-oriented or so setup for
> maximal speed, does anyone know of one?
> _______________________________________________
> Sweng-Gamedev mailing list
> Sweng-Gamedev <at> lists.midnightryder.com
> http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Warrick Buchanan | 8 Oct 2010 10:49
Picon
Favicon

Re: Data oriented programming

Also I would add to that and say don't be afraid of copying data around - especially when it comes to multi-threaded systems it can be much more preferable than the alternatives.  Obviously there are limits to this though as most things and you should take care about the rate your systems produce and consume data.



> Date: Fri, 8 Oct 2010 02:37:31 -0600
> From: phill <at> beamdog.com
> To: sweng-gamedev <at> midnightryder.com
> Subject: Re: [Sweng-Gamedev] Data oriented programming
>
> Go with components and limit cross-component communication to clearly
> defined phases:
>
> TickAnims();
> MoveAnimDataToPhysics();
> TickPhysics();
> MovePhysicsDataToAnimation();
> TouchUpAnimsForThingsPhysicsConstrained();
>
> Etc.
>
> Obviously, a lot of the move phases go away if you can find a
> cache-friendly format that works well for most members of a group of
> related subsystems, though often the cost of copying data around is
> well worth the savings you can achieve when a complex computation is
> tuned *just* right.
>
> YMMV.
>
> Phill
>
> On Thu, Oct 7, 2010 at 7:11 PM, OvermindDL1 <overminddl1 <at> gmail.com> wrote:
> > On Thu, Oct 7, 2010 at 3:54 AM, Gabriel Sassone <gsassone7 <at> gmail.com> wrote:
> >> /* snip */
> >
> > Slightly back on to topic, data oriented programming, I know what most
> > of it means anyway and I do use it in some specific areas, but I am
> > curious, a common use-case for a game:
> >
> > Class hierarchy:
> >  GameObject
> >    Actor
> >      Pawn
> >        PlayerPawn
> >        Vehicle
> >          Wheeled
> >          Tracked
> >          Flying
> >      Controller
> >        PlayerController
> >        AIController
> >          MonsterController
> >
> > I am not certain how to properly get functionality over to a
> > data-oriented model.  Just for the sake of ease-of-use argument, the
> > best way might be a component system?  But it seems that no matter how
> > I think of data usage, sometimes I will need, for example, the
> > positions with the force descriptions, or I might need positions with
> > the renderer, or I might need the force descriptions with the input to
> > each model (and how do you model an input system based on events, such
> > as you have a list of events and based on what they are they need to
> > modify the structures of one thing per event, with possible multiple
> > events per thing, or maybe zero).
> >
> > I have done quite a bit of looking on Google for any articles or
> > tutorials that anyone may have done to convert a traditionally
> > C++/Java-style OO hierarchy into a data-oriented or so setup for
> > maximal speed, does anyone know of one?
> > _______________________________________________
> > Sweng-Gamedev mailing list
> > Sweng-Gamedev <at> lists.midnightryder.com
> > http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
> >
> _______________________________________________
> Sweng-Gamedev mailing list
> Sweng-Gamedev <at> lists.midnightryder.com
> http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Richard | 8 Oct 2010 15:52
Favicon
Gravatar

Re: Data oriented programming


In article <BAY146-w3774631C7C55D7306C3FBCFB500 <at> phx.gbl>,
    Warrick Buchanan <warrick_buchanan <at> hotmail.com> writes:

> Also I would add to that and say don't be afraid of copying data around
> [...]

The reasoning behind this being that the cache misses during copying is
paid for by the repeated cache hits during subsequent processing?

Does this assume that the data needs to be accessed more than once
during processing?
--

-- 
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
 <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

      Legalize Adulthood! <http://legalizeadulthood.wordpress.com>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Richard Fabian | 8 Oct 2010 16:03
Picon
Gravatar

Re: Data oriented programming

I'd say that it's because copying has a latency of only one cache miss no matter how much you're copying, as you can start processing the next read straight away as it's not dependent on the arrival of previous data. The "processing" is just doing more of the same load and store, load and store. So you're running at memory bandwidth the whole time rather than waiting on data before doing something.

On 8 October 2010 14:52, Richard <legalize <at> xmission.com> wrote:

In article <BAY146-w3774631C7C55D7306C3FBCFB500 <at> phx.gbl>,
   Warrick Buchanan <warrick_buchanan <at> hotmail.com> writes:

> Also I would add to that and say don't be afraid of copying data around
> [...]

The reasoning behind this being that the cache misses during copying is
paid for by the repeated cache hits during subsequent processing?

Does this assume that the data needs to be accessed more than once
during processing?
--
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
 <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

     Legalize Adulthood! <http://legalizeadulthood.wordpress.com>



--
fabs();
Just because the world is full of people that think just like you, doesn't mean the other ones can't be right.
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
Fabian Giesen | 8 Oct 2010 21:00
Picon

Re: Data oriented programming

On 10/8/2010 6:52 AM, Richard wrote:
>
> In article<BAY146-w3774631C7C55D7306C3FBCFB500 <at> phx.gbl>,
>      Warrick Buchanan<warrick_buchanan <at> hotmail.com>  writes:
>
>> Also I would add to that and say don't be afraid of copying data around
>> [...]
>
> The reasoning behind this being that the cache misses during copying is
> paid for by the repeated cache hits during subsequent processing?
>
> Does this assume that the data needs to be accessed more than once
> during processing?

Copying by itself is not fundamentally more (or less) efficient than any 
other pass that only does sequential reads and writes. If you actually 
have do to it manually on the CPU, then all other things being equal, 
you're better off just writing to your destination(s) directly during 
the preceding pass. There's some caveats - e.g. when accessing 
write-combined memory directly, you need to look closely at your 
generated code and make sure that it really does write sequentially with 
no holes (exact limitations vary by platform; on some processors you can 
write "slightly out of order" within cache lines with no penalty, but 
others really need writes to be at least word-sized and sequential, so 
that's what you should target). Getting a C/C++ compiler to actually do 
this properly often involves liberal use of compiler write barriers 
and/or "volatile" qualifiers. If the sequential writes cause other 
complications (or if you prefer simplicity), it can be better to use a 
small block of cached memory as "staging area" and copy from there to 
write-combined mem.

Of course, if you have a DMA engine that can quickly (and 
asynchronously) move data around memory, the copies really are "free" 
(unless you're hitting memory bandwidth limits), which affects the usage 
patterns. On the PS3, you have this and the SPUs which can't access main 
memory directly, so expect to be copying around data a lot. (On SPUs, 
the usual pattern is to DMA output data from block N-1, work on block N, 
and DMA input data for block N+1 at the same time).

-Fabian
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Phill Djonov | 8 Oct 2010 22:55
Favicon

Re: Data oriented programming

Well, even just transforming:

foreach( object )
{
    //not real function calls, just illustrating the flow of the loop
    //if it helps, imagine them all messy and interleaved :)
    temp;
    FollowPointersAround( object, temp );
    BranchyFixingUpOfData( temp );
    HeavyNumberCrunching( temp );
}

into:

temp[]
for( ... )
    FollowPointersAround( objects[index], temp[index] ); //effectively
what we're referring to as a "copy"
for( ... )
    BranchyFixingUpOfData( temp[index] );
for( ... )
    HeavyNumberCrunching( temp[index] );

is a win.

It's *much* easier to go into each of those separate loops and measure
and optimize than it is the first. Loops are smaller and highly
focused. Logic mispredictions don't stall the math pipeline. Fewer
temporaries are in play, freeing registers, making unrolling
opportunities more obvious. It makes sense to do expensive
pipeline/cache-flushing CPU-mode switches between stages, or to
utilize write-combining, scratch pads, or other special hardware
features. That sort of stuff can *easily* make up for the cost of
pushing the original temp out of registers or the stack, and often
would even if everything Richard and Fabian said about mitigating copy
costs were false.

And, when the profiler screams, or you move to a new platform, you can
likely fix things in isolation without having to butcher "object" and
every other bit of code that touches it.

Phill

On Fri, Oct 8, 2010 at 7:52 AM, Richard <legalize <at> xmission.com> wrote:
>
> In article <BAY146-w3774631C7C55D7306C3FBCFB500 <at> phx.gbl>,
>    Warrick Buchanan <warrick_buchanan <at> hotmail.com> writes:
>
>> Also I would add to that and say don't be afraid of copying data around
>> [...]
>
> The reasoning behind this being that the cache misses during copying is
> paid for by the repeated cache hits during subsequent processing?
>
> Does this assume that the data needs to be accessed more than once
> during processing?
> --
> "The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
>  <http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>
>
>      Legalize Adulthood! <http://legalizeadulthood.wordpress.com>
> _______________________________________________
> Sweng-Gamedev mailing list
> Sweng-Gamedev <at> lists.midnightryder.com
> http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com

Phill Djonov | 8 Oct 2010 10:56
Favicon

Re: Data oriented programming

You also mentioned events, and I didn't really address that.

I avoid those like the plague. In my experience, having subsystems
directly call each other as they work is a recipe for disaster, as the
code you get is highly interdependent and you lose the ability to
reorganize the internals of a system (for performance or otherwise)
because the entire rest of the game is expecting things of it at all
times. Message queues or buffers of intermediate results, on the other
hand, are awesome:

listOfCollisions = BeginPhysicsTick();
DecideOnCollisionResolutions( listOfCollisions ); //updates the list inline
FinishPhysicsTick( listOfCollisions );

gameCollisions = FindCollisionsGameplayCaresAbout( listOfCollisions );
SpawnCollisionEffects( gameCollisions );
ApplyImpactDamage( gameCollisions );

//use listOfCollisions later to optimize what data you send over the network as
//prediction might do a decent job of objects that are simply moving
along uninterrupted

//etc

Phill

On Thu, Oct 7, 2010 at 7:11 PM, OvermindDL1 <overminddl1 <at> gmail.com> wrote:
> On Thu, Oct 7, 2010 at 3:54 AM, Gabriel Sassone <gsassone7 <at> gmail.com> wrote:
>> /* snip */
>
> Slightly back on to topic, data oriented programming, I know what most
> of it means anyway and I do use it in some specific areas, but I am
> curious, a common use-case for a game:
>
> Class hierarchy:
>  GameObject
>    Actor
>      Pawn
>        PlayerPawn
>        Vehicle
>          Wheeled
>          Tracked
>          Flying
>      Controller
>        PlayerController
>        AIController
>          MonsterController
>
> I am not certain how to properly get functionality over to a
> data-oriented model.  Just for the sake of ease-of-use argument, the
> best way might be a component system?  But it seems that no matter how
> I think of data usage, sometimes I will need, for example, the
> positions with the force descriptions, or I might need positions with
> the renderer, or I might need the force descriptions with the input to
> each model (and how do you model an input system based on events, such
> as you have a list of events and based on what they are they need to
> modify the structures of one thing per event, with possible multiple
> events per thing, or maybe zero).
>
> I have done quite a bit of looking on Google for any articles or
> tutorials that anyone may have done to convert a traditionally
> C++/Java-style OO hierarchy into a data-oriented or so setup for
> maximal speed, does anyone know of one?
> _______________________________________________
> Sweng-Gamedev mailing list
> Sweng-Gamedev <at> lists.midnightryder.com
> http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com
>
_______________________________________________
Sweng-Gamedev mailing list
Sweng-Gamedev <at> lists.midnightryder.com
http://lists.midnightryder.com/listinfo.cgi/sweng-gamedev-midnightryder.com


Gmane