Bingfeng Mei | 16 Jul 13:05

Question about doloop_end pattern

Hello,
I tried to use the doloop_end pattern to reduce loop overhead for our target
processor, which features a dedicated loop instruction.  Somehow even a
simple loop just cannot pass the test in doloop_condition_get, which
requires the following canonical pattern.

  /* The canonical doloop pattern we expect has one of the following
     forms:

     1)  (parallel [(set (pc) (if_then_else (condition)
                                            (label_ref (label))
                                            (pc)))
                    (set (reg) (plus (reg) (const_int -1)))
                    (additional clobbers and uses)])

     The branch must be the first entry of the parallel (also required
     by jump.c), and the second entry of the parallel must be a set of
     the loop counter register.  Some targets (IA-64) wrap the set of
     the loop counter in an if_then_else too.

     2)  (set (reg) (plus (reg) (const_int -1)))
         (set (pc) (if_then_else (reg != 0)
                                 (label_ref (label))
                                 (pc))).  */
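For readers less used to RTL, form 2 can be pictured in C. The sketch below is only an illustration (the function names are hypothetical, not GCC code): it contrasts an ordinary counted loop with the same loop rewritten into the "decrement, then branch back while non-zero" shape. It also shows why that shape needs a positive trip count, which a compiler must either prove or guard before using it.

```c
#include <assert.h>

/* Ordinary counted loop, written the way a programmer would.  */
long sum_for(const int *a, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* The same loop in the shape of doloop form 2: the trip count is
   loaded into a counter once, and the loop ends with
   "counter = counter - 1; branch back if counter != 0".
   The do/while body runs at least once, so n must be positive.  */
long sum_doloop(const int *a, int n)
{
    assert(n > 0);
    long sum = 0;
    unsigned count = (unsigned) n;   /* plays the role of (reg) */
    const int *p = a;
    do {
        sum += *p++;
    } while (--count != 0);          /* (plus (reg) (const_int -1)), then (reg != 0) */
    return sum;
}
```

Both functions return the same sum for any positive n; calling sum_doloop with n == 0 trips the assert, mirroring the zero-iteration case a real doloop transformation has to handle.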

Here is a simple function I used; it should meet all the doloop optimization
requirements.
void Unroll( short s, int * restrict b_inout, int *restrict out, int N)
{
        int i;

Ian Lance Taylor | 16 Jul 16:00

Re: Question about doloop_end pattern

"Bingfeng Mei" <bmei <at> broadcom.com> writes:

> I tried to use doloop_end pattern to reduce loop overhead for our target
> processor, which features a dedicated loop instruction.  Somehow even a
> simple loop just cannot pass the test of doloop_condition_get, which
> requires following canonical pattern.

You are looking at this at the wrong level.  That comment is
describing what your doloop insn must look like in the MD file.  It
does not mean that the RTL for the loop must look like that in order
for the doloop optimization to apply.  Look at existing targets which
support doloops for examples of how to write the insn.

Ian

Joern Rennecke | 16 Jul 17:04

Re: Question about doloop_end pattern

I can confirm that the doloop optimization is applied for ARC600 / ARC700
in a compiler based on gcc 4.4.0 20080606 (experimental).
OTOH, it doesn't use any of the PRE_INC, POST_INC, PRE_MODIFY or
POST_MODIFY addressing modes.

        lp .L__GCC__LP2
        .align 4
.L2:
        add r0,r1,r4
        ld r3,[r0]
        add r0,r6,r4
        add r4,r4,4
        add r2,r5,r3
        st r2,[r0]

.L__GCC__LP2: ; loop end
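Note that the loop body above contains no compare or branch: `lp` sets up a hardware loop that ends at .L__GCC__LP2. The C sketch below is a hypothetical reading of that body. It assumes the usual ARC zero-overhead-loop behaviour, where the trip count is held in the LP_COUNT register (its setup is not shown in the excerpt), and the register roles are guessed from the code. It also makes the remark about addressing modes concrete: without post-increment, both addresses are rebuilt every iteration from a shared byte offset (r4).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C reading of the ARC loop above.  Register roles are
   guessed from the code:  r1 = input base, r6 = output base,
   r5 = value added to each element, r4 = byte offset, and
   lp_count = trip count held by the hardware loop.  */
void arc_loop_model(const int *r1, int *r6, int r5, unsigned lp_count)
{
    assert(lp_count > 0);
    size_t r4 = 0;
    do {
        const int *src = (const int *) ((const char *) r1 + r4); /* add r0,r1,r4 */
        int r3 = *src;                                           /* ld r3,[r0]   */
        int *dst = (int *) ((char *) r6 + r4);                   /* add r0,r6,r4 */
        r4 += sizeof (int);                                      /* add r4,r4,4  */
        *dst = r5 + r3;                           /* add r2,r5,r3 ; st r2,[r0] */
    } while (--lp_count != 0);  /* done by the lp hardware at .L__GCC__LP2,
                                   with no branch instruction in the body */
}
```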

Bingfeng Mei | 16 Jul 17:17

RE: Question about doloop_end pattern

Thanks. I just checked the latest mainline version and couldn't find
any doloop-related patterns in config/arc/arc.md.  Will you merge your code
into mainline?  Did you also get modulo scheduling working properly?

Bingfeng

Ramana Radhakrishnan | 16 Jul 20:17

Re: Question about doloop_end pattern

Hi Bingfeng,

> Hello,
> I tried to use doloop_end pattern to reduce loop overhead for our target
> processor, which features a dedicated loop instruction.  Somehow even a
> simple loop just cannot pass the test of doloop_condition_get, which
> requires following canonical pattern.

I checked this on our private port of GCC, which is based on the 4.3
branch that we are working off right now.  We do use the doloop pattern
to generate these loops in our port, and I can confirm that in our case
we generate the following code.  Our tree does have a few other tweaks
that we maintain and would like to contribute once the copyright
assignments are in place.

Unroll:
       c2c     $c5,$c2
       i2cs    $c4,63
.L2:
       ldw     $c2,($c5)+=1
       add     $c2,$c1,$c2
       stw     ($c3)+=1,$c2
       brinzdec        $c4,.L2
       brz     $zero,$link
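The thread does not define `i2cs` or `brinzdec`. Reading the mnemonics as "load the immediate 63 into $c4" and "branch if not zero, then decrement" (an assumption), a counter initialised to 63 gives 64 passes through the body, whereas form 2 of doloop_condition_get (decrement first, then test) would need an initial value of 64. The hypothetical helpers below simply count passes under each reading.

```c
#include <assert.h>

/* Reading A (assumed for brinzdec): test first, then decrement.
     loop: body; if (c != 0) { c--; goto loop; }  */
unsigned trips_test_then_dec(unsigned c)
{
    unsigned trips = 0;
    for (;;) {
        trips++;             /* one pass through the loop body */
        if (c == 0)
            break;           /* counter exhausted: fall through */
        c--;                 /* decrement, then branch back */
    }
    return trips;
}

/* Reading B (doloop form 2): decrement first, then branch back
   while the counter is non-zero.  Requires c > 0.  */
unsigned trips_dec_then_test(unsigned c)
{
    assert(c > 0);
    unsigned trips = 0;
    do {
        trips++;             /* one pass through the loop body */
    } while (--c != 0);
    return trips;
}
```

Under reading A the initial value is the trip count minus one; under reading B it is the trip count itself. A backend's doloop expander has to account for whichever convention its hardware uses.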

You probably want to look at the mt backend for an example of how to
do it.  It is similar to how we do it in ours.

cheers
Ramana

Bingfeng Mei | 17 Jul 10:27

RE: Question about doloop_end pattern

Thanks. I was looking at bfin. MT's implementation looks similar but
simpler.  
