[2/4] aarch64: Add a missing sign extension in ff_h264_idct8_add_neon

Message ID 1475227092-15478-2-git-send-email-martin@martin.st
State Committed
Headers show

Commit Message

Martin Storsjö Sept. 30, 2016, 9:18 a.m.
---
 libavcodec/aarch64/h264idct_neon.S | 1 +
 1 file changed, 1 insertion(+)

Comments

Diego Biurrun Oct. 5, 2016, 5:26 p.m. | #1
On Fri, Sep 30, 2016 at 12:18:10PM +0300, Martin Storsjö wrote:
> ---
>  libavcodec/aarch64/h264idct_neon.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libavcodec/aarch64/h264idct_neon.S b/libavcodec/aarch64/h264idct_neon.S
> index 5395e14..6354c03 100644
> --- a/libavcodec/aarch64/h264idct_neon.S
> +++ b/libavcodec/aarch64/h264idct_neon.S
> @@ -264,6 +264,7 @@ endfunc
>  
>  function ff_h264_idct8_add_neon, export=1
>          movi            v19.8H,   #0
> +        sxtw            x2,  w2
>          ld1             {v24.8H, v25.8H}, [x1]
>          st1             {v19.8H},  [x1],   #16
>          st1             {v19.8H},  [x1],   #16

Now I wonder why

ff_h264_idct_add16_neon
ff_h264_idct_add16intra_neon
ff_h264_idct8_add_neon
ff_h264_idct8_add4_neon

do not (seem to) do any sign extension..

Diego
Martin Storsjö Oct. 5, 2016, 7:09 p.m. | #2
On Wed, 5 Oct 2016, Diego Biurrun wrote:

> On Fri, Sep 30, 2016 at 12:18:10PM +0300, Martin Storsjö wrote:
>> ---
>>  libavcodec/aarch64/h264idct_neon.S | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/libavcodec/aarch64/h264idct_neon.S b/libavcodec/aarch64/h264idct_neon.S
>> index 5395e14..6354c03 100644
>> --- a/libavcodec/aarch64/h264idct_neon.S
>> +++ b/libavcodec/aarch64/h264idct_neon.S
>> @@ -264,6 +264,7 @@ endfunc
>>
>>  function ff_h264_idct8_add_neon, export=1
>>          movi            v19.8H,   #0
>> +        sxtw            x2,  w2
>>          ld1             {v24.8H, v25.8H}, [x1]
>>          st1             {v19.8H},  [x1],   #16
>>          st1             {v19.8H},  [x1],   #16
>
> Now I wonder why
>
> ff_h264_idct_add16_neon

This one doesn't use it by itself, but it calls ff_h264_idct_add_neon or 
ff_h264_idct_dc_add_neon, and passes the stride parameter to them. Do note 
that if you change the stride parameter to ptrdiff_t at a later stage, you 
need to change the registers w3/w9/w2 into x3/x9/x2 (otherwise, if done 
for the called functions but not for this one, you'll break negative 
values).

> ff_h264_idct_add16intra_neon

Ditto

> ff_h264_idct8_add_neon

This one is the bug I'm fixing with this very patch

> ff_h264_idct8_add4_neon

Same as the other ones above, this calls other functions

FWIW, most of these aren't hooked up to checkasm yet either, but the ones 
you pointed out seem fine (or fixed by this patch).

// Martin
Diego Biurrun Oct. 5, 2016, 7:25 p.m. | #3
On Wed, Oct 05, 2016 at 10:09:58PM +0300, Martin Storsjö wrote:
> On Wed, 5 Oct 2016, Diego Biurrun wrote:
> > On Fri, Sep 30, 2016 at 12:18:10PM +0300, Martin Storsjö wrote:
> >> --- a/libavcodec/aarch64/h264idct_neon.S
> >> +++ b/libavcodec/aarch64/h264idct_neon.S
> >> @@ -264,6 +264,7 @@ endfunc
> >>
> >>  function ff_h264_idct8_add_neon, export=1
> >>          movi            v19.8H,   #0
> >> +        sxtw            x2,  w2
> >>          ld1             {v24.8H, v25.8H}, [x1]
> >>          st1             {v19.8H},  [x1],   #16
> >>          st1             {v19.8H},  [x1],   #16
> >
> > Now I wonder why
> >
> > ff_h264_idct_add16_neon
> 
> This one doesn't use it by itself, but it calls ff_h264_idct_add_neon or 
> ff_h264_idct_dc_add_neon, and passes the stride parameter to them. Do note 
> that if you change the stride parameter to ptrdiff_t at a later stage, you 
> need to change the registers w3/w9/w2 into x3/x9/x2 (otherwise, if done 
> for the called functions but not for this one, you'll break negative 
> values).
> 
> > ff_h264_idct_add16intra_neon
> 
> Ditto
> 
> > ff_h264_idct8_add_neon
> 
> This one is the bug I'm fixing with this very patch
> 
> > ff_h264_idct8_add4_neon
> 
> Same as the other ones above, this calls other functions
> 
> FWIW, most of these aren't hooked up to checkasm yet either, but the ones 
> you pointed out seem fine (or fixed by this patch).

probably OK then

Diego

Patch

diff --git a/libavcodec/aarch64/h264idct_neon.S b/libavcodec/aarch64/h264idct_neon.S
index 5395e14..6354c03 100644
--- a/libavcodec/aarch64/h264idct_neon.S
+++ b/libavcodec/aarch64/h264idct_neon.S
@@ -264,6 +264,7 @@  endfunc
 
 function ff_h264_idct8_add_neon, export=1
         movi            v19.8H,   #0
+        sxtw            x2,  w2
         ld1             {v24.8H, v25.8H}, [x1]
         st1             {v19.8H},  [x1],   #16
         st1             {v19.8H},  [x1],   #16