[v3,1/5] replaygain: correctly parse peak values

Message ID 20140403115828.GA14600@kronk
State New
Headers show

Commit Message

Alessandro Ghedini April 3, 2014, 11:58 a.m.
On gio, apr 03, 2014 at 01:49:23 +0200, Anton Khirnov wrote:
> 
> On Thu, 3 Apr 2014 13:13:18 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > On gio, apr 03, 2014 at 11:09:28 +0200, Alessandro Ghedini wrote:
> > > On Thu, Apr 03, 2014 at 09:18:20AM +0200, Anton Khirnov wrote:
> > > > 
> > > > On Mon, 31 Mar 2014 12:48:57 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > > > According to the ReplayGain spec, the peak amplitude may overflow and may result
> > > > > in peak amplitude values greater than 1.0 with psychoacoustically coded audio,
> > > > > such as MP3. Fully compliant decoders must allow peak overflows.
> > > > > 
> > > > > Additionally, having peak values in the 0<->UINT32_MAX scale makes it more
> > > > > difficult for applications to actually use the peak values (e.g. when
> > > > > implementing clipping prevention) since values have to be rescaled down.
> > > > > 
> > > > > This patch corrects the peak parsing by removing the rescaling of the decoded
> > > > > values between 0 and UINT32_MAX and the 1.0 upper limit.
> > > > > ---
> > > > > Merged parse_gain() and parse_peak() into parse_value(), lavu minor version
> > > > > bump, APIchanges update, better peak documentation in replaygain.h
> > > > > 
> > > > >  doc/APIchanges           |  4 ++++
> > > > >  libavformat/replaygain.c | 56 +++++++++++-------------------------------------
> > > > >  libavutil/replaygain.h   |  4 ++--
> > > > >  libavutil/version.h      |  2 +-
> > > > >  4 files changed, 20 insertions(+), 46 deletions(-)
> > > > > 
> > > > > diff --git a/doc/APIchanges b/doc/APIchanges
> > > > > index d800253..e999c7e 100644
> > > > > --- a/doc/APIchanges
> > > > > +++ b/doc/APIchanges
> > > > > @@ -13,6 +13,10 @@ libavutil:     2013-12-xx
> > > > >  
> > > > >  API changes, most recent first:
> > > > >  
> > > > > +2014-03-xx - xxxxxxx - lavu 53.09.0 - replaygain.h
> > > > > +  Full scale for peak values is now 100000 (instead of UINT32_MAX) and values
> > > > > +  may overflow.
> > > > > +
> > > > >  2014-02-xx - xxxxxxx - lavu 53.08.0 - frame.h
> > > > >    Add av_frame_remove_side_data() for removing a single side data
> > > > >    instance from a frame.
> > > > > diff --git a/libavformat/replaygain.c b/libavformat/replaygain.c
> > > > > index cf4dbf8..5e4bccd 100644
> > > > > --- a/libavformat/replaygain.c
> > > > > +++ b/libavformat/replaygain.c
> > > > > @@ -35,19 +35,19 @@
> > > > >  #include "avformat.h"
> > > > >  #include "replaygain.h"
> > > > >  
> > > > > -static int32_t parse_gain(const char *gain)
> > > > > +static int32_t parse_value(const char *value, int32_t min)
> > > > >  {
> > > > >      char *fraction;
> > > > >      int  scale = 10000;
> > > > >      int32_t mb = 0;
> > > > >      int db;
> > > > >  
> > > > > -    if (!gain)
> > > > > -        return INT32_MIN;
> > > > > +    if (!value)
> > > > > +        return min;
> > > > >  
> > > > > -    gain += strspn(gain, " \t");
> > > > > +    value += strspn(value, " \t");
> > > > >  
> > > > > -    db = strtol(gain, &fraction, 0);
> > > > > +    db = strtol(value, &fraction, 0);
> > > > >      if (*fraction++ == '.') {
> > > > >          while (av_isdigit(*fraction) && scale) {
> > > > >              mb += scale * (*fraction - '0');
> > > > > @@ -57,41 +57,11 @@ static int32_t parse_gain(const char *gain)
> > > > >      }
> > > > >  
> > > > >      if (abs(db) > (INT32_MAX - mb) / 100000)
> > > > > -        return INT32_MIN;
> > > > > +        return min;
> > > > >  
> > > > > -    return db * 100000 + FFSIGN(db) * mb;
> > > > > -}
> > > > > -
> > > > > -static uint32_t parse_peak(const uint8_t *peak)
> > > > > -{
> > > > > -    int64_t val = 0;
> > > > > -    int64_t scale = 1;
> > > > > -
> > > > > -    if (!peak)
> > > > > -        return 0;
> > > > > -
> > > > > -    peak += strspn(peak, " \t");
> > > > > -
> > > > > -    if (peak[0] == '1' && peak[1] == '.')
> > > > > -        return UINT32_MAX;
> > > > > -    else if (!(peak[0] == '0' && peak[1] == '.'))
> > > > > -        return 0;
> > > > > -
> > > > > -    peak += 2;
> > > > > -
> > > > > -    while (av_isdigit(*peak)) {
> > > > > -        int digit = *peak - '0';
> > > > > -
> > > > > -        if (scale > INT64_MAX / 10)
> > > > > -            break;
> > > > > -
> > > > > -        val    = 10 * val + digit;
> > > > > -        scale *= 10;
> > > > > -
> > > > > -        peak++;
> > > > > -    }
> > > > > -
> > > > > -    return av_rescale(val, UINT32_MAX, scale);
> > > > > +    return (min == 0) ?
> > > > > +            db * 100000 + mb :
> > > > > +            db * 100000 + FFSIGN(db) * mb;
> > > > 
> > > > Why the check here?
> > > > It will only make a difference if db is negative, in which case the result will
> > > > be borked anyway.
> > > 
> > > FFSIGN is defined as:
> > > 
> > >   #define FFSIGN(a) ((a) > 0 ? 1 : -1)
> > > 
> > > so if db == 0 the result is negative.
> > > 
> > > In fact, now that I think about this, the gain decoding is broken, since values
> > > that start with "0." are always going to be decoded as negative.
> > 
> > Not to mention that there's no such thing as -0, so strtol() always returns
> > non-negative 0 (which is then interpreted by FFSIGN() as negative).
> 
> Hmm, you're right. I wonder why FFSIGN behaves in this way, it looks
> counterintuitive.

Yeah, it caught me off-guard as well.

> > What about using FFSIGN(strtod(value, ...)) to get the sign? (Not sure why
> > you didn't use strtod() in the first place, instead of doing the decoding
> > manually though).
> 
> The point was to get exactly the same output on all architectures.
> That's also the reason the values are fixed point, instead of float.

I guess that the only alternative is to check whether the first character is
'-' (hoping that the strspn() removed enough elements).

Something like (applied on top of 1/5):


Which seems to work fine here.

Cheers

Comments

Anton Khirnov April 3, 2014, 12:18 p.m. | #1
On Thu, 3 Apr 2014 13:58:28 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> On gio, apr 03, 2014 at 01:49:23 +0200, Anton Khirnov wrote:
> > 
> > On Thu, 3 Apr 2014 13:13:18 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > On gio, apr 03, 2014 at 11:09:28 +0200, Alessandro Ghedini wrote:
> > > > On Thu, Apr 03, 2014 at 09:18:20AM +0200, Anton Khirnov wrote:
> > > > > 
> > > > > On Mon, 31 Mar 2014 12:48:57 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > > > > According to the ReplayGain spec, the peak amplitude may overflow and may result
> > > > > > in peak amplitude values greater than 1.0 with psychoacoustically coded audio,
> > > > > > such as MP3. Fully compliant decoders must allow peak overflows.
> > > > > > 
> > > > > > Additionally, having peak values in the 0<->UINT32_MAX scale makes it more
> > > > > > difficult for applications to actually use the peak values (e.g. when
> > > > > > implementing clipping prevention) since values have to be rescaled down.
> > > > > > 
> > > > > > This patch corrects the peak parsing by removing the rescaling of the decoded
> > > > > > values between 0 and UINT32_MAX and the 1.0 upper limit.
> > > > > > ---
> > > > > > Merged parse_gain() and parse_peak() into parse_value(), lavu minor version
> > > > > > bump, APIchanges update, better peak documentation in replaygain.h
> > > > > > 
> > > > > >  doc/APIchanges           |  4 ++++
> > > > > >  libavformat/replaygain.c | 56 +++++++++++-------------------------------------
> > > > > >  libavutil/replaygain.h   |  4 ++--
> > > > > >  libavutil/version.h      |  2 +-
> > > > > >  4 files changed, 20 insertions(+), 46 deletions(-)
> > > > > > 
> > > > > > diff --git a/doc/APIchanges b/doc/APIchanges
> > > > > > index d800253..e999c7e 100644
> > > > > > --- a/doc/APIchanges
> > > > > > +++ b/doc/APIchanges
> > > > > > @@ -13,6 +13,10 @@ libavutil:     2013-12-xx
> > > > > >  
> > > > > >  API changes, most recent first:
> > > > > >  
> > > > > > +2014-03-xx - xxxxxxx - lavu 53.09.0 - replaygain.h
> > > > > > +  Full scale for peak values is now 100000 (instead of UINT32_MAX) and values
> > > > > > +  may overflow.
> > > > > > +
> > > > > >  2014-02-xx - xxxxxxx - lavu 53.08.0 - frame.h
> > > > > >    Add av_frame_remove_side_data() for removing a single side data
> > > > > >    instance from a frame.
> > > > > > diff --git a/libavformat/replaygain.c b/libavformat/replaygain.c
> > > > > > index cf4dbf8..5e4bccd 100644
> > > > > > --- a/libavformat/replaygain.c
> > > > > > +++ b/libavformat/replaygain.c
> > > > > > @@ -35,19 +35,19 @@
> > > > > >  #include "avformat.h"
> > > > > >  #include "replaygain.h"
> > > > > >  
> > > > > > -static int32_t parse_gain(const char *gain)
> > > > > > +static int32_t parse_value(const char *value, int32_t min)
> > > > > >  {
> > > > > >      char *fraction;
> > > > > >      int  scale = 10000;
> > > > > >      int32_t mb = 0;
> > > > > >      int db;
> > > > > >  
> > > > > > -    if (!gain)
> > > > > > -        return INT32_MIN;
> > > > > > +    if (!value)
> > > > > > +        return min;
> > > > > >  
> > > > > > -    gain += strspn(gain, " \t");
> > > > > > +    value += strspn(value, " \t");
> > > > > >  
> > > > > > -    db = strtol(gain, &fraction, 0);
> > > > > > +    db = strtol(value, &fraction, 0);
> > > > > >      if (*fraction++ == '.') {
> > > > > >          while (av_isdigit(*fraction) && scale) {
> > > > > >              mb += scale * (*fraction - '0');
> > > > > > @@ -57,41 +57,11 @@ static int32_t parse_gain(const char *gain)
> > > > > >      }
> > > > > >  
> > > > > >      if (abs(db) > (INT32_MAX - mb) / 100000)
> > > > > > -        return INT32_MIN;
> > > > > > +        return min;
> > > > > >  
> > > > > > -    return db * 100000 + FFSIGN(db) * mb;
> > > > > > -}
> > > > > > -
> > > > > > -static uint32_t parse_peak(const uint8_t *peak)
> > > > > > -{
> > > > > > -    int64_t val = 0;
> > > > > > -    int64_t scale = 1;
> > > > > > -
> > > > > > -    if (!peak)
> > > > > > -        return 0;
> > > > > > -
> > > > > > -    peak += strspn(peak, " \t");
> > > > > > -
> > > > > > -    if (peak[0] == '1' && peak[1] == '.')
> > > > > > -        return UINT32_MAX;
> > > > > > -    else if (!(peak[0] == '0' && peak[1] == '.'))
> > > > > > -        return 0;
> > > > > > -
> > > > > > -    peak += 2;
> > > > > > -
> > > > > > -    while (av_isdigit(*peak)) {
> > > > > > -        int digit = *peak - '0';
> > > > > > -
> > > > > > -        if (scale > INT64_MAX / 10)
> > > > > > -            break;
> > > > > > -
> > > > > > -        val    = 10 * val + digit;
> > > > > > -        scale *= 10;
> > > > > > -
> > > > > > -        peak++;
> > > > > > -    }
> > > > > > -
> > > > > > -    return av_rescale(val, UINT32_MAX, scale);
> > > > > > +    return (min == 0) ?
> > > > > > +            db * 100000 + mb :
> > > > > > +            db * 100000 + FFSIGN(db) * mb;
> > > > > 
> > > > > Why the check here?
> > > > > It will only make a difference if db is negative, in which case the result will
> > > > > be borked anyway.
> > > > 
> > > > FFSIGN is defined as:
> > > > 
> > > >   #define FFSIGN(a) ((a) > 0 ? 1 : -1)
> > > > 
> > > > so if db == 0 the result is negative.
> > > > 
> > > > In fact, now that I think about this, the gain decoding is broken, since values
> > > > that start with "0." are always going to be decoded as negative.
> > > 
> > > Not to mention that there's no such thing as -0, so strtol() always returns
> > > non-negative 0 (which is then interpreted by FFSIGN() as negative).
> > 
> > Hmm, you're right. I wonder why FFSIGN behaves in this way, it looks
> > counterintuitive.
> 
> Yeah, it caught me off-guard as well.
> 
> > > What about using FFSIGN(strtod(value, ...)) to get the sign? (Not sure why
> > > you didn't use strtod() in the first place, instead of doing the decoding
> > > manually though).
> > 
> > The point was to get exactly the same output on all architectures.
> > That's also the reason the values are fixed point, instead of float.
> 
> I guess that the only alternative is to check whether the first character is
> '-' (hoping that the strspn() removed enough elements).

Well, you an simply check whether db < 0
Alessandro Ghedini April 3, 2014, 12:36 p.m. | #2
On gio, apr 03, 2014 at 02:18:30 +0200, Anton Khirnov wrote:
> 
> On Thu, 3 Apr 2014 13:58:28 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > On gio, apr 03, 2014 at 01:49:23 +0200, Anton Khirnov wrote:
> > > 
> > > On Thu, 3 Apr 2014 13:13:18 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > > On gio, apr 03, 2014 at 11:09:28 +0200, Alessandro Ghedini wrote:
> > > > > On Thu, Apr 03, 2014 at 09:18:20AM +0200, Anton Khirnov wrote:
> > > > > > 
> > > > > > On Mon, 31 Mar 2014 12:48:57 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > > > > > According to the ReplayGain spec, the peak amplitude may overflow and may result
> > > > > > > in peak amplitude values greater than 1.0 with psychoacoustically coded audio,
> > > > > > > such as MP3. Fully compliant decoders must allow peak overflows.
> > > > > > > 
> > > > > > > Additionally, having peak values in the 0<->UINT32_MAX scale makes it more
> > > > > > > difficult for applications to actually use the peak values (e.g. when
> > > > > > > implementing clipping prevention) since values have to be rescaled down.
> > > > > > > 
> > > > > > > This patch corrects the peak parsing by removing the rescaling of the decoded
> > > > > > > values between 0 and UINT32_MAX and the 1.0 upper limit.
> > > > > > > ---
> > > > > > > Merged parse_gain() and parse_peak() into parse_value(), lavu minor version
> > > > > > > bump, APIchanges update, better peak documentation in replaygain.h
> > > > > > > 
> > > > > > >  doc/APIchanges           |  4 ++++
> > > > > > >  libavformat/replaygain.c | 56 +++++++++++-------------------------------------
> > > > > > >  libavutil/replaygain.h   |  4 ++--
> > > > > > >  libavutil/version.h      |  2 +-
> > > > > > >  4 files changed, 20 insertions(+), 46 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/doc/APIchanges b/doc/APIchanges
> > > > > > > index d800253..e999c7e 100644
> > > > > > > --- a/doc/APIchanges
> > > > > > > +++ b/doc/APIchanges
> > > > > > > @@ -13,6 +13,10 @@ libavutil:     2013-12-xx
> > > > > > >  
> > > > > > >  API changes, most recent first:
> > > > > > >  
> > > > > > > +2014-03-xx - xxxxxxx - lavu 53.09.0 - replaygain.h
> > > > > > > +  Full scale for peak values is now 100000 (instead of UINT32_MAX) and values
> > > > > > > +  may overflow.
> > > > > > > +
> > > > > > >  2014-02-xx - xxxxxxx - lavu 53.08.0 - frame.h
> > > > > > >    Add av_frame_remove_side_data() for removing a single side data
> > > > > > >    instance from a frame.
> > > > > > > diff --git a/libavformat/replaygain.c b/libavformat/replaygain.c
> > > > > > > index cf4dbf8..5e4bccd 100644
> > > > > > > --- a/libavformat/replaygain.c
> > > > > > > +++ b/libavformat/replaygain.c
> > > > > > > @@ -35,19 +35,19 @@
> > > > > > >  #include "avformat.h"
> > > > > > >  #include "replaygain.h"
> > > > > > >  
> > > > > > > -static int32_t parse_gain(const char *gain)
> > > > > > > +static int32_t parse_value(const char *value, int32_t min)
> > > > > > >  {
> > > > > > >      char *fraction;
> > > > > > >      int  scale = 10000;
> > > > > > >      int32_t mb = 0;
> > > > > > >      int db;
> > > > > > >  
> > > > > > > -    if (!gain)
> > > > > > > -        return INT32_MIN;
> > > > > > > +    if (!value)
> > > > > > > +        return min;
> > > > > > >  
> > > > > > > -    gain += strspn(gain, " \t");
> > > > > > > +    value += strspn(value, " \t");
> > > > > > >  
> > > > > > > -    db = strtol(gain, &fraction, 0);
> > > > > > > +    db = strtol(value, &fraction, 0);
> > > > > > >      if (*fraction++ == '.') {
> > > > > > >          while (av_isdigit(*fraction) && scale) {
> > > > > > >              mb += scale * (*fraction - '0');
> > > > > > > @@ -57,41 +57,11 @@ static int32_t parse_gain(const char *gain)
> > > > > > >      }
> > > > > > >  
> > > > > > >      if (abs(db) > (INT32_MAX - mb) / 100000)
> > > > > > > -        return INT32_MIN;
> > > > > > > +        return min;
> > > > > > >  
> > > > > > > -    return db * 100000 + FFSIGN(db) * mb;
> > > > > > > -}
> > > > > > > -
> > > > > > > -static uint32_t parse_peak(const uint8_t *peak)
> > > > > > > -{
> > > > > > > -    int64_t val = 0;
> > > > > > > -    int64_t scale = 1;
> > > > > > > -
> > > > > > > -    if (!peak)
> > > > > > > -        return 0;
> > > > > > > -
> > > > > > > -    peak += strspn(peak, " \t");
> > > > > > > -
> > > > > > > -    if (peak[0] == '1' && peak[1] == '.')
> > > > > > > -        return UINT32_MAX;
> > > > > > > -    else if (!(peak[0] == '0' && peak[1] == '.'))
> > > > > > > -        return 0;
> > > > > > > -
> > > > > > > -    peak += 2;
> > > > > > > -
> > > > > > > -    while (av_isdigit(*peak)) {
> > > > > > > -        int digit = *peak - '0';
> > > > > > > -
> > > > > > > -        if (scale > INT64_MAX / 10)
> > > > > > > -            break;
> > > > > > > -
> > > > > > > -        val    = 10 * val + digit;
> > > > > > > -        scale *= 10;
> > > > > > > -
> > > > > > > -        peak++;
> > > > > > > -    }
> > > > > > > -
> > > > > > > -    return av_rescale(val, UINT32_MAX, scale);
> > > > > > > +    return (min == 0) ?
> > > > > > > +            db * 100000 + mb :
> > > > > > > +            db * 100000 + FFSIGN(db) * mb;
> > > > > > 
> > > > > > Why the check here?
> > > > > > It will only make a difference if db is negative, in which case the result will
> > > > > > be borked anyway.
> > > > > 
> > > > > FFSIGN is defined as:
> > > > > 
> > > > >   #define FFSIGN(a) ((a) > 0 ? 1 : -1)
> > > > > 
> > > > > so if db == 0 the result is negative.
> > > > > 
> > > > > In fact, now that I think about this, the gain decoding is broken, since values
> > > > > that start with "0." are always going to be decoded as negative.
> > > > 
> > > > Not to mention that there's no such thing as -0, so strtol() always returns
> > > > non-negative 0 (which is then interpreted by FFSIGN() as negative).
> > > 
> > > Hmm, you're right. I wonder why FFSIGN behaves in this way, it looks
> > > counterintuitive.
> > 
> > Yeah, it caught me off-guard as well.
> > 
> > > > What about using FFSIGN(strtod(value, ...)) to get the sign? (Not sure why
> > > > you didn't use strtod() in the first place, instead of doing the decoding
> > > > manually though).
> > > 
> > > The point was to get exactly the same output on all architectures.
> > > That's also the reason the values are fixed point, instead of float.
> > 
> > I guess that the only alternative is to check whether the first character is
> > '-' (hoping that the strspn() removed enough elements).
> 
> Well, you an simply check whether db < 0

What if the number is "-0.5"? db is 0, so the sign would be +, but the number is
actually negative (that's what I meant with "there's no such thing as -0").
Anton Khirnov April 3, 2014, 1:17 p.m. | #3
On Thu, 3 Apr 2014 14:36:37 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> On gio, apr 03, 2014 at 02:18:30 +0200, Anton Khirnov wrote:
> > 
> > On Thu, 3 Apr 2014 13:58:28 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > On gio, apr 03, 2014 at 01:49:23 +0200, Anton Khirnov wrote:
> > > > 
> > > > On Thu, 3 Apr 2014 13:13:18 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > > > On gio, apr 03, 2014 at 11:09:28 +0200, Alessandro Ghedini wrote:
> > > > > > On Thu, Apr 03, 2014 at 09:18:20AM +0200, Anton Khirnov wrote:
> > > > > > > 
> > > > > > > On Mon, 31 Mar 2014 12:48:57 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > > > > > > According to the ReplayGain spec, the peak amplitude may overflow and may result
> > > > > > > > in peak amplitude values greater than 1.0 with psychoacoustically coded audio,
> > > > > > > > such as MP3. Fully compliant decoders must allow peak overflows.
> > > > > > > > 
> > > > > > > > Additionally, having peak values in the 0<->UINT32_MAX scale makes it more
> > > > > > > > difficult for applications to actually use the peak values (e.g. when
> > > > > > > > implementing clipping prevention) since values have to be rescaled down.
> > > > > > > > 
> > > > > > > > This patch corrects the peak parsing by removing the rescaling of the decoded
> > > > > > > > values between 0 and UINT32_MAX and the 1.0 upper limit.
> > > > > > > > ---
> > > > > > > > Merged parse_gain() and parse_peak() into parse_value(), lavu minor version
> > > > > > > > bump, APIchanges update, better peak documentation in replaygain.h
> > > > > > > > 
> > > > > > > >  doc/APIchanges           |  4 ++++
> > > > > > > >  libavformat/replaygain.c | 56 +++++++++++-------------------------------------
> > > > > > > >  libavutil/replaygain.h   |  4 ++--
> > > > > > > >  libavutil/version.h      |  2 +-
> > > > > > > >  4 files changed, 20 insertions(+), 46 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/doc/APIchanges b/doc/APIchanges
> > > > > > > > index d800253..e999c7e 100644
> > > > > > > > --- a/doc/APIchanges
> > > > > > > > +++ b/doc/APIchanges
> > > > > > > > @@ -13,6 +13,10 @@ libavutil:     2013-12-xx
> > > > > > > >  
> > > > > > > >  API changes, most recent first:
> > > > > > > >  
> > > > > > > > +2014-03-xx - xxxxxxx - lavu 53.09.0 - replaygain.h
> > > > > > > > +  Full scale for peak values is now 100000 (instead of UINT32_MAX) and values
> > > > > > > > +  may overflow.
> > > > > > > > +
> > > > > > > >  2014-02-xx - xxxxxxx - lavu 53.08.0 - frame.h
> > > > > > > >    Add av_frame_remove_side_data() for removing a single side data
> > > > > > > >    instance from a frame.
> > > > > > > > diff --git a/libavformat/replaygain.c b/libavformat/replaygain.c
> > > > > > > > index cf4dbf8..5e4bccd 100644
> > > > > > > > --- a/libavformat/replaygain.c
> > > > > > > > +++ b/libavformat/replaygain.c
> > > > > > > > @@ -35,19 +35,19 @@
> > > > > > > >  #include "avformat.h"
> > > > > > > >  #include "replaygain.h"
> > > > > > > >  
> > > > > > > > -static int32_t parse_gain(const char *gain)
> > > > > > > > +static int32_t parse_value(const char *value, int32_t min)
> > > > > > > >  {
> > > > > > > >      char *fraction;
> > > > > > > >      int  scale = 10000;
> > > > > > > >      int32_t mb = 0;
> > > > > > > >      int db;
> > > > > > > >  
> > > > > > > > -    if (!gain)
> > > > > > > > -        return INT32_MIN;
> > > > > > > > +    if (!value)
> > > > > > > > +        return min;
> > > > > > > >  
> > > > > > > > -    gain += strspn(gain, " \t");
> > > > > > > > +    value += strspn(value, " \t");
> > > > > > > >  
> > > > > > > > -    db = strtol(gain, &fraction, 0);
> > > > > > > > +    db = strtol(value, &fraction, 0);
> > > > > > > >      if (*fraction++ == '.') {
> > > > > > > >          while (av_isdigit(*fraction) && scale) {
> > > > > > > >              mb += scale * (*fraction - '0');
> > > > > > > > @@ -57,41 +57,11 @@ static int32_t parse_gain(const char *gain)
> > > > > > > >      }
> > > > > > > >  
> > > > > > > >      if (abs(db) > (INT32_MAX - mb) / 100000)
> > > > > > > > -        return INT32_MIN;
> > > > > > > > +        return min;
> > > > > > > >  
> > > > > > > > -    return db * 100000 + FFSIGN(db) * mb;
> > > > > > > > -}
> > > > > > > > -
> > > > > > > > -static uint32_t parse_peak(const uint8_t *peak)
> > > > > > > > -{
> > > > > > > > -    int64_t val = 0;
> > > > > > > > -    int64_t scale = 1;
> > > > > > > > -
> > > > > > > > -    if (!peak)
> > > > > > > > -        return 0;
> > > > > > > > -
> > > > > > > > -    peak += strspn(peak, " \t");
> > > > > > > > -
> > > > > > > > -    if (peak[0] == '1' && peak[1] == '.')
> > > > > > > > -        return UINT32_MAX;
> > > > > > > > -    else if (!(peak[0] == '0' && peak[1] == '.'))
> > > > > > > > -        return 0;
> > > > > > > > -
> > > > > > > > -    peak += 2;
> > > > > > > > -
> > > > > > > > -    while (av_isdigit(*peak)) {
> > > > > > > > -        int digit = *peak - '0';
> > > > > > > > -
> > > > > > > > -        if (scale > INT64_MAX / 10)
> > > > > > > > -            break;
> > > > > > > > -
> > > > > > > > -        val    = 10 * val + digit;
> > > > > > > > -        scale *= 10;
> > > > > > > > -
> > > > > > > > -        peak++;
> > > > > > > > -    }
> > > > > > > > -
> > > > > > > > -    return av_rescale(val, UINT32_MAX, scale);
> > > > > > > > +    return (min == 0) ?
> > > > > > > > +            db * 100000 + mb :
> > > > > > > > +            db * 100000 + FFSIGN(db) * mb;
> > > > > > > 
> > > > > > > Why the check here?
> > > > > > > It will only make a difference if db is negative, in which case the result will
> > > > > > > be borked anyway.
> > > > > > 
> > > > > > FFSIGN is defined as:
> > > > > > 
> > > > > >   #define FFSIGN(a) ((a) > 0 ? 1 : -1)
> > > > > > 
> > > > > > so if db == 0 the result is negative.
> > > > > > 
> > > > > > In fact, now that I think about this, the gain decoding is broken, since values
> > > > > > that start with "0." are always going to be decoded as negative.
> > > > > 
> > > > > Not to mention that there's no such thing as -0, so strtol() always returns
> > > > > non-negative 0 (which is then interpreted by FFSIGN() as negative).
> > > > 
> > > > Hmm, you're right. I wonder why FFSIGN behaves in this way, it looks
> > > > counterintuitive.
> > > 
> > > Yeah, it caught me off-guard as well.
> > > 
> > > > > What about using FFSIGN(strtod(value, ...)) to get the sign? (Not sure why
> > > > > you didn't use strtod() in the first place, instead of doing the decoding
> > > > > manually though).
> > > > 
> > > > The point was to get exactly the same output on all architectures.
> > > > That's also the reason the values are fixed point, instead of float.
> > > 
> > > I guess that the only alternative is to check whether the first character is
> > > '-' (hoping that the strspn() removed enough elements).
> > 
> > Well, you an simply check whether db < 0
> 
> What if the number is "-0.5"? db is 0, so the sign would be +, but the number is
> actually negative (that's what I meant with "there's no such thing as -0").

Ah right, i'm dumb.
Your approach should be correct then
Alessandro Ghedini April 3, 2014, 4:16 p.m. | #4
On gio, apr 03, 2014 at 03:17:50 +0200, Anton Khirnov wrote:
> 
> On Thu, 3 Apr 2014 14:36:37 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > On gio, apr 03, 2014 at 02:18:30 +0200, Anton Khirnov wrote:
> > > 
> > > On Thu, 3 Apr 2014 13:58:28 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > > On gio, apr 03, 2014 at 01:49:23 +0200, Anton Khirnov wrote:
> > > > > 
> > > > > On Thu, 3 Apr 2014 13:13:18 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > > > > On gio, apr 03, 2014 at 11:09:28 +0200, Alessandro Ghedini wrote:
> > > > > > > On Thu, Apr 03, 2014 at 09:18:20AM +0200, Anton Khirnov wrote:
> > > > > > > > 
> > > > > > > > On Mon, 31 Mar 2014 12:48:57 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> > > > > > > > > According to the ReplayGain spec, the peak amplitude may overflow and may result
> > > > > > > > > in peak amplitude values greater than 1.0 with psychoacoustically coded audio,
> > > > > > > > > such as MP3. Fully compliant decoders must allow peak overflows.
> > > > > > > > > 
> > > > > > > > > Additionally, having peak values in the 0<->UINT32_MAX scale makes it more
> > > > > > > > > difficult for applications to actually use the peak values (e.g. when
> > > > > > > > > implementing clipping prevention) since values have to be rescaled down.
> > > > > > > > > 
> > > > > > > > > This patch corrects the peak parsing by removing the rescaling of the decoded
> > > > > > > > > values between 0 and UINT32_MAX and the 1.0 upper limit.
> > > > > > > > > ---
> > > > > > > > > Merged parse_gain() and parse_peak() into parse_value(), lavu minor version
> > > > > > > > > bump, APIchanges update, better peak documentation in replaygain.h
> > > > > > > > > 
> > > > > > > > >  doc/APIchanges           |  4 ++++
> > > > > > > > >  libavformat/replaygain.c | 56 +++++++++++-------------------------------------
> > > > > > > > >  libavutil/replaygain.h   |  4 ++--
> > > > > > > > >  libavutil/version.h      |  2 +-
> > > > > > > > >  4 files changed, 20 insertions(+), 46 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/doc/APIchanges b/doc/APIchanges
> > > > > > > > > index d800253..e999c7e 100644
> > > > > > > > > --- a/doc/APIchanges
> > > > > > > > > +++ b/doc/APIchanges
> > > > > > > > > @@ -13,6 +13,10 @@ libavutil:     2013-12-xx
> > > > > > > > >  
> > > > > > > > >  API changes, most recent first:
> > > > > > > > >  
> > > > > > > > > +2014-03-xx - xxxxxxx - lavu 53.09.0 - replaygain.h
> > > > > > > > > +  Full scale for peak values is now 100000 (instead of UINT32_MAX) and values
> > > > > > > > > +  may overflow.
> > > > > > > > > +
> > > > > > > > >  2014-02-xx - xxxxxxx - lavu 53.08.0 - frame.h
> > > > > > > > >    Add av_frame_remove_side_data() for removing a single side data
> > > > > > > > >    instance from a frame.
> > > > > > > > > diff --git a/libavformat/replaygain.c b/libavformat/replaygain.c
> > > > > > > > > index cf4dbf8..5e4bccd 100644
> > > > > > > > > --- a/libavformat/replaygain.c
> > > > > > > > > +++ b/libavformat/replaygain.c
> > > > > > > > > @@ -35,19 +35,19 @@
> > > > > > > > >  #include "avformat.h"
> > > > > > > > >  #include "replaygain.h"
> > > > > > > > >  
> > > > > > > > > -static int32_t parse_gain(const char *gain)
> > > > > > > > > +static int32_t parse_value(const char *value, int32_t min)
> > > > > > > > >  {
> > > > > > > > >      char *fraction;
> > > > > > > > >      int  scale = 10000;
> > > > > > > > >      int32_t mb = 0;
> > > > > > > > >      int db;
> > > > > > > > >  
> > > > > > > > > -    if (!gain)
> > > > > > > > > -        return INT32_MIN;
> > > > > > > > > +    if (!value)
> > > > > > > > > +        return min;
> > > > > > > > >  
> > > > > > > > > -    gain += strspn(gain, " \t");
> > > > > > > > > +    value += strspn(value, " \t");
> > > > > > > > >  
> > > > > > > > > -    db = strtol(gain, &fraction, 0);
> > > > > > > > > +    db = strtol(value, &fraction, 0);
> > > > > > > > >      if (*fraction++ == '.') {
> > > > > > > > >          while (av_isdigit(*fraction) && scale) {
> > > > > > > > >              mb += scale * (*fraction - '0');
> > > > > > > > > @@ -57,41 +57,11 @@ static int32_t parse_gain(const char *gain)
> > > > > > > > >      }
> > > > > > > > >  
> > > > > > > > >      if (abs(db) > (INT32_MAX - mb) / 100000)
> > > > > > > > > -        return INT32_MIN;
> > > > > > > > > +        return min;
> > > > > > > > >  
> > > > > > > > > -    return db * 100000 + FFSIGN(db) * mb;
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > > -static uint32_t parse_peak(const uint8_t *peak)
> > > > > > > > > -{
> > > > > > > > > -    int64_t val = 0;
> > > > > > > > > -    int64_t scale = 1;
> > > > > > > > > -
> > > > > > > > > -    if (!peak)
> > > > > > > > > -        return 0;
> > > > > > > > > -
> > > > > > > > > -    peak += strspn(peak, " \t");
> > > > > > > > > -
> > > > > > > > > -    if (peak[0] == '1' && peak[1] == '.')
> > > > > > > > > -        return UINT32_MAX;
> > > > > > > > > -    else if (!(peak[0] == '0' && peak[1] == '.'))
> > > > > > > > > -        return 0;
> > > > > > > > > -
> > > > > > > > > -    peak += 2;
> > > > > > > > > -
> > > > > > > > > -    while (av_isdigit(*peak)) {
> > > > > > > > > -        int digit = *peak - '0';
> > > > > > > > > -
> > > > > > > > > -        if (scale > INT64_MAX / 10)
> > > > > > > > > -            break;
> > > > > > > > > -
> > > > > > > > > -        val    = 10 * val + digit;
> > > > > > > > > -        scale *= 10;
> > > > > > > > > -
> > > > > > > > > -        peak++;
> > > > > > > > > -    }
> > > > > > > > > -
> > > > > > > > > -    return av_rescale(val, UINT32_MAX, scale);
> > > > > > > > > +    return (min == 0) ?
> > > > > > > > > +            db * 100000 + mb :
> > > > > > > > > +            db * 100000 + FFSIGN(db) * mb;
> > > > > > > > 
> > > > > > > > Why the check here?
> > > > > > > > It will only make a difference if db is negative, in which case the result will
> > > > > > > > be borked anyway.
> > > > > > > 
> > > > > > > FFSIGN is defined as:
> > > > > > > 
> > > > > > >   #define FFSIGN(a) ((a) > 0 ? 1 : -1)
> > > > > > > 
> > > > > > > so if db == 0 the result is negative.
> > > > > > > 
> > > > > > > In fact, now that I think about this, the gain decoding is broken, since values
> > > > > > > that start with "0." are always going to be decoded as negative.
> > > > > > 
> > > > > > Not to mention that there's no such thing as -0, so strtol() always returns
> > > > > > non-negative 0 (which is then interpreted by FFSIGN() as negative).
> > > > > 
> > > > > Hmm, you're right. I wonder why FFSIGN behaves in this way, it looks
> > > > > counterintuitive.
> > > > 
> > > > Yeah, it caught me off-guard as well.
> > > > 
> > > > > > What about using FFSIGN(strtod(value, ...)) to get the sign? (Not sure why
> > > > > > you didn't use strtod() in the first place, instead of doing the decoding
> > > > > > manually though).
> > > > > 
> > > > > The point was to get exactly the same output on all architectures.
> > > > > That's also the reason the values are fixed point, instead of float.
> > > > 
> > > > I guess that the only alternative is to check whether the first character is
> > > > '-' (hoping that the strspn() removed enough elements).
> > > 
> > > Well, you an simply check whether db < 0
> > 
> > What if the number is "-0.5"? db is 0, so the sign would be +, but the number is
> > actually negative (that's what I meant with "there's no such thing as -0").
> 
> Ah right, i'm dumb.
> Your approach should be correct then

Should I include this in the 1/5 patch, or make a new one?
Anton Khirnov April 4, 2014, 5:37 a.m. | #5
On Thu, 3 Apr 2014 18:16:20 +0200, Alessandro Ghedini <alessandro@ghedini.me> wrote:
> On gio, apr 03, 2014 at 03:17:50 +0200, Anton Khirnov wrote:
> 
> Should I include this in the 1/5 patch, or make a new one?
> 

I'd say a separate patch to be applied before the first one in this set.

Patch

diff --git a/libavformat/replaygain.c b/libavformat/replaygain.c
index 5e4bccd..5d80d2b 100644
--- a/libavformat/replaygain.c
+++ b/libavformat/replaygain.c
@@ -41,12 +41,16 @@  static int32_t parse_value(const char *value, int32_t min)
     int  scale = 10000;
     int32_t mb = 0;
     int db;
+    int sign = 1;
 
     if (!value)
         return min;
 
     value += strspn(value, " \t");
 
+    if (*value == '-')
+        sign = -1;
+
     db = strtol(value, &fraction, 0);
     if (*fraction++ == '.') {
         while (av_isdigit(*fraction) && scale) {
@@ -59,9 +63,7 @@  static int32_t parse_value(const char *value, int32_t min)
     if (abs(db) > (INT32_MAX - mb) / 100000)
         return min;
 
-    return (min == 0) ?
-            db * 100000 + mb :
-            db * 100000 + FFSIGN(db) * mb;
+    return db * 100000 + sign * mb;
 }
 
 static int replaygain_export(AVStream *st,