g722: Change bits per sample to 4

Message ID 1322781844-31711-1-git-send-email-martin@martin.st
State Superseded
Headers show

Commit Message

Martin Storsjö Dec. 1, 2011, 11:24 p.m.
Earlier, bits per sample was defined as 8, which wasn't chosen
with any thought, only chosen since it didn't seem to change
anything compared to not defining it at all.

g722 encodes 2 samples into one byte codeword, therefore the
bits per sample is 4. By changing this, the generated timestamps
for streams encoded with g722 become correct.

Due to the way avcodec_encode_audio is defined for non frame
based codecs, calculating the number of input samples in the
encode function has to be changed once the bits per samples
definition is changed.
---
 libavcodec/g722enc.c |    4 ++--
 libavcodec/utils.c   |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Justin Ruggles Dec. 2, 2011, 1:54 a.m. | #1
On 12/01/2011 06:24 PM, Martin Storsjö wrote:

> Earlier, bits per sample was defined as 8, which wasn't chosen
> with any thought, only chosen since it didn't seem to change
> anything compared to not defining it at all.
> 
> g722 encodes 2 samples into one byte codeword, therefore the
> bits per sample is 4. By changing this, the generated timestamps
> for streams encoded with g722 become correct.

While technically true, this causes problems with WAV. In WAV files, the
bits-per-sample field in the header is supposed to be 8 (or 7 or 6) not
4. So we need an override for g722 in the WAV header writing function in
lavf to write the correct value.

See Bug #54

> Due to the way avcodec_encode_audio is defined for non frame
> based codecs, calculating the number of input samples in the
> encode function has to be changed once the bits per samples
> definition is changed.


It isn't defined that way for non-frame-based codecs, it's defined that
way for PCM. The g722 encoder is just wrong. All other ADPCM encoders
set frame_size and use it accordingly.

-Justin
Martin Storsjö Dec. 2, 2011, 9:32 a.m. | #2
On Thu, 1 Dec 2011, Justin Ruggles wrote:

> On 12/01/2011 06:24 PM, Martin Storsjö wrote:
>
>> Earlier, bits per sample was defined as 8, which wasn't chosen
>> with any thought, only chosen since it didn't seem to change
>> anything compared to not defining it at all.
>>
>> g722 encodes 2 samples into one byte codeword, therefore the
>> bits per sample is 4. By changing this, the generated timestamps
>> for streams encoded with g722 become correct.
>
> While technically true, this causes problems with WAV. In WAV files, the
> bits-per-sample field in the header is supposed to be 8 (or 7 or 6) not
> 4. So we need an override for g722 in the WAV header writing function in
> lavf to write the correct value.

Yeah, I came to think of this detail right after posting the patch - I 
guess I chose 8 since it was the default for the "number of bits per 
codeword/sample pair", which the decoder uses - my patch also makes the 
raw g722 decoder set bits_per_coded_sample to 4, which gives warnings in 
the decoder, which I realized after posting it.

As for WAV files, I don't have any samples, it'd be good to know what the 
6/7 bits per codeword samples look like, if the words are packed tightly, 
or the extra bits just are to be ignored. (Currently the decoder doesn't 
assume tightly packed codewords.)

> See Bug #54

Oh, didn't know about this issue.

>> Due to the way avcodec_encode_audio is defined for non frame
>> based codecs, calculating the number of input samples in the
>> encode function has to be changed once the bits per samples
>> definition is changed.
>
>
> It isn't defined that way for non-frame-based codecs, it's defined that
> way for PCM. The g722 encoder is just wrong. All other ADPCM encoders
> set frame_size and use it accordingly.

Hmm, so are there any other non-PCM encoders that don't set a frame size?
avcodec.h says this behaviour is for PCM codecs only, while avcodec.c does 
it for any codec that has a frame_size <= 1.

My issue (that pts isn't generated properly) is fixed by setting any 
frame_size in the encoder init function (and changing the encode_frame 
function accordingly), but does that make sense for a codec that can use 
any frame size? (For this codec, the user could set any other frame size 
he wants after initializing the encoder.) And if we'd want to change this 
to use a default frame size, what would be a suitable default?

// Martin
Martin Storsjö Dec. 2, 2011, 9:46 a.m. | #3
On Fri, 2 Dec 2011, Martin Storsjö wrote:

> On Thu, 1 Dec 2011, Justin Ruggles wrote:
>
>> On 12/01/2011 06:24 PM, Martin Storsjö wrote:
>> 
>>> Earlier, bits per sample was defined as 8, which wasn't chosen
>>> with any thought, only chosen since it didn't seem to change
>>> anything compared to not defining it at all.
>>> 
>>> g722 encodes 2 samples into one byte codeword, therefore the
>>> bits per sample is 4. By changing this, the generated timestamps
>>> for streams encoded with g722 become correct.
>> 
>> While technically true, this causes problems with WAV. In WAV files, the
>> bits-per-sample field in the header is supposed to be 8 (or 7 or 6) not
>> 4. So we need an override for g722 in the WAV header writing function in
>> lavf to write the correct value.
>
> Yeah, I came to think of this detail right after posting the patch - I guess 
> I chose 8 since it was the default for the "number of bits per 
> codeword/sample pair", which the decoder uses - my patch also makes the raw 
> g722 decoder set bits_per_coded_sample to 4, which gives warnings in the 
> decoder, which I realized after posting it.

FWIW, changing it to 4 makes the raw g722 demuxer generate proper 
timestamps for input data, too. But I'm not sure any such data really 
exists in the wild other than for testing the codec, so it's not an 
important usecase. The raw g722 demuxer's requirement of a non-zero 
bits_per_coded_sample was what triggered the addition of the definition in 
the first place (in 029f966c3aa73531a90cb14ca95057f2fb3f0a26).

// Martin
Justin Ruggles Dec. 2, 2011, 5:14 p.m. | #4
On 12/02/2011 04:32 AM, Martin Storsjö wrote:

> On Thu, 1 Dec 2011, Justin Ruggles wrote:
> 
>> On 12/01/2011 06:24 PM, Martin Storsjö wrote:
>>
>>> Earlier, bits per sample was defined as 8, which wasn't chosen
>>> with any thought, only chosen since it didn't seem to change
>>> anything compared to not defining it at all.
>>>
>>> g722 encodes 2 samples into one byte codeword, therefore the
>>> bits per sample is 4. By changing this, the generated timestamps
>>> for streams encoded with g722 become correct.
>>
>> While technically true, this causes problems with WAV. In WAV files, the
>> bits-per-sample field in the header is supposed to be 8 (or 7 or 6) not
>> 4. So we need an override for g722 in the WAV header writing function in
>> lavf to write the correct value.
> 
> Yeah, I came to think of this detail right after posting the patch - I 
> guess I chose 8 since it was the default for the "number of bits per 
> codeword/sample pair", which the decoder uses - my patch also makes the 
> raw g722 decoder set bits_per_coded_sample to 4, which gives warnings in 
> the decoder, which I realized after posting it.
> 
> As for WAV files, I don't have any samples, it'd be good to know what the 
> 6/7 bits per codeword samples look like, if the words are packed tightly, 
> or the extra bits just are to be ignored. (Currently the decoder doesn't 
> assume tightly packed codewords.)

Well, the only sample I have been able to find is
http://samples.libav.org/A-codecs/g722/msn-messenger-g722.wav

And it actually has bits-per-sample as 16. :/ I also could not find a
G.722 codec for Windows to see how WMP handles it. I've never seen a 6/7
bits sample either, other than the raw ITU-T testsuite.

So maybe until we actually come across such samples in a container to
see how it's handled, we could just assume 4 bits per sample and provide
a decoder option for code_size.

>> See Bug #54
> 
> Oh, didn't know about this issue.
> 
>>> Due to the way avcodec_encode_audio is defined for non frame
>>> based codecs, calculating the number of input samples in the
>>> encode function has to be changed once the bits per samples
>>> definition is changed.
>>
>>
>> It isn't defined that way for non-frame-based codecs, it's defined that
>> way for PCM. The g722 encoder is just wrong. All other ADPCM encoders
>> set frame_size and use it accordingly.
> 
> Hmm, so are there any other non-PCM encoders that don't set a frame size?
> avcodec.h says this behaviour is for PCM codecs only, while avcodec.c does 
> it for any codec that has a frame_size <= 1.

AFAIK, PCM and G.722 are the only encoders which do not set a frame_size
> 1. But I haven't checked them all yet.

> My issue (that pts isn't generated properly) is fixed by setting any 
> frame_size in the encoder init function (and changing the encode_frame 
> function accordingly), but does that make sense for a codec that can use 
> any frame size? (For this codec, the user could set any other frame size 
> he wants after initializing the encoder.) And if we'd want to change this 
> to use a default frame size, what would be a suitable default?


If the user changes frame_size after init with almost any encoder (with
the exception of the last frame for CODEC_CAP_SMALL_LAST_FRAME) it will
generally break horribly. I'm actually debating about how to handle PCM
and other such codecs in the new encoding API that I'm working on. One
option would be a CODEC_CAP that indicates that the encoder will take
any number of samples from the user (via AVFrame.nb_samples) and encode
them. Another option would be to have them behave like all the others
and set some arbitrary frame_size.

-Justin
Martin Storsjö Dec. 2, 2011, 10:06 p.m. | #5
On Fri, 2 Dec 2011, Justin Ruggles wrote:

> On 12/02/2011 04:32 AM, Martin Storsjö wrote:
>
>> On Thu, 1 Dec 2011, Justin Ruggles wrote:
>>
>>> On 12/01/2011 06:24 PM, Martin Storsjö wrote:
>>>
>>>> Earlier, bits per sample was defined as 8, which wasn't chosen
>>>> with any thought, only chosen since it didn't seem to change
>>>> anything compared to not defining it at all.
>>>>
>>>> g722 encodes 2 samples into one byte codeword, therefore the
>>>> bits per sample is 4. By changing this, the generated timestamps
>>>> for streams encoded with g722 become correct.
>>>
>>> While technically true, this causes problems with WAV. In WAV files, the
>>> bits-per-sample field in the header is supposed to be 8 (or 7 or 6) not
>>> 4. So we need an override for g722 in the WAV header writing function in
>>> lavf to write the correct value.
>>
>> Yeah, I came to think of this detail right after posting the patch - I
>> guess I chose 8 since it was the default for the "number of bits per
>> codeword/sample pair", which the decoder uses - my patch also makes the
>> raw g722 decoder set bits_per_coded_sample to 4, which gives warnings in
>> the decoder, which I realized after posting it.
>>
>> As for WAV files, I don't have any samples, it'd be good to know what the
>> 6/7 bits per codeword samples look like, if the words are packed tightly,
>> or the extra bits just are to be ignored. (Currently the decoder doesn't
>> assume tightly packed codewords.)
>
> Well, the only sample I have been able to find is
> http://samples.libav.org/A-codecs/g722/msn-messenger-g722.wav
>
> And it actually has bits-per-sample as 16. :/ I also could not find a
> G.722 codec for Windows to see how WMP handles it. I've never seen a 6/7
> bits sample either, other than the raw ITU-T testsuite.
>
> So maybe until we actually come across such samples in a container to
> see how it's handled, we could just assume 4 bits per sample and provide
> a decoder option for code_size.

Hmm, that might make sense.

One one hand, the bits per codeword is something that should be passed 
from demuxer to decoder in the cases where the container format actually 
knows that (which we don't really have any good cases of anyway), and for 
that, a codec private option doesn't really work.

But for now, an option might be good enough until we have real samples 
that need the more detailed handling, to decouple this from the bits per 
sample definition.

>>> See Bug #54
>>
>> Oh, didn't know about this issue.
>>
>>>> Due to the way avcodec_encode_audio is defined for non frame
>>>> based codecs, calculating the number of input samples in the
>>>> encode function has to be changed once the bits per samples
>>>> definition is changed.
>>>
>>>
>>> It isn't defined that way for non-frame-based codecs, it's defined that
>>> way for PCM. The g722 encoder is just wrong. All other ADPCM encoders
>>> set frame_size and use it accordingly.
>>
>> Hmm, so are there any other non-PCM encoders that don't set a frame size?
>> avcodec.h says this behaviour is for PCM codecs only, while avcodec.c does
>> it for any codec that has a frame_size <= 1.
>
> AFAIK, PCM and G.722 are the only encoders which do not set a frame_size
>> 1. But I haven't checked them all yet.
>
>> My issue (that pts isn't generated properly) is fixed by setting any
>> frame_size in the encoder init function (and changing the encode_frame
>> function accordingly), but does that make sense for a codec that can use
>> any frame size? (For this codec, the user could set any other frame size
>> he wants after initializing the encoder.) And if we'd want to change this
>> to use a default frame size, what would be a suitable default?
>
>
> If the user changes frame_size after init with almost any encoder (with
> the exception of the last frame for CODEC_CAP_SMALL_LAST_FRAME) it will
> generally break horribly. I'm actually debating about how to handle PCM
> and other such codecs in the new encoding API that I'm working on. One
> option would be a CODEC_CAP that indicates that the encoder will take
> any number of samples from the user (via AVFrame.nb_samples) and encode
> them. Another option would be to have them behave like all the others
> and set some arbitrary frame_size.

Ok - a more straightforward handling of these things in the new API really 
is a welcome addition :-)

// Martin

Patch

diff --git a/libavcodec/g722enc.c b/libavcodec/g722enc.c
index bdc30d5..706a7ae 100644
--- a/libavcodec/g722enc.c
+++ b/libavcodec/g722enc.c
@@ -280,12 +280,12 @@  static int g722_encode_frame(AVCodecContext *avctx,
 {
     G722Context *c = avctx->priv_data;
     const int16_t *samples = data;
-    int i;
+    int i, nsamples = buf_size * 8 / av_get_bits_per_sample(avctx->codec->id);
 
     if (avctx->trellis)
         return g722_encode_trellis(avctx, dst, buf_size, data);
 
-    for (i = 0; i < buf_size >> 1; i++) {
+    for (i = 0; i < nsamples >> 1; i++) {
         int xlow, xhigh, ihigh, ilow;
         filter_samples(c, &samples[2*i], &xlow, &xhigh);
         ihigh = encode_high(&c->band[1], xhigh);
diff --git a/libavcodec/utils.c b/libavcodec/utils.c
index 53440e0..59588d7 100644
--- a/libavcodec/utils.c
+++ b/libavcodec/utils.c
@@ -1151,8 +1151,8 @@  int av_get_bits_per_sample(enum CodecID codec_id){
     case CODEC_ID_ADPCM_SWF:
     case CODEC_ID_ADPCM_MS:
     case CODEC_ID_ADPCM_YAMAHA:
-        return 4;
     case CODEC_ID_ADPCM_G722:
+        return 4;
     case CODEC_ID_PCM_ALAW:
     case CODEC_ID_PCM_MULAW:
     case CODEC_ID_PCM_S8: