movenc: factor initial_padding into edit lists

Message ID 20170220170555.28253-1-stebbins@jetheaddev.com
State New
Headers show

Commit Message

John Stebbins Feb. 20, 2017, 5:05 p.m.
initial_padding was getting added to the edit list indirectly due to
initial negative dts.  But in cases where the audio is delayed,
all or part of initial_padding would be unaccounted for.  This patch
makes initial_padding explicit.
---
 libavformat/movenc.c | 53 +++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 38 insertions(+), 15 deletions(-)

Comments

Martin Storsjö Feb. 20, 2017, 10:27 p.m. | #1
On Mon, 20 Feb 2017, John Stebbins wrote:

> initial_padding was getting added to the edit list indirectly due to
> initial negative dts.  But in cases where the audio is delayed,
> all or part of initial_padding would be unaccounted for.  This patch
> makes initial_padding explicit.
> ---
> libavformat/movenc.c | 53 +++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 38 insertions(+), 15 deletions(-)
>
> diff --git a/libavformat/movenc.c b/libavformat/movenc.c
> index 689291d..b2c0c92 100644
> --- a/libavformat/movenc.c
> +++ b/libavformat/movenc.c
> @@ -1698,9 +1698,28 @@ static int mov_write_edts_tag(AVIOContext *pb, MOVMuxContext *mov,
>                                       track->timescale, AV_ROUND_UP);
>     int version = duration < INT32_MAX ? 0 : 1;
>     int entry_size, entry_count, size;
> -    int64_t delay, start_ct = track->start_cts;
> -    delay = av_rescale_rnd(track->start_dts + start_ct, MOV_TIMESCALE,
> +    int64_t delay;
> +    int64_t mediatime;
> +    int64_t skip = 0;
> +
> +    delay = av_rescale_rnd(track->start_dts + track->start_cts, MOV_TIMESCALE,
>                            track->timescale, AV_ROUND_DOWN);
> +
> +    if (track->par->codec_type == AVMEDIA_TYPE_AUDIO &&
> +        track->par->initial_padding > 0) {
> +        /* Adjust delay so that initial_padding gets recorded in the
> +         * MediaTime of an edit list entry even in the case that
> +         * delay is positive. I.e. we don't want initial_padding to be
> +         * absorbed and hidden in the delay. MediaTime must contain
> +         * initial_padding in order to know where the actual media
> +         * timeline begins. A player should drop samples until MediaTime
> +         * is reached */
> +        delay += av_rescale_rnd(track->par->initial_padding, MOV_TIMESCALE,
> +                                track->par->sample_rate, AV_ROUND_DOWN);

> +        skip = av_rescale_rnd(track->par->initial_padding,
> +                              track->timescale,
> +                              track->par->sample_rate, AV_ROUND_DOWN);
> +    }
>     version |= delay < INT32_MAX ? 0 : 1;
>
>     entry_size = (version == 1) ? 20 : 12;
> @@ -1731,33 +1750,37 @@ static int mov_write_edts_tag(AVIOContext *pb, MOVMuxContext *mov,
>         }
>         avio_wb32(pb, 0x00010000);
>     } else {
> -        /* Avoid accidentally ending up with start_ct = -1 which has got a
> -         * special meaning. Normally start_ct should end up positive or zero
> -         * here, but use FFMIN in case dts is a a small positive integer
> -         * rounded to 0 when represented in MOV_TIMESCALE units. */
> -        start_ct  = -FFMIN(track->start_dts, 0);
> -        /* Note, this delay is calculated from the pts of the first sample,
> -         * ensuring that we don't reduce the duration for cases with
> -         * dts<0 pts=0. */
> -        duration += delay;
> +        /* Avoid accidentally ending up with mediatime = -1 which has got a
> +         * special meaning. skip and -track->start_dts are guaranteed to be
> +         * positive here, so it is not possible mediatime to be -1 */
> +        skip = FFMAX(skip, -track->start_dts - track->start_cts);

I don't see how it's guaranteed that those numbers are 
positive/nonnegative. Example:

track->timebase = 44100
track->start_dts = 10

This should end up with delay = 0, which hits this branch (delay > 0 
doesn't trigger), ending up with -track->start_dts as a negative value.

With the FFMAX here though, it does seem to guarantee the same, so it 
probably is fine technically, but the comment has me confused here.

With the logic changed, I think this comment actually adds more confusion 
than what it helps. It could perhaps instead explain what this achieves 
instead, i.e. if start_pts < 0, we should also skip some data, just like 
for initial padding, and skipping according to whichever of them is 
larger.


Overall I like where you're going with this. I see that this passes 
fate-movenc, so it doesn't change the actual output, at last for the cases 
tested there.

I'd appreciate if you could add a few cases in that test 
(libavformat/tests/movenc.c) to validate all the different corner cases 
that this code touches. This test writes synthetic test data in different 
setups and hashes the results. By running it as ./libavformat/tests/movenc 
-w, it will write the actual output files to disk. Since it's synthetic 
data (i.e. broken/fake packet payloads), it's mostly interesting to look 
at the produced edit lists with boxdumper or so.

E.g. first test something like video with b-frames starting from both 
pts=0 and pts > 0 (it does test the former but not the latter iirc). If 
that test is added before doing this change, we can be sure that we don't 
break/change those parts that aren't intended to be changed.

For audio, this test currently doesn't set the initial_padding field at 
all. The test could be extended to test audio both with and without that 
set

And later we can add cases for audio with initial padding starting from 
pts<0 but where the padding makes it start with a delay > 0, etc.


This function is really hard to grasp fully in general. I played with the 
thought of trying to achieve the same as you're doing, but by refactoring 
through a few smaller steps, to keep track of it better. On the other 
hand, I'm not sure if the main change can be done much smaller than what 
you do here - the more I think about it, the closer I end up with exactly 
what you have here.


After reading all of the code a few more times while typing this review 
comment, I'm pretty convinced that your patch is good. But then my main 
request would be to extend the movenc unit test file, to have more 
extensive tests of the cases that shouldn't be changed, and new tests for 
cases with/without initial_padding set.

// Martin
John Stebbins Feb. 21, 2017, 2:15 a.m. | #2
On 02/20/2017 03:27 PM, Martin Storsjö wrote:
> On Mon, 20 Feb 2017, John Stebbins wrote:
>
>> initial_padding was getting added to the edit list indirectly due to
>> initial negative dts.  But in cases where the audio is delayed,
>> all or part of initial_padding would be unaccounted for.  This patch
>> makes initial_padding explicit.
>> ---
>> libavformat/movenc.c | 53
>> +++++++++++++++++++++++++++++++++++++---------------
>> 1 file changed, 38 insertions(+), 15 deletions(-)
>>
>> diff --git a/libavformat/movenc.c b/libavformat/movenc.c
>> index 689291d..b2c0c92 100644
>> --- a/libavformat/movenc.c
>> +++ b/libavformat/movenc.c
>> @@ -1698,9 +1698,28 @@ static int mov_write_edts_tag(AVIOContext *pb,
>> MOVMuxContext *mov,
>>                                       track->timescale, AV_ROUND_UP);
>>     int version = duration < INT32_MAX ? 0 : 1;
>>     int entry_size, entry_count, size;
>> -    int64_t delay, start_ct = track->start_cts;
>> -    delay = av_rescale_rnd(track->start_dts + start_ct, MOV_TIMESCALE,
>> +    int64_t delay;
>> +    int64_t mediatime;
>> +    int64_t skip = 0;
>> +
>> +    delay = av_rescale_rnd(track->start_dts + track->start_cts,
>> MOV_TIMESCALE,
>>                            track->timescale, AV_ROUND_DOWN);
>> +
>> +    if (track->par->codec_type == AVMEDIA_TYPE_AUDIO &&
>> +        track->par->initial_padding > 0) {
>> +        /* Adjust delay so that initial_padding gets recorded in the
>> +         * MediaTime of an edit list entry even in the case that
>> +         * delay is positive. I.e. we don't want initial_padding to be
>> +         * absorbed and hidden in the delay. MediaTime must contain
>> +         * initial_padding in order to know where the actual media
>> +         * timeline begins. A player should drop samples until
>> MediaTime
>> +         * is reached */
>> +        delay += av_rescale_rnd(track->par->initial_padding,
>> MOV_TIMESCALE,
>> +                                track->par->sample_rate,
>> AV_ROUND_DOWN);
>
>> +        skip = av_rescale_rnd(track->par->initial_padding,
>> +                              track->timescale,
>> +                              track->par->sample_rate, AV_ROUND_DOWN);
>> +    }
>>     version |= delay < INT32_MAX ? 0 : 1;
>>
>>     entry_size = (version == 1) ? 20 : 12;
>> @@ -1731,33 +1750,37 @@ static int mov_write_edts_tag(AVIOContext
>> *pb, MOVMuxContext *mov,
>>         }
>>         avio_wb32(pb, 0x00010000);
>>     } else {
>> -        /* Avoid accidentally ending up with start_ct = -1 which has
>> got a
>> -         * special meaning. Normally start_ct should end up positive
>> or zero
>> -         * here, but use FFMIN in case dts is a a small positive
>> integer
>> -         * rounded to 0 when represented in MOV_TIMESCALE units. */
>> -        start_ct  = -FFMIN(track->start_dts, 0);
>> -        /* Note, this delay is calculated from the pts of the first
>> sample,
>> -         * ensuring that we don't reduce the duration for cases with
>> -         * dts<0 pts=0. */
>> -        duration += delay;
>> +        /* Avoid accidentally ending up with mediatime = -1 which
>> has got a
>> +         * special meaning. skip and -track->start_dts are
>> guaranteed to be
>> +         * positive here, so it is not possible mediatime to be -1 */
>> +        skip = FFMAX(skip, -track->start_dts - track->start_cts);
>
> I don't see how it's guaranteed that those numbers are
> positive/nonnegative. Example:
>
> track->timebase = 44100
> track->start_dts = 10
>

You are correct, the comment is misleading.  I'll come up with something
better.

> This should end up with delay = 0, which hits this branch (delay > 0
> doesn't trigger), ending up with -track->start_dts as a negative value.
>
> With the FFMAX here though, it does seem to guarantee the same, so it
> probably is fine technically, but the comment has me confused here.
>
> With the logic changed, I think this comment actually adds more
> confusion than what it helps. It could perhaps instead explain what
> this achieves instead, i.e. if start_pts < 0, we should also skip some
> data, just like for initial padding, and skipping according to
> whichever of them is larger.
>
>
> Overall I like where you're going with this. I see that this passes
> fate-movenc, so it doesn't change the actual output, at last for the
> cases tested there.
>
> I'd appreciate if you could add a few cases in that test
> (libavformat/tests/movenc.c) to validate all the different corner
> cases that this code touches. This test writes synthetic test data in
> different setups and hashes the results. By running it as
> ./libavformat/tests/movenc -w, it will write the actual output files
> to disk. Since it's synthetic data (i.e. broken/fake packet payloads),
> it's mostly interesting to look at the produced edit lists with
> boxdumper or so.

I've been tripped up by a few corner cases while working on this.  I
found yet another today that I'll need to modify this patch for one more
time (round-off error causing a 1ms shift in delay).  I'll have a look
at the existing tests and see what can be added to improve coverage.  It
would help if someone could resolve my problem with uploading to fate
samples.  I mentioned this on IRC and I'm not the only one that has had
a problem with this.  The wiki says that if you have commit access you
should automatically have fate upload privileges.  But that doesn't seem
to be the case.

>
> E.g. first test something like video with b-frames starting from both
> pts=0 and pts > 0 (it does test the former but not the latter iirc).
> If that test is added before doing this change, we can be sure that we
> don't break/change those parts that aren't intended to be changed.
>
> For audio, this test currently doesn't set the initial_padding field
> at all. The test could be extended to test audio both with and without
> that set
>
> And later we can add cases for audio with initial padding starting
> from pts<0 but where the padding makes it start with a delay > 0, etc.
>
>
> This function is really hard to grasp fully in general. I played with
> the thought of trying to achieve the same as you're doing, but by
> refactoring through a few smaller steps, to keep track of it better.
> On the other hand, I'm not sure if the main change can be done much
> smaller than what you do here - the more I think about it, the closer
> I end up with exactly what you have here.
>
>
> After reading all of the code a few more times while typing this
> review comment, I'm pretty convinced that your patch is good. But then
> my main request would be to extend the movenc unit test file, to have
> more extensive tests of the cases that shouldn't be changed, and new
> tests for cases with/without initial_padding set.
>

FYI, my ultimate goal with this is to be able to do multi-generation
reencodes with the result being that first pts and duration of all
tracks remains constant through all encodes.  This requires another
patch to mov.c to recover initial_padding from the edit list MediaTime. 
I have something that is almost ready, though I expect there to be some
discussion about my methods since there's really no definitive way to
know if the edit is due to encoder delay.  But you'll see this tomorrow
I expect.

> // Martin
> _______________________________________________
> libav-devel mailing list
> libav-devel@libav.org
> https://lists.libav.org/mailman/listinfo/libav-devel
Martin Storsjö Feb. 21, 2017, 9:57 p.m. | #3
On Mon, 20 Feb 2017, John Stebbins wrote:

> On 02/20/2017 03:27 PM, Martin Storsjö wrote:
>> I'd appreciate if you could add a few cases in that test
>> (libavformat/tests/movenc.c) to validate all the different corner
>> cases that this code touches. This test writes synthetic test data in
>> different setups and hashes the results. By running it as
>> ./libavformat/tests/movenc -w, it will write the actual output files
>> to disk. Since it's synthetic data (i.e. broken/fake packet payloads),
>> it's mostly interesting to look at the produced edit lists with
>> boxdumper or so.
>
> I've been tripped up by a few corner cases while working on this.  I
> found yet another today that I'll need to modify this patch for one more
> time (round-off error causing a 1ms shift in delay).  I'll have a look
> at the existing tests and see what can be added to improve coverage.  It
> would help if someone could resolve my problem with uploading to fate
> samples.  I mentioned this on IRC and I'm not the only one that has had
> a problem with this.  The wiki says that if you have commit access you
> should automatically have fate upload privileges.  But that doesn't seem
> to be the case.

FWIW, the movenc unit test doesn't need/use any external samples or 
reference files, but you should have access to the fate samples in any 
case.

// Martin

Patch

diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 689291d..b2c0c92 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -1698,9 +1698,28 @@  static int mov_write_edts_tag(AVIOContext *pb, MOVMuxContext *mov,
                                       track->timescale, AV_ROUND_UP);
     int version = duration < INT32_MAX ? 0 : 1;
     int entry_size, entry_count, size;
-    int64_t delay, start_ct = track->start_cts;
-    delay = av_rescale_rnd(track->start_dts + start_ct, MOV_TIMESCALE,
+    int64_t delay;
+    int64_t mediatime;
+    int64_t skip = 0;
+
+    delay = av_rescale_rnd(track->start_dts + track->start_cts, MOV_TIMESCALE,
                            track->timescale, AV_ROUND_DOWN);
+
+    if (track->par->codec_type == AVMEDIA_TYPE_AUDIO &&
+        track->par->initial_padding > 0) {
+        /* Adjust delay so that initial_padding gets recorded in the
+         * MediaTime of an edit list entry even in the case that
+         * delay is positive. I.e. we don't want initial_padding to be
+         * absorbed and hidden in the delay. MediaTime must contain
+         * initial_padding in order to know where the actual media
+         * timeline begins. A player should drop samples until MediaTime
+         * is reached */
+        delay += av_rescale_rnd(track->par->initial_padding, MOV_TIMESCALE,
+                                track->par->sample_rate, AV_ROUND_DOWN);
+        skip = av_rescale_rnd(track->par->initial_padding,
+                              track->timescale,
+                              track->par->sample_rate, AV_ROUND_DOWN);
+    }
     version |= delay < INT32_MAX ? 0 : 1;
 
     entry_size = (version == 1) ? 20 : 12;
@@ -1731,33 +1750,37 @@  static int mov_write_edts_tag(AVIOContext *pb, MOVMuxContext *mov,
         }
         avio_wb32(pb, 0x00010000);
     } else {
-        /* Avoid accidentally ending up with start_ct = -1 which has got a
-         * special meaning. Normally start_ct should end up positive or zero
-         * here, but use FFMIN in case dts is a a small positive integer
-         * rounded to 0 when represented in MOV_TIMESCALE units. */
-        start_ct  = -FFMIN(track->start_dts, 0);
-        /* Note, this delay is calculated from the pts of the first sample,
-         * ensuring that we don't reduce the duration for cases with
-         * dts<0 pts=0. */
-        duration += delay;
+        /* Avoid accidentally ending up with mediatime = -1 which has got a
+         * special meaning. skip and -track->start_dts are guaranteed to be
+         * positive here, so it is not possible mediatime to be -1 */
+        skip = FFMAX(skip, -track->start_dts - track->start_cts);
     }
+    mediatime = skip + track->start_cts;
+
+    /* skip is the duration of the media segment that will be dropped
+     * during playback when an edit entry is applied.  The edit entry
+     * duration must be reduced by this amount. */
+    duration -= av_rescale_rnd(skip, MOV_TIMESCALE,
+                               track->timescale, AV_ROUND_UP);
 
     /* For fragmented files, we don't know the full length yet. Setting
      * duration to 0 allows us to only specify the offset, including
      * the rest of the content (from all future fragments) without specifying
      * an explicit duration. */
-    if (mov->flags & FF_MOV_FLAG_FRAGMENT)
+    if (mov->flags & FF_MOV_FLAG_FRAGMENT || duration < 0)
         duration = 0;
 
-    /* duration */
+    /* add edit entry that defines the presentation time of the first
+     * sample to render during playback and the duration of the segment */
     if (version == 1) {
         avio_wb64(pb, duration);
-        avio_wb64(pb, start_ct);
+        avio_wb64(pb, mediatime);
     } else {
         avio_wb32(pb, duration);
-        avio_wb32(pb, start_ct);
+        avio_wb32(pb, mediatime);
     }
     avio_wb32(pb, 0x00010000);
+
     return size;
 }