[05/14] movenc: Add an option for delaying writing the moov with empty_moov

Message ID 1419862828-30060-5-git-send-email-martin@martin.st
State Committed

Commit Message

Martin Storsjö Dec. 29, 2014, 2:20 p.m.
This delays writing the moov until the first fragment is written,
or can be flushed by the caller explicitly when wanted. If the first
sample in all streams is available at this point, we can write
a proper editlist at this point, allowing streams to start at
something else than dts=0. For AC3 and DNXHD, a packet is
needed in order to write the moov header properly.

This isn't added to the normal behaviour for empty_moov, since
the behaviour that ftyp+moov is written during avformat_write_header
would be changed. Callers that split the output stream into header+segments
(either by flushing manually, with the custom_frag flag set, or by
just differentiating between data written during avformat_write_header
and the rest) will need to be adjusted to take this option into use.

For handling streams that start at something else than dts=0, an
alternative would be to use different kinds of heuristics for
guessing the start dts (using AVCodecContext delay or has_b_frames
together with the frame rate), but this is not reliable and doesn't
necessarily work well with stream copy, and wouldn't work for getting
the right initialization data for AC3 or DNXHD either.
---
 libavformat/movenc.c | 71 ++++++++++++++++++++++++++++++++++++++++++----------
 libavformat/movenc.h |  1 +
 2 files changed, 59 insertions(+), 13 deletions(-)

Comments

Derek Buitenhuis Dec. 29, 2014, 5:11 p.m. | #1
On 12/29/2014 2:20 PM, Martin Storsjö wrote:
> This delays writing the moov until the first fragment is written,
> or can be flushed by the caller explicitly when wanted. If the first
> sample in all streams is available at this point, we can write
> a proper editlist at this point, allowing streams to start at
> something else than dts=0. For AC3 and DNXHD, a packet is
> needed in order to write the moov header properly.


Aside/unrelated: Why don't AC3 and DNXHD use the global header flag
with extradata like other codecs?

> This isn't added to the normal behaviour for empty_moov, since
> the behaviour that ftyp+moov is written during avformat_write_header
> would be changed. Callers that split the output stream into header+segments
> (either by flushing manually, with the custom_frag flag set, or by
> just differentiating between data written during avformat_write_header
> and the rest) will need to be adjusted to take this option into use.

Perhaps I am having trouble understanding here. Does this mean callers
already doing this (me) need to update their code? I'm not totally clear
on this. It looks like it should require a minor version bump.

> For handling streams that start at something else than dts=0, an
> alternative would be to use different kinds of heuristics for
> guessing the start dts (using AVCodecContext delay or has_b_frames
> together with the frame rate), but this is not reliable and doesn't
> necessarily work well with stream copy, and wouldn't work for getting
> the right initialization data for AC3 or DNXHD either.

[...]

> +    { "delay_moov", "Delay writing the initial moov (until the first fragment is cut, or until the first fragment flush)", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DELAY_MOOV}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },

No need for paren really.

Code seems to be OK and straightforward, I believe.

- Derek
Martin Storsjö Dec. 29, 2014, 5:22 p.m. | #2
On Mon, 29 Dec 2014, Derek Buitenhuis wrote:

> On 12/29/2014 2:20 PM, Martin Storsjö wrote:
>> This delays writing the moov until the first fragment is written,
>> or can be flushed by the caller explicitly when wanted. If the first
>> sample in all streams is available at this point, we can write
>> a proper editlist at this point, allowing streams to start at
>> something else than dts=0. For AC3 and DNXHD, a packet is
>> needed in order to write the moov header properly.
>
>
> Aside/unrelated: Why don't AC3 and DNXHD use the global header flag
> with extradata like other codecs?

Probably because it's not extradata per se (e.g. a normal stream of 
packets can be decoded just fine without this extra piece of data), just 
that the headers in a moov atom contain fields that you can't deduce 
without having a sample available.

>> This isn't added to the normal behaviour for empty_moov, since
>> the behaviour that ftyp+moov is written during avformat_write_header
>> would be changed. Callers that split the output stream into header+segments
>> (either by flushing manually, with the custom_frag flag set, or by
>> just differentiating between data written during avformat_write_header
>> and the rest) will need to be adjusted to take this option into use.
>
> Perhaps I am having trouble understanding here. Does this mean callers
> already doing this (me) need to update their code? I'm not totally clear
> on this.

No, callers using the mov muxer for fragmenting/segmenting won't need to 
change their code. If you want to take the extra/better/fancier behaviour 
into use, you need to add the new flag - without it, your code will behave 
as it did before.

I.e., ideally one might want this to be the default behaviour (since it 
allows doing a lot of things you couldn't do before) but I'm arguing it 
can't be made default since it'd break existing setups (and the behaviour 
is slightly less obvious), thus, an extra flag.
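
The opt-in behaviour described above can be sketched as a small, self-contained model of the flag handling. This is only an illustration under stated assumptions: the `SKETCH_*` names and both helper functions are hypothetical, and only the bit position of delay_moov (1 << 13) comes from the patch; the value used for empty_moov is assumed. It models only whether the initial moov goes out during the header call, following the `mov_write_header` hunks in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the muxer flags. Only the delay_moov bit
 * position (1 << 13) is taken from the patch; the other value is assumed. */
#define SKETCH_FLAG_EMPTY_MOOV (1u << 2)
#define SKETCH_FLAG_DELAY_MOOV (1u << 13)

/* Mirrors the normalization added to mov_write_header in the patch:
 * setting delay_moov implies empty_moov. */
static uint32_t sketch_normalize_flags(uint32_t flags)
{
    if (flags & SKETCH_FLAG_DELAY_MOOV)
        flags |= SKETCH_FLAG_EMPTY_MOOV;
    /* Post-condition: delay_moov never appears without empty_moov. */
    assert(!(flags & SKETCH_FLAG_DELAY_MOOV) ||
           (flags & SKETCH_FLAG_EMPTY_MOOV));
    return flags;
}

/* Returns 1 if the initial moov is written during the header call,
 * 0 if it is left for later (trailer or first fragment flush). */
static int sketch_header_writes_moov(uint32_t flags)
{
    flags = sketch_normalize_flags(flags);
    return (flags & SKETCH_FLAG_EMPTY_MOOV) &&
           !(flags & SKETCH_FLAG_DELAY_MOOV);
}
```

With plain empty_moov the helper reports that the moov is written during the header call, as before; once delay_moov is added it reports that the moov is deferred, which is why a caller that splits the output into header and segments has to opt in explicitly.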

> It looks like it should require a minor version bump.

Good point, will add that.

>> For handling streams that start at something else than dts=0, an
>> alternative would be to use different kinds of heuristics for
>> guessing the start dts (using AVCodecContext delay or has_b_frames
>> together with the frame rate), but this is not reliable and doesn't
>> necessarily work well with stream copy, and wouldn't work for getting
>> the right initialization data for AC3 or DNXHD either.
>
> [...]
>
>> +    { "delay_moov", "Delay writing the initial moov (until the first fragment is cut, or until the first fragment flush)", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DELAY_MOOV}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
>
> No need for paren really.

Ok, will change.

// Martin
Derek Buitenhuis Dec. 29, 2014, 5:30 p.m. | #3
On 12/29/2014 5:22 PM, Martin Storsjö wrote:
>> Aside/unrelated: Why don't AC3 and DNXHD use the global header flag
>> with extradata like other codecs?
> 
> Probably because it's not extradata per se (e.g. a normal stream of 
> packets can be decoded just fine without this extra piece of data), just 
> that the headers in a moov atom contain fields that you can't deduce 
> without having a sample available.

Well that sucks. I'm surprised we don't leave this up to the caller to
fill in (via find_stream_info or something), but that's probably treading
into Lovecraftian territory.

>> Perhaps I am having trouble understanding here. Does this mean callers
>> already doing this (me) need to update their code? I'm not totally clear
>> on this.
> 
> No, callers using the mov muxer for fragmenting/segmenting won't need to 
> change their code. If you want to take the extra/better/fancier behaviour 
> into use, you need to add the new flag - without it, your code will behave 
> as it did before.

OK.

> I.e., ideally one might want this to be the default behaviour (since it 
> allows doing a lot of things you couldn't do before) but I'm arguing it 
> can't be made default since it'd break existing setups (and the behaviour 
> is slightly less obvious), thus, an extra flag.

Is there any place to document it? It isn't exactly possible to have super
verbose avoption descriptions, and many of the movflags seem to be quite
confusing (their interactions and stuff) when writing downstream code.

- Derek
Martin Storsjö Dec. 29, 2014, 5:38 p.m. | #4
On Mon, 29 Dec 2014, Derek Buitenhuis wrote:

> On 12/29/2014 5:22 PM, Martin Storsjö wrote:
>>> Aside/unrelated: Why don't AC3 and DNXHD use the global header flag
>>> with extradata like other codecs?
>>
>> Probably because it's not extradata per se (e.g. a normal stream of
>> packets can be decoded just fine without this extra piece of data), just
>> that the headers in a moov atom contain fields that you can't deduce
>> without having a sample available.
>
> Well that sucks. I'm surprised we don't leave this up to the caller to
> fill in (via find_stream_info or something), but that's probably treading
> into Lovecraftian territory.
>
>>> Perhaps I am having trouble understanding here. Does this mean callers
>>> already doing this (me) need to update their code? I'm not totally clear
>>> on this.
>>
>> No, callers using the mov muxer for fragmenting/segmenting won't need to
>> change their code. If you want to take the extra/better/fancier behaviour
>> into use, you need to add the new flag - without it, your code will behave
>> as it did before.
>
> OK.
>
>> I.e., ideally one might want this to be the default behaviour (since it
>> allows doing a lot of things you couldn't do before) but I'm arguing it
>> can't be made default since it'd break existing setups (and the behaviour
>> is slightly less obvious), thus, an extra flag.
>
> Is there any place to document it? It isn't exactly possible to have super
> verbose avoption descriptions, and many of the movflags seem to be quite
> confusing (their interactions and stuff) when writing downstream code.

Good question. Much of the muxer option documentation in the normal 
documentation is aimed at the end-user of e.g. avconv, while many of these 
options only make sense to set via code at different points during the 
lifetime of the muxer. I guess some of it could be squeezed into the same 
place but with notices that it doesn't make sense to use from normal code.

FWIW, if you're writing a fragmented mp4 file using avconv, you can use 
this flag without any problems; the output stream looks just as it did 
before (except better, with fields filled in better), only that some parts 
of it are written later than they used to be.

// Martin
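
The ordering described in this reply can be illustrated with a toy model of the flush sequence. This is a rough sketch, not the muxer itself: `SketchMux` and every `sketch_*` function are hypothetical, only the empty_moov/delay_moov fragmented paths are modelled, and the box names are recorded as strings instead of being written to an AVIOContext. The double flush in `sketch_auto_flush` follows the `mov_auto_flush_fragment` helper from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy muxer state; not the real MOVMuxContext. */
typedef struct SketchMux {
    int delay_moov;   /* 1 if the delay_moov flag is set */
    int fragments;    /* counts the initial moov as one, like the patch */
    int buffered;     /* 1 if sample data is waiting to be flushed */
    char log[128];    /* space-separated names of the boxes written */
} SketchMux;

/* Append box names to the log, separated by a single space. */
static void sketch_emit(SketchMux *m, const char *boxes)
{
    size_t used = strlen(m->log);
    assert(used + strlen(boxes) + 2 <= sizeof(m->log));
    if (used)
        strcat(m->log, " ");
    strcat(m->log, boxes);
}

/* Header call: without delay_moov, ftyp+moov go out immediately. */
static void sketch_write_header(SketchMux *m)
{
    memset(m->log, 0, sizeof(m->log));
    m->fragments = 0;
    m->buffered  = 0;
    if (!m->delay_moov) {
        sketch_emit(m, "ftyp moov");
        m->fragments++;
    }
}

/* Packets are only buffered until the next flush. */
static void sketch_write_packet(SketchMux *m)
{
    m->buffered = 1;
}

/* One flush: if no moov has been written yet (only possible with
 * delay_moov in this model), write ftyp+moov and nothing else. */
static void sketch_flush(SketchMux *m)
{
    if (m->fragments == 0) {
        sketch_emit(m, "ftyp moov");
        m->fragments++;
        return;
    }
    if (!m->buffered)
        return;
    sketch_emit(m, "moof mdat");
    m->buffered = 0;
    m->fragments++;
}

/* Like the patch's mov_auto_flush_fragment: flush once more if the
 * first flush only produced the delayed moov. */
static void sketch_auto_flush(SketchMux *m)
{
    sketch_flush(m);
    if (m->fragments == 1 && m->delay_moov)
        sketch_flush(m);
}
```

Running header, one packet, and one automatic flush through this model gives the same box sequence with and without delay_moov; the only difference is that with delay_moov nothing has been logged yet when the header call returns.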

Patch

diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 22ba3fd..fe95b84 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -61,6 +61,7 @@  static const AVOption options[] = {
     { "default_base_moof", "Set the default-base-is-moof flag in tfhd atoms", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DEFAULT_BASE_MOOF}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
     { "dash", "Write DASH compatible fragmented MP4", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DASH}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
     { "frag_discont", "Signal that the next fragment is discontinuous from earlier ones", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_FRAG_DISCONT}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
+    { "delay_moov", "Delay writing the initial moov (until the first fragment is cut, or until the first fragment flush)", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DELAY_MOOV}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
     FF_RTP_FLAG_OPTS(MOVMuxContext, rtp_flags),
     { "skip_iods", "Skip writing iods atom.", offsetof(MOVMuxContext, iods_skip), AV_OPT_TYPE_INT, {.i64 = 0}, 0, 1, AV_OPT_FLAG_ENCODING_PARAM},
     { "iods_audio_profile", "iods audio profile atom.", offsetof(MOVMuxContext, iods_audio_profile), AV_OPT_TYPE_INT, {.i64 = -1}, -1, 255, AV_OPT_FLAG_ENCODING_PARAM},
@@ -1260,7 +1261,7 @@  static int mov_write_stbl_tag(AVIOContext *pb, MOVTrack *track)
     if (track->mode == MODE_MOV && track->flags & MOV_TRACK_STPS)
         mov_write_stss_tag(pb, track, MOV_PARTIAL_SYNC_SAMPLE);
     if (track->enc->codec_type == AVMEDIA_TYPE_VIDEO &&
-        track->flags & MOV_TRACK_CTTS)
+        track->flags & MOV_TRACK_CTTS && track->entry)
         mov_write_ctts_tag(pb, track);
     mov_write_stsc_tag(pb, track);
     mov_write_stsz_tag(pb, track);
@@ -1771,6 +1772,12 @@  static int mov_write_trak_tag(AVIOContext *pb, MOVMuxContext *mov,
                               MOVTrack *track, AVStream *st)
 {
     int64_t pos = avio_tell(pb);
+    int entry_backup = track->entry;
+    /* If we want to have an empty moov, but some samples already have been
+     * buffered (delay_moov), pretend that no samples have been written yet. */
+    if (mov->flags & FF_MOV_FLAG_EMPTY_MOOV)
+        track->entry = 0;
+
     avio_wb32(pb, 0); /* size */
     ffio_wfourcc(pb, "trak");
     mov_write_tkhd_tag(pb, mov, track, st);
@@ -1802,6 +1809,7 @@  static int mov_write_trak_tag(AVIOContext *pb, MOVMuxContext *mov,
         }
     }
     mov_write_track_udta_tag(pb, mov, st);
+    track->entry = entry_backup;
     return update_size(pb, pos);
 }
 
@@ -1876,6 +1884,12 @@  static int mov_write_mvhd_tag(AVIOContext *pb, MOVMuxContext *mov)
                 max_track_id = mov->tracks[i].track_id;
         }
     }
+    /* If using delay_moov, make sure the output is the same as if no
+     * samples had been written yet. */
+    if (mov->flags & FF_MOV_FLAG_EMPTY_MOOV) {
+        max_track_len = 0;
+        max_track_id  = 1;
+    }
 
     version = max_track_len < UINT32_MAX ? 0 : 1;
     (version == 1) ? avio_wb32(pb, 120) : avio_wb32(pb, 108); /* size */
@@ -3110,8 +3124,18 @@  static int mov_flush_fragment(AVFormatContext *s)
         for (i = 0; i < mov->nb_streams; i++)
             mov->tracks[i].data_offset = pos + buf_size + 8;
 
+        if (mov->flags & FF_MOV_FLAG_DELAY_MOOV)
+            mov_write_identification(s->pb, s);
         mov_write_moov_tag(s->pb, mov, s);
 
+        if (mov->flags & FF_MOV_FLAG_DELAY_MOOV) {
+            if (mov->flags & FF_MOV_FLAG_FASTSTART)
+                mov->reserved_moov_pos = avio_tell(s->pb);
+            avio_flush(s->pb);
+            mov->fragments++;
+            return 0;
+        }
+
         buf_size = avio_close_dyn_buf(mov->mdat_buf, &buf);
         mov->mdat_buf = NULL;
         avio_wb32(s->pb, buf_size + 8);
@@ -3194,6 +3218,19 @@  static int mov_flush_fragment(AVFormatContext *s)
     return 0;
 }
 
+static int mov_auto_flush_fragment(AVFormatContext *s)
+{
+    MOVMuxContext *mov = s->priv_data;
+    int ret = mov_flush_fragment(s);
+    if (ret < 0)
+        return ret;
+    // If using delay_moov, the first flush only wrote the moov,
+    // not the actual moof+mdat pair, thus flush once again.
+    if (mov->fragments == 1 && mov->flags & FF_MOV_FLAG_DELAY_MOOV)
+        ret = mov_flush_fragment(s);
+    return ret;
+}
+
 int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt)
 {
     MOVMuxContext *mov = s->priv_data;
@@ -3206,7 +3243,7 @@  int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt)
 
     if (mov->flags & FF_MOV_FLAG_FRAGMENT) {
         int ret;
-        if (mov->fragments > 0) {
+        if (mov->fragments > 0 || mov->flags & FF_MOV_FLAG_EMPTY_MOOV) {
             if (!trk->mdat_buf) {
                 if ((ret = avio_open_dyn_buf(&trk->mdat_buf)) < 0)
                     return ret;
@@ -3334,11 +3371,12 @@  int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt)
             trk->frag_start   = pkt->dts;
             trk->start_dts    = 0;
             trk->frag_discont = 0;
-        } else if (pkt->dts && mov->flags & FF_MOV_FLAG_EMPTY_MOOV)
+        } else if (mov->fragments >= 1)
             av_log(s, AV_LOG_WARNING,
-                   "Track %d starts with a nonzero dts %"PRId64". This "
-                   "currently isn't handled correctly in combination with "
-                   "empty_moov.\n", pkt->stream_index, pkt->dts);
+                   "Track %d starts with a nonzero dts %"PRId64", while the moov "
+                   "already has been written. Set the delay_moov flag to handle "
+                   "this case.\n",
+                   pkt->stream_index, pkt->dts);
     }
     trk->track_duration = pkt->dts - trk->start_dts + pkt->duration;
 
@@ -3413,7 +3451,7 @@  static int mov_write_packet(AVFormatContext *s, AVPacket *pkt)
               enc->codec_type == AVMEDIA_TYPE_VIDEO &&
               trk->entry && pkt->flags & AV_PKT_FLAG_KEY)) {
             if (frag_duration >= mov->min_fragment_duration)
-                mov_flush_fragment(s);
+                mov_auto_flush_fragment(s);
         }
 
         return ff_mov_write_packet(s, pkt);
@@ -3636,6 +3674,9 @@  static int mov_write_header(AVFormatContext *s)
         else if (!strcmp("f4v", s->oformat->name)) mov->mode = MODE_F4V;
     }
 
+    if (mov->flags & FF_MOV_FLAG_DELAY_MOOV)
+        mov->flags |= FF_MOV_FLAG_EMPTY_MOOV;
+
     /* Set the FRAGMENT flag if any of the fragmentation methods are
      * enabled. */
     if (mov->max_fragment_duration || mov->max_fragment_size ||
@@ -3663,8 +3704,9 @@  static int mov_write_header(AVFormatContext *s)
                 mov->use_editlist = 0;
         }
     }
-    if (mov->flags & FF_MOV_FLAG_EMPTY_MOOV && mov->use_editlist)
-        av_log(s, AV_LOG_WARNING, "No meaningful edit list will be written when using empty_moov\n");
+    if (mov->flags & FF_MOV_FLAG_EMPTY_MOOV &&
+        !(mov->flags & FF_MOV_FLAG_DELAY_MOOV) && mov->use_editlist)
+        av_log(s, AV_LOG_WARNING, "No meaningful edit list will be written when using empty_moov without delay_moov\n");
 
     if (!mov->use_editlist && s->avoid_negative_ts == AVFMT_AVOID_NEG_TS_AUTO)
         s->avoid_negative_ts = AVFMT_AVOID_NEG_TS_MAKE_ZERO;
@@ -3678,8 +3720,10 @@  static int mov_write_header(AVFormatContext *s)
     }
 
 
-    if ((ret = mov_write_identification(pb, s)) < 0)
-        return ret;
+    if (!(mov->flags & FF_MOV_FLAG_DELAY_MOOV)) {
+        if ((ret = mov_write_identification(pb, s)) < 0)
+            return ret;
+    }
 
     mov->nb_streams = s->nb_streams;
     if (mov->mode & (MODE_MP4|MODE_MOV|MODE_IPOD) && s->nb_chapters)
@@ -3828,7 +3872,8 @@  static int mov_write_header(AVFormatContext *s)
     if (mov->flags & FF_MOV_FLAG_ISML)
         mov_write_isml_manifest(pb, mov);
 
-    if (mov->flags & FF_MOV_FLAG_EMPTY_MOOV) {
+    if (mov->flags & FF_MOV_FLAG_EMPTY_MOOV &&
+        !(mov->flags & FF_MOV_FLAG_DELAY_MOOV)) {
         mov_write_moov_tag(pb, mov, s);
         mov->fragments++;
         if (mov->flags & FF_MOV_FLAG_FASTSTART)
@@ -4025,7 +4070,7 @@  static int mov_write_trailer(AVFormatContext *s)
             mov_write_moov_tag(pb, mov, s);
         }
     } else {
-        mov_flush_fragment(s);
+        mov_auto_flush_fragment(s);
         for (i = 0; i < mov->nb_streams; i++)
            mov->tracks[i].data_offset = 0;
         if (mov->flags & FF_MOV_FLAG_FASTSTART) {
diff --git a/libavformat/movenc.h b/libavformat/movenc.h
index 97c0583..8b1084e 100644
--- a/libavformat/movenc.h
+++ b/libavformat/movenc.h
@@ -185,6 +185,7 @@  typedef struct MOVMuxContext {
 #define FF_MOV_FLAG_DEFAULT_BASE_MOOF     (1 << 10)
 #define FF_MOV_FLAG_DASH                  (1 << 11)
 #define FF_MOV_FLAG_FRAG_DISCONT          (1 << 12)
+#define FF_MOV_FLAG_DELAY_MOOV            (1 << 13)
 
 int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt);