[06/10] movenc: Add support for writing sidx atoms for DASH segments

Message ID 1415365019-26521-6-git-send-email-martin@martin.st
State Superseded

Commit Message

Martin Storsjö Nov. 7, 2014, 12:56 p.m.
A flag "dash" is added, which enables the necessary flags for
creating DASH compatible fragments.

When this is enabled, one sidx atom is written for each track
before every moof atom.
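
With several sidx atoms in front of one moof, each sidx's first_offset has to skip the sidx atoms that still follow it. A minimal sketch of that arithmetic, assuming all sidx atoms have the same size (as they do here, since each carries exactly one reference); `sidx_first_offset` and its parameters are hypothetical names, not part of movenc.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of movenc.c: distance in bytes from the
 * end of sidx atom number `index` (0-based) to the start of the moof,
 * when `count` sidx atoms of `box_size` bytes each are written back to
 * back directly before the moof. */
static int64_t sidx_first_offset(int index, int count, int64_t box_size)
{
    assert(count > 0 && index >= 0 && index < count && box_size > 0);
    /* Only the atoms that follow this one lie between it and the moof. */
    return (int64_t)(count - 1 - index) * box_size;
}
```

With two tracks and 52-byte atoms, the first sidx would get a first_offset of 52 and the second a first_offset of 0.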
---
 Changelog             |  1 +
 libavformat/movenc.c  | 86 +++++++++++++++++++++++++++++++++++++++++++++++++--
 libavformat/movenc.h  |  1 +
 libavformat/version.h |  2 +-
 4 files changed, 87 insertions(+), 3 deletions(-)

Comments

Derek Buitenhuis Nov. 14, 2014, 2:25 p.m. | #1
On 11/7/2014 12:56 PM, Martin Storsjö wrote:
> A flag "dash" is added, which enables the necessary flags for
> creating DASH compatible fragments.
> 
> When this is enabled, one sidx atom is written for each track
> before every moof atom.
> ---

[...] Went through some of this on IRC, so skipping chunks here.

> +static int mov_write_sidx_tag(AVIOContext *pb,
> +                              MOVTrack *track, int ref_size, int total_sidx_size)
> +{
> +    int64_t pos = avio_tell(pb), offset_pos, end_pos;
> +    int64_t presentation_time = track->start_dts + track->frag_start +
> +                                track->cluster[0].cts;

Aren't MP4 (not MOV) timestamps unsigned?

> +    avio_wb32(pb, 0); /* size */
> +    ffio_wfourcc(pb, "sidx");
> +    avio_w8(pb, 1); /* version */

I swear I saw some libs write 0... though I doubt this field means
anything in a practical sense.

> +        // First run one round to calculate the total size of all
> +        // sidx atoms.
> +        // This would be much simpler if we'd only write one sidx
> +        // atom, for the first track in the moof.

Diego-nit: boxes. ;)

> +    if (mov->flags & FF_MOV_FLAG_DASH)
> +        ffio_wfourcc(pb, "dash");
> +

What about msdh and msix? I thought 'dash' was only for
"Indexed self-initializing Media Segment"s?

- Derek

Martin Storsjö Nov. 14, 2014, 2:36 p.m. | #2
On Fri, 14 Nov 2014, Derek Buitenhuis wrote:

> On 11/7/2014 12:56 PM, Martin Storsjö wrote:
>> A flag "dash" is added, which enables the necessary flags for
>> creating DASH compatible fragments.
>>
>> When this is enabled, one sidx atom is written for each track
>> before every moof atom.
>> ---
>
> [...] Went through some of this on IRC, so skipping chunks here.
>
>> +static int mov_write_sidx_tag(AVIOContext *pb,
>> +                              MOVTrack *track, int ref_size, int total_sidx_size)
>> +{
>> +    int64_t pos = avio_tell(pb), offset_pos, end_pos;
>> +    int64_t presentation_time = track->start_dts + track->frag_start +
>> +                                track->cluster[0].cts;
>
> Aren't MP4 (not MOV) timestamps unsigned?

Yes, in principle. After the "if (presentation_time < 0) presentation_time 
= 0;" case below it will be nonnegative anyway, but in order to handle it 
correctly I'd rather keep it like this - I don't mind much losing the 
upper half of the range :-)
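
To illustrate the point about keeping the computation signed, here is a rough sketch (not the code from the patch, which does this inline; `put_presentation_time` is a made-up name) that clamps a signed value and then stores it as the unsigned big-endian 64-bit field:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: clamp a signed timestamp to zero and store it as the
 * unsigned big-endian 64-bit value that a version 1 sidx expects. */
static void put_presentation_time(uint8_t dst[8], int64_t presentation_time)
{
    uint64_t v;
    int i;

    assert(dst);
    /* pts < 0 is expected to be cut away by an edit list instead. */
    if (presentation_time < 0)
        presentation_time = 0;
    v = (uint64_t)presentation_time;
    for (i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (56 - 8 * i));
}
```

Because the intermediate value is an int64_t, a negative result stays detectable until the clamp; only nonnegative values are ever serialized.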

>> +    avio_wb32(pb, 0); /* size */
>> +    ffio_wfourcc(pb, "sidx");
>> +    avio_w8(pb, 1); /* version */
>
> I swear I saw some libs write 0... though I doubt this field means
> anything in a practical sense.

This field means I'll use 64 bit presentation and first_offset. I'm 
potentially wasting 8 bytes per box here when I'm not checking whether I 
actually need the 64 bit range...
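
For reference, a sketch of what such a check could look like, together with the resulting atom size. Both function names are hypothetical; the field sizes follow the sidx layout written by mov_write_sidx_tag in this patch, with version 0 using 32 bit earliest_presentation_time and first_offset fields:

```c
#include <assert.h>
#include <stdint.h>

/* Version 0 stores earliest_presentation_time and first_offset as 32-bit
 * fields, version 1 as 64-bit fields. Pick the smallest that fits. */
static int sidx_version_for(uint64_t presentation_time, uint64_t first_offset)
{
    return (presentation_time > UINT32_MAX || first_offset > UINT32_MAX) ? 1 : 0;
}

/* Total atom size in bytes for the given version and reference count. */
static int sidx_box_size(int version, int reference_count)
{
    int size = 8      /* size + fourcc */
             + 4      /* version + flags */
             + 4      /* reference_ID */
             + 4;     /* timescale */

    assert(version == 0 || version == 1);
    assert(reference_count >= 0);
    size += version ? 16 : 8; /* earliest_presentation_time + first_offset */
    size += 2 + 2;            /* reserved + reference_count */
    size += 12 * reference_count;
    return size;
}
```

With a single reference this gives 44 bytes for version 0 and 52 bytes for version 1, which is where the 8 bytes per box come from.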

>> +        // First run one round to calculate the total size of all
>> +        // sidx atoms.
>> +        // This would be much simpler if we'd only write one sidx
>> +        // atom, for the first track in the moof.
>
> Diego-nit: boxes. ;)

Most of movenc.c and mov.c talk about it as atoms instead of boxes, so I'm 
just being consistent with the rest of it :P

>> +    if (mov->flags & FF_MOV_FLAG_DASH)
>> +        ffio_wfourcc(pb, "dash");
>> +
>
> What about msdh and msix? I thought 'dash' was only for
> "Indexed self-initializing Media Segment"s?

For the individual segment files, I write those (in the styp box), but in 
dashenc.c in the following patch. For the full file itself, I only add the 
dash brand, which probably only makes sense if you'd use it with the 
global-sidx stuff that I'm adding in a later patch. (Perhaps I shouldn't 
be adding any dash brand at all unless I'm writing a global sidx index?)
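
As a rough sketch of what a styp box with those brands looks like on the wire: the helper name is hypothetical, and using msdh as the major brand with msdh and msix as compatible brands is an assumption for illustration, not a copy of dashenc.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: serialize a styp (or ftyp) style box into buf and
 * return its size in bytes, or 0 if buf is too small. */
static size_t write_brand_box(uint8_t *buf, size_t buf_size,
                              const char *type, const char *major_brand,
                              const char *const *compat, size_t nb_compat)
{
    size_t size = 8 + 4 + 4 + 4 * nb_compat, i;
    uint8_t *p = buf;

    assert(buf && strlen(type) == 4 && strlen(major_brand) == 4);
    if (size > buf_size)
        return 0;
    /* 32-bit big-endian box size, followed by the box type */
    *p++ = (uint8_t)(size >> 24);
    *p++ = (uint8_t)(size >> 16);
    *p++ = (uint8_t)(size >> 8);
    *p++ = (uint8_t)size;
    memcpy(p, type, 4);        p += 4;
    memcpy(p, major_brand, 4); p += 4;
    memset(p, 0, 4);           p += 4; /* minor_version */
    for (i = 0; i < nb_compat; i++) {
        assert(strlen(compat[i]) == 4);
        memcpy(p, compat[i], 4);
        p += 4;
    }
    return size;
}
```

Calling it with type "styp", major brand "msdh" and the two compatible brands "msdh" and "msix" produces a 24-byte box.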

// Martin

Derek Buitenhuis Nov. 14, 2014, 3:01 p.m. | #3
On 11/14/2014 2:36 PM, Martin Storsjö wrote:
>> Aren't MP4 (not MOV) timestamps unsigned?
> 
> Yes, in principle. After the "if (presentation_time < 0) presentation_time 
> = 0;" case below it will be nonnegative anyway, but in order to handle it 
> correctly I'd rather keep it like this - I don't mind much losing the 
> upper half of the range :-)

As long as we don't produce any of those evil invalid MP4 files with signed
timestamps.

Also, how does it behave if input has unsigned 64-bit large timestamps?

>> I swear I saw some libs write 0... though I doubt this field means
>> anything in a practical sense.
> 
> This field means I'll use 64 bit presentation and first_offset. I'm 
> potentially wasting 8 bytes per box here when I'm not checking whether I 
> actually need the 64 bit range...

OK.

>> Diego-nit: boxes. ;)
> 
> Most of movenc.c and mov.c talk about it as atoms instead of boxes, so I'm 
> just being consistent with the rest of it :P

Figured as much.

>> What about msdh and msix? I thought 'dash' was only for
>> "Indexed self-initializing Media Segment"s?
> 
> For the individual segment files, I write those (in the styp box), but in 
> dashenc.c in the following patch.

Yeah I saw that later.

> (Perhaps I shouldn't be adding any dash brand at all unless I'm writing a global sidx index?)

That was my thought, yes.

- Derek

Martin Storsjö Nov. 14, 2014, 6:33 p.m. | #4
On Fri, 14 Nov 2014, Derek Buitenhuis wrote:

> On 11/14/2014 2:36 PM, Martin Storsjö wrote:
>>> Aren't MP4 (not MOV) timestamps unsigned?
>>
>> Yes, in principle. After the "if (presentation_time < 0) presentation_time
>> = 0;" case below it will be nonnegative anyway, but in order to handle it
>> correctly I'd rather keep it like this - I don't mind much losing the
>> upper half of the range :-)
>
> As long as we don't produce any of those evil invalid MP4 files with signed
> timestamps.

Hmm, you mean with pts < dts? That's at least a different issue than this, 
and yes, we shouldn't produce such files.

> Also, how does it behave if input has unsigned 64-bit large timestamps?

Then we're probably screwed (aka interpreting it as the pts<0 case here). 
Note though that both pts and dts are int64_t in lavf, so you'd be out of 
range when passing that into the muxer in the first place.

// Martin

Derek Buitenhuis Nov. 14, 2014, 6:50 p.m. | #5
On 11/14/2014 6:33 PM, Martin Storsjö wrote:
> Hmm, you mean with pts < dts? That's at least a different issue than this, 
> and yes, we shouldn't produce such files.

I meant MP4 files that use MOV-style timestamps (greater than 1<<31 for negative).

- Derek

Martin Storsjö Nov. 14, 2014, 7:41 p.m. | #6
On Fri, 14 Nov 2014, Derek Buitenhuis wrote:

> On 11/14/2014 6:33 PM, Martin Storsjö wrote:
>> Hmm, you mean with pts < dts? That's at least a different issue than this,
>> and yes, we shouldn't produce such files.
>
> I meant MP4 files that use MOV-style timestamps (greater than 1<<31 for negative).

Right, no, that shouldn't really happen. Since the input to lavf is 
int64_t, those would be treated as negative, and trimmed out (or shifted 
to start at 0), and you wouldn't end up with negative values here at 
least.
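
A small sketch of the two interpretations being discussed. Both helpers are hypothetical, and the plain clamp in the second one is only a stand-in for the trimming or shifting that actually happens to the input:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: read a 32-bit field the "MOV way", where values
 * with the top bit set stand for negative numbers. */
static int64_t mov_style_signed(uint32_t raw)
{
    return raw >= 0x80000000u ? (int64_t)raw - 0x100000000LL : (int64_t)raw;
}

/* Illustrative only: a muxer that receives int64_t timestamps can make
 * sure a negative value never reaches an unsigned 32-bit field. */
static uint32_t to_unsigned_field(int64_t ts)
{
    assert(ts <= UINT32_MAX);
    return ts < 0 ? 0 : (uint32_t)ts;
}
```

Read as unsigned, 0xFFFFFFFF is a very large timestamp; read the MOV way, it is -1. Since the muxer input is already signed, the second helper never produces a value that would need the first interpretation.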

// Martin

Patch

diff --git a/Changelog b/Changelog
index c51fa8f..ecec401 100644
--- a/Changelog
+++ b/Changelog
@@ -6,6 +6,7 @@  version <next>:
 - HEVC/H.265 RTP payload format (draft v6) packetizer and depacketizer
 - avplay now exits by default at the end of playback
 - XCB-based screen-grabber
+- creating DASH compatible fragmented MP4
 
 
 version 11:
diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 8d378c4..157ca17 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -59,6 +59,7 @@  static const AVOption options[] = {
     { "omit_tfhd_offset", "Omit the base data offset in tfhd atoms", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_OMIT_TFHD_OFFSET}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
     { "disable_chpl", "Disable Nero chapter atom", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DISABLE_CHPL}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
     { "default_base_moof", "Set the default-base-is-moof flag in tfhd atoms", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DEFAULT_BASE_MOOF}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
+    { "dash", "DASH", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DASH}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
     FF_RTP_FLAG_OPTS(MOVMuxContext, rtp_flags),
     { "skip_iods", "Skip writing iods atom.", offsetof(MOVMuxContext, iods_skip), AV_OPT_TYPE_INT, {.i64 = 0}, 0, 1, AV_OPT_FLAG_ENCODING_PARAM},
     { "iods_audio_profile", "iods audio profile atom.", offsetof(MOVMuxContext, iods_audio_profile), AV_OPT_TYPE_INT, {.i64 = -1}, -1, 255, AV_OPT_FLAG_ENCODING_PARAM},
@@ -2675,7 +2676,78 @@  static int mov_write_moof_tag_internal(AVIOContext *pb, MOVMuxContext *mov,
     return update_size(pb, pos);
 }
 
-static int mov_write_moof_tag(AVIOContext *pb, MOVMuxContext *mov, int tracks)
+static int mov_write_sidx_tag(AVIOContext *pb,
+                              MOVTrack *track, int ref_size, int total_sidx_size)
+{
+    int64_t pos = avio_tell(pb), offset_pos, end_pos;
+    int64_t presentation_time = track->start_dts + track->frag_start +
+                                track->cluster[0].cts;
+    int64_t duration = track->start_dts + track->track_duration -
+                       track->cluster[0].dts;
+    int64_t offset;
+    int starts_with_SAP = track->cluster[0].flags & MOV_SYNC_SAMPLE;
+
+    // pts<0 should be cut away using edts
+    if (presentation_time < 0)
+        presentation_time = 0;
+
+    avio_wb32(pb, 0); /* size */
+    ffio_wfourcc(pb, "sidx");
+    avio_w8(pb, 1); /* version */
+    avio_wb24(pb, 0);
+    avio_wb32(pb, track->track_id); /* reference_ID */
+    avio_wb32(pb, track->timescale); /* timescale */
+    avio_wb64(pb, presentation_time); /* earliest_presentation_time */
+    offset_pos = avio_tell(pb);
+    avio_wb64(pb, 0); /* first_offset (offset to referenced moof) */
+    avio_wb16(pb, 0); /* reserved */
+    avio_wb16(pb, 1); /* reference_count */
+    avio_wb32(pb, (0 << 31) | (ref_size & 0x7fffffff)); /* reference_type (0 = media) | referenced_size */
+    avio_wb32(pb, duration); /* subsegment_duration */
+    avio_wb32(pb, (starts_with_SAP << 31) | (0 << 28) | 0); /* starts_with_SAP | SAP_type | SAP_delta_time */
+
+    end_pos = avio_tell(pb);
+    offset = pos + total_sidx_size - end_pos;
+    avio_seek(pb, offset_pos, SEEK_SET);
+    avio_wb64(pb, offset);
+    avio_seek(pb, end_pos, SEEK_SET);
+    return update_size(pb, pos);
+}
+
+static int mov_write_sidx_tags(AVIOContext *pb, MOVMuxContext *mov,
+                               int tracks, int ref_size)
+{
+    int i, round, ret;
+    AVIOContext *avio_buf;
+    int total_size = 0;
+    for (round = 0; round < 2; round++) {
+        // First run one round to calculate the total size of all
+        // sidx atoms.
+        // This would be much simpler if we'd only write one sidx
+        // atom, for the first track in the moof.
+        if (round == 0) {
+            if ((ret = ffio_open_null_buf(&avio_buf)) < 0)
+                return ret;
+        } else {
+            avio_buf = pb;
+        }
+        for (i = 0; i < mov->nb_streams; i++) {
+            MOVTrack *track = &mov->tracks[i];
+            if (tracks >= 0 && i != tracks)
+                continue;
+            if (!track->entry)
+                continue;
+            total_size -= mov_write_sidx_tag(avio_buf, track, ref_size,
+                                             total_size);
+        }
+        if (round == 0)
+            total_size = ffio_close_null_buf(avio_buf);
+    }
+    return 0;
+}
+
+static int mov_write_moof_tag(AVIOContext *pb, MOVMuxContext *mov, int tracks,
+                              int64_t mdat_size)
 {
     AVIOContext *avio_buf;
     int ret, moof_size;
@@ -2685,6 +2757,9 @@  static int mov_write_moof_tag(AVIOContext *pb, MOVMuxContext *mov, int tracks)
     mov_write_moof_tag_internal(avio_buf, mov, tracks, 0);
     moof_size = ffio_close_null_buf(avio_buf);
 
+    if (mov->flags & FF_MOV_FLAG_DASH)
+        mov_write_sidx_tags(pb, mov, tracks, moof_size + 8 + mdat_size);
+
     if ((ret = mov_add_tfra_entries(pb, mov, tracks)) < 0)
         return ret;
 
@@ -2821,6 +2896,10 @@  static int mov_write_ftyp_tag(AVIOContext *pb, AVFormatContext *s)
         ffio_wfourcc(pb, "MSNV");
     else if (mov->mode == MODE_MP4)
         ffio_wfourcc(pb, "mp41");
+
+    if (mov->flags & FF_MOV_FLAG_DASH)
+        ffio_wfourcc(pb, "dash");
+
     return update_size(pb, pos);
 }
 
@@ -3054,7 +3133,7 @@  static int mov_flush_fragment(AVFormatContext *s)
         if (write_moof) {
             avio_flush(s->pb);
 
-            mov_write_moof_tag(s->pb, mov, moof_tracks);
+            mov_write_moof_tag(s->pb, mov, moof_tracks, mdat_size);
             mov->fragments++;
 
             avio_wb32(s->pb, mdat_size + 8);
@@ -3504,6 +3583,9 @@  static int mov_write_header(AVFormatContext *s)
     if (mov->mode == MODE_ISM)
         mov->flags |= FF_MOV_FLAG_EMPTY_MOOV | FF_MOV_FLAG_SEPARATE_MOOF |
                       FF_MOV_FLAG_FRAGMENT;
+    if (mov->flags & FF_MOV_FLAG_DASH)
+        mov->flags |= FF_MOV_FLAG_FRAGMENT | FF_MOV_FLAG_EMPTY_MOOV |
+                      FF_MOV_FLAG_DEFAULT_BASE_MOOF;
 
     /* faststart: moov at the beginning of the file, if supported */
     if (mov->flags & FF_MOV_FLAG_FASTSTART) {
diff --git a/libavformat/movenc.h b/libavformat/movenc.h
index 1df5a5c..2a40b2f 100644
--- a/libavformat/movenc.h
+++ b/libavformat/movenc.h
@@ -180,6 +180,7 @@  typedef struct MOVMuxContext {
 #define FF_MOV_FLAG_OMIT_TFHD_OFFSET      (1 <<  8)
 #define FF_MOV_FLAG_DISABLE_CHPL          (1 <<  9)
 #define FF_MOV_FLAG_DEFAULT_BASE_MOOF     (1 << 10)
+#define FF_MOV_FLAG_DASH                  (1 << 11)
 
 int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt);
 
diff --git a/libavformat/version.h b/libavformat/version.h
index f8c5edb..c10a6b8 100644
--- a/libavformat/version.h
+++ b/libavformat/version.h
@@ -31,7 +31,7 @@ 
 
 #define LIBAVFORMAT_VERSION_MAJOR 56
 #define LIBAVFORMAT_VERSION_MINOR  6
-#define LIBAVFORMAT_VERSION_MICRO  4
+#define LIBAVFORMAT_VERSION_MICRO  5
 
 #define LIBAVFORMAT_VERSION_INT AV_VERSION_INT(LIBAVFORMAT_VERSION_MAJOR, \
                                                LIBAVFORMAT_VERSION_MINOR, \