Message ID | 1415365019-26521-6-git-send-email-martin@martin.st |
---|---|
State | Superseded |
On 11/7/2014 12:56 PM, Martin Storsjö wrote:
> A flag "dash" is added, which enables the necessary flags for
> creating DASH compatible fragments.
>
> When this is enabled, one sidx atom is written for each track
> before every moof atom.
> ---

[...]

Went through some of this on IRC, so skipping chunks here.

> +static int mov_write_sidx_tag(AVIOContext *pb,
> +                              MOVTrack *track, int ref_size, int total_sidx_size)
> +{
> +    int64_t pos = avio_tell(pb), offset_pos, end_pos;
> +    int64_t presentation_time = track->start_dts + track->frag_start +
> +                                track->cluster[0].cts;

Aren't MP4 (not MOV) timestamps unsigned?

> +    avio_wb32(pb, 0); /* size */
> +    ffio_wfourcc(pb, "sidx");
> +    avio_w8(pb, 1); /* version */

I swear I saw some libs write 0... though I doubt this field means
anything in a practical sense.

> +        // First run one round to calculate the total size of all
> +        // sidx atoms.
> +        // This would be much simpler if we'd only write one sidx
> +        // atom, for the first track in the moof.

Diego-nit: boxes. ;)

> +    if (mov->flags & FF_MOV_FLAG_DASH)
> +        ffio_wfourcc(pb, "dash");
> +

What about msdh and msix? I thought 'dash' was only for
"Indexed self-initializing Media Segment"s?

- Derek
On Fri, 14 Nov 2014, Derek Buitenhuis wrote:

> On 11/7/2014 12:56 PM, Martin Storsjö wrote:
>> A flag "dash" is added, which enables the necessary flags for
>> creating DASH compatible fragments.
>>
>> When this is enabled, one sidx atom is written for each track
>> before every moof atom.
>> ---
>
> [...]
>
> Went through some of this on IRC, so skipping chunks here.
>
>> +static int mov_write_sidx_tag(AVIOContext *pb,
>> +                              MOVTrack *track, int ref_size, int total_sidx_size)
>> +{
>> +    int64_t pos = avio_tell(pb), offset_pos, end_pos;
>> +    int64_t presentation_time = track->start_dts + track->frag_start +
>> +                                track->cluster[0].cts;
>
> Aren't MP4 (not MOV) timestamps unsigned?

Yes, in principle. After the "if (presentation_time < 0) presentation_time
= 0;" case below it will be nonnegative anyway, but in order to handle it
correctly I'd rather keep it like this - I don't mind much losing the
upper half of the range :-)

>> +    avio_wb32(pb, 0); /* size */
>> +    ffio_wfourcc(pb, "sidx");
>> +    avio_w8(pb, 1); /* version */
>
> I swear I saw some libs write 0... though I doubt this field means
> anything in a practical sense.

This field means I'll use 64 bit presentation and first_offset. I'm
potentially wasting 8 bytes per box here when I'm not checking whether I
actually need the 64 bit range...

>> +        // First run one round to calculate the total size of all
>> +        // sidx atoms.
>> +        // This would be much simpler if we'd only write one sidx
>> +        // atom, for the first track in the moof.
>
> Diego-nit: boxes. ;)

Most of movenc.c and mov.c talk about it as atoms instead of boxes, so I'm
just being consistent with the rest of it :P

>> +    if (mov->flags & FF_MOV_FLAG_DASH)
>> +        ffio_wfourcc(pb, "dash");
>> +
>
> What about msdh and msix? I thought 'dash' was only for
> "Indexed self-initializing Media Segment"s?

For the individual segment files, I write those (in the styp box), but in
dashenc.c in the following patch.

For the full file itself, I only add the dash brand, which probably only
makes sense if you'd use it with the global-sidx stuff that I'm adding in
a later patch. (Perhaps I shouldn't be adding any dash brand at all unless
I'm writing a global sidx index?)

// Martin
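Editor's note on the version field discussed above: in the sidx layout the patch writes, the version only selects the width of earliest_presentation_time and first_offset (32 bits each for version 0, 64 bits each for version 1), which is where the "8 bytes per box" comes from. The following is a minimal sketch of that arithmetic, not code from the patch; `sidx_box_size` and `sidx_min_version` are hypothetical helper names introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a sidx box for a given version and number of
 * reference entries, using the same field order as the patch. */
size_t sidx_box_size(int version, unsigned reference_count)
{
    size_t wide = version == 0 ? 4 : 8; /* width of the version-dependent fields */

    return 8      /* box header: size + fourcc */
         + 4      /* version (8 bits) + flags (24 bits) */
         + 4      /* reference_ID */
         + 4      /* timescale */
         + wide   /* earliest_presentation_time */
         + wide   /* first_offset */
         + 2      /* reserved */
         + 2      /* reference_count */
         + 12 * (size_t)reference_count; /* 12 bytes per reference entry */
}

/* Hypothetical check the patch does not perform: the smallest version
 * whose fields can hold both values. */
int sidx_min_version(uint64_t earliest_presentation_time, uint64_t first_offset)
{
    return (earliest_presentation_time > UINT32_MAX ||
            first_offset > UINT32_MAX) ? 1 : 0;
}
```

With one reference per box, as in the patch, this gives 44 bytes for version 0 and 52 bytes for version 1. Always writing version 1 trades those 8 bytes for not having to inspect the values first.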
On 11/14/2014 2:36 PM, Martin Storsjö wrote:
>> Aren't MP4 (not MOV) timestamps unsigned?
>
> Yes, in principle. After the "if (presentation_time < 0) presentation_time
> = 0;" case below it will be nonnegative anyway, but in order to handle it
> correctly I'd rather keep it like this - I don't mind much losing the
> upper half of the range :-)

As long as we don't produce any of those evil invalid MP4 files with signed
timestamps.

Also, how does it behave if input has unsigned 64-bit large timestamps?

>> I swear I saw some libs write 0... though I doubt this field means
>> anything in a practical sense.
>
> This field means I'll use 64 bit presentation and first_offset. I'm
> potentially wasting 8 bytes per box here when I'm not checking whether I
> actually need the 64 bit range...

OK.

>> Diego-nit: boxes. ;)
>
> Most of movenc.c and mov.c talk about it as atoms instead of boxes, so I'm
> just being consistent with the rest of it :P

Figured as much.

>> What about msdh and msix? I thought 'dash' was only for
>> "Indexed self-initializing Media Segment"s?
>
> For the individual segment files, I write those (in the styp box), but in
> dashenc.c in the following patch.

Yeah I saw that later.

> (Perhaps I shouldn't be adding any dash brand at all unless I'm writing a
> global sidx index?)

That was my thought, yes.

- Derek
On Fri, 14 Nov 2014, Derek Buitenhuis wrote:

> On 11/14/2014 2:36 PM, Martin Storsjö wrote:
>>> Aren't MP4 (not MOV) timestamps unsigned?
>>
>> Yes, in principle. After the "if (presentation_time < 0) presentation_time
>> = 0;" case below it will be nonnegative anyway, but in order to handle it
>> correctly I'd rather keep it like this - I don't mind much losing the
>> upper half of the range :-)
>
> As long as we don't produce any of those evil invalid MP4 files with signed
> timestamps.

Hmm, you mean with pts < dts? That's at least a different issue than this,
and yes, we shouldn't produce such files.

> Also, how does it behave if input has unsigned 64-bit large timestamps?

Then we're probably screwed (aka interpreting it as the pts<0 case here).
Note though that both pts and dts are int64_t in lavf, so you'd be out of
range when passing that into the muxer in the first place.

// Martin
On 11/14/2014 6:33 PM, Martin Storsjö wrote:
> Hmm, you mean with pts < dts? That's at least a different issue than this,
> and yes, we shouldn't produce such files.

I meant MP4 files that use MOV-style timestamps (greater than 1<<31 for negative).

- Derek
On Fri, 14 Nov 2014, Derek Buitenhuis wrote:

> On 11/14/2014 6:33 PM, Martin Storsjö wrote:
>> Hmm, you mean with pts < dts? That's at least a different issue than this,
>> and yes, we shouldn't produce such files.
>
> I meant MP4 files that use MOV-style timestamps (greater than 1<<31 for negative).

Right, no, that shouldn't really happen. Since the input to lavf is
int64_t, those would be treated as negative, and trimmed out (or shifted
to start at 0), and you wouldn't end up with negative values here at
least.

// Martin
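Editor's note on the timestamp exchange above: the two readings of a 32-bit field that Derek contrasts, and the clamp the patch applies before writing earliest_presentation_time, can be sketched as follows. This is an illustration only, assuming a raw 32-bit field as input; the function names are hypothetical and none of this is libavformat code.

```c
#include <stdint.h>

/* "MP4-style" reading: the raw 32-bit field is an unsigned value. */
int64_t ts32_as_unsigned(uint32_t raw)
{
    return (int64_t)raw;
}

/* "MOV-style" reading as described in the thread: values with the top
 * bit set stand for negative timestamps (two's complement). The
 * subtraction avoids an implementation-defined unsigned-to-signed cast. */
int64_t ts32_as_signed(uint32_t raw)
{
    if (raw >= UINT32_C(0x80000000))
        return (int64_t)raw - (INT64_C(1) << 32);
    return (int64_t)raw;
}

/* Same clamp as in mov_write_sidx_tag(): a negative presentation time
 * is written as 0, on the expectation that an edit list cuts it away. */
int64_t clamp_presentation_time(int64_t t)
{
    return t < 0 ? 0 : t;
}
```

For the raw value 0xFFFFFFFF the first reading gives 4294967295 and the second gives -1; after the clamp the second becomes 0, so the value written to the sidx box is never negative, which is the point Martin makes.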
diff --git a/Changelog b/Changelog
index c51fa8f..ecec401 100644
--- a/Changelog
+++ b/Changelog
@@ -6,6 +6,7 @@ version <next>:
 - HEVC/H.265 RTP payload format (draft v6) packetizer and depacketizer
 - avplay now exits by default at the end of playback
 - XCB-based screen-grabber
+- creating DASH compatible fragmented MP4


 version 11:
diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 8d378c4..157ca17 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -59,6 +59,7 @@ static const AVOption options[] = {
     { "omit_tfhd_offset", "Omit the base data offset in tfhd atoms", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_OMIT_TFHD_OFFSET}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
     { "disable_chpl", "Disable Nero chapter atom", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DISABLE_CHPL}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
     { "default_base_moof", "Set the default-base-is-moof flag in tfhd atoms", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DEFAULT_BASE_MOOF}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
+    { "dash", "DASH", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DASH}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
     FF_RTP_FLAG_OPTS(MOVMuxContext, rtp_flags),
     { "skip_iods", "Skip writing iods atom.", offsetof(MOVMuxContext, iods_skip), AV_OPT_TYPE_INT, {.i64 = 0}, 0, 1, AV_OPT_FLAG_ENCODING_PARAM},
     { "iods_audio_profile", "iods audio profile atom.", offsetof(MOVMuxContext, iods_audio_profile), AV_OPT_TYPE_INT, {.i64 = -1}, -1, 255, AV_OPT_FLAG_ENCODING_PARAM},
@@ -2675,7 +2676,78 @@ static int mov_write_moof_tag_internal(AVIOContext *pb, MOVMuxContext *mov,
     return update_size(pb, pos);
 }

-static int mov_write_moof_tag(AVIOContext *pb, MOVMuxContext *mov, int tracks)
+static int mov_write_sidx_tag(AVIOContext *pb,
+                              MOVTrack *track, int ref_size, int total_sidx_size)
+{
+    int64_t pos = avio_tell(pb), offset_pos, end_pos;
+    int64_t presentation_time = track->start_dts + track->frag_start +
+                                track->cluster[0].cts;
+    int64_t duration = track->start_dts + track->track_duration -
+                       track->cluster[0].dts;
+    int64_t offset;
+    int starts_with_SAP = track->cluster[0].flags & MOV_SYNC_SAMPLE;
+
+    // pts<0 should be cut away using edts
+    if (presentation_time < 0)
+        presentation_time = 0;
+
+    avio_wb32(pb, 0); /* size */
+    ffio_wfourcc(pb, "sidx");
+    avio_w8(pb, 1); /* version */
+    avio_wb24(pb, 0);
+    avio_wb32(pb, track->track_id); /* reference_ID */
+    avio_wb32(pb, track->timescale); /* timescale */
+    avio_wb64(pb, presentation_time); /* earliest_presentation_time */
+    offset_pos = avio_tell(pb);
+    avio_wb64(pb, 0); /* first_offset (offset to referenced moof) */
+    avio_wb16(pb, 0); /* reserved */
+    avio_wb16(pb, 1); /* reference_count */
+    avio_wb32(pb, (0 << 31) | (ref_size & 0x7fffffff)); /* reference_type (0 = media) | referenced_size */
+    avio_wb32(pb, duration); /* subsegment_duration */
+    avio_wb32(pb, (starts_with_SAP << 31) | (0 << 28) | 0); /* starts_with_SAP | SAP_type | SAP_delta_time */
+
+    end_pos = avio_tell(pb);
+    offset = pos + total_sidx_size - end_pos;
+    avio_seek(pb, offset_pos, SEEK_SET);
+    avio_wb64(pb, offset);
+    avio_seek(pb, end_pos, SEEK_SET);
+    return update_size(pb, pos);
+}
+
+static int mov_write_sidx_tags(AVIOContext *pb, MOVMuxContext *mov,
+                               int tracks, int ref_size)
+{
+    int i, round, ret;
+    AVIOContext *avio_buf;
+    int total_size = 0;
+    for (round = 0; round < 2; round++) {
+        // First run one round to calculate the total size of all
+        // sidx atoms.
+        // This would be much simpler if we'd only write one sidx
+        // atom, for the first track in the moof.
+        if (round == 0) {
+            if ((ret = ffio_open_null_buf(&avio_buf)) < 0)
+                return ret;
+        } else {
+            avio_buf = pb;
+        }
+        for (i = 0; i < mov->nb_streams; i++) {
+            MOVTrack *track = &mov->tracks[i];
+            if (tracks >= 0 && i != tracks)
+                continue;
+            if (!track->entry)
+                continue;
+            total_size -= mov_write_sidx_tag(avio_buf, track, ref_size,
+                                             total_size);
+        }
+        if (round == 0)
+            total_size = ffio_close_null_buf(avio_buf);
+    }
+    return 0;
+}
+
+static int mov_write_moof_tag(AVIOContext *pb, MOVMuxContext *mov, int tracks,
+                              int64_t mdat_size)
 {
     AVIOContext *avio_buf;
     int ret, moof_size;
@@ -2685,6 +2757,9 @@ static int mov_write_moof_tag(AVIOContext *pb, MOVMuxContext *mov, int tracks)
     mov_write_moof_tag_internal(avio_buf, mov, tracks, 0);
     moof_size = ffio_close_null_buf(avio_buf);

+    if (mov->flags & FF_MOV_FLAG_DASH)
+        mov_write_sidx_tags(pb, mov, tracks, moof_size + 8 + mdat_size);
+
     if ((ret = mov_add_tfra_entries(pb, mov, tracks)) < 0)
         return ret;

@@ -2821,6 +2896,10 @@ static int mov_write_ftyp_tag(AVIOContext *pb, AVFormatContext *s)
         ffio_wfourcc(pb, "MSNV");
     else if (mov->mode == MODE_MP4)
         ffio_wfourcc(pb, "mp41");
+
+    if (mov->flags & FF_MOV_FLAG_DASH)
+        ffio_wfourcc(pb, "dash");
+
     return update_size(pb, pos);
 }

@@ -3054,7 +3133,7 @@ static int mov_flush_fragment(AVFormatContext *s)
         if (write_moof) {
             avio_flush(s->pb);

-            mov_write_moof_tag(s->pb, mov, moof_tracks);
+            mov_write_moof_tag(s->pb, mov, moof_tracks, mdat_size);
             mov->fragments++;

             avio_wb32(s->pb, mdat_size + 8);
@@ -3504,6 +3583,9 @@ static int mov_write_header(AVFormatContext *s)
     if (mov->mode == MODE_ISM)
         mov->flags |= FF_MOV_FLAG_EMPTY_MOOV | FF_MOV_FLAG_SEPARATE_MOOF |
                       FF_MOV_FLAG_FRAGMENT;
+    if (mov->flags & FF_MOV_FLAG_DASH)
+        mov->flags |= FF_MOV_FLAG_FRAGMENT | FF_MOV_FLAG_EMPTY_MOOV |
+                      FF_MOV_FLAG_DEFAULT_BASE_MOOF;

     /* faststart: moov at the beginning of the file, if supported */
     if (mov->flags & FF_MOV_FLAG_FASTSTART) {
diff --git a/libavformat/movenc.h b/libavformat/movenc.h
index 1df5a5c..2a40b2f 100644
--- a/libavformat/movenc.h
+++ b/libavformat/movenc.h
@@ -180,6 +180,7 @@ typedef struct MOVMuxContext {
 #define FF_MOV_FLAG_OMIT_TFHD_OFFSET      (1 << 8)
 #define FF_MOV_FLAG_DISABLE_CHPL          (1 << 9)
 #define FF_MOV_FLAG_DEFAULT_BASE_MOOF     (1 << 10)
+#define FF_MOV_FLAG_DASH                  (1 << 11)

 int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt);

diff --git a/libavformat/version.h b/libavformat/version.h
index f8c5edb..c10a6b8 100644
--- a/libavformat/version.h
+++ b/libavformat/version.h
@@ -31,7 +31,7 @@

 #define LIBAVFORMAT_VERSION_MAJOR 56
 #define LIBAVFORMAT_VERSION_MINOR 6
-#define LIBAVFORMAT_VERSION_MICRO 4
+#define LIBAVFORMAT_VERSION_MICRO 5

 #define LIBAVFORMAT_VERSION_INT AV_VERSION_INT(LIBAVFORMAT_VERSION_MAJOR, \
                                                LIBAVFORMAT_VERSION_MINOR, \
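Editor's note on the two-round scheme in mov_write_sidx_tags(): each sidx box stores first_offset, the distance from its own end to the moof that follows the last sidx box, so the combined size of all sidx boxes has to be known before any of them can be written. The sketch below reproduces that idea outside libavformat. `Writer`, `put_be`, `write_sidx` and `write_all_sidx` are hypothetical stand-ins (a `Writer` with a NULL buffer plays the role of the null AVIOContext), and the timescale, presentation time and duration are dummy values.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte sink: counts bytes, and stores them only when buf is non-NULL. */
typedef struct Writer {
    uint8_t *buf;
    size_t   pos;
} Writer;

/* Write the low nbytes of value in big-endian order. */
void put_be(Writer *w, uint64_t value, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        if (w->buf)
            w->buf[w->pos] = (uint8_t)(value >> (8 * i));
        w->pos++;
    }
}

/* Write one version-1 sidx box with a single reference. remaining is the
 * byte count from the start of this box to the end of the last sidx box;
 * it is meaningless during the counting round. Returns the box size. */
int64_t write_sidx(Writer *w, uint32_t track_id, uint32_t ref_size,
                   int starts_with_sap, int64_t remaining)
{
    size_t start = w->pos, offset_pos, end;

    put_be(w, 0, 4);                      /* size, patched below */
    put_be(w, 0x73696478, 4);             /* fourcc "sidx" */
    put_be(w, 1, 1);                      /* version */
    put_be(w, 0, 3);                      /* flags */
    put_be(w, track_id, 4);               /* reference_ID */
    put_be(w, 1000, 4);                   /* timescale (dummy) */
    put_be(w, 0, 8);                      /* earliest_presentation_time (dummy) */
    offset_pos = w->pos;
    put_be(w, 0, 8);                      /* first_offset, patched below */
    put_be(w, 0, 2);                      /* reserved */
    put_be(w, 1, 2);                      /* reference_count */
    put_be(w, ref_size & 0x7fffffffu, 4); /* reference_type 0 | referenced_size */
    put_be(w, 0, 4);                      /* subsegment_duration (dummy) */
    /* starts_with_SAP in the top bit; SAP_type and SAP_delta_time are 0.
     * The shift is done on an unsigned value to stay well defined. */
    put_be(w, (uint32_t)(starts_with_sap != 0) << 31, 4);
    end = w->pos;

    /* Go back and fill in the fields that depend on the final layout. */
    w->pos = offset_pos;
    put_be(w, (uint64_t)((int64_t)start + remaining - (int64_t)end), 8);
    w->pos = start;
    put_be(w, end - start, 4);
    w->pos = end;
    return (int64_t)(end - start);
}

/* Round 0 counts the total size with a NULL buffer, round 1 writes into
 * out. Returns the number of bytes written. */
size_t write_all_sidx(uint8_t *out, int ntracks, uint32_t ref_size)
{
    int64_t remaining = 0;
    Writer w = { NULL, 0 };

    for (int round = 0; round < 2; round++) {
        w.buf = round == 0 ? NULL : out;
        w.pos = 0;
        for (int i = 0; i < ntracks; i++)
            remaining -= write_sidx(&w, (uint32_t)i + 1, ref_size, 1, remaining);
        if (round == 0)
            remaining = (int64_t)w.pos;
    }
    return w.pos;
}
```

For two tracks this writes two 52-byte boxes: the first carries first_offset 52 (it has to skip the second sidx box) and the second carries 0. As the comment in the patch notes, writing a single sidx box would make the counting round unnecessary.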