[2/2] Handle unicode file names on windows

Message ID 1303305974-15389-1-git-send-email-martin@martin.st
State Committed
Headers show

Commit Message

Martin Storsjö April 20, 2011, 1:26 p.m.
From: Kirill Gavrilov <gavr.mail@gmail.com>

All file names should be in UTF-8 within libavformat.

This is handled by mapping the open() function to an internal one
in os_support.h for windows.

fopen() could be overridden in the same way, but if that would be
used from ffmpeg.c, it would add a dependency on an ff prefixed
internal lavf function.

This doesn't work on Windows 9x, but Win 9x hasn't been supported
since fc5607f8620, without anyone complaining (as far as I know).
If Win9x compatibility is desired, these codepaths can be skipped
at runtime.
---
Sorry, the previous one was a broken version, disregard that one.

 cmdutils.c               |   65 ++++++++++++++++++++++++++++++++++++++++++++++
 libavformat/os_support.c |   28 +++++++++++++++++++
 libavformat/os_support.h |    5 +++
 3 files changed, 98 insertions(+), 0 deletions(-)

Comments

Martin Storsjö April 22, 2011, 9:39 p.m. | #1
On Wed, 20 Apr 2011, Martin Storsjö wrote:

> From: Kirill Gavrilov <gavr.mail@gmail.com>
> 
> All file names should be in UTF-8 within libavformat.
> 
> This is handled by mapping the open() function to an internal one
> in os_support.h for windows.
> 
> fopen() could be overridden in the same way, but if that would be
> used from ffmpeg.c, it would add a dependency on an ff prefixed
> internal lavf function.

Any comments on this cleaned up version of the hacks (except for the minor 
issues Diego pointed out, that I fixed locally)?

// Martin
Luca Barbato April 22, 2011, 11:02 p.m. | #2
On 4/20/11 3:26 PM, Martin Storsjö wrote:
> From: Kirill Gavrilov<gavr.mail@gmail.com>
>
> All file names should be in UTF-8 within libavformat.
>
> This is handled by mapping the open() function to an internal one
> in os_support.h for windows.
>
> fopen() could be overridden in the same way, but if that would be
> used from ffmpeg.c, it would add a dependency on an ff prefixed
> internal lavf function.
>
> This doesn't work on Windows 9x, but Win 9x hasn't been supported
> since fc5607f8620, without anyone complaining (as far as I know).
> If Win9x compatibility is desired, these codepaths can be skipped
> at runtime.

>   cmdutils.c               |   65 ++++++++++++++++++++++++++++++++++++++++++++++

> diff --git a/cmdutils.c b/cmdutils.c
> index f1cbd55..defe0f6 100644
> --- a/cmdutils.c
> +++ b/cmdutils.c
> @@ -155,6 +155,68 @@ static const OptionDef* find_option(const OptionDef *po, const char *name){
>       return po;
>   }
>
> +/**
> + * Prepare command line arguments for executable.
> + * For Windows - perform wide-char to UTF-8 conversion.
> + * Input arguments should be main() function arguments.
> + * @param argc_ptr Arguments number (including executable)
> + * @param argv_ptr Arguments list.
> + */
> +static void prepare_app_arguments(int *argc_ptr, char ***argv_ptr);

I'd rather have it somewhere else. The common code should not have 
system specific implementations (that should be applied to stray asm as 
well as this case)

The rest seems ok.

lu
Martin Storsjö April 23, 2011, 9:10 a.m. | #3
On Sat, 23 Apr 2011, Luca Barbato wrote:

> On 4/20/11 3:26 PM, Martin Storsjö wrote:
> > From: Kirill Gavrilov<gavr.mail@gmail.com>
> > 
> > All file names should be in UTF-8 within libavformat.
> > 
> > This is handled by mapping the open() function to an internal one
> > in os_support.h for windows.
> > 
> > fopen() could be overridden in the same way, but if that would be
> > used from ffmpeg.c, it would add a dependency on an ff prefixed
> > internal lavf function.
> > 
> > This doesn't work on Windows 9x, but Win 9x hasn't been supported
> > since fc5607f8620, without anyone complaining (as far as I know).
> > If Win9x compatibility is desired, these codepaths can be skipped
> > at runtime.
> 
> >   cmdutils.c               |   65
> > ++++++++++++++++++++++++++++++++++++++++++++++
> 
> > diff --git a/cmdutils.c b/cmdutils.c
> > index f1cbd55..defe0f6 100644
> > --- a/cmdutils.c
> > +++ b/cmdutils.c
> > @@ -155,6 +155,68 @@ static const OptionDef* find_option(const OptionDef
> > *po, const char *name){
> >       return po;
> >   }
> > 
> > +/**
> > + * Prepare command line arguments for executable.
> > + * For Windows - perform wide-char to UTF-8 conversion.
> > + * Input arguments should be main() function arguments.
> > + * @param argc_ptr Arguments number (including executable)
> > + * @param argv_ptr Arguments list.
> > + */
> > +static void prepare_app_arguments(int *argc_ptr, char ***argv_ptr);
> 
> I'd rather have it somewhere else. The common code should not have system
> specific implementations (that should be applied to stray asm as well as this
> case)

Where would you want this moved, then? A cmdutils_win.c, or a header? For 
stray asm, it's easy to include the right header within #if ARCH_foo. If 
you're ok with keeping all this code in a header, it could be doable, 
otherwise, you'd need to expose the condition (WIN32 && !__MINGW32CE__) to 
make, to enable including the file only there. Or always include the file, 
but only compile the actual content within such #ifs within the file.

// Martin
Luca Barbato April 23, 2011, 7:41 p.m. | #4
On 4/23/11 11:10 AM, Martin Storsjö wrote:
> Where would you want this moved, then? A cmdutils_win.c, or a header? For

A private header might do and seem the simple solution.

> stray asm, it's easy to include the right header within #if ARCH_foo. If
> you're ok with keeping all this code in a header, it could be doable,
> otherwise, you'd need to expose the condition (WIN32&&  !__MINGW32CE__) to
> make, to enable including the file only there. Or always include the file,
> but only compile the actual content within such #ifs within the file.

I hope I'm not causing too much hassle, I'm just concerned to avoid 
other swscale situations.

lu
Ronald Bultje April 23, 2011, 7:56 p.m. | #5
Hi,

On Fri, Apr 22, 2011 at 5:39 PM, Martin Storsjö <martin@martin.st> wrote:
> On Wed, 20 Apr 2011, Martin Storsjö wrote:
>
>> From: Kirill Gavrilov <gavr.mail@gmail.com>
>>
>> All file names should be in UTF-8 within libavformat.
>>
>> This is handled by mapping the open() function to an internal one
>> in os_support.h for windows.
>>
>> fopen() could be overridden in the same way, but if that would be
>> used from ffmpeg.c, it would add a dependency on an ff prefixed
>> internal lavf function.
>
> Any comments on this cleaned up version of the hacks (except for the minor
> issues Diego pointed out, that I fixed locally)?

Personal opinion: for non-win32, please implement it as static inline
void in the header, so that it becomes a nop.

Otherwise, assuming this works: awesome! Thanks for doing this. Even
if it only fixes the issue for 95% of the cases, that's better than
nothing.

Ronald
Martin Storsjö April 23, 2011, 9:09 p.m. | #6
On Sat, 23 Apr 2011, Ronald S. Bultje wrote:

> Hi,
> 
> On Fri, Apr 22, 2011 at 5:39 PM, Martin Storsjö <martin@martin.st> wrote:
> > On Wed, 20 Apr 2011, Martin Storsjö wrote:
> >
> >> From: Kirill Gavrilov <gavr.mail@gmail.com>
> >>
> >> All file names should be in UTF-8 within libavformat.
> >>
> >> This is handled by mapping the open() function to an internal one
> >> in os_support.h for windows.
> >>
> >> fopen() could be overridden in the same way, but if that would be
> >> used from ffmpeg.c, it would add a dependency on an ff prefixed
> >> internal lavf function.
> >
> > Any comments on this cleaned up version of the hacks (except for the minor
> > issues Diego pointed out, that I fixed locally)?
> 
> Personal opinion: for non-win32, please implement it as static inline
> void in the header, so that it becomes a nop.

As discussed on irc, this was a misconception, the thing discussed wasn't 
a function in a header, it was a function in cmdutils.c called only from 
there. Added an inline declaration on the empty function though, and 
pushed.

// Martin
Martin Storsjö April 23, 2011, 9:10 p.m. | #7
On Sat, 23 Apr 2011, Luca Barbato wrote:

> On 4/23/11 11:10 AM, Martin Storsjö wrote:
> > Where would you want this moved, then? A cmdutils_win.c, or a header? For
> 
> A private header might do and seem the simple solution.

Adding a separate header for one single function only defined for one 
single OS feels like overkill. I could do that later if needed - pushed in 
this form now.

// Martin

Patch

diff --git a/cmdutils.c b/cmdutils.c
index f1cbd55..defe0f6 100644
--- a/cmdutils.c
+++ b/cmdutils.c
@@ -155,6 +155,68 @@  static const OptionDef* find_option(const OptionDef *po, const char *name){
     return po;
 }
 
+/**
+ * Prepare command line arguments for executable.
+ * For Windows - perform wide-char to UTF-8 conversion.
+ * Input arguments should be main() function arguments.
+ * @param argc_ptr Arguments number (including executable)
+ * @param argv_ptr Arguments list.
+ */
+static void prepare_app_arguments(int *argc_ptr, char ***argv_ptr);
+
+#if defined(_WIN32) && !defined(__MINGW32CE__)
+/* should be released with av_freep() if needed... */
+static char** win32_argv_utf8 = NULL;
+static int win32_argc = 0;
+
+void prepare_app_arguments(int *argc_ptr, char ***argv_ptr)
+{
+    char *argstr_flat;
+    wchar_t **argv_w;
+    int i, buffsize = 0, offset = 0;
+
+    if (win32_argv_utf8) {
+        *argc_ptr = win32_argc;
+        *argv_ptr = win32_argv_utf8;
+        return;
+    }
+
+    win32_argc = 0;
+    argv_w = CommandLineToArgvW(GetCommandLineW(), &win32_argc);
+    if (win32_argc <= 0 || !argv_w)
+        return;
+
+    /* determine the UTF-8 buffer size (including NULL-termination symbols) */
+    for (i = 0; i < win32_argc; i++)
+        buffsize += WideCharToMultiByte(CP_UTF8, 0, argv_w[i], -1,
+                                        NULL, 0, NULL, NULL);
+
+    win32_argv_utf8 = av_mallocz(sizeof(char*) * (win32_argc + 1) + buffsize);
+    argstr_flat     = (char*)win32_argv_utf8 + sizeof(char*) * (win32_argc + 1);
+    if (win32_argv_utf8 == NULL) {
+        LocalFree(argv_w);
+        return;
+    }
+
+    for (i = 0; i < win32_argc; i++) {
+        win32_argv_utf8[i] = &argstr_flat[offset];
+        offset += WideCharToMultiByte(CP_UTF8, 0, argv_w[i], -1,
+                                      &argstr_flat[offset],
+                                      buffsize - offset, NULL, NULL);
+    }
+    win32_argv_utf8[i] = NULL;
+    LocalFree(argv_w);
+
+    *argc_ptr = win32_argc;
+    *argv_ptr = win32_argv_utf8;
+}
+#else
+void prepare_app_arguments(int *argc_ptr, char ***argv_ptr)
+{
+    /* nothing to do */
+}
+#endif /* WIN32 && !__MINGW32CE__ */
+
 void parse_options(int argc, char **argv, const OptionDef *options,
                    void (* parse_arg_function)(const char*))
 {
@@ -162,6 +224,9 @@  void parse_options(int argc, char **argv, const OptionDef *options,
     int optindex, handleoptions=1;
     const OptionDef *po;
 
+    /* perform system-dependent conversions for arguments list */
+    prepare_app_arguments(&argc, &argv);
+
     /* parse options */
     optindex = 1;
     while (optindex < argc) {
diff --git a/libavformat/os_support.c b/libavformat/os_support.c
index 5a3a1bb..05577b7 100644
--- a/libavformat/os_support.c
+++ b/libavformat/os_support.c
@@ -28,6 +28,34 @@ 
 #include "avformat.h"
 #include "os_support.h"
 
+#if defined(_WIN32) && !defined(__MINGW32CE__)
+#include <windows.h>
+
+#undef open
+int ff_win32_open(const char *filename_utf8, int oflag, int pmode)
+{
+    int fd;
+    int num_chars;
+    wchar_t *filename_w;
+
+    /* convert UTF-8 to wide chars */
+    num_chars = MultiByteToWideChar(CP_UTF8, 0, filename_utf8, -1, NULL, 0);
+    if (num_chars <= 0)
+        return -1;
+    filename_w = av_mallocz(sizeof(wchar_t) * num_chars);
+    MultiByteToWideChar(CP_UTF8, 0, filename_utf8, -1, filename_w, num_chars);
+
+    fd = _wopen(filename_w, oflag, pmode);
+    av_freep(&filename_w);
+
+    /* filename maybe be in CP_ACP */
+    if (fd == -1 && !(oflag & O_CREAT))
+        return open(filename_utf8, oflag, pmode);
+
+    return fd;
+}
+#endif
+
 #if CONFIG_NETWORK
 #include <fcntl.h>
 #include <unistd.h>
diff --git a/libavformat/os_support.h b/libavformat/os_support.h
index dc01e64..521e997 100644
--- a/libavformat/os_support.h
+++ b/libavformat/os_support.h
@@ -45,6 +45,11 @@  static inline int is_dos_path(const char *path)
     return 0;
 }
 
+#if defined(_WIN32) && !defined(__MINGW32CE__)
+int ff_win32_open(const char *filename, int oflag, int pmode);
+#define open ff_win32_open
+#endif
+
 #if CONFIG_NETWORK
 #if !HAVE_SOCKLEN_T
 typedef int socklen_t;