network: use strncmp() to avoid an OOB read

Message ID 1518499094-29659-1-git-send-email-gseanmcg@gmail.com
State New
Headers show
Series
  • network: use strncmp() to avoid an OOB read
Related show

Commit Message

Sean McGovern Feb. 13, 2018, 5:18 a.m.
Using strcmp() with constant arrays in recent versions of GCC,
the compiler will "optimize" the calls to use memcmp() instead.

This can be problematic as some implementations of memcmp() are written
to compare full words at a time which can cause an out-of-bounds read.

Avoid the invalid read by using strncmp() instead.
---
 libavformat/network.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Luca Barbato Feb. 13, 2018, 8:58 a.m. | #1
On 13/02/2018 06:18, Sean McGovern wrote:
> Using strcmp() with constant arrays in recent versions of GCC,
> the compiler will "optimize" the calls to use memcmp() instead.
> 
> This can be problematic as some implementations of memcmp() are written
> to compare full words at a time which can cause an out-of-bounds read.
> 
> Avoid the invalid read by using strncmp() instead.
> ---
>   libavformat/network.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavformat/network.c b/libavformat/network.c
> index 86d7955..2bbbb93 100644
> --- a/libavformat/network.c
> +++ b/libavformat/network.c
> @@ -252,7 +252,7 @@ static int match_host_pattern(const char *pattern, const char *hostname)
>       if (len_p > len_h)
>           return 0;
>       // Simply check if the end of hostname is equal to 'pattern'
> -    if (!strcmp(pattern, &hostname[len_h - len_p])) {
> +    if (!strncmp(pattern, &hostname[len_h - len_p], len_h)) {
>           if (len_h == len_p)
>               return 1; // Exact match
>           if (hostname[len_h - len_p - 1] == '.')
> 

Fine for me.

lu
Martin Storsjö Feb. 13, 2018, 9:05 a.m. | #2
On Tue, 13 Feb 2018, Sean McGovern wrote:

> Using strcmp() with constant arrays in recent versions of GCC,
> the compiler will "optimize" the calls to use memcmp() instead.
>
> This can be problematic as some implementations of memcmp() are written
> to compare full words at a time which can cause an out-of-bounds read.
>
> Avoid the invalid read by using strncmp() instead.
> ---
> libavformat/network.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavformat/network.c b/libavformat/network.c
> index 86d7955..2bbbb93 100644
> --- a/libavformat/network.c
> +++ b/libavformat/network.c
> @@ -252,7 +252,7 @@ static int match_host_pattern(const char *pattern, const char *hostname)
>     if (len_p > len_h)
>         return 0;
>     // Simply check if the end of hostname is equal to 'pattern'
> -    if (!strcmp(pattern, &hostname[len_h - len_p])) {
> +    if (!strncmp(pattern, &hostname[len_h - len_p], len_h)) {
>         if (len_h == len_p)
>             return 1; // Exact match
>         if (hostname[len_h - len_p - 1] == '.')
> -- 
> 2.7.4

Despite the commit message, I don't really understand what's happening. 
Can you give a more detailed explanation? I don't want to obfuscate code 
to dance around optimizations unless I at least understand why and how.

// Martin
Rémi Denis-Courmont Feb. 13, 2018, 4:04 p.m. | #3
Le tiistaina 13. helmikuuta 2018, 7.18.14 EET Sean McGovern a écrit :
> Using strcmp() with constant arrays in recent versions of GCC,
> the compiler will "optimize" the calls to use memcmp() instead.

Does it? How do you reproduce that?

> This can be problematic as some implementations of memcmp() are written
> to compare full words at a time which can cause an out-of-bounds read.

That looks like a compiler bug, and I really do not see how patching one call 
site inside avformat is going to fix that.
Sean McGovern Feb. 13, 2018, 4:16 p.m. | #4
Hi Martin,

On Tue, Feb 13, 2018 at 4:05 AM, Martin Storsjö <martin@martin.st> wrote:
> On Tue, 13 Feb 2018, Sean McGovern wrote:
>
>> Using strcmp() with constant arrays in recent versions of GCC,
>> the compiler will "optimize" the calls to use memcmp() instead.
>>
>> This can be problematic as some implementations of memcmp() are written
>> to compare full words at a time which can cause an out-of-bounds read.
>>
>> Avoid the invalid read by using strncmp() instead.
>> ---
>> libavformat/network.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/libavformat/network.c b/libavformat/network.c
>> index 86d7955..2bbbb93 100644
>> --- a/libavformat/network.c
>> +++ b/libavformat/network.c
>> @@ -252,7 +252,7 @@ static int match_host_pattern(const char *pattern,
>> const char *hostname)
>>     if (len_p > len_h)
>>         return 0;
>>     // Simply check if the end of hostname is equal to 'pattern'
>> -    if (!strcmp(pattern, &hostname[len_h - len_p])) {
>> +    if (!strncmp(pattern, &hostname[len_h - len_p], len_h)) {
>>         if (len_h == len_p)
>>             return 1; // Exact match
>>         if (hostname[len_h - len_p - 1] == '.')
>> --
>> 2.7.4
>
>
> Despite the commit message, I don't really understand what's happening. Can
> you give a more detailed explanation? I don't want to obfuscate code to
> dance around optimizations unless I at least understand why and how.
>
> // Martin
>

I discovered this while doing a full valgrind FATE run on a POWER7
machine -- among others, fate-noproxy failed.

The result for the noproxy test in this case makes me believe it is
using the aforementioned behaviour of memcmp():

==47650== Memcheck, a memory error detector
==47650== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==47650== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==47650== Command: /home/seanmcg/build/libav-gcc7/libavformat/tests/noproxy
==47650==
==47650== Invalid read of size 8
==47650==    at 0x1000646C: match_host_pattern (network.c:255)
==47650==    by 0x1000646C: ff_http_match_no_proxy (network.c:284)
==47650==    by 0x100059CF: test (noproxy.c:25)
==47650==    by 0x100057BB: main (noproxy.c:34)
==47650==  Address 0x4480054 is 20 bytes inside a block of size 23 alloc'd
==47650==    at 0x40861E4: malloc (vg_replace_malloc.c:298)
==47650==    by 0x4088AD3: realloc (vg_replace_malloc.c:785)
==47650==    by 0x100A9A2F: av_realloc (mem.c:116)
==47650==    by 0x100A9A2F: av_strdup (mem.c:215)
==47650==    by 0x10006267: ff_http_match_no_proxy (network.c:272)
==47650==    by 0x100059CF: test (noproxy.c:25)
==47650==    by 0x100057BB: main (noproxy.c:34)
==47650==

<..snipped for brevity, pattern repeats for each test..>

I found the following bug reports which seemed relevant in the GCC Bugzilla:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257

I don't think this is a compiler bug or cargo-culting, although Clang
does not appear to exhibit this behaviour.

-- Sean McG.
Rémi Denis-Courmont Feb. 13, 2018, 4:23 p.m. | #5
Le tiistaina 13. helmikuuta 2018, 7.18.14 EET Sean McGovern a écrit :
> Using strcmp() with constant arrays in recent versions of GCC,
> the compiler will "optimize" the calls to use memcmp() instead.
> 
> This can be problematic as some implementations of memcmp() are written
> to compare full words at a time which can cause an out-of-bounds read.
> 
> Avoid the invalid read by using strncmp() instead.
> ---
>  libavformat/network.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavformat/network.c b/libavformat/network.c
> index 86d7955..2bbbb93 100644
> --- a/libavformat/network.c
> +++ b/libavformat/network.c
> @@ -252,7 +252,7 @@ static int match_host_pattern(const char *pattern, const
> char *hostname) if (len_p > len_h)
>          return 0;
>      // Simply check if the end of hostname is equal to 'pattern'
> -    if (!strcmp(pattern, &hostname[len_h - len_p])) {
> +    if (!strncmp(pattern, &hostname[len_h - len_p], len_h)) {
>          if (len_h == len_p)
>              return 1; // Exact match
>          if (hostname[len_h - len_p - 1] == '.')

By the way, the code does not even look logically equivalent, depending on the 
value on len_h.
Rémi Denis-Courmont Feb. 13, 2018, 4:31 p.m. | #6
Le tiistaina 13. helmikuuta 2018, 18.16.55 EET Sean McGovern a écrit :
> I discovered this while doing a full valgrind FATE run on a POWER7
> machine -- among others, fate-noproxy failed.
> 
> The result for the noproxy test in this case makes me believe it is
> using the aforementioned behaviour of memcmp():
> 
> ==47650== Memcheck, a memory error detector
> ==47650== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==47650== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
> ==47650== Command: /home/seanmcg/build/libav-gcc7/libavformat/tests/noproxy
> ==47650==
> ==47650== Invalid read of size 8
> ==47650==    at 0x1000646C: match_host_pattern (network.c:255)
> ==47650==    by 0x1000646C: ff_http_match_no_proxy (network.c:284)
> ==47650==    by 0x100059CF: test (noproxy.c:25)
> ==47650==    by 0x100057BB: main (noproxy.c:34)
> ==47650==  Address 0x4480054 is 20 bytes inside a block of size 23 alloc'd
> ==47650==    at 0x40861E4: malloc (vg_replace_malloc.c:298)
> ==47650==    by 0x4088AD3: realloc (vg_replace_malloc.c:785)
> ==47650==    by 0x100A9A2F: av_realloc (mem.c:116)
> ==47650==    by 0x100A9A2F: av_strdup (mem.c:215)
> ==47650==    by 0x10006267: ff_http_match_no_proxy (network.c:272)
> ==47650==    by 0x100059CF: test (noproxy.c:25)
> ==47650==    by 0x100057BB: main (noproxy.c:34)
> ==47650==
> 
> <..snipped for brevity, pattern repeats for each test..>
> 
> I found the following bug reports which seemed relevant in the GCC Bugzilla:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 and
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257
> 
> I don't think this is a compiler bug or cargo-culting, although Clang
> does not appear to exhibit this behaviour.

This is not so much a GCC bug as an incompatibility between GCC and valgrind, 
likely because not many people care about POWER:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80479
https://bugs.kde.org/show_bug.cgi?id=386945

I don't think this patch is valid anyway, and even if it were, I don't think 
it is a sane thing to do. Why replace just that one call site? And what about 
all other <string.h> that GCC has built-ins for? By this line of thinking, you 
should reinvent much of <string.h> - just because valgrind breaks on POWER.
Hendrik Leppkes Feb. 13, 2018, 4:31 p.m. | #7
On Tue, Feb 13, 2018 at 5:16 PM, Sean McGovern <gseanmcg@gmail.com> wrote:
>
> I discovered this while doing a full valgrind FATE run on a POWER7
> machine -- among others, fate-noproxy failed.
>
> The result for the noproxy test in this case makes me believe it is
> using the aforementioned behaviour of memcmp():
>
> ==47650== Memcheck, a memory error detector
> ==47650== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==47650== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
> ==47650== Command: /home/seanmcg/build/libav-gcc7/libavformat/tests/noproxy
> ==47650==
> ==47650== Invalid read of size 8
> ==47650==    at 0x1000646C: match_host_pattern (network.c:255)
> ==47650==    by 0x1000646C: ff_http_match_no_proxy (network.c:284)
> ==47650==    by 0x100059CF: test (noproxy.c:25)
> ==47650==    by 0x100057BB: main (noproxy.c:34)
> ==47650==  Address 0x4480054 is 20 bytes inside a block of size 23 alloc'd
> ==47650==    at 0x40861E4: malloc (vg_replace_malloc.c:298)
> ==47650==    by 0x4088AD3: realloc (vg_replace_malloc.c:785)
> ==47650==    by 0x100A9A2F: av_realloc (mem.c:116)
> ==47650==    by 0x100A9A2F: av_strdup (mem.c:215)
> ==47650==    by 0x10006267: ff_http_match_no_proxy (network.c:272)
> ==47650==    by 0x100059CF: test (noproxy.c:25)
> ==47650==    by 0x100057BB: main (noproxy.c:34)
> ==47650==
>
> <..snipped for brevity, pattern repeats for each test..>
>
> I found the following bug reports which seemed relevant in the GCC Bugzilla:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 and
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257
>
> I don't think this is a compiler bug or cargo-culting, although Clang
> does not appear to exhibit this behaviour.
>

How is this not a compiler bug if valid C code results in an
out-of-bounds access due to some optimization?

- Hendrik
Sean McGovern Feb. 13, 2018, 4:35 p.m. | #8
Hi Rémi,

On Tue, Feb 13, 2018 at 11:31 AM, Rémi Denis-Courmont <remi@remlab.net> wrote:
> Le tiistaina 13. helmikuuta 2018, 18.16.55 EET Sean McGovern a écrit :
>> I discovered this while doing a full valgrind FATE run on a POWER7
>> machine -- among others, fate-noproxy failed.
>>
>> The result for the noproxy test in this case makes me believe it is
>> using the aforementioned behaviour of memcmp():
>>
>> ==47650== Memcheck, a memory error detector
>> ==47650== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
>> ==47650== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
>> ==47650== Command: /home/seanmcg/build/libav-gcc7/libavformat/tests/noproxy
>> ==47650==
>> ==47650== Invalid read of size 8
>> ==47650==    at 0x1000646C: match_host_pattern (network.c:255)
>> ==47650==    by 0x1000646C: ff_http_match_no_proxy (network.c:284)
>> ==47650==    by 0x100059CF: test (noproxy.c:25)
>> ==47650==    by 0x100057BB: main (noproxy.c:34)
>> ==47650==  Address 0x4480054 is 20 bytes inside a block of size 23 alloc'd
>> ==47650==    at 0x40861E4: malloc (vg_replace_malloc.c:298)
>> ==47650==    by 0x4088AD3: realloc (vg_replace_malloc.c:785)
>> ==47650==    by 0x100A9A2F: av_realloc (mem.c:116)
>> ==47650==    by 0x100A9A2F: av_strdup (mem.c:215)
>> ==47650==    by 0x10006267: ff_http_match_no_proxy (network.c:272)
>> ==47650==    by 0x100059CF: test (noproxy.c:25)
>> ==47650==    by 0x100057BB: main (noproxy.c:34)
>> ==47650==
>>
>> <..snipped for brevity, pattern repeats for each test..>
>>
>> I found the following bug reports which seemed relevant in the GCC Bugzilla:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 and
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257
>>
>> I don't think this is a compiler bug or cargo-culting, although Clang
>> does not appear to exhibit this behaviour.
>
> This is not so much a GCC bug as an incompatibility between GCC and valgrind,
> likely because not many people care about POWER:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80479
> https://bugs.kde.org/show_bug.cgi?id=386945
>
> I don't think this patch is valid anyway, and even if it were, I don't think
> it is a sane thing to do. Why replace just that one call site? And what about
> all other <string.h> that GCC has built-ins for? By this line of thinking, you
> should reinvent much of <string.h> - just because valgrind breaks on POWER.
>

OK. Fair enough.

I will drop this patch.

Sorry for wasting people's time.

-- Sean McG.
wm4 Feb. 13, 2018, 4:43 p.m. | #9
On Tue, 13 Feb 2018 17:31:56 +0100
Hendrik Leppkes <h.leppkes@gmail.com> wrote:

> On Tue, Feb 13, 2018 at 5:16 PM, Sean McGovern <gseanmcg@gmail.com> wrote:
> >
> > I discovered this while doing a full valgrind FATE run on a POWER7
> > machine -- among others, fate-noproxy failed.
> >
> > The result for the noproxy test in this case makes me believe it is
> > using the aforementioned behaviour of memcmp():
> >
> > ==47650== Memcheck, a memory error detector
> > ==47650== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> > ==47650== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
> > ==47650== Command: /home/seanmcg/build/libav-gcc7/libavformat/tests/noproxy
> > ==47650==
> > ==47650== Invalid read of size 8
> > ==47650==    at 0x1000646C: match_host_pattern (network.c:255)
> > ==47650==    by 0x1000646C: ff_http_match_no_proxy (network.c:284)
> > ==47650==    by 0x100059CF: test (noproxy.c:25)
> > ==47650==    by 0x100057BB: main (noproxy.c:34)
> > ==47650==  Address 0x4480054 is 20 bytes inside a block of size 23 alloc'd
> > ==47650==    at 0x40861E4: malloc (vg_replace_malloc.c:298)
> > ==47650==    by 0x4088AD3: realloc (vg_replace_malloc.c:785)
> > ==47650==    by 0x100A9A2F: av_realloc (mem.c:116)
> > ==47650==    by 0x100A9A2F: av_strdup (mem.c:215)
> > ==47650==    by 0x10006267: ff_http_match_no_proxy (network.c:272)
> > ==47650==    by 0x100059CF: test (noproxy.c:25)
> > ==47650==    by 0x100057BB: main (noproxy.c:34)
> > ==47650==
> >
> > <..snipped for brevity, pattern repeats for each test..>
> >
> > I found the following bug reports which seemed relevant in the GCC Bugzilla:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 and
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257
> >
> > I don't think this is a compiler bug or cargo-culting, although Clang
> > does not appear to exhibit this behaviour.
> >  
> 
> How is this not a compiler bug if valid C code results in an
> out-of-bounds access due to some optimization?

Isn't it just because they use SIMD (as long as they're within page
boundaries)? Of course it's going to set up some analyzers if they're
not aware of it.
Rémi Denis-Courmont Feb. 13, 2018, 4:49 p.m. | #10
Le tiistaina 13. helmikuuta 2018, 18.31.56 EET Hendrik Leppkes a écrit :
> How is this not a compiler bug if valid C code results in an
> out-of-bounds access due to some optimization?

GCC assumes that what you call "out-of-bounds" reads are legal as long as they 
do not straddle a page boundary. This does not lead to any observable side 
effects, in the sense of the ISO C specification, is a valid assumption with 
the targetted ISA. This enables GCC to use SIMD heavily.

The problem here is a disagreement between valgrind and GCC memory models.

Patch

diff --git a/libavformat/network.c b/libavformat/network.c
index 86d7955..2bbbb93 100644
--- a/libavformat/network.c
+++ b/libavformat/network.c
@@ -252,7 +252,7 @@  static int match_host_pattern(const char *pattern, const char *hostname)
     if (len_p > len_h)
         return 0;
     // Simply check if the end of hostname is equal to 'pattern'
-    if (!strcmp(pattern, &hostname[len_h - len_p])) {
+    if (!strncmp(pattern, &hostname[len_h - len_p], len_h)) {
         if (len_h == len_p)
             return 1; // Exact match
         if (hostname[len_h - len_p - 1] == '.')