Re: [Info-mtools] segfault in 4.0.29 mcopy
From: Natanael Copa
Date: Mon, 7 Jun 2021 13:48:58 +0200
On Mon, 7 Jun 2021 12:24:17 +0200
Alain Knaff <alain@knaff.lu> wrote:
> Hi,
>
> On 07/06/2021 10:40, Natanael Copa wrote:
> > Hi,
> >
> > I am about to make a release candidate of Alpine Linux 3.14, but ran
> > into a segfault on 32-bit x86:
> >
> > Here is the backtrace:
> >
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from /usr/bin/mcopy...
> > Reading symbols from /usr/lib/debug//usr/bin/mtools.debug...
> >
> > warning: core file may not match specified executable file.
> > [New LWP 7584]
> > Core was generated by `mcopy -i
> > /tmp/mkimage.cEPbHl/image-aa545b554d9d4d9480eab6d0f1edd6ef8b932df4-x86'.
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Is this the full command line?
I don't think it is. I believe the command line from the script was:
mcopy -i ${DESTDIR}/boot/grub/efi.img -s ${DESTDIR}/efi ::
From here:
https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/scripts/mkimg.base.sh#L253
>
> If so, it is strange that this even gets that far, as for me it
> results in usage output, due to the absence of a source and destination.
>
> However, the following works fine here:
>
> ./mcopy -i /tmp/mkimage.cEPbHl/image-aa545b554d9d4d9480eab6d0f1edd6ef8b932df4-x86 /etc/issue ::
> ./mcopy -i /tmp/mkimage.cEPbHl/image-aa545b554d9d4d9480eab6d0f1edd6ef8b932df4-x86 ::issue .
I was able to reproduce it with:
mformat -i /tmp/efi.img -C -f 1440
mcopy -i /tmp/efi.img /etc/issue ::
Segmentation fault
> So, in order to allow me to reproduce this, please send me the full
> command line and all other relevant state (such as the contents of the
> /tmp/mkimage.cEPbHl/image-aa545b554d9d4d9480eab6d0f1edd6ef8b932df4-x86
> image file, if the crash depends on the state of that file), or,
> alternatively, if the file is too large, the steps needed to create it.
The mformatted efi.img is attached.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0 0x566013d2 in force_read (Stream=0xf7ee4711 <get_stride+5>,
> > buf=0xffc7a08c "", start=0, len=512) at force_io.c:61
> ^^^^^^^^^^
> This one is weird. There is no symbol get_stride mentioned anywhere
> in mtools.
>
> I wonder where this is coming from?
>
> On the other hand, it might be worthwhile comparing this with what
> shows up at this point in 4.0.28 (or 4.0.27) (by putting a breakpoint in
> force_read). Maybe this symbol somehow comes from musl libc, and is
> expected?
It seems to come from musl libc's malloc implementation (mallocng):
https://git.musl-libc.org/cgit/musl/tree/src/malloc/mallocng/meta.h#n175
>
>
> > 61 return force_io(Stream, buf, start, len,
> > (gdb) bt
> > #0 0x566013d2 in force_read (Stream=0xf7ee4711 <get_stride+5>,
> > buf=0xffc7a08c "", start=0, len=512) at force_io.c:61
> > #1 0x56601c6c in read_boot (size=512, boot=0xffc7a08c,
> > Stream=<optimized out>) at init.c:50
>
> ^^^^^^^^^^^^^
> OK, so it was an optimized compile. However, even with -O8, I still
> couldn't reproduce this here.
>
> Did you use any other compilation flags which might help me reproduce
> this?
From build log:
gcc -Os -fomit-frame-pointer -DHAVE_CONFIG_H -DSYSCONFDIR=\"/etc/mtools\"
-DCPU_i586 -DVENDOR_alpine -DOS_linux_musl -Os -fomit-frame-pointer -g -Wall
-fno-strict-aliasing -I. -I. -c strtonum.c
gcc -Os -fomit-frame-pointer -DHAVE_CONFIG_H -DSYSCONFDIR=\"/etc/mtools\"
-DCPU_i586 -DVENDOR_alpine -DOS_linux_musl -Os -fomit-frame-pointer -g -Wall
-fno-strict-aliasing -I. -I. -c mkmanifest.c -Os
So the compiler flags are -Os -fomit-frame-pointer (plus -g, -Wall and
-fno-strict-aliasing).
Another thing I discovered is that -DOS_linux_musl does not set the OS_linux
define, which I think it should.
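One possible workaround is sketched below in C, assuming the goal is simply that any build which defines OS_linux_musl also gets OS_linux. The shim and the built_as_linux() helper are hypothetical illustrations, not mtools code; a real fix might equally belong in configure or in a shared config header.

```c
#include <assert.h>

/* Demonstration only: in the real build this macro comes from the
 * compiler command line (-DOS_linux_musl), as in the build log. */
#ifndef OS_linux_musl
#define OS_linux_musl 1
#endif

/* Hypothetical shim: treat a musl-on-Linux build as a Linux build too,
 * so code guarded by #ifdef OS_linux is compiled in for it as well. */
#if defined(OS_linux_musl) && !defined(OS_linux)
#define OS_linux 1
#endif

/* Hypothetical helper: reports whether OS_linux code paths are compiled in. */
static int built_as_linux(void)
{
#ifdef OS_linux
    return 1;
#else
    return 0;
#endif
}
```

Placing the shim before any #ifdef OS_linux checks matters; a definition that appears after those checks would have no effect on them.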
> Another weird thing is that the read_boot parameters are in a different
> order than what is in the 4.0.29 sources...
>
> Again, comparing this with a build of a previous mtools version might be
> helpful here too. Maybe the musl libc toolchain changes the order of
> function parameters in some cases?
I don't know. Here are a few observations that might or might not be relevant:
It didn't happen on x86_64 or on armhf (armv6). It has only been observed on
32-bit x86.
It didn't happen with mtools-4.0.26 and earlier (with Alpine 3.13).
The mallocng implementation in musl has exposed previously unknown bugs, such
as use-after-free, in other open source projects.
Since musl 1.2, time_t is 64-bit even on 32-bit architectures.
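If the 64-bit time_t change were involved, a typical symptom would be code that assumes a time_t fits in a long, which is only 32 bits on i386. This is just a hypothesis; nothing in the backtraces points at time handling. The C sketch below, with hypothetical helper names, checks the relevant widths on a given target and prints a time_t without assuming its width.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <time.h>

/* Only assume time_t is at least 32 bits; on 32-bit x86 with
 * musl >= 1.2 it is expected to be 64 bits. */
static_assert(sizeof(time_t) * CHAR_BIT >= 32, "time_t narrower than 32 bits");

/* Width of time_t in bits on the current target. */
static unsigned time_t_bits(void)
{
    return (unsigned)(sizeof(time_t) * CHAR_BIT);
}

/* Nonzero when storing a time_t in a long could truncate it, e.g.
 * `long t = time(NULL);` or printing a time_t with "%ld". */
static int time_t_wider_than_long(void)
{
    return sizeof(time_t) > sizeof(long);
}

/* Width-independent way to print a time_t. */
static void print_time(time_t t)
{
    printf("%lld\n", (long long)t);
}
```

On an i386 build against musl 1.2, time_t_wider_than_long() would be expected to return 1; on x86_64, where both types are 64 bits, it returns 0.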
>
> [...]
> > #9 0x5660602e in mcopy (argc=5, argv=0xffc7bff4, mtype=0) at mcopy.c:615
>
> OK, so 5 arguments were indeed given => good.
>
> But a short command history leading up to this might still be helpful :-)
>
> > #10 0x565faab2 in main (argc=<optimized out>, argv=0xffc7bff4) at
> > mtools.c:184
> > (gdb)
> >
> >
> > Note that this is built with musl libc.
>
> Where can I most easily get musl libc, and how do I use it (i.e.
> ./configure and make command lines)?
I think the easiest way is with Docker:
cat > Dockerfile <<'EOF'
FROM alpine:edge
ENV ver=4.0.29
RUN apk add build-base
RUN wget ftp://ftp.gnu.org/gnu/mtools/mtools-$ver.tar.bz2
RUN tar -jxf mtools-$ver.tar.bz2
RUN cd mtools-$ver && ./configure && make -j$(nproc)
EOF
I have experimented a bit, and it seems that I cannot reproduce it when I
build without -fomit-frame-pointer. Here is another backtrace without
optimizations (but with -fomit-frame-pointer):
(gdb) run
Starting program: /home/ncopa/aports/main/mtools/src/mtools-4.0.29/mcopy -i /tmp/efi.img /etc/issue ::
Program received signal SIGSEGV, Segmentation fault.
force_read (Stream=0xf7ffaf88, buf=0xffffbc1c "", start=0, len=512) at force_io.c:61
61 return force_io(Stream, buf, start, len,
(gdb) bt
#0 force_read (Stream=0xf7ffaf88, buf=0xffffbc1c "", start=0, len=512) at force_io.c:61
#1 0x56562fc0 in read_boot (Stream=0xf7ffaf88, boot=0xffffbc1c, size=512) at init.c:50
#2 0x565633b1 in try_device (dev=0x56595b80, mode=2, out_dev=0xffffbbd0,
    boot=0xffffbc1c, name=0xffffcc1c "/tmp/efi.img", media=0xffffbba4,
    maxSize=0xffffbbc8, isRop=0xffffba88, try_writable=0,
    errmsg=0xffffba94 "Drive '::' not supported") at init.c:223
#3 0x565636a1 in find_device (drive=58 ':', mode=2, out_dev=0xffffbbd0,
    boot=0xffffbc1c, name=0xffffcc1c "/tmp/efi.img", media=0xffffbba4,
    maxSize=0xffffbbc8, isRop=0x0) at init.c:308
#4 0x5656383d in fs_init (drive=58 ':', mode=2, isRop=0x0) at init.c:357
#5 0x5657acff in open_root_dir (drive=58 ':', flags=2, isRop=0x0) at streamcache.c:69
#6 0x56565951 in common_dos_loop (mp=0xffffd590, pathname=0xffffdd30 "",
    lookupState=0xffffd4e8, open_mode=2) at mainloop.c:452
#7 0x56565a96 in dos_target_lookup (mp=0xffffd590, arg=0xffffdd2e "::") at mainloop.c:486
#8 0x56565c89 in target_lookup (mp=0xffffd590, arg=0xffffdd2e "::") at mainloop.c:536
#9 0x56569bd4 in mcopy (argc=4, argv=0xffffdb84, mtype=0) at mcopy.c:615
#10 0x56577d2d in main (argc=5, argv=0xffffdb84) at mtools.c:184
(gdb)
Here is the output from valgrind:
==37442== Memcheck, a memory error detector
==37442== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==37442== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==37442== Command: ./mcopy -i /tmp/efi.img /etc/issue ::
==37442==
==37442== Conditional jump or move depends on uninitialised value(s)
==37442== at 0x11623C: try_device (init.c:181)
==37442== by 0x1166A0: find_device (init.c:308)
==37442== by 0x11683C: fs_init (init.c:357)
==37442== by 0x12DCFE: open_root_dir (streamcache.c:69)
==37442== by 0x118950: common_dos_loop (mainloop.c:452)
==37442== by 0x118A95: dos_target_lookup (mainloop.c:486)
==37442== by 0x118C88: target_lookup (mainloop.c:536)
==37442== by 0x11CBD3: mcopy (mcopy.c:615)
==37442== by 0x12AD2C: main (mtools.c:184)
==37442==
==37442== Conditional jump or move depends on uninitialised value(s)
==37442== at 0x11629C: try_device (init.c:190)
==37442== by 0x1166A0: find_device (init.c:308)
==37442== by 0x11683C: fs_init (init.c:357)
==37442== by 0x12DCFE: open_root_dir (streamcache.c:69)
==37442== by 0x118950: common_dos_loop (mainloop.c:452)
==37442== by 0x118A95: dos_target_lookup (mainloop.c:486)
==37442== by 0x118C88: target_lookup (mainloop.c:536)
==37442== by 0x11CBD3: mcopy (mcopy.c:615)
==37442== by 0x12AD2C: main (mtools.c:184)
==37442==
==37442== Conditional jump or move depends on uninitialised value(s)
==37442== at 0x11632D: try_device (init.c:208)
==37442== by 0x1166A0: find_device (init.c:308)
==37442== by 0x11683C: fs_init (init.c:357)
==37442== by 0x12DCFE: open_root_dir (streamcache.c:69)
==37442== by 0x118950: common_dos_loop (mainloop.c:452)
==37442== by 0x118A95: dos_target_lookup (mainloop.c:486)
==37442== by 0x118C88: target_lookup (mainloop.c:536)
==37442== by 0x11CBD3: mcopy (mcopy.c:615)
==37442== by 0x12AD2C: main (mtools.c:184)
==37442==
==37442== Use of uninitialised value of size 4
==37442== at 0x115660: force_read (force_io.c:62)
==37442== by 0x115FBF: read_boot (init.c:50)
==37442== by 0x1163B0: try_device (init.c:223)
==37442== by 0x1166A0: find_device (init.c:308)
==37442== by 0x11683C: fs_init (init.c:357)
==37442== by 0x12DCFE: open_root_dir (streamcache.c:69)
==37442== by 0x118950: common_dos_loop (mainloop.c:452)
==37442== by 0x118A95: dos_target_lookup (mainloop.c:486)
==37442== by 0x118C88: target_lookup (mainloop.c:536)
==37442== by 0x11CBD3: mcopy (mcopy.c:615)
==37442== by 0x12AD2C: main (mtools.c:184)
==37442==
==37442== Invalid read of size 4
==37442== at 0x115662: force_read (force_io.c:61)
==37442== by 0x115FBF: read_boot (init.c:50)
==37442== by 0x1163B0: try_device (init.c:223)
==37442== by 0x1166A0: find_device (init.c:308)
==37442== by 0x11683C: fs_init (init.c:357)
==37442== by 0x12DCFE: open_root_dir (streamcache.c:69)
==37442== by 0x118950: common_dos_loop (mainloop.c:452)
==37442== by 0x118A95: dos_target_lookup (mainloop.c:486)
==37442== by 0x118C88: target_lookup (mainloop.c:536)
==37442== by 0x11CBD3: mcopy (mcopy.c:615)
==37442== by 0x12AD2C: main (mtools.c:184)
==37442== Address 0x94ec8 is not stack'd, malloc'd or (recently) free'd
==37442==
==37442==
==37442== Process terminating with default action of signal 11 (SIGSEGV)
==37442== Access not within mapped region at address 0x94EC8
==37442== at 0x115662: force_read (force_io.c:61)
==37442== by 0x115FBF: read_boot (init.c:50)
==37442== by 0x1163B0: try_device (init.c:223)
==37442== by 0x1166A0: find_device (init.c:308)
==37442== by 0x11683C: fs_init (init.c:357)
==37442== by 0x12DCFE: open_root_dir (streamcache.c:69)
==37442== by 0x118950: common_dos_loop (mainloop.c:452)
==37442== by 0x118A95: dos_target_lookup (mainloop.c:486)
==37442== by 0x118C88: target_lookup (mainloop.c:536)
==37442== by 0x11CBD3: mcopy (mcopy.c:615)
==37442== by 0x12AD2C: main (mtools.c:184)
==37442== If you believe this happened as a result of a stack
==37442== overflow in your program's main thread (unlikely but
==37442== possible), you can try to increase the size of the
==37442== main thread stack using the --main-stacksize= flag.
==37442== The main thread stack size used in this run was 8388608.
==37442==
==37442== HEAP SUMMARY:
==37442== in use at exit: 469 bytes in 3 blocks
==37442== total heap usage: 4 allocs, 1 frees, 621 bytes allocated
==37442==
==37442== LEAK SUMMARY:
==37442== definitely lost: 0 bytes in 0 blocks
==37442== indirectly lost: 0 bytes in 0 blocks
==37442== possibly lost: 0 bytes in 0 blocks
==37442== still reachable: 469 bytes in 3 blocks
==37442== suppressed: 0 bytes in 0 blocks
==37442== Rerun with --leak-check=full to see details of leaked memory
==37442==
==37442== Use --track-origins=yes to see where uninitialised values come from
==37442== For lists of detected and suppressed errors, rerun with: -s
==37442== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)
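Valgrind's first three reports are conditional jumps on uninitialised values in try_device (init.c:181, 190 and 208), followed by use of an uninitialised 4-byte value in force_read and an invalid read at 0x94ec8. That pattern is consistent with, though it does not prove, a local variable in try_device (for example the stream pointer later handed to read_boot) not being set on some path. The C sketch below illustrates that general bug class and the usual defensive fix; Stream_t, open_stream() and try_open() are hypothetical names, not mtools code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stream object; not the real mtools type. */
typedef struct Stream {
    const char *name;
} Stream_t;

static Stream_t image = { "/tmp/efi.img" };

/* Hypothetical opener: it only writes *out on success. A caller that
 * never initialised its pointer is left holding stale stack contents
 * after a failure, which can look like a pointer into unrelated code. */
static int open_stream(int ok, Stream_t **out)
{
    assert(out != NULL);
    if (!ok)
        return -1;              /* *out deliberately left untouched */
    *out = &image;
    return 0;
}

/* Defensive pattern: start from NULL and check both the return value
 * and the pointer before dereferencing. */
static const char *try_open(int ok)
{
    Stream_t *s = NULL;
    if (open_stream(ok, &s) != 0 || s == NULL)
        return NULL;
    return s->name;
}
```

Rerunning with valgrind --track-origins=yes, as valgrind itself suggests at the end of its output, should show where the uninitialised value was created.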
-nc
efi.img
Description: application/raw-disk-image