Issue
Remote debugging code running in QEMU with GDB, based on an os-dev tutorial.
My version is here. The problem only happens when remote-debugging code inside qemu, not when building a normal executable to run directly inside GDB under the normal OS.
Code looks something like this:
#define BUFSIZE 255
static char buf[BUFSIZE];
void foo() {
    // Making sure it's all zero.
    for (int i = 0; i < BUFSIZE; i++) buf[i] = 0;
    // Setting first char:
    buf[0] = 'a';
    // >> insert breakpoint right after setting the char <<
    // Prints 'a'.
    printf("%s", buf);
}
If I place a breakpoint at the marked spot and print the buffer with p buf, I get random values from random places, seemingly from my code section. If I get the address with p &buf, I get something that does not look correct, for two reasons:
- If I do char* p_buf = buf and check the address with p p_buf, it gives me a totally different address, which is stable across executions (the other was not). When I inspect that memory with x /255b 0x____, I can see the a followed by zeros (97 0 0 0 ... 0).
- The next command (printf("%s", buf);) does actually print a.
This leads me to believe that GDB does not know the correct location when I inspect the static variable directly.
Where should I start debugging this?
Details about the compile conditions:
- Compile flags: -g -Wall -Wextra -pedantic -nostdlib -nostdinc -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs -m32
- qemu-system-i386
- Gcc: i386 elf target
Example output from GDB:
(gdb) p buf
$1 = "dfghjkl;'`\000\\zxcvbnm,./\000*\000 ", '\000' <repeats 198 times>...
(gdb) p p_buf
$2 = 0x40c0 <buf+224> "a"
(gdb) p &buf
$3 = (char (*)[255]) 0x3fe0 <buf>
(gdb) info address buf
Symbol "buf" is static storage at address 0x3fe0.
Update 2:
Disassembled a version of the code that shows the discrepancy:
; void foo
0x19f1 <foo> push %ebp
0x19f2 <foo+1> mov %esp,%ebp
0x19f4 <foo+3> sub $0x10,%esp
; char* p_buf = char_buf; --> `p &char_buf` is 0x4040 (incorrect) but `p p_buf` is 0x4100
0x19f7 <foo+6> movl $0x4100,-0x4(%ebp)
; void* p_p_buf = (void*)p_buf; --> `p p_p_buf` gives 0x4100
0x19fe <foo+13> mov -0x4(%ebp),%eax
0x1a01 <foo+16> mov %eax,-0x8(%ebp)
; void* p_char_buf = (void*)&char_buf; --> `p p_char_buf` gives 0x4100
0x1a04 <foo+19> movl $0x4100,-0xc(%ebp)
; char_buf[0] = 'a'; --> correct address
0x1a0b <foo+26> movb $0x61,0x4100
; char_buf[1] = 'b'; --> correct address (asking `p &char_buf` here is still incorrectly 0x4040)
0x1a12 <foo+33> movb $0x62,0x4101
; void foo return
0x1a19 <foo+40> nop
0x1a1a <foo+41> leave
0x1a1b <foo+42> ret
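For reference, the gap between the two addresses can be checked with a little arithmetic. The sketch below is only an illustration: addr_delta is a hypothetical helper that is not part of the kernel, and the constants in the comment are copied from the GDB output above.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the kernel): byte distance between the
 * address the running code actually uses and the address GDB reports for
 * the symbol. */
int32_t addr_delta(uint32_t runtime_addr, uint32_t symbol_addr) {
    return (int32_t)(runtime_addr - symbol_addr);
}

/* First session:  addr_delta(0x40c0, 0x3fe0) gives 224, matching <buf+224>.
 * Second session: addr_delta(0x4100, 0x4040) gives 192. */
```

The offset differs between the two builds (224 vs. 192 bytes), so it is not a fixed displacement.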
My Makefile for building the project looks like:
C_SOURCES = $(wildcard kernel/*.c drivers/*.c)
C_HEADERS = $(wildcard kernel/*.h drivers/*.h)
OBJ = ${C_SOURCES:.c=.o kernel/interrupt_table.o}
CC = /home/itarato/code/os/i386elfgcc/bin/i386-elf-gcc
# GDB = /home/itarato/code/os/i386elfgcc/bin/i386-elf-gdb
GDB = /usr/bin/gdb
CFLAGS = -g -Wall -Wextra -ffreestanding -fno-exceptions -pedantic -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs -m32
QEMU = qemu-system-i386
os-image.bin: boot/boot.bin kernel.bin
	cat $^ > $@
kernel.bin: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^ --oformat binary
kernel.elf: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^
kernel.dis: kernel.bin
	ndisasm -b 32 $< > $@
run: os-image.bin
	${QEMU} -drive format=raw,media=disk,file=$<,index=0,if=floppy
debug: os-image.bin kernel.elf
	${QEMU} -s -S -drive format=raw,media=disk,file=$<,index=0,if=floppy &
	${GDB} -ex "target remote localhost:1234" -ex "symbol-file kernel.elf" -ex "tui enable" -ex "layout split" -ex "focus cmd"
%.o: %.c ${C_HEADERS}
	${CC} ${CFLAGS} -c $< -o $@
%.o: %.asm
	nasm $< -f elf -o $@
%.bin: %.asm
	nasm $< -f bin -o $@
build: os-image.bin
	echo Pass
clean:
	rm -rf *.bin *.o *.dis *.elf
	rm -rf kernel/*.o boot/*.bin boot/*.o
Solution
This is an interesting problem. It comes down to the fact that the code LD (the linker) generates for the ELF executable kernel.elf differs from the code it generates for kernel.bin when the --oformat binary option is used. One would expect the two to be the same, but they are not.
Put more simply, these two Makefile rules do not produce the same code, even though you might expect them to:
kernel.elf: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^
and
kernel.bin: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^ --oformat binary
It appears the difference is in how the linker aligns the sections with and without --oformat binary. The ELF file (and therefore the symbols GDB uses for debugging) places code and data at one set of addresses, while the binary file that actually runs in QEMU has code and data at different offsets.
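To see how an alignment difference alone can shift every later address, consider the usual round-up computation used when placing a section. This is a generic sketch, not LD's actual layout logic; align_up is a hypothetical helper, and the numbers in the comment are made up for illustration rather than taken from either link.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * align must be a power of two. */
uint32_t align_up(uint32_t addr, uint32_t align) {
    return (addr + align - 1u) & ~(align - 1u);
}

/* Illustrative values only: if the preceding section ends at 0x2345,
 * a 4-byte alignment starts the next section at 0x2348, while a 32-byte
 * alignment starts it at 0x2360. Every symbol in that section then
 * differs by 0x18 between the two layouts. */
```

A debugger reading symbols from one layout while the CPU runs the other would show this kind of constant offset for every variable in the affected section.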
I hadn't ever observed this issue because I use my own linker scripts and I always generate the binary file from the ELF executable with OBJCOPY rather than using LD to link twice. OBJCOPY can take an ELF executable and convert it to a binary file. The Makefile rules could be amended to look like:
kernel.bin: kernel.elf
	i386-elf-objcopy -O binary $^ $@
kernel.elf: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^
Doing it this way will ensure the binary file that is generated matches what was produced for the ELF executable.
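If you want to check this yourself, one option is to keep the old ld --oformat binary output under a different name and compare it with the objcopy output. Below is a minimal sketch of that comparison, assuming both files have already been read into memory. first_mismatch is a hypothetical helper; the standard cmp utility does the same job from the shell.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte at which two
 * images differ, or -1 if they are identical. If one image is a strict
 * prefix of the other, the shorter length is returned. */
long first_mismatch(const unsigned char *a, size_t a_len,
                    const unsigned char *b, size_t b_len) {
    size_t n = a_len < b_len ? a_len : b_len;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) return (long)i;
    }
    return a_len == b_len ? -1L : (long)n;
}
```

The returned offset shows where the two layouts start to diverge; -1 means the images are byte-identical.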
Answered By - Michael Petch