Understanding Valgrind errors (1)
While debugging segmentation faults (crashes) in PHP and its extensions, I often use Valgrind to assist me finding the root cause. Valgrind is an instrumentation framework for building dynamic analysis tools. It contains several tools, and its Memcheck tool is the one that detects memory-management problems. Memcheck is really valuable for C and C++ developers and something you should learn, especially when you write PHP extensions.
Memcheck's error messages can sometimes be difficult to understand, so with this (infrequent series), I hope to shed some light on it.
Let's have a look at the following Valgrind error output, which I encountered while debugging issue PHP-963 of the MongoDB driver for PHP:
==18500== Invalid read of size 8 ==18500== at 0xC5FB7F1: zim_MongoCursor_info (cursor.c:866) ==18500== by 0x9AF93F: execute_internal (zend_execute.c:1480) ==18500== by 0xBF1B8FD: xdebug_execute_internal (xdebug.c:1565) ==18500== by 0x9B0715: zend_do_fcall_common_helper_SPEC (zend_vm_execute.h:645) ==18500== by 0x9B0D88: ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER (zend_vm_execute.h:756) ==18500== by 0x9AFCAA: execute (zend_vm_execute.h:410) ==18500== by 0xBF1B47C: xdebug_execute (xdebug.c:1453) ==18500== by 0x976CC6: zend_execute_scripts (zend.c:1315) ==18500== by 0x8F3340: php_execute_script (main.c:2502) ==18500== by 0xA172B9: do_cli (php_cli.c:989) ==18500== by 0xA1825E: main (php_cli.c:1365) ==18500== Address 0x38 is not stack'd, malloc'd or (recently) free'd
An Invalid read means that the memory location that the process was trying to read is outside of the memory addresses that are available to the process. size 8 means that the process was trying to read 8 bytes. On 64-bit platforms this could be a pointer, but also for example a long int.
The last line of the error report says Address 0x38 is not stack'd, malloc'd
or (recently) free'd, which means that the address that the process was trying to read 8 bytes from starts at 0x38. The line also says that the address is unavailable through stack space, heap space (malloc), or that it was recently a valid memory location.
A very low address, such as 0x38 (56 in decimal), combined with size 8 often indicates that you tried to dereference an element of a struct which was pointed to by a NULL pointer.
When I checked line 866 of cursor.c I saw:
add_assoc_string(return_value, "server", cursor->connection->hash, 1);
cursor is a struct of type mongo_cursor which is defined as:
typedef struct {
zend_object std;
/* Connection */
mongo_connection *connection;
…
} mongo_cursor;
The zend_object std; is a general struct that is part of every overloaded object (overloaded in the PHP Internals sense) and contains four pointers that make up the first 32 bytes. This means that connection starts at offset 32.
The connection member is a struct of type mongo_connection that is defined as:
typedef struct _mongo_connection
{
time_t last_ping; // 0
int ping_ms; // 8
int last_ismaster; // 12
int last_reqid; // 16
void *socket; // 24
int connection_type; // 32
int max_bson_size; // 36
int max_message_size; // 40
int tag_count; // 44
char **tags; // 48
char *hash; // 56
mongo_connection_deregister_callback *cleanup_list; // 64
} mongo_connection;
I've added the offsets as they are on a 64-bit platform as comments.
In C the code cursor->connection->hash gets converted to an address as follows:
-
Take the address in
cursor -
Find the offset for the
connectionmember (32) -
Find the address that is stored at offset 32 (In my example, that'd be
NULL) -
Find the offset of
hashin the struct thatconnectionrepresents (56) -
Read the pointer data (8 bytes data) at the address from step 3 plus the offset of step 4.
This reads 8 bytes of data from address 56 (0x38) which is not valid, and hence Valgrind produces the error message, and then kills the process.
In short, an error where Address 0x38 is not stack'd, malloc'd or (recently)
free'd has a low address in the message often means a NULL-pointer dereference.
Comments
Thanks for sharing this insight! I'm new to c and there's something I do not understand about the offsets. I wonder why some mongo_connection members of type int are counted as 4 bytes and others as 8 bytes. How is this possible?
@Jaap: An int is always 4 bytes (on my platform), but offsets also need to be aligned. This means that a data object that is 4 bytes large, needs to start on an address divisible by 4. And if a data object is 8 bytes large(such as a pointer), it needs to start on an address divisible by 8.
last_reqid seems to take up 8 bytes because of this, as the void* for socket is 8 bytes long and hence cannot start on offset 20.
Thanks for pointing that out!
Are all these structs and types (cursor, mongo_cursor etc.) in the same file cursor.c. If not do they belong to the files defined as "by" line in the Valgrind output.
They are not all in the same file. The "by" lines just indicate the current stack, and not where structs live.
Life Line
Created 2 waste_baskets, 2 boards, and 2 benches; Updated 8 benches, 6 waste_baskets, and 2 other objects; Deleted 2 waste_baskets and a bench
Updated a pub
Merge branch 'v2022'
Merge branch 'php_gh_19803' into v2022
Fix PHP GH-19803: Parsing a string with a single white space does cre…
Merge branch 'xdebug_3_5'
Back to -dev
Go with 3.5.1
Thanks Brussels, you've been lovely to me this weekend.
And I've a lot to process from the conference too.
I walked 3.9km in 36m24s
I walked 1.1km in 11m31s
Updated a restaurant; Confirmed 3 fast_foods, a convenience shop, and 2 other objects
I walked 4.4km in 44m06s
I walked 5.1km in 56m16s
I walked 3.8km in 37m43s
I'm up early to get the Eurostar to #FOSDEM and I've already seen friends being made.
Fixed another address
Fixed address for flats
Fixed address, and ass missing service way
I walked 6.4km in 1h20m58s
I walked 11.4km in 9m22s
I walked 10.4km in 1h41m18s
I walked 7.4km in 1h26m49s
Good job antivaxer shit heads:
BBC News: UK loses measles elimination status
I walked 8.2km in 1h24m04s


Shortlink
This article has a short URL available: https://drck.me/valgrind1-aum