Ok, I dunno about your leak… but I can help you with interpreting that valgrind output. I’ll do it here instead of in a PM because I think it might be helpful for others to see the debugging process…
The first thing Valgrind is telling you that you the statement on line comm.c:729 is trying to write one byte to a piece of memory that valgrind knows has already been free’d. From looking at the source line, it is trying to write a ‘\0’ character to the first byte of the (struct descriptor_data *)d->incom array of characters. Since that array is a statically allocated part of your descriptor_data and not a pointer to another string somewhere, it means that the code has somehow managed to be working on a descriptor_data item that it must have gotten out of descriptor_list, but that’s already supposed to have been removed from the list somehow. How can that happen?
The next thing Valgrind is telling you is what free’d that memory block. It’s giving you the whole call stack of what did the free, and you can see that it was was a close_socket() called by do_quit(). That makes sense, because the write error happens when a character types “quit”. So why is that a problem?
The last thing Valgrind gives you is what originally allocated the memory block that comm.c:729 is trying to write into. From the call trace, the block came from new_descriptor. That implies the memory block was at one time a valid, allocated (struct descriptor_data). Its not like comm.c:729 is completely off in the weeds- its working on something that used to be a descriptor_data, but suddenly isn’t anymore. That is likely because of some order of operations issue. What could it be?
Here’s what it probably is.
game_loop_unix() has a few loops that iterate over the descriptor list. It loops over the list to process new socket descriptors, then it loops to handle any sockets with errors, it loops to process input, and it loops to process output. Those loops are all written to hold the current and next item of the linked descriptor_data in the variables d and d_next. It does it so that if it has to call close_socket(d) and the (struct descriptor_data*) pointed to by d suddenly gets free’d, the loop can still remembers what the value of d->next was supposed to be without having access to d anymore.
However, doing it that way implies that the ONLY thing that can modify the descriptor_list is that loop itself, otherwise its values for what the current and next items in the list are (d and d_next) may not be accurate.
And the smoking gun? do_quit() calling close_socket() from inside the input processing loop. Close socket free’s the descriptor_data, but it doesn’t have any way to let the parent callers know it happened. game_loop_unix() has no way of knowing that calling substitute_alias() had the effect of free’ing the descriptor_data pointed to by d, so game_loop_unix() has no way of knowing that it can’t use the data pointed to by d again like it tries to do in comm.c:729. That’s your invalid write.
There are several ways to solve it… an obvious but complex one is to have your callers provide a return value that can be used to indicate that the descriptor_data is gone. A much easier one might be to add a new value like CON_CLOSING do your d->connected types. Have do_quit() set that connected type instead of calling close_socket() directly, and then have game_loop_unix() check for that change of connected-ness right after input processing, and let it handle the close_socket() instead.
Hope that helps.