gfapi: drain open fds in glfs_fini before graph teardown#4669
Open
ThalesBarretto wants to merge 2 commits into
Open
gfapi: drain open fds in glfs_fini before graph teardown#4669ThalesBarretto wants to merge 2 commits into
ThalesBarretto wants to merge 2 commits into
Conversation
glfs_open() and 30+ other gfapi entry points can succeed while glfs_fini() is tearing down the xlator graph, because __GLFS_ENTRY_VALIDATE_FS only checks if the fs pointer is NULL. There is no shutdown gate. A thread calling glfs_open() between fini's drain-loop exit and inode_table_destroy_all() will create a file descriptor referencing freed inodes and xlator private data — a use-after-free. This race has been discussed informally in the GlusterFS community for over 10 years (see issue gluster#404, "Implement proper cleanup sequence", filed 2018 and closed without resolution). The drain loop comment at line 1303 ("leaked frames may exist, we ignore") acknowledges the symptom. A TLA+ formal model of the gfapi session lifecycle proves the race exists at protocol level: splitting glfs_fini into two non-atomic steps (set flag, transition state) and adding a ghost variable to track "stale fds" (fds opened after shutdown begins) shows an invariant violation at depth 5. The real C code is worse than the model — no flag is set at fini entry at all. Fix: add a shutting_down flag to struct glfs, set it at the top of pub_glfs_fini (before the drain loop), and check it at two choke points: 1. __GLFS_ENTRY_VALIDATE_FS (fast path, no mutex — safe because the flag is monotonic: only transitions false -> true) 2. priv_glfs_active_subvol (authoritative gate, under fs->mutex) All 30+ gfapi entry points flow through both checks. Cleanup paths (glfs_fd_destroy, glfs_subvol_done) use glfs_lock with _gf_false and are unaffected. Since glfs_fini itself would be rejected by the new gate in priv_glfs_active_subvol, replace its call to glfs_active_subvol with a direct read of fs->active_subvol under the mutex. The old_subvol handling from priv_glfs_active_subvol is mirrored so a pending old graph still gets its PARENT_DOWN. This patch addresses Bug A (no new operations after fini starts). Bug B (in-flight operations between drain-loop exit and graph destruction) is a separate, more complex fix that requires a barrier or re-check mechanism. Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com> # api/src/glfs-internal.h | 5 +++++ # api/src/glfs-resolve.c | 11 +++++++++++ # api/src/glfs.c | 43 ++++++++++++++++++++++++++++++++++++++++++- # 3 files changed, 58 insertions(+), 1 deletion(-)
glfs_fini() never iterates fs->openfds. If a consumer calls fini
with open file descriptors, inode_table_destroy_all force-frees
inodes ("approach 2", inode.c:1835) while fd_t objects still hold
references. The result is dangling pointers, leaked glfs_fd_t
objects, and unbounded memory growth.
Add glfs_drain_openfds() which pops each fd from the openfds list,
marks it GLFD_CLOSE, releases the fd_t's inode ref via fd_unref
while the inode table is still alive, NULLs glfd->fd, and calls
GF_REF_PUT to trigger destruction.
The fd_unref+NULL step is critical: it makes the drain safe even
when a leaked frame holds an extra GF_REF_GET (refcount > 1).
Without it, the drain would convert a silent leak into a
use-after-free. With the NULL, any later glfs_fd_destroy from a
leaked frame's PUT skips fd_unref and does a clean GF_FREE.
TLA+ model (gfapi_fd_drain_actual.tla) verified by TLC:
ALL 6 invariants HOLD (129 states, 56 distinct, depth 9).
Three-iteration verification loop discovered and fixed the
fd_unref+NULL regression before it shipped.
This is Fix B (companion to Fix A in PR gluster#4667). Fix A blocks
new opens during fini (Bug A). This fix drains pre-existing
opens that were never closed (Bug B).
Fixes: gluster#4668
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
glfs_fini()never iteratesfs->openfdsto close file descriptors before tearing down the xlator graph. If a consumer callsglfs_fini()with open fds,inode_table_destroy_allforce-frees inodes ("approach 2") whilefd_tobjects still hold references, causing dangling pointers and leakedglfs_fd_tobjects.This patch adds
glfs_drain_openfds()which:fs->openfdsunderglfs_lockGLFD_CLOSEand callsgf_dirent_freefd_unref(glfd->fd)to release the inode ref while the table is aliveglfd->fdso eventual leaked-frame destroy is a cleanGF_FREEGF_REF_PUT(glfd)to trigger destructiongf_smsgat WARNING levelDepends on: PR #4667 (the
shutting_downgate — Fix A). This PR is Fix B.Relationship to #4667
shutting_downgate blocks new opensfs->openfdsand close each fdTLA+ Formal Verification
Four TLA+ models verify the fix across 8 TLC runs:
NoLeakedFdsNoLeakedFdsThe fd_unref+NULL mitigation is critical: without it, the drain converts a silent leak into a use-after-free when a leaked frame holds an extra
GF_REF_GET. With the NULL, the eventualglfs_fd_destroyskipsfd_unrefand does a cleanGF_FREE.Full models and investigation: https://gist.github.com/ThalesBarretto/2b0873838e451725617dc492e304d1a2
Fixes: #4668
Evidence
pub_glfs_fininever referencesopenfds(exhaustive grep, 10 hits inapi/src/, none in fini)inode_table_destroyforce-frees atinode.c:1898, destroysfd_mem_poolatinode.c:1908glfs_fini: "racy and crashes" (3b90883ce7, 2019)Test plan
--enable-debug, verify no compiler warningsprove -vmfe '/bin/bash' tests/basic/gfapi/(existing gfapi tests)glfs_new→glfs_init→glfs_open(no close) →glfs_finiAPI_MSG_FINI_FD_DRAINwith count, clean shutdownglfs_finibefore completionglfs_fd_destroy→fd_unrefGF_FREE, no crash