Skip to content

gfapi: drain open fds in glfs_fini before graph teardown#4669

Open
ThalesBarretto wants to merge 2 commits into
gluster:develfrom
ThalesBarretto:fix-gfapi-fd-drain
Open

gfapi: drain open fds in glfs_fini before graph teardown#4669
ThalesBarretto wants to merge 2 commits into
gluster:develfrom
ThalesBarretto:fix-gfapi-fd-drain

Conversation

@ThalesBarretto
Copy link
Copy Markdown
Contributor

Summary

glfs_fini() never iterates fs->openfds to close file descriptors before tearing down the xlator graph. If a consumer calls glfs_fini() with open fds, inode_table_destroy_all force-frees inodes ("approach 2") while fd_t objects still hold references, causing dangling pointers and leaked glfs_fd_t objects.

This patch adds glfs_drain_openfds() which:

  • Pops each fd from fs->openfds under glfs_lock
  • Marks it GLFD_CLOSE and calls gf_dirent_free
  • Calls fd_unref(glfd->fd) to release the inode ref while the table is alive
  • NULLs glfd->fd so eventual leaked-frame destroy is a clean GF_FREE
  • Calls GF_REF_PUT(glfd) to trigger destruction
  • Logs the count via gf_smsg at WARNING level

Depends on: PR #4667 (the shutting_down gate — Fix A). This PR is Fix B.

Relationship to #4667

PR #4667 (Fix A) This PR (Fix B)
Bug New fd opened during fini Pre-existing fd not closed before fini
Fix shutting_down gate blocks new opens Drain fs->openfds and close each fd
Issue #4666 #4668

TLA+ Formal Verification

Four TLA+ models verify the fix across 8 TLC runs:

Model Invariant Result
Base (no fix) NoLeakedFds VIOLATED depth 7
Gate only (#4667) NoLeakedFds VIOLATED depth 5
Gate + drain + fd_unref+NULL (this PR) ALL 6 invariants HOLD (129 states, 56 distinct, depth 9)

The fd_unref+NULL mitigation is critical: without it, the drain converts a silent leak into a use-after-free when a leaked frame holds an extra GF_REF_GET. With the NULL, the eventual glfs_fd_destroy skips fd_unref and does a clean GF_FREE.

Full models and investigation: https://gist.github.com/ThalesBarretto/2b0873838e451725617dc492e304d1a2

Fixes: #4668

Evidence

  • pub_glfs_fini never references openfds (exhaustive grep, 10 hits in api/src/, none in fini)
  • inode_table_destroy force-frees at inode.c:1898, destroys fd_mem_pool at inode.c:1908
  • Two test files comment out glfs_fini: "racy and crashes" (3b90883ce7, 2019)
  • Samba leaked 13,000 fds/day/brick (BZ #16043, fixed consumer-side by MR !4474)
  • The "further cleanups underway" from glfs_fini's first implementation (2013) never materialized

Test plan

  • Build with --enable-debug, verify no compiler warnings
  • Run prove -vmfe '/bin/bash' tests/basic/gfapi/ (existing gfapi tests)
  • Write a test program: glfs_newglfs_initglfs_open (no close) → glfs_fini
    • Without patch: WARNING about active inodes, potential crash
    • With patch: WARNING log API_MSG_FINI_FD_DRAIN with count, clean shutdown
  • Test with leaked frames: start async I/O, call glfs_fini before completion
    • Without patch: use-after-free in glfs_fd_destroyfd_unref
    • With patch: clean GF_FREE, no crash
  • Verify Samba VFS module still works (sync-only consumer, drain should be transparent)

ThalesBarretto and others added 2 commits April 12, 2026 19:38
glfs_open() and 30+ other gfapi entry points can succeed while
glfs_fini() is tearing down the xlator graph, because
__GLFS_ENTRY_VALIDATE_FS only checks if the fs pointer is NULL.
There is no shutdown gate.  A thread calling glfs_open() between
fini's drain-loop exit and inode_table_destroy_all() will create
a file descriptor referencing freed inodes and xlator private
data — a use-after-free.

This race has been discussed informally in the GlusterFS community
for over 10 years (see issue gluster#404, "Implement proper cleanup
sequence", filed 2018 and closed without resolution).  The drain
loop comment at line 1303 ("leaked frames may exist, we ignore")
acknowledges the symptom.

A TLA+ formal model of the gfapi session lifecycle proves the
race exists at protocol level: splitting glfs_fini into two
non-atomic steps (set flag, transition state) and adding a ghost
variable to track "stale fds" (fds opened after shutdown begins)
shows an invariant violation at depth 5.  The real C code is
worse than the model — no flag is set at fini entry at all.

Fix: add a shutting_down flag to struct glfs, set it at the top
of pub_glfs_fini (before the drain loop), and check it at two
choke points:

 1. __GLFS_ENTRY_VALIDATE_FS (fast path, no mutex — safe because
    the flag is monotonic: only transitions false -> true)

 2. priv_glfs_active_subvol (authoritative gate, under fs->mutex)

All 30+ gfapi entry points flow through both checks.  Cleanup
paths (glfs_fd_destroy, glfs_subvol_done) use glfs_lock with
_gf_false and are unaffected.

Since glfs_fini itself would be rejected by the new gate in
priv_glfs_active_subvol, replace its call to glfs_active_subvol
with a direct read of fs->active_subvol under the mutex.  The
old_subvol handling from priv_glfs_active_subvol is mirrored so
a pending old graph still gets its PARENT_DOWN.

This patch addresses Bug A (no new operations after fini starts).
Bug B (in-flight operations between drain-loop exit and graph
destruction) is a separate, more complex fix that requires a
barrier or re-check mechanism.

Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>
#  api/src/glfs-internal.h |  5 +++++
#  api/src/glfs-resolve.c  | 11 +++++++++++
#  api/src/glfs.c          | 43 ++++++++++++++++++++++++++++++++++++++++++-
#  3 files changed, 58 insertions(+), 1 deletion(-)
glfs_fini() never iterates fs->openfds.  If a consumer calls fini
with open file descriptors, inode_table_destroy_all force-frees
inodes ("approach 2", inode.c:1835) while fd_t objects still hold
references.  The result is dangling pointers, leaked glfs_fd_t
objects, and unbounded memory growth.

Add glfs_drain_openfds() which pops each fd from the openfds list,
marks it GLFD_CLOSE, releases the fd_t's inode ref via fd_unref
while the inode table is still alive, NULLs glfd->fd, and calls
GF_REF_PUT to trigger destruction.

The fd_unref+NULL step is critical: it makes the drain safe even
when a leaked frame holds an extra GF_REF_GET (refcount > 1).
Without it, the drain would convert a silent leak into a
use-after-free.  With the NULL, any later glfs_fd_destroy from a
leaked frame's PUT skips fd_unref and does a clean GF_FREE.

TLA+ model (gfapi_fd_drain_actual.tla) verified by TLC:
ALL 6 invariants HOLD (129 states, 56 distinct, depth 9).
Three-iteration verification loop discovered and fixed the
fd_unref+NULL regression before it shipped.

This is Fix B (companion to Fix A in PR gluster#4667).  Fix A blocks
new opens during fini (Bug A).  This fix drains pre-existing
opens that were never closed (Bug B).

Fixes: gluster#4668

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gfapi: glfs_fini does not close open file descriptors (no openfds drain)

1 participant