gfapi: drain open fds in glfs_fini before graph teardown by ThalesBarretto · Pull Request #4669 · gluster/glusterfs · GitHub

ThalesBarretto · 2026-04-14T05:19:49Z

Summary

glfs_fini() never iterates fs->openfds to close file descriptors before tearing down the xlator graph. If a consumer calls glfs_fini() with open fds, inode_table_destroy_all force-frees inodes ("approach 2") while fd_t objects still hold references, causing dangling pointers and leaked glfs_fd_t objects.

This patch adds glfs_drain_openfds() which:

Pops each fd from fs->openfds under glfs_lock
Marks it GLFD_CLOSE and calls gf_dirent_free
Calls fd_unref(glfd->fd) to release the inode ref while the table is alive
NULLs glfd->fd so eventual leaked-frame destroy is a clean GF_FREE
Calls GF_REF_PUT(glfd) to trigger destruction
Logs the count via gf_smsg at WARNING level

Depends on: PR #4667 (the shutting_down gate — Fix A). This PR is Fix B.

Relationship to #4667

	PR #4667 (Fix A)	This PR (Fix B)
Bug	New fd opened during fini	Pre-existing fd not closed before fini
Fix	`shutting_down` gate blocks new opens	Drain `fs->openfds` and close each fd
Issue	#4666	#4668

TLA+ Formal Verification

Four TLA+ models verify the fix across 8 TLC runs:

Model	Invariant	Result
Base (no fix)	`NoLeakedFds`	VIOLATED depth 7
Gate only (#4667)	`NoLeakedFds`	VIOLATED depth 5
Gate + drain + fd_unref+NULL (this PR)	ALL 6 invariants	HOLD (129 states, 56 distinct, depth 9)

The fd_unref+NULL mitigation is critical: without it, the drain converts a silent leak into a use-after-free when a leaked frame holds an extra GF_REF_GET. With the NULL, the eventual glfs_fd_destroy skips fd_unref and does a clean GF_FREE.

Full models and investigation: https://gist.github.com/ThalesBarretto/2b0873838e451725617dc492e304d1a2

Fixes: #4668

Evidence

pub_glfs_fini never references openfds (exhaustive grep, 10 hits in api/src/, none in fini)
inode_table_destroy force-frees at inode.c:1898, destroys fd_mem_pool at inode.c:1908
Two test files comment out glfs_fini: "racy and crashes" (3b90883ce7, 2019)
Samba leaked 13,000 fds/day/brick (BZ #16043, fixed consumer-side by MR !4474)
The "further cleanups underway" from glfs_fini's first implementation (2013) never materialized

Test plan

Build with --enable-debug, verify no compiler warnings
Run prove -vmfe '/bin/bash' tests/basic/gfapi/ (existing gfapi tests)
Write a test program: glfs_new → glfs_init → glfs_open (no close) → glfs_fini
- Without patch: WARNING about active inodes, potential crash
- With patch: WARNING log API_MSG_FINI_FD_DRAIN with count, clean shutdown
Test with leaked frames: start async I/O, call glfs_fini before completion
- Without patch: use-after-free in glfs_fd_destroy → fd_unref
- With patch: clean GF_FREE, no crash
Verify Samba VFS module still works (sync-only consumer, drain should be transparent)

glfs_open() and 30+ other gfapi entry points can succeed while glfs_fini() is tearing down the xlator graph, because __GLFS_ENTRY_VALIDATE_FS only checks if the fs pointer is NULL. There is no shutdown gate. A thread calling glfs_open() between fini's drain-loop exit and inode_table_destroy_all() will create a file descriptor referencing freed inodes and xlator private data — a use-after-free. This race has been discussed informally in the GlusterFS community for over 10 years (see issue gluster#404, "Implement proper cleanup sequence", filed 2018 and closed without resolution). The drain loop comment at line 1303 ("leaked frames may exist, we ignore") acknowledges the symptom. A TLA+ formal model of the gfapi session lifecycle proves the race exists at protocol level: splitting glfs_fini into two non-atomic steps (set flag, transition state) and adding a ghost variable to track "stale fds" (fds opened after shutdown begins) shows an invariant violation at depth 5. The real C code is worse than the model — no flag is set at fini entry at all. Fix: add a shutting_down flag to struct glfs, set it at the top of pub_glfs_fini (before the drain loop), and check it at two choke points: 1. __GLFS_ENTRY_VALIDATE_FS (fast path, no mutex — safe because the flag is monotonic: only transitions false -> true) 2. priv_glfs_active_subvol (authoritative gate, under fs->mutex) All 30+ gfapi entry points flow through both checks. Cleanup paths (glfs_fd_destroy, glfs_subvol_done) use glfs_lock with _gf_false and are unaffected. Since glfs_fini itself would be rejected by the new gate in priv_glfs_active_subvol, replace its call to glfs_active_subvol with a direct read of fs->active_subvol under the mutex. The old_subvol handling from priv_glfs_active_subvol is mirrored so a pending old graph still gets its PARENT_DOWN. This patch addresses Bug A (no new operations after fini starts). Bug B (in-flight operations between drain-loop exit and graph destruction) is a separate, more complex fix that requires a barrier or re-check mechanism. Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com> # api/src/glfs-internal.h | 5 +++++ # api/src/glfs-resolve.c | 11 +++++++++++ # api/src/glfs.c | 43 ++++++++++++++++++++++++++++++++++++++++++- # 3 files changed, 58 insertions(+), 1 deletion(-)

glfs_fini() never iterates fs->openfds. If a consumer calls fini with open file descriptors, inode_table_destroy_all force-frees inodes ("approach 2", inode.c:1835) while fd_t objects still hold references. The result is dangling pointers, leaked glfs_fd_t objects, and unbounded memory growth. Add glfs_drain_openfds() which pops each fd from the openfds list, marks it GLFD_CLOSE, releases the fd_t's inode ref via fd_unref while the inode table is still alive, NULLs glfd->fd, and calls GF_REF_PUT to trigger destruction. The fd_unref+NULL step is critical: it makes the drain safe even when a leaked frame holds an extra GF_REF_GET (refcount > 1). Without it, the drain would convert a silent leak into a use-after-free. With the NULL, any later glfs_fd_destroy from a leaked frame's PUT skips fd_unref and does a clean GF_FREE. TLA+ model (gfapi_fd_drain_actual.tla) verified by TLC: ALL 6 invariants HOLD (129 states, 56 distinct, depth 9). Three-iteration verification loop discovered and fixed the fd_unref+NULL regression before it shipped. This is Fix B (companion to Fix A in PR gluster#4667). Fix A blocks new opens during fini (Bug A). This fix drains pre-existing opens that were never closed (Bug B). Fixes: gluster#4668 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>

ThalesBarretto and others added 2 commits April 12, 2026 19:38

ThalesBarretto mentioned this pull request Apr 14, 2026

gfapi: glfs_fini does not close open file descriptors (no openfds drain) #4668

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gfapi: drain open fds in glfs_fini before graph teardown#4669

gfapi: drain open fds in glfs_fini before graph teardown#4669
ThalesBarretto wants to merge 2 commits into
gluster:develfrom
ThalesBarretto:fix-gfapi-fd-drain

ThalesBarretto commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ThalesBarretto commented Apr 14, 2026

Summary

Relationship to #4667

TLA+ Formal Verification

Evidence

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant