A high-performance glob pattern matcher for Java strings with zero dependencies.
This library provides a direct implementation of glob pattern matching, offering a safer and faster alternative to regex-based glob implementations. Unlike regular expressions, glob patterns are simple, intuitive, and free from catastrophic backtracking issues.
Most glob implementations convert patterns to regular expressions, which has several drawbacks:
- Complexity: Escaping regex metacharacters correctly is non-trivial (including `\Q` and `\E` sequences)
- Performance: Regex engines can suffer catastrophic backtracking with patterns like `*a*b*c*` on long non-matching strings
- Safety: Greedy matching can cause unexpected performance degradation
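To make the escaping concern concrete, here is a minimal sketch of the regex-based approach that this library avoids. It is not code from this library; the class name `GlobToRegexSketch` and the method `toRegex` are hypothetical, and the sketch assumes only `*` and `?` are special and that there is no escape handling.

```java
import java.util.regex.Pattern;

public class GlobToRegexSketch {

    // Translate a glob into a regex: '*' becomes ".*", '?' becomes ".",
    // and every run of ordinary characters is quoted so it matches literally.
    public static String toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' || c == '?') {
                flushLiteral(literal, regex);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flushLiteral(literal, regex);
        return regex.toString();
    }

    // Pattern.quote wraps the text in \Q...\E so regex metacharacters lose their meaning.
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        // DOTALL lets '.' match line terminators too, as a glob wildcard would.
        Pattern p = Pattern.compile(toRegex("a.b*c?"), Pattern.DOTALL);
        System.out.println(p.matcher("a.b---cX").matches()); // true
        System.out.println(p.matcher("aXb---cX").matches()); // false: the '.' is literal
    }
}
```

Even this small translation needs quoting and a non-default regex flag to behave like a glob, and the resulting `.*` groups are still subject to the regex engine's backtracking.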
This library implements glob matching directly, providing:
- Better Performance: Roughly 1.2x to 1.5x faster than regex, depending on the workload (see benchmarks)
- Predictable Behavior: Non-greedy matching prevents backtracking issues
- Type Safety: Compile-time pattern validation
A glob is a simple pattern matching syntax where:
- `*` (or `%`) matches zero or more characters
- `?` (or `_`) matches exactly one character

Common uses:
- Unix shells: `*.txt`, `file?.log`
- SQL LIKE: `%pattern%`, `user_`
Globs are simpler than regular expressions but sufficient for many matching tasks. See Wikipedia for more details.
- Zero Dependencies: No runtime dependencies (test and benchmark tools only)
- Thread-Safe: Compiled matchers can be safely shared across threads
- High Performance: 1.5x faster than regex for typical patterns
- Low Memory: Minimal memory footprint with efficient compiled patterns
- 100% Test Coverage: Comprehensive JUnit test suite
- Well Documented: Full JavaDoc and benchmarks included
- Flexible: Supports Unix (`*`, `?`) and SQL (`%`, `_`) syntax
- Smart Optimizations: Automatic selection of optimal matching engine
Gradle:
implementation 'com.hrakaroo:glob:0.9.0'

Maven:
<dependency>
<groupId>com.hrakaroo</groupId>
<artifactId>glob</artifactId>
<version>0.9.0</version>
</dependency>

import com.hrakaroo.glob.GlobPattern;
import com.hrakaroo.glob.MatchingEngine;
// Compile pattern once
MatchingEngine matcher = GlobPattern.compile("dog*cat\\*goat??");
// Use many times (thread-safe)
matcher.matches("dog horse cat*goat!~"); // true
matcher.matches("dogcat*goat.."); // true
matcher.matches("dog catgoat!/"); // false

// SQL LIKE-style syntax
MatchingEngine matcher = GlobPattern.compile(
"dog%cat\\%goat_",
'%', // wildcard (zero or more)
'_', // match one
GlobPattern.HANDLE_ESCAPES
);
matcher.matches("dog horse cat%goat!"); // true
matcher.matches("dogcat%goat."); // true

// Case-insensitive matching
MatchingEngine matcher = GlobPattern.compile(
"Hello*World",
'*',
'?',
GlobPattern.CASE_INSENSITIVE | GlobPattern.HANDLE_ESCAPES
);
matcher.matches("hello beautiful world"); // true
matcher.matches("HELLO WORLD"); // true

// Disable wildcards (exact match only)
MatchingEngine exact = GlobPattern.compile(
"exact_string",
'\0', // disable wildcard
'\0', // disable match-one
0
);
// Custom characters
MatchingEngine custom = GlobPattern.compile(
"foo#bar@",
'#', // use # as wildcard
'@', // use @ as match-one
GlobPattern.HANDLE_ESCAPES
);

When `GlobPattern.HANDLE_ESCAPES` is enabled, the following escape sequences are supported:
| Escape | Result | Description |
|---|---|---|
| `\\*` | `*` | Literal asterisk |
| `\\%` | `%` | Literal percent |
| `\\?` | `?` | Literal question mark |
| `\\_` | `_` | Literal underscore |
| `\\n` | newline | Line feed |
| `\\r` | return | Carriage return |
| `\\t` | tab | Tab character |
| `\\\\` | `\` | Literal backslash |
| `\\uXXXX` | Unicode | Unicode character (hex) |
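As an illustration of what the table describes, the following sketch decodes these escape sequences into the characters they stand for. It is an assumption-laden illustration, not the library's implementation: `EscapeSketch` and `unescape` are hypothetical names, a Unicode escape is assumed to carry exactly four hex digits, and a real pattern compiler would also have to remember that an escaped `*` is a literal rather than a wildcard.

```java
public class EscapeSketch {

    // Decode the escape sequences from the table into plain characters.
    public static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\\' || i + 1 >= text.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = text.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    // Assumption: exactly four hex digits follow the 'u'.
                    out.append((char) Integer.parseInt(text.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    // Escaped wildcard, match-one, or backslash: keep the bare character.
                    out.append(next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("file\\*.txt")); // file*.txt
        System.out.println(unescape("\\u0041BC"));   // ABC
    }
}
```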
Example:
GlobPattern.compile("file\\*.txt"); // Matches "file*.txt" literally
GlobPattern.compile("line1\\nline2"); // Matches string with newline
GlobPattern.compile("\\u0041BC"); // Matches "ABC"

Higher scores indicate better throughput. The benchmarks are designed to prevent optimization shortcuts.
Benchmark Mode Cnt Score Error Units
Benchmark1.globWords thrpt 10 19.460 ± 0.967 ops/s
Benchmark1.greedyRegexWords thrpt 10 12.609 ± 0.339 ops/s
Benchmark1.nonGreedyRegexWords thrpt 10 13.291 ± 0.303 ops/s
Result: Glob is 1.5x faster than regex
Benchmark Mode Cnt Score Error Units
Benchmark1.globLogLines thrpt 10 10.707 ± 0.204 ops/s
Benchmark1.greedyRegexLogLines thrpt 10 8.598 ± 0.247 ops/s
Benchmark1.nonGreedyRegexLogLines thrpt 10 8.409 ± 0.162 ops/s
Result: Glob is 1.2x faster than regex
Benchmark Mode Cnt Score Error Units
Benchmark1.globCompare thrpt 10 179.345 ± 3.151 ops/s
Benchmark1.globCompareCaseInsensitive thrpt 10 169.957 ± 23.889 ops/s
Benchmark1.stringCompare thrpt 10 211.104 ± 3.435 ops/s
Benchmark1.stringCompareCaseInsensitive thrpt 10 126.214 ± 5.041 ops/s
Result: Glob optimization makes it competitive with String.equals() and faster than String.equalsIgnoreCase()
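The "times faster" figures above are ratios of the reported throughput scores. A small sketch of that arithmetic follows, using the scores from the tables; the class and method names are illustrative only.

```java
import java.util.Locale;

public class SpeedupSketch {

    // Ratio of two throughput scores (ops/s): values above 1.0 mean the candidate is faster.
    public static double speedup(double candidate, double baseline) {
        return candidate / baseline;
    }

    public static void main(String[] args) {
        // Words benchmark: glob vs greedy and non-greedy regex.
        System.out.printf(Locale.ROOT, "%.2f%n", speedup(19.460, 12.609)); // 1.54
        System.out.printf(Locale.ROOT, "%.2f%n", speedup(19.460, 13.291)); // 1.46
        // Log lines benchmark: glob vs greedy and non-greedy regex.
        System.out.printf(Locale.ROOT, "%.2f%n", speedup(10.707, 8.598));
        System.out.printf(Locale.ROOT, "%.2f%n", speedup(10.707, 8.409));
    }
}
```

These ratios ignore the reported error margins, so they are best read as rough indicators rather than exact speedups.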
./gradlew jmh

The library automatically selects the most efficient matching engine based on the pattern:
| Pattern Type | Engine | Example | Optimization |
|---|---|---|---|
| Empty | EmptyOnlyEngine | `""` | Matches empty strings only |
| Match all | EverythingEngine | `*` | Always returns true |
| Exact match | EqualToEngine | `foo` | Simple character comparison |
| Starts with | StartsWithEngine | `foo*` | Prefix matching |
| Ends with | EndsWithEngine | `*foo` | Suffix matching |
| Contains | ContainsEngine | `*foo*` | Substring search |
| Complex | GlobEngine | `*foo*bar*` | Full glob matching |
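The selection in the table can be pictured as a classification step over the pattern's shape. The sketch below is one way such a step might look, not the library's actual logic: `EngineSelectionSketch`, `Kind`, and `classify` are hypothetical names, and the sketch assumes `*` is the only special character, with no escapes, no match-one characters, and runs of `*` already folded.

```java
public class EngineSelectionSketch {

    // One label per row of the table above.
    public enum Kind { EMPTY, MATCH_ALL, EXACT, STARTS_WITH, ENDS_WITH, CONTAINS, COMPLEX }

    // Classify a pattern by where its '*' wildcards sit.
    public static Kind classify(String pattern) {
        if (pattern.isEmpty()) return Kind.EMPTY;
        if (pattern.equals("*")) return Kind.MATCH_ALL;

        long stars = pattern.chars().filter(c -> c == '*').count();
        boolean leading = pattern.charAt(0) == '*';
        boolean trailing = pattern.charAt(pattern.length() - 1) == '*';

        if (stars == 0) return Kind.EXACT;                            // foo
        if (stars == 1 && trailing) return Kind.STARTS_WITH;          // foo*
        if (stars == 1 && leading) return Kind.ENDS_WITH;             // *foo
        if (stars == 2 && leading && trailing) return Kind.CONTAINS;  // *foo*
        return Kind.COMPLEX;                                          // anything else
    }

    public static void main(String[] args) {
        System.out.println(classify("foo*"));      // STARTS_WITH
        System.out.println(classify("*foo*bar*")); // COMPLEX
    }
}
```

Under these assumptions, the simpler kinds could be served by `String.startsWith`, `String.endsWith`, or `String.contains`, leaving the general matcher for the complex case.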
- Non-greedy matching: Prevents catastrophic backtracking
- Compile-time optimization: Pattern processing happens once during compilation
- Multiple wildcard folding: `**` becomes `*` at compile time
- No recursion: Stack-based algorithm for predictable performance
- Minimal allocations: Reuses compiled pattern data structures
- Thread-safe: Immutable compiled patterns
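To illustrate how wildcard folding and non-recursive, non-greedy matching can fit together, here is a minimal sketch using the well-known two-pointer technique. It is a stand-in for illustration, not the stack-based algorithm this library uses; `IterativeGlobSketch` and its methods are hypothetical, and escapes and case-insensitivity are left out.

```java
public class IterativeGlobSketch {

    // Collapse runs of '*' into a single '*' (escapes are ignored in this sketch).
    public static String foldWildcards(String pattern) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            boolean previousIsStar = out.length() > 0 && out.charAt(out.length() - 1) == '*';
            if (c == '*' && previousIsStar) continue;
            out.append(c);
        }
        return out.toString();
    }

    // Iterative match: a '*' first matches nothing and is widened by one character
    // only when the rest of the pattern fails. Only the most recent '*' is revisited,
    // so the work is bounded by pattern length times text length.
    public static boolean matches(String pattern, String text) {
        int p = 0, t = 0;
        int starP = -1, starT = -1; // position of the last '*' and the text index it covers up to
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (starP >= 0) {
                p = starP + 1; // retry the pattern after the last '*'
                t = ++starT;   // with that '*' covering one more character
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') p++; // trailing '*' match nothing
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(foldWildcards("a**b***"));            // a*b*
        System.out.println(matches("dog*cat", "dog horse cat")); // true
        System.out.println(matches("*a*b*c*", "aaaaaaaaaaaa"));  // false
    }
}
```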
// Default: Unix-style with escapes
MatchingEngine compile(String pattern)
// Custom wildcard and match-one characters
MatchingEngine compile(
String pattern,
char wildcardChar, // e.g., '*' or '%'
char matchOneChar, // e.g., '?' or '_'
int flags // CASE_INSENSITIVE | HANDLE_ESCAPES
)

- `GlobPattern.CASE_INSENSITIVE` - Enable case-insensitive matching
- `GlobPattern.HANDLE_ESCAPES` - Enable escape sequence processing
- Use `|` to combine: `CASE_INSENSITIVE | HANDLE_ESCAPES`
- `GlobPattern.NULL_CHARACTER` (`'\0'`) - Use to disable wildcard or match-one features
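Combining flags with `|` works because each flag occupies its own bit of the `int`. The sketch below shows the general bit-flag pattern; the constant values here are assumptions chosen for illustration and may differ from the values `GlobPattern` actually defines.

```java
public class FlagSketch {

    // Hypothetical values: one distinct bit per flag so they can be OR-ed together.
    public static final int CASE_INSENSITIVE = 1;    // binary 01
    public static final int HANDLE_ESCAPES = 1 << 1; // binary 10

    // A flag is set when its bit survives a bitwise AND with the combined value.
    public static boolean isSet(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        int flags = CASE_INSENSITIVE | HANDLE_ESCAPES;
        System.out.println(isSet(flags, CASE_INSENSITIVE)); // true
        System.out.println(isSet(flags, HANDLE_ESCAPES));   // true
        System.out.println(isSet(0, HANDLE_ESCAPES));       // false
    }
}
```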
boolean matches(String input) // Test if input matches pattern
int matchingSizeInBytes() // Memory usage estimate during matching
int staticSizeInBytes() // Static memory usage of compiled pattern

The library has 100% test coverage verified by JaCoCo.
Run tests:
./gradlew test

Generate coverage report:
./gradlew jacocoTestReport
# Report at: build/reports/coverage/index.html

# Clone repository
git clone https://github.com/hrakaroo/glob-library-java.git
cd glob-library-java
# Build
./gradlew build
# Run tests
./gradlew test
# Run benchmarks
./gradlew jmh

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Write tests for your changes
- Ensure 100% test coverage (`./gradlew jacocoTestCoverageVerification`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This library is ideal for:
- File filtering: Match filenames against patterns
- Configuration matching: Match config keys or values
- Search functionality: Simple pattern-based search
- Database-like filtering: SQL LIKE functionality in Java
- Log filtering: Match log messages by pattern
- Input validation: Simple pattern-based validation
| Feature | glob-library-java | Java Regex | Java NIO PathMatcher | Spring AntPathMatcher |
|---|---|---|---|---|
| Performance | Fast | | | |
| Backtracking safety | Yes | No | No | Yes |
| Dependencies | Zero | Built-in | Built-in (Java 7+) | Spring Framework |
| Thread-safe | Yes | Yes | Yes | Yes |
| Use case | String matching | Full regex | File paths | URL/Path patterns |
| Pattern complexity | | Full regex | | |
| Learning curve | Easy | Complex | Easy | Easy |
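The "File paths" use case listed for Java NIO `PathMatcher` can be seen in a short sketch using only the JDK. With the `glob:` syntax, `*` does not cross directory boundaries, which suits path matching but differs from matching arbitrary strings. The class and method names below are illustrative.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class PathMatcherSketch {

    // JDK glob matcher: '*' matches within a single path component only.
    private static final PathMatcher TXT =
            FileSystems.getDefault().getPathMatcher("glob:*.txt");

    public static boolean isTopLevelTxt(Path path) {
        return TXT.matches(path);
    }

    public static void main(String[] args) {
        System.out.println(isTopLevelTxt(Paths.get("notes.txt")));         // true
        System.out.println(isTopLevelTxt(Paths.get("docs", "notes.txt"))); // false
    }
}
```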
This project is licensed under the MIT License - see the LICENSE.txt file for details.
Joshua Gerth - hrakaroo
- Designed to avoid the pitfalls of regex-based glob implementations
- Optimized for real-world Java applications
- No external libraries to minimize dependency conflicts
