GoSQLX
High-Performance SQL Parser for Go
Production-ready, high-performance SQL parsing SDK for Go
Zero-copy tokenization * Object pooling * Multi-dialect engine * Query transforms * WASM playground * Python bindings
New to GoSQLX? Get Started in 5 Minutes
Installation * Quick Start * Documentation * Examples * Benchmarks
Overview
GoSQLX is a high-performance SQL parsing library designed for production use. It provides zero-copy tokenization, intelligent object pooling, and comprehensive SQL dialect support while maintaining a simple, idiomatic Go API.
Key Features
- Blazing Fast: ~50% faster parsing via token type overhaul; 1.25M+ ops/sec peak throughput
- Memory Efficient: 60-80% reduction through intelligent object pooling
- Thread-Safe: Race-free, linear scaling to 128+ cores, 0 race conditions detected
- Multi-Dialect Engine (v1.8.0): First-class dialect support with ParseWithDialect() -- PostgreSQL, MySQL, SQL Server, Oracle, SQLite, Snowflake
- MySQL Syntax (v1.8.0): SHOW, DESCRIBE, REPLACE INTO, ON DUPLICATE KEY UPDATE, GROUP_CONCAT, MATCH AGAINST, REGEXP/RLIKE
- Query Transform API (v1.8.0): Programmatic SQL rewriting -- add WHERE clauses, columns, JOINs, pagination via composable rules (pkg/transform/)
- WASM Playground (v1.8.0): Browser-based SQL parsing, formatting, linting via WebAssembly
- Comment Preservation (v1.8.0): SQL comments survive parse-format round-trips
- AST-to-SQL Roundtrip (v1.8.0): SQL() methods on all AST nodes for full serialization
- AST-based Formatter (v1.8.0): Configurable SQL formatter with CompactStyle/ReadableStyle presets
- Error Recovery (v1.8.0): Multi-error parsing with ParseWithRecovery() for IDE-quality diagnostics
- Complete JOIN Support: All JOIN types (INNER/LEFT/RIGHT/FULL OUTER/CROSS/NATURAL) with proper tree logic
- Advanced SQL Features: CTEs with RECURSIVE support, Set Operations (UNION/EXCEPT/INTERSECT)
- Window Functions: Complete SQL-99 window function support with OVER clause, PARTITION BY, ORDER BY, frame specs
- MERGE Statements: Full SQL:2003 MERGE support with WHEN MATCHED/NOT MATCHED clauses
- Grouping Operations: GROUPING SETS, ROLLUP, CUBE (SQL-99 T431)
- PostgreSQL Extensions: LATERAL JOIN, DISTINCT ON, FILTER clause, JSON/JSONB operators, aggregate ORDER BY, ::type casting, UPSERT, dollar-quoted strings
- SQL Injection Detection: Built-in security scanner (pkg/sql/security) with LIKE injection, blind injection, tautology detection
- Unicode Support: Complete UTF-8 support for international SQL
- Zero-Copy: Direct byte slice operations, <1ms latency
- Intelligent Errors: Structured error codes with typo detection, context highlighting, and helpful hints
- Python Bindings: PyGoSQLX -- use GoSQLX from Python via ctypes FFI, 100x+ faster than pure Python parsers
- Production Ready: Battle-tested with 0 race conditions detected, ~85% SQL-99 compliance, Apache-2.0 licensed
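To make the zero-copy item above concrete, here is a minimal standard-library sketch of the general technique. It is not GoSQLX's tokenizer: the Token type and the splitWords function are hypothetical names, and the toy lexer splits on ASCII spaces only. What it illustrates is that a token can record a byte range and return a sub-slice of the input, so no token text is copied.

```go
package main

import "fmt"

// Token records a byte range in the source instead of a copied string.
// (Hypothetical type for this sketch, not the GoSQLX token model.)
type Token struct {
    Start, End int // half-open range [Start, End) into the source
}

// Text returns a sub-slice of src; it shares src's backing array.
func (t Token) Text(src []byte) []byte { return src[t.Start:t.End] }

// splitWords is a toy tokenizer that splits on ASCII spaces only.
func splitWords(src []byte) []Token {
    var toks []Token
    start := -1 // -1 means "not inside a word"
    for i, b := range src {
        if b == ' ' {
            if start >= 0 {
                toks = append(toks, Token{Start: start, End: i})
                start = -1
            }
        } else if start < 0 {
            start = i
        }
    }
    if start >= 0 {
        toks = append(toks, Token{Start: start, End: len(src)})
    }
    return toks
}

func main() {
    src := []byte("SELECT id FROM users")
    for _, t := range splitWords(src) {
        fmt.Printf("%q at [%d,%d)\n", t.Text(src), t.Start, t.End)
    }
}
```

Because Text returns a view into src, the caller must keep src unmodified for as long as the tokens are in use.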
Performance & Quality Highlights (v1.9.0)
| Faster Parsing | Peak Ops/sec | Latency | SQL Dialects | Parser Coverage | New Commits |
|---|---|---|---|---|---|
| ~50% | 1.25M+ | <1ms | 6 | 84%+ | 74 |
v1.9.0 Released * SQLite PRAGMA * Tautology Detection * 19 Post-UAT Fixes * lint CI-gate * UNION false-positive fix
What's New in v1.9.0
| Feature | Description |
|---|---|
| SQLite PRAGMA | Fully parsed: bare (PRAGMA x), arg (PRAGMA x(n)), assignment (PRAGMA x=v) forms |
| WITHOUT ROWID | SQLite CREATE TABLE ... WITHOUT ROWID; reserved keywords valid as DDL column names |
| Tautology Detection | ScanSQL() detects 1=1, 'a'='a', col=col, OR TRUE - CRITICAL severity |
| UNION False-positive Fix | PatternUnionInjection (CRITICAL, system tables) vs PatternUnionGeneric (HIGH) |
| lint CI-gate | gosqlx lint now exits 1 on any violation -- usable in CI pipelines without --fail-on-warn |
| CLI Output Fixes | token_count, Query Size, CTE output, SELECT indentation, validate output all corrected |
| Parser Fixes | KEY/INDEX in qualified names, NATURAL JOIN type, OVER window_name, backtick/bracket identifiers |
| E1009 | Dedicated error code ErrCodeUnterminatedBlockComment for unterminated /* ... */ comments |
See CHANGELOG.md for the complete list of 19 fixes in this release.
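As an illustration of the condition behind the E1009 row above, the following sketch finds a block comment that is opened but never closed. It is a simplified stand-in, not the GoSQLX tokenizer: the function name unterminatedBlockComment is hypothetical, and the scan ignores string literals and nested comments.

```go
package main

import (
    "bytes"
    "fmt"
)

// unterminatedBlockComment returns the byte offset of the first "/*" that has
// no matching "*/", or -1 if every block comment is closed.
// Simplification: string literals and nested comments are not considered.
func unterminatedBlockComment(src []byte) int {
    pos := 0
    for {
        open := bytes.Index(src[pos:], []byte("/*"))
        if open < 0 {
            return -1 // no further comment openers
        }
        open += pos
        end := bytes.Index(src[open+2:], []byte("*/"))
        if end < 0 {
            return open // opener without a closer
        }
        pos = open + 2 + end + 2 // continue after the closing "*/"
    }
}

func main() {
    fmt.Println(unterminatedBlockComment([]byte("SELECT /* ok */ 1")))
    fmt.Println(unterminatedBlockComment([]byte("SELECT 1 /* never closed")))
}
```

Returning the offset of the opener, rather than a boolean, lets a caller point the diagnostic at the start of the offending comment.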
Installation
Library Installation
go get github.com/ajitpratap0/GoSQLX
CLI Installation
go install github.com/ajitpratap0/GoSQLX/cmd/gosqlx@latest
# Or build from source
git clone https://github.com/ajitpratap0/GoSQLX.git
cd GoSQLX
go build -o gosqlx ./cmd/gosqlx
Python Bindings (PyGoSQLX)
Use GoSQLX from Python with native performance via ctypes FFI:
# Build the Go shared library
cd pkg/cbinding && ./build.sh && cd ../..
# Install the Python package
cd python && pip install .
Usage from Python:
import pygosqlx
result = pygosqlx.parse("SELECT * FROM users WHERE active = true")
print(result.statement_types) # ['SELECT']
tables = pygosqlx.extract_tables("SELECT * FROM users u JOIN orders o ON u.id = o.user_id")
print(tables) # ['users', 'orders']
See the full PyGoSQLX documentation for the complete API.
Requirements:
- Go 1.21 or higher
- Python 3.8+ (for Python bindings)
- No external dependencies for the Go library
Quick Start
CLI Usage
Inline SQL:
gosqlx validate "SELECT * FROM users WHERE active = true"
# Analyze SQL structure and complexity
gosqlx analyze "SELECT COUNT(*) FROM orders GROUP BY status"
File Processing:
gosqlx format -i query.sql
# Parse SQL to AST representation
gosqlx parse -f json complex_query.sql
Pipeline/Stdin:
echo "SELECT * FROM users" | gosqlx validate # Validate from pipe
gosqlx format query.sql | gosqlx validate # Chain commands
cat *.sql | gosqlx format | tee formatted.sql # Pipeline composition
Pipeline/Stdin Support (v1.6.0+):
echo "SELECT * FROM users" | gosqlx validate
cat query.sql | gosqlx format
cat complex.sql | gosqlx analyze --security
# Explicit stdin marker
gosqlx validate -
gosqlx format - < query.sql
# Input redirection
gosqlx validate < query.sql
gosqlx parse < complex_query.sql
# Full pipeline chains
cat query.sql | gosqlx format | gosqlx validate
echo "select * from users" | gosqlx format > formatted.sql
find . -name "*.sql" -exec cat {} \; | gosqlx validate
# Works on Windows PowerShell too!
Get-Content query.sql | gosqlx format
"SELECT * FROM users" | gosqlx validate
Cross-Platform Pipeline Examples:
cat query.sql | gosqlx format | tee formatted.sql | gosqlx validate
echo "SELECT 1" | gosqlx validate && echo "Valid!"
# Windows PowerShell
Get-Content query.sql | gosqlx format | Set-Content formatted.sql
"SELECT * FROM users" | gosqlx validate
# Git hooks (pre-commit)
git diff --cached --name-only --diff-filter=ACM "*.sql" | \
xargs cat | gosqlx validate --quiet
Language Server Protocol (LSP) (v1.6.0+):
gosqlx lsp
# With debug logging
gosqlx lsp --log /tmp/gosqlx-lsp.log
The LSP server provides real-time SQL intelligence for IDEs:
- Diagnostics: Real-time syntax error detection with position info
- Hover: Documentation for 60+ SQL keywords
- Completion: 100+ SQL keywords, functions, and 22 snippets
- Formatting: SQL code formatting via textDocument/formatting
- Document Symbols: SQL statement outline navigation
- Signature Help: Function signatures for 20+ SQL functions
- Code Actions: Quick fixes (add semicolon, uppercase keywords)
Linting (v1.6.0+):
gosqlx lint query.sql
# With auto-fix
gosqlx lint --fix query.sql
# Specific rules
gosqlx lint --rules L001,L002,L003 query.sql
Available rules (L001-L010):
- L001: Trailing Whitespace (auto-fix)
- L002: Mixed Indentation (auto-fix)
- L003: Consecutive Blank Lines (auto-fix)
- L004: Indentation Depth
- L005: Line Length
- L006: Column Alignment
- L007: Keyword Case (auto-fix)
- L008: Comma Placement
- L009: Aliasing Consistency
- L010: Redundant Whitespace (auto-fix)
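To show what a check-plus-auto-fix rule such as L001 involves, here is a small standard-library sketch. It does not use the GoSQLX linter package: the Violation type and the three function names are hypothetical. The exitCode helper mirrors the CI-gate behavior described for v1.9.0, where any violation produces exit status 1.

```go
package main

import (
    "fmt"
    "strings"
)

// Violation is a hypothetical finding type for this sketch.
type Violation struct {
    Line int    // 1-based line number
    Rule string // rule identifier, e.g. "L001"
}

// checkTrailingWhitespace reports every line that ends in a space or tab.
func checkTrailingWhitespace(sql string) []Violation {
    var out []Violation
    for i, line := range strings.Split(sql, "\n") {
        if strings.TrimRight(line, " \t") != line {
            out = append(out, Violation{Line: i + 1, Rule: "L001"})
        }
    }
    return out
}

// fixTrailingWhitespace strips trailing spaces and tabs from every line.
func fixTrailingWhitespace(sql string) string {
    lines := strings.Split(sql, "\n")
    for i, line := range lines {
        lines[i] = strings.TrimRight(line, " \t")
    }
    return strings.Join(lines, "\n")
}

// exitCode returns 1 when anything was reported and 0 otherwise.
func exitCode(v []Violation) int {
    if len(v) > 0 {
        return 1
    }
    return 0
}

func main() {
    sql := "SELECT id  \nFROM users\t\nWHERE active = true"
    v := checkTrailingWhitespace(sql)
    fmt.Println(len(v), "violation(s), exit code", exitCode(v))
    fmt.Printf("%q\n", fixTrailingWhitespace(sql))
}
```

Keeping the check and the fix as separate functions means the fix can be verified by re-running the check on its output.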
IDE Integration:
VS Code (settings.json):
{
    "gosqlx.lsp.enable": true,
    "gosqlx.lsp.path": "gosqlx"
}
Neovim (nvim-lspconfig):
require('lspconfig.configs').gosqlx = {
    default_config = {
        cmd = { 'gosqlx', 'lsp' },
        filetypes = { 'sql' },
        root_dir = function() return vim.fn.getcwd() end,
    },
}
require('lspconfig').gosqlx.setup{}
Library Usage - Simple API
GoSQLX provides a simple, high-level API that handles all complexity for you:
package main

import (
    "fmt"
    "log"

    "github.com/ajitpratap0/GoSQLX/pkg/gosqlx"
)

func main() {
    // Parse SQL in one line - that's it!
    ast, err := gosqlx.Parse("SELECT * FROM users WHERE active = true")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Successfully parsed %d statement(s)\n", len(ast.Statements))
}
That's it! Just 3 lines of code. No pool management, no manual cleanup - everything is handled for you.
More Examples
// Validate SQL
if err := gosqlx.Validate("SELECT * FROM users"); err != nil {
    fmt.Println("Invalid SQL:", err)
}
// Parse multiple queries efficiently
queries := []string{
    "SELECT * FROM users",
    "SELECT * FROM orders",
}
asts, err := gosqlx.ParseMultiple(queries)
// Parse with timeout for long queries (sql holds your query; needs the "time" import)
ast, err := gosqlx.ParseWithTimeout(sql, 5*time.Second)
// Parse from byte slice (zero-copy)
ast, err = gosqlx.ParseBytes([]byte("SELECT * FROM users"))
Advanced Usage - Low-Level API
For performance-critical code that needs fine-grained control, use the low-level API:
import (
"fmt"
"github.com/ajitpratap0/GoSQLX/pkg/sql/tokenizer"
"github.com/ajitpratap0/GoSQLX/pkg/sql/parser"
)
func main() {
// Get tokenizer from pool (always return it!)
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
// Tokenize SQL
sql := "SELECT id, name FROM users WHERE age > 18"
tokens, err := tkz.Tokenize([]byte(sql))
if err != nil {
panic(err)
}
// Convert tokens
converter := parser.NewTokenConverter()
result, err := converter.Convert(tokens)
if err != nil {
panic(err)
}
// Parse to AST
p := parser.NewParser()
defer p.Release()
ast, err := p.Parse(result.Tokens)
if err != nil {
panic(err)
}
fmt.Printf("Parsed %d statement(s)\n", len(ast.Statements))
fmt.Printf("Statement type: %T\n", ast.Statements[0])
}
Note: The simple API adds less than 1% performance overhead compared to the low-level API. Use the simple API unless you need fine-grained control.
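The get/defer-put pattern used with the tokenizer above is the usual shape of object pooling in Go. The sketch below shows that pattern with the standard library's sync.Pool. It is an illustration of the general technique, not GoSQLX's pool: the scratch type and the getScratch, putScratch, and countWords names are all hypothetical.

```go
package main

import (
    "fmt"
    "strings"
    "sync"
)

// scratch is a hypothetical reusable work buffer.
type scratch struct {
    tokens []string
}

// scratchPool hands out *scratch values and allocates a new one only when empty.
var scratchPool = sync.Pool{
    New: func() any { return &scratch{tokens: make([]string, 0, 64)} },
}

func getScratch() *scratch { return scratchPool.Get().(*scratch) }

// putScratch resets the buffer before returning it, so the next user
// never sees stale data but still benefits from the retained capacity.
func putScratch(s *scratch) {
    s.tokens = s.tokens[:0]
    scratchPool.Put(s)
}

// countWords borrows a buffer, uses it, and always gives it back.
func countWords(sql string) int {
    s := getScratch()
    defer putScratch(s)
    s.tokens = append(s.tokens, strings.Fields(sql)...)
    return len(s.tokens)
}

func main() {
    fmt.Println(countWords("SELECT id FROM users"))
    fmt.Println(countWords("SELECT 1"))
}
```

Forgetting the put does not break correctness with sync.Pool, but it defeats the reuse, which is why the deferred return directly after the get is the recommended habit.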
Documentation
Comprehensive Guides
| Guide | Description |
|---|---|
| Getting Started | Get started in 5 minutes |
| Comparison Guide | GoSQLX vs SQLFluff, JSQLParser, pg_query |
| CLI Guide | Complete CLI documentation and usage examples |
| API Reference | Complete API documentation with examples |
| Usage Guide | Detailed patterns and best practices |
| Architecture | System design and internal architecture |
| Python Bindings | PyGoSQLX -- Python API, installation, and examples |
| Troubleshooting | Common issues and solutions |
Getting Started
| Document | Purpose |
|---|---|
| Production Guide | Deployment and monitoring |
| SQL Compatibility | Dialect support matrix |
| Migration Guide | v1.7.0 to v1.8.0 breaking changes |
| Security Analysis | Security assessment |
| LSP Guide | LSP server and IDE integration |
| Linting Rules | All 10 linting rules reference |
| Error Codes | Error code reference (E1001-E3004) |
| Upgrade Guide | Version upgrade instructions |
| Examples | Working code examples (including transform API) |
Advanced SQL Features
GoSQLX supports Common Table Expressions (CTEs) and Set Operations alongside complete JOIN support:
Common Table Expressions (CTEs)
// Simple CTE
sql := `
    WITH sales_summary AS (
        SELECT region, SUM(amount) as total
        FROM sales
        GROUP BY region
    )
    SELECT region FROM sales_summary WHERE total > 1000
`
// Recursive CTE for hierarchical data
sql = `
    WITH RECURSIVE employee_tree AS (
        SELECT employee_id, manager_id, name
        FROM employees
        WHERE manager_id IS NULL
        UNION ALL
        SELECT e.employee_id, e.manager_id, e.name
        FROM employees e
        JOIN employee_tree et ON e.manager_id = et.employee_id
    )
    SELECT * FROM employee_tree
`
// Multiple CTEs in single query
sql = `
    WITH regional AS (SELECT region, total FROM sales),
         summary AS (SELECT region FROM regional WHERE total > 1000)
    SELECT * FROM summary
`
Set Operations
// UNION - combine results, removing duplicates
sql := "SELECT name FROM users UNION SELECT name FROM customers"
// UNION ALL - combine results preserving duplicates
sql = "SELECT id FROM orders UNION ALL SELECT id FROM invoices"
// EXCEPT - set difference
sql = "SELECT product FROM inventory EXCEPT SELECT product FROM discontinued"
// INTERSECT - set intersection
sql = "SELECT customer_id FROM orders INTERSECT SELECT customer_id FROM payments"
// Left-associative parsing for multiple operations
sql = "SELECT a FROM t1 UNION SELECT b FROM t2 INTERSECT SELECT c FROM t3"
// Parsed as: (SELECT a FROM t1 UNION SELECT b FROM t2) INTERSECT SELECT c FROM t3
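The left-associative grouping described in the last comment can be sketched as a left fold over the operands. This is a standard-library illustration of the idea, not GoSQLX's AST: the Node type and the foldLeft function are hypothetical, and queries are represented by short labels.

```go
package main

import "fmt"

// Node is either a leaf query (Op == "") or a set operation over two subtrees.
// (Hypothetical type for this sketch.)
type Node struct {
    Op          string
    Query       string
    Left, Right *Node
}

// foldLeft builds a left-leaning tree from q0 op0 q1 op1 q2 ...
// It returns nil unless len(ops) == len(queries)-1.
func foldLeft(queries, ops []string) *Node {
    if len(queries) == 0 || len(ops) != len(queries)-1 {
        return nil
    }
    tree := &Node{Query: queries[0]}
    for i, op := range ops {
        // The tree built so far becomes the left operand of the next operator.
        tree = &Node{Op: op, Left: tree, Right: &Node{Query: queries[i+1]}}
    }
    return tree
}

// String renders the tree with explicit parentheses.
func (n *Node) String() string {
    if n.Op == "" {
        return n.Query
    }
    return "(" + n.Left.String() + " " + n.Op + " " + n.Right.String() + ")"
}

func main() {
    tree := foldLeft([]string{"t1", "t2", "t3"}, []string{"UNION", "INTERSECT"})
    fmt.Println(tree) // ((t1 UNION t2) INTERSECT t3)
}
```

Standard SQL gives INTERSECT higher precedence than UNION, so when the grouping matters it is safest to write the parentheses explicitly in the query text.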
Complete JOIN Support
GoSQLX supports all JOIN types with proper left-associative tree logic:
sql := `
SELECT u.name, o.order_date, p.product_name, c.category_name
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
INNER JOIN products p ON o.product_id = p.id
RIGHT JOIN categories c ON p.category_id = c.id
WHERE u.active = true
ORDER BY o.order_date DESC
`
// Parse with the simple API (recommended)
tree, err := gosqlx.Parse(sql)
if err != nil {
panic(err)
}
// Access JOIN information
if selectStmt, ok := tree.Statements[0].(*ast.SelectStatement); ok {
fmt.Printf("Found %d JOINs:\n", len(selectStmt.Joins))
for i, join := range selectStmt.Joins {
fmt.Printf("JOIN %d: %s (left: %s, right: %s)\n",
i+1, join.Type, join.Left.Name, join.Right.Name)
}
}
Supported JOIN Types:
- INNER JOIN - Standard inner joins
- LEFT JOIN / LEFT OUTER JOIN - Left outer joins
- RIGHT JOIN / RIGHT OUTER JOIN - Right outer joins
- FULL JOIN / FULL OUTER JOIN - Full outer joins
- CROSS JOIN - Cartesian product joins
- NATURAL JOIN - Natural joins (implicit ON clause)
- USING (column) - Single-column using clause
Advanced SQL Features (v1.4+)
MERGE Statements (SQL:2003 F312)
sql := `
MERGE INTO target_table t
USING source_table s ON t.id = s.id
WHEN MATCHED THEN
UPDATE SET t.name = s.name, t.value = s.value
WHEN NOT MATCHED THEN
INSERT (id, name, value) VALUES (s.id, s.name, s.value)
`
ast, err := gosqlx.Parse(sql)
GROUPING SETS, ROLLUP, CUBE (SQL-99 T431)
// GROUPING SETS - explicit grouping combinations
sql := `SELECT region, product, SUM(sales)
    FROM orders
    GROUP BY GROUPING SETS ((region), (product), (region, product), ())`
// ROLLUP - hierarchical subtotals
sql = `SELECT year, quarter, month, SUM(revenue)
    FROM sales
    GROUP BY ROLLUP (year, quarter, month)`
// CUBE - all possible combinations
sql = `SELECT region, product, SUM(amount)
    FROM sales
    GROUP BY CUBE (region, product)`
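ROLLUP and CUBE are shorthand for particular GROUPING SETS lists: ROLLUP produces every prefix of its column list, and CUBE produces every subset. The sketch below computes those expansions with the standard library only. The rollup and cube function names are hypothetical, and the code operates on column names rather than on a parsed AST.

```go
package main

import "fmt"

// rollup returns the grouping sets of ROLLUP(cols...): every prefix of cols,
// longest first, ending with the empty set (the grand total).
func rollup(cols []string) [][]string {
    sets := make([][]string, 0, len(cols)+1)
    for n := len(cols); n >= 0; n-- {
        sets = append(sets, cols[:n])
    }
    return sets
}

// cube returns the grouping sets of CUBE(cols...): all 2^n subsets of cols,
// each keeping the original column order.
func cube(cols []string) [][]string {
    n := len(cols)
    sets := make([][]string, 0, 1<<n)
    for mask := (1 << n) - 1; mask >= 0; mask-- {
        set := []string{}
        for i, c := range cols {
            // The highest bit of mask corresponds to the first column.
            if mask&(1<<(n-1-i)) != 0 {
                set = append(set, c)
            }
        }
        sets = append(sets, set)
    }
    return sets
}

func main() {
    fmt.Println(rollup([]string{"year", "quarter", "month"}))
    fmt.Println(cube([]string{"region", "product"}))
}
```

For the queries above, ROLLUP (year, quarter, month) therefore stands for four grouping sets and CUBE (region, product) for four as well: (region, product), (region), (product), and ().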
Materialized Views
// Create materialized view
sql := `CREATE MATERIALIZED VIEW sales_summary AS
    SELECT region, SUM(amount) as total
    FROM sales GROUP BY region`
// Refresh materialized view
sql = `REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary`
// Drop materialized view
sql = `DROP MATERIALIZED VIEW IF EXISTS sales_summary`
SQL Injection Detection
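import "github.com/ajitpratap0/GoSQLX/pkg/sql/security"
// Create scanner (ast below is the tree returned by gosqlx.Parse)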
scanner := security.NewScanner()
// Scan for injection patterns
result := scanner.Scan(ast)
if result.HasCritical() {
fmt.Printf("Found %d critical issues!\n", result.CriticalCount)
for _, finding := range result.Findings {
fmt.Printf(" [%s] %s: %s\n",
finding.Severity, finding.Pattern, finding.Description)
}
}
// Detected patterns include:
// - Tautology (1=1, 'a'='a')
// - UNION-based injection
// - Time-based blind (SLEEP, WAITFOR DELAY)
// - Comment bypass (--, /**/)
// - Stacked queries
// - Dangerous functions (xp_cmdshell, LOAD_FILE)
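To give a feel for the first pattern in that list, here is a deliberately small tautology check written with the standard library. It is not the GoSQLX scanner, which works on the parsed AST: the isTautology name is hypothetical, and this version only inspects a single textual comparison of the form "lhs = rhs".

```go
package main

import (
    "fmt"
    "strings"
)

// isTautology reports whether cond is a single equality comparison whose two
// operands are textually identical, such as 1=1, 'a'='a', or col=col.
// Simplification: no expression parsing, so "1 = 1.0" is not detected.
func isTautology(cond string) bool {
    lhs, rhs, ok := strings.Cut(cond, "=")
    if !ok {
        return false // no comparison operator present
    }
    lhs, rhs = strings.TrimSpace(lhs), strings.TrimSpace(rhs)
    return lhs != "" && lhs == rhs
}

func main() {
    for _, cond := range []string{"1=1", "'a' = 'a'", "id = 1", "a >= a"} {
        fmt.Printf("%-10s tautology=%v\n", cond, isTautology(cond))
    }
}
```

A textual check like this is easy to evade with whitespace tricks or equivalent expressions, which is one reason to run detection against the AST instead of the raw string.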
Expression Operators (BETWEEN, IN, LIKE, IS NULL)
// BETWEEN with range bounds
sql := `SELECT * FROM orders WHERE amount BETWEEN 100 AND 500`
// IN with subquery
sql = `SELECT * FROM users WHERE id IN (SELECT user_id FROM admins)`
// LIKE with pattern matching
sql = `SELECT * FROM products WHERE name LIKE '%widget%'`
// IS NULL / IS NOT NULL
sql = `SELECT * FROM users WHERE deleted_at IS NULL`
// NULLS FIRST/LAST ordering (SQL-99 F851)
sql = `SELECT * FROM users ORDER BY last_login DESC NULLS LAST`
PostgreSQL-Specific Features (v1.6+)
LATERAL JOIN - Correlated subqueries in FROM clause:
sql := `
SELECT u.name, recent_orders.order_date, recent_orders.total
FROM users u
LEFT JOIN LATERAL (
SELECT order_date, total
FROM orders
WHERE user_id = u.id
ORDER BY order_date DESC
LIMIT 1
) AS recent_orders ON true
`
ast, err := gosqlx.Parse(sql)
ORDER BY inside Aggregates - Ordered set functions:
// STRING_AGG with ORDER BY
sql := `SELECT STRING_AGG(name, ', ' ORDER BY name DESC NULLS LAST) FROM users`
// ARRAY_AGG with ORDER BY
sql = `SELECT ARRAY_AGG(value ORDER BY created_at, priority DESC) FROM items`
// JSON_AGG with ORDER BY
sql = `SELECT JSON_AGG(employee_data ORDER BY hire_date) FROM employees`
// Multiple aggregates with different orderings
sql = `
SELECT
department,
STRING_AGG(name, '; ' ORDER BY name ASC NULLS FIRST) AS employee_names,
ARRAY_AGG(salary ORDER BY salary DESC) AS salaries
FROM employees
GROUP BY department
`
ast, err := gosqlx.Parse(sql)
JSON/JSONB Operators - PostgreSQL JSON support:
// Arrow operators for field access
sql := `SELECT data -> 'user' -> 'profile' ->> 'email' FROM users`
// Path operators for nested access
sql = `SELECT data #> '{address,city}', data #>> '{address,zipcode}' FROM users`
// Containment operators
sql = `SELECT * FROM users WHERE data @> '{"active": true}'`
sql = `SELECT * FROM users WHERE '{"admin": true}' <@ data`
// Combined JSON operators in complex queries
sql = `
SELECT
u.id,
u.data ->> 'name' AS user_name,
u.data -> 'settings' ->> 'theme' AS theme
FROM users u
WHERE u.data @> '{"verified": true}'
AND u.data ->> 'status' = 'active'
`
ast, err := gosqlx.Parse(sql)
DISTINCT ON - PostgreSQL unique row selection:
sql := `
SELECT DISTINCT ON (user_id) user_id, created_at, status
FROM orders
ORDER BY user_id, created_at DESC
`
ast, err := gosqlx.Parse(sql)
FILTER Clause - Conditional aggregation:
sql := `
SELECT
COUNT(*) AS total_orders,
COUNT(*) FILTER (WHERE status = 'completed') AS completed_orders,
SUM(amount) FILTER (WHERE region = 'US') AS us_revenue
FROM orders
`
ast, err := gosqlx.Parse(sql)
Examples
Multi-Dialect Support (v1.8.0)
// Parse with explicit dialect
ast, err := parser.ParseWithDialect("SHOW TABLES", "mysql")
// MySQL-specific syntax
ast, err = parser.ParseWithDialect(`
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com')
ON DUPLICATE KEY UPDATE email = VALUES(email)
`, "mysql")
// PostgreSQL (default)
ast, err = parser.ParseWithDialect(`
SELECT * FROM users WHERE tags @> ARRAY['admin']
`, "postgresql")
// CLI with dialect flag
// gosqlx validate --dialect mysql "SHOW TABLES"
Query Transform API (v1.8.0)
// Parse SQL, add multi-tenant WHERE filter
stmt, _ := transform.ParseSQL("SELECT * FROM orders")
transform.AddWhere(stmt, "tenant_id = 42")
sql := transform.FormatSQL(stmt) // SELECT * FROM orders WHERE tenant_id = 42
// Composable rules
transform.Apply(stmt,
transform.AddWhereRule("active = true"),
transform.SetLimitRule(100),
transform.AddOrderByRule("created_at", "DESC"),
)
Unicode and International SQL
// Japanese
sql := `SELECT "名前", "年齢" FROM "ユーザー"`
// Russian
sql = `SELECT "имя", "возраст" FROM "пользователи"`
// Arabic
sql = `SELECT "الاسم", "العمر" FROM "المستخدمون"`
// Emoji support
sql = `SELECT * FROM users WHERE status = '🚀'`
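One practical consequence of accepting identifiers in non-Latin scripts is that byte offsets and character columns stop being the same number, which matters for error positions. The following standard-library sketch converts a byte offset into a line and a rune-based column. The position function is a hypothetical helper, not part of the GoSQLX API.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// position converts a byte offset in src into a 1-based line number and a
// 1-based column counted in runes (not bytes).
func position(src []byte, offset int) (line, col int) {
    line, col = 1, 1
    for i := 0; i < offset && i < len(src); {
        r, size := utf8.DecodeRune(src[i:])
        if r == '\n' {
            line++
            col = 1
        } else {
            col++
        }
        i += size // advance by the encoded width of the rune
    }
    return line, col
}

func main() {
    src := []byte("SELECT \"имя\"\nFROM t")
    // Byte 14 is the closing quote: 8 ASCII bytes plus 3 two-byte runes precede it.
    line, col := position(src, 14)
    fmt.Printf("byte 14 -> line %d, column %d\n", line, col)
}
```

Counting in runes is still an approximation of what a user sees, since some characters (for example emoji sequences) span several runes.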
Concurrent Processing
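// processConcurrently is an illustrative wrapper name; queries holds the SQL strings to tokenize.
func processConcurrently(queries []string) {
    var wg sync.WaitGroup
    for _, sql := range queries {
        wg.Add(1)
        go func(query string) {
            defer wg.Done()
            // Each goroutine gets its own tokenizer
            tkz := tokenizer.GetTokenizer()
            defer tokenizer.PutTokenizer(tkz)
            tokens, err := tkz.Tokenize([]byte(query))
            if err != nil {
                return // skip queries that fail to tokenize
            }
            _ = tokens // Process tokens...
        }(sql)
    }
    wg.Wait()
}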
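Starting one goroutine per query is fine for small batches. For large inputs, a fixed number of workers is a common alternative. The sketch below shows that variant with the standard library only: processAll is a hypothetical helper, and the work callback stands in for whatever tokenizing or parsing each worker would do.

```go
package main

import (
    "fmt"
    "strings"
    "sync"
)

// processAll runs work over every query using at most `workers` goroutines
// and returns the results in input order.
func processAll(queries []string, workers int, work func(string) int) []int {
    if workers < 1 {
        workers = 1
    }
    results := make([]int, len(queries))
    jobs := make(chan int) // carries indexes into queries
    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := range jobs {
                // Each index is written by exactly one goroutine, so no lock is needed.
                results[i] = work(queries[i])
            }
        }()
    }
    for i := range queries {
        jobs <- i
    }
    close(jobs) // lets the workers' range loops finish
    wg.Wait()
    return results
}

func main() {
    queries := []string{"SELECT 1", "SELECT id FROM users", "SELECT a, b FROM t"}
    countWords := func(q string) int { return len(strings.Fields(q)) }
    fmt.Println(processAll(queries, 2, countWords))
}
```

Inside the callback, each worker would acquire and release its own pooled tokenizer, exactly as in the per-goroutine example above.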
Performance
v1.0.0 Performance Improvements
| Metric | Previous | v1.0.0 | Improvement |
|---|---|---|---|
| Sustained Throughput | 2.2M ops/s | 946K+ ops/s | Production Grade |
| Peak Throughput | 2.2M ops/s | 1.25M+ ops/s | Enhanced |
| Token Processing | 8M tokens/s | 8M+ tokens/s | Maintained |
| Simple Query Latency | 200ns | <280ns | Optimized |
| Complex Query Latency | N/A | <1ms (CTE/Set Ops) | New Capability |
| Memory Usage | Baseline | 60-80% reduction | -70% |
| SQL-92 Compliance | 40% | ~70% | +75% |
Latest Benchmark Results
BenchmarkParserSustainedLoad-16 946,583 1,057 ns/op 1,847 B/op 23 allocs/op
BenchmarkParserThroughput-16 1,252,833 798 ns/op 1,452 B/op 18 allocs/op
BenchmarkParserSimpleSelect-16 3,571,428 279 ns/op 536 B/op 9 allocs/op
BenchmarkParserComplexSelect-16 985,221 1,014 ns/op 2,184 B/op 31 allocs/op
BenchmarkCTE/SimpleCTE-16 524,933 1,891 ns/op 3,847 B/op 52 allocs/op
BenchmarkCTE/RecursiveCTE-16 387,654 2,735 ns/op 5,293 B/op 71 allocs/op
BenchmarkSetOperations/UNION-16 445,782 2,234 ns/op 4,156 B/op 58 allocs/op
BenchmarkTokensPerSecond-16 815,439 1,378 ns/op 8,847,625 tokens/sec
Performance Characteristics
| Metric | Value | Details |
|---|---|---|
| Sustained Throughput | 946K+ ops/sec | 30s load testing |
| Peak Throughput | 1.25M+ ops/sec | Concurrent goroutines |
| Token Rate | 8M+ tokens/sec | Sustained processing |
| Simple Query Latency | <280ns | Basic SELECT (p50) |
| Complex Query Latency | <1ms | CTEs/Set Operations |
| Memory | 1.8KB/query | Complex SQL with pooling |
| Scaling | Linear to 128+ | Perfect concurrency |
| Pool Efficiency | 95%+ hit rate | Effective reuse |
Run go test -bench=. -benchmem ./pkg/... for detailed performance analysis.
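When reading the benchmark lines above, note that `go test -bench` prints the benchmark name, the iteration count, and then ns/op, B/op, and allocs/op. A single-goroutine operations-per-second figure follows from ns/op as 1e9 / ns/op: 1,057 ns/op is roughly 946K ops/sec, 798 ns/op roughly 1.25M, and 279 ns/op roughly 3.58M. The small sketch below does that arithmetic; opsPerSec is a hypothetical helper name.

```go
package main

import "fmt"

// opsPerSec converts a per-operation latency in nanoseconds into
// operations per second for a single goroutine.
func opsPerSec(nsPerOp float64) float64 { return 1e9 / nsPerOp }

func main() {
    rows := []struct {
        name    string
        nsPerOp float64
    }{
        {"SustainedLoad", 1057},
        {"Throughput", 798},
        {"SimpleSelect", 279},
    }
    for _, r := range rows {
        fmt.Printf("%-14s %5.0f ns/op = %.0f ops/sec\n", r.name, r.nsPerOp, opsPerSec(r.nsPerOp))
    }
}
```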
Testing
go test -race ./...
# Run benchmarks
go test -bench=. -benchmem ./...
# Generate coverage report
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
# Run specific test suites
go test -v ./pkg/sql/tokenizer/
go test -v ./pkg/sql/parser/
Project Structure
GoSQLX/
+-- cmd/gosqlx/            # CLI tool (validate, format, parse, analyze, lint, lsp, optimize, action)
|   +-- cmd/               # Core CLI commands
|   +-- internal/          # Extracted sub-packages (lspcmd, actioncmd, optimizecmd, cmdutil)
+-- pkg/
|   +-- models/            # Core data structures (tokens, spans, locations)
|   +-- errors/            # Structured error handling with position tracking
|   +-- config/            # Configuration management (YAML/JSON/env)
|   +-- metrics/           # Performance monitoring and observability
|   +-- gosqlx/            # High-level simple API (recommended entry point)
|   +-- cbinding/          # C shared library bindings (for Python/FFI)
|   +-- linter/            # SQL linting engine with 10 rules (L001-L010)
|   +-- lsp/               # Language Server Protocol server for IDEs
|   +-- transform/         # Query rewriting/transform API (v1.8.0)
|   +-- formatter/         # Public SQL formatter package (v1.8.0)
|   +-- advisor/           # Query optimization advisor with 12 rules
|   +-- schema/            # Schema-aware validation
|   +-- compatibility/     # API stability testing
|   +-- sql/
|       +-- tokenizer/     # Zero-copy lexical analysis with dialect support
|       +-- parser/        # Recursive descent parser with dialect modes
|       +-- ast/           # Abstract syntax tree with SQL() serialization
|       +-- token/         # Token type definitions (int-based, v1.8.0)
|       +-- keywords/      # Multi-dialect SQL keyword definitions
|       +-- security/      # SQL injection detection with fuzz testing
|       +-- monitor/       # SQL monitoring utilities
+-- wasm/                  # WebAssembly build + browser playground (v1.8.0)
+-- python/                # PyGoSQLX - Python bindings via ctypes FFI
+-- examples/              # Usage examples (including transform examples)
+-- docs/                  # Comprehensive documentation (20+ guides)
+-- vscode-extension/      # Official VSCode extension
Development
Prerequisites
- Go 1.21+
- Task - task runner (install: go install github.com/go-task/task/v3/cmd/task@latest)
- golangci-lint, staticcheck (for code quality, install: task deps:tools)
Task Runner
This project uses Task as the task runner. Install with:
go install github.com/go-task/task/v3/cmd/task@latest
# Or: brew install go-task (macOS)
Building
task
# Build the project
task build
# Build the CLI binary
task build:cli
# Install CLI globally
task install
# Run all quality checks
task quality
# Run all tests
task test
# Run tests with race detection (recommended)
task test:race
# Clean build artifacts
task clean
Code Quality
task fmt
# Run go vet
task vet
# Run golangci-lint
task lint
# Run all quality checks (fmt, vet, lint)
task quality
# Full CI check (format, vet, lint, test:race)
task check
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
How to Contribute
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Guidelines
- Write tests for new features
- Ensure all tests pass with race detection
- Follow Go idioms and best practices
- Update documentation for API changes
- Add benchmarks for performance-critical code
Roadmap
| Phase | Version | Status | Highlights |
|---|---|---|---|
| Phase 1 | v1.1.0 | Complete | JOIN Support |
| Phase 2 | v1.2.0 | Complete | CTEs & Set Operations |
| Phase 2.5 | v1.3.0-v1.4.0 | Complete | Window Functions, MERGE, Grouping Sets |
| Phase 3 | v1.5.0-v1.6.0 | Complete | PostgreSQL Extensions, LSP, Linter |
| Phase 4 | v1.7.0 | Complete | Parser Enhancements, Schema-Qualified Names |
| Phase 5 | v1.8.0 | Complete | Dialect Engine, MySQL, Query Transforms, WASM, Token Overhaul |
| Phase 6 | v2.0.0 | Planned | Advanced Optimizations & Schema Intelligence |
Phase 5: Dialect Engine, Query Transforms & Tooling - v1.8.0
- Dialect Mode Engine - ParseWithDialect(), --dialect CLI flag, 6 dialects
- MySQL Syntax - SHOW, DESCRIBE, REPLACE INTO, ON DUPLICATE KEY UPDATE, GROUP_CONCAT, MATCH AGAINST, REGEXP
- Query Transform API - pkg/transform/ with WHERE, columns, JOINs, tables, LIMIT/OFFSET, ORDER BY manipulation
- WASM Playground - Browser-based SQL parsing, formatting, linting via WebAssembly
- Comment Preservation - SQL comments survive parse-format round-trips
- AST-to-SQL Serialization - SQL() methods on all AST nodes with roundtrip support
- AST-based Formatter - CompactStyle/ReadableStyle presets with keyword casing options
- DDL Formatters - Format() for ALTER TABLE, CREATE INDEX/VIEW, DROP, TRUNCATE
- Error Recovery - ParseWithRecovery() for multi-error IDE diagnostics
- Dollar-Quoted Strings - PostgreSQL $$body$$ tokenizer support
- Token Type Overhaul - ~50% faster parsing via O(1) integer token comparison
- Query Advisor - 12 optimization rules (OPT-001 through OPT-012)
- Schema Validation - NOT NULL, type compatibility, foreign key validation
- Snowflake Dialect - Keyword detection and support
- Apache-2.0 License - Relicensed from AGPL
Phase 4: Parser Enhancements & PostgreSQL Extensions - v1.7.0
- Schema-Qualified Names - schema.table and db.schema.table across all DML/DDL
- PostgreSQL Type Casting - :: operator for type casts
- UPSERT - INSERT ... ON CONFLICT DO UPDATE/NOTHING
- ARRAY Constructors - ARRAY[1, 2, 3] with subscript/slice operations
- Regex Operators - ~, ~*, !~, !~* for pattern matching
- INTERVAL Expressions - Temporal literals
- FOR UPDATE/SHARE - Row-level locking clauses
- Positional Parameters - $1, $2 style placeholders
- Python Bindings - PyGoSQLX with ctypes FFI, thread-safe, memory-safe
Phase 6: Advanced Optimizations & Schema Intelligence - v2.0.0
- Advanced Query Cost Estimation - Extended complexity analysis
- Schema Diff - Compare and generate migration scripts
- Entity-Relationship Extraction - Generate ER diagrams from DDL
- Stored Procedures - CREATE PROCEDURE/FUNCTION parsing
- PL/pgSQL - PostgreSQL procedural language
- T-SQL Extensions - PIVOT/UNPIVOT, CROSS/OUTER APPLY parsing
See ARCHITECTURE.md for detailed system design and CHANGELOG.md for version history
Community & Support
Get Help
| Channel | Purpose | Response Time |
|---|---|---|
| Bug Reports | Report issues | Community-driven |
| Feature Requests | Suggest improvements | Community-driven |
| Docs Issues | Fix docs | Community-driven |
| Q&A | Questions & help | Community-driven |
| Ideas | Propose features | Community-driven |
| Show & Tell | Share your project | Community-driven |
| Security | Report vulnerabilities privately | Best effort |
How to Contribute
We love your input! We want to make contributing as easy and transparent as possible.
Quick Contribution Guide
- Fork the repo and create a feature branch
- Make your changes following the patterns in CLAUDE.md
- Ensure tests pass with race detection (go test -race ./...)
- Update CHANGELOG.md and relevant docs
- Submit a PR -- CI runs automatically
Use Cases
| Industry | Use Case | Benefits |
|---|---|---|
| FinTech | SQL validation & auditing | Fast validation, compliance tracking |
| Analytics | Query parsing & optimization | Real-time analysis, performance insights |
| Security | SQL injection detection | Pattern matching, threat prevention |
| DevTools | IDE integration & linting | Syntax highlighting, auto-completion |
| Education | SQL learning platforms | Interactive parsing, error explanation |
| Migration | Cross-database migration | Dialect conversion, compatibility check |
| Python | SQL parsing in Python apps | Native speed via FFI, 100x+ faster than pure Python |
Who's Using GoSQLX
Using GoSQLX in production? Let us know!
Project Metrics
Performance Benchmarks
A[SQL Input] -->|946K+ ops/sec| B[Tokenizer]
B -->|8M+ tokens/sec| C[Parser]
C -->|Zero-copy| D[AST]
D -->|60-80% less memory| E[Output]
Support This Project
If GoSQLX helps your project, please consider supporting it in one of these ways:
- Star this repository
- Tweet about GoSQLX
- Write a blog post
- Create a tutorial
- Report bugs
- Suggest features
- Submit PRs
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Built by the GoSQLX Team
Copyright (c) 2024-2026 GoSQLX. All rights reserved.