From f-string SQL to Parameterized Queries: Preventing Injection in Database Scripts
The Scenario
You ask your AI to write a Python script that manages a user directory — searching for users by name, inserting new records, and deleting old ones. The model produces clean, readable code that uses f-strings to build SQL queries from user input. It passes every test you throw at it. It also lets any malicious user delete your entire database with a single crafted input.
The Raw AI Draft
Here is what a model like GPT-4 or Claude typically generates on the first attempt. It works perfectly with honest inputs — and is a textbook SQL injection vulnerability.
import sqlite3
def create_database():
conn = sqlite3.connect("users.db")
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
email TEXT,
role TEXT
)
""")
conn.commit()
conn.close()
def add_user(name, email, role):
conn = sqlite3.connect("users.db")
cursor = conn.cursor()
cursor.execute(f"INSERT INTO users (name, email, role) VALUES ('{name}', '{email}', '{role}')")
conn.commit()
conn.close()
print(f"Added user {name}")
def search_users(name):
conn = sqlite3.connect("users.db")
cursor = conn.cursor()
cursor.execute(f"SELECT * FROM users WHERE name LIKE '%{name}%'")
results = cursor.fetchall()
conn.close()
return results
def delete_user(user_id):
conn = sqlite3.connect("users.db")
cursor = conn.cursor()
cursor.execute(f"DELETE FROM users WHERE id = {user_id}")
conn.commit()
conn.close()
print(f"Deleted user {user_id}")
create_database()
add_user("Alice", "alice@example.com", "admin")
print(search_users("Alice"))The Code Smells
- SQL injection via f-string interpolation —
f"INSERT INTO users ... VALUES ('{name}', '{email}', '{role}')"pastes user input directly into the SQL string. An attacker can passname = "'; DROP TABLE users; --"and execute arbitrary SQL. This is the number one vulnerability in web application security and has been documented since 1998. - No input validation on any field — The
roleparameter accepts any string. There is no check that it is a valid role. An attacker can insert arbitrary values like"superadmin"or use the field as an injection vector. - Database connection leaks on exceptions — Every function opens a connection with
sqlite3.connect()and manually callsconn.close(). If the SQL execution throws an error,close()is never called. Repeated failures exhaust the process's file descriptor limit. - No transaction management —
conn.commit()is called after each operation, but there is norollback()on failure. A partially completed multi-step operation leaves the database in an inconsistent state. - No column constraints in the schema — The
userstable allows NULL values in every column. A user with no name, no email, and no role can be inserted without error. There is no UNIQUE constraint on email, so duplicate registrations go undetected. - Raw tuples returned to the caller —
cursor.fetchall()returns a list of tuples with no column names. The caller has to know that index 0 is the ID, index 1 is the name, and so on. This is fragile and unreadable. - Hardcoded database path —
"users.db"is hardcoded in every function. Testing against a scratch database requires editing the source code. - No logging —
print()with no structure. In a production system you cannot distinguish informational messages from errors, and you have no timestamps for diagnosis.
The Best Practices
Parameterized Queries. The fundamental defense against SQL injection is to never build SQL strings with user input. Instead, use parameterized queries (also called prepared statements) where the SQL structure and data values are sent separately to the database engine. In Python's sqlite3 module, this means using ? placeholders: cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,)). The database driver handles escaping and quoting. SQL injection becomes structurally impossible because user input can never be interpreted as SQL syntax.
Input Validation at the Application Layer. Parameterized queries prevent injection, but they do not prevent invalid data. If only three roles are valid in your system (admin, editor, viewer), enforce that in code before the data reaches the database. Use a CHECK constraint in the schema as a safety net, and validate in the application layer as the primary gate. This "defense in depth" approach means a bug in one layer does not compromise the other.
The Principle of Least Privilege. Database columns should constrain their values as tightly as possible. Use NOT NULL to prevent empty fields. Use UNIQUE to prevent duplicates. Use CHECK to restrict values to known-good sets. These constraints are not optional hardening — they are the schema communicating its own invariants.
Context Managers for Connection Lifecycle. Wrap database connections in a context manager (with block) that commits on success and rolls back on exception. This guarantees that connections are always closed, transactions are always finalized, and a crash mid-operation never leaves the database in a half-written state.
Structured Results. Return dictionaries (or dataclasses, or Pydantic models) instead of raw tuples. Named fields are self-documenting, less error-prone to consume, and trivially serializable to JSON for APIs. The small overhead of dict(zip(columns, row)) pays for itself immediately in readability and maintainability.
Type Coercion as a Defense Layer. For parameters that must be integers (like user_id), explicitly coerce the input with int(user_id) before using it in a query. This is not a substitute for parameterized queries — it is an additional barrier that catches malformed input before it reaches the database layer at all.
The Refactored Code
import sqlite3
import os
import logging
from contextlib import contextmanager
logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger(__name__)
# Database path from environment — avoids hardcoding filesystem assumptions
DB_PATH = os.getenv("USER_DB_PATH", "users.db")
@contextmanager
def get_connection(db_path: str = DB_PATH):
"""Context manager for database connections — guarantees cleanup on error."""
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA foreign_keys = ON") # Enforce referential integrity
conn.execute("PRAGMA journal_mode = WAL") # Better concurrent read performance
try:
yield conn
conn.commit()
except Exception:
conn.rollback()
raise
finally:
conn.close()
def create_database(db_path: str = DB_PATH) -> None:
"""Initialize the database schema with constrained column definitions."""
with get_connection(db_path) as conn:
conn.execute("""
CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE,
role TEXT NOT NULL DEFAULT 'viewer'
CHECK(role IN ('admin', 'editor', 'viewer'))
)
""")
logger.info("Database initialized")
def add_user(name: str, email: str, role: str = "viewer",
db_path: str = DB_PATH) -> int:
"""Insert a new user using parameterized queries. Returns the new user ID."""
# Validate role before touching the database — fail fast
allowed_roles = {"admin", "editor", "viewer"}
if role not in allowed_roles:
raise ValueError(f"Invalid role '{role}'. Must be one of: {allowed_roles}")
with get_connection(db_path) as conn:
# Parameterized query: the ? placeholders are filled by the database driver,
# which escapes all special characters. SQL injection is structurally impossible.
cursor = conn.execute(
"INSERT INTO users (name, email, role) VALUES (?, ?, ?)",
(name, email, role),
)
user_id = cursor.lastrowid
logger.info(f"Added user '{name}' (id={user_id}, role={role})")
return user_id
def search_users(name: str, db_path: str = DB_PATH) -> list[dict]:
"""Search for users by name using a parameterized LIKE query."""
with get_connection(db_path) as conn:
# The % wildcards are part of the parameter value, not the SQL string.
# This prevents injection through the search term.
cursor = conn.execute(
"SELECT id, name, email, role FROM users WHERE name LIKE ?",
(f"%{name}%",),
)
columns = [desc[0] for desc in cursor.description]
results = [dict(zip(columns, row)) for row in cursor.fetchall()]
logger.info(f"Search for '{name}' returned {len(results)} result(s)")
return results
def delete_user(user_id: int, db_path: str = DB_PATH) -> bool:
"""Delete a user by ID. Returns True if a row was actually deleted."""
# Coerce to int — if someone passes "1; DROP TABLE users", this raises ValueError
user_id = int(user_id)
with get_connection(db_path) as conn:
cursor = conn.execute(
"DELETE FROM users WHERE id = ?",
(user_id,),
)
deleted = cursor.rowcount > 0
if deleted:
logger.info(f"Deleted user id={user_id}")
else:
logger.warning(f"No user found with id={user_id}")
return deleted
if __name__ == "__main__":
create_database()
uid = add_user("Alice", "alice@example.com", "admin")
results = search_users("Alice")
for user in results:
print(user)The Benchmarks
| Metric | Before | After | Improvement |
|---|---|---|---|
| SQL injection vulnerability | Wide open | Structurally impossible | Eliminated |
| Survives malicious input | No — data destroyed | Yes — input safely escaped | 100% protection |
| Connection leak on error | Leaks every time | Guaranteed cleanup | Eliminated |
| Duplicate email insertion | Silently succeeds | Raises unique constraint error | Data integrity enforced |
| Invalid role insertion | Accepts anything | Validates against allowed set | Defense in depth |
The Prompt Tip
Write a Python script that manages a user database with SQLite. Requirements: use parameterized queries with ? placeholders for ALL SQL operations — never use f-strings or string concatenation to build SQL. Add a context manager for database connections that commits on success, rolls back on exception, and always closes the connection. Define the schema with NOT NULL constraints on all columns, a UNIQUE constraint on email, and a CHECK constraint that restricts the role column to a fixed set of values (admin, editor, viewer). Validate the role in the application layer before executing the INSERT. Return query results as a list of dictionaries with named columns instead of raw tuples. Coerce integer parameters with int() before using them in queries. Read the database file path from an environment variable with a sensible default. Use Python's logging module instead of print.