From 'It Works' to Tested Code: Adding Type Hints and Unit Tests to AI Output
The Scenario
You ask your AI to write a Python utility module with functions for parsing timestamps, calculating date ranges, and formatting durations for display. The model produces a set of functions that work correctly for the examples you test manually. But the functions accept any type, return undocumented types, have no docstrings, and come with zero tests. Three months later, someone passes a None into parse_timestamp and the function produces a silent wrong result instead of a clear error. There is no test suite to catch it, no type checker to flag it, and no documentation to tell the caller what was expected.
The Raw AI Draft
Here is what a model like GPT-4 or Claude typically generates on the first attempt. Every function runs correctly for the happy path — and nothing about the code tells you what happens when the input is wrong.
from datetime import datetime, timedelta
def parse_timestamp(value):
if isinstance(value, datetime):
return value
try:
return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
except:
try:
return datetime.strptime(value, "%Y-%m-%d")
except:
return None
def days_between(start, end):
d1 = parse_timestamp(start)
d2 = parse_timestamp(end)
return (d2 - d1).days
def format_duration(seconds):
hours = seconds // 3600
minutes = (seconds % 3600) // 60
secs = seconds % 60
parts = []
if hours:
parts.append(f"{hours}h")
if minutes:
parts.append(f"{minutes}m")
parts.append(f"{secs}s")
return " ".join(parts)
def date_range(start, end):
current = parse_timestamp(start)
end_date = parse_timestamp(end)
dates = []
while current <= end_date:
dates.append(current.strftime("%Y-%m-%d"))
current += timedelta(days=1)
return datesThe Code Smells
- No type annotations on any function — Every function accepts
anyand returnsanyby default.parse_timestamp(value)could receive a string, integer,None, a list, or a database connection object. The function signature provides zero information about what is expected or what is returned. - Bare except clauses swallow all errors —
except:catches everything, includingKeyboardInterrupt,SystemExit, andMemoryError. If the input isNone, the function silently returnsNoneinstead of raising a clear error. This turns bugs into invisible data corruption. - Silent None returns instead of explicit errors —
parse_timestampreturnsNonefor unparseable inputs. Callers do not check forNone, sodays_between("bad-date", "2026-01-01")crashes withTypeError: unsupported operand type(s) for -: 'NoneType' and 'datetime'— a confusing error that does not mention the actual problem. - No docstrings — There is no documentation for what any function accepts, returns, or raises. A new developer reading this code has to mentally trace every path to understand the contract.
- No unit tests — Zero test coverage means bugs only surface when real data triggers them in production. Edge cases — empty strings, negative durations, reversed date ranges — are never exercised.
- No input validation —
format_durationaccepts negative numbers and silently produces garbage like-1h -59m -59s.date_rangewith end before start enters an infinite loop if the inputs happen to be datetime objects instead of strings. - Infinite loop with reversed date range — If
start > endindate_range, thewhile current <= end_dateloop never terminates. The function hangs forever, consuming CPU and memory. There is no guard against this. - Hardcoded date formats — The two supported formats are embedded inside the function. Adding ISO 8601 with T separator requires editing the parsing logic directly instead of extending a format list.
The Best Practices
Type Hints as Living Documentation. Type annotations like def parse_timestamp(value: str | datetime) -> datetime tell every caller exactly what a function accepts and what it returns. They serve triple duty: documentation for humans, input for static analysis tools like mypy, and autocomplete intelligence for IDEs. The cost is a few characters per function signature. The payoff is catching entire categories of bugs before the code runs.
Static Analysis with mypy. Running mypy --strict on a module catches type mismatches at development time. If someone passes None to a function annotated as str | datetime, mypy flags it as an error before any test runs. This is defense in depth: type hints document intent, mypy enforces it, and tests verify behavior. Each layer catches bugs the others miss.
Custom Exceptions Instead of Silent None. When a function cannot fulfill its contract, it should raise a specific, descriptive exception — not return None and hope the caller checks. TimestampParseError("Cannot parse 'foobar'") is immediately actionable. A None flowing through three more function calls before causing an unrelated TypeError is a debugging nightmare.
Boundary Testing with pytest. The most informative tests are not the happy-path cases — those work by accident. The tests that catch real bugs are at the boundaries: empty strings, zero values, negative numbers, reversed ranges, None inputs, and format edge cases. A good test suite for a utility module spends 80% of its effort on boundary conditions because that is where 80% of production bugs originate.
Test Organization with Classes. Group related tests into classes — TestParseTimestamp, TestFormatDuration, TestDateRange. This is not just organization for the developer; it is how pytest groups output, making it immediately clear which function broke when a test fails.
Docstrings as API Contracts. Every public function should document three things: what it accepts (Args), what it returns (Returns), and what it raises (Raises). This is the function's contract with its callers. When the docstring says "Raises: ValueError if seconds is negative," both the function and its tests are accountable to that promise.
Extensible Format Lists. Instead of hardcoding date formats inside parsing logic, define them as a module-level constant. Adding a new format is a one-line change to the list, not a structural change to the function. This is the Open/Closed Principle in miniature: the function is open to new formats without modification to its logic.
The Refactored Code
# --- timeutils.py ---
from datetime import datetime, timedelta
# Supported timestamp formats in order of attempted parsing
TIMESTAMP_FORMATS: list[str] = [
"%Y-%m-%d %H:%M:%S",
"%Y-%m-%d",
"%Y-%m-%dT%H:%M:%S",
"%Y-%m-%dT%H:%M:%SZ",
]
class TimestampParseError(Exception):
"""Raised when a string cannot be parsed into a datetime."""
def __init__(self, value: str, formats: list[str]) -> None:
self.value = value
self.formats = formats
super().__init__(
f"Cannot parse '{value}' as a timestamp. "
f"Tried formats: {', '.join(formats)}"
)
def parse_timestamp(value: str | datetime) -> datetime:
"""Parse a string or datetime into a datetime object.
Args:
value: A datetime object (returned as-is) or a string matching
one of the supported TIMESTAMP_FORMATS.
Returns:
A datetime object.
Raises:
TypeError: If value is not a str or datetime.
TimestampParseError: If the string does not match any known format.
"""
if isinstance(value, datetime):
return value
if not isinstance(value, str):
raise TypeError(
f"Expected str or datetime, got {type(value).__name__}"
)
for fmt in TIMESTAMP_FORMATS:
try:
return datetime.strptime(value, fmt)
except ValueError:
continue
raise TimestampParseError(value, TIMESTAMP_FORMATS)
def days_between(start: str | datetime, end: str | datetime) -> int:
"""Calculate the number of days between two timestamps.
Args:
start: The earlier timestamp (string or datetime).
end: The later timestamp (string or datetime).
Returns:
Integer number of days. Negative if end is before start.
Raises:
TypeError: If either argument is not a str or datetime.
TimestampParseError: If either string cannot be parsed.
"""
d1 = parse_timestamp(start)
d2 = parse_timestamp(end)
return (d2 - d1).days
def format_duration(seconds: int | float) -> str:
"""Format a duration in seconds into a human-readable string.
Args:
seconds: Non-negative number of seconds.
Returns:
A string like "2h 15m 30s". Returns "0s" for zero seconds.
Raises:
TypeError: If seconds is not an int or float.
ValueError: If seconds is negative.
"""
if not isinstance(seconds, (int, float)):
raise TypeError(
f"Expected int or float, got {type(seconds).__name__}"
)
if seconds < 0:
raise ValueError(f"Duration cannot be negative: {seconds}")
total = int(seconds)
hours = total // 3600
minutes = (total % 3600) // 60
secs = total % 60
parts: list[str] = []
if hours:
parts.append(f"{hours}h")
if minutes:
parts.append(f"{minutes}m")
# Always show seconds if it is the only component, or if there is a remainder
if secs or not parts:
parts.append(f"{secs}s")
return " ".join(parts)
def date_range(start: str | datetime, end: str | datetime) -> list[str]:
"""Generate a list of date strings from start to end (inclusive).
Args:
start: The first date in the range.
end: The last date in the range (inclusive).
Returns:
A list of date strings in YYYY-MM-DD format. Empty if end is before start.
Raises:
TypeError: If either argument is not a str or datetime.
TimestampParseError: If either string cannot be parsed.
"""
current = parse_timestamp(start)
end_date = parse_timestamp(end)
if current > end_date:
return []
dates: list[str] = []
while current <= end_date:
dates.append(current.strftime("%Y-%m-%d"))
current += timedelta(days=1)
return dates
# --- test_timeutils.py ---
import pytest
from datetime import datetime
# from timeutils import (
# parse_timestamp, days_between, format_duration,
# date_range, TimestampParseError,
# )
class TestParseTimestamp:
"""Tests for parse_timestamp covering valid, edge, and invalid inputs."""
def test_datetime_passthrough(self):
"""A datetime object is returned unchanged."""
dt = datetime(2026, 1, 15, 10, 30, 0)
assert parse_timestamp(dt) is dt
def test_full_timestamp_format(self):
"""Standard YYYY-MM-DD HH:MM:SS format parses correctly."""
result = parse_timestamp("2026-01-15 10:30:00")
assert result == datetime(2026, 1, 15, 10, 30, 0)
def test_date_only_format(self):
"""Date-only format defaults to midnight."""
result = parse_timestamp("2026-01-15")
assert result == datetime(2026, 1, 15, 0, 0, 0)
def test_iso_format_with_t(self):
"""ISO 8601 format with T separator is supported."""
result = parse_timestamp("2026-01-15T10:30:00")
assert result == datetime(2026, 1, 15, 10, 30, 0)
def test_iso_format_with_z(self):
"""ISO 8601 format with trailing Z is supported."""
result = parse_timestamp("2026-01-15T10:30:00Z")
assert result == datetime(2026, 1, 15, 10, 30, 0)
def test_invalid_string_raises_error(self):
"""A completely invalid string raises TimestampParseError."""
with pytest.raises(TimestampParseError) as exc_info:
parse_timestamp("not-a-date")
assert "not-a-date" in str(exc_info.value)
def test_none_raises_type_error(self):
"""None input raises TypeError, not a silent wrong result."""
with pytest.raises(TypeError, match="Expected str or datetime"):
parse_timestamp(None)
def test_integer_raises_type_error(self):
"""An integer input raises TypeError."""
with pytest.raises(TypeError):
parse_timestamp(1705312200)
def test_empty_string_raises_error(self):
"""An empty string raises TimestampParseError."""
with pytest.raises(TimestampParseError):
parse_timestamp("")
class TestDaysBetween:
"""Tests for days_between covering normal, edge, and error cases."""
def test_same_date_returns_zero(self):
assert days_between("2026-01-15", "2026-01-15") == 0
def test_one_day_apart(self):
assert days_between("2026-01-15", "2026-01-16") == 1
def test_negative_range(self):
"""End before start returns a negative number."""
assert days_between("2026-01-16", "2026-01-15") == -1
def test_mixed_formats(self):
"""Different input formats for start and end are handled."""
result = days_between("2026-01-15", "2026-01-20 12:00:00")
assert result == 5
def test_invalid_start_raises_error(self):
with pytest.raises(TimestampParseError):
days_between("garbage", "2026-01-15")
class TestFormatDuration:
"""Tests for format_duration covering boundary conditions."""
def test_zero_seconds(self):
assert format_duration(0) == "0s"
def test_seconds_only(self):
assert format_duration(45) == "45s"
def test_minutes_and_seconds(self):
assert format_duration(125) == "2m 5s"
def test_hours_minutes_seconds(self):
assert format_duration(3661) == "1h 1m 1s"
def test_exact_hours(self):
"""Exact hour boundaries should omit minutes and seconds."""
assert format_duration(7200) == "2h"
def test_float_seconds_truncated(self):
"""Float input is truncated to an integer."""
assert format_duration(3661.9) == "1h 1m 1s"
def test_negative_raises_value_error(self):
with pytest.raises(ValueError, match="negative"):
format_duration(-1)
def test_none_raises_type_error(self):
with pytest.raises(TypeError):
format_duration(None)
def test_string_raises_type_error(self):
with pytest.raises(TypeError):
format_duration("60")
class TestDateRange:
"""Tests for date_range covering normal, boundary, and edge cases."""
def test_single_day_range(self):
assert date_range("2026-01-15", "2026-01-15") == ["2026-01-15"]
def test_three_day_range(self):
result = date_range("2026-01-15", "2026-01-17")
assert result == ["2026-01-15", "2026-01-16", "2026-01-17"]
def test_reversed_range_returns_empty(self):
"""End before start returns an empty list instead of an infinite loop."""
assert date_range("2026-01-17", "2026-01-15") == []
def test_month_boundary(self):
"""Range correctly crosses a month boundary."""
result = date_range("2026-01-30", "2026-02-02")
assert result == ["2026-01-30", "2026-01-31", "2026-02-01", "2026-02-02"]
def test_leap_year(self):
"""February 29th is included in a leap year."""
result = date_range("2028-02-28", "2028-03-01")
assert "2028-02-29" in resultThe Benchmarks
| Metric | Before | After | Improvement |
|---|---|---|---|
| Input type errors caught before runtime | 0 | All (via mypy --strict) | 100% coverage |
| Edge-case bugs caught by tests | 0 (discovered in production) | 27 test cases covering boundaries | From zero to comprehensive |
| Behavior on invalid input | Silent None or infinite loop | Clear exception with message | Debuggable in seconds |
| Time to understand function contract | Read entire implementation | Read 3-line docstring | 10x faster onboarding |
| Supported timestamp formats | 2 (hardcoded) | 4 (extensible list) | Add formats without code change |
The Prompt Tip
Write a Python utility module called timeutils.py with functions for parsing timestamps, calculating days between dates, formatting durations, and generating date ranges. Then write a comprehensive pytest test suite called test_timeutils.py. Requirements: add type hints to every function parameter and return value using str | datetime union types where applicable. Write docstrings with Args, Returns, and Raises sections for every function. Define a custom TimestampParseError exception with the failed value and attempted formats in the message. Never return None for parse failures — raise the custom exception instead. Validate input types explicitly and raise TypeError for wrong types. Validate value constraints (e.g., negative durations) and raise ValueError. Define supported timestamp formats as a module-level TIMESTAMP_FORMATS list. Guard against infinite loops by returning an empty list when the date range is reversed. For the test suite: organize tests into classes per function (TestParseTimestamp, TestDaysBetween, TestFormatDuration, TestDateRange). Include boundary tests: empty strings, None inputs, negative values, zero duration, reversed ranges, month boundaries, leap years, and format edge cases. Each test should have a descriptive docstring explaining what specific behavior it verifies.