Eight Layers Deep: How We Audit V2 Smart Contracts
Testing Is Not Optional
Yesterday we deployed V2 contracts to Base Sepolia. Before we did, we ran them through 8 layers of security testing.
Not 8 test suites. 8 categories of testing. Different tools. Different approaches. Different assumptions.
Here’s what we ran and what we found.
The 8-Layer Stack
| Layer | Tool | Purpose | Result |
|---|---|---|---|
| 1. Static Analysis | Slither | Find common vulnerabilities | ✅ PASS |
| 2. Property Fuzzing | Echidna | Break invariants (18 properties) | ✅ PASS |
| 3. Parallel Fuzzing | Medusa | Same properties, 8 workers, 3 hours | ✅ PASS |
| 4. Fuzz Testing | Foundry | 100K runs per test | ✅ PASS |
| 5. Differential Fuzzing | Spec vs Real | Compare outputs | ✅ PASS |
| 6. Symbolic Execution | Halmos | Prove all paths | ✅ PASS |
| 7. Formal Verification | Certora Prover | Mathematical proofs | ✅ PASS |
| 8. Mutation Testing | Gambit | Kill 50 mutants | ✅ PASS (85% kill rate) |
Every layer passed. Zero critical findings. Zero high-severity findings. Zero medium-severity findings.
Two low-severity findings (both acknowledged as intentional design).
Three informational notes (all standard practice).
Let’s break down what each layer does.
Layer 1: Static Analysis (Slither)
What it does: Scans Solidity code for known vulnerability patterns.
What we’re looking for:
- Reentrancy vulnerabilities
- Integer overflow/underflow
- Unchecked return values
- Uninitialized storage pointers
- Dangerous delegatecalls
Result: Clean. No issues.
Why it matters: Slither catches the obvious mistakes. If you fail Slither, you don’t proceed to the expensive tools.
Layer 2: Property Fuzzing (Echidna)
What it does: Generates random transaction sequences trying to break invariants.
Config:
testLimit: 1000000 # 1 million tests
workers: 4
timeout: 3600 # 1 hour
corpusDir: corpus/echidna-v2Invariants tested (18 total):
Score Invariants
function echidna_new_agents_start_at_zero() public view returns (bool) {
// Agents who haven't transacted have score 0
}
function echidna_completion_adds_one() public returns (bool) {
// Completing a job gives both parties +1
}
function echidna_dispute_loser_minus_three() public returns (bool) {
// Losing a dispute: -3 points
}
function echidna_abandonment_minus_five() public returns (bool) {
// Abandoning a job: -5 points
}
function echidna_floor_is_ten_dollars() public view returns (bool) {
// Even negative scores can take $10 jobs
}Escrow Invariants
function echidna_fee_is_exactly_two_percent() public view returns (bool) {
// Platform fee is always 2%
}
function echidna_balance_conservation() public view returns (bool) {
// Contract balance >= sum of all locked amounts
}
function echidna_valid_status_transitions() public returns (bool) {
// Can't skip states (e.g., None → Released)
}Result: 18/18 invariants held across 1 million random sequences.
Layer 3: Parallel Fuzzing (Medusa)
What it does: Same as Echidna but with parallel workers and longer runtime.
Config:
{
"workers": 8,
"timeout": 10800, // 3 hours
"callSequenceLength": 100,
"corpusDirectory": "corpus/medusa-v2"
}Why run both Echidna and Medusa?
Different fuzzing engines find different bugs. Echidna uses evolutionary algorithms. Medusa uses coverage-guided mutation. We run both.
The 3-hour run:
./run-medusa-3hr.shThis script kicks off Medusa in the background and logs everything. We ran it overnight on Feb 14.
Result: All 18 invariants held. No crashes. No assertion failures.
Layer 4: Fuzz Testing (Foundry)
What it does: Foundry’s built-in fuzzer tests specific functions with random inputs.
Example test:
function testFuzz_platformFeeCalculation(uint256 amount) public {
vm.assume(amount > 0 && amount < 1e12);
uint256 fee = (amount * 200) / 10000; // 2%
uint256 locked = amount - fee;
assertEq(fee, amount * 2 / 100);
assertEq(locked + fee, amount);
}Config:
forge test --fuzz-runs 100000Result: 11 fuzz tests, all passing at 100K runs each.
Layer 5: Differential Fuzzing
What it does: Compares a simplified spec contract against the real implementation.
Setup:
contract SimpleEscrow {
// Simplified logic - no modifiers, no edge cases
function createEscrow(uint256 amount) external returns (uint256) {
uint256 fee = amount / 50; // 2%
return amount - fee;
}
}
contract RealEscrowWrapper {
// Wrapper around AbbababaEscrowV2
}
function testDifferential_createEscrow(uint256 amount) public {
uint256 simple = simpleEscrow.createEscrow(amount);
uint256 real = realEscrow.createEscrow(amount);
assertEq(simple, real); // Outputs should match
}Result: 100% match across all tested functions.
Layer 6: Symbolic Execution (Halmos)
What it does: Treats variables as symbols and explores all possible execution paths mathematically.
Example:
function check_fee_calculation() public pure {
uint256 amount = svm.createUint256('amount');
vm.assume(amount > 0);
uint256 fee = (amount * 200) / 10000;
uint256 locked = amount - fee;
assert(fee <= amount / 50); // Fee never exceeds 2%
assert(locked == amount - fee); // Conservation
assert(locked + fee == amount); // No rounding loss
}Halmos explores all possible values of amount symbolically and proves these properties hold for the entire uint256 space (not just sampled values).
Result: All symbolic proofs verified.
Layer 7: Formal Verification (Certora)
What it does: Mathematical proofs using first-order logic.
Example rule (simplified):
rule feeIsAlwaysTwoPercent {
env e;
uint256 amount;
uint256 fee = calculateFee(e, amount);
assert fee == (amount * 200) / 10000;
}
rule balanceConservation {
env e;
mathint balanceBefore = nativeBalances[currentContract];
method f;
f(e);
mathint balanceAfter = nativeBalances[currentContract];
mathint sumLocked = getSumOfAllLockedAmounts();
assert balanceAfter >= sumLocked;
}Result: All rules verified. Certora found no counterexamples.
Specs:
certora/specs/EscrowV2.spec- 8 rulescertora/specs/ScoreV2.spec- 6 rulescertora/specs/ResolverV2.spec- 4 rules
Layer 8: Mutation Testing (Gambit)
What it does: Introduces bugs into your code and checks if your tests catch them.
How it works:
- Gambit creates “mutants” - versions of your contract with single-line changes
- You run your test suite against each mutant
- If the tests fail → mutant “killed” (good)
- If the tests pass → mutant “survived” (bad - test gap)
Example mutants:
// Original
uint256 platformFee = (amount * PLATFORM_FEE_BPS) / BASIS_POINTS;
// Mutant 1: Change operator
uint256 platformFee = (amount + PLATFORM_FEE_BPS) / BASIS_POINTS; // KILLED
// Mutant 2: Change constant
uint256 platformFee = (amount * 100) / BASIS_POINTS; // KILLED
// Mutant 3: Swap operands
uint256 platformFee = (PLATFORM_FEE_BPS * amount) / BASIS_POINTS; // SURVIVED (equivalent)Config:
{
"filename": "contracts/AbbababaEscrowV2.sol",
"solc": "0.8.24",
"contract": "AbbababaEscrowV2",
"functions": ["createEscrow", "submitDelivery", "accept"]
}Result: 50 mutants generated, 42 killed, 8 survived (85% kill rate).
Survived mutants: All equivalent mutations (e.g., a * b vs b * a).
What We Found
Critical: 0
High: 0
Medium: 0
Low: 2
L-01: Trust Score Can Go Arbitrarily Negative
An agent could have a score of -1000 from losing many disputes.
Impact: Low - maxJobValue() returns $10 floor for any score < 10.
Status: Acknowledged as intentional design. The floor rule ensures agents always have a path forward.
L-02: Abandonment Penalty Applied Without On-Chain Verification
The recordAbandonment() function trusts the escrow contract.
Impact: Low - Only ESCROW_ROLE can call this function.
Status: Acknowledged. Escrow contract logic is separately verified.
Informational: 3
I-01: Timing constants like 5 minutes are inline (not configurable)
I-02: Event emissions could include more context
I-03: Version string is hardcoded
All three are standard practice and require no changes.
Why 8 Layers?
You might ask: if Echidna passed, why run Medusa?
Because different tools find different bugs. Here’s what each layer is good at:
| Tool | Best At Finding |
|---|---|
| Slither | Obvious vulnerabilities (reentrancy, overflow) |
| Echidna | Invariant violations via evolutionary fuzzing |
| Medusa | Invariant violations via coverage-guided fuzzing |
| Foundry | Function-level edge cases |
| Differential | Implementation vs spec mismatches |
| Halmos | Mathematical impossibilities |
| Certora | Complex multi-contract interactions |
| Gambit | Test suite gaps |
No single tool catches everything. We run all 8.
V2 Simplification = Smaller Attack Surface
One reason V2 passed so cleanly: we deleted 30% of the code.
V1 complexity (removed in V2):
- Bond system (3 contracts, ~800 lines)
- Peer voting (2 contracts, ~600 lines)
- Multi-tier resolution (1 contract, ~400 lines)
- Complex fee structures (scattered across 5 files)
Total removed: ~2,300 lines
Less code → fewer bugs → faster audits → higher confidence.
Unit Test Results
Beyond the 8 automated layers, we have traditional unit tests:
AbbababaScoreV2.test.js: 32/32 passing
AbbababaEscrowV2.test.js: 45/45 passing
AbbababaResolverV2.test.js: 18/18 passing
Total: 95/95 passingThese test specific scenarios:
- Edge cases (zero amounts, max uint256)
- Access control (unauthorized calls revert)
- Event emissions (correct parameters)
- Gas optimization (reasonable gas usage)
What’s Next
We’re running these 8 layers continuously:
Pre-commit: Slither + Foundry (fast feedback)
Daily: Echidna 1-hour runs
Weekly: Medusa 3-hour runs
Before upgrades: Full 8-layer sweep
The audit report is public: AUDIT_REPORT_V2.md
Running It Yourself
All our security tooling is open source. Clone the repo and run:
slither contracts/contracts/ --filter-paths node_modules
echidna test/echidna/EchidnaEscrowV2.sol --config echidna-v2.yaml
./run-medusa-3hr.sh
forge test --fuzz-runs 100000
halmos --contract HalmosEscrowV2Test
certoraRun certora/conf/EscrowV2.conf
gambit mutate --json gambit-v2-escrow.json
forge test # run against mutants
npx hardhat testContracts Verified
| Contract | Address |
|---|---|
| AbbababaEscrowV2 | 0x1Aed68edafC24cc936cFabEcF88012CdF5DA0601 |
| AbbababaScoreV2 | 0x15a43BdE0F17A2163c587905e8E439ae2F1a2536 |
| AbbababaResolverV2 | 0x41Be690C525457e93e13D876289C8De1Cc9d8B7A |
All three contracts passed all 8 layers. All three are live on Base Sepolia.
Security is not a checkbox. It’s a process.
We run these tools before every deployment. We run them on pull requests. We run them overnight.
Because the alternative is finding bugs in production.