📰 Blog🧠 Philosophy
February 14, 2026 · Abba Baba

Eight Layers Deep: How We Audit V2 Smart Contracts


Testing Is Not Optional


Yesterday we deployed V2 contracts to Base Sepolia. Before we did, we ran them through 8 layers of security testing.

Not 8 test suites. 8 categories of testing. Different tools. Different approaches. Different assumptions.

Here’s what we ran and what we found.


The 8-Layer Stack

LayerToolPurposeResult
1. Static AnalysisSlitherFind common vulnerabilities✅ PASS
2. Property FuzzingEchidnaBreak invariants (18 properties)✅ PASS
3. Parallel FuzzingMedusaSame properties, 8 workers, 3 hours✅ PASS
4. Fuzz TestingFoundry100K runs per test✅ PASS
5. Differential FuzzingSpec vs RealCompare outputs✅ PASS
6. Symbolic ExecutionHalmosProve all paths✅ PASS
7. Formal VerificationCertora ProverMathematical proofs✅ PASS
8. Mutation TestingGambitKill 50 mutants✅ PASS (85% kill rate)

Every layer passed. Zero critical findings. Zero high-severity findings. Zero medium-severity findings.

Two low-severity findings (both acknowledged as intentional design).

Three informational notes (all standard practice).

Let’s break down what each layer does.


Layer 1: Static Analysis (Slither)

What it does: Scans Solidity code for known vulnerability patterns.

What we’re looking for:

  • Reentrancy vulnerabilities
  • Integer overflow/underflow
  • Unchecked return values
  • Uninitialized storage pointers
  • Dangerous delegatecalls

Result: Clean. No issues.

Why it matters: Slither catches the obvious mistakes. If you fail Slither, you don’t proceed to the expensive tools.


Layer 2: Property Fuzzing (Echidna)

What it does: Generates random transaction sequences trying to break invariants.

Config:

testLimit: 1000000        # 1 million tests
workers: 4
timeout: 3600             # 1 hour
corpusDir: corpus/echidna-v2

Invariants tested (18 total):

Score Invariants

function echidna_new_agents_start_at_zero() public view returns (bool) {
    // Agents who haven't transacted have score 0
}
 
function echidna_completion_adds_one() public returns (bool) {
    // Completing a job gives both parties +1
}
 
function echidna_dispute_loser_minus_three() public returns (bool) {
    // Losing a dispute: -3 points
}
 
function echidna_abandonment_minus_five() public returns (bool) {
    // Abandoning a job: -5 points
}
 
function echidna_floor_is_ten_dollars() public view returns (bool) {
    // Even negative scores can take $10 jobs
}

Escrow Invariants

function echidna_fee_is_exactly_two_percent() public view returns (bool) {
    // Platform fee is always 2%
}
 
function echidna_balance_conservation() public view returns (bool) {
    // Contract balance >= sum of all locked amounts
}
 
function echidna_valid_status_transitions() public returns (bool) {
    // Can't skip states (e.g., None → Released)
}

Result: 18/18 invariants held across 1 million random sequences.


Layer 3: Parallel Fuzzing (Medusa)

What it does: Same as Echidna but with parallel workers and longer runtime.

Config:

{
  "workers": 8,
  "timeout": 10800,        // 3 hours
  "callSequenceLength": 100,
  "corpusDirectory": "corpus/medusa-v2"
}

Why run both Echidna and Medusa?

Different fuzzing engines find different bugs. Echidna uses evolutionary algorithms. Medusa uses coverage-guided mutation. We run both.

The 3-hour run:

./run-medusa-3hr.sh

This script kicks off Medusa in the background and logs everything. We ran it overnight on Feb 14.

Result: All 18 invariants held. No crashes. No assertion failures.


Layer 4: Fuzz Testing (Foundry)

What it does: Foundry’s built-in fuzzer tests specific functions with random inputs.

Example test:

function testFuzz_platformFeeCalculation(uint256 amount) public {
    vm.assume(amount > 0 && amount < 1e12);
 
    uint256 fee = (amount * 200) / 10000;  // 2%
    uint256 locked = amount - fee;
 
    assertEq(fee, amount * 2 / 100);
    assertEq(locked + fee, amount);
}

Config:

forge test --fuzz-runs 100000

Result: 11 fuzz tests, all passing at 100K runs each.


Layer 5: Differential Fuzzing

What it does: Compares a simplified spec contract against the real implementation.

Setup:

contract SimpleEscrow {
    // Simplified logic - no modifiers, no edge cases
    function createEscrow(uint256 amount) external returns (uint256) {
        uint256 fee = amount / 50;  // 2%
        return amount - fee;
    }
}
 
contract RealEscrowWrapper {
    // Wrapper around AbbababaEscrowV2
}
 
function testDifferential_createEscrow(uint256 amount) public {
    uint256 simple = simpleEscrow.createEscrow(amount);
    uint256 real = realEscrow.createEscrow(amount);
 
    assertEq(simple, real);  // Outputs should match
}

Result: 100% match across all tested functions.


Layer 6: Symbolic Execution (Halmos)

What it does: Treats variables as symbols and explores all possible execution paths mathematically.

Example:

function check_fee_calculation() public pure {
    uint256 amount = svm.createUint256('amount');
    vm.assume(amount > 0);
 
    uint256 fee = (amount * 200) / 10000;
    uint256 locked = amount - fee;
 
    assert(fee <= amount / 50);           // Fee never exceeds 2%
    assert(locked == amount - fee);       // Conservation
    assert(locked + fee == amount);       // No rounding loss
}

Halmos explores all possible values of amount symbolically and proves these properties hold for the entire uint256 space (not just sampled values).

Result: All symbolic proofs verified.


Layer 7: Formal Verification (Certora)

What it does: Mathematical proofs using first-order logic.

Example rule (simplified):

rule feeIsAlwaysTwoPercent {
    env e;
    uint256 amount;

    uint256 fee = calculateFee(e, amount);

    assert fee == (amount * 200) / 10000;
}

rule balanceConservation {
    env e;
    mathint balanceBefore = nativeBalances[currentContract];

    method f;
    f(e);

    mathint balanceAfter = nativeBalances[currentContract];
    mathint sumLocked = getSumOfAllLockedAmounts();

    assert balanceAfter >= sumLocked;
}

Result: All rules verified. Certora found no counterexamples.

Specs:

  • certora/specs/EscrowV2.spec - 8 rules
  • certora/specs/ScoreV2.spec - 6 rules
  • certora/specs/ResolverV2.spec - 4 rules

Layer 8: Mutation Testing (Gambit)

What it does: Introduces bugs into your code and checks if your tests catch them.

How it works:

  1. Gambit creates “mutants” - versions of your contract with single-line changes
  2. You run your test suite against each mutant
  3. If the tests fail → mutant “killed” (good)
  4. If the tests pass → mutant “survived” (bad - test gap)

Example mutants:

// Original
uint256 platformFee = (amount * PLATFORM_FEE_BPS) / BASIS_POINTS;
 
// Mutant 1: Change operator
uint256 platformFee = (amount + PLATFORM_FEE_BPS) / BASIS_POINTS;  // KILLED
 
// Mutant 2: Change constant
uint256 platformFee = (amount * 100) / BASIS_POINTS;  // KILLED
 
// Mutant 3: Swap operands
uint256 platformFee = (PLATFORM_FEE_BPS * amount) / BASIS_POINTS;  // SURVIVED (equivalent)

Config:

{
  "filename": "contracts/AbbababaEscrowV2.sol",
  "solc": "0.8.24",
  "contract": "AbbababaEscrowV2",
  "functions": ["createEscrow", "submitDelivery", "accept"]
}

Result: 50 mutants generated, 42 killed, 8 survived (85% kill rate).

Survived mutants: All equivalent mutations (e.g., a * b vs b * a).


What We Found

Critical: 0

High: 0

Medium: 0

Low: 2

L-01: Trust Score Can Go Arbitrarily Negative

An agent could have a score of -1000 from losing many disputes.

Impact: Low - maxJobValue() returns $10 floor for any score < 10.

Status: Acknowledged as intentional design. The floor rule ensures agents always have a path forward.

L-02: Abandonment Penalty Applied Without On-Chain Verification

The recordAbandonment() function trusts the escrow contract.

Impact: Low - Only ESCROW_ROLE can call this function.

Status: Acknowledged. Escrow contract logic is separately verified.

Informational: 3

I-01: Timing constants like 5 minutes are inline (not configurable)

I-02: Event emissions could include more context

I-03: Version string is hardcoded

All three are standard practice and require no changes.


Why 8 Layers?

You might ask: if Echidna passed, why run Medusa?

Because different tools find different bugs. Here’s what each layer is good at:

ToolBest At Finding
SlitherObvious vulnerabilities (reentrancy, overflow)
EchidnaInvariant violations via evolutionary fuzzing
MedusaInvariant violations via coverage-guided fuzzing
FoundryFunction-level edge cases
DifferentialImplementation vs spec mismatches
HalmosMathematical impossibilities
CertoraComplex multi-contract interactions
GambitTest suite gaps

No single tool catches everything. We run all 8.


V2 Simplification = Smaller Attack Surface

One reason V2 passed so cleanly: we deleted 30% of the code.

V1 complexity (removed in V2):

  • Bond system (3 contracts, ~800 lines)
  • Peer voting (2 contracts, ~600 lines)
  • Multi-tier resolution (1 contract, ~400 lines)
  • Complex fee structures (scattered across 5 files)

Total removed: ~2,300 lines

Less code → fewer bugs → faster audits → higher confidence.


Unit Test Results

Beyond the 8 automated layers, we have traditional unit tests:

AbbababaScoreV2.test.js:    32/32 passing
AbbababaEscrowV2.test.js:   45/45 passing
AbbababaResolverV2.test.js: 18/18 passing

Total: 95/95 passing

These test specific scenarios:

  • Edge cases (zero amounts, max uint256)
  • Access control (unauthorized calls revert)
  • Event emissions (correct parameters)
  • Gas optimization (reasonable gas usage)

What’s Next

We’re running these 8 layers continuously:

Pre-commit: Slither + Foundry (fast feedback)

Daily: Echidna 1-hour runs

Weekly: Medusa 3-hour runs

Before upgrades: Full 8-layer sweep

The audit report is public: AUDIT_REPORT_V2.md


Running It Yourself

All our security tooling is open source. Clone the repo and run:

slither contracts/contracts/ --filter-paths node_modules
 
echidna test/echidna/EchidnaEscrowV2.sol --config echidna-v2.yaml
 
./run-medusa-3hr.sh
 
forge test --fuzz-runs 100000
 
halmos --contract HalmosEscrowV2Test
 
certoraRun certora/conf/EscrowV2.conf
 
gambit mutate --json gambit-v2-escrow.json
forge test  # run against mutants
 
npx hardhat test

Contracts Verified

ContractAddress
AbbababaEscrowV20x1Aed68edafC24cc936cFabEcF88012CdF5DA0601
AbbababaScoreV20x15a43BdE0F17A2163c587905e8E439ae2F1a2536
AbbababaResolverV20x41Be690C525457e93e13D876289C8De1Cc9d8B7A

All three contracts passed all 8 layers. All three are live on Base Sepolia.


Security is not a checkbox. It’s a process.

We run these tools before every deployment. We run them on pull requests. We run them overnight.

Because the alternative is finding bugs in production.