@hackage tasty-cache 0.1.0.0

Tasty ingredient that skips unchanged tests using GHC HIE files

Note: This project was built almost entirely with AI; see How this was built for the prompts.

A human did, however, read over the readme and find it acceptable.

tasty-cache

A Tasty ingredient that skips tests whose source hasn't changed since the last passing run, using GHC HIE files for fine-grained dependency tracking.

Quick start

1. Emit HIE files — add to both your library and test-suite stanzas in your .cabal file:

library
  -- ... your other fields ...
  ghc-options: -fwrite-ide-info -hiedir .hie

test-suite tests
  -- ... your other fields ...
  ghc-options: -fwrite-ide-info -hiedir .hie

Both stanzas need these flags because the cache reads the HIE files for your library modules (to follow dependency chains) and your test module (to find the test body source).

2. Add to .gitignore:

.hie/
.cache/

3. Replace defaultMain and wrap the test groups you want cached:

import Test.Tasty          (TestTree, testGroup)
import Test.Tasty.HUnit    (testCase, (@?=))
import Test.Tasty.HieCache (defaultMainWithHieCache, cacheable)

main :: IO ()
main = defaultMainWithHieCache tests

tests :: TestTree
tests = testGroup "all"
  [ cacheable $ testGroup "pure unit tests"
      [ testCase "add 1 2 == 3" $ add 1 2 @?= 3
      , testCase "factorial 5"  $ factorial 5 @?= 120
      ]
  , testGroup "integration tests"   -- no cacheable → always runs
      [ testCase "..." $ ...
      ]
  ]

cacheable works on any TestTree: testGroup, testCase, testProperty (QuickCheck), testSpec (Hspec), or any other Tasty provider. Wrap at whatever granularity makes sense.

Only tests wrapped with cacheable are ever skipped. Unwrapped tests run unconditionally on every invocation, so tests with side effects, network access, or flaky behaviour should simply be left unwrapped.

Why is caching opt-in? Integration tests, database tests, and other effectful tests should always run — their correctness depends on external state, not just source bytes. cacheable is a deliberate signal that a test is pure and repeatable.

Requires GHC >= 9.4 (tested on 9.4, 9.6, 9.8, 9.10, 9.12, 9.14).

What if I forget the flags?

If the .hie directory doesn't exist, the ingredient logs a warning and runs all tests normally — no crash, no silent skipping. You'll see:

HieCache: no .hie directory, running all tests

If fingerprinting fails for any other reason (unreadable HIE file, parse error), the ingredient falls back to running all tests and logs the error to stderr.

Nix

This repo is a flake that exposes tasty-cache as a nixpkgs-idiomatic Haskell package, derived directly from the cabal file, built and tested against every supported GHC version (9.4, 9.6, 9.8, 9.10, 9.12, 9.14).

Consume from another flake

Add tasty-cache as an input and apply its overlay; the library is injected into every supported pkgs.haskell.packages.ghc<v> set, so you can pick whichever GHC your project targets and pull tasty-cache in with ghcWithPackages like any other Haskell dependency:

{
  inputs = {
    nixpkgs.url     = "github:NixOS/nixpkgs/nixos-unstable";
    tasty-cache.url = "github:silky/tasty-cache";
  };

  outputs = { self, nixpkgs, tasty-cache }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        overlays = [ tasty-cache.overlays.default ];
      };
      ghc = pkgs.haskell.packages.ghc910;   # or ghc94/96/98/912/914
    in
    {
      packages.${system}.example =
        ghc.ghcWithPackages (p: [ p.tasty-cache ]);
    };
}

The same overlay also lets your own callCabal2nix / callPackage-based Haskell builds depend on tasty-cache by name.

Build and develop locally

nix build              # build the library against the default GHC
nix develop            # dev shell with cabal-install, hiedb, and every
                       # Haskell dep needed for the library + test-suite
nix flake check        # treefmt + build & run the test-suite on every
                       # supported GHC version (the full test matrix)
nix fmt                # format Nix and Haskell sources

Inside nix develop:

cabal build
cabal test

flake.nix exposes per-GHC outputs:

nix build .#tasty-cache-ghc94    # or -ghc96, -ghc98, -ghc910, -ghc912, -ghc914
nix build .#checks.x86_64-linux.tasty-cache-ghc910-tests

The package shipped via the overlay is wrapped with dontCheck so consumers don't pay the test cost transitively; coverage is preserved by the per-version tasty-cache-ghc<v>-tests derivations under checks.

How it works

GHC can emit HIE (Haskell Interface Extended) files — binary files containing the full typed AST of each compiled module, including the source bytes and a record of every identifier's definition site and every use site. tasty-cache reads these files to compute a fingerprint for each cacheable test:

fingerprint = hash(body_hash, dep_hash, cabal_hash)

body_hash   = hash of the testCase expression's source bytes
dep_hash    = hash of the source bytes of every declaration transitively
              reachable from the test body via the HIE identifier graph
cabal_hash  = hash of all .cabal files in the project root

The transitive dependency set is computed by BFS over the HIE identifier graph, starting from the names used in the test body and following Use references through every reachable declaration in every library module.
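The traversal described above can be sketched on a toy graph. This is a minimal illustration, not the library's code: DepGraph, reachable, and diamond are hypothetical names, and a plain Map of strings stands in for the HIE identifier graph.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-in for the HIE identifier graph:
-- each declaration name maps to the names its body uses.
type DepGraph = Map.Map String [String]

-- Breadth-first closure of everything reachable from the seed names.
-- The seen set guarantees termination on cycles such as mutual recursion.
reachable :: DepGraph -> [String] -> Set.Set String
reachable graph seeds = go (Set.fromList seeds) seeds
  where
    go seen []       = seen
    go seen frontier =
      let next  = concatMap (\n -> Map.findWithDefault [] n graph) frontier
          fresh = Set.toList (Set.fromList next `Set.difference` seen)
      in  go (Set.union seen (Set.fromList fresh)) fresh

-- The diamond scenario: combined -> partA/partB -> base.
diamond :: DepGraph
diamond = Map.fromList
  [ ("combined",  ["partA", "partB"])
  , ("partA",     ["base"])
  , ("partB",     ["base"])
  , ("base",      [])
  , ("unrelated", [])
  ]
```

Here reachable diamond ["combined"] contains combined, partA, partB, and base but not unrelated, which is why an edit to base would invalidate a test that uses combined while an edit to unrelated would not.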

On each run the ingredient compares fingerprints against a cache (.cache/hie-tasty-cache). Tests whose fingerprint is unchanged are replaced with an instant-pass placeholder (OK (cached)); only stale tests execute. The cache is updated per-test as each passing test completes, so a partially-failing run still advances the cache for the tests that passed.
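A rough sketch of the fingerprint-and-compare step, under stated assumptions: the real hash function and cache format are not documented here, so an FNV-1a-style hash over characters is used purely as a dependency-free stand-in, and fnv1a, fingerprint, and splitByCache are hypothetical names.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', partition)
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

type Fingerprint = Word64

-- FNV-1a-style hash over code points; a stand-in, not the library's hash.
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- fingerprint = hash(body_hash, dep_hash, cabal_hash)
fingerprint :: String -> String -> String -> Fingerprint
fingerprint body deps cabal =
  fnv1a (unwords (map (show . fnv1a) [body, deps, cabal]))

-- Split tests into (cached, stale) by comparing current fingerprints
-- against the stored cache. A test with no stored entry is stale.
splitByCache :: Map.Map String Fingerprint   -- stored cache
             -> [(String, Fingerprint)]      -- current fingerprints
             -> ([String], [String])
splitByCache cache current = (map fst hits, map fst misses)
  where
    (hits, misses)   = partition isHit current
    isHit (name, fp) = Map.lookup name cache == Just fp
```

In this sketch only the names in the second component would execute; the names in the first would be replaced by the OK (cached) placeholder.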

Output

First run — cache is empty, all tests execute:

scenarios
  Lib (basic direct dependency)
    add 1 2 == 3:       OK
    add 0 0 == 0:       OK
    factorial 5 == 120: OK
  Parity (mutual recursion — always runs)
    isEven 0:           OK
    ...
All 45 tests passed (0.02s)

Second run — nothing changed; cacheable groups are served from cache, unwrapped groups run again:

HieCache: skipping 20 cached test(s)
scenarios
  Lib (basic direct dependency)
    add 1 2 == 3:       OK (cached)
    add 0 0 == 0:       OK (cached)
    factorial 5 == 120: OK (cached)
  Parity (mutual recursion — always runs)
    isEven 0:           OK
    isEven 4:           OK
    ...
  Expr (GADT)
    eval Lit:           OK (cached)
    ...
  Diamond (transitive deps)
    base 5 == 6:        OK (cached)
    ...
  Arithmetic (Template Haskell — always runs)
    add5 3 == 8:        OK
    ...
All 45 tests passed (0.00s)

After editing factorial — only the factorial test re-runs within the cacheable group; add tests remain cached:

HieCache: skipping 19 cached test(s)
scenarios
  Lib (basic direct dependency)
    add 1 2 == 3:       OK (cached)
    add 0 0 == 0:       OK (cached)
    factorial 5 == 120: OK

What gets invalidated

The dep hash covers the transitive closure of the HIE identifier graph:

| Change | Tests that re-run |
| --- | --- |
| Edit factorial | Tests that call factorial (directly or transitively) |
| Edit add | Tests that call add; factorial tests are unaffected |
| Edit base in a diamond dependency | All tests depending on base, partA, partB, combined |
| Edit isEven | isEven tests and isOdd tests (since isOdd calls isEven) |
| Edit a TH template body (adderExpr) | Tests using that splice (add5, add10) but not others (timesBy3) |
| Change #define SCALE_FACTOR | All tests in that CPP module (whole-file hashed) |
| Add/remove a {-# LANGUAGE #-} pragma | All tests that transitively depend on that module |
| Edit a .cabal file | All cacheable tests (cabal hash covers default-extensions etc.) |
| Edit an unrelated function | Nothing; those tests stay cached |
| Any change to a non-cacheable test | That test always runs anyway |

Disabling the cache

To force every test to run — ignoring all cached results and cacheable labels — pass --disable-tasty-cache:

cabal test --test-options="--disable-tasty-cache"

You'll see:

HieCache: caching disabled, running all tests

This is useful for CI jobs that must run the full suite, or when you suspect a stale cache and want a clean baseline without deleting .cache/hie-tasty-cache.

Advanced usage

If you are composing Tasty ingredients manually (e.g. alongside tasty-rerun or a custom reporter), use hieCacheIngredient directly instead of defaultMainWithHieCache:

import Test.Tasty.HieCache (hieCacheIngredient)
import Test.Tasty          (defaultIngredients, defaultMainWithIngredients)

main :: IO ()
main = defaultMainWithIngredients myIngredients tests
  where
    myIngredients =
      hieCacheIngredient defaultIngredients
        : defaultIngredients
        ++ [myCustomIngredient]

hieCacheIngredient takes the list of sub-ingredients it should delegate actual test execution to. Pass your full ingredient list so that all normal Tasty behaviour (parallel execution, filtering with -p, XML output, etc.) continues to work.

Caveats for real-world projects

occName collision (the most common false positive)

Dependencies are matched by occurrence name — the bare string "show", "==", "compare", "fmap" — rather than by GHC's fully-qualified Name unique. This means every module that defines a binding with the same short name contributes to the dep map, and the BFS follows all of them.

In practice: any project using deriving Show, Eq, Ord, or Functor will see over-broad invalidations. Adding deriving Show to a new type anywhere in the project causes the BFS to follow "show" into that module too, and tests that transitively call show on any type will be re-run unnecessarily.

The fingerprints are still correct (no false negatives — a test never wrongly stays cached), but the cache hit rate may be lower than expected in projects with many derived instances.
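The effect of keying on the bare occurrence name can be seen in a small sketch. Everything here is hypothetical (Decl, decls, byOccName, byQualified); it only contrasts a map keyed by short name with one keyed by module and name, and is not how the library or GHC represents names.

```haskell
import qualified Data.Map.Strict as Map

-- (module, short name, source text); hypothetical declarations.
type Decl = (String, String, String)

decls :: [Decl]
decls =
  [ ("Shape",  "show", "instance Show Shape")
  , ("Colour", "show", "instance Show Colour")
  , ("Lib",    "add",  "add x y = x + y")
  ]

-- Keyed by bare occurrence name: same-named bindings from
-- different modules are merged under one key.
byOccName :: [Decl] -> Map.Map String [String]
byOccName ds =
  Map.fromListWith (flip (++)) [ (n, [src]) | (_, n, src) <- ds ]

-- Keyed by (module, name): each binding keeps its own entry.
byQualified :: [Decl] -> Map.Map (String, String) [String]
byQualified ds =
  Map.fromListWith (flip (++)) [ ((m, n), [src]) | (m, n, src) <- ds ]
```

With the first map, a test that uses show on a Shape also picks up the source of the Colour instance, so editing Colour changes that test's dep hash. That is an unnecessary re-run, never a missed one.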

HLS interaction

HLS (Haskell Language Server) also writes HIE files to the .hie directory. HIE files are deterministic for a given source file and set of flags, so in normal usage HLS and cabal test produce identical files and there is no conflict.

However, if HLS is configured with different ghc-options than the test-suite stanza (e.g. HLS omits -O, or uses a different set of language extensions via haskell-language-server.json), the HIE files written by HLS may differ from those produced by cabal test, causing fingerprints to be computed against stale AST data. If you observe unexpected cache misses or hits, check that both HLS and cabal test use the same flags.

Parallel test runs

tasty-cache updates the in-memory cache with modifyIORef' as each test passes, then writes it to disk once at the end. If Tasty runs tests in parallel (the default), two tests passing concurrently will race on the IORef — each reads the current map, adds its key, and writes back, and one write can overwrite the other's entry. The affected tests will simply re-run on the next invocation rather than being cached. The cache is never wrong as a result, only incomplete.

Separately, running two cabal test processes concurrently (e.g. in a CI matrix) will race on the cache file on disk; the last write wins.
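For the in-process race on the IORef, one standard way to avoid lost updates is atomicModifyIORef' from Data.IORef, which performs the read-modify-write as a single atomic step. The sketch below is illustrative only and is not taken from the library; Cache, recordPass, and recordConcurrently are hypothetical names. It does nothing about the separate on-disk race between two processes.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

type Cache = Map.Map String Int

-- Record a passing test; the update cannot interleave with another writer.
recordPass :: IORef Cache -> String -> Int -> IO ()
recordPass ref name fp =
  atomicModifyIORef' ref (\m -> (Map.insert name fp m, ()))

-- Record n passes from n threads, wait for all of them,
-- and return the final number of cache entries.
recordConcurrently :: Int -> IO Int
recordConcurrently n = do
  ref   <- newIORef Map.empty
  dones <- forM [1 .. n] $ \i -> do
    done <- newEmptyMVar
    _ <- forkIO (recordPass ref ("test" ++ show i) i >> putMVar done ())
    pure done
  forM_ dones takeMVar
  Map.size <$> readIORef ref
```

Because every insert is atomic, recordConcurrently n ends with exactly n entries regardless of how the threads are scheduled.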

GHC version compatibility

Tested on GHC 9.4, 9.6, 9.8, 9.10, 9.12, and 9.14; the full nix-built matrix is exercised by nix flake check (see Nix above). The implementation imports GHC.Iface.Ext.* and GHC.Types.*, which are internal GHC APIs with no stability guarantee, but in practice the specific symbols used here have been stable across the entire 9.4 → 9.14 range. The last breaking rename in this surface area was HieTypes.nodeInfo → GHC.Iface.Ext.Types.sourcedNodeInfo between GHC 8.10 and 9.0; a similar rename in a future release would break things again.

Cache location

.cache/hie-tasty-cache — a plain-text file. Safe to delete at any time; deleting it causes all cacheable tests to run on the next invocation.

Test scenarios

The bundled test suite (test/Main.hs) contains 45 tests across 7 modules, demonstrating the range of dependency patterns the cache handles:

| Module | cacheable? | What it demonstrates |
| --- | --- | --- |
| Lib | yes | Basic direct dep: editing factorial doesn't invalidate add tests |
| Parity | no | Always runs; mutual recursion: isEven/isOdd call each other |
| Expr | yes | GADT: eval and pretty are independent; editing one doesn't invalidate the other |
| Diamond | yes | Diamond deps: combined → partA/partB → base; editing base invalidates all four |
| Arithmetic | no | Always runs; Template Haskell: splice dependency tracking |
| CPPDemo | no | Always runs; CPP #define changes caught via whole-file hashing |
| FalseNegatives | no | Demonstrates false-negative scenarios in the caching logic (see below) |

Known limitations

False negatives (tests skip when they should run)

The FalseNegatives test module (test/FalseNegatives.hs) contains unit tests that demonstrate each of the scenarios below. Run cabal test to see them.

Missing fingerprint treated as cached (fixed). Previously, if a test's name could not be located in the HIE source (dynamically constructed names, unusual formatting, or leafMap collision — see below), its fingerprint was absent. Since an absent fingerprint compared equal to an absent cache entry (Nothing /= Nothing is False), the test was treated as cached and never ran — not even on the very first invocation. This has been fixed: tests with no computable fingerprint are now always treated as stale and run unconditionally.
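The difference between the old and the fixed comparison can be written out in a few lines. This is a sketch of the logic as described, with hypothetical names (isStaleNaive, isStale) and Int standing in for the real fingerprint type.

```haskell
type Fingerprint = Int

-- Pre-fix behaviour as described: plain inequality.
-- Nothing /= Nothing is False, so a test with no fingerprint
-- and no cache entry is treated as cached.
isStaleNaive :: Maybe Fingerprint -> Maybe Fingerprint -> Bool
isStaleNaive current cached = current /= cached

-- Post-fix behaviour as described: a missing fingerprint is always stale.
isStale :: Maybe Fingerprint -> Maybe Fingerprint -> Bool
isStale Nothing   _      = True
isStale (Just fp) cached = cached /= Just fp
```

The two functions agree whenever a fingerprint could be computed; they differ only in the Nothing case, which is exactly the case the fix targets.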

findExprEnd stops at blank lines. The indentation heuristic that determines where a testCase expression ends treats a blank line as a terminator. A multi-line do-block test with an internal blank line will have its body hash computed only up to that blank line; edits after it are invisible to the cache. This also affects dependency tracking in library functions: if a function definition contains a blank line, identifiers used after it may not be followed by the BFS, so changes to those transitive dependencies can go undetected.
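To make the truncation concrete, here is a toy indentation heuristic with the same blank-line behaviour. It is a hypothetical reimplementation for illustration (exprLines, indentOf, and sample are invented names), not the source of findExprEnd.

```haskell
import Data.Char (isSpace)

-- Number of leading spaces on a line.
indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

-- Take the first line plus every following line indented deeper than it,
-- stopping at the first blank line.
exprLines :: [String] -> [String]
exprLines []       = []
exprLines (l : ls) = l : takeWhile continues ls
  where
    continues x = not (all isSpace x) && indentOf x > indentOf l

-- A test body with an internal blank line.
sample :: [String]
sample =
  [ "  , testCase \"round trip\" $ do"
  , "      let x = encode 42"
  , ""
  , "      decode x @?= 42"
  , "  , testCase \"next\" $ pure ()"
  ]
```

exprLines sample returns only the first two lines, so an edit to the decode assertion after the blank line would not change the body hash in this sketch.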

Top-level helpers in the test module are not tracked. The entire test module is excluded from the BFS to avoid including test bodies as their own dependencies. If Main.hs defines a helper used by tests, changing it does not invalidate those tests.

Multi-line pragmas only partially captured. The pragma-line detector matches lines beginning with {-#. A pragma written across multiple lines has its continuation lines omitted from the hash.
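A line-based detector of this kind can be sketched as follows. The names pragmaLines and multiLine are hypothetical, and the real detector may differ in detail; the point is only that a per-line prefix match cannot see continuation lines.

```haskell
import Data.List (isPrefixOf)

-- Keep only the lines that begin with the pragma opener.
pragmaLines :: String -> [String]
pragmaLines = filter ("{-#" `isPrefixOf`) . lines

-- A LANGUAGE pragma written across two lines.
multiLine :: String
multiLine = unlines
  [ "{-# LANGUAGE GADTs"
  , "           , TemplateHaskell #-}"
  , "module Demo where"
  ]
```

pragmaLines multiLine yields just the first line, so adding or removing TemplateHaskell on the continuation line would leave the hashed pragma text unchanged.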

False positives (tests run when they don't need to)

occName collision across modules — see Caveats for real-world projects above.

GeneratedInfo nodes included. The HIE SourcedNodeInfo structure distinguishes user-written source (SourceInfo) from generated code (GeneratedInfo; derived instances, TH splices). The implementation currently treats both identically, so generated bindings pollute the dep map and may cause unnecessary invalidations.

GHC internals coupling

Internal GHC API. The implementation imports GHC.Iface.Ext.* and GHC.Types.*, which are not stable public APIs. The specific surface used here broke between GHC 8.10 and 9.0 (nodeInfo → sourcedNodeInfo, plus the move from HieTypes to GHC.Iface.Ext.Types) and again at 9.2 → 9.4, which affected initNameCache and readHieFile. Within 9.4+ the surface has been stable, but a future release could break it again.

hie_hs_src vs post-CPP spans. For CPP modules, hie_hs_src stores the raw pre-CPP source while HIE AST spans refer to the post-CPP source. With #if/#ifdef blocks the line numbers can diverge. The current whole-file hashing sidesteps this for simple #define cases only.

ValBind node span is undocumented. The code assumes the HIE node carrying a ValBind identifier has a span covering the full equation. This is true in GHC 9.8 but is an implementation detail with no documented guarantee.

Architecture

Duplicate test names. Two tests in different groups with the same leaf name collide in leafMap; only one fingerprint is computed. Since the staleness fix, the other test now runs unconditionally on every invocation (rather than being silently cached forever), but it never benefits from caching.

String-search test location. A test's source position is found by searching for its quoted name in hie_hs_src. A test named "error" matches the first occurrence of "error" anywhere in the file.
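The string-search behaviour can be illustrated with a first-match lookup over source lines. This is a hypothetical sketch (locateTest and sampleSrc are invented names) that assumes the search is for the name wrapped in double quotes, as the description suggests.

```haskell
import Data.List (findIndex, isInfixOf)

-- 1-based number of the first line containing the quoted test name.
-- show adds the surrounding double quotes.
locateTest :: String -> String -> Maybe Int
locateTest name src =
  (+ 1) <$> findIndex (show name `isInfixOf`) (lines src)

-- The quoted string "error" appears in a helper before the test itself.
sampleSrc :: String
sampleSrc = unlines
  [ "helper = fail \"error\""
  , "tests = testGroup \"all\""
  , "  [ testCase \"error\" $ pure ()"
  , "  ]"
  ]
```

locateTest "error" sampleSrc points at line 1 (the helper) rather than line 3 (the testCase), so in this sketch the body hash would be taken from the wrong place.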


How this was built

This project was developed interactively with Claude. The prompts that produced it, in order:

(Human's note: I only started tracking the prompts after a few initial iterations; but hopefully how I started is clear to you; just basically "Can you write me a nix-style caching mechanism for test function dependencies, based on HIE files." Credit to @gacafe for suggesting this approach to AI transparency.)

  1. Can you fix the compile-time errors and check that your implementation does the correct thing — i.e. caches the results of tests whose dependent functions do not have AST changes. It will work if you can change factorial and see that the other two tests are CACHED, and not re-evaluated.

  2. Can you update the readme now and make sure it is accurate?

  3. Can you now try and come up with some very interesting and complex dependency tree scenarios, and test them? I'm thinking about at least interesting source code dependencies; but also functions that involve Template Haskell, CPP, GADTs.

  4. What's the fix?

  5. Is it at all possible to get the cached output to render as "OK (cached)" instead of "cached" on a subsequent line?

  6. Okay; and you can confirm that this implementation fixes every bug you observed above?

  7. Can you also do a test to check that adding or removing an extension re-runs either only tests that would be affected, or at least all the tests in the relevant file?

  8. Okay. I would like you now to take an extremely close look, taking the perspective of a core contributor to the GHC project, and reflect upon any limitations in this implementation. Take your time, and think it through from many different perspectives.

  9. Can you think of a nice name for this project?

  10. Can you rename this project to tasty-cache.

  11. Can you update the README now to reflect the current output and state of the project? Please also take care to document known limitations. Also, can you provide a section at the end, that lists all the prompts I typed into Claude in order to get it to this state?

  12. Can you make sure that the cache is invalidated for the entire source tree if a (new) default extension is added in the cabal file. Can you test this? (Manually; you don't need to write a test for it.)

  13. Can you now make this an opt-in ability; i.e. the tests that you want cached must be wrapped with a certain cacheable function? Then demonstrate this in action.

  14. Cool; can you make sure the documentation is up to date with this information?

  15. Can you fix the warnings from nix fmt?

  16. Again, make sure the readme is up to date and shows how to use the features of this library really cleanly.

  17. Are there any issues you think Haskellers will have using this library? Can you think of anything confusing to either extremely experienced Haskellers, and complete beginners? Reflect on this situation, and think about what would need to change, and/or add some explanations to the Readme.

  18. Can you make sure the CHANGELOG is representative of the features actually present in the first version?

  19. Any final changes you'd like to make before we release the first version? If not, make sure the readme contains the final list of inputs to claude.

  20. Can you add a command-line option that will disable caching entirely, for every test, whether or not it is labelled with cacheable?

  21. Please don't require me to set --no-hie-cache=True; just make the flag called --disable-tasty-cache

  22. Can you document this option in the README?

  23. Can you perform a careful review of the code? Take the perspective of a Haskeller who is concerned that this might result in false-negatives; i.e. not running a test that needs to be re-run. Take some time to convince yourself that this can never happened; or, add some tests to show when and how it does happen.

  24. Plan looks good; please just also update the README once you're done.

  25. Can you reorganise the flake.nix so that it builds a nixpkgs-idiomatic Haskell package from the cabal file? I should be possible to exclude it in a nix project as a typical ghc.withPackages ... dependency.

    Please also keep the ability to develop the package inside a nix shell that has cabal, and all the required packages.

    Please also update the readme accordingly.

  26. Can you have a think about what's required to support different versions of GHC?

    If the HIE format changes; I suggest having a CPP-style setup of different steps per GHC version; and then depending on which one is targetted follow that particular subset of the overall logic.

    Investigate a few common GHC versions, as well as the latest release, and formulate a plan for accomodating multiple versions in the one library.

  27. Just implement "Floor A"; add the checks into the flake.nix, but don't add any GitHub actions.

    Please do update the README.