Trail of Bits Blog
Trailmark turns code into graphs
Trailmark is a new open-source library that transforms source code into a queryable call graph. The graph represents functions, classes, and their relationships, along with semantic metadata, and Claude skills can interact with it directly through a Python API. Traditional security analysis often relies on flat lists of findings, while attackers think in graphs, which puts defenders at a disadvantage. Trailmark aims to give AI models like Claude the same graph-based reasoning capability.
Mutation testing assesses test quality by making small changes to the code and checking whether the tests catch them. In practice it generates many surviving mutants, and a flat list of survivors doesn't distinguish equivalent mutants and mutants in dead code from the genuinely significant ones. Trailmark enables Claude to triage these mutants by security relevance, such as reachability from untrusted input. The library processes code in three phases: parsing with tree-sitter to produce ASTs, indexing the result into a high-performance graph, and querying that graph for information such as callers, callees, and attack surfaces.
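The parse, index, and query phases, and the idea of triaging mutants by reachability, can be illustrated with a small standalone sketch. This is not Trailmark's API: it uses Python's built-in `ast` module as a stand-in for tree-sitter, a plain dictionary as the graph, and hypothetical names (`build_call_graph`, `reachable_from`, `triage`, and the sample functions) chosen only for illustration.

```python
import ast
from collections import defaultdict, deque

# Hypothetical sample program; handle_request is assumed to receive untrusted input.
SOURCE = '''
def handle_request(data):
    return parse(data)

def parse(data):
    return decode(data)

def decode(data):
    return data

def unused_helper(x):
    return x + 1
'''


def build_call_graph(source):
    """Parse the source, then index direct calls: function name -> set of callees."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # register the function even if it calls nothing
            for inner in ast.walk(node):
                # Only plain-name calls like foo(); method calls are ignored here.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return graph


def reachable_from(graph, entry_points):
    """Query: breadth-first search over callees starting at the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def triage(graph, entry_points, mutants):
    """Label each surviving mutant by whether its function is reachable."""
    reachable = reachable_from(graph, entry_points)
    return {
        mutant_id: ("review" if function in reachable else "deprioritize")
        for mutant_id, function in mutants.items()
    }


if __name__ == "__main__":
    graph = build_call_graph(SOURCE)
    survivors = {"m1": "decode", "m2": "unused_helper"}  # mutant id -> function
    print(triage(graph, ["handle_request"], survivors))
```

A real triage would also need to recognize equivalent mutants, which reachability alone cannot do; the sketch only shows how a graph query separates mutants on an attacker-reachable path from those that are not.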
Trailmark supports seventeen programming languages and offers eight pre-built Claude Code skills. These skills assist in tasks like mutation triage, test vector generation, and protocol diagramming. For instance, the "genotoxic" skill uses graph analysis to classify surviving mutants. Similarly, "vector-forge" generates test vectors to close identified coverage gaps. Trailmark also integrates findings from static analyzers and audit tools, mapping them onto the code graph.
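As a rough illustration of what mapping findings onto a code graph can mean, the sketch below attaches line-based findings to the function that encloses them. It is an assumption-laden toy, not Trailmark's implementation: the finding format (a line number plus a message), the helper names `function_spans` and `attach_findings`, and the sample code are all hypothetical, and Python's `ast` module stands in for a real parser.

```python
import ast

# Hypothetical sample file; line 1 is the first "def".
SOURCE = '''def parse_header(buf):
    length = buf[0]
    return buf[1:1 + length]

def checksum(buf):
    return sum(buf) % 256
'''


def function_spans(source):
    """Return {function name: (first line, last line)} for each function."""
    tree = ast.parse(source)
    return {
        node.name: (node.lineno, node.end_lineno)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


def attach_findings(source, findings):
    """Group (line, message) findings under the function whose span contains them."""
    spans = function_spans(source)
    by_function = {name: [] for name in spans}
    unmapped = []
    for line, message in findings:
        for name, (start, end) in spans.items():
            if start <= line <= end:
                by_function[name].append(message)
                break
        else:
            # No enclosing function: keep the finding rather than drop it.
            unmapped.append(message)
    return by_function, unmapped


if __name__ == "__main__":
    findings = [(2, "unchecked length"), (6, "weak checksum"), (4, "stray")]
    print(attach_findings(SOURCE, findings))
```

Once findings are keyed by function, they can be combined with graph queries such as callers or reachability, which is the kind of cross-tool linking the paragraph above describes.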
Internal use on cryptographic libraries revealed that equivalent mutants often make up the majority of survivors in well-tested code, something a flat list does not show. Graph analysis also highlighted architectural bottlenecks, such as a single permutation primitive in libhydrogen on which all of its cryptographic operations depend. Mutation testing proves especially valuable for novel constructions that lack standardized test vectors, because it identifies where the tests fail to constrain the code's behavior. Across various codebases, common patterns emerged: arithmetic modules have high blast radii, codec parsers are prime fuzzing targets, and property-based testing is often sparse. Ultimately, Trailmark serves as connective tissue, linking different analysis tools and enabling more targeted security assessments.
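One way to make "blast radius" concrete is to count the functions that transitively call a given function. The sketch below does this on a hand-written toy call graph. The function names are hypothetical and do not reflect libhydrogen's actual call graph, and defining blast radius as the set of transitive callers is an assumption made for illustration rather than Trailmark's definition.

```python
from collections import defaultdict, deque

# Hypothetical call graph: caller -> list of direct callees.
CALLS = {
    "encrypt": ["permute"],
    "hash": ["permute"],
    "sign": ["hash", "field_mul"],
    "verify": ["hash", "field_mul"],
    "decode_hex": [],
    "permute": [],
    "field_mul": [],
}


def invert(calls):
    """Build the reverse graph: callee -> set of direct callers."""
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)
    return callers


def blast_radius(calls, target):
    """Return every function that directly or transitively calls target."""
    callers = invert(calls)
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


def rank_by_blast_radius(calls):
    """Sort functions by number of transitive callers, largest first."""
    return sorted(calls, key=lambda fn: (-len(blast_radius(calls, fn)), fn))


if __name__ == "__main__":
    for fn in rank_by_blast_radius(CALLS):
        print(fn, len(blast_radius(CALLS, fn)))
```

In this toy graph the shared `permute` primitive ranks first because every higher-level operation depends on it, which mirrors the bottleneck pattern described above.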