Blog


Mechanistic Interpretability as a Security Tool
Prompt Injection, 2022 vs Today: A Retrospective
The Format-Reliability Gap: Diagnosing and Repairing Insecure Code Generation
Does AI Make You Write Insecure Code? A User Study
Adversarial Finetuning against Prompt Injection Attacks