Posts by Tags

Adversarial Fine-Tuning

Prompt Injection, 2022 vs Today: A Retrospective

Published: June 20, 2026

Code Assistants

Does AI Make You Write Insecure Code? A User Study

Published: June 15, 2026

Deep Learning

Adversarial Fine-Tuning against Prompt Injection Attacks

Published: January 01, 2023

🛡️ Securing Large Language Models: Adversarial Fine-Tuning Against Prompt Injection

LLM

Mechanistic Interpretability as a Security Tool

Published: June 22, 2026

Prompt Injection, 2022 vs Today: A Retrospective

Published: June 20, 2026

The Format-Reliability Gap: Diagnosing and Repairing Insecure Code Generation

Published: June 18, 2026

Does AI Make You Write Insecure Code? A User Study

Published: June 15, 2026

Adversarial Fine-Tuning against Prompt Injection Attacks

Published: January 01, 2023

🛡️ Securing Large Language Models: Adversarial Fine-Tuning Against Prompt Injection

Mechanistic Interpretability

Mechanistic Interpretability as a Security Tool

Published: June 22, 2026

The Format-Reliability Gap: Diagnosing and Repairing Insecure Code Generation

Published: June 18, 2026

Prompt Injection

Prompt Injection, 2022 vs Today: A Retrospective

Published: June 20, 2026

Secure Code Generation

The Format-Reliability Gap: Diagnosing and Repairing Insecure Code Generation

Published: June 18, 2026

Security

Mechanistic Interpretability as a Security Tool

Published: June 22, 2026

Prompt Injection, 2022 vs Today: A Retrospective

Published: June 20, 2026

The Format-Reliability Gap: Diagnosing and Repairing Insecure Code Generation

Published: June 18, 2026

Does AI Make You Write Insecure Code? A User Study

Published: June 15, 2026

Adversarial Fine-Tuning against Prompt Injection Attacks

Published: January 01, 2023

🛡️ Securing Large Language Models: Adversarial Fine-Tuning Against Prompt Injection

Steering

Mechanistic Interpretability as a Security Tool

Published: June 22, 2026

User Study

Does AI Make You Write Insecure Code? A User Study

Published: June 15, 2026

Gustavo Sandoval