A public good project by the Synthesis Company of California. © 2026

Daniel Paleka | doi.page

0 works, 0 citations, 0 h-index

Research Areas

Topic Modeling, Natural Language Processing Techniques, Adversarial Robustness in Machine Learning, Spam and Phishing Detection, Explainable Artificial Intelligence (XAI)

Most-Cited Works

→ Poisoning Web-Scale Training Datasets is Practical (2024), 78 cited
→ Red-Teaming the Stable Diffusion Safety Filter (2022), 25 cited
→ Foundational Challenges in Assuring Alignment and Safety of Large Language Models (2024), 14 cited
→ ARB: Advanced Reasoning Benchmark for Large Language Models (2023), 14 cited
→ Refusal in Language Models Is Mediated by a Single Direction (2024), 9 cited
→ Evaluating Superhuman Models with Consistency Checks (2024), 8 cited
→ Stealing Part of a Production Language Model (2024), 7 cited