Karina Nguyen
Research Areas
Topic Modeling, Explainable Artificial Intelligence (XAI), Natural Language Processing Techniques, Software Engineering Research, Advanced Graph Neural Networks
Most-Cited Works
- Discovering Language Model Behaviors with Model-Written Evaluations (2023), 119 citations
- The Capacity for Moral Self-Correction in Large Language Models (2023), 48 citations
- Measuring Faithfulness in Chain-of-Thought Reasoning (2023), 21 citations
- Evaluating and Mitigating Discrimination in Language Model Decisions (2023), 9 citations
- Question Decomposition Improves the Faithfulness of Model-Generated Reasoning (2023), 7 citations
- Specific versus General Principles for Constitutional AI (2023), 7 citations
- Many-shot Jailbreaking (2024), 3 citations
- PostTrainBench: Can LLM Agents Automate LLM Post-Training? (2026)
- FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling (2023)