With the ongoing rapid adoption of Artificial Intelligence (AI)-based systems in high-stakes domains such as financial services, healthcare and life sciences, hiring and human resources, education, societal infrastructure, and national security, ensuring the trustworthiness, safety, and observability of these systems has become critical. Hence, it is essential that AI systems are evaluated and monitored not only for accuracy and quality metrics, but also for robustness against adversarial attacks, robustness under distribution shift, bias and discrimination against underrepresented groups, security and privacy protection, interpretability, and other responsible AI dimensions. Our focus is on large language models (LLMs) and other generative AI models and applications, which pose additional challenges such as hallucinations (and other ungrounded or low-quality outputs), harmful content (such as sexual, racist, and hateful responses), jailbreaks of safety and alignment mechanisms, prompt injection attacks, misinformation and disinformation, fake, misleading, and manipulative content, and copyright infringement.
In this tutorial, we first highlight key harms associated with generative AI systems, focusing on ungrounded answers (hallucinations), jailbreaks and prompt injection attacks, harmful content, and copyright infringement. We then discuss how to effectively address these risks and challenges, following the framework of identification, measurement, mitigation (with four mitigation layers: model, safety system, application, and positioning), and operationalization; a minimal illustrative sketch of the mitigation layers follows below. We present real-world LLM use cases, practical challenges, best practices, lessons learned from deploying solution approaches in industry, and key open problems. Our goal is to stimulate further research on grounding and evaluation of LLMs, and to enable researchers and practitioners to build more robust and trustworthy LLM applications.
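To make the four mitigation layers concrete, the following is a minimal, hypothetical Python sketch, not the tutorial's own implementation: a stand-in model call (model layer), a placeholder harmful-content filter (safety-system layer), a grounding metaprompt built from retrieved context (application layer), and user-facing disclosure (positioning layer). All function names, the banned-term list, and the example context are illustrative assumptions.

```python
# Hypothetical sketch of layered mitigation for an LLM application.
# All names (call_model, safety_system_filter, etc.) are illustrative placeholders.

from dataclasses import dataclass


@dataclass
class Response:
    text: str
    blocked: bool = False
    reason: str = ""


def call_model(prompt: str) -> str:
    """Model layer: stand-in for a call to an aligned / fine-tuned LLM."""
    return f"Model answer to: {prompt}"


def safety_system_filter(text: str) -> Response:
    """Safety-system layer: screen model output for harmful content before it reaches the app."""
    banned_terms = {"harmful_term_1", "harmful_term_2"}  # placeholder for a real content classifier
    if any(term in text.lower() for term in banned_terms):
        return Response(text="", blocked=True, reason="harmful content detected")
    return Response(text=text)


def application_layer(user_query: str, retrieved_context: str) -> str:
    """Application layer: ground the prompt in retrieved context via a metaprompt."""
    return (
        "Answer only using the context below; say 'I don't know' if the answer is not present.\n"
        f"Context: {retrieved_context}\n"
        f"Question: {user_query}"
    )


def positioned_response(resp: Response) -> str:
    """Positioning layer: shape the user experience, e.g., disclose AI generation and blocked output."""
    if resp.blocked:
        return "This response was withheld by the safety system."
    return resp.text + "\n(Note: AI-generated content; verify before use.)"


if __name__ == "__main__":
    prompt = application_layer(
        "What is our refund policy?",
        "Refunds are accepted within 30 days of purchase.",  # illustrative retrieved context
    )
    raw_output = call_model(prompt)
    print(positioned_response(safety_system_filter(raw_output)))
```

In practice, each layer would be far richer (e.g., learned safety classifiers rather than a term list, and retrieval-augmented grounding rather than a hard-coded context), but the layering itself, with each stage able to constrain or reshape the one below it, is the point the sketch is meant to convey.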