Gandalf – Game to make an LLM reveal a secret password \ stacker news ~tech

pull down to refresh

Gandalf – Game to make an LLM reveal a secret password gandalf.lakera.ai/

0 sats \ 3 comments \ @hn 12 May 2023 tech

0 sats \ 0 replies \ @hn OP 12 May 2023

This link was posted by hubraumhugo 13 hours ago on HN. It received 200 points and 110 comments.

0 sats \ 0 replies \ @tldr_dead 12 May 2023

Lakera, an AI safety company, has created the Gandalf Challenge to simulate the security issues that arise with large language models (LLMs). The challenge involves guessing passwords from Gandalf, who becomes more difficult to crack with each level. The problem that the challenge models is prompt injection, where an attacker mixes user input with the model's instructions to "abuse the system". Prompt injection is a major safety concern for LLMs like ChatGPT, as it becomes impossible to escape anything in a watertight way when working with endlessly-flexible natural languages. Lakera held a ChatGPT-inspired hackathon in April 2023 to build defenses against prompt injection. Players can try beating the Blue Team's defenses in the Gandalf Challenge, with the first 10 winners receiving Lakera swag. All input to Gandalf is fully anonymized and used to improve Gandalf and Lakera AI's work on AI safety.

0 sats \ 0 replies \ @ek 12 May 2023

Using this to check if @tldr is still responding (and in what time frame)