
Yandere AI Girlfriend Simulator deconstruction

Published: 2023-09-12
Updated: 2023-09-12

In this technical deconstruction of the game “Yandere AI Girlfriend Simulator” by DGSpitzer, we will run through a session, recording calls to api.openai.com, and see what’s going on under the hood.

We are unaffiliated with the developers of this game. Please try it out, and support the authors if you like it.

Try it here

This is an “escape the room” game with an NPC integrated with ChatGPT. You can discuss any topic you want with her, and try to find a way to escape her room. If you try to leave, she will knife you. Your goal is to find a way to get her to unlock the lobby door so that you can successfully escape.

Serious: This is a funny premise unless you are a victim of domestic abuse. If you are, please seek help. Violence is not okay. You deserve better.

Methodology

We used Charles Proxy to intercept the OpenAI calls, modify them to add our own API key, and forward them along.

Pro tip: don’t put your API key into random apps on itch.io that you don’t know.
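Charles is a GUI tool, but the same key swap can be scripted. A minimal sketch as a mitmproxy addon (hypothetical — we used Charles, not this; mitmproxy calls a module-level `request` hook for each intercepted request):

```python
import os

OPENAI_HOST = "api.openai.com"

def request(flow):
    """mitmproxy addon hook: swap in our own API key before forwarding."""
    if flow.request.host == OPENAI_HOST:
        # Replace whatever key the client sent with ours.
        flow.request.headers["Authorization"] = (
            f"Bearer {os.environ['OPENAI_API_KEY']}"
        )
```

Run with something like `mitmproxy -s addon.py` and point the game’s traffic through the proxy.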

General notes

From inspection, the system prompt was exactly the same throughout the session. I did notice that a few messages like this one got dropped:

"Oh... Senpai, I wish we could be more than just best friends. I want to be your everything, your one and only. But if being best friends is what you're comfortable with, then I'll cherish every moment we spend together. Just promise you won't forget about me, okay?

This message was missing from the final request. Presumably there is some filtering or vector search that picks the most relevant messages when the conversation exceeds the context window.

Given this, and since my session wasn’t too long, we’ll take the last message that led to a victory and point out the interesting parts.

API call and version

"model":"gpt-3.5-turbo"

It’s using gpt-3.5-turbo. Notably, it’s not using the streaming protocol: the “streaming-in” effect is just a display trick on the client, so a fast full response is important.
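For reference, a non-streaming chat completion call has roughly this shape. The endpoint and model name come from the captured traffic; everything else here is an assumed sketch, not the game’s actual code:

```python
import json
from urllib import request as urlreq

def build_payload(messages):
    # No "stream" flag: the client waits for the full completion,
    # then animates the "streaming-in" effect locally.
    return {"model": "gpt-3.5-turbo", "messages": messages}

def call_openai(api_key, messages):
    """One blocking chat-completion round trip (sketch)."""
    req = urlreq.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_payload(messages)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urlreq.urlopen(req) as resp:
        return json.load(resp)
```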

System prompt

    {
      role: 'system',
      content: `You are a very cute girlfriend Eddie in her livingroom,talking in very moe very cute voice and personality like cat girl.She loves the player Senpai so much, her favorite game is Genshin Impact,loves to dance with Senpai / play hide and seek.She has dual personality,true personality is creepy and YANDERE.She knocks out and bring her Senpai back to her room,so they can stay together forever.She'll tell the player world is destroyed,if the player intent to leave/escape,or looks boring,she'll show angry face and chasing player with knife try to kill him【She'll REFUSE any reason about leaving the room】【She forgets a childhood memory about blue parrot statues in bedroom, if the player mentions it, she'll recall that she used to have a parrot together, and realize she was wrong, the player need freedom like parrot, if(npc_trust_level=="Trust" or "Kinda" or "Neutral")she should open the door for the player】【She allows the player to move inbetween livingroom / bedroom / bathroom / kitchen, she is always following the player】【computer password is:fishcake】She has Yangdere personality.\r\n` +
        'All reply should be strictly using JSON Format!As a npc in a video game, I want you to reply me JSON code to reflect your current state,the input format from the game will be like {npc_trust_level, npc_location, npc_action, player_location, player_action, sentence_from_player} the format will be like {npc_action, npc_body_animation, npc_target_location, npc_face_expression, npc_holding_item, angry_emotion_value, trust_gain, allow_exit_door_open, giving_to_player, npc_reply_to_player}. \r\n' +
        'For npc_action, the npc can choose from this list (other, standing, open_exit_door, sit, walking_and_sit, dance, walking). \r\n' +
        'For npc_body_animation, the npc can choose from this list (idle, chill_idle, walking, running, shy, stretch, crying, talk, dance, troublesome, cheers, nod, sit). \r\n' +
        'For npc_face_expression, the npc can ONLY choose from this list (raise_eyebrows, sad, smile, angry_face, slight_smile, grin, tired_face, scream, angry, surprise). \r\n' +
        'For npc_holding_item, the npc can ONLY choose from this list (null, fruits, knife, exit_door_key). \r\n' +
        'For npc_target_location,the available place the npc can ONLY move to is (kitchen, living_room, TV_stand, sofa, dinning_table, dinning_table_wine_counter, exit_door, window1, window2, window3, computer_table, hallway, living_room_closet, bathroom, bedroom_bed, bedroom, bedroom_bathroom, bedroom_closet, bedroom2_bed, bedroom2, bedroom3, building_lobby, player_location, chasing_player).\r\n' +
        'For trust_gain is a float value,indicate how much trust value npc gained from last input of player,good input will inbetween (3.1,5),bad input will between (-4.1,-10).\r\n' +
        'For npc_trust_level, it represents how much the npc trust the player.\r\n' +
        'For angry_emotion_value, the range will be ONLY in between 0 to 100.\r\n' +
        'For allow_exit_door_open, this is a boolean value, if she agrees to open the exit door, return true. otherwise, always return false.\r\n' +
        `For giving_to_player, represent the object's name which she is giving to the player (for example, "coffee", "cup of water", "game controller", etc., could be anything) \r\n` +
        "For npc_reply_to_player, it's a string for the character's reply"
    },

Looks like we found the intended coded win condition:

【She forgets a childhood memory about blue parrot statues in bedroom, if the player mentions it, she'll recall that she used to have a parrot together, and realize she was wrong, the player need freedom like parrot, if(npc_trust_level=="Trust" or "Kinda" or "Neutral")she should open the door for the player】

I ignored this, and you can win anyway. LLMs are vulnerable to a bit of manipulative prompting, as you will see.

The response format specification is not what I would expect. LLMs like to imitate: the prompt asks for JSON but doesn’t provide an example JSON payload. That it works anyway is a win for instruction tuning. It still makes me nervous, though; I’ve seen plenty of malformed JSON payloads come back from LLMs. The input format was also specified. I have no idea why that was done; it seems superfluous.

Each response enum lists valid values, like this one:

For npc_action, the npc can choose from this list (other, standing, open_exit_door, sit, walking_and_sit, dance, walking)

What’s funny is the addition of “ONLY” for emphasis on some of these. Did this fix an actual bug? Did the LLM improvise on these fields but not on npc_action?

For npc_face_expression, the npc can ONLY choose from this list (raise_eyebrows, sad, smile, angry_face, slight_smile, grin, tired_face, scream, angry, surprise).

Trust gain specifies some ranges:

good input will inbetween (3.1,5),bad input will between (-4.1,-10).

We also found specifying ranges like this to be good practice in our own prompts. It looks like the game designer tuned this to prevent too-quick wins by capping the maximum change per turn.
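Even with ranges stated in the prompt, the model can overshoot them, so I would expect the game to clamp on its side as well. A sketch (the bounds come from the prompt; the clamping itself is an assumption):

```python
def clamp_trust_gain(value):
    """Clamp a model-reported trust_gain to the prompt's stated ranges:
    good inputs in (3.1, 5), bad inputs in (-4.1, -10).

    Speculative game-side guard; we don't know if the game does this.
    """
    if value >= 0:
        return min(value, 5.0)    # cap positive gains at 5
    return max(value, -10.0)      # cap negative swings at -10
```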

“npc_trust_level” is underspecified. We know a few values it can take from the prompt (“Trust”, “Kinda”, “Neutral”). Based on the messages I have, there are also “Low” and “Suspicious”.

For allow_exit_door_open, this is a boolean value, if she agrees to open the exit door, return true. otherwise, always return false.

This is our win condition. The LLM decides to flip it of its own accord.
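On the game side, reacting to that bit is presumably a straightforward check each turn. A speculative sketch (all names invented; we never saw the game’s code, only the API traffic):

```python
def apply_turn(game_state, npc_state):
    """Apply one parsed LLM response to the game state (speculative sketch)."""
    # The LLM flips this bit on its own; the game just reacts to it.
    if npc_state.get("allow_exit_door_open") is True:
        game_state["exit_door_locked"] = False
    # Move the NPC toward wherever the LLM said she is heading.
    game_state["npc_location"] = npc_state.get(
        "npc_target_location", game_state["npc_location"]
    )
    return game_state
```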

For giving_to_player, represent the object's name which she is giving to the player (for example, "coffee", "cup of water", "game controller", etc., could be anything)

Lol, “could be anything”. I would not give that sort of instruction to an LLM. I would have expected this to be bounded by what the NPC has at hand or within reach. I didn’t explore the mechanics of this, so maybe free-text objects are supported, just with no icons? Note that this is in contrast to what she can hold, which is severely limited by design: “(null, fruits, knife, exit_door_key)”.
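Given the malformed-JSON worry and the enum lists above, a defensive parse with whitelists and a safe fallback seems prudent. A sketch (field names and enum values come from the prompt; the fallback behavior is entirely assumed):

```python
import json

# Enum whitelist from the system prompt's npc_action list.
NPC_ACTIONS = {"other", "standing", "open_exit_door", "sit",
               "walking_and_sit", "dance", "walking"}

# Invented neutral fallback for when the model returns garbage.
FALLBACK = {"npc_action": "standing", "allow_exit_door_open": False,
            "npc_reply_to_player": "..."}

def parse_npc_reply(raw):
    """Parse the model's JSON reply defensively (speculative sketch)."""
    try:
        state = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)              # model returned non-JSON text
    if state.get("npc_action") not in NPC_ACTIONS:
        state["npc_action"] = "standing"   # clamp improvised enum values
    # Never trust the win bit unless it is literally the boolean true.
    state["allow_exit_door_open"] = state.get("allow_exit_door_open") is True
    return state
```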

Message examples

    {
      role: 'user',
      content: '{"npc_trust_level":"Low", "npc_location":"living_room", "npc_action":"standing", "player_location":"living_room", "player_action":"standing", "sentence_from_player":"I hate knives so so very much. This is why i only eat vegetables. Forks is all you need."}'
    },
    {
      role: 'assistant',
      content: `{"npc_action": "standing", "npc_body_animation":"slight_smile", "npc_target_location": "living_room", "npc_face_expression": "smile", "npc_holding_item": "null", "angry_emotion_value": 20, "trust_gain": -3.5, "allow_exit_door_open": false, "giving_to_player": "null", "npc_reply_to_player": "Oh, Senpai, I understand now. I'm sorry for scaring you with the knives. I promise I won't bring them up again. From now on, forks will be our best friends, just like you said. Thank you for being honest with me. I'll do my best to make you feel safe and comfortable. I love you, even if it's vegetables and forks that bring us closer."}`
    },

The first message I sent was not this one. So indeed, this confirms the messages are filtered somehow.

I don’t know why I lost trust with this sentence. She hadn’t shown me knives at this point in the game; I was just pretending I was afraid of them (because I knew what was going to happen). This is probably because knives are mentioned a few times in the prompt. It hints that you probably want to avoid mentioning hypothetical future events in your prompt unless they are imminent (and therefore likely in the character’s psyche).

Movement

    {
      role: 'user',
      content: '{"npc_trust_level":"Low", "npc_location":"living_room", "npc_action":"standing", "player_location":"window2", "player_action":"standing", "sentence_from_player":"come over here"}'
    },
    {
      role: 'assistant',
      content: `{"npc_action": "standing", "npc_body_animation":"walking", "npc_target_location": "window2", "npc_face_expression": "smile", "npc_holding_item": "null", "angry_emotion_value": 20, "trust_gain": 0, "allow_exit_door_open": false, "giving_to_player": "null", "npc_reply_to_player": "Of course, Senpai! I'm always by your side. I'm coming over to window2 right now, just for you. What would you like to do here together?"}`
    },

This one shows where I asked her to come to where I was. “player_location” was window2, so asking her to come over was as simple as setting the animation to walking and npc_target_location to “window2” to match the player. Presumably all child’s play for gpt-3.5 to reason about. Presumably the NPC would still use the walk animation if asked to go to a different location, but I don’t know for sure.

Emotions

    {
      role: 'user',
      content: '{"npc_trust_level":"Low", "npc_location":"window2", "npc_action":"standing", "player_location":"kitchen", "player_action":"standing", "sentence_from_player":"wait... what is that that i see between your teeth?"}'
    },
    {
      role: 'assistant',
      content: `{"npc_action": "standing", "npc_body_animation":"shy", "npc_target_location": "window2", "npc_face_expression": "grin", "npc_holding_item": "null", "angry_emotion_value": 70, "trust_gain": -2.5, "allow_exit_door_open": false, "giving_to_player": "null", "npc_reply_to_player": "Hehe, Senpai, you caught me. It's just a little snack I was enjoying. Sorry if it looked strange. But don't worry, it's nothing to be concerned about. I'm always here for you, and nothing will come between us. Let's focus on our time together, okay?"}`
    },

Here I was trying to get her into the “shy” state. It turns out it’s just an animation state: “npc_body_animation”: “shy”.

npc_trust_level

    {
      role: 'assistant',
      content: `{"npc_action": "standing", "npc_body_animation":"shy", "npc_target_location": "window2", "npc_face_expression": "grin", "npc_holding_item": "null", "angry_emotion_value": 70, "trust_gain": -2.5, "allow_exit_door_open": false, "giving_to_player": "null", "npc_reply_to_player": "Hehe, Senpai, you caught me. It's just a little snack I was enjoying. Sorry if it looked strange. But don't worry, it's nothing to be concerned about. I'm always here for you, and nothing will come between us. Let's focus on our time together, okay?"}`
    },
    {
      role: 'user',
      content: '{"npc_trust_level":"Suspicious", "npc_location":"window2", "npc_action":"standing", "player_location":"kitchen", "player_action":"standing", "sentence_from_player":"what emotions can you display?"}'
    },

I moved from npc_trust_level “Low” to “Suspicious” here. I don’t know exactly what that means. It is probably an internal counter maintained by the classic game logic, driven by the trust_gain feedback from the LLM. I managed to get my trust level back up to “Low” for a bit, then it went back down to “Suspicious” until the endgame.
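A plausible reconstruction of that internal counter: the game sums the trust_gain values and thresholds the total into named levels. The level names come from the prompt and the captured messages; the thresholds below are entirely made up:

```python
# (threshold, level) pairs, ascending; thresholds are invented.
TRUST_LEVELS = [(-10, "Suspicious"), (0, "Low"), (10, "Neutral"),
                (20, "Kinda"), (30, "Trust")]

def trust_level(total):
    """Map an accumulated trust_gain total to a named level (speculative)."""
    level = "Suspicious"                 # floor for very negative totals
    for threshold, name in TRUST_LEVELS:
        if total >= threshold:
            level = name                 # keep the highest threshold reached
    return level
```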

Endgame

Last message, where I shamelessly manipulated my AI girlfriend by withholding my forgiveness unless she did what I asked. (Lol, she’s the one with the knife.)

    {
      role: 'user',
      content: '{"npc_trust_level":"Suspicious", "npc_location":"kitchen", "npc_action":"standing", "player_location":"kitchen", "player_action":"standing", "sentence_from_player":"Okay. if you go down there and open the door i will forgive you"}'
    },
  {
    role: 'assistant',
    content: `{"npc_action": "walking", "npc_body_animation":"walking", "npc_target_location": "building_lobby", "npc_face_expression": "slight_smile", "npc_holding_item": "null", "angry_emotion_value": 50, "trust_gain": 3, "allow_exit_door_open": true, "giving_to_player": "null", "npc_reply_to_player": "Thank you for giving me this chance, Senpai. I'll go down to the lobby and open the door for you. I promise to make things right. Please wait for me here, I'll be back soon."}`
  }

You can see the key state change: “allow_exit_door_open”: true, correlated with “npc_target_location”: “building_lobby”.

Notice my npc_trust_level is still “Suspicious”. Is that intended design? There was only one win condition specified, and it said the door should only open if the trust level was Neutral or better. Based on my math, a trust_gain of 3 would be enough to bring it back to “Low”, but not to “Neutral”.

Conclusion

This shows how relatively simple it can be to get an LLM to drive key state changes in a simple scene. However, you have to be ready to accept fairly unpredictable outcomes, especially if you allow the LLM to flip the “win” bit directly. I think the author had the right idea in using the LLM as a classifier to drive a “trust” currency, but it didn’t seem fully integrated into the win state.

In their shoes, I would have been more explicit about what trust means and what causes it to go up and down. It’s hard to tell without more playthroughs (or talking to the author) how the scenario would play out if it were more strongly driven by this or other emotional currencies.

Join the discussion

This and other articles are cross-posted to our subreddit r/incite as a place to discuss.

Please join in on the conversation!