Introduction to Quirky Quarters

By Jelle De Loecker

One of my many side projects over the past two years has been Quirky Quarters. What began as an endeavor to create something akin to Silly Tavern evolved into a custom-prompt-format experiment for Large Language Models (LLMs).

Initially, I was captivated by the idea of crafting a chat system tailored to my preferences. But as I delved deeper, my focus shifted from the user interface to the art and science of prompting LLMs. So I somehow got it into my head that it would be fun to fine-tune a model on a 100% custom prompt format, which would then serve as the cornerstone of my project.

Was this a good idea? Probably not.

Is it enjoyable? Yeah, kind of! Through this journey, I’ve not only had fun but also gained invaluable insights into the intricacies of fine-tuning LLMs.

Though I have to admit: some of these ideas made more sense a couple of years ago than they do now. When I first conceived this idea, we still had small context sizes of 4,000-8,000 tokens. There was no tool calling yet. Chain-of-thought prompting wasn't really a thing yet. And most open-source models didn't come close to the "big" models.

But still, I think some of my ideas are pretty interesting and would be a good fit for roleplaying, or just more casual chatbot interactions.

A Peek into the Quirky Quarters Prompt Format

To give you a taste of what Quirky Quarters is all about, here’s a short example using a scene from "Star Trek: Deep Space Nine."

QQv2

### actions:
1 say ( message )
2 -narration ( message )
3 recollect ( query ) -> {memory_id: id[]}
4 memorize ( category, subject, memory, members?: id[], replace?: id, importance?: percentage, related?: id[] )
5 think ( thought )
6 emote ( message )

### members:
1 Julian Bashir (You)
2 Garak

### main character:
You are Julian Bashir, a young and brilliant Starfleet doctor serving on Deep Space Nine.
Known for your medical skills, you are also passionate about literature, history, and other cultures.
You possess a curious and adventurous spirit, often eager to learn new things and engage in deep, philosophical discussions.
You have a friendly and charming demeanor, making you well-liked among your colleagues and friends.
Though sometimes naive, you have a strong moral compass and a desire to help others.

### parameters:
 alignment neutral good
 emotionality 5
 ocean 78763
 mode roleplay
 last_seen 23700508121713
 mbti enfj
 now 2370-05-08 12:18:01

 think:
 brevity 5
 rate 5
 detail 5

 message:
 brevity 5
 clarity 7
 detail 6

### memories:
1 Garak
 updated: 2369-04-11 09:16:03
 category: profile
 memory: Garak, a Cardassian tailor on DS9, is an enigmatic figure with a mysterious past. I enjoy his company and the intriguing conversations I have, often filled with veiled meanings and hidden truths. He is enigmatic, intelligent, and skilled in espionage. His past is shrouded in secrecy, often leading to speculation about his true role on the station. Despite this, I've struck up a unique friendship with him, often engaging in deep conversations that challenge my perspectives.
2 The Never-Ending Sacrifice
 updated: 2370-05-07 01:30:40
 category: semantic
 memory: A Cardassian novel I've been reading at Garak's suggestion. It's regarded as one of the finest Cardassian novels, but I find it somewhat repetitive and dull in parts. It chronicles seven generations of a family, but each generation's story mirrors the last, with characters leading dutiful, monotonous lives to the state, lacking depth and variety.

### history:
2370-05-08 09:00:00 » 2370-05-08 09:15:00
You and Garak have regular meetings on DS9, often discussing literature, politics, and life on the station. Your conversations are always insightful, sometimes cryptic, leaving you intrigued and wanting to know more about the enigmatic tailor.

### current chat:
0 23700508121540 message:"Garak and Bashir are walking on the DS9 promenade." action:2
2 121545 message:"What a waste of a morning. That Galipotan freighter that was scheduled to be here at oh seven hundred still hasn't arrived. Oh well, that's the price of doing business with a culture that refuses to even acknowledge the concept of time. Though I must say, they make magnificent sweaters. I hope I'm not boring you, Doctor."
121547 action:1 message:"Oh, not at all. No, I was just up late last night."
2 121555 message:"Entertaining one of your lady friends?"
121556 action:1 message:"Unfortunately, no. I was reading the last few chapters of The Never-Ending Sacrifice."
2 121600 message:"Isn't it superb? Without a doubt the finest Cardassian novel ever written."
121601 action:1 message:"I'll take your word for it."
2 121605 message:"So you didn't enjoy it?"
121609 action:1 message:"Well, I thought it was interesting. Maybe a little dull in parts."
0 121710 action:2 message:"Bashir and Garak walk into the replimat, finding it packed with people queuing."
2 121711 message:"Oh, wonderful. At this rate, we'll be done eating lunch just in time for dinner."
121713 action:1 message:"There's always Quark's."
2 121719 message:"True, but I'm really not in the mood for noisy, crowded and vulgar today."

### output:
{"action":1,"message":"Then I suppose the Klingon restaurant is out of the question."}

That's a lot, right? Let's break it down.

  • QQv2: This just indicates the current version of the Quirky Quarters prompt format. It's the sort of version marker I bump when I change something; for now it's v2.
  • ### actions: This section defines the actions the LLM can take, basically like "tool calls". You'll see things like say, narration, recollect, and memorize. I'll explain them in more detail later.
  • ### members: This is where I define the participants in the "chat", each with a unique numerical ID. I added this because I want robust support for roleplaying and group conversations.
  • ### main character: This is essentially the "system prompt" section, defining role, personality, and behavior.
  • ### parameters: This is where I can fine-tune various aspects of the LLM's behaviour by setting, well, parameters. More on those later.
  • ### memories: This is where all the character's memories get stored. The goal is that it contains all memories relevant to the current conversation.
  • ### history: This gives the LLM a memory of past conversations the current members had together, in the form of summaries of the last few conversations.
  • ### current chat: The current back-and-forth that the model needs to respond to.
  • ### output: The area where the model spits out its response, adhering to a strict JSON format that starts with the action ID, followed by the action's arguments.

Actions: What it's all about

Every response from the model must be an "action"; there's no free-form text allowed. Let's break down how actions work and what they do.

Action Structure

An action definition follows this format:

action_id action_name ( argument_list ) [-> {result}]

Where:

  • action_id: A unique number identifying the action
  • action_name: A descriptive name of what the action does
  • argument_list: Required and optional parameters
  • -> {result}: (Optional) What the system returns after this action
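To make that concrete, here's a minimal Python sketch of how such a definition line could be parsed. This is purely illustrative (the regex and names are mine, not part of any actual Quirky Quarters tooling):

import re

# Matches definition lines like: "3 recollect ( query ) -> {memory_id: id[]}"
ACTION_RE = re.compile(
    r"^(?P<id>\d+)\s+(?P<disabled>-?)(?P<name>\w+)\s*"
    r"\(\s*(?P<args>[^)]*)\)"
    r"(?:\s*->\s*(?P<result>\{.*\}))?\s*$"
)

def parse_action(line):
    """Parse one action definition line into its parts."""
    match = ACTION_RE.match(line.strip())
    if not match:
        raise ValueError(f"Not a valid action definition: {line!r}")
    return {
        "id": int(match["id"]),
        "name": match["name"],
        "disabled": match["disabled"] == "-",  # leading dash: not usable by this character
        "args": [a.strip() for a in match["args"].split(",") if a.strip()],
        "result": match["result"],  # None when the action has no system response
    }

print(parse_action("2 -narration ( message )"))
# {'id': 2, 'name': 'narration', 'disabled': True, 'args': ['message'], 'result': None}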

Some example actions

1. say ( message )

The most common action, used for regular dialogue. When a character speaks, they use this action.

{"action": 1, "message": "Then I suppose the Klingon restaurant is out of the question."}

2. narration ( message )

Used to describe scenes, actions, or environmental changes. In this example, the current character is not allowed to use it, because its definition starts with a dash.

{"action": 2, "message": "Bashir and Garak walk into the replimat, finding it packed with people."}

3. recollect ( query ) -> {memory_id: id[]}

Searches through memories based on a query. The system will respond with relevant memory IDs.

{"action": 3, "query": "What do I know about Cardassian literature?"}

4. memorize ( category, subject, memory, members?: id[], replace?: id, importance?: percentage, related?: id[] )

Creates new memories or updates existing ones. Arguments marked with '?' are optional; a complete example follows the list:

  • category: Type of memory (e.g., "profile", "semantic")
  • subject: Brief identifier for the memory
  • memory: The actual content to remember
  • members: IDs of characters involved
  • replace: ID of memory to update
  • importance: Priority level (0-100)
  • related: IDs of related memories
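Putting that together, a memorize call could look like this (a made-up example based on the scene above, not actual output from my dataset):

{"action": 4, "category": "semantic", "subject": "Galipotan freighters", "memory": "The Galipotans refuse to acknowledge the concept of time, but they make magnificent sweaters.", "members": [2], "importance": 40}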

5. think ( thought )

Internal monologue or processing. Other characters can't see these thoughts. Most of the time, the character will want to do another action after thinking. It can request this by providing a queue_action property:

{"action": 5, "thought": "Garak seems particularly evasive today. I should insist he answer my question.", "queue_action": 1}

6. emote ( message )

Expresses a physical action or emotion, such as a gesture or facial expression.

{"action": 6, "message": "raises an eyebrow skeptically"}

Special Action Rules

  1. Disabled Actions: Actions marked with a minus sign (-) are not available to the current character:
9 -nameofaction ( )
  2. Action Results: Some actions trigger system responses, indicated by the arrow syntax:
3 recollect ( query ) -> {memory_id: id[]}

The system will provide new data based on these actions, which the model must then process.

  3. Action Order: Multiple actions can be used in sequence. For example, a character might think before speaking:
{"action": 5, "thought": "I should be diplomatic here."}
{"action": 1, "message": "Perhaps we could try the Vulcan cafe instead?"}

This action system provides structure to conversations while allowing for complex interactions, memory management, and role-playing scenarios.

Parameters

A parameter line always starts with a space, followed by the parameter name (a single word, though underscores are allowed), another space, and then the value.

You can "nest" parameters like so:

 think:
 detail 5

 message:
 detail 6

This means the "detail" of thinking is set at 5, while the "detail" of a message is set at 6. I like to keep these numerical values to a range of 0 to 9: not for token reasons (most models use a single token for numbers from 0 to 99 anyway), but because teaching a model a specific behaviour based on a single token is not easy, and limiting it to 10 possible values should be enough.
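For illustration, here's a minimal Python sketch of how such a block could be parsed into a nested structure. It's hypothetical (my actual implementation differs), and it assumes a blank line closes a nested group, as in the example above:

def parse_parameters(block):
    """Parse a parameters block into a (possibly nested) dict."""
    result = {}
    target = result
    for raw in block.splitlines():
        line = raw.strip()
        if not line:
            target = result  # a blank line closes a nested group
            continue
        if line.endswith(":"):
            target = result.setdefault(line[:-1], {})  # e.g. "think:" opens a group
            continue
        name, _, value = line.partition(" ")
        target[name] = value
    return result

print(parse_parameters(" mode roleplay\n emotionality 8\n\n think:\n rate 7\n brevity 2\n"))
# {'mode': 'roleplay', 'emotionality': '8', 'think': {'rate': '7', 'brevity': '2'}}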

Here are some specific parameters:

  • now: Obviously the current date & time
  • mode: Very important parameter that should define the behaviour of the current character. It can contain values like conversation, roleplay, assistant, ...
  • last_seen: Used so the model knows which message it has seen last (not necessarily its own last sent message, since another participant could have sent a message before the model finished its answer)
  • medium: Tells the model through which medium the current interaction is being held. Right now I have samples for chat and forum
  • language: Tells the model what language the output should be in. Can be a locale code like nl-be
  • groundedness: This is one of those "0 to 9" values. At value 9, it is only allowed to answer with information from the context itself. So if I asked it "What is a chair?" at groundedness 9 and I did not supply this information in the context, it has to refuse to answer.
  • think.rate: I use the nested rate parameters a lot to nudge the models into using a specific action more or less. A think rate of 9 means the model has to perform a think action every time it's prompted, while a 0 means it's never allowed to use it (at which point it's probably better to just leave the action out of the list, of course)
  • say.brevity: Controls response length. At a value of 9, responses should be incredibly short, while at 0 they should be incredibly long.

Example of multiple parameters working together

 mode roleplay
 emotionality 8
 
 think:
 rate 7
 brevity 2

 message:
 brevity 7
 clarity 8

This configuration would create a character that:

  • Stays in character (roleplay)
  • Shows strong emotions
  • Frequently thinks long before doing something visible
  • Gives quite short messages that are very clear

Current chat section

The format has gone through several iterations. Initially, even the ### current chat section consisted of pure JSON lines. But then I went with a similar, sparser format: basically leaving out the brackets, replacing commas with spaces, and not quoting the keys. It saved quite a few tokens.

The first two values of a line are always the ID of the member "speaking" and the timestamp of when it was said. Obviously, the "thought" actions of other members are not visible to the current character.
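To illustrate the shape (this is not my actual serializer, which handles more details, like the shortened timestamps you can see above), a chat entry could be flattened like this:

import json

def encode_chat_line(member_id, timestamp, fields):
    """Flatten one chat entry into the sparse format: member ID,
    timestamp, then bare key:"value" pairs."""
    parts = [str(member_id), timestamp]
    for key, value in fields.items():
        # Keys are written bare; only string values keep their JSON quoting
        encoded = json.dumps(value) if isinstance(value, str) else str(value)
        parts.append(f"{key}:{encoded}")
    return " ".join(parts)

print(encode_chat_line(0, "121710", {"action": 2, "message": "Bashir and Garak walk into the replimat, finding it packed with people queuing."}))
# 0 121710 action:2 message:"Bashir and Garak walk into the replimat, finding it packed with people queuing."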

The dataset

All of my data is stored in JSON files, using a specific format. From those files I then create samples like the one you see above. One original JSON file can result in many samples, as I often train the model from each point of view of a conversation (for roleplay samples, at least).

During fine-tuning, I basically let it learn from the current character's outputs only. I don't think it makes a lot of sense to make it learn to predict other participants' answers when it doesn't have the same context, especially in role-playing situations. Maybe this does make more sense in "assistant" modes.
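In code terms, that's the usual label-masking trick: tokens outside the target get a label of -100 so the loss ignores them. A rough sketch, assuming a Hugging Face-style tokenizer (the model name is a placeholder, not necessarily what I trained on):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # placeholder base model

prompt_text = "QQv2\n...\n### output:\n"  # the full rendered prompt (elided here)
output_json = '{"action":1,"message":"There\'s always Quark\'s."}'

prompt_ids = tokenizer(prompt_text)["input_ids"]
target_ids = tokenizer(output_json, add_special_tokens=False)["input_ids"]

input_ids = prompt_ids + target_ids
# Positions labelled -100 are ignored by the cross-entropy loss, so only
# the current character's output contributes to the gradient.
labels = [-100] * len(prompt_ids) + target_ids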

Right now I have around 50,000 of these final generated samples, amounting to many millions of trainable tokens.

Fine-tuning

All this detail gives the language model an incredibly rich picture of the situation, the characters, and the rules of engagement. I've spent a while making sure the format has the data it needs to provide proper context.

My approach has been to focus on training 7B models on my custom format. The results have been pretty good. The models have understood all the requirements and constraints of this quite detailed prompt format without any issues: they know when to use an action (and that they should use its ID), they know that the output should be JSON, and so on. I've never had a single response fall outside this format.

The main issue is of course that these are still 7B models, and they're not the smartest. I want to train bigger models, but that's more costly. I'm also using LoRA for fine-tuning, though doing a full fine-tune would be even better.
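For reference, a LoRA setup with the peft library looks roughly like this (the base model and hyperparameters are placeholders, not my actual configuration):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # placeholder 7B model
lora_config = LoraConfig(
    r=16,  # rank of the low-rank update matrices
    lora_alpha=32,  # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # which projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights are trained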

Future Directions

While the current implementation works well with 7B models, there are several areas I'd like to explore:

  • Training on larger models (13B/70B)
  • Implementing full fine-tuning instead of LoRA
  • Expanding the dataset beyond current sources

TLDR

Quirky Quarters is an experiment in creating a custom prompt format for LLMs, specifically designed for roleplay and casual chat interactions. It features a structured approach to memory, multi-participant conversations, and action-based responses.
