
Still need to wrap my head around this. In simple terms, are initial_prompt, chunk_size, num_logprobs, and encoded_prompt the only things I'll need to make sure to remember or save somewhere when I decide to decode the output?
LLMs are changing really fast. Is this compatible with any model?
You mention encoding PGP keys, URLs, cryptocurrency addresses, and nostr pubkeys as example use cases... Would you trust this method to hide wallet seeds the way traditional steganography does?
Thanks for taking a look! The idea you're getting at is quite important. I've imagined there's a "standard" where the first two bytes represent a "version number", which sets values for the free-floating parameters, with lots of different initial_prompts to produce a variety of texts.

version = 96
initial_prompt = "Once upon a time, in a kingdom far away, there lived a"
chunk_size = 3
model = Mistral7Bv0.2Q4

version = 154
initial_prompt = "The algorithm processes data by first analyzing the input and then"
chunk_size = 2
model = Llama4.1-8B-Q6
You will only need to remember encoded_prompt, which will have the data plus the version number encoded in it. So it should eventually work like opening a .docx with MS Word: if it's valid text created by the encoder, it will open up the message; if it's invalid, it will fail or show random characters.
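A minimal sketch of that versioning idea (everything here is hypothetical: the registry name, the assumption of a two-byte big-endian version prefix, and the parameter values are just illustrations of the scheme, not a spec):

```python
# Hypothetical registry: version number -> decoding parameters.
# The decoder reads the version from the first two bytes of the
# encoded data, then looks up everything else it needs.
VERSIONS = {
    96: {
        "initial_prompt": "Once upon a time, in a kingdom far away, there lived a",
        "chunk_size": 3,
        "model": "Mistral7Bv0.2Q4",
    },
    154: {
        "initial_prompt": "The algorithm processes data by first analyzing the input and then",
        "chunk_size": 2,
        "model": "Llama4.1-8B-Q6",
    },
}

def lookup_params(encoded: bytes) -> dict:
    """Read the two-byte version prefix and return the decoding parameters."""
    if len(encoded) < 2:
        raise ValueError("encoded data too short to contain a version prefix")
    version = int.from_bytes(encoded[:2], "big")
    if version not in VERSIONS:
        # Unknown version -> fail, like MS Word refusing an invalid .docx.
        raise ValueError(f"unknown version {version}; cannot decode")
    return VERSIONS[version]
```

So `lookup_params(bytes([0, 96]) + payload)` would hand back the Mistral parameter set, and anything with an unregistered prefix fails loudly instead of decoding to garbage.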
I get that people's instinct is "let me hide my seed phrase in there" because it's the most obvious thing to hide, but it's not really the correct fit (IMO). There are other opportunities that I see opening up after people think about this concept for a week...
reply
You will only need to remember encoded_prompt which will have the data + version number encoded in it.
Are you sure that will be enough? I feel like another detail to remember is the model. Or can any model be used?
Why not a seed phrase, and what are the other opportunities you see on the horizon?
reply
Yes, the models must match, and it's trickier because there are often dozens of quantization levels for every model. But the particular model / quant level expected can come baked into a version code-number (or its own meta-parameter) and be checked rather rigorously by asserting on a hash of the model weights at load time.
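That load-time assertion could look something like this (a sketch only; the function name is made up, and the expected digest is assumed to come from the version registry rather than any real distribution channel):

```python
import hashlib

def assert_model_weights(path: str, expected_sha256: str) -> None:
    """Refuse to proceed if the local weight file doesn't match the
    digest the version number says it should have."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so multi-gigabyte weight files
        # don't need to fit in memory.
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    if h.hexdigest() != expected_sha256:
        raise RuntimeError(
            f"model weights at {path} do not match the expected hash; "
            "decoding with mismatched weights would produce garbage"
        )
```

Run before decoding, a wrong model or a different quant level of the right model fails immediately instead of silently emitting random characters.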
The opportunities are pretty wide in my opinion, but will take years and many researchers to ideate on. For example, take the Spaces Protocol below: the what, how, and why of creating identities or address space on a blockchain took a decade from when the earliest adopters started thinking about blockchain technology.
Basically, if more communication continues to go into, out of, and between LLMs, there's this secondary "data layer" that can be hidden or exposed with ideas like this. There are freedom and open-source concerns too, since the major similar art is Google and the like developing internal systems to watermark their model output: https://deepmind.google/science/synthid/
reply