
It sounds like what's going on is that the base versions are trained with a bunch of safeguards to ensure they don't say crazy things, but when you fine-tune the model some of those safeguards may go away
It's kinda hard to say since they don't tell us exactly what they fine-tuned the model with.
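To make that concrete, here's a minimal sketch of what plain supervised fine-tuning looks like, assuming a Hugging Face-style setup (the gpt2 model name and the corpus below are placeholders, since we don't know what they actually tuned with): the loop just minimizes next-token loss on whatever new text is supplied, and nothing in it explicitly preserves the base model's safety behaviour.

```python
# Hypothetical sketch: supervised fine-tuning only continues gradient descent on new
# text. Nothing in this loop "knows about" or protects the base model's safety
# training -- if the new data pulls the weights elsewhere, the old behaviour fades.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for whatever base model got fine-tuned
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

fine_tune_texts = ["whatever custom corpus the tuner supplies"]  # placeholder data

model.train()
for text in fine_tune_texts:
    batch = tok(text, return_tensors="pt")
    # labels == input_ids: plain next-token prediction on the new corpus
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```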
140 sats \ 11 replies \ @freetx 15h
Well, yes autocorrect++ is going to pattern match on what training data says....
This entire article could be titled: We don't understand how LLMs work
As I've said ad nauseam, the real problem (as kinda highlighted here in this article) is people keep attributing "intent" to autocorrect++ and there is none. It's just dumb pattern matching....
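For what "autocorrect++" means mechanically, here's a rough illustration (gpt2 is used purely as a small, convenient stand-in): generation is just repeatedly scoring which token is most likely to come next and appending it; there is no intent, plan, or opinion anywhere in the loop.

```python
# Minimal illustration of "autocorrect++": the model only scores which token is
# likely to come next given the prefix, and greedy decoding just takes the top one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(5):
        logits = model(ids).logits[:, -1, :]           # scores for the next token only
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy: take the likeliest one
        ids = torch.cat([ids, next_id], dim=-1)

print(tok.decode(ids[0]))
```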
reply
40 sats \ 4 replies \ @optimism 15h
I was writing "garbage in, garbage out". But you beat me to it.
Thought experiment:
Neuralink gets real and you can now indicate what you want in case of total cognitive failure:
  1. be a plant
  2. be able to express everything every human ever expressed, with the caveat that you cannot control what comes out
What's it gonna be?
reply
110 sats \ 3 replies \ @freetx 15h
What's it gonna be?
Would just choose to die. To those of us who believe the soul/"consciousness spirit" is immortal, death isn't that terrible a prospect.
However, I can understand why many materialist tech-bros are so panicked about wanting to develop autocorrect++ so it offers them some feeble glimpse of immortality.
reply
50 sats \ 2 replies \ @optimism 15h
Me too, but that's a controversial choice in some circles, so if I don't get that particular choice I'd choose plant.
reply
110 sats \ 1 reply \ @freetx 15h
Technically wouldn't both choices be "plant"?
Or are we saying your brain works fine, it's just that you can't control your speech?
reply
25 sats \ 0 replies \ @optimism 14h
Technically wouldn't both choices be "plant"?
Exactly! But how do we create awareness for that?!? All this popular chatbot usage (best buddies, waifus, constructs of worship) is like using a chainsaw as a pillow and then proudly showing off how it cut a large chunk out of your cheek and now your teeth show through... it's a very bad use-case.
Or are we saying your brain works fine, it's just that you can't control your speech?
I'd say no, technically a plant in both cases.
Just to play devil's advocate, who's to say our human brains aren't just really good pattern matchers?
I think the concern is how easily an AI, if it were ever unleashed as a fully autonomous agent, could spiral into dangerous ways of thinking, aka pattern matching.
reply
140 sats \ 4 replies \ @freetx 15h
who's to say our human brains aren't just really good pattern matchers?
Our language faculties may in fact work like an LLM. However, that doesn't equal "consciousness".
reply
I agree, but that's why I said I'm just playing devil's advocate. I think the concern is that the AI will actually take action based on its potentially messed up pattern matching. (Like when it envisions a future without Jews, etc.)
reply
10 sats \ 2 replies \ @freetx 15h
LLMs by themselves have no intelligence. Human minds had to first generate the patterns that the LLMs are trained against... as far as the LLM is concerned, these patterns could be the order of raindrops dripping off a roof... there is no "thinking" or "pondering".
My point is those objectionable thought patterns already exist in the world which is why LLMs are able to match against them.
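A toy version of that point, with no ML library at all: the "model" below just counts which symbol tends to follow which and samples from those counts, and the algorithm is identical whether the symbols are words or raindrop timings; it never knows or cares what they mean.

```python
# Toy pattern matcher: count bigram frequencies, then continue a sequence by
# sampling from the counts. The content of the symbols is irrelevant to it.
import random
from collections import Counter, defaultdict

def train_bigrams(sequence):
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts

def continue_sequence(counts, start, length=10):
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        symbols, weights = zip(*followers.items())
        out.append(random.choices(symbols, weights=weights)[0])
    return out

# "Training data" could be words, raindrop timings, anything orderable:
drops = ["drip", "drop", "drip", "drip", "drop", "drip", "drop", "drop"]
model = train_bigrams(drops)
print(continue_sequence(model, "drip"))
```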
reply
I know, but those thoughts tend to be held by a minority of people with (usually) very little power to enact change. But if those thoughts were held by an AI agent that is extremely knowledgeable, skilled in multiple domains, and also able to take unilateral action, it could lead to scary results.
I'm not saying I believe any of this will happen, I'm just trying to see it from the perspective of the author.