Ha, you are right, it was listed as additional option, I missed that. Thanks for pointing that out, here we go, pretty close to yours... Wow it is much better...

![](https://m.stacker.news/84182)

Try logging out and back in, and make sure 4o is selected so that it says this:

![](https://m.stacker.news/84142)

not sure, this is what I got with the same prompt....

![](https://m.stacker.news/84136)

JesseJames

How about this one 😂
![](https://m.stacker.news/84130)

The perfect angle of the bridge in the background is the giveaway 

WeAreAllSatoshi

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

Introducing OpenAI's 4o Image Generation

carter

That's pretty wild.

**Image**:  ![](https://m.stacker.news/84107)

**Prompt**: 
```
A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"
```