here's what I do to quickly test a `safetensors` model through hf/torch (this one is huge though, so you need proper hardware):

**env, prereqs and model download:**

```bash
uv venv
. .venv/bin/activate
uv pip install torch transformers accelerate
# optional: hf auth login
hf download "<org/repo>" # e.g. "google/gemma-3-270m-it"
```

**example usage:**

```python
import torch
from transformers import pipeline

model_name = "google/gemma-3-270m-it" # org/repo format as used in hf download

chat = [
  {"role": "system", "content": "You're a helpful assistant."},
  {"role": "user", "content": "Explain consciousness in simple, concise terms."},
]

# named `pipe` so it doesn't shadow the imported `pipeline` function
pipe = pipeline(task="text-generation", model=model_name, device_map="auto")
response = pipe(chat, max_new_tokens=512)
print(response[0]["generated_text"][-1]["content"])
```
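
The indexing in the final `print` relies on the shape of what the pipeline returns for chat input: a list with one entry per generated sequence, where `generated_text` holds the whole conversation with the new assistant turn appended at the end. A minimal sketch of that shape, using a hand-written stand-in instead of real model output (the `last_reply` helper and the `fake_response` data are hypothetical, only there for illustration):

```python
def last_reply(response):
    """Return the content of the final message in the first generated sequence."""
    messages = response[0]["generated_text"]  # full chat, including the new turn
    return messages[-1]["content"]


# Hand-written stand-in mimicking the structure returned for chat input.
# This is NOT real model output.
fake_response = [
    {
        "generated_text": [
            {"role": "system", "content": "You're a helpful assistant."},
            {"role": "user", "content": "Say hi."},
            {"role": "assistant", "content": "Hi!"},
        ]
    }
]

print(last_reply(fake_response))
```

So `[0]` picks the first generated sequence, `[-1]` picks the last message in it (the assistant's answer), and `["content"]` is the text itself.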

**example output (`uv run test.py`):**

```shell
% uv run test.py
Consciousness is the state of being aware of yourself and your surroundings. 
It's like having a personal identity and internal world.
```

I really need to stop being so lazy and figure out how to run these myself instead of just waiting for ollama to support them.