AI on your own machine

I couldn't find an AI discussion thread here!?
Do you run AI on your own machine, and why, or what do you get out of it?
I'd be interested in an AI that you could feed, say, a 400-page PDF / text file or even an .epub as material, which it would then read into persistent memory, not just the current session.
After that, you could ask it questions about the text.

Does anyone know if something like this already exists? Something that runs on your own machine, not some cloud service with a monthly fee. My machine has plenty of RAM and GPU memory, and the shop sells more anyway :D
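
What you're describing is usually built as retrieval-augmented generation (RAG), where the document is indexed once and relevant passages are retrieved per question. A much cruder starting point is to extract the text and hand it to a local model as plain context; a minimal sketch with llama.cpp, assuming pdftotext (from poppler-utils) is installed, with the model path as an example only:
Code:
# extract plain text from the PDF (pdftotext ships with poppler-utils)
pdftotext book.pdf book.txt

# feed the extracted text to a local model as the prompt file;
# paths are examples, and a 400-page book will not fit in a small context
bin/llama-cli -m /opt/ai/model.gguf -f book.txt -c 8192

Note that this only lasts for the session; the persistent memory you describe is exactly what a RAG index adds on top.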
 
Close enough to the topic that I'll plug the OpenAI Whisper neural network, which runs locally on a PC. Subtitle Edit uses it and conveniently produces (transcribes and translates) subtitles for any movie you throw at it. The biggest flaw is that when translating from Japanese, "you" and "I" always come out the wrong way around, but I'm not complaining. Works well enough. Ukrainian and Russian reportedly work well too.
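
If you want the same thing without Subtitle Edit, the reference openai-whisper package installs a whisper command that transcribes and translates in one go; a sketch, with the model size and file name as examples:
Code:
# transcribe Japanese speech and translate it to English subtitles
# (model size and input file are examples; decoding needs ffmpeg)
whisper movie.mkv --model medium --language Japanese --task translate --output_format srt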
 
This is also worth checking out if you have a suitable machine: NVIDIA ChatRTX. I don't have any great experience with it myself, but I've tried it a bit, and among other things it's suited to exactly the use case you describe.

I have this in use as well, and its ease of installation and use makes it just right for my own small experiments. Bear in mind that it doesn't really handle Finnish, but in my opinion it manages English-language material and prompts surprisingly well. If you already have an RTX 30-series card or better with 8 GB of memory, go try it!
 
I was close to starting a new thread, but it turns out this lively one already existed...
What kind of experiences do people have with running AI models locally, especially when the hardware isn't exactly the best available?

I installed llama.cpp, and after a couple of misdownloads I found the right kind of model to run (llama.cpp wants models in GGUF format). Since then I've found quite a few more.
https://huggingface.co/models looks like a good site for browsing a wide range of models. Are there strong recommendations on which ones are good?
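
For fetching GGUF files, the huggingface-cli tool from the huggingface_hub package saves some clicking; a sketch where the repo and file names are hypothetical examples (check what actually exists on the hub):
Code:
# download a single quantized GGUF into the current directory
# (repo and file names are hypothetical examples)
huggingface-cli download bartowski/Mistral-7B-Instruct-v0.3-GGUF \
    Mistral-7B-Instruct-v0.3-Q4_K_M.gguf --local-dir .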

For testing I mostly picked various models from the French outfit mistral.ai. When expectations aren't set high, even a 2-bit Mistral-7B (apparently old as the hills) produces surprisingly sensible answers.
Code:
~/src/llama.cpp/build$ bin/llama-cli -ngl 24  -cmoe  -m /opt/ai/mistral/7B-Instruct-v0.3/Q2_K-00001-of-00001.gguf --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Q2_K-00001-of-00001.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent GPUs you know about? How they perform in AI applications?

 As of my last training (2021-03-23), here are some of the most recent GPUs and their performance in AI applications:

1. NVIDIA Ampere Series (2020):
   - NVIDIA GeForce RTX 3080: This GPU has 10GB of GDDR6X memory, 10,240 CUDA cores, and 320 Tensor cores. It's designed for high-performance gaming and AI applications.
   - NVIDIA GeForce RTX 3090: This GPU has 24GB of GDDR6X memory, 10,240 CUDA cores, and 320 Tensor cores. It's designed for high-performance gaming, AI research, and data science.
   - NVIDIA A100: This GPU is part of NVIDIA's data center GPU series. It has 40GB of HBM2 memory, 72 Turing-SoCs, and 480 Tensor cores. It's designed for AI research, data science, and high-performance computing.

In terms of AI performance, these GPUs are designed to accelerate AI training and inference. They support NVIDIA's AI and data science acceleration technologies like Tensor Cores, CUDA-X, and NVIDIA's software-defined architecture. These technologies enable faster AI training, faster data science, and better performance in AI applications.

For example, the NVIDIA A100 delivers up to 28 TFLOPS of AI performance, 102 TFLOPS of mixed-precision Turing-SP performance, and 204 TFLOPS of mixed-precision Turing-FP16 performance. This makes it ideal for AI research, data science, and high-performance computing.

The NVIDIA GeForce RTX 3080 and 3090 are also great for AI applications, offering faster AI training and inference compared to previous generations. They're also great for high-performance gaming.

[ Prompt: 75.4 t/s | Generation: 23.0 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =  597 + (2397 =  1836 +     384 +     177) +         906 |
llama_memory_breakdown_print: |   - Host                   |                 2740 =  2596 +     128 +      16                |

Ministral 3, released in December, is better up to date but clearly slower. The limits of this hardware are starting to show.
Code:
~/src/llama.cpp/build$ bin/llama-cli -ngl 15 -m /opt/ai/mistral/Ministral-3-8B-Instruct-2512-Q4_K_M.gguf -c 4096 --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Ministral-3-8B-Instruct-2512-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent consumer GPUs you know about? How they perform in AI applications?

As of my last knowledge update in **October 2023**, here are some of the most recent and high-performance **consumer-grade GPUs** (as of early 2024) and their relevance to **AI applications**, particularly for tasks like deep learning, inference, and training:

---

### **1. NVIDIA GPUs (Dominant in AI)**
NVIDIA continues to lead in AI acceleration with its **Hopper (H100)** architecture (primarily in data center) and **Ada Lovelace (RTX 40-series)** for consumer/enthusiast GPUs.

#### **Latest Consumer GPUs (2023–2024):**
- **RTX 4090**
  - **Architecture**: Ada Lovelace (4th-gen DLSS, improved Tensor Cores).
  - **Key Features**:
    - **24GB GDDR6X** (huge for AI workloads).
    - **76B Tensor Cores** (for mixed-precision AI training/inference).
    - **DLSS 3** (AI upscaling, useful for generative AI pipelines).
    - **PCIe 5.0** (faster data transfer).
  - **AI Performance**:
    - **Blazing fast for inference** (e.g., LLMs, transformers).
    - Supports **FP8/FP16/BF16** for efficient training.
    - **NVENC/H.266** for AI-generated video.
    - **CUDA Cores**: 16,384 (great for parallel workloads).
  - **Use Cases**: LLM fine-tuning, real-time AI rendering, generative AI (Stable Diffusion, etc.).

- **RTX 4080 Super / RTX 4080**
  - **Architecture**: Ada Lovelace (slightly less VRAM than 4090).
  - **Key Features**:
    - **16GB/12GB GDDR6X** (still strong for AI).
    - **DLSS 3**, **Tensor Cores**, and **NVENC**.
  - **AI Performance**:
    - Good for **inference** (e.g., running LLMs like Llama 2 on local machines).
    - Slightly slower than 4090 in training due to less VRAM.

- **RTX 4070 Ti Super**
  - **Architecture**: Ada Lovelace.
  - **Key Features**:
    - **16GB GDDR6X**.
    - **DLSS 3**, **Tensor Cores**, but fewer CUDA cores (~7,168).
  - **AI Performance**:
    - Decent for **lightweight AI tasks** (e.g., running smaller models like Mistral-7B with quantization).

- **RTX 4070**
  - **Architecture**: Ada Lovelace.
  - **Key Features**:
    - **12GB GDDR6X**.
    - Still supports **DLSS 3** and **Tensor Cores**.
  - **AI Performance**:
    - Limited by VRAM for large models (e.g., Llama 70B may require offloading).

#### **NVIDIA’s AI-Specific GPUs (Data Center):**
While not consumer-grade, these are relevant for AI workloads:
- **NVIDIA H100** (Hopper, **80GB HBM3**):
  - **Best for AI training** (FP8 support, **3x faster than A100** in some cases).
  - Used in **AI supercomputers** (e.g., NVIDIA GH200).
- **L40/L40S** (for enterprise AI inference).

---

### **2. AMD GPUs (Growing in AI Support)**
AMD’s **RDNA 3** and **CDNA 3** architectures are improving, but NVIDIA still dominates in AI acceleration.

#### **Latest Consumer GPUs (2023–2024):**
- **RX 7900 XTX / RX 7900 XT**
  - **Architecture**: RDNA 3.
  - **Key Features**:
    - **24GB/20GB GDDR6**.
    - **FSR 3** (AMD’s alternative to DLSS).
    - **No native Tensor Cores** (but supports **ROCm** for AI).
  - **AI Performance**:
    - **Not optimized for AI** like NVIDIA (no dedicated Tensor Cores).
    - Can run **ROCm-based AI frameworks** (e.g., PyTorch, TensorFlow on Linux).
    - Better for **general computing** than pure AI.

- **RX 7800 XT**
  - **Architecture**: RDNA 3.
  - **Key Features**:
    - **16GB GDDR6**.
    - **FSR 3**, but weaker AI capabilities than NVIDIA.

#### **AMD’s AI GPUs (Data Center):**
- **MI300X** (CDNA 3, **128GB HBM3**):
  - **Designed for AI training** (supports FP8, **10x faster than A100** in some benchmarks).
  - Used in **AI cloud providers** (AWS, Azure).

---

### **3. Intel GPUs (Emerging in AI)**
Intel’s **Arc Alchemist** (GPU) and **Gaudi** (AI accelerator) are gaining traction.

#### **Latest Consumer GPU:**
- **Arc A770 / A750**
  - **Architecture**: Alchemist.
  - **Key Features**:
    - **16GB GDDR6**.
    - **XMX (Xe Matrix Extensions)** for AI (but **not as mature as NVIDIA’s Tensor Cores**).
    - **No native AI acceleration** yet (early-stage support).
  - **AI Performance**:
    - **Not recommended for serious AI work** (yet).
    - May improve with **oneAPI** and **ROCm** support.

#### **Intel’s AI Accelerators:**
- **Gaudi 2** (for data centers):
  - **Specialized for AI training** (used by **Meta, Microsoft**).
  - **No consumer version yet**.

---

### **Performance Comparison for AI Workloads**
| GPU               | VRAM  | Tensor Cores | AI Optimization | Best For                          |
|-------------------|-------|--------------|------------------|-----------------------------------|
| **RTX 4090**      | 24GB  | 76B          | Excellent        | LLM training/inference, generative AI |
| **RTX 4080 Super**| 16GB  | 68B          | Very Good        | Inference, smaller models         |
| **RTX 4070 Ti**   | 16GB  | 58B          | Good             | Lightweight AI tasks              |
| **RX 7900 XTX**   | 24GB  | None         | Poor (ROCm)      | General computing, not AI         |
| **Arc A770**      | 16GB  | XMX (early)  | Limited          | Not for AI (yet)                  |
| **NVIDIA H100**   | 80GB  | 160B         | Best (FP8)       | Enterprise AI training             |

---

### **Key Takeaways for AI Applications**
1. **For AI Training/Inference**:
   - **NVIDIA RTX 4090** is the **best consumer GPU** for AI (LLMs, Stable Diffusion, etc.).
   - **RTX 4080 Super** is a good alternative if you need slightly less VRAM.
   - **NVIDIA H100** (data center) is **unmatched** for large-scale training.

2. **For Inference (Running Models)**:
   - **Tensor Cores** (NVIDIA) + **DLSS 3** (for AI-generated content) make RTX 40-series ideal.
   - **FP16/BF16/FP8** support helps with efficiency.

3. **For General AI (Non-NVIDIA)**:
   - **AMD RX 7900 XTX** can run **ROCm-based AI** but is **not optimized** like NVIDIA.
   - **Intel Arc** is **not recommended** for AI yet.

4. **Future Trends**:
   - **NVIDIA’s Blackwell (B100)** (expected late 2024) will push AI performance further.
   - **AMD’s CDNA 4** and **Intel’s future GPUs** may close the gap.

---
### **Recommendations**
- **For serious AI work (training/inference)**: **RTX 4090** (if budget allows) or **RTX 4080 Super**.
- **For lightweight AI (inference only)**: **RTX 4070 Ti** or **RTX 4070**.
- **For general computing + some AI**: **RX 7900 XTX** (but expect slower AI performance).
- **For enterprise AI**: **NVIDIA H100** or **AMD MI300X**.

Would you like benchmarks for specific AI tasks (e.g., running Llama 2, Stable Diffusion)?

[ Prompt: 302.5 t/s | Generation: 9.9 t/s ]

When the model grows to several times the size of the GPU's memory, performance drops off significantly. The results can still be more accurate than with a smaller model, though.
Code:
~/src/llama.cpp/build$ bin/llama-cli -ngl 6 -cmoe -m /opt/ai/mistral/Devstral-Small-2-24B-Instruct-2512-Q4_1.gguf --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Devstral-Small-2-24B-Instruct-2512-Q4_1.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent consumer GPUs you know about? How they perform in AI applications?

As of my last knowledge update in October 2023, the most recent consumer GPUs available were from NVIDIA's RTX 40 series and AMD's RX 7000 series. However, since my knowledge cutoff is 2023-10-01, I don't have information on any GPUs released after that date. For the most up-to-date information, I would recommend checking the latest releases from NVIDIA and AMD.

### Recent Consumer GPUs (as of 2023):
1. **NVIDIA RTX 40 Series**:
- **RTX 4090**: Flagship model with excellent performance in both gaming and AI tasks.
- **RTX 4080**: High-end performance, suitable for AI workloads.
- **RTX 4070**: Mid-range, still capable for AI tasks but with some limitations.
- **RTX 4060**: Entry-level for AI, better suited for lighter tasks.

2. **AMD RX 7000 Series**:
- **RX 7900 XTX**: Competitive with NVIDIA's offerings in raw performance.
- **RX 7900 XT**: Slightly lower performance but still strong.
- **RX 7800 XT**: Mid-range, good for AI but not as powerful as the higher-end models.

### Performance in AI Applications:
- **NVIDIA GPUs** are generally preferred for AI tasks due to their CUDA cores and strong support for AI frameworks like TensorFlow and PyTorch. The RTX 40 series, in particular, offers significant improvements in AI performance thanks to features like Tensor Cores and DLSS (Deep Learning Super Sampling).
- **AMD GPUs** have made strides in AI performance with their ROCm (Radeon Open Compute) platform, but they still lag behind NVIDIA in terms of software support and optimization for AI workloads.

For the most recent GPUs released after October 2023, I would need to use tools to fetch the latest information. Would you like me to do that?

[ Prompt: 104.1 t/s | Generation: 2.9 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =   57 + (2874 =  1987 +      96 +     791) +         968 |
llama_memory_breakdown_print: |   - Host                   |                12750 = 12188 +     544 +      18                |
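
When the model doesn't fit in VRAM, -ngl (the number of layers offloaded to the GPU) is the main knob; the remaining layers run on the CPU out of host memory, which is where the speed goes. A crude sketch for probing the largest value that still loads, with the path and layer counts as examples:
Code:
# try decreasing GPU layer counts until the model loads and generates
# a few tokens without running out of VRAM; keep the first that works
# (-no-cnv forces one-shot completion mode so the loop doesn't hang)
for n in 24 16 8 4; do
    bin/llama-cli -ngl $n -m /opt/ai/model.gguf -p "hello" -n 8 -no-cnv && break
done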

Some models also come in deliberative "Reasoning" versions. It's worth considering whether you want one of those, at least for general use. The outputs tend to be needlessly verbose, and you'll wait a long time for them unless the hardware has enough muscle.
Code:
~/src/llama.cpp/build$ bin/llama-cli -ngl 10 -m /opt/ai/mistral/Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf -c 4096 --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> Please count from 1 to 10

Okay, the user wants me to count from 1 to 10. That's straightforward. I need to list the numbers in order from 1 to 10. I'll start by writing down the numbers one by one, ensuring that I don't skip any or go out of order. I'll double-check to make sure I've included all numbers correctly.

Let me start:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

I think that's correct. I'll format this in Markdown to make it clear and easy to read. I'll also ensure that the response is concise and to the point, as the user only asked for a simple count. I won't add any unnecessary explanations or fluff, but I'll make sure it's clear and accurate.

Here's how I'll phrase it:

"1, 2, 3, 4, 5, 6, 7, 8, 9, 10."

But perhaps it's better to list them on separate lines for clarity, especially if the user wants to see each number distinctly. I'll try that:

1
2
3
4
5
6
7
8
9
10

Yes, that looks better. It's more readable, especially if the user wants to see each number on its own line. I'll go with that format. I'll also make sure that the response is in a code block to maintain the formatting, but since it's a simple list, it might not be necessary. However, using a code block can sometimes make it clearer that this is the exact output.

But since it's a simple count, perhaps just listing them with line breaks is sufficient. I'll proceed with that. I'll also double-check that I haven't missed anything or made any mistakes in the sequence.

Let me verify the sequence:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

Yes, that's correct. I'll write it out with line breaks now.1
2
3
4
5
6
7
8
9
10

[ Prompt: 127.6 t/s | Generation: 5.8 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =  233 + (2736 =  1785 +     160 +     791) +         931 |
llama_memory_breakdown_print: |   - Host                   |                 6562 =  6064 +     480 +      18                |

Does the panel have views on how heavily quantized model data is worth running at home? In these tests the data was quantized to 4 bits, except for the first, which was 2-bit.
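
If you'd rather produce the quantizations yourself than download ready-made ones, llama.cpp ships a llama-quantize tool that converts a full-precision GGUF to a chosen scheme; a sketch, with file names as examples:
Code:
# requantize a full-precision GGUF down to 4-bit K-quants
# (input and output file names are examples)
bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M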

The command line is a rather bare-bones user interface. Any good GUI-based suggestions for this?
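
One option that needs nothing beyond llama.cpp itself: the llama-server binary exposes an OpenAI-compatible API and a built-in web chat UI (by default at http://localhost:8080); a sketch reusing the model path from above:
Code:
# start a local server and chat with the model in the browser
bin/llama-server -m /opt/ai/mistral/7B-Instruct-v0.3/Q2_K-00001-of-00001.gguf -ngl 24 -c 4096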
 
