AI on your own machine

I couldn't find an AI discussion thread here!?
Do you run AI on your own machine, and why, or what do you get out of it?
I'd be interested in an AI that you could feed, say, a 400-page PDF / text file or even an .epub as material, which it would then read into persistent memory, not just the current session.
After that, you could ask it questions about the text.

Does anyone know if something like this already exists? Something that runs on your own machine, not some cloud service with a monthly fee. My machine has plenty of RAM and GPU memory, and the shop sells more anyway :D
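
What you're describing is usually built as retrieval-augmented generation (RAG), where the document is indexed once and relevant passages are retrieved per question. A much cruder starting point is to extract the text and hand it to a local model as plain context; a minimal sketch with llama.cpp, assuming pdftotext (from poppler-utils) is installed, with the model path as an example only:
Code:
# extract plain text from the PDF (pdftotext ships with poppler-utils)
pdftotext book.pdf book.txt

# feed the extracted text to a local model as the prompt file;
# paths are examples, and a 400-page book will not fit in a small context
bin/llama-cli -m /opt/ai/model.gguf -f book.txt -c 8192

Note that this only lasts for the session; the persistent memory you describe is exactly what a RAG index adds on top.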
 
Close enough to the topic that I'll plug the OpenAI Whisper neural network, which runs locally on a PC. Subtitle Edit uses it and conveniently produces (transcribes and translates) subtitles for any movie you throw at it. The biggest flaw is that when translating from Japanese, "you" and "I" always come out the wrong way around, but I'm not complaining. Works well enough. Ukrainian and Russian reportedly work well too.
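
If you want the same thing without Subtitle Edit, the reference openai-whisper package installs a whisper command that transcribes and translates in one go; a sketch, with the model size and file name as examples:
Code:
# transcribe Japanese speech and translate it to English subtitles
# (model size and input file are examples; decoding needs ffmpeg)
whisper movie.mkv --model medium --language Japanese --task translate --output_format srt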
 
This is also worth checking out if you have a suitable machine: NVIDIA ChatRTX. I don't have any great experience with it myself, but I've tried it a bit, and among other things it's suited to exactly the use case you describe.

I have this in use as well, and its ease of installation and use makes it just right for my own small experiments. Bear in mind that it doesn't really handle Finnish, but in my opinion it manages English-language material and prompts surprisingly well. If you already have an RTX 30-series card or better with 8 GB of memory, go try it!
 
I was close to starting a new thread, but it turns out this lively one already existed...
What kind of experiences do people have with running AI models locally, especially when the hardware isn't exactly the best available?

I installed llama.cpp, and after a couple of misdownloads I found the right kind of model to run (llama.cpp wants models in GGUF format). Since then I've found quite a few more.
https://huggingface.co/models looks like a good site for browsing a wide range of models. Are there strong recommendations on which ones are good?
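
For fetching GGUF files, the huggingface-cli tool from the huggingface_hub package saves some clicking; a sketch where the repo and file names are hypothetical examples (check what actually exists on the hub):
Code:
# download a single quantized GGUF into the current directory
# (repo and file names are hypothetical examples)
huggingface-cli download bartowski/Mistral-7B-Instruct-v0.3-GGUF \
    Mistral-7B-Instruct-v0.3-Q4_K_M.gguf --local-dir .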

For testing I mostly picked various models from the French outfit mistral.ai. When expectations aren't set high, even a 2-bit Mistral-7B (apparently old as the hills) produces surprisingly sensible answers.
Code:
~/src/llama.cpp/build$ bin/llama-cli -ngl 24  -cmoe  -m /opt/ai/mistral/7B-Instruct-v0.3/Q2_K-00001-of-00001.gguf --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Q2_K-00001-of-00001.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent GPUs you know about? How they perform in AI applications?

 As of my last training (2021-03-23), here are some of the most recent GPUs and their performance in AI applications:

1. NVIDIA Ampere Series (2020):
   - NVIDIA GeForce RTX 3080: This GPU has 10GB of GDDR6X memory, 10,240 CUDA cores, and 320 Tensor cores. It's designed for high-performance gaming and AI applications.
   - NVIDIA GeForce RTX 3090: This GPU has 24GB of GDDR6X memory, 10,240 CUDA cores, and 320 Tensor cores. It's designed for high-performance gaming, AI research, and data science.
   - NVIDIA A100: This GPU is part of NVIDIA's data center GPU series. It has 40GB of HBM2 memory, 72 Turing-SoCs, and 480 Tensor cores. It's designed for AI research, data science, and high-performance computing.

In terms of AI performance, these GPUs are designed to accelerate AI training and inference. They support NVIDIA's AI and data science acceleration technologies like Tensor Cores, CUDA-X, and NVIDIA's software-defined architecture. These technologies enable faster AI training, faster data science, and better performance in AI applications.

For example, the NVIDIA A100 delivers up to 28 TFLOPS of AI performance, 102 TFLOPS of mixed-precision Turing-SP performance, and 204 TFLOPS of mixed-precision Turing-FP16 performance. This makes it ideal for AI research, data science, and high-performance computing.

The NVIDIA GeForce RTX 3080 and 3090 are also great for AI applications, offering faster AI training and inference compared to previous generations. They're also great for high-performance gaming.

[ Prompt: 75.4 t/s | Generation: 23.0 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =  597 + (2397 =  1836 +     384 +     177) +         906 |
llama_memory_breakdown_print: |   - Host                   |                 2740 =  2596 +     128 +      16                |

Ministral 3, released in December, is better up to date but clearly slower. The limits of this hardware are starting to show.
Code:
~/src/llama.cpp/build$ bin/llama-cli -ngl 15 -m /opt/ai/mistral/Ministral-3-8B-Instruct-2512-Q4_K_M.gguf -c 4096 --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Ministral-3-8B-Instruct-2512-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent consumer GPUs you know about? How they perform in AI applications?

As of my last knowledge update in **October 2023**, here are some of the most recent and high-performance **consumer-grade GPUs** (as of early 2024) and their relevance to **AI applications**, particularly for tasks like deep learning, inference, and training:

---

### **1. NVIDIA GPUs (Dominant in AI)**
NVIDIA continues to lead in AI acceleration with its **Hopper (H100)** architecture (primarily in data center) and **Ada Lovelace (RTX 40-series)** for consumer/enthusiast GPUs.

#### **Latest Consumer GPUs (2023–2024):**
- **RTX 4090**
  - **Architecture**: Ada Lovelace (4th-gen DLSS, improved Tensor Cores).
  - **Key Features**:
    - **24GB GDDR6X** (huge for AI workloads).
    - **76B Tensor Cores** (for mixed-precision AI training/inference).
    - **DLSS 3** (AI upscaling, useful for generative AI pipelines).
    - **PCIe 5.0** (faster data transfer).
  - **AI Performance**:
    - **Blazing fast for inference** (e.g., LLMs, transformers).
    - Supports **FP8/FP16/BF16** for efficient training.
    - **NVENC/H.266** for AI-generated video.
    - **CUDA Cores**: 16,384 (great for parallel workloads).
  - **Use Cases**: LLM fine-tuning, real-time AI rendering, generative AI (Stable Diffusion, etc.).

- **RTX 4080 Super / RTX 4080**
  - **Architecture**: Ada Lovelace (slightly less VRAM than 4090).
  - **Key Features**:
    - **16GB/12GB GDDR6X** (still strong for AI).
    - **DLSS 3**, **Tensor Cores**, and **NVENC**.
  - **AI Performance**:
    - Good for **inference** (e.g., running LLMs like Llama 2 on local machines).
    - Slightly slower than 4090 in training due to less VRAM.

- **RTX 4070 Ti Super**
  - **Architecture**: Ada Lovelace.
  - **Key Features**:
    - **16GB GDDR6X**.
    - **DLSS 3**, **Tensor Cores**, but fewer CUDA cores (~7,168).
  - **AI Performance**:
    - Decent for **lightweight AI tasks** (e.g., running smaller models like Mistral-7B with quantization).

- **RTX 4070**
  - **Architecture**: Ada Lovelace.
  - **Key Features**:
    - **12GB GDDR6X**.
    - Still supports **DLSS 3** and **Tensor Cores**.
  - **AI Performance**:
    - Limited by VRAM for large models (e.g., Llama 70B may require offloading).

#### **NVIDIA’s AI-Specific GPUs (Data Center):**
While not consumer-grade, these are relevant for AI workloads:
- **NVIDIA H100** (Hopper, **80GB HBM3**):
  - **Best for AI training** (FP8 support, **3x faster than A100** in some cases).
  - Used in **AI supercomputers** (e.g., NVIDIA GH200).
- **L40/L40S** (for enterprise AI inference).

---

### **2. AMD GPUs (Growing in AI Support)**
AMD’s **RDNA 3** and **CDNA 3** architectures are improving, but NVIDIA still dominates in AI acceleration.

#### **Latest Consumer GPUs (2023–2024):**
- **RX 7900 XTX / RX 7900 XT**
  - **Architecture**: RDNA 3.
  - **Key Features**:
    - **24GB/20GB GDDR6**.
    - **FSR 3** (AMD’s alternative to DLSS).
    - **No native Tensor Cores** (but supports **ROCm** for AI).
  - **AI Performance**:
    - **Not optimized for AI** like NVIDIA (no dedicated Tensor Cores).
    - Can run **ROCm-based AI frameworks** (e.g., PyTorch, TensorFlow on Linux).
    - Better for **general computing** than pure AI.

- **RX 7800 XT**
  - **Architecture**: RDNA 3.
  - **Key Features**:
    - **16GB GDDR6**.
    - **FSR 3**, but weaker AI capabilities than NVIDIA.

#### **AMD’s AI GPUs (Data Center):**
- **MI300X** (CDNA 3, **128GB HBM3**):
  - **Designed for AI training** (supports FP8, **10x faster than A100** in some benchmarks).
  - Used in **AI cloud providers** (AWS, Azure).

---

### **3. Intel GPUs (Emerging in AI)**
Intel’s **Arc Alchemist** (GPU) and **Gaudi** (AI accelerator) are gaining traction.

#### **Latest Consumer GPU:**
- **Arc A770 / A750**
  - **Architecture**: Alchemist.
  - **Key Features**:
    - **16GB GDDR6**.
    - **XMX (Xe Matrix Extensions)** for AI (but **not as mature as NVIDIA’s Tensor Cores**).
    - **No native AI acceleration** yet (early-stage support).
  - **AI Performance**:
    - **Not recommended for serious AI work** (yet).
    - May improve with **oneAPI** and **ROCm** support.

#### **Intel’s AI Accelerators:**
- **Gaudi 2** (for data centers):
  - **Specialized for AI training** (used by **Meta, Microsoft**).
  - **No consumer version yet**.

---

### **Performance Comparison for AI Workloads**
| GPU               | VRAM  | Tensor Cores | AI Optimization | Best For                          |
|-------------------|-------|--------------|------------------|-----------------------------------|
| **RTX 4090**      | 24GB  | 76B          | Excellent        | LLM training/inference, generative AI |
| **RTX 4080 Super**| 16GB  | 68B          | Very Good        | Inference, smaller models         |
| **RTX 4070 Ti**   | 16GB  | 58B          | Good             | Lightweight AI tasks              |
| **RX 7900 XTX**   | 24GB  | None         | Poor (ROCm)      | General computing, not AI         |
| **Arc A770**      | 16GB  | XMX (early)  | Limited          | Not for AI (yet)                  |
| **NVIDIA H100**   | 80GB  | 160B         | Best (FP8)       | Enterprise AI training             |

---

### **Key Takeaways for AI Applications**
1. **For AI Training/Inference**:
   - **NVIDIA RTX 4090** is the **best consumer GPU** for AI (LLMs, Stable Diffusion, etc.).
   - **RTX 4080 Super** is a good alternative if you need slightly less VRAM.
   - **NVIDIA H100** (data center) is **unmatched** for large-scale training.

2. **For Inference (Running Models)**:
   - **Tensor Cores** (NVIDIA) + **DLSS 3** (for AI-generated content) make RTX 40-series ideal.
   - **FP16/BF16/FP8** support helps with efficiency.

3. **For General AI (Non-NVIDIA)**:
   - **AMD RX 7900 XTX** can run **ROCm-based AI** but is **not optimized** like NVIDIA.
   - **Intel Arc** is **not recommended** for AI yet.

4. **Future Trends**:
   - **NVIDIA’s Blackwell (B100)** (expected late 2024) will push AI performance further.
   - **AMD’s CDNA 4** and **Intel’s future GPUs** may close the gap.

---
### **Recommendations**
- **For serious AI work (training/inference)**: **RTX 4090** (if budget allows) or **RTX 4080 Super**.
- **For lightweight AI (inference only)**: **RTX 4070 Ti** or **RTX 4070**.
- **For general computing + some AI**: **RX 7900 XTX** (but expect slower AI performance).
- **For enterprise AI**: **NVIDIA H100** or **AMD MI300X**.

Would you like benchmarks for specific AI tasks (e.g., running Llama 2, Stable Diffusion)?

[ Prompt: 302.5 t/s | Generation: 9.9 t/s ]

When the model grows to several times the size of the GPU's memory, performance drops off significantly. The results can still be more accurate than with a smaller model, though.
Code:
~/src/llama.cpp/build$ bin/llama-cli -ngl 6 -cmoe -m /opt/ai/mistral/Devstral-Small-2-24B-Instruct-2512-Q4_1.gguf --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Devstral-Small-2-24B-Instruct-2512-Q4_1.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent consumer GPUs you know about? How they perform in AI applications?

As of my last knowledge update in October 2023, the most recent consumer GPUs available were from NVIDIA's RTX 40 series and AMD's RX 7000 series. However, since my knowledge cutoff is 2023-10-01, I don't have information on any GPUs released after that date. For the most up-to-date information, I would recommend checking the latest releases from NVIDIA and AMD.

### Recent Consumer GPUs (as of 2023):
1. **NVIDIA RTX 40 Series**:
- **RTX 4090**: Flagship model with excellent performance in both gaming and AI tasks.
- **RTX 4080**: High-end performance, suitable for AI workloads.
- **RTX 4070**: Mid-range, still capable for AI tasks but with some limitations.
- **RTX 4060**: Entry-level for AI, better suited for lighter tasks.

2. **AMD RX 7000 Series**:
- **RX 7900 XTX**: Competitive with NVIDIA's offerings in raw performance.
- **RX 7900 XT**: Slightly lower performance but still strong.
- **RX 7800 XT**: Mid-range, good for AI but not as powerful as the higher-end models.

### Performance in AI Applications:
- **NVIDIA GPUs** are generally preferred for AI tasks due to their CUDA cores and strong support for AI frameworks like TensorFlow and PyTorch. The RTX 40 series, in particular, offers significant improvements in AI performance thanks to features like Tensor Cores and DLSS (Deep Learning Super Sampling).
- **AMD GPUs** have made strides in AI performance with their ROCm (Radeon Open Compute) platform, but they still lag behind NVIDIA in terms of software support and optimization for AI workloads.

For the most recent GPUs released after October 2023, I would need to use tools to fetch the latest information. Would you like me to do that?

[ Prompt: 104.1 t/s | Generation: 2.9 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =   57 + (2874 =  1987 +      96 +     791) +         968 |
llama_memory_breakdown_print: |   - Host                   |                12750 = 12188 +     544 +      18                |
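
When the model doesn't fit in VRAM, -ngl (the number of layers offloaded to the GPU) is the main knob; the remaining layers run on the CPU out of host memory, which is where the speed goes. A crude sketch for probing the largest value that still loads, with the path and layer counts as examples:
Code:
# try decreasing GPU layer counts until the model loads and generates
# a few tokens without running out of VRAM; keep the first that works
# (-no-cnv forces one-shot completion mode so the loop doesn't hang)
for n in 24 16 8 4; do
    bin/llama-cli -ngl $n -m /opt/ai/model.gguf -p "hello" -n 8 -no-cnv && break
done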

Some models also come in deliberative "Reasoning" versions. It's worth considering whether you want one of those, at least for general use. The outputs tend to be needlessly verbose, and you'll wait a long time for them unless the hardware has enough muscle.
Code:
~/src/llama.cpp/build$ bin/llama-cli -ngl 10 -m /opt/ai/mistral/Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf -c 4096 --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> Please count from 1 to 10

Okay, the user wants me to count from 1 to 10. That's straightforward. I need to list the numbers in order from 1 to 10. I'll start by writing down the numbers one by one, ensuring that I don't skip any or go out of order. I'll double-check to make sure I've included all numbers correctly.

Let me start:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

I think that's correct. I'll format this in Markdown to make it clear and easy to read. I'll also ensure that the response is concise and to the point, as the user only asked for a simple count. I won't add any unnecessary explanations or fluff, but I'll make sure it's clear and accurate.

Here's how I'll phrase it:

"1, 2, 3, 4, 5, 6, 7, 8, 9, 10."

But perhaps it's better to list them on separate lines for clarity, especially if the user wants to see each number distinctly. I'll try that:

1
2
3
4
5
6
7
8
9
10

Yes, that looks better. It's more readable, especially if the user wants to see each number on its own line. I'll go with that format. I'll also make sure that the response is in a code block to maintain the formatting, but since it's a simple list, it might not be necessary. However, using a code block can sometimes make it clearer that this is the exact output.

But since it's a simple count, perhaps just listing them with line breaks is sufficient. I'll proceed with that. I'll also double-check that I haven't missed anything or made any mistakes in the sequence.

Let me verify the sequence:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

Yes, that's correct. I'll write it out with line breaks now.1
2
3
4
5
6
7
8
9
10

[ Prompt: 127.6 t/s | Generation: 5.8 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =  233 + (2736 =  1785 +     160 +     791) +         931 |
llama_memory_breakdown_print: |   - Host                   |                 6562 =  6064 +     480 +      18                |

Does the panel have views on how heavily quantized model data is worth running at home? In these tests the data was quantized to 4 bits, except for the first, which was 2-bit.
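
If you'd rather produce the quantizations yourself than download ready-made ones, llama.cpp ships a llama-quantize tool that converts a full-precision GGUF to a chosen scheme; a sketch, with file names as examples:
Code:
# requantize a full-precision GGUF down to 4-bit K-quants
# (input and output file names are examples)
bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M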

The command line is a rather bare-bones user interface. Any good GUI-based suggestions for this?
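
One option that needs nothing beyond llama.cpp itself: the llama-server binary exposes an OpenAI-compatible API and a built-in web chat UI (by default at http://localhost:8080); a sketch reusing the model path from above:
Code:
# start a local server and chat with the model in the browser
bin/llama-server -m /opt/ai/mistral/7B-Instruct-v0.3/Q2_K-00001-of-00001.gguf -ngl 24 -c 4096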
 
