Llama 2 local AI uses the CPU instead of the GPU - i5 10th gen, RTX 3060 Ti, 48GB RAM

Question

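My setup has an Intel i5 10th-generation processor, an NVIDIA RTX 3060 Ti GPU, and 48GB of RAM running at 3200MHz, on Windows 11. I recently downloaded the Llama 2 model from TheBloke, but the AI appears to be using my CPU rather than my GPU.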

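Is there a configuration or setting I need to change so that Llama 2 local AI uses my GPU for processing instead of my CPU? I want to take full advantage of the GPU for better performance.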

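Some of the answers the AI gives are out of place, as if it were going crazy, and sometimes the AI cuts itself off in the middle of an answer.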

Any guidance or troubleshooting steps would be greatly appreciated. Thanks in advance!

import speech_recognition as sr
import pyttsx3
import warnings
from llama_cpp import Llama

warnings.filterwarnings("ignore")

# Initialize the speech recognition recognizer
recognizer = sr.Recognizer()

# Initialize the text-to-speech engine
engine = pyttsx3.init()

# Initialize the wake-up word and sleep mode word
wake_up_word = "assistant"
sleep_mode_word = "mute"

# Load the large language model file
LLM = Llama(model_path=r"D:\VoiceAssisant\llama-2-13b-chat.Q6_K.gguf", f16_kv=True)

# Create a function to listen for the wake-up word and start listening for user input
def listen():
    asleep = True  # Initialize sleep mode state

    while True:
        print("Listening for wake up word...")
        with sr.Microphone() as source:
            audio = recognizer.listen(source)
            try:
                text = recognizer.recognize_google(audio)
                if text.lower() == wake_up_word:
                    print("Listening for query...")
                    asleep = False  # Exit sleep mode
                    while True:
                        with sr.Microphone() as source:
                            audio = recognizer.listen(source)
                            try:
                                input_text = recognizer.recognize_google(audio)
                                if input_text.lower() == sleep_mode_word:
                                    print("Sleep mode activated...")
                                    asleep = True  # Enter sleep mode
                                    break
                                else:
                                    prompt = "Q: " + input_text + " A:"
                                    output = LLM(prompt, max_tokens=256, stop=["Q:", "\n"], echo=True)
                                    response_text = output["choices"][0]["text"]
                                    print(response_text)
                                    engine.say(response_text)
                                    engine.runAndWait()
                            except sr.UnknownValueError:
                                pass
                elif text.lower() == sleep_mode_word:
                    print("Sleep mode activated...")
                    asleep = True  # Enter sleep mode
            except sr.UnknownValueError:
                pass

# Start listening for the wake-up word
listen()

I tried installing some Nvidia programs, but the ones I wanted to install seem to be available only for Linux.

Answer 1

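Most Nvidia 3060 Ti GPUs have only 8 GB of VRAM. The model you chose, "llama-2-13b-chat.Q6_K.gguf", is 10.68 GB in size with a maximum RAM requirement of 13.18 GB, so it does not fit in your GPU's VRAM. Try a smaller model such as "llama-2-13b-chat.Q2_K.gguf", which is 5.43 GB in size with a maximum RAM requirement of 7.93 GB.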
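
Model size is only part of it. In llama-cpp-python the `n_gpu_layers` argument of `Llama` defaults to 0, so nothing is offloaded to the GPU unless you set it, and it only has an effect if the package was built with CUDA support (how to enable that depends on the llama-cpp-python version, so check its install notes). Below is a rough sketch of how you might pick a value. The helper names `estimate_gpu_layers` and `load_model`, the 1.5 GB reserve, the layer count of 40 for the 13B model, and the assumption that all layers are the same size are mine for illustration, not something the library provides.

```python
import math


def estimate_gpu_layers(model_size_gb, n_layers, vram_gb, reserve_gb=1.5):
    """Rough guess at how many model layers fit in VRAM.

    Assumptions (hypothetical, for illustration only): every layer takes
    the same share of the file size, and reserve_gb of VRAM is left free
    for the KV cache, scratch buffers, and the desktop.
    """
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


def load_model(model_path, n_gpu_layers):
    """Load a GGUF model and offload n_gpu_layers layers to the GPU."""
    # Imported lazily so the estimate above works without llama-cpp-python.
    from llama_cpp import Llama

    return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers)


# Sizes quoted above, an 8 GB card, and an assumed 40 layers for the 13B model.
print(estimate_gpu_layers(10.68, 40, 8.0))  # Q6_K -> 24 (partial offload)
print(estimate_gpu_layers(5.43, 40, 8.0))   # Q2_K -> 40 (all layers)
```

With these numbers you could then try something like `LLM = load_model(r"D:\VoiceAssisant\llama-2-13b-chat.Q6_K.gguf", n_gpu_layers=24)` and lower the value if you run out of VRAM. Passing `n_gpu_layers=-1` asks for every layer to be offloaded, which should only work for a model that fits entirely in VRAM.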
