我正在尝试通过 @stablequan 为 nanoLLaVA 制作渐变演示。我仅移植 Moondream 存储库中 Apache 2.0 许可代码的结构。
nanoLLaVA 存储库中有示例代码,我用它来制作 this 脚本。这有效并提供了合理的输出。 当我在 gradio here 中使用相同的代码时,我收到有关设备不匹配的错误。
Traceback (most recent call last):
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\blocks.py", line 1561, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\gradio\utils.py", line 678, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\Downloads\llm\nanollava\nanollava_gradio_demo.py", line 46, in answer_question
output_ids = model.generate(
^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 1575, in generate
result = self._sample(
^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 2697, in _sample
outputs = self(
^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\.cache\huggingface\modules\transformers_modules\qnguyen3\nanoLLaVA\4a1bd2e2854c6df9c4af831a408b14f7b035f4c0\modeling_llava_qwen2.py", line 2267, in forward
) = self.prepare_inputs_labels_for_multimodal(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\.cache\huggingface\modules\transformers_modules\qnguyen3\nanoLLaVA\4a1bd2e2854c6df9c4af831a408b14f7b035f4c0\modeling_llava_qwen2.py", line 687, in prepare_inputs_labels_for_multimodal
image_features = self.encode_images(images).to(self.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\.cache\huggingface\modules\transformers_modules\qnguyen3\nanoLLaVA\4a1bd2e2854c6df9c4af831a408b14f7b035f4c0\modeling_llava_qwen2.py", line 661, in encode_images
image_features = self.get_model().mm_projector(image_features)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\container.py", line 217, in forward
input = module(input)
^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\linear.py", line 116, in forward
return F.linear(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
现在,如果您以前见过此内容,并通读错误消息,您可能可以立即告诉我,“哦,显然它在不同的线程上运行,并且
set_default_device()
不会延续到该线程”。这里的issue与此相关。我不确定该修复适用于哪个版本。但无论哪种方式,由于默认是“cpu”,如果一切都在 cpu 上,应该没问题吧?
因此,我修改了
AutoModelForCausalLM
调用以使用 device_map='cpu'
,并在执行 .device
之前打印了输入和模型的 model.generate
:
<|im_start|>system
Answer the questions.<|im_end|><|im_start|>user
<image>
What do you see?<|im_end|><|im_start|>assistant
cpu
cpu
cpu
因此,您需要做的是找出一种在该线程上执行 set_default_device 的方法,或者如here所示,您可以尝试 set_default_tensor_type,这很有效!!
# set device
torch.set_default_device('cuda') # or 'cpu'
torch.set_default_tensor_type('torch.cuda.FloatTensor')