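I am very new to LLM serving and quantization, so any pointers would be greatly appreciated. I am trying to quantize my model using AutoAWQ. I have installed the following packages: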
Package Version
------------------ ------------
absl-py 2.0.0
accelerate 0.24.1
aiohttp 3.9.0
aiosignal 1.3.1
annotated-types 0.6.0
anyio 3.7.1
async-timeout 4.0.3
attributedict 0.3.0
attrs 23.1.0
autoawq 0.1.7
blessings 1.7
cachetools 5.3.2
certifi 2022.12.7
chardet 5.2.0
charset-normalizer 2.1.1
click 8.1.7
codecov 2.1.13
colorama 0.4.6
coloredlogs 15.0.1
colour-runner 0.1.1
coverage 7.3.2
DataProperty 1.0.1
datasets 2.15.0
deepdiff 6.7.1
dill 0.3.7
distlib 0.3.7
distro 1.8.0
exceptiongroup 1.1.3
filelock 3.9.0
frozenlist 1.4.0
fsspec 2023.4.0
h11 0.14.0
httpcore 1.0.2
httpx 0.25.1
huggingface-hub 0.19.4
humanfriendly 10.0
idna 3.4
inspecta 0.1.3
Jinja2 3.1.2
joblib 1.3.2
jsonlines 4.0.0
lm-eval 0.3.0
MarkupSafe 2.1.3
mbstrdecoder 1.1.3
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
networkx 3.0
nltk 3.8.1
numexpr 2.8.6
numpy 1.24.1
openai 1.3.3
ordered-set 4.1.0
packaging 23.2
pandas 2.0.3
pathvalidate 3.2.0
Pillow 9.3.0
pip 19.3.1
platformdirs 4.0.0
pluggy 1.3.0
portalocker 2.8.2
protobuf 4.25.1
psutil 5.9.6
pyarrow 14.0.1
pyarrow-hotfix 0.5
pybind11 2.11.1
pycountry 22.3.5
pydantic 2.5.1
pydantic-core 2.14.3
pygments 2.17.1
pyproject-api 1.6.1
pytablewriter 1.2.0
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
regex 2023.10.3
requests 2.28.1
rootpath 0.1.1
rouge-score 0.1.2
sacrebleu 1.5.0
safetensors 0.4.0
scikit-learn 1.3.2
scipy 1.10.1
sentencepiece 0.1.99
setuptools 41.6.0
six 1.16.0
sniffio 1.3.0
sqlitedict 2.1.0
sympy 1.12
tabledata 1.3.3
tabulate 0.9.0
tcolorpy 0.1.4
termcolor 2.3.0
texttable 1.7.0
threadpoolctl 3.2.0
tokenizers 0.15.0
toml 0.10.2
tomli 2.0.1
torch 2.1.1+cu118
torchaudio 2.1.1+cu118
torchvision 0.16.1+cu118
tox 4.11.3
tqdm 4.66.1
tqdm-multiprocess 0.0.11
transformers 4.35.2
triton 2.1.0
typepy 1.3.2
typing-extensions 4.4.0
tzdata 2023.3
urllib3 1.26.13
virtualenv 20.24.6
xxhash 3.4.1
yarl 1.9.2
zstandard 0.22.0
I am trying to run the example code from https://github.com/casper-hansen/AutoAWQ:
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
model_path = 'lmsys/vicuna-7b-v1.5'
quant_path = 'vicuna-7b-v1.5-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Quantize
model.quantize(tokenizer, quant_config=quant_config)
# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
However, I get the following error:
/usr/test3/lib64/python3.8/site-packages/huggingface_hub/utils/_runtime.py:184: UserWarning: Pydantic is installed but cannot be imported. Please check your installation. `huggingface_hub` will default to not using Pydantic. Error message: '{e}'
warnings.warn(
Traceback (most recent call last):
File "quant.py", line 1, in <module>
from awq import AutoAWQForCausalLM
File "/usr/test3/lib64/python3.8/site-packages/awq/__init__.py", line 2, in <module>
from awq.models.auto import AutoAWQForCausalLM
File "/usr/test3/lib64/python3.8/site-packages/awq/models/__init__.py", line 1, in <module>
from .mpt import MptAWQForCausalLM
File "/usr/test3/lib64/python3.8/site-packages/awq/models/mpt.py", line 1, in <module>
from .base import BaseAWQForCausalLM
File "/usr/test3/lib64/python3.8/site-packages/awq/models/base.py", line 12, in <module>
from awq.quantize.quantizer import AwqQuantizer
File "/usr/test3/lib64/python3.8/site-packages/awq/quantize/quantizer.py", line 11, in <module>
from awq.modules.linear import WQLinear_GEMM, WQLinear_GEMV
File "/usr/test3/lib64/python3.8/site-packages/awq/modules/linear.py", line 4, in <module>
import awq_inference_engine # with CUDA kernels
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
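The final ImportError says that the dynamic loader could not find the CUDA 12 runtime library that the compiled awq_inference_engine extension asks for. A minimal sketch for checking which CUDA runtime libraries the loader can resolve on the default search path, using only the standard library. The helper name `can_load` and the list of candidate library names are illustrative assumptions, not part of AutoAWQ, and libraries bundled inside pip wheels may not be visible to this check:

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the shared library `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


# Candidate runtime libraries; which ones exist depends on the machine.
for soname in ("libcudart.so.11.0", "libcudart.so.12"):
    print(soname, "->", "found" if can_load(soname) else "missing")
```

If `libcudart.so.12` is reported as missing, the extension was most likely built against a CUDA 12 toolkit that is not installed in this environment.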
Here is my NVIDIA configuration (nvidia-smi output):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10 Off | 00000000:17:00.0 Off | 0 |
| 0% 40C P0 59W / 150W | 18106MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A10 Off | 00000000:31:00.0 Off | 0 |
| 0% 28C P8 21W / 150W | 2MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A10 Off | 00000000:B1:00.0 Off | 0 |
| 0% 26C P8 20W / 150W | 2MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A10 Off | 00000000:CA:00.0 Off | 0 |
| 0% 26C P8 20W / 150W | 2MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Here is the nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
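Taken together, these outputs show three different CUDA versions: the driver reports support up to CUDA 11.8, the system toolkit (nvcc) is 11.2, and the extension in the traceback wants a CUDA 12 runtime (libcudart.so.12). As a rough sketch of the compatibility rule, a runtime is usable only if it is not newer than the version the driver reports. This is a conservative simplification that ignores NVIDIA's forward-compatibility packages, and the function name `driver_supports` is hypothetical:

```python
def driver_supports(driver_cuda: str, runtime_cuda: str) -> bool:
    """Simplified rule: a runtime is usable if it is not newer than the driver's CUDA version."""

    def parse(version: str) -> tuple:
        # Keep only major and minor, e.g. "11.8" -> (11, 8).
        major, minor = version.split(".")[:2]
        return int(major), int(minor)

    return parse(runtime_cuda) <= parse(driver_cuda)


# Driver from the nvidia-smi output above: CUDA 11.8.
print(driver_supports("11.8", "11.8"))  # the cu118 torch wheels -> True
print(driver_supports("11.8", "12.0"))  # any CUDA 12.x runtime -> False
```

Under this rule, the installed cu118 torch wheels match the driver, while a package built against CUDA 12 does not.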
I recently used AWQ on Runpod and ran into the same issue. In my case, nvidia-smi showed CUDA version 12.3 by default. Installing the libraries with these commands solved the problem:
!pip -q install --upgrade fschat accelerate autoawq vllm
!pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0 torchtext==0.16.0+cpu torchdata==0.7.0 --index-url https://download.pytorch.org/whl/cu121
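After reinstalling, it can help to confirm that the PyTorch packages carry the CUDA build tag you intended. A small sketch, assuming the versions follow the `+cuXYZ` local-version convention seen in the package list and install command; `cuda_tag` is a hypothetical helper, and packages that are not installed are simply skipped:

```python
import re
from importlib import metadata
from typing import Optional


def cuda_tag(version: str) -> Optional[str]:
    """Extract the CUDA version from a string like '2.1.0+cu121' (-> '12.1')."""
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None  # CPU build, or no local version tag at all
    return f"{match.group(1)}.{match.group(2)}"


for package in ("torch", "torchvision", "torchaudio"):
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        continue  # not installed in this environment
    print(package, version, "CUDA", cuda_tag(version))
```

If the tags differ between packages, or differ from the CUDA version the driver supports, the same kind of ImportError is likely to reappear.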