OpenAI公司基于GPT模型的ChatGPT风光无两,眼看它起朱楼,眼看它宴宾客,FaceBook终于坐不住了,发布了同样基于LLM的人工智能大语言模型LLaMA,号称包含70亿、130亿、330亿和650亿这4种参数规模的模型,参数是指神经网络中的权重和偏置等可调整的变量,用于训练和优化神经网络的性能,70亿意味着神经网络中有70亿个参数,由此类推。

在一些大型神经网络中,每个参数需要使用32位或64位浮点数进行存储,这意味着每个参数需要占用4字节或8字节的存储空间。因此,对于包含70亿个参数的神经网络,其存储空间将分别为8 GB或12GB。

此外,神经网络的大小不仅取决于参数的数量,还取决于神经元的数目,层数和其他结构参数等。因此,70亿的神经网络可能会占用更多的存储空间,具体取决于网络的结构和实现细节。

因此这种体量的模型单机跑绝对够我们喝一壶,所以本次使用最小的LLaMA 7B模型进行测试。

LLaMA项目安装和模型配置

和Stable-Diffusion项目如出一辙,FaceBook开源的LLaMA项目默认写死使用cuda模式,这也就意味着必须有 NVIDIA 的 GPU来训练和运行,不过好在大神GeorgiGerganov 用 C++ 基于 LLaMA 项目重写了一个跑在 CPU 上的移植版本 llama.cpp应用。

llama.cpp首先适配的就是苹果的M系列芯片,这对于果粉来说无疑是一个重大利好,首先通过命令拉取C++版本的LLaMA项目:

git clone https://github.com/ggerganov/llama.cpp

随后进入项目目录:

llama.cpp

在项目中,需要单独建立一个模型文件夹models:

mkdir models

随后去huggingface官网下载LLaMA的7B模型文件:https://huggingface.co/nyanko7/LLaMA-7B/tree/main

是的,主模型文件已经达到了13.5gb之巨,如果本地硬盘空间告急,请谨慎下载。

随后在models目录建立模型子目录7B:

mkdir 7B

将tokenizer.model和tokenizer_checklist.chk放入和7B平行的目录中:

➜  models git:(master) ✗ ls
7B tokenizer.model tokenizer_checklist.chk

随后将checklist.chk consolidated.00.pth和params.json放入7B目录中:

➜  7B git:(master) ✗ ls
checklist.chk consolidated.00.pth params.json

至此,模型就配置好了。

LLaMA模型转换

由于我们没有使用FaceBook的原版项目,所以它的模型还需要进行转换,也就是转换为当前C++版本的LLaMA可以运行的模型。

这里通过Python脚本进行转换操作:

python3 convert-pth-to-ggml.py models/7B/ 1

第一个参数是模型所在目录,第二个参数为转换时使用的浮点类型,使用 float32,转换的结果文件会大一倍,当该参数值为 1时,则使用 float16 这个默认值,这里我们使用默认数据类型。

程序输出:

➜  llama.cpp git:(master) ✗ python convert-pth-to-ggml.py models/7B/ 1
{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': -1}
n_parts = 1 Processing part 0 Processing variable: tok_embeddings.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: output.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.0.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.0.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.1.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.1.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.1.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.1.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.1.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.2.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.2.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.2.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.2.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.2.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.3.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.3.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.3.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.3.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.3.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.4.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.4.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.4.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.4.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.4.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.5.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.5.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.5.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.5.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.5.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.6.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.6.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.6.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.6.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.6.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.7.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.7.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.7.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.7.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.7.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.8.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.8.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.8.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.8.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.8.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.9.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.9.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.9.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.9.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.9.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.10.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.10.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.10.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.10.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.10.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.11.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.11.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.11.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.11.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.11.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.12.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.12.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.12.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.12.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.12.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.13.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.13.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.13.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.13.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.13.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.14.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.14.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.14.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.14.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.14.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.15.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.15.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.15.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.15.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.15.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.16.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.16.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.16.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.16.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.16.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.17.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.17.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.17.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.17.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.17.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.18.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.18.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.18.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.18.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.18.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.19.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.19.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.19.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.19.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.19.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.20.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.20.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.20.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.20.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.20.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.21.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.21.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.21.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.21.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.21.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.22.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.22.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.22.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.22.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.22.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.23.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.23.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.23.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.23.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.23.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.24.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.24.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.24.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.24.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.24.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.25.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.25.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.25.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.25.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.25.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.26.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.26.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.26.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.26.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.26.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.27.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.27.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.27.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.27.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.27.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.28.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.28.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.28.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.28.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.28.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.29.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.29.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.29.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.29.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.29.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.30.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.30.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.30.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.30.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.30.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.31.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.31.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.31.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.31.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.31.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Done. Output file: models/7B//ggml-model-f16.bin, (part 0)

可以看到,如果转换成功,会在models/7B/目录生成一个C++可以调用的ggml-model-f16.bin模型文件。

LLaMA模型调用

接下来就可以调用转换后的模型了,首先在编译C++项目:

make

程序返回:

➜  llama.cpp git:(master) ✗ make
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202) cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread -c utils.cpp -o utils.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread main.cpp ggml.o utils.o -o main -framework Accelerate
./main -h
usage: ./main [options] options:
-h, --help show this help message and exit
-i, --interactive run in interactive mode
-ins, --instruct run in instruction mode (use with Alpaca models)
-r PROMPT, --reverse-prompt PROMPT
in interactive mode, poll user input upon seeing PROMPT (can be
specified more than once for multiple prompts).
--color colorise output to distinguish prompt and user input from generations
-s SEED, --seed SEED RNG seed (default: -1)
-t N, --threads N number of threads to use during computation (default: 4)
-p PROMPT, --prompt PROMPT
prompt to start generation with (default: empty)
--random-prompt start with a randomized prompt.
-f FNAME, --file FNAME
prompt file to start generation.
-n N, --n_predict N number of tokens to predict (default: 128)
--top_k N top-k sampling (default: 40)
--top_p N top-p sampling (default: 0.9)
--repeat_last_n N last n tokens to consider for penalize (default: 64)
--repeat_penalty N penalize repeat sequence of tokens (default: 1.3)
-c N, --ctx_size N size of the prompt context (default: 512)
--ignore-eos ignore end of stream token and continue generating
--memory_f16 use f16 instead of f32 for memory key+value
--temp N temperature (default: 0.8)
-b N, --batch_size N batch size for prompt processing (default: 8)
-m FNAME, --model FNAME
model path (default: models/llama-7B/ggml-model.bin) c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize -framework Accelerate

编译成功后,本地会生成一个main.cpp文件。

随后根据编译后输出的说明文档直接调用模型即可:

./main -m ./models/7B/ggml-model-f16.bin -p 'Hi i am '

程序输出:

➜  llama.cpp git:(master) ✗ ./main -m ./models/7B/ggml-model-f16.bin -p 'hi i am'
main: seed = 1679400707
llama_model_load: loading model from './models/7B/ggml-model-f16.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 1
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 13365.09 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-f16.bin'
llama_model_load: .................................... done
llama_model_load: model size = 12853.02 MB / num tensors = 291 system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | main: prompt: ' hi i am'
main: number of tokens in prompt = 6
1 -> ''
13450 -> ' hi'
423 -> 'i'
25523 -> ' am' sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000 hi i am a pythoner, but sunk to become a ruby

说实话,推理速度实在不敢恭维,也可能是因为笔者的电脑配置太渣导致。

结语

LLaMA 7B模型总体上需要纯英文的提示词(prompt),对中文的理解能力还不够,优势是确实可以单机跑起来,当然本地跑的话,减少了网络传输数据的环节,推理效率自然也就更高,对于普通的AI爱好者来说,足矣。

本地推理,单机运行,MacM1芯片系统基于大语言模型C++版本LLaMA部署“本地版”的ChatGPT的更多相关文章

  1. Atitit.播放系统规划新版本 and 最近版本回顾 v3  pbf.doc  1 版本11 (ing)41.1 规划h5本地缓存系列 41.2 Android版本app41.3 双类别系统,

    Atitit.播放系统规划新版本 and 最近版本回顾 v3  pbf.doc 1 版本11 (ing)4 1.1 规划h5本地缓存系列 4 1.2 Android版本app4 1.3 双类别系统, ...

  2. 企业实战|基于Cobbler实现多版本系统批量部署

    前言 运维自动化在生产环境中占据着举足轻重的地位,尤其是面对几百台,几千台甚至几万台的服务器时,仅仅是安装操作系统,如果不通过自动化来完成,根本是不可想象的.记得前面我们探究了基于PXE实现系统全自动 ...

  3. K8S 使用Minikube搭建Kubernetes(K8S)~单机运行Kubernetes~适用于快速学习

    在一台主机上运行起来的Kubernetes,仅适用于学习!~~~ 系统版本:CentOS Linux release 7.6.1810 (Core) 软件版本:Docker-ce-18.06.0.Ku ...

  4. win7以管理员身份运行bat提示系统找不到指定的路径。

    windows7“以管理员身份运行”bat提示“系统找不到指定的路径.” 使用批处理安装服务,直接双击运行没有权限,右键“以管理员身份运行”却提示“系统找不到指定的路径.”,反复查看路径是正确的. 打 ...

  5. iOS开发——运行时OC篇&使用运行时获取系统的属性:使用自己的手势修改系统自带的手势

    使用运行时获取系统的属性:使用自己的手势修改系统自带的手势 有的时候我需要实现一个功能,但是没有想到很好的方法或者想到了方法只是那个方法实现起来太麻烦,一或者确实为了装逼,我们就会想到iOS开发中最牛 ...

  6. sublime text3在指定浏览器上本地服务器(localhost)运行文件(php)

    昨天在使用sublime text3时,发现能在本地服务器上运行php文件,于是百度了一下有关知识, 终于成功了,今天总结一下. 首先要让sublime text3 出现侧边栏sidebar,不会的可 ...

  7. javaCV开发详解之4:转流器实现(也可作为本地收流器、推流器,新增添加图片及文字水印,视频图像帧保存),实现rtsp/rtmp/本地文件转发到rtmp流媒体服务器(基于javaCV-FFMPEG)

    javaCV系列文章: javacv开发详解之1:调用本机摄像头视频 javaCV开发详解之2:推流器实现,推本地摄像头视频到流媒体服务器以及摄像头录制视频功能实现(基于javaCV-FFMPEG.j ...

  8. 关于在64位win7下运行Virtualbox安装系统时出错(提示VBoxDD.DLL错误)的解决方

    安装没有问题,安装了最新版VirtualBox-4.3.18-96516-Win,一点运行想安装系统时就出错. 这是提示的错误: 运行Virtualbox去安装系统时出错:Failed to open ...

  9. telint---切换当前正在运行的Linux系统的运行等级

    telint命令用于切换当前正在运行的Linux系统的运行等级 Send control commands to the init daemon. --help Show this help --no ...

  10. [转]sublime text3在指定浏览器上本地服务器(localhost)运行文件(php)

    昨天在使用sublime text3时,发现能在本地服务器上运行php文件,于是百度了一下有关知识, 终于成功了,今天总结一下. 首先要让sublime text3 出现侧边栏sidebar,不会的可 ...

随机推荐

  1. labuladong数据结构

    缓存淘汰算法:LRU①.LFU② BST③ 完全二叉树④ 序列化和反序列化二叉树⑤ 最近公共祖先⑥ 单调栈⑦ 单调队列⑧ 递归反转链表⑨ k个一组反转链表

  2. redis - 常用方法封装总结

    package com.citydo.utils; import org.springframework.data.redis.connection.DataType; import org.spri ...

  3. 一个线程池的c++实现

    前面我们实现了CallBack类,实现了对任意可调用对象的封装,且统一了调用接口. 现在利用CallBack类,我们来实现一个线程池,我们的线程池包含: 1. 状态机, 用于控制和管理线程池的运行.停 ...

  4. lua按某些键排序的方法

    function sort(list, ...) local opts = {...}; local len = #opts; return table.sort(list, function(a, ...

  5. 20200925--矩阵乘法(奥赛一本通P94 多维数组)

    计算两个矩阵的乘法.n*m阶的矩阵A乘以m*k阶的矩阵B得到的矩阵C是n*k阶的,且C[i][j]=A[i][0]*B[0][j]+A[i][1]*B[1][j]+...+A[i][m-1]*B[m- ...

  6. CentOS7/6 关闭防火墙(转载)

    CentOS6关闭防火墙使用以下命令, //临时关闭 service iptables stop //禁止开机启动 chkconfig iptables off CentOS7中若使用同样的命令会报错 ...

  7. C# net core 从文件流中获取文件头、匹配文件类型

    常用文件的文件头如下: (以前六位为准) JPEG (jpg),文件头:FFD8FF PNG (png),文件头:89504E47 GIF (gif),文件头:47494638 TIFF (tif), ...

  8. vue2项目引入新版ant-design-vue报错问题

    vue2项目引入3.2.14版ant-design-vue会报1600多个编译错误,纯属版本问题,但3.2.14版本卸载会出错,需要删除项目重建,重建后搜索依赖ant-design-vue-fixed ...

  9. jmeter&badboy安装

    一.jmeter下载地址: 1. http://jmeter.apache.org/download_jmeter.cgi   \  https://www.apache.org/dist/jmete ...

  10. Vue+SSM+Element-Ui实现前后端分离(3)

    前言:经过博文(1)vue搭建,(2)ssm搭建,到这里就该真真切切的实现小功能了<-_-> 规划:实现登录,用户列表查询,访问日志aop; 开始先解决跨域:在目录config/index ...