I am using the Llama2 7B model from Hugging Face together with LangChain for a simple address extraction/classification task. I want the model to find the city, state, and country in an input string, and I want the answer/query formatted in a specific way for a question-answering/text-generation task. I know I can use FewShotPromptTemplate, which lets me show the LLM a few examples and get output in the format I want.
I generated a few examples as samples:
examples = [
    {"input": "Plot No. 7, Sector 22, Noida, Uttar Pradesh, 201301, India",
     "Address": "Plot No. 7, Sector 22, Noida",
     "City": "Noida",
     "State": "Uttar Pradesh",
     "Country": "India"},
    {"input": "Banjara Hills, Telangana, 500034, India",
     "Address": "Banjara Hills",
     "City": "Not present",
     "State": "Telangana",
     "Country": "India"},
]
I set up the template:
example_formatter_template = """
input: {input},
Address : {Address},
City : {Address},
State : {State},
Country : {Country},
\n
"""
# prompt
example_prompt = PromptTemplate(
    input_variables=["input", "Address", "City", "State", "Country"],
    template=example_formatter_template)
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="What is the address, city, state, country in the string : ",
    suffix="input: {input}\n ",
    input_variables=["input"],
    example_separator="\n")
chain = LLMChain(llm=llm, prompt=few_shot_prompt, verbose=True)
# Run the chain only specifying the input variable.
print(chain.run("B-12, Gandhi Colony, Bhopal, Madhya Pradesh, 462016, India"))
Here is an example of the output I want:
{"input": "B-12, Gandhi Colony, Bhopal, Madhya Pradesh, 462016, India",
"Address": "B-12, Gandhi Colony",
"City" : "Bhopal",
"State" : "Madhya Pradesh",
"Country" : "India"},
Instead, I keep getting output from the model that is not in the expected format, so nothing usable is returned.
Also, I want to prevent the model from adding any extra information that is not present in the context/string; otherwise the query takes a long time to respond. That is, if the city, state, or country is not present in the string, it should return "" or "Not found".
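One way to guarantee the "Not found" behavior regardless of what the model emits is to post-process the raw completion yourself. Below is a minimal sketch of that idea (the function name `parse_address_output` and the regex are my own, not part of LangChain): it scans the model's "Key: value" lines and defaults every missing field to "Not found".

```python
import re

# Fields we expect the model to emit, one "Key: value" per line.
FIELDS = ["Address", "City", "State", "Country"]

def parse_address_output(text: str) -> dict:
    # Start with every field marked as missing.
    result = {field: "Not found" for field in FIELDS}
    for field in FIELDS:
        # Match e.g. "City : Bhopal," and capture the value part.
        match = re.search(rf"{field}\s*:\s*(.+)", text)
        if match:
            value = match.group(1).strip().rstrip(",")
            if value:
                result[field] = value
    return result

# Demo with a completion that omits the City line entirely.
demo = "Address: Banjara Hills\nState: Telangana\nCountry: India"
print(parse_address_output(demo))
```

Running the parser on `chain.run(...)`'s output instead of trusting the model's formatting also means a slightly malformed completion no longer yields an empty result.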
Can someone help?
I also tried after changing the examples and format to the following:
{"input": "Banjara Hills, Telangana, 500034, India",
"output":
"""
Address: Banjara Hills
City: Not present
State: Telangana
Country: India
"""
},
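For reference, here is a plain-Python sketch (no LangChain dependency; `build_prompt` is a hypothetical helper of mine) of what the assembled few-shot prompt would look like with this single-"output"-key example format, mirroring what FewShotPromptTemplate renders:

```python
# Examples in the revised shape: one "input" key, one pre-formatted "output" key.
examples = [
    {"input": "Banjara Hills, Telangana, 500034, India",
     "output": "Address: Banjara Hills\nCity: Not present\nState: Telangana\nCountry: India"},
]

def build_prompt(query: str) -> str:
    # Prefix (task description), then each shot, then the bare query as suffix.
    prefix = "What is the address, city, state, country in the string :\n"
    shots = "\n".join(
        "input: {input}\n{output}\n".format(**ex) for ex in examples
    )
    return prefix + shots + "\ninput: " + query + "\n"

print(build_prompt("B-12, Gandhi Colony, Bhopal, Madhya Pradesh, 462016, India"))
```

Printing the assembled string like this is a quick way to check that the examples, separator, and suffix line up the way the model will actually see them.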
I also referred to this: https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/few_shot_examples