LlamaIndex (Part 2): LlamaIndex Models

1. LLMs

LLMs are a core component of LlamaIndex. They can be used as standalone modules or plugged into other core LlamaIndex modules (for example indexes, retrievers, and query engines). An LLM is always used during the response synthesis step (i.e., after retrieval). Depending on the type of index, an LLM may also be used during index construction, insertion, and query traversal.

LlamaIndex provides a unified interface for defining LLM modules, whether they come from OpenAI, Hugging Face, or LangChain. This interface covers the following:

  • Support for text completion and chat endpoints
  • Support for streaming and non-streaming responses
  • Support for synchronous and asynchronous calls
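
As a quick illustration of these usage patterns, an LLM can be called on its own or passed into another module such as a query engine. A minimal sketch, assuming an existing VectorStoreIndex named index and an OpenAI API key in the environment:

from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

# standalone usage
print(llm.complete("Paul Graham is "))

# plug the same LLM into another module, e.g. a query engine
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What did the author do growing up?")
print(response)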

1.1 Using LLMs

Install the dependency:

pip install llama-index-llms-openai

1.1.1 Text completion example

from llama_index.llms.openai import OpenAI

# non-streaming
resp = OpenAI().complete("Paul Graham is ")
print(resp)

# using streaming endpoint
from llama_index.llms.openai import OpenAI

llm = OpenAI()
resp = llm.stream_complete("Paul Graham is ")
for delta in resp:
    print(delta, end="")

1.1.2 Chat example

from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = OpenAI().chat(messages)
print(resp)

Output:

assistant: Ahoy matey! The name's Captain Rainbowbeard! Aye, I be a pirate with a love for all things colorful and bright, from me beard to me ship's sails. What can I do for ye today, me hearty?
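
Chat also supports streaming; a minimal sketch reusing the messages list above:

llm = OpenAI()
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")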

1.1.3 Tokenization

By default, LlamaIndex uses a global tokenizer for all token counting. The default is the cl100k tokenizer from tiktoken, which matches the default LLM, gpt-3.5-turbo.

If you change the LLM, you may need to update the tokenizer to keep token counting, chunking, and prompting accurate.

The only requirement for the tokenizer is that it is a callable that takes a string and returns a list.

A global tokenizer can be set like this:

from llama_index.core import Settings

# tiktoken
import tiktoken

Settings.tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo").encode

# huggingface
from transformers import AutoTokenizer

Settings.tokenizer = AutoTokenizer.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta"
)

1.2 Customizing LLMs

You can set the model name and the maximum number of output tokens, as well as finer-grained controls ranging from the context window to chunk overlap.

1.2.1 Configuring the LLM, output tokens, and context window

If you are using another LLM class (for example one from LangChain), you can explicitly configure context_window and num_output via Settings:

from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

# define global LLM
Settings.llm = OpenAI(temperature=0, model="gpt-4-turbo", max_tokens=512)

# set context window
Settings.context_window = 4096
# set number of output tokens
Settings.num_output = 256

1.2.2 Using a HuggingFace LLM

LlamaIndex supports using HuggingFace LLMs directly. If data security matters to you, it is best to run the embedding model locally as well. Many open-source models on HuggingFace expect a system prompt before each prompt, and the query_str may need extra wrapping at query time. All of this information is available on the HuggingFace model card of the model you are using.

The example below uses the messages_to_prompt and completion_to_prompt hooks to build the prompt format the model expects; the exact format is described on the model's HuggingFace page.

from llama_index.llms.huggingface import HuggingFaceLLM


def messages_to_prompt(messages):
    prompt = ""
    for message in messages:
        if message.role == "system":
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == "user":
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == "assistant":
            prompt += f"<|assistant|>\n{message.content}</s>\n"

    # ensure we start with a system prompt, insert blank if needed
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    # add final assistant prompt
    prompt = prompt + "<|assistant|>\n"

    return prompt


def completion_to_prompt(completion):
    return f"<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"


import torch
from transformers import BitsAndBytesConfig

# quantize to save memory
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="auto",
)

response = llm.complete("What is the meaning of life?")
print(str(response))

Output:

This is a question that has been asked for centuries, and there is no one definitive answer. However, there are many different perspectives and philosophies that offer insights into this question.

One perspective is that the meaning of life is to find happiness and fulfillment. This can be achieved through various means, such as pursuing one's passions, cultivating meaningful relationships, and contributing to society in a positive way.

Another perspective is that the meaning of life is to serve a higher purpose or to fulfill a divine plan. This can involve following a particular religious or spiritual path, or simply living a life that is in alignment with one's values and beliefs.

A third perspective is that the meaning of life is to learn and grow, both as individuals and as a society. This can involve seeking out knowledge and understanding, as well as working to improve the world around us.

Ultimately, the meaning of life is a deeply personal and subjective question. Each individual must find their own answers and live their lives in accordance with their own values and beliefs.

Tokenizers that return token_type_ids often cause model errors; this can be fixed by removing that key from the tokenizer output:

HuggingFaceLLM(
    # ...
    tokenizer_outputs_to_remove=["token_type_ids"],
)

1.2.3 Using a custom LLM

To use a custom LLM, you only need to subclass the LLM class (or CustomLLM for a simpler interface):

from typing import Optional, List, Mapping, Any

from llama_index.core import SimpleDirectoryReader, SummaryIndex
from llama_index.core.callbacks import CallbackManager
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


class OurLLM(CustomLLM):
    context_window: int = 3900
    num_output: int = 256
    model_name: str = "custom"
    dummy_response: str = "My response"

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=self.dummy_response)

    @llm_completion_callback()
    def stream_complete(
        self, prompt: str, **kwargs: Any
    ) -> CompletionResponseGen:
        response = ""
        for token in self.dummy_response:
            response += token
            yield CompletionResponse(text=response, delta=token)


# define our LLM
Settings.llm = OurLLM()

# define embed model
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)


# Load your data
documents = SimpleDirectoryReader("./data").load_data()
index = SummaryIndex.from_documents(documents)

# Query and print response
query_engine = index.as_query_engine()
response = query_engine.query("<query_text>")
print(response)

List of supported LLMs: https://docs.llamaindex.ai/en/stable/module_guides/models/llms/modules/

With this approach you can use any LLM, whether it runs locally or on your own server. As long as the class is implemented and the generated tokens are returned, it should work. Note that you need the prompt helper to adjust prompt sizes, since every model has a slightly different context length.

You may also need to tweak the internal prompts to get good performance. Even then, you should use a sufficiently large LLM to make sure it can handle the fairly complex queries that LlamaIndex uses internally.

LlamaIndex provides a set of default prompts, as well as chat-specific prompts. You can also customize the prompts yourself, as sketched below.
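
For example, a custom text QA prompt can be passed to a query engine. A minimal sketch, assuming an existing index; {context_str} and {query_str} are the template variables LlamaIndex fills in:

from llama_index.core import PromptTemplate

qa_tmpl = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the query using only the context above.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# pass the custom prompt when building the query engine
query_engine = index.as_query_engine(text_qa_template=qa_tmpl)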

2. Embeddings

In LlamaIndex, embedding models turn documents into vector representations, and vector similarity is computed with cosine similarity. By default, LlamaIndex uses OpenAI's text-embedding-ada-002 model to embed text. LlamaIndex also supports any embedding model offered by LangChain, and provides an easy-to-extend base class for implementing your own embedding model.

2.1 Usage example

In LlamaIndex, the embedding model is usually specified in the Settings object and then used by vector indexes. It embeds the documents when the index is built, as well as any queries you later run with a query engine. You can also specify an embedding model per index.

Install the dependency:

pip install llama-index-embeddings-openai

from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# global
Settings.embed_model = OpenAIEmbedding()

# per-index
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

# to save costs, you can use a HuggingFace model instead
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

2.2 Customization

2.2.1 Batch Size

By default, embedding requests are sent to OpenAI in batches of 10. For users who need to embed many documents, this batch size may be too small.

# set the batch size to 42
embed_model = OpenAIEmbedding(embed_batch_size=42)

2.2.2 Local embedding models

The easiest way:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

2.2.3 HuggingFace Optimum ONNX Embeddings

LlamaIndex also supports creating and using ONNX embeddings through HuggingFace's Optimum library: create and save the ONNX embedding model once, then load and use it.

Install the dependencies:

pip install transformers optimum[exporters]
pip install llama-index-embeddings-huggingface-optimum

Specify the model and the output path:

from llama_index.core import Settings
from llama_index.embeddings.huggingface_optimum import OptimumEmbedding

OptimumEmbedding.create_and_save_optimum_model(
    "BAAI/bge-small-en-v1.5", "./bge_onnx"
)

Settings.embed_model = OptimumEmbedding(folder_name="./bge_onnx")

2.2.4 LangChain integration

Install the dependency:

pip install llama-index-embeddings-langchain

Load a model from Hugging Face using LangChain's embedding class:

from langchain.embeddings.huggingface import HuggingFaceBgeEmbeddings
from llama_index.core import Settings

Settings.embed_model = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-base-en")

2.2.5 Custom embedding models

The example below uses Instructor embeddings and implements a custom embedding class. Instructor embeddings work by providing the text together with an "instruction" describing the domain of the text to embed, which is very useful when embedding text from highly specific and specialized topics.

from typing import Any, List
from InstructorEmbedding import INSTRUCTOR
from llama_index.core.embeddings import BaseEmbedding


class InstructorEmbeddings(BaseEmbedding):
    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent the Computer Science documentation or question:",
        **kwargs: Any,
    ) -> None:
        self._model = INSTRUCTOR(instructor_model_name)
        self._instruction = instruction
        super().__init__(**kwargs)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]

    def _get_text_embedding(self, text: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, text]])
        return embeddings[0]

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        embeddings = self._model.encode(
            [[self._instruction, text] for text in texts]
        )
        return embeddings

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_text_embedding(text)
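
A possible way to use it, assuming the InstructorEmbedding package is installed (pip install InstructorEmbedding):

from llama_index.core import Settings

# register the custom embedding model globally
Settings.embed_model = InstructorEmbeddings(embed_batch_size=2)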

2.3 Standalone usage

Embeddings can also be used as a standalone module, for your project, an existing application, or general testing and exploration.

embeddings = embed_model.get_text_embedding(
    "It is raining cats and dogs here!"
)
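
Since LlamaIndex scores embeddings with cosine similarity, the comparison can also be reproduced by hand. A minimal sketch, assuming embed_model is any of the embedding models configured above:

import numpy as np

query_emb = embed_model.get_query_embedding("What is the weather like?")
text_emb = embed_model.get_text_embedding("It is raining cats and dogs here!")

# cosine similarity between the query and text embeddings
score = np.dot(query_emb, text_emb) / (
    np.linalg.norm(query_emb) * np.linalg.norm(text_emb)
)
print(score)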

List of supported embedding models: https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings/#list-of-supported-embeddings

3. Multi-modal Models

Large Multi-modal Models (LMMs) are no longer limited to text input; they can take images as well as text. LlamaIndex includes a MultiModalLLM abstract class that supports text + image multi-modal models.

3.1 Usage example

The following snippet shows how to use a multi-modal LMM:

from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
from llama_index.core import SimpleDirectoryReader

# load image documents from urls
image_documents = load_image_urls(image_urls)

# load image documents from local directory
image_documents = SimpleDirectoryReader(local_directory).load_data()

# non-streaming
openai_mm_llm = OpenAIMultiModal(
    model="gpt-4-vision-preview", api_key=OPENAI_API_TOKEN, max_new_tokens=300
)
response = openai_mm_llm.complete(
    prompt="what is in the image?", image_documents=image_documents
)

The following code shows how to build a multi-modal vector store/index:

from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, StorageContext

import qdrant_client

# Create a local Qdrant vector store
client = qdrant_client.QdrantClient(path="qdrant_mm_db")

# if you only need image_store for image retrieval,
# you can remove text_store
text_store = QdrantVectorStore(
    client=client, collection_name="text_collection"
)
image_store = QdrantVectorStore(
    client=client, collection_name="image_collection"
)

storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

# Load text and image documents from local folder
documents = SimpleDirectoryReader("./data_folder/").load_data()
# Create the MultiModal index
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)

The following code shows how to use a multi-modal retriever and query engine:

from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core import PromptTemplate
from llama_index.core.query_engine import SimpleMultiModalQueryEngine

retriever_engine = index.as_retriever(
    similarity_top_k=3, image_similarity_top_k=3
)

# retrieve more information from the GPT4V response
retrieval_results = retriever_engine.retrieve(response)

# if you only need image retrieval without text retrieval
# you can use `text_to_image_retrieve`
# retrieval_results = retriever_engine.text_to_image_retrieve(response)

qa_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_tmpl = PromptTemplate(qa_tmpl_str)

query_engine = index.as_query_engine(
    multi_modal_llm=openai_mm_llm, text_qa_template=qa_tmpl
)

query_str = "Tell me more about the Porsche"
response = query_engine.query(query_str)

Examples of using multi-modal LMMs with LlamaIndex: https://docs.llamaindex.ai/en/stable/module_guides/models/multi_modal/#multi-modal-llm-models
