 DecoderOnlyInputs = Union[
    TokenInputs, EmbedsInputs, "MultiModalInputs"
]
The inputs in LLMEngine before they are passed to the model executor. This specifies the data required for decoder-only models.
 ProcessorInputs = Union[
    DecoderOnlyInputs, EncoderDecoderInputs
]
The outputs from vllm.inputs.preprocess.InputPreprocessor.
 PromptType = Union[
    SingletonPrompt, ExplicitEncoderDecoderPrompt
]
Set of possible schemas for an LLM input, including both decoder-only and encoder/decoder input types:
- A text prompt (str or TextPrompt)
- A tokenized prompt (TokensPrompt)
- An embeddings prompt (EmbedsPrompt)
- A single data structure containing both an encoder and a decoder prompt (ExplicitEncoderDecoderPrompt)
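For illustration, the four forms can be written as plain Python values. This sketch does not import vLLM: it assumes the field names `prompt`, `prompt_token_ids`, `prompt_embeds`, `encoder_prompt` and `decoder_prompt`, the token IDs are arbitrary, a nested list stands in for an embeddings tensor, and `is_explicit_enc_dec` is a hypothetical helper rather than a vLLM function.

```python
from typing import Any, Union

# Plain-dict stand-ins for each PromptType schema (illustrative values only).
text_str: str = "Hello, my name is"                      # bare str form of a text prompt
text_prompt: dict[str, Any] = {"prompt": "Hello, my name is"}
tokens_prompt: dict[str, Any] = {"prompt_token_ids": [1, 15043, 29892]}
# An EmbedsPrompt would hold a tensor; a nested list stands in for it here.
embeds_prompt: dict[str, Any] = {"prompt_embeds": [[0.1, 0.2], [0.3, 0.4]]}
enc_dec_prompt: dict[str, Any] = {
    "encoder_prompt": "Translate to German: Hello",
    "decoder_prompt": {"prompt_token_ids": [0]},
}


def is_explicit_enc_dec(prompt: Union[str, dict[str, Any]]) -> bool:
    """Hypothetical check: only the explicit encoder/decoder form has an encoder_prompt key."""
    return isinstance(prompt, dict) and "encoder_prompt" in prompt


all_prompts = [text_str, text_prompt, tokens_prompt, embeds_prompt, enc_dec_prompt]
```

Because the schemas are TypedDicts, they are ordinary dicts at runtime, so telling the forms apart comes down to checking which keys are present.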
 SingletonInputs = Union[
    TokenInputs, EmbedsInputs, "MultiModalInputs"
]
A processed SingletonPrompt which can be passed to Sequence.
 SingletonPrompt = Union[
    str, TextPrompt, TokensPrompt, EmbedsPrompt
]
Set of possible schemas for a single prompt:
- A text prompt (str or TextPrompt)
- A tokenized prompt (TokensPrompt)
- An embeddings prompt (EmbedsPrompt)
Note that "singleton" is in contrast to a data structure that encapsulates multiple prompts, such as the kind used for encoder/decoder models when the user wants to specify both the encoder and decoder prompts explicitly, i.e. ExplicitEncoderDecoderPrompt.
A prompt of type SingletonPrompt may be used (1) as input to a decoder-only model, (2) as input to the encoder of an encoder/decoder model, when the decoder prompt is not specified explicitly, or (3) as a member of a larger data structure that encapsulates more than one prompt, i.e. ExplicitEncoderDecoderPrompt.
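The three roles can be sketched as a small routing function. This is a hypothetical helper, not part of vLLM; it assumes prompts are plain dicts or strings and that an explicit encoder/decoder prompt is recognized by its `encoder_prompt` key. It returns the primary input (the whole input for a decoder-only model, otherwise the encoder input) together with the decoder prompt, if any.

```python
from typing import Any, Optional, Union

Prompt = Union[str, dict[str, Any]]


def split_prompt(prompt: Prompt, is_encoder_decoder: bool) -> tuple[Prompt, Optional[Prompt]]:
    """Hypothetical helper returning (primary_input, decoder_prompt)."""
    explicit = isinstance(prompt, dict) and "encoder_prompt" in prompt
    if not is_encoder_decoder:
        if explicit:
            # An explicit encoder/decoder prompt cannot feed a decoder-only model.
            raise ValueError("explicit encoder/decoder prompt given to a decoder-only model")
        # Role (1): the singleton is the decoder-only model's entire input.
        return prompt, None
    if explicit:
        # Role (3): the singletons are members of the explicit structure.
        return prompt["encoder_prompt"], prompt.get("decoder_prompt")
    # Role (2): the singleton goes to the encoder; no decoder prompt was given.
    return prompt, None
```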
 _T1_co = TypeVar(
    "_T1_co",
    bound=SingletonPrompt,
    default=SingletonPrompt,
    covariant=True,
)
 _T2_co = TypeVar(
    "_T2_co",
    bound=SingletonPrompt,
    default=SingletonPrompt,
    covariant=True,
)
 
  Bases: TypedDict
Represents generic inputs handled by IO processor plugins.
Source code in vllm/inputs/data.py
   
   
EmbedsPrompt
  Bases: TypedDict
Schema for a prompt provided via token embeddings.
 cache_salt: NotRequired[str]
Optional cache salt to be used for prefix caching.
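A cache salt is typically used to keep requests with different salts from reusing each other's cached prefixes. The sketch below shows one way that can work, assuming a chained per-block hash; the block size, the hashing scheme and the `block_hashes` helper are all hypothetical and are not vLLM's implementation. Mixing the salt into the first block's hash changes every later hash in the chain, so identical token prefixes with different salts never map to the same cache entries.

```python
import hashlib
from typing import Optional

BLOCK_SIZE = 4  # hypothetical number of tokens per cache block


def block_hashes(token_ids: list[int], cache_salt: Optional[str] = None) -> list[str]:
    """Chain-hash each full block of tokens; the salt (if any) is mixed into block 0."""
    hashes: list[str] = []
    parent = ""
    full_len = len(token_ids) - len(token_ids) % BLOCK_SIZE  # ignore a trailing partial block
    for start in range(0, full_len, BLOCK_SIZE):
        h = hashlib.sha256()
        h.update(parent.encode())  # chain: each hash depends on all earlier blocks
        if start == 0 and cache_salt is not None:
            h.update(cache_salt.encode())
        h.update(repr(token_ids[start:start + BLOCK_SIZE]).encode())
        parent = h.hexdigest()
        hashes.append(parent)
    return hashes
```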
 
EncoderDecoderInputs
  Bases: TypedDict
The inputs in LLMEngine before they are passed to the model executor.
This specifies the required data for encoder-decoder models.
 decoder: Union[TokenInputs, MultiModalInputs]
The inputs for the decoder portion.
 encoder: Union[TokenInputs, MultiModalInputs]
The inputs for the encoder portion.
 
ExplicitEncoderDecoderPrompt
  Bases: TypedDict, Generic[_T1_co, _T2_co]
Represents an encoder/decoder model input prompt, comprising an explicit encoder prompt and a decoder prompt.
The encoder and decoder prompts may each be formatted according to any of the SingletonPrompt schemas, and need not share the same schema.
Only the encoder prompt may have multi-modal data. mm_processor_kwargs should be set at the top level, not inside the encoder or decoder prompt, since these kwargs are not specific to either the encoder or the decoder.
Note that an ExplicitEncoderDecoderPrompt may not be used as an input to a decoder-only model, and that the encoder_prompt and decoder_prompt fields of this data structure themselves must be SingletonPrompt instances.
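The rules above can be expressed as a small validator. This is a hypothetical sketch over plain dicts, not a vLLM function; the `"<image>"` string is a placeholder for real image data, and the field names are assumed from the text above.

```python
from typing import Any


def check_enc_dec_prompt(prompt: dict[str, Any]) -> None:
    """Hypothetical validator for the ExplicitEncoderDecoderPrompt rules (not a vLLM function)."""
    for key in ("encoder_prompt", "decoder_prompt"):
        part = prompt.get(key)
        if isinstance(part, dict) and "mm_processor_kwargs" in part:
            raise ValueError(f"{key} must not set mm_processor_kwargs; put it at the top level")
    decoder = prompt.get("decoder_prompt")
    if isinstance(decoder, dict) and "multi_modal_data" in decoder:
        raise ValueError("only the encoder prompt may carry multi_modal_data")


# Encoder uses a TextPrompt-style dict, decoder a TokensPrompt-style dict: mixed schemas are allowed.
ok_prompt = {
    "encoder_prompt": {"prompt": "Describe the image.", "multi_modal_data": {"image": "<image>"}},
    "decoder_prompt": {"prompt_token_ids": [0]},
    "mm_processor_kwargs": {},
}
check_enc_dec_prompt(ok_prompt)  # passes without raising
```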
  
TextPrompt
  Bases: TypedDict
Schema for a text prompt.
 cache_salt: NotRequired[str]
Optional cache salt to be used for prefix caching.
 mm_processor_kwargs: NotRequired[dict[str, Any]]
Optional multi-modal processor kwargs to be forwarded to the multi-modal input mapper and processor. If multiple modalities have registered mappers (or other processing components) for the model, we attempt to pass mm_processor_kwargs to each of them.
 multi_modal_data: NotRequired[MultiModalDataDict]
Optional multi-modal data to pass to the model, if the model supports it.
 multi_modal_uuids: NotRequired[MultiModalUUIDDict]
Optional user-specified UUIDs for multimodal items, mapped by modality. Lists must match the number of items per modality and may contain None. For None entries, the hasher will compute IDs automatically; non-None entries override the default hashes for caching, and MUST be unique per multimodal item.
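The UUID rules can be checked mechanically. The sketch below is hypothetical (not a vLLM function) and assumes, for simplicity, that multi-modal data is stored as a list of items per modality; it enforces the per-modality length match, lets None entries through for the hasher, and requires every non-None UUID to be unique across all items.

```python
from typing import Optional


def check_mm_uuids(
    mm_data: dict[str, list[object]],
    mm_uuids: dict[str, list[Optional[str]]],
) -> None:
    """Hypothetical check of the multi_modal_uuids rules described above."""
    seen: set[str] = set()
    for modality, uuids in mm_uuids.items():
        items = mm_data.get(modality, [])
        if len(uuids) != len(items):
            raise ValueError(f"{modality}: {len(uuids)} UUIDs for {len(items)} items")
        for uuid in uuids:
            if uuid is None:
                continue  # left for the hasher to compute
            if uuid in seen:
                raise ValueError(f"duplicate UUID {uuid!r}")
            seen.add(uuid)
```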
 
TokensPrompt
  Bases: TypedDict
Schema for a tokenized prompt.
 cache_salt: NotRequired[str]
Optional cache salt to be used for prefix caching.
 mm_processor_kwargs: NotRequired[dict[str, Any]]
Optional multi-modal processor kwargs to be forwarded to the multi-modal input mapper and processor. If multiple modalities have registered mappers (or other processing components) for the model, we attempt to pass mm_processor_kwargs to each of them.
 multi_modal_data: NotRequired[MultiModalDataDict]
Optional multi-modal data to pass to the model, if the model supports it.
 multi_modal_uuids: NotRequired[MultiModalUUIDDict]
Optional user-specified UUIDs for multimodal items, mapped by modality. Lists must match the number of items per modality and may contain None. For None entries, the hasher will compute IDs automatically; non-None entries override the default hashes for caching.
 prompt: NotRequired[str]
The prompt text corresponding to the token IDs, if available.
 prompt_token_ids: list[int]
A list of token IDs to pass to the model.
 token_type_ids: NotRequired[list[int]]
A list of token type IDs to pass to the cross encoder model.
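As an illustration of token_type_ids, a BERT-style cross-encoder scoring a query/document pair marks the first segment with type 0 and the second with type 1. The sketch assumes BERT's convention of 101 for [CLS] and 102 for [SEP]; the other token IDs are made up, and the plain dict only mirrors the TokensPrompt fields.

```python
from typing import Any

# Illustrative BERT-style IDs: 101 = [CLS], 102 = [SEP]; the remaining IDs are made up.
query_ids = [2054, 2003]
doc_ids = [1037, 3899, 2003]

input_ids = [101] + query_ids + [102] + doc_ids + [102]
# Segment A ([CLS] query [SEP]) gets type 0; segment B (doc [SEP]) gets type 1.
type_ids = [0] * (len(query_ids) + 2) + [1] * (len(doc_ids) + 1)

cross_encoder_prompt: dict[str, Any] = {
    "prompt_token_ids": input_ids,
    "token_type_ids": type_ids,  # one type ID per token
}
```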
 
 build_explicit_enc_dec_prompt(
    encoder_prompt: _T1,
    decoder_prompt: Optional[_T2],
    mm_processor_kwargs: Optional[dict[str, Any]] = None,
) -> ExplicitEncoderDecoderPrompt[_T1, _T2]
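A plain-dict sketch of what such a builder plausibly does: bundle the two prompts with the top-level mm_processor_kwargs. The function name `build_enc_dec_prompt_sketch` is hypothetical, and defaulting missing kwargs to an empty dict is an assumption rather than confirmed vLLM behavior.

```python
from typing import Any, Optional, TypeVar

_T1 = TypeVar("_T1")
_T2 = TypeVar("_T2")


def build_enc_dec_prompt_sketch(
    encoder_prompt: _T1,
    decoder_prompt: Optional[_T2],
    mm_processor_kwargs: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical sketch: bundle both prompts plus top-level processor kwargs."""
    if mm_processor_kwargs is None:
        # Assumed default: an empty dict rather than a missing key.
        mm_processor_kwargs = {}
    return {
        "encoder_prompt": encoder_prompt,
        "decoder_prompt": decoder_prompt,
        "mm_processor_kwargs": mm_processor_kwargs,
    }
```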
  
 embeds_inputs(
    prompt_embeds: Tensor, cache_salt: Optional[str] = None
) -> EmbedsInputs
Construct EmbedsInputs from optional values.
  
 is_embeds_prompt(
    prompt: SingletonPrompt,
) -> TypeIs[EmbedsPrompt]
 
 is_tokens_prompt(
    prompt: SingletonPrompt,
) -> TypeIs[TokensPrompt]
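Because TypedDicts are plain dicts at runtime, guards like these have to inspect keys. The sketch below assumes a tokens prompt carries `prompt_token_ids` without `prompt_embeds` and an embeds prompt the reverse; the function names are hypothetical, and they return `bool` so the block runs without `typing_extensions` (with it, the return type could be annotated as `TypeIs[...]` to get narrowing in both branches).

```python
from typing import Any, Union

SingletonPrompt = Union[str, dict[str, Any]]


def looks_like_tokens_prompt(prompt: SingletonPrompt) -> bool:
    # Assumed rule: token IDs present, embeddings absent.
    return isinstance(prompt, dict) and "prompt_token_ids" in prompt and "prompt_embeds" not in prompt


def looks_like_embeds_prompt(prompt: SingletonPrompt) -> bool:
    # Assumed rule: embeddings present, token IDs absent.
    return isinstance(prompt, dict) and "prompt_embeds" in prompt and "prompt_token_ids" not in prompt
```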
 
 to_enc_dec_tuple_list(
    enc_dec_prompts: Iterable[
        ExplicitEncoderDecoderPrompt[_T1, _T2]
    ],
) -> list[tuple[_T1, Optional[_T2]]]
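Going by its signature, this helper unpacks each explicit prompt into an (encoder, decoder) pair. A minimal sketch over plain dicts, with a hypothetical name; note that the top-level mm_processor_kwargs is not part of the returned tuples.

```python
from typing import Any, Iterable, Optional


def to_tuple_list_sketch(
    enc_dec_prompts: Iterable[dict[str, Any]],
) -> list[tuple[Any, Optional[Any]]]:
    """Hypothetical sketch: unpack each explicit prompt into (encoder, decoder)."""
    return [(p["encoder_prompt"], p["decoder_prompt"]) for p in enc_dec_prompts]
```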
   
 token_inputs(
    prompt_token_ids: list[int],
    cache_salt: Optional[str] = None,
) -> TokenInputs
Construct TokenInputs from optional values.
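"From optional values" suggests that optional keys are only set when a value is supplied, so an absent cache_salt leaves no key behind. A minimal sketch under that assumption; the helper name is hypothetical, and the `"type": "token"` discriminator is assumed rather than stated in this section.

```python
from typing import Any, Optional


def make_token_inputs(prompt_token_ids: list[int], cache_salt: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical sketch: include optional keys only when a value is supplied."""
    inputs: dict[str, Any] = {"type": "token", "prompt_token_ids": prompt_token_ids}
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt
    return inputs
```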
  
 zip_enc_dec_prompts(
    enc_prompts: Iterable[_T1],
    dec_prompts: Iterable[Optional[_T2]],
    mm_processor_kwargs: Optional[
        Union[Iterable[dict[str, Any]], dict[str, Any]]
    ] = None,
) -> list[ExplicitEncoderDecoderPrompt[_T1, _T2]]
Zip encoder and decoder prompts together into a list of ExplicitEncoderDecoderPrompt instances.
mm_processor_kwargs may also be provided; if a dict is passed, the same dictionary will be used for every encoder/decoder prompt. If an iterable is provided, it will be zipped with the encoder/decoder prompts.
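The zipping rule can be sketched as follows: a single dict is shared by every pair, while an iterable is zipped alongside the prompts. The function name is hypothetical, treating None as an empty dict is an assumption, and since Python's `zip` stops at the shortest input, mismatched lengths are silently truncated in this sketch.

```python
from typing import Any, Iterable, Optional, Union


def zip_prompts_sketch(
    enc_prompts: Iterable[Any],
    dec_prompts: Iterable[Optional[Any]],
    mm_processor_kwargs: Optional[Union[Iterable[dict[str, Any]], dict[str, Any]]] = None,
) -> list[dict[str, Any]]:
    """Hypothetical sketch: one shared kwargs dict, or one kwargs dict per pair."""
    if mm_processor_kwargs is None:
        mm_processor_kwargs = {}  # assumed default
    if isinstance(mm_processor_kwargs, dict):
        # The same dictionary object is reused for every encoder/decoder pair.
        return [
            {"encoder_prompt": e, "decoder_prompt": d, "mm_processor_kwargs": mm_processor_kwargs}
            for e, d in zip(enc_prompts, dec_prompts)
        ]
    # Otherwise zip the kwargs iterable with the prompts, one dict per pair.
    return [
        {"encoder_prompt": e, "decoder_prompt": d, "mm_processor_kwargs": kw}
        for e, d, kw in zip(enc_prompts, dec_prompts, mm_processor_kwargs)
    ]
```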