module-attribute
 VLLM_SUBCMD_PARSER_EPILOG = "For full list:            vllm {subcmd} --help=all\nFor a section:            vllm {subcmd} --help=ModelConfig    (case-insensitive)\nFor a flag:               vllm {subcmd} --help=max-model-len  (_ or - accepted)\nDocumentation:            https://docs.vllm.ai\n"
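
The `{subcmd}` placeholder in this epilog suggests the string is meant to be filled in once per subcommand. A minimal sketch of that substitution using plain `str.format` follows. The variable name `EPILOG_TEMPLATE` and the subcommand `serve` are illustrative assumptions, and how vLLM itself applies the template is not shown in this section.

```python
# Copy of the epilog text above, held in a hypothetical local name.
EPILOG_TEMPLATE = (
    "For full list:            vllm {subcmd} --help=all\n"
    "For a section:            vllm {subcmd} --help=ModelConfig    (case-insensitive)\n"
    "For a flag:               vllm {subcmd} --help=max-model-len  (_ or - accepted)\n"
    "Documentation:            https://docs.vllm.ai\n"
)

# Fill the placeholder for one subcommand ("serve" is only an example value).
epilog = EPILOG_TEMPLATE.format(subcmd="serve")
print(epilog)
```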
 
 _validate_truncation_size(
    max_model_len: int,
    truncate_prompt_tokens: Optional[int],
    tokenization_kwargs: Optional[dict[str, Any]] = None,
) -> Optional[int]
Source code in vllm/entrypoints/utils.py
  
 get_max_tokens(
    max_model_len: int,
    request: Union[
        ChatCompletionRequest, CompletionRequest
    ],
    input_length: int,
    default_sampling_params: dict,
) -> int
  async
  Returns once a disconnect message is received.
  
 log_non_default_args(args: Union[Namespace, EngineArgs])
  
  Decorator that allows a route handler to be cancelled when the client disconnects.
This does not use request.is_disconnected, which does not work with middleware. Instead, it follows the pattern from starlette.StreamingResponse, which simultaneously awaits two tasks: one that waits for an HTTP disconnect message, and one that does the actual work. When the first of the two finishes, the other is cancelled.
A core assumption of this decorator is that the body of the request has already been read. This is a safe assumption for FastAPI handlers whose request body has already been parsed into a Pydantic model. The decorator is unsafe to use elsewhere, because it consumes and discards all incoming messages for the request while it looks for a disconnect message.
If the handler returns a StreamingResponse, this wrapper stops listening for disconnects and the response object starts listening for them instead.
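
The two-task pattern described above can be sketched with plain `asyncio`. This is a simplified illustration, not the code in vllm/entrypoints/utils.py: the names `wait_for_disconnect` and `cancel_on_disconnect` are hypothetical, and the wrapper takes an ASGI-style `receive` callable directly instead of a FastAPI request object.

```python
import asyncio
import functools


async def wait_for_disconnect(receive):
    """Consume incoming messages until an HTTP disconnect message arrives."""
    while True:
        message = await receive()
        if message["type"] == "http.disconnect":
            return


def cancel_on_disconnect(handler):
    """Hypothetical decorator: race the handler against a disconnect watcher."""

    @functools.wraps(handler)
    async def wrapper(receive, *args, **kwargs):
        work = asyncio.create_task(handler(*args, **kwargs))
        watcher = asyncio.create_task(wait_for_disconnect(receive))

        # Wait until either the handler or the watcher finishes.
        done, pending = await asyncio.wait(
            {work, watcher}, return_when=asyncio.FIRST_COMPLETED
        )

        # Whichever task is still running gets cancelled.
        for task in pending:
            task.cancel()

        # Return the handler's result only if the handler actually finished.
        if work in done:
            return work.result()
        return None

    return wrapper


@cancel_on_disconnect
async def slow_handler():
    await asyncio.sleep(10)  # stands in for long-running work
    return "finished"


async def disconnect_immediately():
    # Simulated ASGI receive callable: the client has already gone away.
    return {"type": "http.disconnect"}


if __name__ == "__main__":
    print(asyncio.run(slow_handler(disconnect_immediately)))
```

Because the watcher drains every message it receives, this sketch shares the limitation noted above: it is only safe once the request body has been fully read.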