
Question: Support for Continuous Batching and Asynchronous Requests #25

@Msiavashi

Description


Hi. I'm new to the LLM world, and I have a few questions about the engine. Does it support continuous batching? I'm asking because I want to drive it at a fixed requests-per-second rate, and I'd like to know whether I should implement my own batching strategy or whether the framework already provides batching functionality.

I see from the paper: "Multiple sequences are batched until they either reach a maximum batch size of 16 or a maximum waiting time of one second, both parameters referenced from AlpaServe."
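For reference, the size-or-timeout policy quoted above can be sketched roughly as follows. This is only an illustration of the described behavior, not the engine's actual implementation; `collect_batch` and the queue-based interface are hypothetical names:

```python
import queue
import time

def collect_batch(request_queue, max_batch_size=16, max_wait_s=1.0):
    """Collect requests until the batch is full or the wait deadline passes.

    Mirrors the quoted policy: batch up to `max_batch_size` sequences,
    waiting at most `max_wait_s` seconds for stragglers.
    """
    batch = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waited long enough; dispatch whatever we have
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch
```

Note that this is static (per-batch) batching; continuous batching additionally admits new sequences into a batch that is already mid-generation, which this sketch does not capture.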

Given this, is there an async version of the engine that allows submitting requests at varying rates?
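For context, the kind of open-loop load generation being asked about, firing requests at a target rate without waiting for each response, might look like the sketch below. The names `submit_requests` and `handle_request` are hypothetical, and this is not an API the engine is known to provide:

```python
import asyncio
import random

async def submit_requests(handle_request, rate_rps=4.0, duration_s=2.0):
    """Submit requests open-loop at roughly `rate_rps` for `duration_s` seconds.

    Inter-arrival gaps are drawn from an exponential distribution, giving a
    Poisson arrival process; responses are gathered at the end, so slow
    requests never throttle the arrival rate.
    """
    tasks = []
    loop = asyncio.get_running_loop()
    end = loop.time() + duration_s
    while loop.time() < end:
        # Fire-and-forget: schedule the request without awaiting its result.
        tasks.append(asyncio.create_task(handle_request()))
        await asyncio.sleep(random.expovariate(rate_rps))
    return await asyncio.gather(*tasks)
```

An async engine front end would let a loop like this enqueue work while earlier batches are still running; without one, the caller blocks on each batch and cannot sustain an independent arrival rate.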

Thank you.

Metadata


Labels: enhancement (New feature or request)
