Is there any way to reduce the inference time for this part #554

Open
@BugO0

Is there any way to reduce the inference time for this part?
The following is the output from a single inference run. It seems to take much longer than pure llama.cpp inference, mainly because of the time spent in this part.
[image: screenshot of the inference output]
