Description
Name and Version
./llama-cli --version (llama.cpp compiled from source)
version: 6520 (4067f07)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
CUDA_VISIBLE_DEVICES="1" ~/llama.cpp/llama-server -m ~/dev/llms/Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf \
-c 100 \
--temp 0.7 --top_k 20 --top_p 0.8 --min_p 0.01 \
--port 8083
# doesn't really matter as it is not a model or config specific problem
Problem description & steps to reproduce
I encountered this while working with the OpenAI-compatible chat completion endpoint with stream=true (it also affects the /completion endpoint): when a request exceeds the context size, the OpenAI Python client library does not raise any API error.
It seems like there is a small bug in the server implementation which leads to error messages not being emitted correctly via server-sent events, since the emitted event is not compatible with the SSE spec (RFC 8895). The specification only defines the event, data, id and retry field names, while the implementation uses an error field name.
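For context, here is a minimal reproduction sketch using the OpenAI Python client against the server started with the command above (port 8083 comes from that command line; the dummy API key, model name and prompt are placeholder assumptions, the point is only to exceed the -c 100 context window):

# Minimal reproduction sketch (assumes the llama-server instance from the
# command line above is running locally on port 8083; model name, API key
# and prompt are placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8083/v1", api_key="sk-no-key-required")

long_prompt = "word " * 500  # far more than the 100-token context window

stream = client.chat.completions.create(
    model="qwen3-4b-instruct",  # llama-server serves a single model; the name is a placeholder
    messages=[{"role": "user", "content": long_prompt}],
    stream=True,
)

# Expected: an API error for the 400 "context size exceeded" condition.
# Observed (per this report): the loop simply ends without chunks or errors,
# because the non-standard "error:" SSE field is dropped by the client's parser.
for chunk in stream:
    print(chunk)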
Additional Information
Incorrect (message is ignored and handled as empty):
error: {"code":400,"message":"the request exceeds the available context size. try increasing the context size or enable context shift","type":"invalid_request_error"}
data: [DONE]
Correct:
data: {"error":{"code":400,"message":"the request exceeds the available context size. try increasing the context size or enable context shift","type":"invalid_request_error"}}
data: [DONE]
When not streaming responses, everything works as expected and a 400 with the following response body is returned:
{
"error": {
"code": 400,
"message": "the request exceeds the available context size. try increasing the context size or enable context shift",
"type": "invalid_request_error"
}
}
llama.cpp/tools/server/server.cpp, line 4692 (at commit 4b8560a):

server_sent_event(sink, "error", error_data);
When the line above is changed to

server_sent_event(sink, "data", json{{"error", error_data}});

error messages are emitted in accordance with the specification.
Decoding based on the SSE spec then works correctly, e.g. the way it is done in the openai-python client library (see https://github.com/openai/openai-python/blob/0d85ca08c83a408abf3f03b46189e6bf39f68ac6/src/openai/_streaming.py#L322).
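To illustrate why the current format is silently dropped, here is a simplified sketch of spec-style SSE field handling; this is not the actual openai-python code, just the relevant behaviour: field names other than event, data, id and retry are ignored.

# Simplified, spec-style SSE line handling (illustrative only).
def parse_sse_lines(lines):
    data_chunks = []
    for line in lines:
        if not line or line.startswith(":"):
            continue  # blank lines and comments
        field, _, value = line.partition(":")
        value = value.removeprefix(" ")
        if field == "data":
            data_chunks.append(value)
        elif field in ("event", "id", "retry"):
            pass  # recognized fields, not relevant here
        # any other field name (e.g. "error") is ignored per the spec
    return data_chunks

# Current server output: the error payload is lost entirely.
print(parse_sse_lines([
    'error: {"code":400,"message":"...","type":"invalid_request_error"}',
    'data: [DONE]',
]))  # -> ['[DONE]']

# Output after the proposed change: the error arrives as regular data.
print(parse_sse_lines([
    'data: {"error":{"code":400,"message":"...","type":"invalid_request_error"}}',
    'data: [DONE]',
]))  # -> the error JSON string followed by '[DONE]'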
I would open a PR for this but I am unsure about the following points:
- is this maybe intended in any way? (the old webui used to parse the existing error message format specifically)
- should this be something covered in the tests?
- what about developers depending on the currently emitted format? This might break some existing error handling solutions when streaming responses (a rough sketch of the required client-side adjustment is below).
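Regarding the last point, a purely hypothetical consumer that parses the raw stream itself and special-cases the non-standard error: field would need roughly this kind of adjustment (illustrative only; handle_line_old and handle_line_new are made-up helpers, not code from any real client):

import json

def handle_line_old(line: str) -> None:
    # Relies on the current non-standard "error:" field name.
    if line.startswith("error: "):
        raise RuntimeError(json.loads(line[len("error: "):])["message"])

def handle_line_new(line: str) -> None:
    # With the proposed change, errors arrive inside a regular "data:" event.
    if line.startswith("data: ") and line != "data: [DONE]":
        payload = json.loads(line[len("data: "):])
        if "error" in payload:
            raise RuntimeError(payload["error"]["message"])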
I guess this would also make the workaround used here obsolete, since errors could then be handled more gracefully via the parsed SSE data:

if (!hasReceivedData && fullResponse.length === 0) {

@allozaur (Love the new UI btw!)
First Bad Commit
No response