Merge branch 'main' into ninhu/discovery_403
ninghu committed Oct 28, 2024
commit 6297eb81c2623f61d4fba0e170510ec090c1d538
10 changes: 10 additions & 0 deletions sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
@@ -21,6 +21,7 @@
conversation_turns=conversation_turns,
max_conversation_turns=1))
```
- Added evaluator for multimodal use cases

### Breaking Changes
- Renamed environment variable `PF_EVALS_BATCH_USE_ASYNC` to `AI_EVALS_BATCH_USE_ASYNC`.
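
Scripts that still export the old variable name could bridge the rename with a small shim. The following is a minimal sketch, not part of the SDK: the helper name `migrate_batch_async_flag` is hypothetical, and it assumes the value's meaning is unchanged by the rename, which the changelog does not state.

```python
import os

OLD_VAR = "PF_EVALS_BATCH_USE_ASYNC"   # name before the rename
NEW_VAR = "AI_EVALS_BATCH_USE_ASYNC"   # name after the rename


def migrate_batch_async_flag(env=os.environ):
    """Copy the legacy variable to the new name when only the old one is set.

    Hypothetical helper: assumes the value semantics did not change.
    Returns the effective value of the new variable, or None if neither is set.
    """
    if NEW_VAR not in env and OLD_VAR in env:
        env[NEW_VAR] = env[OLD_VAR]
    return env.get(NEW_VAR)
```

Calling `migrate_batch_async_flag()` once at startup, before any evaluation code reads the environment, leaves an explicitly set new variable untouched.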
@@ -48,13 +49,22 @@
- Improved validation and error messaging for input parameters in the `evaluate` API.
- Refined error messages for storage access permission issues.
- Refined error messages for service-based evaluators and simulators.
- `GroundednessEvaluator` now supports `query` as an optional input in single-turn evaluation. If `query` is provided, a different prompt template will be used for the evaluation.
- To align with our support of a diverse set of models, the following evaluators will now have a new key in their result output without the `gpt_` prefix. To maintain backwards compatibility, the old key with the `gpt_` prefix will still be present in the output; however, it is recommended to use the new key moving forward as the old key will be deprecated in the future.
  - `CoherenceEvaluator`
  - `RelevanceEvaluator`
  - `FluencyEvaluator`
  - `GroundednessEvaluator`
  - `SimilarityEvaluator`
  - `RetrievalEvaluator`
- The following evaluators will now have a new key in their result output including LLM reasoning behind the score. The new key will follow the pattern `<metric_name>_reason`. The reasoning is the result of a more detailed prompt template being used to generate the LLM response. Note that this requires the maximum number of tokens used to run these evaluators to be increased.

  | Evaluator | New Token Limit |
  | --- | --- |
  | `CoherenceEvaluator` | 800 |
  | `RelevanceEvaluator` | 800 |
  | `FluencyEvaluator` | 800 |
  | `GroundednessEvaluator` | 800 |
  | `RetrievalEvaluator` | 1600 |
- Introduced environment variable `AI_EVALS_DISABLE_EXPERIMENTAL_WARNING` to disable the warning message for experimental features.
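
Code that consumes evaluator output may want to read the new un-prefixed key while still tolerating results that only carry the `gpt_`-prefixed one, and to pick up the `<metric_name>_reason` entry when it exists. The sketch below works on a plain dictionary and does not call the SDK. The helper names `get_score` and `get_reason`, the metric name `coherence`, and the sample values are assumptions for illustration only.

```python
from typing import Any, Mapping, Optional


def get_score(result: Mapping[str, Any], metric: str) -> Optional[float]:
    """Prefer the new un-prefixed key; fall back to the legacy `gpt_` key."""
    if metric in result:
        return result[metric]
    return result.get(f"gpt_{metric}")


def get_reason(result: Mapping[str, Any], metric: str) -> Optional[str]:
    """Return the reasoning stored under "<metric_name>_reason", if present."""
    return result.get(f"{metric}_reason")


# Hypothetical result dictionary shaped after the entries above (values invented).
sample = {
    "coherence": 4.0,
    "gpt_coherence": 4.0,
    "coherence_reason": "Ideas flow logically.",
}

score = get_score(sample, "coherence")
reason = get_reason(sample, "coherence")
```

Reading through a helper like this keeps callers working once the prefixed key is eventually removed, since only the fallback branch becomes dead code.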

## 1.0.0b4 (2024-10-16)