Merge branch 'main' into ninhu/discovery_403

Azure · ninghu · Oct 29, 2024 · Oct 25, 2024 · Oct 28, 2024 · Oct 28, 2024
commit 6297eb81c2623f61d4fba0e170510ec090c1d538
@@ -21,6 +21,7 @@
       conversation_turns=conversation_turns,
       max_conversation_turns=1))
   ```
+- Added an evaluator for multimodal use cases.
 
 ### Breaking Changes
 - Renamed environment variable `PF_EVALS_BATCH_USE_ASYNC` to `AI_EVALS_BATCH_USE_ASYNC`.
@@ -48,13 +49,22 @@
   - Improved validation and error messaging for input parameters in the `evaluate` API.
   - Refined error messages for storage access permission issues.
   - Refined error messages for service-based evaluators and simulators.
+- `GroundednessEvaluator` now supports `query` as an optional input in single-turn evaluation. If `query` is provided, a different prompt template will be used for the evaluation.
 - To align with our support of a diverse set of models, the following evaluators will now have a new key in their result output without the `gpt_` prefix. To maintain backwards compatibility, the old key with the `gpt_` prefix will still be present in the output; however, it is recommended to use the new key moving forward as the old key will be deprecated in the future.
   - `CoherenceEvaluator`
   - `RelevanceEvaluator`
   - `FluencyEvaluator`
   - `GroundednessEvaluator`
   - `SimilarityEvaluator`
   - `RetrievalEvaluator`
+- The following evaluators will now have a new key in their result output containing the LLM's reasoning behind the score. The new key follows the pattern `<metric_name>_reason`. The reasoning comes from a more detailed prompt template used to generate the LLM response, which requires a higher maximum token limit for these evaluators:
+
+    | Evaluator | New Token Limit |
+    | --- | --- |
+    | `CoherenceEvaluator` | 800 |
+    | `RelevanceEvaluator` | 800 |
+    | `FluencyEvaluator` | 800 |
+    | `GroundednessEvaluator` | 800 |
+    | `RetrievalEvaluator` | 1600 |
 - Introduced environment variable `AI_EVALS_DISABLE_EXPERIMENTAL_WARNING` to disable the warning message for experimental features.
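
As a hedged sketch of how downstream code might adopt the result-key changes described above, the helper below prefers the new unprefixed key, falls back to the legacy `gpt_`-prefixed key, and picks up the optional `<metric_name>_reason` key when present. The helper name `read_metric` is hypothetical, the concrete key names (for example `coherence`, `gpt_coherence`, `coherence_reason`) are assumed from the naming patterns in the entries above, and the sample dictionaries are hand-written illustrations, not real evaluator output.

```python
from typing import Any, Mapping, Optional, Tuple


def read_metric(result: Mapping[str, Any], metric: str) -> Tuple[Any, Optional[str]]:
    """Return (score, reason) for `metric` from an evaluator result dict.

    Hypothetical helper: key names follow the assumed patterns
    `<metric>`, `gpt_<metric>` (legacy) and `<metric>_reason`.
    """
    legacy_key = f"gpt_{metric}"
    if metric in result:
        # Prefer the new key without the `gpt_` prefix.
        score = result[metric]
    elif legacy_key in result:
        # Fall back to the legacy key, which is slated for deprecation.
        score = result[legacy_key]
    else:
        raise KeyError(f"no score for {metric!r} (checked {metric!r} and {legacy_key!r})")
    # The reasoning key is optional; return None when it is absent.
    reason = result.get(f"{metric}_reason")
    return score, reason


# Hypothetical result dicts, written by hand for illustration only.
new_style = {"coherence": 4.0, "gpt_coherence": 4.0, "coherence_reason": "The answer is logically ordered."}
legacy_only = {"gpt_fluency": 3.0}

print(read_metric(new_style, "coherence"))  # (4.0, 'The answer is logically ordered.')
print(read_metric(legacy_only, "fluency"))  # (3.0, None)
```

Reading the new key first means code keeps working once the `gpt_` keys are removed, while the fallback still covers results produced by older versions.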
 
 ## 1.0.0b4 (2024-10-16)