54 changes: 28 additions & 26 deletions sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
@@ -5,35 +5,34 @@

### Features Added
- Groundedness detection in Non Adversarial Simulator via query/context pairs
```python
import asyncio
import importlib.resources as pkg_resources
import json

# `model_config` and `callback` are assumed to be defined as in the Simulator docs.
package = "azure.ai.evaluation.simulator._data_sources"
resource_name = "grounding.json"
custom_simulator = Simulator(model_config=model_config)
conversation_turns = []
# Load the query/context pairs bundled with the package.
with pkg_resources.path(package, resource_name) as grounding_file:
    with open(grounding_file, "r") as file:
        data = json.load(file)
for item in data:
    conversation_turns.append([item])
outputs = asyncio.run(custom_simulator(
    target=callback,
    conversation_turns=conversation_turns,
    max_conversation_turns=1,
))
```

### Breaking Changes
- Renamed environment variable `PF_EVALS_BATCH_USE_ASYNC` to `AI_EVALS_BATCH_USE_ASYNC`.
- The `AdversarialScenario` enum no longer includes `ADVERSARIAL_INDIRECT_JAILBREAK`; indirect jailbreak (XPIA) simulations should be invoked with `IndirectAttackSimulator`
- Outputs of `Simulator` and `AdversarialSimulator` previously had a `to_eval_qa_json_lines` method, which has been replaced by `to_eval_qr_json_lines`. Where `to_eval_qa_json_lines` produced:
```json
{"question": <user_message>, "answer": <assistant_message>}
```
`to_eval_qr_json_lines` now has:
```json
{"query": <user_message>, "response": <assistant_message>}
```
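Existing JSONL data in the old format can be migrated with a small helper; this is an illustrative sketch, not part of the SDK:

```python
import json

def qa_to_qr_lines(qa_jsonl: str) -> str:
    """Rename the legacy question/answer keys to query/response, line by line."""
    qr_records = []
    for line in qa_jsonl.splitlines():
        record = json.loads(line)
        qr_records.append(json.dumps({"query": record["question"], "response": record["answer"]}))
    return "\n".join(qr_records)

print(qa_to_qr_lines('{"question": "What is 2 + 2?", "answer": "4"}'))
```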

### Bugs Fixed
- Non-adversarial simulator works with `gpt-4o` models using the `json_schema` response format
@@ -42,15 +41,18 @@ outputs = asyncio.run(custom_simulator(
- Non-adversarial simulator now accepts context from the callback

### Other Changes
- Enhanced error messages across the SDK to provide more context and actionable information.
- Improved validation and error messaging for input parameters in the `evaluate` API.
- Refined error messages for storage access permission issues.
- Refined error messages for service-based evaluators and simulators.
- To align with our support of a diverse set of models, the following evaluators now include a new key in their result output without the `gpt_` prefix. For backwards compatibility, the old `gpt_`-prefixed key remains in the output; it will be deprecated in the future, so the new key is recommended going forward.
- `CoherenceEvaluator`
- `RelevanceEvaluator`
- `FluencyEvaluator`
- `GroundednessEvaluator`
- `SimilarityEvaluator`
- `RetrievalEvaluator`
- Introduced environment variable `AI_EVALS_DISABLE_EXPERIMENTAL_WARNING` to disable the warning message for experimental features.
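During the deprecation window the evaluators listed above emit both keys, so downstream code can use a fallback lookup; the helper below is illustrative, not part of the SDK:

```python
def get_metric(result: dict, key: str) -> float:
    """Prefer the new un-prefixed key; fall back to the legacy gpt_-prefixed one."""
    return result[key] if key in result else result[f"gpt_{key}"]

# A result from an older SDK version exposes only the prefixed key.
print(get_metric({"gpt_coherence": 4.0}, "coherence"))  # 4.0
# A current result carries both; the new key wins.
print(get_metric({"coherence": 5.0, "gpt_coherence": 5.0}, "coherence"))  # 5.0
```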

## 1.0.0b4 (2024-10-16)

7 changes: 4 additions & 3 deletions sdk/evaluation/azure-ai-evaluation/TROUBLESHOOTING.md
@@ -6,7 +6,7 @@ This guide walks you through how to investigate failures, common errors in the `

- [Handle Evaluate API Errors](#handle-evaluate-api-errors)
- [Troubleshoot Remote Tracking Issues](#troubleshoot-remote-tracking-issues)
- [Troubleshoot Safety Evaluators Issues](#troubleshoot-safety-evaluators-issues)
- [Handle Simulation Errors](#handle-simulation-errors)
- [Adversarial Simulation Supported Regions](#adversarial-simulation-supported-regions)
- [Logging](#logging)
@@ -31,9 +31,10 @@ This guide walks you through how to investigate failures, common errors in the `

- Additionally, if your evaluation run upload fails because you're using a virtual network or private link, check out this [guide](https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network#access-data-using-the-studio).

### Troubleshoot Safety Evaluators Issues

- Risk and safety evaluators depend on the Azure AI Studio safety evaluation backend service. For a list of supported regions, please refer to the documentation [here](https://aka.ms/azureaisafetyeval-regionsupport).
- If you encounter a 403 Unauthorized error when using safety evaluators, verify that you have the `Contributor` role assigned to your Azure AI project; the `Contributor` role is currently required to run safety evaluations.

## Handle Simulation Errors

@@ -2,6 +2,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------

import os
import functools
import inspect
import logging
@@ -149,6 +150,9 @@ def _get_indentation_size(doc_string: str) -> int:
def _should_skip_warning():
skip_warning_msg = False

if os.getenv("AI_EVALS_DISABLE_EXPERIMENTAL_WARNING", "false").lower() == "true":
skip_warning_msg = True

# Cases where we want to suppress the warning:
# 1. When converting from REST object to SDK object
for frame in inspect.stack():
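The new environment-variable gate above can be exercised directly; this standalone sketch mirrors the check in `_should_skip_warning`:

```python
import os

def experimental_warning_disabled() -> bool:
    # Any casing of "true" disables the experimental-feature warning.
    return os.getenv("AI_EVALS_DISABLE_EXPERIMENTAL_WARNING", "false").lower() == "true"

os.environ["AI_EVALS_DISABLE_EXPERIMENTAL_WARNING"] = "True"
print(experimental_warning_disabled())  # True
```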
@@ -76,27 +76,31 @@ async def ensure_service_availability(rai_svc_url: str, token: str, capability:
async with get_async_http_client() as client:
response = await client.get(svc_liveness_url, headers=headers)

if response.status_code != 200:
msg = (
f"RAI service is unavailable in this region, or you lack the necessary permissions "
f"to access the AI project. Status Code: {response.status_code}"
)
raise EvaluationException(
message=msg,
internal_message=msg,
target=ErrorTarget.RAI_CLIENT,
category=ErrorCategory.SERVICE_UNAVAILABLE,
blame=ErrorBlame.USER_ERROR,
tsg_link="https://aka.ms/azsdk/python/evaluation/safetyevaluator/troubleshoot",
)

capabilities = response.json()
if capability and capability not in capabilities:
msg = f"The needed capability '{capability}' is not supported by the RAI service in this region."
raise EvaluationException(
message=msg,
internal_message=msg,
target=ErrorTarget.RAI_CLIENT,
category=ErrorCategory.SERVICE_UNAVAILABLE,
blame=ErrorBlame.USER_ERROR,
tsg_link="https://aka.ms/azsdk/python/evaluation/safetyevaluator/troubleshoot",
)
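The capability check above reduces to a membership test on the service's JSON response; a minimal stand-in (names are illustrative, not the SDK's types):

```python
def check_capability(capabilities: dict, capability: str) -> None:
    """Raise if the requested capability is absent from the service response."""
    if capability and capability not in capabilities:
        raise RuntimeError(
            f"The needed capability '{capability}' is not supported by the RAI service in this region."
        )

check_capability({"content harm": True}, "content harm")  # passes silently
try:
    check_capability({"content harm": True}, "groundedness")
except RuntimeError as exc:
    print(exc)
```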


def generate_payload(normalized_user_text: str, metric: str) -> Dict:
@@ -346,15 +350,17 @@ async def _get_service_discovery_url(azure_ai_project: AzureAIProject, token: st
headers=headers,
)

if response.status_code != 200:
msg = (
f"Unable to connect to your Azure AI project. Please verify that the project is correctly configured. "
f"Status code: {response.status_code}"
)
raise EvaluationException(
message=msg,
target=ErrorTarget.RAI_CLIENT,
category=ErrorCategory.SERVICE_UNAVAILABLE,
blame=ErrorBlame.USER_ERROR,
)

base_url = urlparse(response.json()["properties"]["discoveryUrl"])
return f"{base_url.scheme}://{base_url.netloc}"
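The final lines reduce the discovery URL to its scheme and host; for example (the URL value below is illustrative):

```python
from urllib.parse import urlparse

discovery_url = "https://eastus2.api.azureml.ms/discovery/workspaces/my-workspace"
base_url = urlparse(discovery_url)
# Only scheme and host are kept; the path is discarded.
print(f"{base_url.scheme}://{base_url.netloc}")  # https://eastus2.api.azureml.ms
```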
@@ -74,14 +74,17 @@ def _get_service_discovery_url(self):
timeout=5,
)
if response.status_code != 200:
msg = (
f"Unable to connect to your Azure AI project. Please verify that the project is correctly configured. "
f"Status code: {response.status_code}"
)
raise EvaluationException(
message=msg,
internal_message=msg,
target=ErrorTarget.RAI_CLIENT,
category=ErrorCategory.SERVICE_UNAVAILABLE,
blame=ErrorBlame.USER_ERROR,
)

base_url = urlparse(response.json()["properties"]["discoveryUrl"])
return f"{base_url.scheme}://{base_url.netloc}"
