- Added an evaluator for multimodal use cases.
- Removed the `numpy` dependency. All NaN values returned by the SDK have been changed from `numpy.nan` to `math.nan`.
- `credential` is now required to be passed in for all content safety evaluators and `ProtectedMaterialsEvaluator`; `DefaultAzureCredential` will no longer be chosen if a credential is not passed.
- Changed the package extra name from "pf-azure" to "remote".
- Adversarial conversation simulations would fail with `Forbidden`. Added logic to re-fetch the token in the exponential retry logic when retrieving the RAI Service response.
- Fixed an issue where the Evaluate API did not fail due to missing inputs when the target did not return columns required by the evaluators.
- Enhanced the error message to provide clearer instructions when required packages for the remote tracking feature are missing.
- Print the per-evaluator run summary at the end of the Evaluate API call to make troubleshooting row-level failures easier.
- Added a `type` field to `AzureOpenAIModelConfiguration` and `OpenAIModelConfiguration`.
- The following evaluators now support `conversation` as an alternative input to their usual single-turn inputs:
  - `ViolenceEvaluator`
  - `SexualEvaluator`
  - `SelfHarmEvaluator`
  - `HateUnfairnessEvaluator`
  - `ProtectedMaterialEvaluator`
  - `IndirectAttackEvaluator`
  - `CoherenceEvaluator`
  - `RelevanceEvaluator`
  - `FluencyEvaluator`
  - `GroundednessEvaluator`
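As a rough illustration, a multi-turn `conversation` input for these evaluators might look like the following sketch. The `messages` structure with `role`/`content` fields, and the `FluencyEvaluator` call shown in the comment, are assumptions based on common chat formats, not details confirmed by this changelog:

```python
# A minimal sketch of a multi-turn "conversation" input; the "messages"
# key and role/content fields are assumptions, not taken from this changelog.
conversation = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# With the SDK installed, such a dict could be passed in place of the
# single-turn query/response pair, e.g. (hypothetical call):
#     FluencyEvaluator(model_config)(conversation=conversation)
```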
- Surfaced `RetrievalScoreEvaluator`, formerly an internal part of `ChatEvaluator`, as a standalone conversation-only evaluator.
- Removed `ContentSafetyChatEvaluator` and `ChatEvaluator`.
- The `evaluator_config` parameter of `evaluate` now maps an evaluator name to a dictionary `EvaluatorConfig`, which is a `TypedDict`. The `column_mapping` between `data` or `target` and evaluator field names should now be specified inside this new dictionary:
Before:

```python
evaluate(
    ...,
    evaluator_config={
        "hate_unfairness": {
            "query": "${data.question}",
            "response": "${data.answer}",
        }
    },
    ...
)
```

After:

```python
evaluate(
    ...,
    evaluator_config={
        "hate_unfairness": {
            "column_mapping": {
                "query": "${data.question}",
                "response": "${data.answer}",
            }
        }
    },
    ...
)
```

- Simulator now requires a model configuration to call the prompty, instead of an Azure AI project scope. This enables using the simulator with Entra ID based authentication. Before:
```python
azure_ai_project = {
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("RESOURCE_GROUP"),
    "project_name": os.environ.get("PROJECT_NAME"),
}
sim = Simulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
```

After:

```python
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "azure_deployment": os.environ.get("AZURE_DEPLOYMENT"),
}
sim = Simulator(model_config=model_config)
```

If `api_key` is not included in the `model_config`, the prompty runtime in `promptflow-core` will pick up `DefaultAzureCredential`.
- Fixed an issue where Entra ID authentication was not working with `AzureOpenAIModelConfiguration`.
- `data` and `evaluators` are now required keywords in `evaluate`.
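As a sketch, a call must now name both arguments explicitly rather than pass them positionally. The file name and evaluator mapping below are illustrative placeholders, and the commented-out call assumes the SDK is installed:

```python
# data and evaluators are keyword-only in evaluate(); a positional call
# like evaluate("data.jsonl", {...}) no longer works. Values below are
# placeholders for illustration only.
eval_kwargs = {
    "data": "eval_data.jsonl",            # path to a JSONL dataset
    "evaluators": {"fluency": object()},  # name -> evaluator instance in real use
}
# With azure-ai-evaluation installed (hypothetical usage):
#     from azure.ai.evaluation import evaluate
#     result = evaluate(data=eval_kwargs["data"], evaluators=eval_kwargs["evaluators"])
```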
- The `synthetic` namespace has been renamed to `simulator`, and sub-namespaces under this module have been removed.
- The `evaluate` and `evaluators` namespaces have been removed, and everything previously exposed in those modules has been added to the root namespace `azure.ai.evaluation`.
- The parameter name `project_scope` in content safety evaluators has been renamed to `azure_ai_project` for consistency with the evaluate API and simulators.
- Model configuration classes are now of type `TypedDict` and are exposed in the `azure.ai.evaluation` module instead of coming from `promptflow.core`.
- Updated the parameter names `question` and `answer` in built-in evaluators to the more generic terms `query` and `response`.
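A minimal sketch of the rename as it affects per-row evaluator inputs; the row contents here are placeholders, not data from this changelog:

```python
# Inputs previously keyed "question"/"answer" are now keyed "query"/"response".
old_row = {
    "question": "What does HTTP stand for?",
    "answer": "HyperText Transfer Protocol",
}
new_row = {
    "query": old_row["question"],
    "response": old_row["answer"],
}
```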
- First preview.
- This package is a port of `promptflow-evals`. New features will be added only to this package moving forward.
- Added a `TypedDict` for `AzureAIProject` that allows for better IntelliSense and type checking when passing in project information.
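For illustration, here is a local stand-in showing what such a `TypedDict` enables. The three field names are taken from the project-scope example earlier in this changelog; the class below is a sketch, not the SDK's actual definition:

```python
from typing import TypedDict

class AzureAIProjectSketch(TypedDict):
    """Stand-in mirroring the assumed shape of AzureAIProject."""
    subscription_id: str
    resource_group_name: str
    project_name: str

# A type checker can now flag missing or misspelled keys in this dict,
# and editors can offer completion for the field names.
project: AzureAIProjectSketch = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group_name": "my-resource-group",
    "project_name": "my-project",
}
```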