Merged
Changes from 1 commit
126 commits
18b5021
auto-gen files
dargilco Mar 26, 2024
b8ea0cd
first pass at writing tests
dargilco Mar 27, 2024
ee78c9c
Re-emit client library with fixed variable name
dargilco Mar 27, 2024
25b5c6b
Fix test
dargilco Mar 27, 2024
5072f00
First test working!
dargilco Mar 28, 2024
62c476a
Re-emit after adding tools. Add async test
dargilco Mar 29, 2024
2bdb446
Add basic samples
dargilco Mar 29, 2024
ad9992c
Ignore spelling errors
dargilco Mar 29, 2024
59bf4f3
fix `tox run -e sphinx` errors
dargilco Mar 29, 2024
1613b37
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco Mar 29, 2024
8d3948c
Update SDK to support embeddings. Add sample. Add root README.md
dargilco Mar 30, 2024
fff460e
Fix typo
dargilco Apr 2, 2024
9d171a7
After re-emit using flat input arguments
dargilco Apr 2, 2024
7efa800
Update README.md code snippets
dargilco Apr 2, 2024
8008505
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco Apr 4, 2024
61b62ac
Re-emit
dargilco Apr 4, 2024
675ba6d
Samples for image generation
dargilco Apr 4, 2024
c5ea2fc
Add dictionary of extra parameters
dargilco Apr 4, 2024
45c7ca1
Re-emit
dargilco Apr 4, 2024
8ee88aa
Example of setting extra parameters
dargilco Apr 4, 2024
bad14c9
Fix README.md title
dargilco Apr 5, 2024
b49acb6
Placeholder patch for streaming chat method
dargilco Apr 5, 2024
708e4c4
Re-emit, to get two new streaming 'Delta' classes
dargilco Apr 5, 2024
da8a678
first go at streaming
dargilco Apr 10, 2024
ed2b227
Latest re-emit, removing 'extra_parameters'
dargilco Apr 10, 2024
0428d95
async streaming support
dargilco Apr 11, 2024
9d98958
A few quality gates fixes
dargilco Apr 11, 2024
37c3599
Use aclose() for async iterator
dargilco Apr 11, 2024
58b3669
Update env-variable names
dargilco Apr 11, 2024
0f79e70
First set of updates following SDK review meeting
dargilco Apr 12, 2024
e70da4b
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco Apr 12, 2024
8e274b6
New client names. Other minor model name changes
dargilco Apr 12, 2024
159d82f
Minor fixes to root README.md
dargilco Apr 12, 2024
5326bba
Update tests
dargilco Apr 15, 2024
27b555f
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco Apr 15, 2024
1e42647
Minor test updates
dargilco Apr 16, 2024
65b2bf1
Add assets.json
dargilco Apr 16, 2024
32368f2
Fix test name
dargilco Apr 16, 2024
18d6baa
First round of Pylint fixes
dargilco Apr 16, 2024
1243048
Fix pyright errors
dargilco Apr 16, 2024
3c84d22
Fix all pyright errors
dargilco Apr 16, 2024
c951405
Fix more quality gates
dargilco Apr 17, 2024
e18d79c
Add streaming tests
dargilco Apr 17, 2024
51f30d3
Fix streaming to work with small HTTP buffers (tested down to 64 bytes)
dargilco Apr 18, 2024
2578c8d
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco Apr 18, 2024
3090bae
Update ci.yml
dargilco Apr 18, 2024
84ad0b5
Add samples for chat history, JSON input, IO[bytes] input
dargilco Apr 19, 2024
6b87862
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco May 6, 2024
6aaa1c4
Draft sample for chat completion with tools
dargilco May 8, 2024
11a1c78
Grab latest TypeSpec changes
dargilco May 8, 2024
6a6cbc5
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco May 8, 2024
5864275
Re-emit SDK to pick up TypeSpec tools changes. Fix result.id check to…
dargilco May 8, 2024
9e18146
New test for tool, new recordings
dargilco May 8, 2024
2c97356
use logger for detailed SSE streaming debug spew
dargilco May 9, 2024
37d1e35
Don't build azure-ai-generative and azure-ai-resources packages, as t…
dargilco May 9, 2024
e79c4b5
Split streaming response class to two, one for sync, one for async
dargilco May 9, 2024
06e11f7
Update test timeout. Rename /templates/stages/platform-matrix-ai.json…
dargilco May 9, 2024
610b4eb
Mark azure-ai-generative and azure-ai-resources as in-active
dargilco May 9, 2024
9def3bb
Fix mypy and pylint errors
dargilco May 10, 2024
fb7612b
Sample for getting model info
dargilco May 10, 2024
cd30a48
Remove image generation
dargilco May 10, 2024
a6ae407
Re-emit from TypeSpec without Image Generation
dargilco May 10, 2024
78331f7
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco May 10, 2024
943faa9
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco May 13, 2024
a1ebd6b
Update auth
dargilco May 13, 2024
f0b773c
Add get_model_info tests
dargilco May 14, 2024
e4f3089
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco May 14, 2024
deb3d16
Add sample for ClientGenerator
dargilco May 14, 2024
c6b3bc2
Remove /v1
dargilco May 14, 2024
042c06a
Use new test recording assets without /v1
dargilco May 14, 2024
56450c9
Pick up TypeSpec with /v1 removed from route
dargilco May 14, 2024
a36fbee
Support Image Embeddings and version 2024-05
dargilco May 14, 2024
82168b5
Update test recordings
dargilco May 14, 2024
e788873
Fix some quality gates
dargilco May 15, 2024
2c18488
Fix broken link ('link verification check')
dargilco May 15, 2024
2ac893f
Use 'response' instead of 'result' in samples, as this is what I see …
dargilco May 15, 2024
b56ca82
Implement load_client and load_async_client
dargilco May 16, 2024
aafcc37
Make three BaseStreamingChatCompletions variables (constants) private
dargilco May 16, 2024
b5484ee
sync and async versions of load_client
dargilco May 17, 2024
0a82fe2
Remove wait loop in async samples. Simplify tool sample. Other minor …
dargilco May 17, 2024
3dbca9e
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco May 17, 2024
bf93525
Re-emit with new operator names
dargilco May 17, 2024
ca2f4ff
Minor change to sample
dargilco May 20, 2024
9db8bce
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco May 20, 2024
673f27b
Add support for hyper_parameters
dargilco May 21, 2024
0ed9d7b
Update root README.md
dargilco May 21, 2024
7c5424f
Save work - unknown_params header, hyper_params input, cached model_info
dargilco May 22, 2024
1faf123
Some test changes
dargilco May 22, 2024
71bd71d
Many changes
dargilco May 23, 2024
712dfcc
New test recordings
dargilco May 23, 2024
d6012b7
Minor samples and README.md changes
dargilco May 23, 2024
3f52d6f
Update names of streaming response classes
dargilco May 23, 2024
62c5d8a
Update Entra ID sample, document Entra ID in README, use ttps://ml.az…
dargilco May 23, 2024
8a3a167
use model_extras instead of hyper_params. Update client __str__ to no…
dargilco May 24, 2024
092f91b
Fix all pylint errors. Minor updates to root README.md
dargilco May 28, 2024
8e610c2
Example of JSON messages in the root README.md
dargilco May 28, 2024
6d1e7ce
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco May 28, 2024
a738ff5
Some MyPy fixes. Also fix wrong package name in root README.md
dargilco May 28, 2024
44e94d0
Save work before re-emitting SDK
dargilco May 29, 2024
27ac7b0
Merge remote-tracking branch 'origin/main' into dargilco/azure-ai-inf…
dargilco May 30, 2024
a825db4
Use `embed` instead of `embedding`. Update Entra ID and AOAI samples
dargilco May 30, 2024
c4814d8
Fix some mypy errors. Use different terms for MaaS/MaaP
dargilco May 31, 2024
2df88dc
Re-emit, now with pyproject.toml
dargilco May 31, 2024
89d92d3
Fix/suppress mypy & pyright errors
dargilco May 31, 2024
e20d003
Fix missing Entra ID auth in load_client
dargilco May 31, 2024
d7ef3b7
Use vanity link for samples folder
dargilco Jun 3, 2024
63fefdf
Revert "Use vanity link for samples folder"
dargilco Jun 3, 2024
ab6973f
Fix mypy and pyright errors
dargilco Jun 3, 2024
efbcf5c
Update root README.md. Update operator ref doc comments
dargilco Jun 3, 2024
6f69018
Re-emit
dargilco Jun 3, 2024
aed30bf
Fix mypy error in sample
dargilco Jun 3, 2024
c327403
Fix missing ranges in ref-doc comments
dargilco Jun 3, 2024
27f2ed6
Remove unneeded cast
dargilco Jun 3, 2024
2c92e9d
Fix pylint error
dargilco Jun 3, 2024
98cd4fd
Fix typos & method names. Thanks Jarno!
dargilco Jun 4, 2024
a44df4e
Address Johan's code review comments. Thanks Johan!
dargilco Jun 5, 2024
c0958e3
Fix mypy errors
dargilco Jun 5, 2024
2cbb1cf
Minor update to root README.md
dargilco Jun 5, 2024
73c836e
Remove capacity_type
dargilco Jun 6, 2024
0883400
Fix public patched methods not showing up in intellisense, when using…
dargilco Jun 6, 2024
d0930a8
Import 'Self' from Typing package starting from Python 3.11
dargilco Jun 6, 2024
13e8ed6
Fix pylint error, line too long
dargilco Jun 6, 2024
223238c
More AOAI samples. Update package README with regards to AOAI support
dargilco Jun 6, 2024
0d03ad2
Add overloads with `stream: Literal[..]` to fix mypy and pyright erro…
dargilco Jun 6, 2024
8647964
Override all client __init__ methods so you can define and initialize…
dargilco Jun 6, 2024
3715766
Cleanup: delete now unused platform-matrix-ai.json.old
dargilco Jun 7, 2024
Update env-variable names
dargilco committed Apr 11, 2024
commit 58b3669d9db6fb1412cd3db6de33f5684f271eb4
112 changes: 43 additions & 69 deletions sdk/ai/azure-ai-inference/README.md
@@ -1,15 +1,16 @@
# Azure model client library for Python

-The Azure AI Model Client Library allows you to do inference against any of the AI models you deployed to Azure. It supports both "model as a service" and "models with hosted managed infrastructure". For more information see [Overview: Deploy models, flows, and web apps with Azure AI Studio](https://learn.microsoft.com/azure/ai-studio/concepts/deployments-overview).
+The ModelClient library allows you to do inference using AI models you deployed to Azure. It supports both serverless endpoints (aka "model as a service" (MaaS) or "pay as you go") and self-hosted endpoints (aka "model as a platform" (MaaP) or "real-time endpoints"). The ModelClient library makes service calls using REST API version `2024-04-01-preview` specified here (TODO: insert link). For more information see [Overview: Deploy models, flows, and web apps with Azure AI Studio](https://learn.microsoft.com/azure/ai-studio/concepts/deployments-overview).

-Use the model client library to:
+Use the ModelClient library to:

* Authenticate against the service
* Get information about the model
* Get chat completions
* Get embeddings
* Generate an image from a text prompt

-Note that for inference of OpenAI models hosted on Azure you should be using the [OpenAI Python client library](https://github.com/openai/openai-python) instead of this client.
+Note that for inference using OpenAI models hosted on Azure you should be using the [OpenAI Python client library](https://github.com/openai/openai-python) instead of this client.

[Product documentation](https://learn.microsoft.com/azure/ai-studio/concepts/deployments-overview)
| [Samples](https://aka.ms/azsdk/model-client/samples/python)
@@ -23,93 +24,71 @@ Note that for inference of OpenAI models hosted on azure you should be using the

* [Python 3.8](https://www.python.org/) or later installed, including [pip](https://pip.pypa.io/en/stable/).
* An [Azure subscription](https://azure.microsoft.com/free).
-* A [TBD resource](https://azure.microsoft.com/) in your Azure subscription. You will need the key and endpoint from this resource to authenticate against the service.
+* An [AI Model from the catalog](https://ai.azure.com/explore/models) deployed through Azure AI Studio. To construct the `ModelClient`, you will need to pass in the endpoint URL and key associated with your deployed AI model.
+
+* The endpoint URL has the form `https://your-deployment-name.your-azure-region.inference.ai.azure.com`, where `your-deployment-name` is your unique model deployment name and `your-azure-region` is the Azure region where the model is deployed (e.g. `eastus2`).
+
+* The key is a 32-character string.

### Install the Model Client package

```bash
pip install azure-ai-inference
```

-### Set environment variables
-
-To authenticate the `ModelClient`, you will need the endpoint and key from your TBD resource in the [Azure Portal](https://portal.azure.com). The code snippet below assumes these values are stored in environment variables:
-
-* Set the environment variable `MODEL_ENDPOINT` to the endpoint URL. It has the form `https://your-model-deployment-name.your-azure-region.inference.ai.azure.com`, where `your-model-deployment-name` is your unique TBD resource name.
-
-* Set the environment variable `MODEL_KEY` to the key. The key is a 32-character string.
-
-Note that the client library does not directly read these environment variables at run time. The endpoint and key must be provided to the constructor of `ModelClient` in your code. The code snippet below reads environment variables to promote the practice of not hard-coding secrets in your source code.

### Create and authenticate the client

-Once you define the environment variables, this Python code will create and authenticate a synchronous `ModelClient`:
+Assuming `endpoint` and `key` are strings holding your endpoint URL and key, this Python code will create and authenticate a synchronous `ModelClient`:

<!-- SNIPPET:sample_chat_completions.create_client -->

```python
-import os
from azure.ai.inference import ModelClient
from azure.ai.inference.models import ChatRequestSystemMessage, ChatRequestUserMessage
from azure.core.credentials import AzureKeyCredential

-# [START logging]
-import sys
-import logging
-
-# Acquire the logger for this client library. Use 'azure' to affect both
-# 'azure.core' and 'azure.ai.vision.imageanalysis' libraries.
-logger = logging.getLogger("azure")
-
-# Set the desired logging level. logging.INFO or logging.DEBUG are good options.
-logger.setLevel(logging.DEBUG)
-
-# Direct logging output to stdout (the default):
-handler = logging.StreamHandler(stream=sys.stdout)
-# Or direct logging output to a file:
-# handler = logging.FileHandler(filename = 'sample.log')
-logger.addHandler(handler)
-
-# Optional: change the default logging format. Here we add a timestamp.
-formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(name)s:%(message)s")
-handler.setFormatter(formatter)
+# Create Model Client for synchronous operations
+client = ModelClient(
+    endpoint=endpoint,
+    credential=AzureKeyCredential(key)
+)
```

<!-- END SNIPPET -->

-A synchronous client supports synchronous inference methods, meaning they will block until the service responds with inference results. The code snippets below all use synchronous methods because it's easier for a getting-started guide. The SDK offers equivalent asynchronous APIs which are often preferred. To create an asynchronous client, do the following:
+A synchronous client supports synchronous inference methods, meaning they will block until the service responds with inference results. For simplicity the code snippets below all use synchronous methods. The client offers equivalent asynchronous methods which are more commonly used in production.

-* Update the above code to import `ModelClient` from the `aio` namespace:
+To create an asynchronous client, install the additional package [aiohttp](https://pypi.org/project/aiohttp/):

-```python
-from azure.ai.inference.aio import ModelClient
-```
+```bash
+pip install aiohttp
+```

-* Install the additional package [aiohttp](https://pypi.org/project/aiohttp/):
+and update the code above to import `ModelClient` from the `aio` namespace:

-```bash
-pip install aiohttp
-```
+```python
+import asyncio
+from azure.ai.inference.aio import ModelClient
+```
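
For reference, a minimal end-to-end sketch of the asynchronous pattern. It assumes the async `ModelClient` exposes `get_chat_completions` as a coroutine mirroring the synchronous client, supports `async with`, and that the `CHAT_COMPLETIONS_ENDPOINT` and `CHAT_COMPLETIONS_KEY` environment variables used by the samples are set:

```python
import asyncio
import os

from azure.ai.inference.aio import ModelClient
from azure.ai.inference.models import ChatRequestSystemMessage, ChatRequestUserMessage
from azure.core.credentials import AzureKeyCredential


async def main():
    # Endpoint URL and key are read from environment variables, as in the samples
    endpoint = os.environ["CHAT_COMPLETIONS_ENDPOINT"]
    key = os.environ["CHAT_COMPLETIONS_KEY"]

    # `async with` is assumed to work here, as it does for other
    # azure-sdk-for-python async clients; it closes the client on exit
    async with ModelClient(endpoint=endpoint, credential=AzureKeyCredential(key)) as client:
        result = await client.get_chat_completions(
            messages=[
                ChatRequestSystemMessage(content="You are an AI assistant that helps people find information."),
                ChatRequestUserMessage(content="How many feet are in a mile?"),
            ]
        )
        print(result.choices[0].message.content)


asyncio.run(main())
```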

## Key concepts

### Chat Completions

-TBD
+TODO: Add overview and link to explain chat completions.

-Target the `/v1/chat/completions` route
+Chat completion operations target the URL route `/v1/chat/completions` on the provided endpoint.

### Embeddings

-TBD
+TODO: Add overview and link to explain embeddings.

-Target the `/v1/embeddings` route
+Embeddings operations target the URL route `/v1/embeddings` on the provided endpoint.

### Image Generation

-TBD
+TODO: Add overview and link to explain image generation.

-Target the `/images/generations` route
+Image generation operations target the URL route `/images/generations` on the provided endpoint.
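
To make the routes concrete, here is a rough sketch of the underlying HTTP call these operations map to, written with `requests`. The `api-version` query parameter and bearer-style authorization header are assumptions for illustration (based on the REST API version `2024-04-01-preview` mentioned above), not a documented contract:

```python
import os

import requests

endpoint = os.environ["CHAT_COMPLETIONS_ENDPOINT"]
key = os.environ["CHAT_COMPLETIONS_KEY"]

# Chat completion requests are POSTed to the /v1/chat/completions route
response = requests.post(
    f"{endpoint}/v1/chat/completions",
    params={"api-version": "2024-04-01-preview"},  # assumed query parameter
    headers={"Authorization": f"Bearer {key}"},  # assumed auth scheme
    json={
        "messages": [
            {"role": "system", "content": "You are an AI assistant that helps people find information."},
            {"role": "user", "content": "How many feet are in a mile?"},
        ]
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```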

## Examples

@@ -125,7 +104,7 @@ See the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai

### Chat completions example

-This example demonstrates how to generate chat completions.
+This example demonstrates how to generate a single chat completion.

<!-- SNIPPET:sample_chat_completions.chat_completions -->

@@ -142,11 +121,10 @@ result = client.get_chat_completions(

# Print results to the console
print("Chat Completions:")
-for index, choice in enumerate(result.choices):
-    print(f"choices[{index}].message.content: {choice.message.content}")
-    print(f"choices[{index}].message.role: {choice.message.role}")
-    print(f"choices[{index}].finish_reason: {choice.finish_reason}")
-    print(f"choices[{index}].index: {choice.index}")
+print(f"choices[0].message.content: {result.choices[0].message.content}")
+print(f"choices[0].message.role: {result.choices[0].message.role}")
+print(f"choices[0].finish_reason: {result.choices[0].finish_reason}")
+print(f"choices[0].index: {result.choices[0].index}")
print(f"id: {result.id}")
print(f"created: {result.created}")
print(f"model: {result.model}")
@@ -159,7 +137,7 @@ print(f"usage.total_tokens: {result.usage.total_tokens}")

<!-- END SNIPPET -->

-To generate completions for additional messages, simply call `get_chat_completions` multiple times using the same `ModelClient`.
+To generate completions for additional messages, simply call `get_chat_completions` multiple times using the same `client`.
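
As an illustration of that multi-turn pattern, here is a sketch that feeds the chat history back into the next call, continuing from the snippet above. `ChatRequestAssistantMessage` is an assumed class name, chosen by analogy with the request message classes imported earlier:

```python
from azure.ai.inference.models import ChatRequestAssistantMessage  # assumed class name

messages = [
    ChatRequestSystemMessage(content="You are an AI assistant that helps people find information."),
    ChatRequestUserMessage(content="How many feet are in a mile?"),
]
result = client.get_chat_completions(messages=messages)

# Append the assistant's reply and the next user turn, then call again
# with the same `client` to continue the conversation
messages.append(ChatRequestAssistantMessage(content=result.choices[0].message.content))
messages.append(ChatRequestUserMessage(content="And how many yards?"))
result = client.get_chat_completions(messages=messages)
print(result.choices[0].message.content)
```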

### Embeddings example

@@ -169,21 +147,17 @@ This example demonstrates how to get embeddings.

```python
# Do a single embeddings operation. This will be a synchronous (blocking) call.
-result = client.get_embeddings(input=["first sentence", "second sentence", "third sentence"])
+result = client.get_embeddings(input=["first phrase", "second phrase", "third phrase"])

# Print results to the console
print("Embeddings result:")
-for index, item in enumerate(result.data):
-    len = item.embedding.__len__()
-    print(f"data[{index}].index: {item.index}")
-    print(f"data[{index}].embedding[0]: {item.embedding[0]}")
-    print(f"data[{index}].embedding[1]: {item.embedding[1]}")
-    print("...")
-    print(f"data[{index}].embedding[{len-2}]: {item.embedding[len-2]}")
-    print(f"data[{index}].embedding[{len-1}]: {item.embedding[len-1]}")
+for item in result.data:
+    length = len(item.embedding)
+    print(f"data[{item.index}]: length={length}, [{item.embedding[0]}, {item.embedding[1]}, ..., {item.embedding[length-2]}, {item.embedding[length-1]}]")
print(f"id: {result.id}")
print(f"model: {result.model}")
print(f"object: {result.object}")
-print(f"usage.input_tokens: {result.usage.input_tokens}")
+print(f"usage.prompt_tokens: {result.usage.prompt_tokens}")
print(f"usage.total_tokens: {result.usage.total_tokens}")
```
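
Each returned embedding is a plain list of floats, so standard vector math applies directly. As a sketch, continuing with the same `client` as above, two phrases can be compared by cosine similarity:

```python
import math

result = client.get_embeddings(input=["first phrase", "second phrase"])
a = result.data[0].embedding
b = result.data[1].embedding

# Cosine similarity: dot(a, b) / (||a|| * ||b||)
dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))
print(f"cosine similarity: {dot / (norm_a * norm_b):.4f}")
```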
@@ -289,7 +263,7 @@ Non-redacted logs are generated for log level `logging.DEBUG` only. Be sure to

## Next steps

-* Have a look at the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder, containing fully runnable Python code for Image Analysis (all visual features, synchronous and asynchronous clients, from image file or URL).
+* Have a look at the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder, containing fully runnable Python code for doing inference using synchronous and asynchronous clients.

## Contributing

34 changes: 25 additions & 9 deletions sdk/ai/azure-ai-inference/samples/README.md
@@ -51,19 +51,34 @@ See [Prerequisites](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk

## Set environment variables

-See [Set environment variables](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/README.md#set-environment-variables) here.
+To construct the `ModelClient`, you will need to pass in the endpoint URL and key associated with your deployed AI model.
+
+* The endpoint URL has the form `https://your-deployment-name.your-azure-region.inference.ai.azure.com`, where `your-deployment-name` is your unique model deployment name and `your-azure-region` is the Azure region where the model is deployed (e.g. `eastus2`).
+
+* The key is a 32-character string.
+
+For convenience, and to promote the practice of not hard-coding secrets in your source code, all samples here assume the endpoint URL and key are stored in environment variables. You will need to set these environment variables before running the samples as-is. These are the environment variables used:
+
+| Sample type | Endpoint environment variable name | Key environment variable name |
+|----------|----------|----------|
+| Chat completions | `CHAT_COMPLETIONS_ENDPOINT` | `CHAT_COMPLETIONS_KEY` |
+| Embeddings | `EMBEDDINGS_ENDPOINT` | `EMBEDDINGS_KEY` |
+| Image generation | `IMAGE_GENERATION_ENDPOINT` | `IMAGE_GENERATION_KEY` |
+
+Note that the client library does not directly read these environment variables at run time. The sample code reads the environment variables and constructs the `ModelClient` with the values read.


## Running the samples

To run the first sample, type:
```bash
-python sample_chat_completion_async.py
+python sample_chat_completions.py
```
similarly for the other samples.

## Example console output

-The sample `sample_chat_completion_async.py` sends the following system and user messages in a single call:
+The sample `sample_chat_completions.py` sends the following system and user messages in a single call:

- System: "You are an AI assistant that helps people find information."
- User: "How many feet are in a mile?"
@@ -72,17 +87,18 @@ And prints out the service response. It should look similar to the following:

```text
Chat Completions:
-choices[0].message.content: There are 5,280 feet in a mile.
+choices[0].message.content: Hello! I'd be happy to help you find the answer to your question. There are 5,280 feet in a mile.
choices[0].message.role: assistant
choices[0].finish_reason: stop
choices[0].index: 0
-id: 93f5bea2-11ec-4b31-af73-cb663196ebd5
-created: 1970-01-14 01:11:54+00:00
-model: Llama-2-70b-chat
+id: 77f08d7e-8127-431d-bed5-a814b78ddd80
+created: 1970-01-08 23:28:48+00:00
+model: Llama-2-13b-chat
object: chat.completion
usage.capacity_type: None
usage.prompt_tokens: 41
-usage.completion_tokens: 15
-usage.total_tokens: 56
+usage.completion_tokens: 32
+usage.total_tokens: 73
```

## Troubleshooting
@@ -10,10 +10,11 @@
    python sample_chat_completion_async.py

    Set these two environment variables before running the sample:
-    1) MODEL_ENDPOINT - Your endpoint URL, in the form https://<deployment-name>.<azure-region>.inference.ai.azure.com
-       where `deployment-name` is your unique AI Model deployment name, and
-       `azure-region` is the Azure region where your model is deployed.
-    2) MODEL_KEY - Your model key (a 32-character string). Keep it secret.
+    1) CHAT_COMPLETIONS_ENDPOINT - Your endpoint URL, in the form
+       https://<your-deployment-name>.<your-azure-region>.inference.ai.azure.com
+       where `your-deployment-name` is your unique AI Model deployment name, and
+       `your-azure-region` is the Azure region where your model is deployed.
+    2) CHAT_COMPLETIONS_KEY - Your model key (a 32-character string). Keep it secret.
"""
import asyncio

@@ -25,10 +26,10 @@ async def sample_chat_completions_async():

    # Read the values of your model endpoint and key from environment variables
    try:
-        endpoint = os.environ["MODEL_ENDPOINT"]
-        key = os.environ["MODEL_KEY"]
+        endpoint = os.environ["CHAT_COMPLETIONS_ENDPOINT"]
+        key = os.environ["CHAT_COMPLETIONS_KEY"]
    except KeyError:
-        print("Missing environment variable 'MODEL_ENDPOINT' or 'MODEL_KEY'")
+        print("Missing environment variable 'CHAT_COMPLETIONS_ENDPOINT' or 'CHAT_COMPLETIONS_KEY'")
        print("Set them before running this sample.")
        exit()

@@ -56,20 +57,19 @@

    # Print results to the console
    print("Chat Completions:")
-    for index, choice in enumerate(result.choices):
-        print(f"choices[{index}].message.content: {choice.message.content}")
-        print(f"choices[{index}].message.role: {choice.message.role}")
-        print(f"choices[{index}].finish_reason: {choice.finish_reason}")
-        print(f"choices[{index}].index: {choice.index}")
+    print(f"choices[0].message.content: {result.choices[0].message.content}")
+    print(f"choices[0].message.role: {result.choices[0].message.role}")
+    print(f"choices[0].finish_reason: {result.choices[0].finish_reason}")
+    print(f"choices[0].index: {result.choices[0].index}")
    print(f"id: {result.id}")
    print(f"created: {result.created}")
    print(f"model: {result.model}")
    print(f"object: {result.object}")
    print(f"usage.capacity_type: {result.usage.capacity_type}")
    print(f"usage.prompt_tokens: {result.usage.prompt_tokens}")
    print(f"usage.completion_tokens: {result.usage.completion_tokens}")
    print(f"usage.total_tokens: {result.usage.total_tokens}")


async def main():
    await sample_chat_completions_async()

@@ -10,10 +10,11 @@
    python sample_embeddings_async.py

    Set these two environment variables before running the sample:
-    1) MODEL_ENDPOINT - Your endpoint URL, in the form https://<deployment-name>.<azure-region>.inference.ai.azure.com
-       where `deployment-name` is your unique AI Model deployment name, and
-       `azure-region` is the Azure region where your model is deployed.
-    2) MODEL_KEY - Your model key (a 32-character string). Keep it secret.
+    1) EMBEDDINGS_ENDPOINT - Your endpoint URL, in the form
+       https://<your-deployment-name>.<your-azure-region>.inference.ai.azure.com
+       where `your-deployment-name` is your unique AI Model deployment name, and
+       `your-azure-region` is the Azure region where your model is deployed.
+    2) EMBEDDINGS_KEY - Your model key (a 32-character string). Keep it secret.
"""
import asyncio

@@ -24,18 +25,18 @@ async def sample_embeddings_async():

    # Read the values of your model endpoint and key from environment variables
    try:
-        endpoint = os.environ["MODEL_ENDPOINT"]
-        key = os.environ["MODEL_KEY"]
+        endpoint = os.environ["EMBEDDINGS_ENDPOINT"]
+        key = os.environ["EMBEDDINGS_KEY"]
    except KeyError:
-        print("Missing environment variable 'MODEL_ENDPOINT' or 'MODEL_KEY'")
+        print("Missing environment variable 'EMBEDDINGS_ENDPOINT' or 'EMBEDDINGS_KEY'")
        print("Set them before running this sample.")
        exit()

    # Create a Model Client for asynchronous operations
    client = ModelClient(endpoint=endpoint, credential=AzureKeyCredential(key))

    # Do a single embeddings operation. Start the operation and get a Future object.
-    future = asyncio.ensure_future(client.get_embeddings(input=["first sentence", "second sentence", "third sentence"]))
+    future = asyncio.ensure_future(client.get_embeddings(input=["first phrase", "second phrase", "third phrase"]))

    # Loop until the operation is done
    while not future.done():
@@ -48,17 +49,13 @@

    # Print results to the console
    print("Embeddings result:")
-    for index, item in enumerate(result.data):
-        len = item.embedding.__len__()
-        print(f"data[{index}].index: {item.index}")
-        print(f"data[{index}].embedding[0]: {item.embedding[0]}")
-        print(f"data[{index}].embedding[1]: {item.embedding[1]}")
-        print("...")
-        print(f"data[{index}].embedding[{len-2}]: {item.embedding[len-2]}")
-        print(f"data[{index}].embedding[{len-1}]: {item.embedding[len-1]}")
+    for item in result.data:
+        length = len(item.embedding)
+        print(f"data[{item.index}]: length={length}, [{item.embedding[0]}, {item.embedding[1]}, ..., {item.embedding[length-2]}, {item.embedding[length-1]}]")
    print(f"id: {result.id}")
    print(f"model: {result.model}")
    print(f"object: {result.object}")
-    print(f"usage.input_tokens: {result.usage.input_tokens}")
+    print(f"usage.prompt_tokens: {result.usage.prompt_tokens}")
    print(f"usage.total_tokens: {result.usage.total_tokens}")
