diff --git a/docs/source/commands/trtllm-serve/run-benchmark-with-trtllm-serve.md b/docs/source/commands/trtllm-serve/run-benchmark-with-trtllm-serve.md
index 76713213708..b9aed01cf7f 100644
--- a/docs/source/commands/trtllm-serve/run-benchmark-with-trtllm-serve.md
+++ b/docs/source/commands/trtllm-serve/run-benchmark-with-trtllm-serve.md
@@ -161,34 +161,36 @@ P99 E2EL (ms): 1643.44
 
 For a single request, ITLs are the time intervals between tokens, while TPOT is the average of those intervals:
 
-```math
-\text{TPOT (1\ request)} = \text{Avg(ITL)} = \frac{\text{E2E\ latency} - \text{TTFT}}{\text{\#Output\ Tokens} - 1}
-```
+$$
+\text{TPOT (1 request)} = \text{Avg(ITL)} = \frac{\text{E2E latency} - \text{TTFT}}{\text{#Output Tokens} - 1}
+$$
 
 Across different requests, **average TPOT** is the mean of each request's TPOT (all requests weighted equally), while **average ITL** is token-weighted (all tokens weighted equally):
 
-```math
+$$
 \text{Avg TPOT (N requests)} = \frac{\text{TPOT}_1 + \text{TPOT}_2 + \cdots + \text{TPOT}_N}{N}
-```
+$$
 
-```math
-\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across requests}}{\text{\#Output Tokens across requests}}
-```
+$$
+\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across requests}}{\text{#Output Tokens across requests}}
+$$
 
 #### End-to-End (E2E) Latency
 * The typical total time from when a request is submitted until the final token of the response is received.
 
 #### Total Token Throughput
 * The combined rate at which the system processes both input (prompt) tokens and output (generated) tokens.
-```math
-\text{Total\ TPS} = \frac{\text{\#Input\ Tokens}+\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{Total TPS} = \frac{\text{#Input Tokens}+\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
 
 #### Tokens Per Second (TPS) or Output Token Throughput
 * how many output tokens the system generates each second.
-```math
-\text{TPS} = \frac{\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{TPS} = \frac{\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
 
 ### Request Time Breakdown
 
diff --git a/docs/source/deployment-guide/deployment-guide-for-deepseek-r1-on-trtllm.md b/docs/source/deployment-guide/deployment-guide-for-deepseek-r1-on-trtllm.md
index 8b0b89ec885..5ad959eaeb7 100644
--- a/docs/source/deployment-guide/deployment-guide-for-deepseek-r1-on-trtllm.md
+++ b/docs/source/deployment-guide/deployment-guide-for-deepseek-r1-on-trtllm.md
@@ -399,31 +399,33 @@ P99 E2EL (ms): [result]
 
 For a single request, ITLs are the time intervals between tokens, while TPOT is the average of those intervals:
 
-```math
-\text{TPOT (1\ request)} = \text{Avg(ITL)} = \frac{\text{E2E\ latency} - \text{TTFT}}{\text{\#Output\ Tokens} - 1}
-```
+$$
+\text{TPOT (1 request)} = \text{Avg(ITL)} = \frac{\text{E2E latency} - \text{TTFT}}{\text{#Output Tokens} - 1}
+$$
 
 Across different requests, **average TPOT** is the mean of each request's TPOT (all requests weighted equally), while **average ITL** is token-weighted (all tokens weighted equally):
 
-```math
+$$
 \text{Avg TPOT (N requests)} = \frac{\text{TPOT}_1 + \text{TPOT}_2 + \cdots + \text{TPOT}_N}{N}
-```
+$$
 
-```math
-\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across requests}}{\text{\#Output Tokens across requests}}
-```
+$$
+\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across requests}}{\text{#Output Tokens across requests}}
+$$
 
 #### End-to-End (E2E) Latency
 * The typical total time from when a request is submitted until the final token of the response is received.
 
 #### Total Token Throughput
 * The combined rate at which the system processes both input (prompt) tokens and output (generated) tokens.
-```math
-\text{Total\ TPS} = \frac{\text{\#Input\ Tokens}+\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{Total TPS} = \frac{\text{#Input Tokens}+\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
 
 #### Tokens Per Second (TPS) or Output Token Throughput
 * how many output tokens the system generates each second.
-```math
-\text{TPS} = \frac{\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{TPS} = \frac{\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
diff --git a/docs/source/deployment-guide/deployment-guide-for-gpt-oss-on-trtllm.md b/docs/source/deployment-guide/deployment-guide-for-gpt-oss-on-trtllm.md
index 7c8c5511276..9378eec095a 100644
--- a/docs/source/deployment-guide/deployment-guide-for-gpt-oss-on-trtllm.md
+++ b/docs/source/deployment-guide/deployment-guide-for-gpt-oss-on-trtllm.md
@@ -349,31 +349,33 @@ P99 E2EL (ms): [result]
 
 For a single request, ITLs are the time intervals between tokens, while TPOT is the average of those intervals:
 
-```math
-\text{TPOT (1\ request)} = \text{Avg(ITL)} = \frac{\text{E2E\ latency} - \text{TTFT}}{\text{\#Output\ Tokens} - 1}
-```
+$$
+\text{TPOT (1 request)} = \text{Avg(ITL)} = \frac{\text{E2E latency} - \text{TTFT}}{\text{#Output Tokens} - 1}
+$$
 
 Across different requests, **average TPOT** is the mean of each request's TPOT (all requests weighted equally), while **average ITL** is token-weighted (all tokens weighted equally):
 
-```math
+$$
 \text{Avg TPOT (N requests)} = \frac{\text{TPOT}_1 + \text{TPOT}_2 + \cdots + \text{TPOT}_N}{N}
-```
+$$
 
-```math
-\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across requests}}{\text{\#Output Tokens across requests}}
-```
+$$
+\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across requests}}{\text{#Output Tokens across requests}}
+$$
 
 #### End-to-End (E2E) Latency
 * The typical total time from when a request is submitted until the final token of the response is received.
 
 #### Total Token Throughput
 * The combined rate at which the system processes both input (prompt) tokens and output (generated) tokens.
-```math
-\text{Total\ TPS} = \frac{\text{\#Input\ Tokens}+\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{Total TPS} = \frac{\text{#Input Tokens}+\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
 
 #### Tokens Per Second (TPS) or Output Token Throughput
 * how many output tokens the system generates each second.
-```math
-\text{TPS} = \frac{\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{TPS} = \frac{\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
diff --git a/docs/source/deployment-guide/deployment-guide-for-llama3.3-70b-on-trtllm.md b/docs/source/deployment-guide/deployment-guide-for-llama3.3-70b-on-trtllm.md
index 6c16d1c2ca5..e7709a17aa5 100644
--- a/docs/source/deployment-guide/deployment-guide-for-llama3.3-70b-on-trtllm.md
+++ b/docs/source/deployment-guide/deployment-guide-for-llama3.3-70b-on-trtllm.md
@@ -354,31 +354,33 @@ P99 E2EL (ms): [result]
 
 For a single request, ITLs are the time intervals between tokens, while TPOT is the average of those intervals:
 
-```math
-\text{TPOT (1\ request)} = \text{Avg(ITL)} = \frac{\text{E2E\ latency} - \text{TTFT}}{\text{\#Output\ Tokens} - 1}
-```
+$$
+\text{TPOT (1 request)} = \text{Avg(ITL)} = \frac{\text{E2E latency} - \text{TTFT}}{\text{#Output Tokens} - 1}
+$$
 
 Across different requests, **average TPOT** is the mean of each request's TPOT (all requests weighted equally), while **average ITL** is token-weighted (all tokens weighted equally):
 
-```math
+$$
 \text{Avg TPOT (N requests)} = \frac{\text{TPOT}_1 + \text{TPOT}_2 + \cdots + \text{TPOT}_N}{N}
-```
+$$
 
-```math
-\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across requests}}{\text{\#Output Tokens across requests}}
-```
+$$
+\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across requests}}{\text{#Output Tokens across requests}}
+$$
 
 #### End-to-End (E2E) Latency
 * The typical
total time from when a request is submitted until the final token of the response is received.
 
 #### Total Token Throughput
 * The combined rate at which the system processes both input (prompt) tokens and output (generated) tokens.
-```math
-\text{Total\ TPS} = \frac{\text{\#Input\ Tokens}+\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{Total TPS} = \frac{\text{#Input Tokens}+\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
 
 #### Tokens Per Second (TPS) or Output Token Throughput
 * how many output tokens the system generates each second.
-```math
-\text{TPS} = \frac{\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{TPS} = \frac{\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
diff --git a/docs/source/deployment-guide/deployment-guide-for-llama4-scout-on-trtllm.md b/docs/source/deployment-guide/deployment-guide-for-llama4-scout-on-trtllm.md
index 9fb6b6165af..60c2b306c75 100644
--- a/docs/source/deployment-guide/deployment-guide-for-llama4-scout-on-trtllm.md
+++ b/docs/source/deployment-guide/deployment-guide-for-llama4-scout-on-trtllm.md
@@ -346,31 +346,33 @@ P99 E2EL (ms): [result]
 
 For a single request, ITLs are the time intervals between tokens, while TPOT is the average of those intervals:
 
-```math
-\text{TPOT (1\ request)} = \text{Avg(ITL)} = \frac{\text{E2E\ latency} - \text{TTFT}}{\text{\#Output\ Tokens} - 1}
-```
+$$
+\text{TPOT (1 request)} = \text{Avg(ITL)} = \frac{\text{E2E latency} - \text{TTFT}}{\text{#Output Tokens} - 1}
+$$
 
 Across different requests, **average TPOT** is the mean of each request's TPOT (all requests weighted equally), while **average ITL** is token-weighted (all tokens weighted equally):
 
-```math
+$$
 \text{Avg TPOT (N requests)} = \frac{\text{TPOT}_1 + \text{TPOT}_2 + \cdots + \text{TPOT}_N}{N}
-```
+$$
 
-```math
-\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across requests}}{\text{\#Output Tokens across requests}}
-```
+$$
+\text{Avg ITL (N requests)} = \frac{\text{Sum of all ITLs across
requests}}{\text{#Output Tokens across requests}}
+$$
 
 #### End-to-End (E2E) Latency
 * The typical total time from when a request is submitted until the final token of the response is received.
 
 #### Total Token Throughput
 * The combined rate at which the system processes both input (prompt) tokens and output (generated) tokens.
-```math
-\text{Total\ TPS} = \frac{\text{\#Input\ Tokens}+\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{Total TPS} = \frac{\text{#Input Tokens}+\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
 
 #### Tokens Per Second (TPS) or Output Token Throughput
 * how many output tokens the system generates each second.
-```math
-\text{TPS} = \frac{\text{\#Output\ Tokens}}{T_{last} - T_{first}}
-```
+
+$$
+\text{TPS} = \frac{\text{#Output Tokens}}{T_{last} - T_{first}}
+$$
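
Reviewer note: the formulas touched by this diff can be sanity-checked with a small Python sketch. It is not part of the patch and is not how `trtllm-serve` computes its report. `RequestTrace`, `summarize`, and the sample timestamps are hypothetical names and data chosen for illustration. Two assumptions are made because the docs do not spell them out: `T_first` is taken as the earliest request submission time and `T_last` as the latest token arrival time, and average ITL is divided by the number of inter-token intervals (one fewer than the output tokens of each request).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestTrace:
    """Hypothetical per-request record; all times are in seconds."""
    start: float               # when the request was submitted
    token_times: List[float]   # arrival time of each output token
    num_input_tokens: int

    @property
    def ttft(self) -> float:
        return self.token_times[0] - self.start

    @property
    def e2e_latency(self) -> float:
        return self.token_times[-1] - self.start

    @property
    def itls(self) -> List[float]:
        # Intervals between consecutive output tokens.
        return [b - a for a, b in zip(self.token_times, self.token_times[1:])]

    @property
    def tpot(self) -> float:
        # (E2E latency - TTFT) / (#Output Tokens - 1); needs >= 2 output tokens.
        return (self.e2e_latency - self.ttft) / (len(self.token_times) - 1)


def summarize(traces: List[RequestTrace]) -> Dict[str, float]:
    # Average TPOT: every request counts equally.
    avg_tpot = sum(t.tpot for t in traces) / len(traces)
    # Average ITL: every interval counts equally (assumed denominator).
    all_itls = [x for t in traces for x in t.itls]
    avg_itl = sum(all_itls) / len(all_itls)
    # Assumed benchmark window: first submission to last token.
    t_first = min(t.start for t in traces)
    t_last = max(t.token_times[-1] for t in traces)
    window = t_last - t_first
    out_tokens = sum(len(t.token_times) for t in traces)
    in_tokens = sum(t.num_input_tokens for t in traces)
    return {
        "avg_tpot": avg_tpot,
        "avg_itl": avg_itl,
        "output_tps": out_tokens / window,
        "total_tps": (in_tokens + out_tokens) / window,
    }


# Made-up data: a slow 3-token request and a fast 5-token request.
example_traces = [
    RequestTrace(start=0.0, token_times=[1.0, 2.0, 3.0], num_input_tokens=10),
    RequestTrace(start=0.0, token_times=[1.0, 1.25, 1.5, 1.75, 2.0], num_input_tokens=6),
]

for name, value in summarize(example_traces).items():
    print(f"{name}: {value:.4f}")
```

With this sample data the per-request TPOTs are 1.0 s and 0.25 s, so average TPOT is 0.625 s, while average ITL is 0.5 s because the faster request contributes four of the six intervals. That gap is the request-weighted versus token-weighted distinction the docs describe.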