Merged
feat(fault-injection): Add fault injection API service
- FastAPI service for remote fault injection
- Endpoints for GPU XID injection and network faults
- Dockerfile for containerized deployment
- requirements.txt with FastAPI and the Kubernetes client

Provides an HTTP API for triggering fault injection from tests.
nv-oviya committed Nov 2, 2025
commit c1d67871df2da1c94a0e282b10776f097bee1efe
@@ -0,0 +1,32 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0

FROM python:3.12-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY main.py .

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Run as non-root user
RUN useradd -m -u 1000 faultinjection && chown -R faultinjection:faultinjection /app
USER faultinjection

EXPOSE 8080

CMD ["python", "main.py"]
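
Since the commit message says the API is meant to be triggered from tests, a test harness may want to wait for the same `/health` endpoint that the HEALTHCHECK probes before it sends any fault requests. The following is a minimal sketch of such a probe using only the Python standard library. The function name `is_healthy` and its defaults are hypothetical and not part of this commit; it only assumes what the Dockerfile shows, namely that the service answers `GET /health` on port 8080. It roughly mirrors `curl -f`: any HTTP status of 400 or above, a refused connection, or a timeout counts as unhealthy.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8080", timeout: float = 10.0) -> bool:
    """Return True if GET {base_url}/health succeeds.

    Hypothetical helper: approximates `curl -f <base_url>/health`.
    The default port and path are taken from the Dockerfile's HEALTHCHECK.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            # urlopen raises HTTPError for 4xx/5xx, so reaching here means
            # the final response (after any redirects) was not an error.
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # HTTP error status, connection refused, DNS failure, or timeout.
        return False
```

A test fixture could call `is_healthy()` in a short retry loop and fail fast if the service never comes up, which keeps fault-injection tests from reporting misleading errors when the container itself is not ready.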
