Pipeline tracing and Hayhooks cannot work at the same time #187

@ArnaudWald

Hello,
I am trying to set up Haystack pipelines with Hayhooks and Arize Phoenix tracing enabled.

I followed the docs for setup and wrapped the pipelines, and they run fine when I start them without tracing.

However, having both execution and traceability would be a really good feature for my use case. I followed the Phoenix instrumentation setup, and I can trace pipelines when they are run manually (without Hayhooks).

When I try to do both at the same time, I hit import errors (a ModuleNotFoundError, see the traceback below) and path issues, and the pipeline fails to run.


I'll try to make a reproducible example below:

In one terminal, I run phoenix serve.

In another, I start the server with hayhooks run and HAYHOOKS_SHOW_TRACEBACKS=true set.

My hello-world pipeline_wrapper.py looks like this:

from hayhooks import BasePipelineWrapper
from haystack import Pipeline, component
from openinference.instrumentation.haystack import HaystackInstrumentor
from phoenix.otel import register

tracer_provider = register(
    project_name="test_project",
    auto_instrument=True,
    endpoint="http://localhost:4317",
)

HaystackInstrumentor().instrument(tracer_provider=tracer_provider)


@component
class Hello:
    @component.output_types(output=str)
    def run(self, word: str):
        return {"output": f"Hello, {word}!"}


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        # Create components
        greeter = Hello()
        # Build pipeline
        self.pipeline = Pipeline()
        self.pipeline.add_component("greeter", greeter)

    def run_api(self, input_text: str) -> str:
        result = self.pipeline.run({"word": input_text})
        return result["greeter"]["output"]

Deploy it like this:

hayhooks pipeline deploy-files -n hello hello_world

Query it like this:

curl -X 'POST' \
  'http://localhost:1416/hello/run' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "input_text": "string"
}'

And this is the error:

Pipeline execution error: No module named 'hello' - Traceback (most recent call last):
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/hayhooks/server/utils/deploy_utils.py", line 186, in wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/hayhooks/server/utils/deploy_utils.py", line 242, in run_endpoint_without_files
    result = await run_in_threadpool(pipeline_wrapper.run_api, **run_req.model_dump())  # type:ignore[attr-defined]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2485, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 976, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/pipelines/hello/pipeline_wrapper.py", line 31, in run_api
    result = self.pipeline.run({"word": input_text})
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/openinference/instrumentation/haystack/_wrappers.py", line 214, in __call__
    response = wrapped(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/haystack/core/pipeline/pipeline.py", line 385, in run
    component_outputs = self._run_component(
                        ^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/openinference/instrumentation/haystack/_wrappers.py", line 81, in __call__
    self._wrap_component_run_method(component_cls, component_instance.run)
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/openinference/instrumentation/haystack/__init__.py", line 105, in wrap_component_run_method
    wrap_function_wrapper(
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/wrapt/patches.py", line 114, in wrap_function_wrapper
    return wrap_object(module, name, FunctionWrapper, (wrapper,))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/wrapt/patches.py", line 60, in wrap_object
    (parent, attribute, original) = resolve_path(module, name)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awald/work_space/.venv/lib/python3.12/site-packages/wrapt/patches.py", line 17, in resolve_path
    __import__(module)
ModuleNotFoundError: No module named 'hello'

As I said above, when I comment out the instrumentation lines (the register(...) and HaystackInstrumentor().instrument(...) calls), redeploy the pipeline, and restart the server, it works fine.

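Judging by the traceback, the error comes from the openinference instrumentation wrapping the component's run method: wrapt tries to import the component's module by name ('hello') and fails. I'm not familiar enough with the codebase to investigate further.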
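
To illustrate what I suspect is going on, here is a minimal sketch that uses only the standard library, not Hayhooks or openinference. It assumes, without my having verified it in the Hayhooks source, that the pipeline file is loaded from its path under the pipeline name and that the resulting module is not importable by that name. The module name hello_demo_pipeline and the helpers load_unregistered and can_import are made up for this illustration; the only part taken from the traceback is the __import__(module) step in wrapt's resolve_path.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a deployed pipeline_wrapper.py.
SOURCE = (
    "class Hello:\n"
    "    def run(self, word):\n"
    "        return {'output': f'Hello, {word}!'}\n"
)


def load_unregistered(name: str, path: Path):
    """Load a module from a file path WITHOUT adding it to sys.modules.

    Assumption: this mirrors how the pipeline file gets loaded.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def can_import(module_name: str) -> bool:
    """Mimic the `__import__(module)` step from the traceback."""
    try:
        __import__(module_name)
        return True
    except ModuleNotFoundError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pipeline_wrapper.py"
    path.write_text(SOURCE)

    module = load_unregistered("hello_demo_pipeline", path)
    # Classes record the name of the module they were defined in.
    component_module = module.Hello.__module__

    # The module object exists, but it cannot be imported by name.
    before = can_import(component_module)

    # Registering it in sys.modules makes the same import succeed.
    sys.modules[component_module] = module
    after = can_import(component_module)
    del sys.modules[component_module]

print(component_module, before, after)
```

In this sketch, before is False and after is True. If the assumption about how the module is loaded holds, that would explain the ModuleNotFoundError above, but I have not confirmed that this is what actually happens inside Hayhooks.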

Any idea how to fix this?
