@codeflash-ai codeflash-ai bot commented Mar 12, 2025

⚡️ This pull request contains optimizations for PR #7033

If you approve this dependent PR, these changes will be merged into the original PR branch codeflash/optimize-pr7032-2025-03-12T12.21.16.

This PR will be automatically closed if the original PR is merged.


📄 13% (0.13x) speedup for TableInput.validate_value in src/backend/base/langflow/inputs/inputs.py

⏱️ Runtime: 2.47 milliseconds → 2.19 milliseconds (best of 22 runs)
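
As a sanity check on the headline number, the two runtimes reproduce the reported figure if speedup is taken to be the original runtime divided by the optimized runtime, minus one. That definition is an assumption here, and the `speedup` helper is hypothetical rather than part of Codeflash:

```python
def speedup(original_ms: float, optimized_ms: float) -> float:
    """Relative speedup, assuming speedup = original / optimized - 1."""
    return original_ms / optimized_ms - 1.0


# Runtimes reported for TableInput.validate_value, in milliseconds.
ratio = speedup(2.47, 2.19)
print(f"{ratio:.1%} faster ({ratio:.2f}x)")
```

Under that assumption the result rounds to the 13% (0.13x) quoted in the summary line.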

📝 Explanation and details

Changes Made:

  1. Optimized the early return condition when the value is already a list of dictionaries or Data instances.
  2. Combined the type checks for dict and Data using Python 3.10's | operator (previously | was used inappropriately for type checking in isinstance).
  3. Simplified the logic by returning immediately once transformations are done or a valid list is identified.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 10 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | undefined |

🌀 Generated Regression Tests Details
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.inputs.inputs import TableInput
from pandas import DataFrame
from pydantic import BaseModel, ValidationError, field_validator, model_validator


# function to test
class Data(BaseModel):
    """Represents a record with text and optional data.

    Attributes:
        data (dict, optional): Additional data associated with the record.
    """

    text_key: str = "text"
    data: dict = {}
    default_value: str | None = ""

    # Run before field validation so extra keys can be folded into `data`.
    @model_validator(mode="before")
    @classmethod
    def validate_data(cls, values):
        if not isinstance(values, dict):
            msg = "Data must be a dictionary"
            raise ValueError(msg)
        if "data" not in values or values["data"] is None:
            values["data"] = {}
        if not isinstance(values["data"], dict):
            # Report the type of the offending "data" entry, not of `values`.
            msg = (
                f"Invalid data format: expected dictionary but got {type(values['data']).__name__}."
                " This will raise an error in version langflow==1.3.0."
            )
            print(msg)
        # Any other keyword is added to the data dictionary.
        for key in values:
            if key not in values["data"] and key not in {"text_key", "data", "default_value"}:
                values["data"][key] = values[key]
        return values

# unit tests


def test_dataframe():
    # Test with a DataFrame input
    input_data = DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})
    expected_output = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
    codeflash_output = TableInput.validate_value(input_data, None)


def test_invalid_non_list_input():
    # Test with an invalid non-list input
    with pytest.raises(ValueError):
        TableInput.validate_value("Invalid String", None)

def test_invalid_list_with_non_dict_and_non_data():
    # Test with a list containing invalid items
    with pytest.raises(ValueError):
        TableInput.validate_value([{"name": "Alice", "age": 30}, "Invalid String"], None)

def test_large_scale_dataframe():
    # Test with a large DataFrame input
    input_data = DataFrame({"name": ["Alice"] * 1000, "age": [30] * 1000})
    expected_output = [{"name": "Alice", "age": 30}] * 1000
    codeflash_output = TableInput.validate_value(input_data, None)

def test_large_scale_list_of_dicts():
    # Test with a large list of dictionaries input
    input_data = [{"name": "Alice", "age": 30}] * 1000
    expected_output = input_data
    codeflash_output = TableInput.validate_value(input_data, None)


def test_edge_case_empty_dict():
    # Test with an empty dictionary
    input_data = {}
    expected_output = [{}]
    codeflash_output = TableInput.validate_value(input_data, None)


def test_edge_case_empty_list():
    # Test with an empty list
    input_data = []
    expected_output = []
    codeflash_output = TableInput.validate_value(input_data, None)

def test_edge_case_empty_dataframe():
    # Test with an empty DataFrame
    input_data = DataFrame(columns=["name", "age"])
    expected_output = []
    codeflash_output = TableInput.validate_value(input_data, None)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
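
The comparison that comment refers to can be pictured with a small harness. This is a hedged sketch of the general idea only; it is not Codeflash's actual mechanism, and `capture`, `same_behaviour`, `original` and `optimized` are hypothetical names:

```python
from typing import Any, Callable, Iterable


def capture(func: Callable[[Any], Any], arg: Any) -> tuple:
    """Run func(arg) and record either its return value or the exception type."""
    try:
        return ("ok", func(arg))
    except Exception as exc:  # broad on purpose: a failure is part of the behaviour
        return ("raised", type(exc))


def same_behaviour(first: Callable, second: Callable, inputs: Iterable[Any]) -> bool:
    """True if both callables return equal values or raise the same exception type."""
    return all(capture(first, x) == capture(second, x) for x in inputs)


def original(value):
    # Wrap first, validate afterwards.
    if isinstance(value, dict):
        value = [value]
    if not isinstance(value, list):
        raise ValueError("expected a list")
    return value


def optimized(value):
    # Return as soon as the shape is known.
    if isinstance(value, list):
        return value
    if isinstance(value, dict):
        return [value]
    raise ValueError("expected a list")


print(same_behaviour(original, optimized, [{}, [], [{"name": "Alice"}], "Invalid String"]))
```

Comparing exception types as well as return values matters here, because several of the generated tests only check that invalid input raises `ValueError`.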

import copy
from typing import Any, cast

import pandas as pd  # used for DataFrame creation
# imports
import pytest  # used for our unit tests
# names used by the Data class below (type annotations, message types, logging)
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
# function to test
from langflow.inputs.input_mixin import (BaseInputMixin, ListableInputMixin,
                                         MetadataTraceMixin, TableMixin,
                                         ToolModeMixin)
from langflow.inputs.inputs import TableInput
from loguru import logger
from pandas import DataFrame
from pydantic import (BaseModel, ConfigDict, field_validator, model_serializer,
                      model_validator)


class Data(BaseModel):
    """Represents a record with text and optional data.

    Attributes:
        data (dict, optional): Additional data associated with the record.
    """

    model_config = ConfigDict(validate_assignment=True)

    text_key: str = "text"
    data: dict = {}
    default_value: str | None = ""

    @model_validator(mode="before")
    @classmethod
    def validate_data(cls, values):
        if not isinstance(values, dict):
            msg = "Data must be a dictionary"
            raise ValueError(msg)  # noqa: TRY004
        if "data" not in values or values["data"] is None:
            values["data"] = {}
        if not isinstance(values["data"], dict):
            # Report the type of the offending "data" entry, not of `values`.
            msg = (
                f"Invalid data format: expected dictionary but got {type(values['data']).__name__}."
                " This will raise an error in version langflow==1.3.0."
            )
            logger.warning(msg)
        # Any other keyword should be added to the data dictionary
        for key in values:
            if key not in values["data"] and key not in {"text_key", "data", "default_value"}:
                values["data"][key] = values[key]
        return values

    @model_serializer(mode="plain", when_used="json")
    def serialize_model(self):
        return {k: v.to_json() if hasattr(v, "to_json") else v for k, v in self.data.items()}

    def get_text(self):
        """Retrieves the text value from the data dictionary.

        If the text key is present in the data dictionary, the corresponding value is returned.
        Otherwise, the default value is returned.

        Returns:
            The text value from the data dictionary or the default value.
        """
        return self.data.get(self.text_key, self.default_value)

    def set_text(self, text: str | None) -> str:
        r"""Sets the text value in the data dictionary.

        The object's `text` value is set to the `text` parameter as given, with the following modifications:

         - `text` value of `None` is converted to an empty string.
         - `text` value is converted to `str` type.

        Args:
            text (str): The text to be set in the data dictionary.

        Returns:
            str: The text value that was set in the data dictionary.
        """
        new_text = "" if text is None else str(text)
        self.data[self.text_key] = new_text
        return new_text

    @classmethod
    def from_document(cls, document: Document) -> "Data":
        """Converts a Document to a Data.

        Args:
            document (Document): The Document to convert.

        Returns:
            Data: The converted Data.
        """
        data = document.metadata
        data["text"] = document.page_content
        return cls(data=data, text_key="text")

    @classmethod
    def from_lc_message(cls, message: BaseMessage) -> "Data":
        """Converts a BaseMessage to a Data.

        Args:
            message (BaseMessage): The BaseMessage to convert.

        Returns:
            Data: The converted Data.
        """
        data: dict = {"text": message.content}
        data["metadata"] = cast("dict", message.to_json())
        return cls(data=data, text_key="text")

    def __add__(self, other: "Data") -> "Data":
        """Combines the data of two data by attempting to add values for overlapping keys.

        Combines the data of two data by attempting to add values for overlapping keys
        for all types that support the addition operation. Falls back to the value from 'other'
        record when addition is not supported.
        """
        combined_data = self.data.copy()
        for key, value in other.data.items():
            # If the key exists in both data and both values support the addition operation
            if key in combined_data:
                try:
                    combined_data[key] += value
                except TypeError:
                    # Fallback: Use the value from 'other' record if addition is not supported
                    combined_data[key] = value
            else:
                # If the key is not in the first record, simply add it
                combined_data[key] = value

        return Data(data=combined_data)

    def to_lc_document(self) -> Document:
        """Converts the Data to a Document.

        Returns:
            Document: The converted Document.
        """
        data_copy = self.data.copy()
        text = data_copy.pop(self.text_key, self.default_value)
        if isinstance(text, str):
            return Document(page_content=text, metadata=data_copy)
        return Document(page_content=str(text), metadata=data_copy)

    def to_lc_message(
        self,
    ) -> BaseMessage:
        """Converts the Data to a BaseMessage.

        Returns:
            BaseMessage: The converted BaseMessage.
        """
        # The idea of this function is to be a helper to convert a Data to a BaseMessage
        # It will use the "sender" key to determine if the message is Human or AI
        # If the key is not present, it will default to AI
        # But first we check if all required keys are present in the data dictionary
        # they are: "text", "sender"
        if not all(key in self.data for key in ["text", "sender"]):
            msg = f"Missing required keys ('text', 'sender') in Data: {self.data}"
            raise ValueError(msg)
        sender = self.data.get("sender", MESSAGE_SENDER_AI)
        text = self.data.get("text", "")
        files = self.data.get("files", [])
        if sender == MESSAGE_SENDER_USER:
            if files:
                contents = [{"type": "text", "text": text}]
                for file_path in files:
                    image_url = create_data_url(file_path)
                    contents.append({"type": "image_url", "image_url": {"url": image_url}})
                human_message = HumanMessage(content=contents)
            else:
                human_message = HumanMessage(
                    content=[{"type": "text", "text": text}],
                )

            return human_message

        return AIMessage(content=text)

    def __getattr__(self, key):
        """Allows attribute-like access to the data dictionary."""
        try:
            if key.startswith("__"):
                return self.__getattribute__(key)
            if key in {"data", "text_key"} or key.startswith("_"):
                return super().__getattr__(key)
            return self.data[key]
        except KeyError as e:
            # Fallback to default behavior to raise AttributeError for undefined attributes
            msg = f"'{type(self).__name__}' object has no attribute '{key}'"
            raise AttributeError(msg) from e

    def __setattr__(self, key, value) -> None:
        """Set attribute-like values in the data dictionary.

        Allows attribute-like setting of values in the data dictionary,
        while still allowing direct assignment to class attributes.
        """
        if key in {"data", "text_key"} or key.startswith("_"):
            super().__setattr__(key, value)
        elif key in self.model_fields:
            self.data[key] = value
            super().__setattr__(key, value)
        else:
            self.data[key] = value

    def __delattr__(self, key) -> None:
        """Allows attribute-like deletion from the data dictionary."""
        if key in {"data", "text_key"} or key.startswith("_"):
            super().__delattr__(key)
        else:
            del self.data[key]

    def __deepcopy__(self, memo):
        """Custom deepcopy implementation to handle copying of the Data object."""
        # Create a new Data object with a deep copy of the data dictionary
        return Data(data=copy.deepcopy(self.data, memo), text_key=self.text_key, default_value=self.default_value)

    # check which attributes the Data has by checking the keys in the data dictionary
    def __dir__(self):
        return super().__dir__() + list(self.data.keys())

    def __str__(self) -> str:
        # return a JSON string representation of the Data attributes
        try:
            data = {k: v.to_json() if hasattr(v, "to_json") else v for k, v in self.data.items()}
            return serialize_data(data)  # use the custom serializer
        except Exception:  # noqa: BLE001
            logger.opt(exception=True).debug("Error converting Data to JSON")
            return str(self.data)

    def __contains__(self, key) -> bool:
        return key in self.data

    def __eq__(self, /, other):
        return isinstance(other, Data) and self.data == other.data

# unit tests

# Basic Valid Inputs

To edit these changes, run `git checkout codeflash/optimize-pr7033-2025-03-12T12.38.02`, commit your edits, and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Mar 12, 2025
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. enhancement New feature or request labels Mar 12, 2025

codspeed-hq bot commented Mar 12, 2025

CodSpeed Performance Report

Merging #7035 will degrade performance by 10.49%

Comparing codeflash/optimize-pr7033-2025-03-12T12.38.02 (a447cef) with codeflash/optimize-pr7032-2025-03-12T12.21.16 (4221dec)

Summary

⚡ 1 improvement
❌ 1 regression
✅ 17 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| test_build_flow_invalid_job_id | 12.5 ms | 9.1 ms | +37.99% |
| test_cancel_nonexistent_build | 9.4 ms | 10.5 ms | -10.49% |
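
The sign and rough size of the Change column are consistent with dividing the BASE time by the HEAD time and subtracting one. That formula is inferred from the numbers above, not taken from CodSpeed documentation, and `relative_change` is a hypothetical helper:

```python
def relative_change(base_ms: float, head_ms: float) -> float:
    """Assumed formula: positive when HEAD is faster than BASE."""
    return base_ms / head_ms - 1.0


# Timings as displayed in the report, rounded to one decimal place.
benchmarks = [
    ("test_build_flow_invalid_job_id", 12.5, 9.1),
    ("test_cancel_nonexistent_build", 9.4, 10.5),
]

for name, base, head in benchmarks:
    print(f"{name}: {relative_change(base, head):+.2%}")
```

With the rounded timings this gives roughly +37.4% and -10.5%. The small gap to the reported +37.99% and -10.49% is presumably because the report computes the change from unrounded measurements.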
