Skip to content

Conversation

outtersg
Copy link
Contributor

@outtersg outtersg commented Oct 5, 2024

As stated in the UPGRADING, using the passthrough ("single-row") mode of libpq (introduced in #15287) forbids passing a new query while the current one's results have not been entirely consumed.
… But I didn't notice that ext/pdo_pgsql internally used new queries to fetch metadata (example use case: a call to getColumnMeta() while fetch()ing row by row will interleave the getColumnMeta()-triggered internal query to the database, with the results fetching for the user-called query).

This PR makes those internal calls return NULL for non-essential metadata, instead of letting libpq abort the user-called query.

It moreover includes a small tweak to table oid-to-name translation, with a 1-slot cache.
This may by chance allow the internal call to return something instead of NULL,
but it will nonetheless avoid 30 server calls to get the table name of 30 columns of the same table.

@outtersg
Copy link
Contributor Author

outtersg commented Oct 7, 2024

I noticed this week-end that pgsql_driver.c too was full of PQexec (for internal needs).
So those calls too need to be protected the same way (by ensuring that no running statement has been let in a running state).

So 44afde3 exposes pgsql_stmt_finish(), and makes pgsql_driver.c use it extensively (only when the last run statement is in single-row / unbuffered / lazy fetch mode).

… But then I noticed some patterns that could be factorized in pgsql_stmt_finish() (while((pgsql_result = PQgetResult(H->server))) PQclear(pgsql_result);).
I'm a bit stressed that factorized code could break things that presently work, so for now I just put a comment showing the mutualization opportunity (and the result is quite stable, I'm currently testing it with production cases),
but should I fully explore finishing the code mutualization?

@devnexen
Copy link
Member

devnexen commented Oct 8, 2024

… But then I noticed some patterns that could be factorized in pgsql_stmt_finish() (while((pgsql_result = PQgetResult(H->server))) PQclear(pgsql_result);). I'm a bit stressed that factorized code could break things that presently work, so for now I just put a comment showing the mutualization opportunity (and the result is quite stable, I'm currently testing it with production cases), but should I fully explore finishing the code mutualization?

Sure, you can always explore in your own fork for a while and see how it goes. 8.5 release is in 1 year + :)

@outtersg
Copy link
Contributor Author

@devnexen wrote:

Sure, you can always explore in your own fork for a while and see how it goes. 8.5 release is in 1 year + :)

That's a good point!

Now as I both rebased and amended, the changes are not that clear in the last diff :-\
Here starts the change to pdo_pgsql.

@outtersg outtersg force-pushed the gh15287-columnmeta branch 4 times, most recently from be8ac25 to 75fd500 Compare October 24, 2024 04:53
@outtersg outtersg marked this pull request as ready for review November 26, 2024 21:04
@outtersg
Copy link
Contributor Author

outtersg commented Nov 26, 2024

Now that 8.4 is out, it's time to think of 8.5.

Although the nature of the PR (fixing an incompatibility between prefetch mode and metadata reading) could have made it a candidate for a 8.4.x,
the changes make it sufficently risky that I prefer keeping the 8.5 target (and if necessary document the 8.4 with a warning that "for now, entering the prefetch mode prevents using getColumnMeta() or anythng else meta calling the database's information schema").

This is a first, working pass on the subject, while I found time to code it: it is a technically self-sustainable fix, although we could go further in what pgsql_finish_running_stmt() is a first try at: separating functional PQexecs (the ones from exec(), passing the user queries) from technical ones (called internally by getColumnMeta() and such).
Then:

  • any internal query, depending on a mandatory parameter, would either throw ("Cannot access the meta schema in the middle of a result consumption") or silently return nothing (e.g. in getColumnMeta(), if we cannot resolve a user-defined type to its pretty name, we can still return its OID).
  • later, when implementing libpq's pipeline mode, we could implement in this centralized function the complexity of taking advantage of the pipeline mode, to direct those synchronous meta-select queries to a dedicated "channel" where they could execute even while the user-pulled result is in the middle of its fetch.

But as I am not sure of the time I could devote to this ideal solution,
let's keep this PR focused on its first goal of just preventing unexplainable libpq error codes.

…lumnMeta()

- each call queried the DB to know the name associated with the table's OID:
  cache the result between two calls
- make pdo_pgsql_translate_oid_to_table higher-level,
  with the last parameter being the handle instead of the raw connection;
  thus the statement is cleaner, letting the handle do all memory handling on the table oid-to-name translation cache
  (which by the way is a driver feature more than a statement one)
doing an (internal) query to fetch metadata from the server broke the currently-running query;
additionnally, fix the condition to interrupt the previous request (we shall test if *it* was lazy, not if the *new* one will)
@outtersg outtersg force-pushed the gh15287-columnmeta branch from 75fd500 to d30395f Compare August 17, 2025 21:01
@outtersg
Copy link
Contributor Author

@devnexen, @SakiTakamachi, I didn't see the time go by, and it seems to me that including this robustness PR would be a good thing to avoid dragging ourselves around with a feature still holding the risk of unexpectedly truncated results or even garbled connection: the safeguards added in this PR (before each PQexec()) would be more reliable than simply documenting that "some functions that may internally call the DB server should not be used while in ATTR_PREFETCH mode".

May you take time to reevaluate its cleanliness?

@devnexen
Copy link
Member

devnexen commented Sep 2, 2025

Hi @outtersg neither did I notice the time passing. Once 8.5 branch is created, I ll circle back to it.

@devnexen
Copy link
Member

@devnexen, @SakiTakamachi, I didn't see the time go by, and it seems to me that including this robustness PR would be a good thing to avoid dragging ourselves around with a feature still holding the risk of unexpectedly truncated results or even garbled connection: the safeguards added in this PR (before each PQexec()) would be more reliable than simply documenting that "some functions that may internally call the DB server should not be used while in ATTR_PREFETCH mode".

May you take time to reevaluate its cleanliness?

Hi @outtersg now is more timely, I ll a deeper look sometime this weekend, thanks for your effort (and patience).

@devnexen devnexen self-assigned this Sep 25, 2025
@devnexen
Copy link
Member

looking good minus the minor nit.

…l queries

not only the statements, but the driver, love to make internal queries. We make sure no unfinished query still runs when having to pass an internal one.
by the way factorize the loops that consumed the preceding query's results
we unconditionally set it, but conditionally unset it, letting a dangling pointer that was tentatively freed a second time
@devnexen devnexen closed this in 8982351 Sep 28, 2025
@devnexen
Copy link
Member

@outtersg unfortunately the fix brings memory issues, one memory leak and above all a UAF. I ll have to revert it for the time being so we have time to have a closer look.

@devnexen
Copy link
Member

@outtersg
Copy link
Contributor Author

outtersg commented Sep 29, 2025

@outtersg unfortunately the fix brings memory issues, one memory leak and above all a UAF. I ll have to revert it for the time being so we have time to have a closer look.

OK, the memory leak is easy.
Inside the function I added an estrdup to the cache (in addition to the existing estrdup to the return value): 2 allocations.
But for the returned value, when calling the function, when I replaced the use of the function's return value in add_assoc_string() by the use of the cached value, I removed the efree of the returned value. Thus we loose this efree, and only have 1 cleanup left (the one of the cache that has its own cycle, freing when a new cache takes its place, or on destruction). 2 allocations vs 1 free make the leak.

I think my intention was to only rely on cached_table_name and its long lifecycle. However the direct use of it is bad design because cached_table_name should be considered private to pdo_pgsql_translate_oid_to_table (and to cleanup functions of course): a caller should never access it directly (supposing it has been updated in pdo_pgsql_translate_oid_to_table()), the only existing contract is that pdo_pgsql_translate_oid_to_table() returns the table name.

So in any case, the caller should use the returned value, not cached_table_name. What gets interesting is the relation between both:

  • as they are coded now, they are independent (each one is allocated, and is (or should be) freed)
    this is robust and respects principle of least astonishment (each caller that gets a frees it, period).
    However we get 3 allocations (one for cached_table_name, one for the returned table_name, and one internal to add_assoc_string()).
  • however here the returned table_name is immediately consumed by add_assoc_string(), then freed. If we can guarantee this evanescent use will always work this way, we could simplify by telling that pdo_pgsql_translate_oid_to_table's returned string should always be copied elsewhere before the next call to pdo_pgsql_translate_oid_to_table(), or risk being overwritten.

I've chosen this second, conservative solution, restoring the heavy but secure triple-alloc.

@outtersg
Copy link
Contributor Author

outtersg commented Sep 29, 2025

The UAF is an egg-and-chicken problem: when statements are emitted, each one gets a copy of the handle.

When the statements are destroyed, they ensure the handle does not reference them anymore as the current statement.

However, on shutdown, zend_gc_collect_cycles calls all destructors out-of-order:
As seen in the ASAN output, pdo_dbh_free_storage() has been called before pdo_dbstmt_free_storage().

We'd need pdo_dbh_free_storage() able to notify each of its linked statement to detach from the tangling handle.

At least notifying the handle's running_stmt would avoid the problem on the last statement created, however can we guaranty there are not two statements yet to be destroyed? (the one we can notify, thus prevent to crash, as it is referenced by the handle; and the one we don't know anymore, as the reference to it got overwritten).

But in fact, I feel focusing on only the current statement is wrong, and, even if not familiar with internal memory handling, I feel there should be some kind of reference counting on the handle, to avoiding it being freed before all of its statements have been.
I'll have a look at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants