Issue
In dbSendQuery the disposition parameter is hardcoded and set to EXTERNAL_LINKS (for dbGetQuery on the other hand we can set the disposition ourselves). That is, in an environment where blob.core.windows.net is blocked by teh firewall, we cannot use dbSendQuery.
The problem becomes apparent if we want to use tbl, because under the hood it will eventually call dbSendQuery.
Call Chain: tbl() → dbSendQuery()
tbl(con, "table")
└─ dbplyr::tbl.DBIConnection()
└─ dbplyr::tbl_sql()
└─ builds a lazy "tbl_lazy" object
└─ NO SQL executed yet — only triggered by collect()!
collect(lazy_tbl)
└─ dbplyr::collect.tbl_sql()
└─ dbplyr::db_collect()
└─ DBI::dbFetch()
└─ DBI::dbSendQuery()
└─ brickster:::dbSendQuery,DatabricksConnection
└─ db_sql_exec_query(disposition = "EXTERNAL_LINKS") ← 💥
| Step |
Package |
What happens |
tbl() |
dbplyr |
Builds a query plan only, no SQL sent |
collect() |
dbplyr |
Triggers actual execution |
dbFetch() |
DBI |
Fetches the results |
dbSendQuery() |
DBI → brickster |
Sends SQL to Databricks — this is where the hardcoded disposition = "EXTERNAL_LINKS" is applied, with no way to override it |
tbl() itself does not interact with the database at all. The critical point is collect(), which triggers the full dbplyr → DBI → brickster chain — and there is no way to pass disposition = "INLINE" through this path.
Desired Behaviour
Ideally there would be an option, with which we could override the hard coded disposition to allow for using tbl also on systems with no connection to blob.core.windows.net.
Critical Code Part
|
#' Send query to Databricks (asynchronous) |
|
#' @param conn A DatabricksConnection object |
|
#' @param statement SQL statement to execute |
|
#' @param ... Additional arguments (ignored) |
|
#' @returns A DatabricksResult object |
|
#' @export |
|
setMethod( |
|
"dbSendQuery", |
|
signature = c(conn = "DatabricksConnection", statement = "character"), |
|
function(conn, statement, ...) { |
|
db_assert_statement(statement) |
|
|
|
# Execute query asynchronously |
|
resp <- db_sql_exec_query( |
|
warehouse_id = conn@warehouse_id, |
|
statement = statement, |
|
catalog = if (nzchar(conn@catalog)) conn@catalog else NULL, |
|
schema = if (nzchar(conn@schema)) conn@schema else NULL, |
|
disposition = "EXTERNAL_LINKS", |
|
format = "ARROW_STREAM", |
|
wait_timeout = "0s", # Async execution |
|
host = conn@host, |
|
token = conn@token |
|
) |
|
|
|
# Create result object |
|
new( |
|
"DatabricksResult", |
|
statement_id = resp$statement_id, |
|
statement = statement, |
|
connection = conn, |
|
completed = FALSE, |
|
rows_fetched = 0 |
|
) |
|
} |
|
) |
Error Message
library(DBI)
library(brickster)
library(dbplyr)
library(dplyr)
warehouse_id <- 12345
con <- dbConnect(
DatabricksSQL(),
warehouse_id = warehouse_id
)
con %>%
tbl(I("my_catalog.my_schema.my_table")) %>%
head(5L)
## Error in `collect()`:
## ! Failed to collect lazy table.
## Caused by error in `httr2::req_perform()`:
## ! Failed to perform HTTP request.
## Caused by error in `curl::curl_fetch_memory()`:
## ! SSL connect error [dbstoragejcmu2vp2x6lnw.blob.core.windows.net]:
## Recv failure: Connection was reset
Issue
In
dbSendQuerythe disposition parameter is hardcoded and set toEXTERNAL_LINKS(fordbGetQueryon the other hand we can set the disposition ourselves). That is, in an environment whereblob.core.windows.netis blocked by teh firewall, we cannot usedbSendQuery.The problem becomes apparent if we want to use
tbl, because under the hood it will eventually calldbSendQuery.Call Chain:
tbl()→dbSendQuery()tbl()collect()dbFetch()dbSendQuery()disposition = "EXTERNAL_LINKS"is applied, with no way to override ittbl()itself does not interact with the database at all. The critical point iscollect(), which triggers the full dbplyr → DBI → brickster chain — and there is no way to passdisposition = "INLINE"through this path.Desired Behaviour
Ideally there would be an option, with which we could override the hard coded
dispositionto allow for usingtblalso on systems with no connection toblob.core.windows.net.Critical Code Part
brickster/R/databricks-dbi.R
Lines 227 to 262 in 8ea7f4c
Error Message