Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support selecting directly from http:// / https:// urls in datafusion-cli #9133

Closed
alamb opened this issue Feb 5, 2024 · 6 comments · Fixed by #9150
Closed

Support selecting directly from http:// / https:// urls in datafusion-cli #9133

alamb opened this issue Feb 5, 2024 · 6 comments · Fixed by #9150
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Feb 5, 2024

Is your feature request related to a problem or challenge?

After #8753 it is now possible to read data from http via a create external table command:

❯ create external table hits stored as parquet location 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 rows in set. Query took 0.178 seconds.

❯ describe hits;
+-----------------------+-----------+-------------+
| column_name           | data_type | is_nullable |
+-----------------------+-----------+-------------+
| WatchID               | Int64     | YES         |
| JavaEnable            | Int16     | YES         |
| Title                 | Binary    | YES         |
...
| RefererHash           | Int64     | YES         |
| URLHash               | Int64     | YES         |
| CLID                  | Int32     | YES         |
+-----------------------+-----------+-------------+
105 rows in set. Query took 0.003 seconds.

After #9064 from @manoj-inukolunu it is possible to COPY to a remote url which is also great.

However, it is not yet possible to select directly from a remote store like

select * from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';

Describe the solution you'd like

I would like to be able to select directly from a remote http source like

select * from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet' limit 1;

Error during planning: table 'datafusion.public.https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet' not found

This works great for local files:

select * from '/Users/andrewlamb/Downloads/hits.parquet' limit 1;
+---------------------+------------+---------------+-----------+------------+-----------+-----------+------------+----------+----------------------+--------------+----+-----------+---------------------------------------+---------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+------------------------------------------------------------+-----------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+---------------------+---------------------+------+
| WatchID             | JavaEnable | Title         | GoodEvent | EventTime  | EventDate | CounterID | ClientIP   | RegionID | UserID               | CounterClass | OS | UserAgent | URL                                   | Referer | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID | OriginalURL                                                | HID       | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP   | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash         | URLHash             | CLID |
+---------------------+------------+---------------+-----------+------------+-----------+-----------+------------+----------+----------------------+--------------+----+-----------+---------------------------------------+---------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+------------------------------------------------------------+-----------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+---------------------+---------------------+------+
| 9153127107923182022 | 1          | Участи NEWSru | 1         | 1373034098 | 15891     | 225510    | 1703485140 | 2        | -6224091410790412093 | 0            | 2  | 3         | http://liver.ru/belgorod/page=1024&wi |         | 0         | 0                 | 0               | 14328         | 22          | 2038            | 730              | 23              | 15         | 2          | 502         | 0        | 0        | 5              | D�             | 1            | 1                | 0        | 0           |                  |        | 4168741     | 0              | 0              |              | 0           | 0           | 1058              | 549                | 135            | 2035708370      | 0                   | 0                   | 0                   | 0                   | windows     | 1601        | 0      | 0          | 0           | 0       | http://video.yandex.ru/uglichnevyj-97442434830%20%D1%8C%20 | 298722980 | 0            | 0       | 0           | 0             | 0        | 5        | 1373021451     | 0   | 0   | 0      | 0         | 0         | 1961866254 | -1         | -1         | -1            | S0              | h1             |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH            | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -296158784638538920 | 7011450103338277684 | 0    |
+---------------------+------------+---------------+-----------+------------+-----------+-----------+------------+----------+----------------------+--------------+----+-----------+---------------------------------------+---------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+------------------------------------------------------------+-----------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+---------------------+---------------------+------+
1 row in set. Query took 0.121 seconds.

Describe alternatives you've considered

No response

Additional context

I think the trick is intercepting requested URLs / references in DynamaicFileCatalog and calling the appropriate object store registration function (e.g. what is in #9064 )

@alamb alamb added the enhancement New feature or request label Feb 5, 2024
@alamb alamb changed the title Support selecting directly from http:// urls in datafusion-cli Support selecting directly from http:// / https:// urls in datafusion-cli Feb 5, 2024
@alamb alamb added the good first issue Good for newcomers label Feb 5, 2024
@r3stl355
Copy link
Contributor

r3stl355 commented Feb 5, 2024

I'd like to give this a try, please

@r3stl355
Copy link
Contributor

r3stl355 commented Feb 5, 2024

Is there an easy way to run cli locally from source without going through Docker (I'm on Mac M2)? Don't mind a "no" answer, just want to find an easiest way to get started, thanks.

@alamb
Copy link
Contributor Author

alamb commented Feb 6, 2024

Is there an easy way to run cli locally from source without going through Docker (I'm on Mac M2)? Don't mind a "no" answer, just want to find an easiest way to get started, thanks.

I normally do

cd datafusion/datafusion-cli
cargo run

(the key is that datafusion-cli is not part of the workspace

@r3stl355
Copy link
Contributor

r3stl355 commented Feb 6, 2024

Thanks @alamb, that worked, I was clearly after something more complex and didn't think about that option 🤦

@r3stl355
Copy link
Contributor

r3stl355 commented Feb 6, 2024

take

@r3stl355
Copy link
Contributor

r3stl355 commented Feb 6, 2024

yay, it works 🎆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
2 participants