Feature/geopackage reader #1603
base: master
Conversation
As a follow-up to this MR, I need to add
Thank you for this great work! My major concern is whether it works with GeoPackage files stored on cloud storage such as HDFS or S3. As far as I know
@Kontinuation good point, I'll write a test to make sure it works. An integration test with MinIO would be enough, I think.
WIP
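A minimal sketch of the MinIO-backed integration test discussed above, using Testcontainers to stand in for S3 (the image tag, credentials, and the `spark` session are assumptions, not taken from the PR):

```scala
import org.testcontainers.containers.GenericContainer
import org.testcontainers.utility.DockerImageName

// Hypothetical setup: start a MinIO container and point Hadoop's s3a
// connector at it, so reading s3a:// paths exercises the cloud-storage code path.
val minio = new GenericContainer(DockerImageName.parse("minio/minio:latest"))
  .withCommand("server", "/data")
  .withExposedPorts(9000)
  .withEnv("MINIO_ROOT_USER", "minioadmin")
  .withEnv("MINIO_ROOT_PASSWORD", "minioadmin")
minio.start()

val endpoint = s"http://${minio.getHost}:${minio.getMappedPort(9000)}"
spark.conf.set("spark.hadoop.fs.s3a.endpoint", endpoint)
spark.conf.set("spark.hadoop.fs.s3a.access.key", "minioadmin")
spark.conf.set("spark.hadoop.fs.s3a.secret.key", "minioadmin")
spark.conf.set("spark.hadoop.fs.s3a.path.style.access", "true")
// ...upload a .geopackage fixture to a bucket, then read it back
// via spark.read.format(...).load("s3a://bucket/test.geopackage")
```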
```scala
// skip srid for now
reader.getInt()

skipEnvelope(resolvedFlags._1, reader)

val wkb = new Array[Byte](reader.remaining())
reader.get(wkb)

val wkbReader = new org.locationtech.jts.io.WKBReader()
val geom = wkbReader.read(wkb)
```
I suggest that we take SRID into consideration. `val wkbReader = new WKBReader(new GeometryFactory(new PrecisionModel(), srid))` would be sufficient.
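Applied to the snippet above, that suggestion could look like this. This is a sketch, assuming `reader`, `skipEnvelope`, and `resolvedFlags` behave as in the PR's code, and that the SRID is the int32 field the current code skips:

```scala
import org.locationtech.jts.geom.{GeometryFactory, PrecisionModel}
import org.locationtech.jts.io.WKBReader

// Read the SRID from the GeoPackage binary header instead of skipping it.
val srid: Int = reader.getInt()
skipEnvelope(resolvedFlags._1, reader)

val wkb = new Array[Byte](reader.remaining())
reader.get(wkb)

// WKB itself carries no SRID, so build the reader with a GeometryFactory
// that tags every parsed geometry with the SRID from the header.
val wkbReader = new WKBReader(new GeometryFactory(new PrecisionModel(), srid))
val geom = wkbReader.read(wkb)
```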
```scala
if (pathString.toLowerCase(Locale.ROOT).endsWith(".geopackage")) {
  val path = new Path(pathString)
  val fs = path.getFileSystem(hadoopConf)

  val isDirectory = Try(fs.getFileStatus(path).isDirectory).getOrElse(false)
  if (isDirectory) {
    pathString
  } else {
    pathString.substring(0, pathString.length - 3) + "???"
  }
```
If I understand it correctly, if `pathString` ends with ".geopackage" and it is not a directory, it will be transformed to "****.geopack???". I cannot grasp the idea of this transformation.
Yeah, you are right. I was sure that at this point I already had the list of files in the directory. I am wondering how it would behave if I passed a list of files, and whether this transformation is actually needed (e.g. a user specifying paths with different file formats).
```scala
val serializableConf = new SerializableConfiguration(
  sparkSession.sessionState.newHadoopConfWithOptions(options.asScala.toMap))

val tempFile = FileSystemUtils.copyToLocal(serializableConf.value, files.head.getPath)
```
Can we detect if the path is a local path and skip calling `copyToLocal`?
Yeah, sure.
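One possible shape for that check, sketched around the snippet above. The `LocalFileSystem` match and the assumption that `FileSystemUtils.copyToLocal` returns a `java.io.File` are guesses, not confirmed by the PR:

```scala
import java.io.File
import org.apache.hadoop.fs.{LocalFileSystem, Path}

val filePath: Path = files.head.getPath
val fs = filePath.getFileSystem(serializableConf.value)

val tempFile: File = fs match {
  case _: LocalFileSystem =>
    // Already on the local filesystem: use the file in place, no copy needed.
    new File(filePath.toUri.getPath)
  case _ =>
    // Remote filesystem (HDFS, S3A, ...): download to a local temp file first.
    FileSystemUtils.copyToLocal(serializableConf.value, filePath)
}
```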
@Kontinuation thanks for the review!
Force-pushed from 767b8ca to 5b88709.
```xml
@@ -98,6 +97,36 @@
       <groupId>org.locationtech.jts</groupId>
       <artifactId>jts-core</artifactId>
     </dependency>
     <dependency>
       <groupId>org.testcontainers</groupId>
```
Is there a reason why these dependencies only appear in the `spark-3.3` profile?
The test for loading from S3 (I used MinIO with Testcontainers) is only in Spark 3.3. I can duplicate it to the other Spark versions as well.
Please move these to the `spark-common` pom.xml so all Spark 3.x pom.xml files will share them.
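A hedged sketch of what that move might look like in the shared pom.xml; the module path and the `${testcontainers.version}` property name are assumptions:

```xml
<!-- In spark/common/pom.xml, inherited by every Spark 3.x build profile,
     so the dependency does not have to be repeated per profile. -->
<dependency>
  <groupId>org.testcontainers</groupId>
  <artifactId>testcontainers</artifactId>
  <version>${testcontainers.version}</version>
  <scope>test</scope>
</dependency>
```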
Do we support other Spark versions 3.0, 3.1, and 3.2?
I didn't test it with those; I can add the data source to those versions as well.
Yes, if the implementation differs among the versions, we should definitely replicate the tests and run them for each version.
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
What changes were proposed in this PR?
GeoPackage data source.
How was this patch tested?
Integration tests.
Did this PR include necessary documentation updates?