Double similarity function - null pointer exception #369

Merged: 4 commits, Jul 13, 2022


navinrathore (Contributor)

  • Fixed config.json: MatchType NUMERIC_WITH_UNITS has changed to NUMBER_WITH_UNITS.
  • Added null checks in DoubleSimilarityFunction, plus a check for Double.NaN.
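
The null and NaN handling described above can be sketched as follows. This is only an illustration and not the code merged in this PR: the class name `DoubleSimilaritySketch`, the `MISSING_SCORE` constant, and the similarity formula are all assumptions made for the example. The point is the order of the guards: null is tested before anything unboxes the `Double`, and NaN is tested before any arithmetic.

```java
public class DoubleSimilaritySketch {

    // Hypothetical score for a missing value; the value Zingg actually uses is not stated here.
    static final double MISSING_SCORE = 0.0;

    static double similarity(Double first, Double second) {
        // Null check comes first: unboxing a null Double throws NullPointerException.
        if (first == null || second == null) {
            return MISSING_SCORE;
        }
        // NaN propagates through arithmetic, so treat it like a missing value.
        if (first.isNaN() || second.isNaN()) {
            return MISSING_SCORE;
        }
        double a = first;
        double b = second;
        double scale = Math.abs(a) + Math.abs(b);
        if (scale == 0.0) {
            return 1.0; // both values are zero, so they are identical
        }
        // Assumed formula: 1 minus the normalized absolute difference, which lies in [0, 1].
        // Infinite inputs are not handled in this sketch.
        return 1.0 - Math.abs(a - b) / scale;
    }

    public static void main(String[] args) {
        System.out.println(similarity(null, 2.0));
        System.out.println(similarity(Double.NaN, 2.0));
        System.out.println(similarity(1.0, 3.0));
    }
}
```

Returning a fixed score for missing inputs keeps the function total, so a single bad row cannot fail the whole job.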

@navinrathore (Contributor Author)

Tested with:
  • the febrl standard dataset, and a variant updated with Double fields/values
  • the amazon-google dataset
  • the iTunes-amazon dataset

@navinrathore (Contributor Author)

Noticed that for the amazon-google dataset, the header of the google.csv file does not match the column names in the schema. Zingg seems to work fine, but it logs the warning below.

2022-06-28 11:39:15,950 [main] WARN  zingg.util.PipeUtil - Reading Pipe [name=google, format=CSV, preprocessors=null, props={location=examples/amazon-google/GoogleProducts.csv, delimiter=,, header=true}, schema=StructType(StructField(id,StringType,false), StructField(title,StringType,true), StructField(description,StringType,true), StructField(manufacturer,StringType,true), StructField(price,DoubleType,true))]
 2022-06-28 11:39:16,248 [Executor task launch worker for task 1.0 in stage 10.0 (TID 39)] WARN  org.apache.spark.sql.catalyst.csv.CSVHeaderChecker - CSV header does not conform to the schema.
 Header: id, name, description, manufacturer, price
 Schema: id, title, description, manufacturer, price
Expected: title but found: name
CSV file: file:///home/navin/workDir/zingg-1/examples/amazon-google/GoogleProducts.csv
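
To make the warning concrete, here is a minimal standalone sketch of a positional header-versus-schema comparison. It is not Spark's `CSVHeaderChecker`; the class `HeaderCheckSketch` and the method `mismatches` are hypothetical names used only for illustration. It is run on the header and schema names from the log above.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderCheckSketch {

    // Compares header names to schema field names position by position.
    static List<String> mismatches(String[] header, String[] schema) {
        List<String> problems = new ArrayList<>();
        if (header.length != schema.length) {
            problems.add("Expected " + schema.length + " columns but found " + header.length);
            return problems;
        }
        for (int i = 0; i < schema.length; i++) {
            String found = header[i].trim(); // tolerate spaces after the delimiter
            if (!schema[i].equals(found)) {
                problems.add("Expected: " + schema[i] + " but found: " + found);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String[] header = "id, name, description, manufacturer, price".split(",");
        String[] schema = {"id", "title", "description", "manufacturer", "price"};
        for (String problem : mismatches(header, schema)) {
            System.out.println(problem);
        }
    }
}
```

Here only the second column differs (`name` in the file, `title` in the schema), so renaming one of the two to match should silence the warning.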

@sonalgoyal (Member) left a comment

  • Add tests for some double values.
  • Rename the variable.


@Test
public void testFirstNumIsNull() {
DoubleSimilarityFunction isNull = new DoubleSimilarityFunction();
@sonalgoyal (Member)

name of variable?

@navinrathore (Contributor Author)

changed to simFn

@navinrathore (Contributor Author)

Added some tests

@sonalgoyal sonalgoyal merged commit 6a68dd9 into zinggAI:main Jul 13, 2022
@navinrathore navinrathore deleted the DoubleSimilartyFnException branch July 13, 2022 12:17