Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception handling in PipeUtil::read() #229

Merged
merged 6 commits into from
May 16, 2022

Conversation

navinrathore
Copy link
Contributor

Here is a replaced Fraft PR to handle PipeUtil::Read() exception handling.

  • The ZinggClientException has been used for the purpose. Should there be different exception used? If yes, it should better be derived from Exception
  • The two variations. read() and readWithException() may not be needed. As our code is already handling these scenario using catching "Exception" and do nothing. single read() with exception is fine.
  • Use of ZinggClientException in function signatures is done to satisfy exception throw mechanism and compiler
  • error messages like below is printed on cli.
zingg.client.Client - Apologies for this message. Zingg has encountered an error. Path does not exist: file:/home/work/product/final/zingg-1/models/900/trainingData/marked

Please provide your inputs on the approach.
Note: Though major flows have been tested. Still more testing required.

@@ -41,12 +41,12 @@ public void execute() throws ZinggClientException {

public void processRecordsCli(Dataset<Row> lines) throws ZinggClientException {
LOG.info("Processing Records for CLI updateLabelling");
getMarkedRecordsStat(lines);
printMarkedRecordsStat();
if (lines == null || lines.count() == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to have a clean flow without returns - please put in if lines != null as discussed earlier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed Return.
Earlier thought that following code block was already very big. It should not be cluttered further.

@@ -75,11 +75,11 @@ protected void getMarkedRecordsStat(Dataset<Row> markedRecords) {

public void processRecordsCli(Dataset<Row> lines) throws ZinggClientException {
LOG.info("Processing Records for CLI Labelling");
printMarkedRecordsStat();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see earlier comment about returns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed return statement.

@@ -51,7 +51,7 @@ public Dataset<Row> getUnmarkedRecords() throws ZinggClientException {
unmarkedRecords = PipeUtil.read(spark, false, false, PipeUtil.getTrainingDataUnmarkedPipe(args));
try {
markedRecords = PipeUtil.read(spark, false, false, PipeUtil.getTrainingDataMarkedPipe(args));
} catch (Exception e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont we still need to catch the other expcetions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read() throws only ZinggClientException.
Earlier, why didnot read() mandate to ctch Exception everywhere it is called. Is there any difference in Exception and ZinggClientException?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes ZCE extends from throwable

@@ -41,11 +41,12 @@ public Matcher() {
setZinggOptions(ZinggOptions.MATCH);
}

protected Dataset<Row> getTestData() {
return PipeUtil.read(spark, true, args.getNumPartitions(), true, args.getData());
protected Dataset<Row> getTestData() throws ZinggClientException{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To satisfy the requirement of catching or throwing the ZinggClientException. All the intermediate functions are mandating to add this.
In this regard, are types Exception and ZinggClientException different?

input = reader.load();
}
if (addSource) {
input = input.withColumn(ColName.SOURCE_COL, functions.lit(p.getName()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we throw an exception here, how will we handle cases where we dont want exceptions - eg findtrainingData

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't throwing any new exception here. Catching one and throwing another.
So if an exception is already thrown (that is indeed done in spark.read() in this case), it is handled somewhere (first catch block). So with this change, there will not be any impact in program behavior. Just things made explicit to have the exception handled by the caller. the specific type of the exception. Else, we are loosing the error message in the process.

Still, we have to see which one to use 'Exception' or 'ZinggClientException' in the flow.
We are already handling Exception in top level classes. Why are we loosing ex.getMessage(). I note that intermediate function calls do not h ave explicit exception specification.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, makes sense. but the message here has to be that we could not read the and also wrap the actual exception thrown

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used wrapped exception.

@navinrathore navinrathore marked this pull request as ready for review May 5, 2022 10:39
getMarkedRecordsStat(lines);
printMarkedRecordsStat();
if (lines == null || lines.count() == 0) {
LOG.info("There is no marked record for updating. Please run findTrainingData/label jobs to generate training data.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to still print this out in else, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. made changes in Labeller and UpdateLabeller.

sonalgoyal and others added 2 commits May 16, 2022 10:44
Pipe's props is initialized with blank hashmap. makes getProps() never return null
@sonalgoyal sonalgoyal merged commit cb39e21 into zinggAI:main May 16, 2022
@navinrathore navinrathore deleted the zProperErrors branch June 1, 2022 04:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants