Export labeled data from zingg learner and import it to new zingg instance/model #117
Comments
Thanks for reporting, @delta824. What would be the values of the columns you want to add?
Text and/or numbers.
What would be the default values?
For example, the dataset(s) were labeled using Zingg's learner based on 5 columns. Later on, an additional column will be appended to the dataset(s) to provide more details and help improve match accuracy. Zingg could output the labeled data from the learner in the same format as the "Using preexisting training data" feature described in #115 and https://docs.zingg.ai/docs/setup/training/addOwnTrainingData.html
I see your point. The problem here is: what value should Zingg assign to the newly added 6th column in the old labelled data, which has only 5 columns? When building the model, we need to learn from all the columns collectively, so we can't leave it blank or null, as that is not representative of how the 6th column will look in the data.
Is it possible to rebuild the model from scratch and learn from all columns collectively, including the newly added 6th column, using the exported labeled data? This would be similar to how the model is built when using pre-existing training data. So it wouldn't be a modification of the existing model, but rather a full rebuild with new values for all columns.
Yes, that is definitely possible. In fact, every time you run train, a new model is created which overwrites the last one. @delta824, would something like this work: a Zingg command to export the labelled data to CSV?
Yes, that would work!
@navinrathore can you please provide steps to convert the Parquet files in the model's marked folder to CSV on spark-shell? Also see if there is a way to print/convert the schema so that it can be used in the config JSON.
Here are the steps to run in Python.
Note: a) The input is the directory where the marked files are stored. Replace <modelId> with the actual value.
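The steps themselves did not survive in this thread, so here is a minimal sketch of what they could look like in PySpark. It is not the snippet originally posted. The assumptions: the marked data lives under `<zinggDir>/<modelId>/trainingData/marked`, the files are plain Parquet, and the `fieldName` / `fields` / `dataType` keys match the field definition format of your Zingg version. The helper names (`marked_dir`, `schema_to_field_definitions`, `export_marked_to_csv`) are made up for illustration.

```python
import json


def marked_dir(zingg_dir, model_id):
    """Build the path to the marked folder.

    Assumption: Zingg keeps labeled pairs under
    <zinggDir>/<modelId>/trainingData/marked.
    """
    return f"{zingg_dir}/{model_id}/trainingData/marked"


def schema_to_field_definitions(columns):
    """Turn (column_name, spark_type) pairs into field-definition-like dicts.

    Assumption: the key names follow Zingg's config JSON; verify them
    (and add a matchType per field) against the docs for your version.
    """
    return [
        {"fieldName": name, "fields": name, "dataType": dtype}
        for name, dtype in columns
    ]


def export_marked_to_csv(input_dir, output_dir):
    """Read the marked Parquet files and write them out as CSV.

    Returns the schema as a list of (column_name, spark_type) pairs.
    """
    # Imported lazily so the pure helpers above work without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("export-marked").getOrCreate()
    df = spark.read.parquet(input_dir)
    df.printSchema()  # shows the columns stored alongside the labeled fields
    (
        df.coalesce(1)  # one output file; drop this for large datasets
        .write.mode("overwrite")
        .option("header", True)
        .csv(output_dir)
    )
    return df.dtypes
```

To use it, call `export_marked_to_csv(marked_dir("models", "<modelId>"), "exported_marked")`, then pass the returned pairs to `schema_to_field_definitions` and print the result with `json.dumps(..., indent=2)` as a starting point for the config. The schema will likely include Zingg's own bookkeeping columns in addition to your fields, so review the output before pasting it into a config file.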
Document for Exporting labeled data as training samples #117
Ability to export the labeled data from zingg’s learner and import it into a new zingg config/model. Currently there is no way to retain the labeled data if the config file is changed/updated. For example, adding an additional column to same dataset (labeled data still true/unchanged).