-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
File metadata going to DataCite is not escaped #5546
Comments
@pdurbin - it's tempting but... FWIW - it looks like the dataset metadata is html escaped which handles things like a bare ampersand (which is more than what happens for file metadata...) but really what's needed is the same StringEscapeUtils.escapeXml() call I'm making here, which I think handles anything that would make invalid xml such as or <>, etc. (just assuming, haven't checked). If that's the case, to cover #3328 (without just restricting what users can enter), I think you'd have to scan through DOIDataCiteRegisterService.generateXML() and escape each string before adding it to the template. |
@qqmyers Dumb question but if I put A & B in file description and fetch the xml from Datacite when your fix is in place, I see
Was that your intention? I see it is escaped in the xml Thanks for any info. Also cannot publish a file at all with EZID if set file descp as A & B, even with escaped code. |
@kcondon - yes. If you look in Fabrica, they display without the escaping: (without the escaping, they completely drop the &) |
@kcondon flagged in standup that this fix does not work for EZID. I said it was OK to merge this as it provides improvement for the large number of installations using DataCite. |
Yeah - I haven't looked at what is sent in EZID, but the formatting of the metadata is Provider/Provider API specific. If the metadata values could be made neutral-enough (no special characters w.r.t. xml, json, any format used by providers) a change outside the Provider classes might work, but if special characters are allowed, I think in general it would have to be the Provider classes handling the escaping/cleaning for their particular APIs/transfer formats. |
In developing #5506, which compares the XML returned from DataCite with what DV sends, we discovered that, unlike dataset metadata, file metadata (label, description are the only things sent to DataCite currently) are not escaped when sent to DataCite. So, for example, the Title "A & B" when sent to DataCite is currently stripped to read "A B" and only if it is escaped to read ("A & B") does DataCite capture the ampersand (correctly displaying it and returning the escaped title to DV so that the returned XML matches what was sent.
Long-term, it might make sense to escape the values being returned by DataFile.getDescription() and DataFile.getDisplayName() (overrides DvObject.getDisplayName()) and adjusting all the places that this information is viewed - to be more parallel to how Dataset works, but, for now, just escaping in the DataCite code is enough to address the issue. I'll submit a PR with that change...
The text was updated successfully, but these errors were encountered: