MARC 100 vs 700 author / contributor inconsistency #7723

Open
hornc opened this issue Mar 24, 2023 · 2 comments · May be fixed by #9797
Labels
Lead: @hornc Issues overseen by Charles (Staff: Data Engineering Lead) [managed] Module: Import Issues related to the configuration or use of importbot and other bulk import systems. [managed] Needs: Response Issues which require feedback from lead Priority: 3 Issues that we can consider at our leisure. [managed] Theme: MARC records Type: Question This issue doesn't require code. A question needs an answer. [managed]

Comments

@hornc
Collaborator

hornc commented Mar 24, 2023

Compare:

Why are the other 700 field contributors not being imported as authors in the same way as the 700s from the Nihon no chasho example?

It looks like in the 100 + 700s case, only the 100 individual is made an author, the 700s are contributors.

In the only 700s case, each 700 is added as an equal author.

Is this desired behavior?

It looks like treating 700s as contributors unless there are no main 1xx entries is deliberate behavior. I can't find anything that confirms whether this is a correct or incorrect assumption. 700s seem flexible: although there is provision for (multiple?) subfields to state the exact relationship of the name to the record (https://www.loc.gov/marc/bibliographic/bd700.html), these don't have to exist, and in practice often don't.
It seems to work out, but there is a risk that some contributors (illustrators / translators) get added as authors and, conversely, that some equally responsible authors get added as mere contributors.
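To make the behavior described above concrete, here is a minimal sketch of the rule as it appears from the outside. It is not the importer's actual code: `split_names` and the `tag` / `name` dict keys are hypothetical names chosen for illustration, and only personal-name fields (100 and 700) are considered.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified field representation: {"tag": "700", "name": "..."}
NameField = Dict[str, str]


def split_names(fields: List[NameField]) -> Tuple[List[str], List[str]]:
    """Return (authors, contributions) following the observed behavior."""
    main = [f["name"] for f in fields if f["tag"] == "100"]
    added = [f["name"] for f in fields if f["tag"] == "700"]
    if main:
        # A main entry exists: it alone becomes the author,
        # and every 700 is demoted to a contribution.
        return main, added
    # No main entry: every 700 is treated as an equal author.
    return added, []
```

With a 100 plus two 700s this yields one author and two contributions; with only 700s, every name is an author, which is the inconsistency in question.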

It looks like there isn't a clear way to indicate all the possibilities in MARC, or at least actual cataloging practice varies considerably.

I won't change the 1xx / 7xx behavior in this PR. If a field is picked as an author rather than in the contributions list, and an 880 alternate script version exists, it will now be added to the author dict as an alternate_name, regardless of 1xx or 7xx.
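As a rough illustration of the 880 handling described above, the sketch below attaches an alternate-script form to an author dict regardless of which field the name came from. The `build_author` helper and the `alternate_names` key are assumptions for illustration, not the code in the PR.

```python
from typing import Dict, List, Optional, Union

AuthorDict = Dict[str, Union[str, List[str]]]


def build_author(name: str, alternate_script: Optional[str] = None) -> AuthorDict:
    """Build an author dict, adding the linked 880 form when one exists.

    `alternate_script` stands in for the name found in a linked 880 field;
    how that linkage is resolved is outside the scope of this sketch.
    """
    author: AuthorDict = {"name": name}
    if alternate_script:
        # Applied the same way whether the name came from a 1xx or a 7xx.
        author["alternate_names"] = [alternate_script]
    return author
```

Names that end up in the plain-text contributions list are not covered here, since that list has no slot for an alternate form.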

Edition contributions are just plain-text lists of single names and have no room for extra annotations. Work authors of type https://openlibrary.org/type/author_role look like they would handle this better, but the role field is not currently used by any of the imports (AFAIK).

Originally posted by @hornc in #7652 (comment)

@hornc hornc changed the title MARC 100 vs 700 author / contributor inconsistency. MARC 100 vs 700 author / contributor inconsistency Mar 24, 2023
@hornc hornc added Theme: MARC records Module: Import Issues related to the configuration or use of importbot and other bulk import systems. [managed] labels Mar 24, 2023
@tfmorris
Contributor

I've been keeping an eye out for these and I'm pretty sure it's currently being done wrong / sub-optimally. I'm not sure whether different catalogers use different rules, but there definitely seem to be a number of instances where only the first author goes in the 100 and all the rest go in 700s. Of course, 7xx fields with a relator of "illustrator", etc. should stay in the contributions and not get promoted to authors.

Here are some examples that I've come across:

  1. Compare 245$c by_statement with 100 and 700 and resulting OpenLibrary record
    https://openlibrary.org/works/OL6044872W/Key_issues_in_the_new_knowledge_management?_compare=Comparer&b=5&a=4&m=diff
    https://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:940898024:809

  2. Similar case (arguably the 710s should be added along with the 700). There are four different MARC records with similar data.
    https://openlibrary.org/works/OL11150773W/Ground-water_data_for_West_Virginia_1974-84?_compare=Comparer&b=3&a=2&m=diff
    https://openlibrary.org/show-records/marc_oregon_summit_records/catalog_files/osu_bibs.mrc:769134252:1338

@mekarpeles mekarpeles added Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Needs: Lead Priority: 3 Issues that we can consider at our leisure. [managed] Lead: @hornc Issues overseen by Charles (Staff: Data Engineering Lead) [managed] and removed Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Needs: Lead labels Sep 15, 2023
@tfmorris
Contributor

It looks like treating 700s as contributors unless there are no main 1xx entries is deliberate behavior. I can't find anything that confirms whether this is a correct or incorrect assumption.

The 1xx is the "Main entry"; what it is, and whether or not it exists, is determined by the cataloging rules which were used (e.g. AACR), which vary by time and geography. There's also the possibility that the cataloger didn't follow the rules that they were supposed to. Because there can only be a single 1xx, equal co-authors are always going to end up in 7xx fields.

Since OpenLibrary wants to list all authors, not just whoever is identified in the main entry, I think it makes sense to include all 7xx's EXCEPT those which can be clearly identified as non-author/editor contributors like illustrators, translators, etc.
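A minimal sketch of that rule, assuming each name field has already been parsed into a dict carrying its tag, name, and any relator terms (700 $e) or relator codes (700 $4). The dict keys, the function names, and the short `NON_AUTHOR_ROLES` set are all hypothetical; a real list of excluded roles would need to be much longer.

```python
# Roles that clearly mark a non-author contributor (illustrative subset only).
NON_AUTHOR_ROLES = {"illustrator", "translator", "ill", "trl"}


def is_author_like(relator_terms=(), relator_codes=()):
    """True unless every stated role is a known non-author role."""
    roles = {t.strip(" .,").lower() for t in relator_terms}
    roles |= {c.strip().lower() for c in relator_codes}
    if not roles:
        # No relator given, which is common in practice: assume author.
        return True
    return not roles <= NON_AUTHOR_ROLES


def classify(fields):
    """Split parsed 100/700 fields into (authors, contributions)."""
    authors, contributions = [], []
    for field in fields:
        promoted = field["tag"] == "100" or is_author_like(
            field.get("relator_terms", ()), field.get("relator_codes", ())
        )
        (authors if promoted else contributions).append(field["name"])
    return authors, contributions
```

The default is to promote: a 700 is only kept out of the authors when all of its stated roles are on the exclusion list, so a name tagged as both author and illustrator still counts as an author.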

I don't think it'll ever be possible to do it perfectly by reverse engineering human provided data with unknown cataloging rules, but I think it's possible to improve on the current situation.

Just taking a look at this again and I don't think the examples match the description. The Arabic/French example doesn't match the 100+700 description since it only has (three) 700s and a 710.

The binary MARC record from the test suite for the first example above is:
https://github.com/hornc/openlibrary-1/blob/880_alternate_scripts/openlibrary/catalog/marc/tests/test_data/bin_input/880_arabic_french_many_linkages.mrc

There are two online examples which are easier to visualize:
LC https://openlibrary.org/show-records/marc_loc_2016/BooksAll.2016.part37.utf8:212405343:2979
Columbia https://openlibrary.org/show-records/marc_columbia/Columbia-extract-20221130-017.mrc:84193714:3643

The binary for the second example is:
https://github.com/hornc/openlibrary-1/blob/880_alternate_scripts/openlibrary/catalog/marc/tests/test_data/bin_input/880_Nihon_no_chasho.mrc
and it's online at: https://openlibrary.org/show-records/marc_columbia/Columbia-extract-20221130-008.mrc:340428848:1828

In addition to the $0's for authors, which we already discuss in #7724, these show the possibility of adding dates (one of the authors has a new death date) and alternate script names to existing author records. I'm not sure whether attempting to improve/upgrade existing author records is something that should be done, but it's worth considering.
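If upgrading existing author records were attempted, a conservative version might only fill in gaps and never overwrite. The sketch below assumes author records are plain dicts with `name`, `birth_date`, `death_date` and `alternate_names` keys; `upgrade_author` is a hypothetical helper, not an existing function.

```python
def upgrade_author(existing: dict, incoming: dict) -> dict:
    """Return a copy of `existing` with missing details filled from `incoming`."""
    updated = dict(existing)
    for key in ("birth_date", "death_date"):
        # Only fill a date that is missing; never replace a stored value.
        if not updated.get(key) and incoming.get(key):
            updated[key] = incoming[key]
    names = list(updated.get("alternate_names", []))
    for alt in incoming.get("alternate_names", []):
        if alt != updated.get("name") and alt not in names:
            names.append(alt)
    if names:
        updated["alternate_names"] = names
    return updated
```

Because existing values are never replaced, conflicting dates from a new record are silently ignored, which is the cautious choice given how uncertain the matching of authors can be.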

@github-actions github-actions bot added the Needs: Response Issues which require feedback from lead label Aug 23, 2024
@hornc hornc linked a pull request Aug 26, 2024 that will close this issue
@hornc hornc self-assigned this Aug 26, 2024
@hornc hornc added the Type: Question This issue doesn't require code. A question needs an answer. [managed] label Sep 16, 2024