MARC 100 vs 700 author / contributor inconsistency #7723

hornc · 2023-03-24T02:44:24Z

Compare:

Why are the other 700 field contributors not being imported as authors in the same way as the 700s from the Nihon no chasho example?

It looks like in the 100 + 700s case, only the 100 individual is made an author, the 700s are contributors.

In the only 700s case, each 700 is added as an equal author.
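A rough Python sketch of the behaviour described above, to pin down the two cases. It is not the importer's actual code: the `Field` class and the `split_names` function are hypothetical stand-ins, and only subfield $a is read.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Field:
    """Hypothetical stand-in for a MARC data field (not Open Library's class)."""
    tag: str
    subfields: list[tuple[str, str]] = field(default_factory=list)

    def get(self, code: str) -> str | None:
        # First value for the given subfield code, or None if absent.
        return next((v for c, v in self.subfields if c == code), None)


def split_names(fields: list[Field]) -> tuple[list[str], list[str]]:
    """Behaviour as described: if any 1xx main entry exists, only it becomes
    an author and every 700 is a contributor; with no 1xx, every 700 becomes
    an equal author."""
    main = [f.get("a") for f in fields if f.tag.startswith("1")]
    added = [f.get("a") for f in fields if f.tag == "700"]
    if main:
        return main, added
    return added, []


# 100 + 700s: only the 100 name is an author.
print(split_names([Field("100", [("a", "Smith, Jane.")]),
                   Field("700", [("a", "Doe, John.")])]))
# (['Smith, Jane.'], ['Doe, John.'])

# Only 700s: each one is added as an equal author.
print(split_names([Field("700", [("a", "Smith, Jane.")]),
                   Field("700", [("a", "Doe, John.")])]))
# (['Smith, Jane.', 'Doe, John.'], [])
```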

Is this desired behavior?

It looks like treating 700s as contributors unless there are no main 1xx entries is deliberate behavior. I can't find anything that confirms whether this is a correct or incorrect assumption. 700s seem flexible: although there is provision for (multiple?) subfields stating the exact relationship of the name to the record (https://www.loc.gov/marc/bibliographic/bd700.html), these don't have to exist, and in practice often don't.
It seems like it works out, but there is a risk some contributors (illustrators / translators) might get added as authors, and conversely some equally responsible authors may get added as mere contributors.
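For reference, the relationship information in a 700 lives in the repeatable subfields $e (relator term) and $4 (relator/relationship code). A minimal sketch of reading them, assuming the subfields are available as (code, value) pairs; `relators` is a hypothetical helper, and an empty result is the common "relationship not stated" case that makes the author/contributor call ambiguous.

```python
from __future__ import annotations


def relators(subfields: list[tuple[str, str]]) -> list[str]:
    """Collect relationship designators from one 700 field.

    $e holds relator terms (e.g. "illustrator.") and $4 holds relator codes
    (e.g. "ill"). Both are repeatable and both are optional.
    """
    found = []
    for code, value in subfields:
        if code in ("e", "4"):
            # Drop trailing ISBD punctuation and case so "Illustrator." -> "illustrator".
            found.append(value.strip().rstrip(".,").strip().lower())
    return found


print(relators([("a", "Doe, John,"), ("e", "illustrator."), ("4", "ill")]))
# ['illustrator', 'ill']
print(relators([("a", "Doe, John.")]))
# []  (no stated relationship, so author vs. illustrator is a guess)
```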

It looks like there isn't a clear way to indicate all the possibilities in MARC, or at least actual cataloging practice varies considerably.

I won't change the 1xx / 7xx behavior in this PR. If a field is picked as an author rather than in the contributions list, and an 880 alternate script version exists, it will now be added to the author dict as an alternate_name, regardless of 1xx or 7xx.
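The 880 linkage works through subfield $6: the regular field carries e.g. `880-02`, and its 880 partner carries `700-02`, optionally followed by a script identification code. A sketch of collecting linked alternate-script names, assuming each field is a plain dict of subfield code to first value. `parse_linkage` and `alternate_names` are hypothetical helpers rather than the importer's functions, the names in the record are made up, and `alternate_names` as the author-dict key is an assumption.

```python
from __future__ import annotations


def parse_linkage(value: str) -> tuple[str, str]:
    """Split a $6 value such as "880-02" or "700-02/$1" into (tag, occurrence)."""
    tag_occ = value.split("/", 1)[0]  # drop script identification / orientation
    tag, _, occ = tag_occ.partition("-")
    return tag, occ


def alternate_names(tag: str, sub: dict[str, str],
                    fields_880: list[dict[str, str]]) -> list[str]:
    """Return $a of every 880 that links back to this field via $6."""
    if "6" not in sub:
        return []
    _, occ = parse_linkage(sub["6"])
    return [alt["a"] for alt in fields_880
            if "a" in alt and parse_linkage(alt.get("6", "")) == (tag, occ)]


# Illustrative record: one 700 with a CJK-script 880 partner.
field_700 = {"a": "Tanaka, Taro,", "6": "880-02"}
fields_880 = [
    {"6": "245-01/$1", "a": "タイトル"},
    {"6": "700-02/$1", "a": "田中太郎,"},
]

author = {"name": field_700["a"]}
alts = alternate_names("700", field_700, fields_880)
if alts:
    author["alternate_names"] = alts  # assumed key name on the author dict
print(author)
```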

Contributions on editions are just plain-text lists of single names and don't have room for extra annotations. Work authors of type https://openlibrary.org/type/author_role look like they would handle this better, but the role field is not currently used by any of the imports (AFAIK).
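To illustrate the difference: an edition's contributions are bare strings, whereas each entry in a work's authors list is a /type/author_role record with room for a role. A sketch under assumptions: `work_author` is a hypothetical helper, `/authors/OL1A` and `/authors/OL2A` are placeholder keys, and the free-text role value is illustrative only.

```python
from __future__ import annotations


def work_author(author_key: str, role: str | None = None) -> dict:
    """One entry of a work's "authors" list, typed as /type/author_role."""
    entry = {"type": {"key": "/type/author_role"}, "author": {"key": author_key}}
    if role:
        # author_role has a role field, but current imports don't set it.
        entry["role"] = role
    return entry


# Edition side: contributions are plain name strings, with nowhere to put a role.
edition = {"contributions": ["Doe, John"]}

# Work side: an author_role record can carry the relationship alongside the name.
work = {"authors": [work_author("/authors/OL1A"),
                    work_author("/authors/OL2A", role="Illustrator")]}
print(work["authors"][1])
```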

Originally posted by @hornc in #7652 (comment)


tfmorris · 2023-04-14T23:34:00Z

I've been keeping an eye out for these and I'm pretty sure it's currently being done wrong / sub-optimally. I'm not sure if different catalogers use different rules, but there definitely seem to be a number of instances where only the first author goes in the 100 and all the rest go in the 700. Of course, 7xx fields with a relator such as "illustrator" should stay in the contributions and not get promoted to authors.

Here are some examples that I've come across:

Compare 245$c by_statement with 100 and 700 and resulting OpenLibrary record
https://openlibrary.org/works/OL6044872W/Key_issues_in_the_new_knowledge_management?_compare=Comparer&b=5&a=4&m=diff
https://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:940898024:809
Similar case (arguably the 710s should be added along with the 700). There are four different MARC records with similar data.
https://openlibrary.org/works/OL11150773W/Ground-water_data_for_West_Virginia_1974-84?_compare=Comparer&b=3&a=2&m=diff
https://openlibrary.org/show-records/marc_oregon_summit_records/catalog_files/osu_bibs.mrc:769134252:1338

tfmorris · 2024-08-22T22:35:27Z

It looks like treating 700s as contributors unless there is no main1xx entries is deliberate behavior. I can't find anything that confirms either way that this is a correct or incorrect assumption.

The 1xx is the "Main entry". What it contains, and whether it exists at all, is determined by the cataloging rules that were used (e.g. AACR), which vary by time and geography. There's also the possibility that the cataloger didn't follow the rules they were supposed to. Because a record can have only a single 1xx, equal co-authors are always going to end up in 7xx fields.

Since OpenLibrary wants to list all authors, not just whoever is identified in the main entry, I think it makes sense to include all 7xx fields EXCEPT those which can be clearly identified as non-author/editor contributors like illustrators, translators, etc.
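One way the proposed rule could look, as a Python sketch. It assumes fields arrive as (tag, [(code, value), ...]) tuples, handles only 100/700, and `NON_AUTHOR_ROLES` is an illustrative, deliberately incomplete set of $e terms and MARC relator codes (`ill`, `trl`). A 700 stays a contributor only if all of its stated roles are non-author roles, so someone listed as both author and illustrator is still promoted, and a 700 with no relator at all is promoted too, since it can't be "clearly identified" as a non-author.

```python
from __future__ import annotations

# Illustrative and incomplete: relator terms ($e) and codes ($4) that clearly
# mark a non-author role.
NON_AUTHOR_ROLES = {"illustrator", "ill", "translator", "trl"}


def normalise(value: str) -> str:
    # "Illustrator." -> "illustrator"
    return value.strip().rstrip(".,").strip().lower()


def classify(fields: list[tuple[str, list[tuple[str, str]]]]) -> tuple[list[str], list[str]]:
    """Proposed rule: the 100 and every 700 are authors, except 700s whose
    stated relators are all non-author roles; those stay contributors."""
    authors, contributors = [], []
    for tag, subfields in fields:
        if tag not in ("100", "700"):
            continue
        name = next((v for c, v in subfields if c == "a"), None)
        if name is None:
            continue
        roles = {normalise(v) for c, v in subfields if c in ("e", "4")}
        if tag == "700" and roles and roles <= NON_AUTHOR_ROLES:
            contributors.append(name)
        else:
            authors.append(name)
    return authors, contributors


record = [
    ("100", [("a", "Smith, Jane,"), ("e", "author.")]),
    ("700", [("a", "Doe, John.")]),                             # no relator: promoted
    ("700", [("a", "Roe, Ann,"), ("e", "illustrator.")]),       # stays a contributor
    ("700", [("a", "Poe, Al,"), ("e", "author,"), ("e", "illustrator.")]),
]
print(classify(record))
```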

I don't think it'll ever be possible to do it perfectly by reverse engineering human provided data with unknown cataloging rules, but I think it's possible to improve on the current situation.

Taking another look at this, I don't think the examples match the description. The Arabic/French example doesn't match the 100+700 description, since it only has (three) 700s and a 710.

The binary MARC record from the test suite for the first example above is:
https://github.com/hornc/openlibrary-1/blob/880_alternate_scripts/openlibrary/catalog/marc/tests/test_data/bin_input/880_arabic_french_many_linkages.mrc

There are two online examples which are easier to visualize:
LC https://openlibrary.org/show-records/marc_loc_2016/BooksAll.2016.part37.utf8:212405343:2979
Columbia https://openlibrary.org/show-records/marc_columbia/Columbia-extract-20221130-017.mrc:84193714:3643

The binary for the second example is:
https://github.com/hornc/openlibrary-1/blob/880_alternate_scripts/openlibrary/catalog/marc/tests/test_data/bin_input/880_Nihon_no_chasho.mrc
and it's online at: https://openlibrary.org/show-records/marc_columbia/Columbia-extract-20221130-008.mrc:340428848:1828

In addition to the $0s for authors, which we already discuss in #7724, these show the possibility of adding dates (one of the authors has a new death date) and alternate script names to existing author records. I'm not sure if attempting to improve/upgrade existing author records is something that should be done, but it's worth considering.
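As a rough illustration of the date case, a sketch that pulls birth/death years out of a name's $d (e.g. `1920-2010.`) and reports only the dates an existing author record lacks. It assumes the existing record is a dict with `birth_date`/`death_date` keys and handles only the plain `YYYY-` and `YYYY-YYYY` forms; anything like `active 1850` is ignored. `parse_dates` and `new_dates` are hypothetical helpers, and the example values are made up.

```python
from __future__ import annotations

import re

# Only the common "YYYY-" / "YYYY-YYYY" shapes, with optional trailing "." or ",".
DATES = re.compile(r"^\s*(\d{4})-(\d{4})?[.,]?\s*$")


def parse_dates(subfield_d: str) -> dict[str, str]:
    """Birth/death years from a $d value, or {} if the form isn't recognised."""
    m = DATES.match(subfield_d)
    if not m:
        return {}
    dates = {"birth_date": m.group(1)}
    if m.group(2):
        dates["death_date"] = m.group(2)
    return dates


def new_dates(existing: dict, subfield_d: str) -> dict[str, str]:
    """Dates from the MARC record that the existing author record doesn't have yet."""
    return {k: v for k, v in parse_dates(subfield_d).items() if not existing.get(k)}


existing_author = {"name": "Tanaka, Taro", "birth_date": "1920"}
print(new_dates(existing_author, "1920-2010,"))
# {'death_date': '2010'}
```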


hornc changed the title ~~MARC 100 vs 700 author / contributor inconsistency.~~ MARC 100 vs 700 author / contributor inconsistency Mar 24, 2023

hornc added the Theme: MARC records and Module: Import labels Mar 24, 2023

github-actions bot added the Needs: Response label Aug 23, 2024

hornc linked a pull request Aug 26, 2024 that will close this issue: Improve MARC Author name importing #9797 (Open)

hornc self-assigned this Aug 26, 2024

hornc mentioned this issue Aug 26, 2024: Improve MARC string importing, part A #9806 (Merged)

hornc added the Type: Question label Sep 16, 2024

hornc added a commit that referenced this issue Sep 18, 2024

update expectations for 880_arabic_french_many_linkages

e44eb72

for #7723

hornc added a commit that referenced this issue Sep 19, 2024

update expectations for 880_arabic_french_many_linkages

503f989

for #7723
