Cultures #5223

Merged
merged 2 commits into from
Jan 27, 2022


impaktor
Member

@impaktor impaktor commented Jun 27, 2021

Introduction

I thought it was time to move on this code, which has been festering in my git repo since 2016, and open it up for comments and feedback. The idea behind this PR is to improve the name generator, so that:

  1. First and last names match. No more "Nakamura Smith", or "Vladimir Pidgeon"

  2. Don't spam with unique names. For many languages I could probably find 1000+ different male/female names; however, I want the player to learn which first and last names are typical for each language (and in all languages, 95%+ of people usually find their name among the 100 most common). Thus I've (in most places) limited myself to the 100 most common first and last names, i.e. 300 names per culture (male, female, surname).

  3. The third, and main, idea was initially to have each system or station dominated by some language/culture, to give stations more character (e.g. "Now I'm on a station where NPCs have Russian names"). I got stuck on this for many years because I wanted to expose it to the custom system scripts; however, I've now decided to leave that to some future contributor and just have random, but weighted, names on each station. Thus only points 1 and 2 are implemented here, and each individual is assigned a culture by weights I pulled out of my hat, such that English or Russian is more likely than Gaelic.
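A minimal sketch of points 1 and 2, assuming each culture is a plain Lua table with `male`, `female` and `surname` lists plus a `weight` (these tables and field names are hypothetical and heavily trimmed, not the PR's actual data files): pick one culture by weight, then draw both the first name and the surname from that same culture. Here `rand` is any function returning a float in [0, 1).

```lua
-- Hypothetical, heavily trimmed culture tables (two names per list).
local cultures = {
  { code = "en", weight = 11.0,
    male = { "James", "John" }, female = { "Mary", "Linda" },
    surname = { "Smith", "Jones" } },
  { code = "ja", weight = 5.5,
    male = { "Hiroshi", "Takeshi" }, female = { "Yuki", "Aiko" },
    surname = { "Nakamura", "Sato" } },
  { code = "gd", weight = 0.4,
    male = { "Iain", "Seumas" }, female = { "Mairi", "Catriona" },
    surname = { "MacLeod", "MacDonald" } },
}

-- Choose a culture with probability proportional to its weight.
-- `rand` returns a float in [0, 1); it defaults to math.random.
local function pick_culture(list, rand)
  rand = rand or math.random
  local total = 0
  for _, c in ipairs(list) do total = total + c.weight end
  local r = rand() * total
  for _, c in ipairs(list) do
    r = r - c.weight
    if r < 0 then return c end
  end
  return list[#list] -- guard against floating-point round-off
end

-- Uniformly pick one entry from a list.
local function pick(names, rand)
  return names[math.floor(rand() * #names) + 1]
end

-- Both names come from the same culture, so they always match.
local function full_name(isfemale, rand)
  rand = rand or math.random
  local c = pick_culture(cultures, rand)
  local first = pick(isfemale and c.female or c.male, rand)
  return first .. " " .. pick(c.surname, rand), c.code
end

print(full_name(true))
```

Because the culture is chosen once per person, a "Nakamura Smith" can no longer occur, while the weights still make English names far more common than Gaelic ones.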

And before someone objects: it's true that airports are "mixed", and our space stations are like airports, but when I'm at Oslo airport, it's probably 80-90% Norwegians (plus Danes and Swedes). I think one could always have 10% of NPCs sampled from a "random" culture on top of the dominant one, and Earth stations could of course be completely random, but that's the stuff I gave up on in point 3 above, so it's moot anyway.
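The dominant-plus-random idea could be sketched as below, assuming a hypothetical per-station dominant culture code and a `minority_share` (0.1 for the 10% mentioned above). The pool weights are borrowed from the percentage table later in this thread, and `rand` is any function returning a float in [0, 1).

```lua
-- Hypothetical general pool of cultures: { code = weight }.
local pool = { en = 11.0, ru = 5.5, nb = 1.8, da = 1.8, sv = 1.8 }

-- Weighted pick over a { code = weight } table. Keys are sorted first so
-- results are reproducible for a given random sequence (pairs() order isn't).
local function weighted_code(weights, rand)
  local keys, total = {}, 0
  for code, w in pairs(weights) do
    keys[#keys + 1] = code
    total = total + w
  end
  table.sort(keys)
  local r = rand() * total
  for _, code in ipairs(keys) do
    r = r - weights[code]
    if r < 0 then return code end
  end
  return keys[#keys]
end

-- Most NPCs get the station's dominant culture; a fraction `minority_share`
-- is drawn from the general weighted pool instead.
local function station_culture(dominant, minority_share, rand)
  rand = rand or math.random
  if rand() >= minority_share then
    return dominant
  end
  return weighted_code(pool, rand)
end

print(station_culture("nb", 0.1)) -- an NPC on a Norwegian-dominated station
```

Passing `minority_share = 1` sends every NPC through the pool, i.e. a fully mixed station such as one on Earth.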

Example

Here I've prefixed each name with its language code, just to see more clearly what's going on:
[Screenshot: many_languages]

Implementation

I've added a data/cultures folder where each culture adds its own rules and names, as well as a superclass in data/cultures/common.lua that is inherited by each language file, and a data/cultures/cultures.lua file that exposes them all to the outside world.
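For readers less familiar with Lua "classes", here is a rough sketch of how a common superclass can be inherited through metatables. `CultureName`, `New`, `FirstName`, `Surname`, `FullName` and the trimmed Spanish data are illustrative assumptions, not necessarily what data/cultures/common.lua actually contains.

```lua
-- Hypothetical superclass standing in for data/cultures/common.lua.
local CultureName = {}
CultureName.__index = CultureName

-- Turn a plain data table into a "subclass" instance: missing methods
-- are looked up in CultureName through the metatable.
function CultureName.New(lang)
  return setmetatable(lang, CultureName)
end

local function pick(list, rand)
  return list[math.floor(rand() * #list) + 1]
end

function CultureName:FirstName(isfemale, rand)
  return pick(isfemale and self.female or self.male, rand)
end

function CultureName:Surname(isfemale, rand)
  return pick(self.surname, rand)
end

function CultureName:FullName(isfemale, rand)
  rand = rand or math.random
  return self:FirstName(isfemale, rand) .. " " .. self:Surname(isfemale, rand)
end

-- A language file (say, a trimmed es.lua) then only supplies data.
local Spanish = CultureName.New {
  code = "es", name = "Spanish",
  male = { "Antonio", "Manuel" }, female = { "Carmen", "Ana" },
  surname = { "Moreno", "Ruiz" },
}

-- The aggregator (the role of cultures.lua) exposes cultures by name.
local Culture = { lookup = { [Spanish.name] = Spanish } }

print(Culture.lookup.Spanish:FullName(true))
```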

All language files are more or less identical, except ru.lua and el.lua, which override the inherited lastname() function. Also, I've opted for Lua rather than JSON, as different languages might want to do different funky stuff. Perhaps these could be expanded to also include geographical locations and such.
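Overriding an inherited method in a single language file might look like the sketch below. The base class is a hypothetical stand-in for common.lua, the method is called `Surname` here rather than the PR's `lastname()`, and the feminine-surname rule is deliberately simplified.

```lua
-- Hypothetical minimal base class standing in for common.lua.
local Base = {}
Base.__index = Base

function Base:Surname(isfemale, rand)
  return self.surname[math.floor(rand() * #self.surname) + 1]
end

local Russian = setmetatable({
  surname = { "Ivanov", "Smirnov", "Lebedev", "Pushkin" },
}, Base)

-- Override of the inherited method. Simplified rule: feminine Russian
-- surnames ending in -ov, -ev or -in add "-a" (Ivanov -> Ivanova);
-- other endings (e.g. -sky -> -skaya) are not handled here.
function Russian:Surname(isfemale, rand)
  local name = Base.Surname(self, isfemale, rand)
  if isfemale and (name:match("[oe]v$") or name:match("in$")) then
    return name .. "a"
  end
  return name
end

print(Russian:Surname(true, math.random))
```

Since `Russian:Surname` is a field on the language table itself, it shadows the metatable lookup, while `Base.Surname(self, ...)` still reuses the shared list-picking logic.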

For now, I've just made the original name generator in libs/NameGen call the new methods:

NameGen:FullName(isfemale, rand)
--
--      |
--      |
--      V
--
Culture:FullName(isfemale, rand, culture)

where rand is optional in the former but not in the latter, so the former is just a wrapper. At the moment, the culture argument defaults according to the probabilities specified in Culture.weights (but one can pass in a culture, e.g. "Spanish"). I think it might be almost trivial to have each space station define its own weight matrix and somehow override the default in Culture.weights, so you could have a 90% Russian, 10% Dutch station, or similar.
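A sketch of the wrapper, with the per-station override added as a hypothetical extra `weights` argument. To stay self-contained it uses stub cultures and a `rand` function returning [0, 1) instead of Pioneer's own random-number object; `Culture:Random` and `stub` are invented helpers.

```lua
-- Hypothetical stub culture whose FullName always returns one fixed name.
local function stub(name, full)
  return { name = name, FullName = function() return full end }
end
local Spanish = stub("Spanish", "Ana Ruiz")
local Russian = stub("Russian", "Ivan Ivanov")
local Dutch = stub("Dutch", "Jan de Vries")

local Culture = {
  weights = {
    { lang = Spanish, weight = 5.5 },
    { lang = Russian, weight = 5.5 },
    { lang = Dutch, weight = 3.7 },
  },
  lookup = { Spanish = Spanish, Russian = Russian, Dutch = Dutch },
}

-- Weighted pick from a list of { lang, weight } entries.
function Culture:Random(rand, weights)
  weights = weights or self.weights
  local total = 0
  for _, e in ipairs(weights) do total = total + e.weight end
  local r = rand() * total
  for _, e in ipairs(weights) do
    r = r - e.weight
    if r < 0 then return e.lang end
  end
  return weights[#weights].lang
end

-- `culture` optionally names a culture ("Spanish"); otherwise one is
-- picked by weight. `weights` is a hypothetical per-station override.
function Culture:FullName(isfemale, rand, culture, weights)
  local lang = culture and self.lookup[culture] or self:Random(rand, weights)
  return lang:FullName(isfemale, rand)
end

-- The old entry point becomes a thin wrapper; rand is optional here only.
local NameGen = {}
function NameGen:FullName(isfemale, rand)
  return Culture:FullName(isfemale, rand or math.random)
end

-- A 90% Russian, 10% Dutch station:
local station = { { lang = Russian, weight = 90 }, { lang = Dutch, weight = 10 } }
print(Culture:FullName(false, math.random, nil, station))
```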

To do

This will be a save bump, I assume, so there's no point in merging it until we have some other save-bumping PRs.

  • Some languages (nb, dk, se, de, is, gd) have non-ASCII characters. Although having a "Gödel Station" or "Schrödinger's Village" would be cool, it would be hard to search for on a US keyboard. I thought I could replace those at run time (at least when a name is used for a station, as opposed to an NPC), but Lua seems to treat those characters as two chars? Either way, I didn't get anywhere, and I'd rather not just remove them from the name lists.

  • NameGen:Surname(): should this be expanded to take an isfemale argument (it matters for Slavic and Greek surnames)? It would require some changes on the C++ side (I think), as it's used by the system generation code.

  • Should we keep the old names? I have made a misc.lua "culture" containing the old names, but I've excluded it from this PR for now.

  • Remove debug printouts (like prepending each name with its language code)

  • Squash commits
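On the non-ASCII question: Lua strings are byte strings, and if the name files are UTF-8, a letter like "ö" takes two bytes, which would explain seeing "two chars". Matching the whole byte sequence sidesteps that. A sketch, assuming a hand-written (and incomplete) folding table that would be applied only to place names:

```lua
-- Hypothetical, partial mapping from accented letters to ASCII.
-- Every key is a 2-byte UTF-8 sequence (#"ö" == 2 in Lua).
local fold = {
  ["ä"] = "a", ["ö"] = "o", ["ü"] = "u", ["ß"] = "ss",
  ["å"] = "a", ["æ"] = "ae", ["ø"] = "o",
  ["á"] = "a", ["é"] = "e", ["í"] = "i", ["ó"] = "o", ["ú"] = "u",
  ["à"] = "a", ["è"] = "e", ["ì"] = "i", ["ò"] = "o", ["ù"] = "u",
  ["ð"] = "d", ["þ"] = "th",
  ["Ä"] = "A", ["Ö"] = "O", ["Ü"] = "U", ["Å"] = "A", ["Ø"] = "O", ["Æ"] = "Ae",
}

-- Replace each multi-byte sequence as a whole. No byte of a UTF-8
-- encoded non-ASCII letter is a Lua pattern magic character, so the
-- keys are safe to use directly as gsub patterns.
local function to_ascii(s)
  for from, to in pairs(fold) do
    s = s:gsub(from, to)
  end
  return s
end

print(to_ascii("Gödel Station")) --> Godel Station
```

This simply strips diacritics; German would conventionally write "oe" for "ö", so a per-language table might be preferable if the transliteration should look natural.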

Closes #3601

@impaktor impaktor marked this pull request as draft June 27, 2021 12:07
@Web-eWorks
Member

First of all, looking good impaktor! Glad to see your efforts over the last few years have been so fruitful, I think this easily quintuples the number of names we can generate in Pioneer! (You're upstaging the renderer rewrite PR in terms of line count, but I've only been working on that for six months so I don't really care :D)

> First and last names match. No more "Nakamura Smith", or "Vladimir Pidgeon"

I've certainly seen plenty of mismatched names in the 'real world', usually in US immigrants who keep the family surname but give their children "western" / "Christian" first names, so it's not as jarring as you think. I'd think maybe 5-10% of names should follow this pattern of using the location's primary culture for first name and a random minority culture for the last name, but that's not exactly low-hanging fruit.
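A sketch of that suggestion, assuming the location has a primary culture code plus a list of minority culture codes, and a hypothetical mixing probability `p_mixed` (e.g. 0.05 to 0.10). It only decides which cultures supply the first name and the surname; the names themselves would then be drawn as before.

```lua
-- Decide which culture supplies the first name and which the surname.
-- Hypothetical parameters: `primary` and `minorities` are culture codes,
-- `p_mixed` is the chance of a mixed name; `rand` returns [0, 1).
local function name_cultures(primary, minorities, p_mixed, rand)
  rand = rand or math.random
  if #minorities > 0 and rand() < p_mixed then
    -- Mixed name: primary-culture first name, minority-culture surname.
    local i = math.floor(rand() * #minorities) + 1
    return primary, minorities[i]
  end
  return primary, primary
end

local first_c, last_c = name_cultures("en", { "ja", "pl", "ru" }, 0.08)
print(first_c, last_c)
```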

> The third, and main idea was initially to have each system or station dominated by some language/culture, to give stations some more character (e.g. "Now I'm on a station where NPCs have Russian names"). I got stuck on this for many years as I wanted to expose this to the custom system scripts, however, I've now decided to leave that to some future contributor, and just have random - but weighted - generated names on each station.

I've been thinking about the idea of a generalized "early Lua init" stage which loads a small subset of our Lua scripts (pigui themes, custom system defs, etc.) for use in the game's startup procedures; I think providing a list of culture definitions to the C++ system generation stage would be a good candidate for this stage as well, and would allow C++ system and station generation to create the per-system weights you described. I'm not sure it's something that will be realized during this release cycle, but it's room for future improvement.

Just a random pre-review comment: instead of require './de', you can just write require '.de' without the path separator - the path separator actually gets converted into a module name (.de) before the module name is converted into a file search path ($module/de.lua, $module/de/init.lua). It's not necessary to change, just figured I'd point it out.

Also: 'Finish' for fi.lua? Are you sure that's not supposed to be Finnish? 😄

@impaktor
Member Author

impaktor commented Jun 29, 2021

> You're upstaging the renderer rewrite PR in terms of line count

Ha! I hadn't realized until now that it's 8k lines, even though most languages only include 100 names per list. I see Greek is the biggest, thanks to @jimishol. (I assume @nozmajner has us all floored in line counts when pushing models.)

> require './de', you can just write require '.de'

Fixed.

> Also: 'Finish' for fi.lua

Fixed.

> I've certainly seen plenty of mismatched names in the 'real world'

Yeah, combining different last and first names would of course be trivial. (I think in the second or third "Ender's Game" book, they live on a Portuguese/Norwegian colony, so you could even specify to mix in a specific way like that, according to some coupling matrix).
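Such a coupling matrix could be sketched as a table of weighted surname cultures per first-name culture. The `pt` code and the numbers below are purely illustrative (Portuguese is not one of the cultures in this PR), and rows are normalized on use, so they need not sum to 1.

```lua
-- Hypothetical coupling matrix: first-name culture -> { surname culture = weight }.
local coupling = {
  pt = { pt = 0.7, nb = 0.3 },
  nb = { nb = 0.6, pt = 0.4 },
}

-- Pick the surname culture for a given first-name culture.
-- `rand` returns a float in [0, 1); it defaults to math.random.
local function surname_culture(first_culture, rand)
  rand = rand or math.random
  local row = coupling[first_culture]
  if not row then return first_culture end -- uncoupled cultures stay matched
  local keys, total = {}, 0
  for code, w in pairs(row) do
    keys[#keys + 1] = code
    total = total + w
  end
  table.sort(keys) -- deterministic order for a given random sequence
  local r = rand() * total
  for _, code in ipairs(keys) do
    r = r - row[code]
    if r < 0 then return code end
  end
  return keys[#keys]
end

print(surname_culture("pt"))
```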

> I think providing a list of culture definitions to the c++ system generation stage would be a good candidate for this stage as well

Interesting. As long as no other person tries to merge a save bump, there's no rush on this PR.

Thanks for the comments!

@Zireael07

Seconding the comment about "mismatched" names - with more and more people having a parent each of very different cultures, or moving elsewhere as a child, it's becoming more and more common to see people with "mismatched" names.

@fluffyfreak
Contributor

I shall miss "Vladimir Pidgeon" 😁

@impaktor impaktor force-pushed the cultures branch 5 times, most recently from 4b4d981 to 7bcbf3b January 23, 2022 18:08
@impaktor impaktor force-pushed the cultures branch 4 times, most recently from 5afc0c1 to 401aa29 January 26, 2022 19:36
@impaktor impaktor marked this pull request as ready for review January 26, 2022 19:40
@impaktor
Member Author

@Web-eWorks I think this is done (except for squash-merging the commits, which I can do). I haven't changed any code since you looked at it, except homogenizing the formatting, removing debug statements, and adding some documentation.

I only have one concern, and that's the Surname method in NameGen.lua: since it doesn't take sex as input, it just flips a coin. That works with the present code base, since it's only used to generate surnames for places, where the sex is a coin flip anyway, but if someone in the future wants a specific sex in that part of the code, it will have to be fixed, and I suspect that would require changes on the C++ side as well, since I assume it's called from galaxy generation.
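One possible way to make sex an optional input without changing existing callers, sketched under the assumption that each culture object has a `Surname(isfemale, rand)` method (the Greek stub below is hypothetical): keep the coin flip only when `isfemale` is `nil`.

```lua
-- Hypothetical Greek stub: surnames differ by sex.
local Greek = {}
function Greek:Surname(isfemale, rand)
  return isfemale and "Papadopoulou" or "Papadopoulos"
end

-- isfemale: true/false to request a sex, or nil to keep the old coin flip.
-- `rand` returns a float in [0, 1); it defaults to math.random.
local function Surname(lang, isfemale, rand)
  rand = rand or math.random
  if isfemale == nil then
    isfemale = rand() < 0.5
  end
  return lang:Surname(isfemale, rand)
end

print(Surname(Greek))       -- random sex, as today
print(Surname(Greek, true)) --> Papadopoulou
```

Callers that pass no sex, such as place naming, behave exactly as before; only new callers that pass `true` or `false` get a fixed sex.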

@fluffyfreak
Contributor

I did not previously know that some languages and cultures had gendered surnames 🤔

@fluffyfreak fluffyfreak self-requested a review January 26, 2022 22:54
Contributor

@fluffyfreak fluffyfreak left a comment


It looks good to me. I'm not great at Lua, but it all seems logical, well documented, and full of good comments and function explanations, so it's easy to follow along.

Comment on lines 66 to 70
Culture.lookup = {}
print("Random generated names from:")
for k, v in pairs(Culture.weights) do
    Culture.lookup[v.lang.name] = v.lang
    print("* ", k, v.lang.code, v.lang.name, v.lang, v.weight)
end
Member


I'd recommend removing this debug print for merge.

@impaktor
Member Author

Upped the weight of "misc" (our current names, i.e. contributors from AUTHOR.txt plus a mixed bag of other names), and lowered Gaelic. These are the weights (in percent) for each language, in descending order:

Code  Language       Weight (%)
misc  Miscellaneous      18.400
en    English            11.000
us    American            9.200
de    German              5.500
es    Spanish             5.500
fr    French              5.500
it    Italian             5.500
ja    Japanese            5.500
ru    Russian             5.500
zh    Chinese             5.500
nl    Dutch               3.700
pl    Polish              3.700
da    Danish              1.800
el    Greek               1.800
fi    Finnish             1.800
hu    Hungarian           1.800
nb    Norwegian           1.800
ro    Romanian            1.800
sv    Swedish             1.800
tr    Turkish             1.800
gd    Gaelic              0.400
is    Icelandic           0.400
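For reference, percentages like these come from dividing each raw weight by the sum of all weights. A sketch with made-up raw weights (only their ratios matter; they are not the PR's actual values), sorted in descending order like the table:

```lua
-- Normalize raw { code = weight } entries to percentages, sorted by
-- descending weight (ties broken alphabetically by code).
local function percentages(weights)
  local total, rows = 0, {}
  for code, w in pairs(weights) do
    total = total + w
    rows[#rows + 1] = { code = code, w = w }
  end
  table.sort(rows, function(a, b)
    if a.w ~= b.w then return a.w > b.w end
    return a.code < b.code
  end)
  for _, row in ipairs(rows) do
    row.pct = 100 * row.w / total
  end
  return rows
end

-- Hypothetical raw weights.
for _, row in ipairs(percentages({ misc = 10, en = 6, de = 3, gd = 1 })) do
  print(string.format("%-5s %7.3f", row.code, row.pct))
end
```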

This moves name generation to data/cultures/, to make first and last name
match the language, as well as sex of first and last name (e.g. for Russian
and Greek). Old name gen is now a subset, as a "misc" language.

This also brings the contributor names in misc up to date.
(Needed for BBS refactor pioneerspacesim#5312, and Cultures based name gen pioneerspacesim#5223)
@impaktor impaktor merged commit 9ed0163 into pioneerspacesim:master Jan 27, 2022
@impaktor impaktor deleted the cultures branch January 27, 2022 19:45
Development

Successfully merging this pull request may close these issues.

Update NameGen.lua
4 participants