Fix PEP 0 name parsing #1386

AA-Turner · 2020-04-27T23:09:27Z

This fixes name parsing in PEP 0 for generation of the indices, for example for "Ernest W. Durbin III", "The Python core team and community", and "Eric N. Vander Weele".

This fix is also included in PR #1385, but as per @merwok's suggestion in #2 I'm creating this new pull request for the single issue.

Thanks,
Adam

merwok · 2020-04-28T01:39:25Z

pep0/pep.py

+                surname = " ".join(name_parts[-2:])
+                name.update(forename=forename, surname=surname)
+
+            # handles double surnames after a middle initial (e.g. N. Vander Weele)


This is tough: a name like R. David Murray should give Murray as the surname for the PEP index.

That said, a name written in the Japanese English convention, like INADA Naoki, should return INADA as the surname.
In short, it’s not generally possible to parse world names into US categories of «first», «middle», «last».

I guess we have to accept imperfection for now, add special cases when we notice problems (which PEP will add the first Name MacName Sr? 🙂), and maybe someday rework the system the proper way: metadata (or some author index dict) should include the full name and a short name for PEP 0.
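
To illustrate the difficulty described above, here is a minimal sketch of a naive heuristic that takes the last word of a name as the surname. The function `naive_surname` is hypothetical and is not the code in pep0/pep.py; the names are the ones mentioned in this thread.

```python
def naive_surname(full_name: str) -> str:
    """Hypothetical heuristic: treat the last whitespace-separated word as the surname."""
    return full_name.split()[-1]


# Names taken from this discussion.
names = [
    "R. David Murray",
    "Ernest W. Durbin III",
    "Eric N. Vander Weele",
    "INADA Naoki",
]

results = {name: naive_surname(name) for name in names}
for name, surname in results.items():
    print(f"{name!r} -> {surname!r}")
```

Only the first name comes out the way this thread expects (Murray). The others yield III, Weele and Naoki, where the discussion suggests Durbin, Vander Weele and INADA.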

Luckily (?!) in PEP 458 R. David Murray is listed without the full stop after his first initial, so the name is still parsed correctly. I don't think there's a good solution for INADA Naoki beyond adding a special-case exception.

> In short, it’s not generally possible to parse world names into US categories of «first», «middle», «last»

Or UK, Australia, Canada etc 😜. But I get your point. I'm reminded of this post about names - hopefully #40 doesn't apply to us...

I think that your last suggestion, having some sort of lookup table, is probably the best solution, as across all the PEPs there is still only a relatively small number of authors (248). It's quite late here, so I will add that feature tomorrow. It also keeps special cases etc. out of the code, which stops it from becoming knobbly.

It would be OK if this PR fixed the most egregious cases (III) without covering 100% of possibilities 🙂

Gives me something to do! Latest commit adds such a metadata lookup and therefore simplifies the name parsing code.

This should make it so that names can be correctly entered into AUTHORS.csv and PEP 0 will reflect this. I've also identified some duplicate entries (e.g. P.J. Eby & Phillip J. Eby, Greg and Gregory Ewing, Jim J. Jewett & Jim Jewett, Martin v. Löwis & Martin von Löwis). Is it acceptable to modify PEP headers to canonicalise these names?

I think I would add multiple entries to the data file rather than editing historical documents.

The latest data file (exceptions rather than full mapping) doesn’t de-duplicate these entries; should it?

The data file is checked first (in init), so I'm unsure where duplicates would propagate from.

Always good to be preventative but not sure I understand this one, sorry!

Maybe my comment doesn’t make sense!
I don’t have a clear picture of the current behaviour of the code, so I wondered if the change from a full data file to an exceptions data file preserved the feature you added of normalizing the duplicate names.

Aha! I forgot to add that back in, you're right. I have done so now (only adding the less used variant and mapping it to the 'canonical' variant, to keep the file smaller).
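
The de-duplication idea discussed above (list only the less used spelling and map it to a canonical one) might look roughly like the sketch below. The column names, the helper functions, and the choice of which spelling is canonical are all assumptions made for illustration; the actual layout of AUTHORS.csv is not shown in this thread.

```python
import csv
import io

# Hypothetical file contents. The column names and the choice of canonical
# spelling are assumptions for this sketch, not the real AUTHORS.csv.
EXCEPTIONS_CSV = """\
Full Name,Canonical Name
P.J. Eby,Phillip J. Eby
Martin v. Löwis,Martin von Löwis
"""


def load_canonical_names(text: str) -> dict:
    """Read variant -> canonical spellings from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Full Name"]: row["Canonical Name"] for row in reader}


def canonicalise(name: str, variants: dict) -> str:
    """Return the canonical spelling, or the name unchanged if it is not listed."""
    return variants.get(name, name)


variants = load_canonical_names(EXCEPTIONS_CSV)

# Names as they might appear in PEP headers, including duplicates.
header_names = ["P.J. Eby", "Phillip J. Eby", "Martin v. Löwis", "Victor Stinner"]
unique_authors = sorted({canonicalise(name, variants) for name in header_names})
print(unique_authors)
```

Because unlisted names pass through unchanged, the file only needs one row per variant spelling, and the PEP headers themselves stay untouched.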

AUTHORS.csv

AA-Turner · 2020-05-04T01:04:02Z

Is there anything outstanding to do on this one @merwok? / What would next steps be?

merwok · 2020-05-04T03:29:56Z

Is there a preview up somewhere? If not, I’ll have a look on my laptop.

AA-Turner · 2020-05-04T23:52:02Z

> Is there a preview up somewhere? If not, I’ll have a look on my laptop.

I haven't set one up, sorry

vstinner · 2020-06-17T09:07:16Z

Hum, I understand that the purpose of this PR is to fix:

"I | 8100 | January 2019 steering council election | Smith, III"

"Ernest W. Durbin III" becomes "III".

How is the CSV file generated? How is it supposed to be maintained?

I would prefer to only store "exceptions" in this file: if a name is <first name> <surname> (two words), we can pick the second word as the surname.

I'm not sure why my name "Victor Stinner" is rendered as "Stinner, Victor". Why not simply copy the name unchanged? Is it a convention?
https://www.python.org/dev/peps/pep-0000/#authors-owners

Also I'm not sure if it's a good idea to provide a long list of email addresses.

AA-Turner · 2020-06-17T15:45:03Z

> Hum, I understand that the purpose of this PR is to fix:

> "I | 8100 | January 2019 steering council election | Smith, III"

> How is the CSV file generated? How is it supposed to be maintained?

> I would prefer to only store "exceptions" in this file: if a name is <first name> <surname> (two words), we can pick the second word as the surname.

I think Eric also gave examples of surnames from a non-anglicised tradition (e.g. INADA Naoki) which this process should also fix. However I think I like the proposal of a sort of 'authors-exception' file better, which would also be easier to maintain as you note. I'll work on this now.

> I'm not sure why my name "Victor Stinner" is rendered as "Stinner, Victor". Why not simply copy the name unchanged? Is it a convention?
> https://www.python.org/dev/peps/pep-0000/#authors-owners

I'd be happy to render the name unchanged, but don't want to make unilateral changes for obvious reasons! Rendering the name unchanged would make all of these workarounds unnecessary, so would be easier from a maintenance perspective.

Having done a bit of looking, the Last, First format was in @benjaminp's original PEP0 generator from 12 years ago:

peps/pep0/output.py, line 177 in 1fbe18e:

(author.last_first.ljust(max_name_len), authors_dict[author]))

This seems to have originated from @warsaw in commit b7ac9d0 (Aug 2000). I might suggest that if the Authors/Owners key were to be changed, the names by each PEP should be updated to a similar format.

> Also I'm not sure if it's a good idea to provide a long list of email addresses.

True!
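
For context on the "Last, First" layout quoted above from pep0/output.py, the following is a small sketch of how `str.ljust` pads names into an aligned column. The author pairs and the example.org addresses are placeholders invented for illustration; only the `ljust(max_name_len)` idea comes from the quoted line.

```python
# Hypothetical (last_first, contact) pairs; the addresses are placeholders.
authors = [
    ("Stinner, Victor", "victor@example.org"),
    ("Murray, R. David", "rdm@example.org"),
]

# Pad every name to the width of the longest one so the second column aligns.
max_name_len = max(len(name) for name, _ in authors)
lines = [f"{name.ljust(max_name_len)}  {contact}" for name, contact in sorted(authors)]

for line in lines:
    print(line)
```

Sorting on the "Last, First" string is what makes the surname the sort key. Rendering names unchanged, as suggested above, would remove the need to split names but would also change the sort order.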

AA-Turner · 2020-06-17T17:33:52Z

> I would prefer to only store "exceptions" in this file: if a name is <first name> <surname> (two words), we can pick the second word as the surname.

Implemented in latest commit. I haven't removed the long list of names/emails (under heading Authors/Owners), but can do this. I wonder if it should be in a new PR though, to limit scope. Happy to do this if you'd like!
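
As a rough sketch of the exceptions-only approach described above: look the full name up in a small table first, and otherwise take the last word as the surname (for a two-word name this is the second word). The `EXCEPTIONS` table and `surname_for` are hypothetical names, and applying the last-word rule to names longer than two words is an assumption of this sketch, not something confirmed by the PR.

```python
from typing import Mapping

# Hypothetical exceptions table: full name -> surname to use in the index.
# The entries reflect the cases raised in this thread.
EXCEPTIONS = {
    "Ernest W. Durbin III": "Durbin",
    "Eric N. Vander Weele": "Vander Weele",
    "INADA Naoki": "INADA",
}


def surname_for(full_name: str, exceptions: Mapping[str, str] = EXCEPTIONS) -> str:
    """Return the index surname: explicit exception first, else the last word."""
    if full_name in exceptions:
        return exceptions[full_name]
    # Assumption: names not listed as exceptions use their last word.
    return full_name.split()[-1]


for name in ["Victor Stinner", "R. David Murray", "Ernest W. Durbin III", "INADA Naoki"]:
    print(f"{name} -> {surname_for(name)}")
```

Keeping the default rule trivial means every surprising case has to be listed explicitly, which matches the preference in this thread for a short, hand-maintained file.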

pep0/pep.py

merwok

Thanks for your efforts!

AUTHORS.csv

merwok · 2020-10-23T18:17:53Z

Hi! I will finalize this as soon as I can then turn to the Sphinx PRs!

AA-Turner · 2020-10-23T20:26:54Z

Thanks! Seems @hugovk is also doing a bunch of pep-infra PRs, so if mine get approval will look to ensure no conflict.

hugovk · 2020-10-23T20:54:34Z

I don't think there will be much overlap, if any, but feel free to ping me if you've any questions!

ewdurbin · 2020-12-14T16:04:44Z

@merwok Is there a list of outstanding concerns to help finalize this PR? I'd enthusiastically work on them.

It looks like, per the Zen, this ended up going the explicit route for overrides; is it just cleaning up the attempt at automatic handling?

AA-Turner · 2020-12-14T16:19:42Z

Re what this PR does, in comparison to #1385 this is pretty single-issue!

I added an explicit overrides file, and cleaned up a bit of the automatic handling.

merwok · 2020-12-14T16:29:19Z

I only wanted to test the branch locally to check the output, but I’ve had no time so far.

@ewdurbin please feel free to take over to approve and merge! thanks for the help.

the-knights-who-say-ni added the CLA signed label Apr 27, 2020

AA-Turner mentioned this pull request Apr 27, 2020: Build PEPs using Sphinx #2 (Open)

merwok reviewed Apr 28, 2020: AUTHORS.csv (outdated, resolved)

merwok approved these changes May 4, 2020

AA-Turner added 5 commits Apr 27, 2020

545bea8 Fix name parsing in PEP 0
90bbb4c Fixes as per comments
a1013ce Move to author metadata lookup for PEP index
552a7b6 Move CSV to comma separated
7a0b5b5 Fix Mark Williams

AA-Turner force-pushed the AA-Turner:fix-pep0-name-parsing branch from 532f361 to 7a0b5b5 May 16, 2020

3c6520d Rollback name parsing changes and move to using author exception file as per Victor's suggestion

merwok reviewed Jun 17, 2020: pep0/pep.py (outdated, resolved)

merwok reviewed Jun 17, 2020

ee33701 Move more special cases to exceptions file

merwok reviewed Jun 17, 2020: AUTHORS.csv (outdated, resolved)

AA-Turner added 2 commits Jun 21, 2020

efdaf15 python-dev nickname
8f9db05 Add duplicate names and de-duping logic

hugovk mentioned this pull request Dec 14, 2020: PEP 8102: Update Ee W. Durbin III's name #1732 (Closed)

python / peps

merwok Apr 28, 2020 Member

AA-Turner Apr 28, 2020 Author

merwok Apr 28, 2020 Member

AA-Turner Apr 28, 2020 • edited Author

merwok Apr 29, 2020 Member

merwok Jun 17, 2020 Member

AA-Turner Jun 17, 2020 Author

merwok Jun 17, 2020 Member

AA-Turner Jun 21, 2020 Author

AA-Turner commented May 4, 2020

merwok commented May 4, 2020

AA-Turner commented May 4, 2020

vstinner commented Jun 17, 2020 • edited by encukou

AA-Turner commented Jun 17, 2020

AA-Turner commented Jun 17, 2020

merwok left a comment

merwok commented Oct 23, 2020 • edited

AA-Turner commented Oct 23, 2020

hugovk commented Oct 23, 2020

ewdurbin commented Dec 14, 2020

AA-Turner commented Dec 14, 2020

merwok commented Dec 14, 2020 • edited
