PDF build issue #35

JulienPalard · 2022-04-13T20:00:37Z

Since #34 and #31, I have had issues building PDFs on docs.python.org. The problem can easily be reproduced using https://github.com/python/docsbuild-scripts/ as follows:

./build_docs.py --build-root ./build_root --www-root ./www --log-directory ./logs --group $(id -g) --skip-cache-invalidation --language ja --branch 3.9

(you can easily try other branches by changing the --branch argument)
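To try several branches in one go, a small wrapper like the one below could loop over the `--branch` values. This is a hypothetical sketch: `reproduce_command`, `run_all` and the branch list are assumptions, and only the `build_docs.py` flags are taken from the command above. It assumes it is run from a docsbuild-scripts checkout.

```python
import os
import subprocess

# Hypothetical list of branches to try; adjust as needed.
BRANCHES = ["3.9", "3.10", "3.11"]


def reproduce_command(branch: str, language: str = "ja") -> list[str]:
    """Return the build_docs.py invocation from this issue for one branch."""
    return [
        "./build_docs.py",
        "--build-root", "./build_root",
        "--www-root", "./www",
        "--log-directory", "./logs",
        "--group", str(os.getegid()),  # same value as $(id -g)
        "--skip-cache-invalidation",
        "--language", language,
        "--branch", branch,
    ]


def run_all(branches: list[str]) -> dict[str, int]:
    """Run the build for each branch and collect the exit codes."""
    results = {}
    for branch in branches:
        results[branch] = subprocess.run(reproduce_command(branch)).returncode
    return results
```

Calling `run_all(BRANCHES)` then shows at a glance which branches fail (non-zero exit code).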


JulienPalard · 2023-01-02T16:44:02Z

This looks like it completely blocks updates of docs.python.org/ja/: because the make invocation fails, the docsbuild-scripts do not rsync the output.

See #35

take6 · 2023-01-09T00:33:02Z

I reproduced the error with a Docker container based on Ubuntu 22.04. Here are the contents of the Dockerfile. The essential part for reproducing the issue is the list of packages installed by apt-get.

FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /pydoc

RUN apt-get update \
    && apt-get install -y python3.11 python3-pip python3.11-venv git rsync zip \
    latexmk xindy texinfo \
    texlive-xetex texlive-latex-recommended texlive-fonts-extra texlive-lang-japanese \
    && apt-get clean
RUN python3.11 -m venv /pydoc/pydoc-venv

take6 · 2023-01-09T06:02:47Z

It turned out that U+C4CF is a Korean (Hangul) character.

http://www.unicode-symbol.com/u/C4CF.html
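The same check can be done locally with Python's standard `unicodedata` module, which avoids relying on a third-party lookup site. A minimal sketch:

```python
import unicodedata

# Code point reported in the LaTeX error message of this issue.
cp = 0xC4CF
ch = chr(cp)

# Print the official Unicode name and general category of the character.
print(f"U+{cp:04X} {ch} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
```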

So I tried using the kotex package by editing docsbuild-scripts/build_docs.py. This is quite an ad hoc approach, but it appears to work.

PLATEX_DEFAULT = (
    "-D latex_engine=platex",
    "-D latex_elements.inputenc=",
    "-D latex_elements.fontenc=",
    r"-D latex_elements.preamble=\\usepackage{kotex}",
)
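For readers more used to Sphinx's `conf.py`, the `-D` overrides above correspond roughly to the settings sketched below. This assumes the overrides act as plain Sphinx configuration values (`latex_engine` and the `inputenc`, `fontenc` and `preamble` keys of `latex_elements`); it is not how build_docs.py itself is structured. The doubled backslash in the command-line form is presumably there for shell/make escaping; in `conf.py` a single backslash is enough.

```python
# Sketch of a Sphinx conf.py fragment mirroring the -D overrides above.
latex_engine = "platex"

latex_elements = {
    # Empty strings mirror the empty inputenc/fontenc overrides above,
    # suppressing Sphinx's default \usepackage lines for these packages.
    "inputenc": "",
    "fontenc": "",
    # Load kotex, intended to make Korean (Hangul) characters such as
    # U+C4CF usable in the LaTeX output.
    "preamble": r"\usepackage{kotex}",
}
```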

However, the build in my environment failed due to another error, "TeX capacity exceeded, sorry [input stack size=5000].", which may indicate insufficient memory. I may open a pull request to the docsbuild-scripts repo later so that anyone can try this (awkward) fix.

methane · 2023-01-09T07:30:59Z

Curiously, there is no "\uC4CF" in the Japanese HTML docs.
I don't know why LaTeX looks for this character.

m-aciek · 2023-01-09T07:49:47Z

For reference, this also has been discussed here: texjporg/platex#84

take6 · 2023-01-09T07:58:06Z

Curiously, there is no "\uC4CF" in the Japanese HTML docs.
I don't know why LaTeX looks for this character.

That is true. This is why I call my fix an "ad hoc way": it just avoids the error instead of fixing the underlying problem.

take6 · 2023-01-09T08:27:33Z

Pushed branch.

https://github.com/take6/docsbuild-scripts/tree/fix-japanese-doc-build-error

Could anyone try it and see if it works?

take6 · 2023-01-09T10:31:14Z

Created pull request.

methane · 2023-01-09T12:08:19Z

The PDF build failed for me with this error:

LaTeX Warning: Hyper reference `library/socket:module-socket' on page 4 undefin
ed on input line 167.

! TeX capacity exceeded, sorry [parameter stack size=10000].
\@inmathwarn #1->
                 \ifmmode \@latex@warning {Command \protect #1 invalid in ma...
l.170 ...tml\textgreater{}}{Emscripten Networking}
                                                  ^^M
If you really absolutely need more capacity,
you can ask a wizard to enlarge me.

atsuoishimoto · 2023-01-15T10:53:12Z

Building the 3.9/3.10 branches (./build_docs.py ... --branch 3.9 or 3.10)

This causes the following error. I confirmed it in c-api and library, but it might happen in other documents too.

"Improper discretionary list" error when creating DVI files

Building the 3.11 branch

Error during dvi->pdf conversion while building the following files (the actual Unicode character may vary): howto-unicode.pdf, howto-regex.pdf, whatsnew.pdf

! LaTeX Error: Unicode character 顛 (U+C4CF)
not set up for use with LaTeX
Error during dvi->pdf conversion while building library.pdf

library.pdf ! TeX capacity exceeded, sorry [input stack size=5000].

atsuoishimoto · 2023-01-15T14:21:47Z

The Unicode character error in howto-regex is caused by non-ASCII, non-Japanese letters in the IGNORECASE section (https://docs.python.org/3/howto/regex.html#compilation-flags).

These Unicode letters were introduced in 2017 (python/cpython@cd195e2).
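To spot such characters in a source file before a PDF build, a quick scan like the following could help. The ranges treated as "Japanese" here are a rough assumption (CJK punctuation, Hiragana, Katakana, CJK Unified Ideographs, half/full-width forms), not the exact set that pLaTeX/upLaTeX can typeset, and `suspicious_chars` is a hypothetical helper.

```python
import unicodedata

# Assumed, rough definition of "Japanese" characters for this check.
JAPANESE_RANGES = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
]


def is_japanese(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def suspicious_chars(text: str) -> dict[str, str]:
    """Map each non-ASCII, non-Japanese character in text to its Unicode name."""
    found = {}
    for ch in text:
        if ord(ch) < 0x80 or is_japanese(ch):
            continue
        # Fall back to the U+XXXX form for characters without a name.
        found[ch] = unicodedata.name(ch, f"U+{ord(ch):04X}")
    return found
```

For example, `suspicious_chars(open("Doc/howto/regex.rst", encoding="utf-8").read())` would list the letters from the IGNORECASE section together with their names.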

I wonder why the build started failing. Have the build procedures changed?

atsuoishimoto · 2023-01-17T04:37:37Z

With python/docsbuild-scripts#145, I managed to build PDFs other than library.pdf.

To build library.pdf, I had to remove two occurrences of the � (U+FFFD, the official REPLACEMENT CHARACTER) character in codecs.rst.

atsuoishimoto · 2023-01-17T05:05:54Z

Error in codecs.rst

! String contains an invalid utf-8 sequence.
l.13748 decoding, use \sphinxcode{\sphinxupquote{
                                               �}} (U+FFFD, the official
?
! Emergency stop.

reST source:

 decoding, use ``�`` (U+FFFD, the official

Generated TeX source:

decoding, use \\sphinxcode{\\sphinxupquote{\xef\xbf\xbd}} (U+FFFD, the official
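The escaped bytes in the generated TeX source can be checked in Python: they are simply the UTF-8 encoding of U+FFFD. So the input itself appears to be valid UTF-8, and the "invalid utf-8 sequence" error presumably comes from how the TeX side reads those bytes.

```python
# The three bytes in the generated TeX source above.
raw = b"\xef\xbf\xbd"

# Decoding them as UTF-8 yields U+FFFD (REPLACEMENT CHARACTER).
decoded = raw.decode("utf-8")
print(hex(ord(decoded)))  # → 0xfffd
```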

atsuoishimoto · 2023-01-17T05:09:33Z

Possible fix:

https://tex.stackexchange.com/questions/403769/lstlisting-gives-error-string-contains-an-invalid-utf-8-sequence

methane · 2023-01-17T07:44:25Z

To build library.pdf, I had to remove two occurrences of the � (U+FFFD, the official REPLACEMENT CHARACTER) character in codecs.rst.

I think we should remove the character from the official doc.

JulienPalard · 2023-01-17T14:37:04Z

U+FFFD is used twice in codecs.rst:

$ git grep $'\xef\xbf\xbd'
Doc/library/codecs.rst:|                         | decoding, use ``�`` (U+FFFD, the official     |
Doc/library/codecs.rst:   Substitutes ``?`` (ASCII character) for encoding errors or ``�`` (U+FFFD,
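Where git is not available, a rough Python equivalent of the `git grep` above is sketched below. `grep_codepoint` is a hypothetical helper; it assumes the .rst files are UTF-8 encoded and searches recursively under the given root.

```python
from pathlib import Path


def grep_codepoint(root: str, cp: int, pattern: str = "**/*.rst") -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every line under root containing chr(cp)."""
    needle = chr(cp)
    hits = []
    for path in sorted(Path(root).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line))
    return hits
```

For example, `grep_codepoint("Doc", 0xFFFD)` would be expected to report the two codecs.rst lines above.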

methane · 2023-01-17T15:58:14Z

All other languages can display U+FFFD, and most fonts have a glyph for it.
So it seems to be a Japanese-LaTeX-specific issue. There is no strong reason to prohibit it in the Python docs.

atsuoishimoto · 2023-01-20T01:59:30Z

Here's a minimal example that we should be able to build as a Japanese PDF document.

TeX source: sample.tex

\documentclass[a4paper,10pt,dvipdfmx]{ujreport}
\usepackage[T1]{fontenc}

\usepackage[noto-otc]{pxchfon}

\usepackage[utf8]{inputenc}
\usepackage[german]{babel}

\begin{document}

こんにちは

ſ:  (U+017F, LATIN SMALL LETTER LONG S) <- LaTeX Error: Unicode character ſ (U+017F) not set up for use with LaTeX.

�:  (U+FFFD, REPLACEMENT CHARACTER). <- Undefined control sequence

K: (U+212A, KELVIN SIGN) <- No error in uplatex, but dvipdfmx shows a warning:
[1
dvipdfmx:warning: No character mapping available.
 CMap name: NotoSerifCJK-Regular.ttc:0:jp90-UCS4-H
  input str: <0000212a>
  ]

\end{document}

Build command:

$ uplatex sample.tex
$ dvipdfmx sample.dvi
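When iterating on this example, the two build commands could be scripted as below. This is a hypothetical sketch (`build_commands` and `build_pdf` are not part of any existing tool); the `-interaction=nonstopmode` option is an addition to the original command, so that uplatex does not wait at the `?` prompt on errors.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file: str) -> list[list[str]]:
    """Commands for the uplatex -> dvipdfmx pipeline shown above."""
    tex = Path(tex_file)
    return [
        # -interaction=nonstopmode (added here) keeps uplatex from stopping
        # at the "?" prompt; it still exits non-zero if errors occurred.
        ["uplatex", "-interaction=nonstopmode", tex.name],
        ["dvipdfmx", tex.with_suffix(".dvi").name],
    ]


def build_pdf(tex_file: str) -> Path:
    """Run both steps in the .tex file's directory; stop at the first failure."""
    tex = Path(tex_file)
    for cmd in build_commands(tex_file):
        subprocess.run(cmd, cwd=tex.parent, check=True)
    return tex.with_suffix(".pdf")
```

With `check=True`, a failing step raises `subprocess.CalledProcessError`, so dvipdfmx is not run on a DVI file from a failed uplatex run.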

Daku-on · 2023-01-20T02:41:33Z

Hope this helps. (I can generate the DVI file but cannot open it. I'll look into the situation.)
https://tex.stackexchange.com/questions/448465/unicode-character-%C5%BF-u17f-in-lyx-2-3

atsuoishimoto · 2023-01-20T02:47:29Z

Hope this helps. (I can generate the DVI file but cannot open it. I'll look into the situation.)

Thank you very much! It makes ſ render in the PDF!

Daku-on · 2023-01-20T02:58:38Z

Great!
I have to go now, but if any problems remain when I get back home, I'll check and try them later.

This was referenced Apr 13, 2022

Unicode character 顛 (U+C4CF) not set up for use with LaTeX. #31

Closed

Improper discretionary list #34

Closed

JulienPalard mentioned this issue Jan 2, 2023

Japanese translation has not been updated since 2022-08-01 python/docsbuild-scripts#142

Closed


take6 mentioned this issue Jan 9, 2023

Fix japanese doc build error python/docsbuild-scripts#144

Merged

atsuoishimoto added a commit to atsuoishimoto/docsbuild-scripts that referenced this issue Jan 17, 2023

python/python-docs-ja#35 fix Unicode character

84386ed

atsuoishimoto mentioned this issue Jan 17, 2023

Fix Unicode character error building Japanese PDF documents python/docsbuild-scripts#145

Open
