PDF build issue #35

Open
JulienPalard opened this issue Apr 13, 2022 · 22 comments

Comments

@JulienPalard
Member

Since #34 and #31, I have had issues building PDFs on docs.python.org. It can easily be reproduced using https://github.com/python/docsbuild-scripts/ as follows:

./build_docs.py --build-root ./build_root --www-root ./www --log-directory ./logs --group $(id -g) --skip-cache-invalidation --language ja --branch 3.9

(you can easily try other branches by changing the --branch argument)
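The reproduction command above can be looped over several branches to see which ones fail. This is a minimal Python sketch, assuming it is run from a docsbuild-scripts checkout; `build_command` and `main` are hypothetical helper names, and the flags are copied verbatim from the command above.

```python
import os
import subprocess


def build_command(branch: str, language: str = "ja") -> list[str]:
    """Return the build_docs.py argv from the command above for one branch (hypothetical helper)."""
    return [
        "./build_docs.py",
        "--build-root", "./build_root",
        "--www-root", "./www",
        "--log-directory", "./logs",
        "--group", str(os.getgid()),  # same value as $(id -g)
        "--skip-cache-invalidation",
        "--language", language,
        "--branch", branch,
    ]


def main(branches: list[str]) -> int:
    """Build each branch in turn and report which ones failed."""
    failed = []
    for branch in branches:
        # Each build is a separate process, so one failing branch doesn't stop the rest.
        result = subprocess.run(build_command(branch))
        if result.returncode != 0:
            failed.append(branch)
    print("failed branches:", failed or "none")
    return 1 if failed else 0
```

For example, `main(["3.9", "3.10", "3.11"])` runs the three builds one after another and returns a non-zero status if any of them failed.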

@JulienPalard
Member Author

This looks like it completely blocks updates to docs.python.org/ja/: since the make invocation fails, docsbuild-scripts does not rsync the output.

See #35

@take6

take6 commented Jan 9, 2023

I reproduced the error with a Docker container based on Ubuntu 22.04. Here are the contents of the Dockerfile. The essential part for reproducing the issue is the list of packages installed by apt-get.

FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /pydoc

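# The package list below (especially the TeX Live packages) is what matters for reproducing the issue.
RUN apt-get update \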
    && apt-get install -y python3.11 python3-pip python3.11-venv git rsync zip \
    latexmk xindy texinfo \
    texlive-xetex texlive-latex-recommended texlive-fonts-extra texlive-lang-japanese \
    && apt-get clean
RUN python3.11 -m venv /pydoc/pydoc-venv

@take6

take6 commented Jan 9, 2023

It turned out that U+C4CF is a Korean character, 쓏.

http://www.unicode-symbol.com/u/C4CF.html

So I tried using the kotex package by editing docsbuild-scripts/build_docs.py. This is quite an ad hoc approach, but it appears to work.

PLATEX_DEFAULT = (
    "-D latex_engine=platex",
    "-D latex_elements.inputenc=",
    "-D latex_elements.fontenc=",
    r"-D latex_elements.preamble=\\usepackage{kotex}",
)

However, the build in my environment failed with another error, "TeX capacity exceeded, sorry [input stack size=5000].", which may indicate insufficient memory. Maybe I can open a pull request to the docsbuild-scripts repo later so that anyone can try this (awkward) fix.
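To double-check what a code point from such an error message is, the standard library's `unicodedata` module is enough. A small sketch: `describe` is a hypothetical helper, and U+AC00 to U+D7A3 is the standard Hangul Syllables block.

```python
import unicodedata

# Hangul Syllables block: U+AC00..U+D7A3 (precomposed Korean syllables).
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def describe(codepoint: int) -> str:
    """Return 'U+XXXX <char> <Unicode name> (<kind>)' for a code point seen in a LaTeX error."""
    char = chr(codepoint)
    name = unicodedata.name(char, "<unnamed>")
    kind = "Hangul syllable" if HANGUL_FIRST <= codepoint <= HANGUL_LAST else "other"
    return f"U+{codepoint:04X} {char} {name} ({kind})"


print(describe(0xC4CF))
```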

@methane
Member

methane commented Jan 9, 2023

Curiously, there is no "\uC4CF" in the Japanese HTML docs.
I don't know why LaTeX looks for this character.
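One way to double-check this is to scan the built output for the code point. A minimal sketch, assuming the HTML (or generated .tex) files sit under a local directory; `find_codepoint` and its `*.html` default pattern are hypothetical.

```python
from pathlib import Path


def find_codepoint(root: str, codepoint: int, pattern: str = "*.html") -> list[tuple[str, int]]:
    """Return (path, line number) pairs for every line under root that contains the code point."""
    target = chr(codepoint)
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        # Invalid UTF-8 bytes become U+FFFD here, so a search for U+FFFD itself may report extra hits.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if target in line:
                hits.append((str(path), lineno))
    return hits
```

For example, `find_codepoint("./www", 0xC4CF)` lists every HTML line under the www root that contains U+C4CF; pass `pattern="*.tex"` to search the generated LaTeX sources instead.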

@m-aciek
Contributor

m-aciek commented Jan 9, 2023

For reference, this also has been discussed here: texjporg/platex#84

@take6

take6 commented Jan 9, 2023

Curiously, there is no "\uC4CF" in the Japanese HTML docs.
I don't know why LaTeX looks for this character.

That is true. This is why I call my fix an "ad hoc way": it just avoids the error instead of fixing the underlying problem.

@take6

take6 commented Jan 9, 2023

Pushed a branch.

https://github.com/take6/docsbuild-scripts/tree/fix-japanese-doc-build-error

Could anyone try it and see if it works?

@take6

take6 commented Jan 9, 2023

Created a pull request.

@methane
Member

methane commented Jan 9, 2023

The PDF build failed for me with this error:

LaTeX Warning: Hyper reference `library/socket:module-socket' on page 4 undefin
ed on input line 167.

! TeX capacity exceeded, sorry [parameter stack size=10000].
\@inmathwarn #1->
                 \ifmmode \@latex@warning {Command \protect #1 invalid in ma...
l.170 ...tml\textgreater{}}{Emscripten Networking}
                                                  ^^M
If you really absolutely need more capacity,
you can ask a wizard to enlarge me.

@atsuoishimoto
Contributor

Building the 3.9/3.10 branches (./build_docs.py ... --branch 3.9 or 3.10)

causes the following error. I confirmed it in c-api and library, but it might happen in other documents too.

"Improper discretionary list" error when creating DVI files

Building the 3.11 branch

  • Error during dvi->pdf conversion while building the following files (the actual Unicode character may vary): howto-unicode.pdf, howto-regex.pdf, whatsnew.pdf

    ! LaTeX Error: Unicode character 顛 (U+C4CF)
    not set up for use with LaTeX

  • Error on dvi->pdf conversion while building library.pdf

    library.pdf ! TeX capacity exceeded, sorry [input stack size=5000].

@atsuoishimoto
Contributor

atsuoishimoto commented Jan 15, 2023

The Unicode character error in howto-regex is caused by non-ASCII, non-Japanese letters in the IGNORECASE section (https://docs.python.org/3/howto/regex.html#compilation-flags).

These letters were introduced in 2017 (python/cpython@cd195e2).

I wonder why the build started failing. Have the build procedures changed?
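To spot such characters before a build, one can list everything in a text that is neither ASCII nor in the common Japanese script blocks. A rough sketch: the "safe" ranges below are an assumption (ASCII, CJK punctuation, Hiragana, Katakana, CJK Unified Ideographs, halfwidth/fullwidth forms), not the exact set that pLaTeX/upLaTeX can typeset. The sample uses the four letters that the regex HOWTO says `[a-z]` matches under IGNORECASE.

```python
import unicodedata

# Assumed "safe" ranges: ASCII plus common Japanese blocks. This is only an approximation.
SAFE_RANGES = [
    (0x0000, 0x007F),  # ASCII
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
]


def is_safe(char: str) -> bool:
    cp = ord(char)
    return any(lo <= cp <= hi for lo, hi in SAFE_RANGES)


def suspicious_chars(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each distinct character outside SAFE_RANGES, in order of first appearance."""
    seen = []
    for char in text:
        if not is_safe(char) and char not in seen:
            seen.append(char)
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in seen]


# Escapes are used so the look-alike letters (e.g. KELVIN SIGN vs ASCII K) are unambiguous.
sample = "こんにちは [a-z] matches \u0130, \u0131, \u017f and \u212a with IGNORECASE"
for entry in suspicious_chars(sample):
    print(entry)
```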

@atsuoishimoto
Copy link
Contributor

With python/docsbuild-scripts#145, I managed to build PDFs other than library.pdf.

To build library.pdf, I had to remove the two occurrences of U+FFFD (the official REPLACEMENT CHARACTER) in codecs.rst.

@atsuoishimoto
Copy link
Contributor

Error in codecs.rst

! String contains an invalid utf-8 sequence.
l.13748 decoding, use \sphinxcode{\sphinxupquote{
                                               �}} (U+FFFD, the official
?
! Emergency stop.

reST source:

 decoding, use ``�`` (U+FFFD, the official

Generated TeX source:

decoding, use \\sphinxcode{\\sphinxupquote{\xef\xbf\xbd}} (U+FFFD, the official
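The `\xef\xbf\xbd` in the generated TeX is simply the UTF-8 encoding of U+FFFD. A short check of that, plus where U+FFFD normally comes from: the `"replace"` error handler that this codecs.rst passage documents.

```python
# U+FFFD REPLACEMENT CHARACTER and its UTF-8 byte sequence.
replacement = "\ufffd"
encoded = replacement.encode("utf-8")
print(encoded)  # b'\xef\xbf\xbd', the three bytes seen in the generated TeX source

# Decoding invalid UTF-8 with errors="replace" substitutes U+FFFD for the bad byte.
decoded = b"abc\xffdef".decode("utf-8", errors="replace")
print(decoded)
```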

@atsuoishimoto
Copy link
Contributor

atsuoishimoto commented Jan 17, 2023

@methane
Member

methane commented Jan 17, 2023

To build library.pdf, I had to remove the two occurrences of U+FFFD (the official REPLACEMENT CHARACTER) in codecs.rst.

I think we should remove the character from the official doc.

@JulienPalard
Member Author

U+FFFD is used twice in codecs.rst:

$ git grep $'\xef\xbf\xbd'
Doc/library/codecs.rst:|                         | decoding, use ``�`` (U+FFFD, the official     |
Doc/library/codecs.rst:   Substitutes ``?`` (ASCII character) for encoding errors or ``�`` (U+FFFD,

@methane
Copy link
Member

methane commented Jan 17, 2023

All other languages can display U+FFFD, and most fonts have a glyph for it.
So this seems to be a Japanese LaTeX issue only; there is no strong reason to prohibit the character in the Python docs.

@atsuoishimoto
Contributor

atsuoishimoto commented Jan 20, 2023

Here's a minimal example that we should be able to build as a Japanese PDF document.

TeX source: sample.tex

\documentclass[a4paper,10pt,dvipdfmx]{ujreport}
\usepackage[T1]{fontenc}

\usepackage[noto-otc]{pxchfon}

\usepackage[utf8]{inputenc}
\usepackage[german]{babel}

\begin{document}

こんにちは

ſ:  (U+017F, LATIN SMALL LETTER LONG S) <- LaTeX Error: Unicode character ſ (U+017F) not set up for use with LaTeX.

�:  (U+FFFD, REPLACEMENT CHARACTER). <- Undefined control sequence

K: (U+212A, KELVIN SIGN) <- No error in uplatex, but dvipdfmx show warning
% [1
% dvipdfmx:warning: No character mapping available.
%  CMap name: NotoSerifCJK-Regular.ttc:0:jp90-UCS4-H
%   input str: <0000212a>
%   ]

\end{document}

Build command:

$ uplatex sample.tex
$ dvipdfmx sample.dvi
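The two build steps can also be scripted, for example to rerun the minimal example after each tweak of the preamble. A sketch assuming `uplatex` and `dvipdfmx` are on PATH; `build_pdf` and its injectable `run` parameter (so the sequence can be exercised without TeX installed) are hypothetical. Unlike the commands above, it passes `-interaction=nonstopmode` so uplatex does not stop at the `?` prompt on errors.

```python
import subprocess
from pathlib import Path
from typing import Callable


def build_pdf(tex_file: str, run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Run uplatex, then dvipdfmx on the resulting DVI; stop at the first failing step."""
    stem = Path(tex_file).stem  # uplatex writes <stem>.dvi into the current directory
    steps = [
        ["uplatex", "-interaction=nonstopmode", tex_file],
        ["dvipdfmx", f"{stem}.dvi"],
    ]
    for cmd in steps:
        result = run(cmd)
        if result.returncode != 0:
            print(f"{cmd[0]} failed with exit code {result.returncode}")
            return False
    return True
```

With TeX installed, `build_pdf("sample.tex")` reproduces the two commands above and returns whether both succeeded.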

@Daku-on

Daku-on commented Jan 20, 2023

Hope this helps. (I can generate the DVI file but cannot open it. I'll look into the situation.)
https://tex.stackexchange.com/questions/448465/unicode-character-%C5%BF-u17f-in-lyx-2-3

@atsuoishimoto
Copy link
Contributor

atsuoishimoto commented Jan 20, 2023

Hope this helps. (I can generate the DVI file but cannot open it. I'll look into the situation.)

Thank you very much! With it, the ſ renders in the PDF!!!!!

@Daku-on

Daku-on commented Jan 20, 2023

Great!
I have to go now, but if some problems remain when I get back home, I'll look into them later.
