(feature) :lang special attribute syntax #895
Comments
This seems to duplicate #162 in part. I'll close that one and keep the link here. |
Well, this issue has been open for almost 15 months. Although the proposal should be more complete, I want to focus on this minimal part: would it be possible for extended markdown to have the special attribute? I think that hardcoding the language tag in HTML should be avoided. And with a special attribute, the user has less to type. Otherwise, a simple example such as:
should be written now in extended markdown:
instead of the simpler proposal:
The proposal has the following benefits:
@jgm, what do you think about this? |
Using |
I don't think there is high enough demand for language specific syntax but this is also something which would be easier with syntax for a generic span. |
Many thanks for your comment, @mpickering. I think pandoc needs first to be able to set the document language with only one attribute value for YAML |
I'm not sure either, but whatever the outcome of the discussion on syntax is, I feel that now that we have the mapping from For babel and polyglossia, e.g.:
If/when a consensus on a language specific syntax emerges, support for it could be added easily later. |
@nickbart1980 Are the mappings from |
I edited the table above to include the babel syntax; I have no idea about LuaTeX and ConTeXt, though. EDIT: It seems we might be able to use the babel syntax for polyglossia, too:
I haven’t tested this, however, and I’m not sure how well this works for language variants etc. EDIT 2: As I suspected: With polyglossia/xelatex, |
… or we could define our own commands using BCP47 tags, e.g. These could be defined in the latex template, using xparse, along these lines (this is an example for polyglossia only):
\ExplSyntaxOn
\NewDocumentCommand{\IETFlang}{ m m }
{
\str_case:nnF { #1 }
{
{ ar } { \textarabic{#2} }
{ de-DE } { \textgerman{#2} }
{ de-AT } { \textgerman[variant=austrian]{#2} }
{ en-US } { \textenglish{#2} }
{ en-GB } { \textenglish[variant=british]{#2} }
{ fr-FR } { \textfrench{#2} }
% others
}
{
#2 % I~don't~know~what~to~do~with~`#1'
}
}
\ExplSyntaxOff |
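The `\IETFlang` lookup above is essentially a table from BCP 47 tag to a polyglossia language name plus options. A minimal Python sketch of the same table follows, only to make the mapping logic explicit. The names `POLYGLOSSIA`, `to_polyglossia`, and `wrap` are hypothetical, the entries are just the ones listed in the LaTeX example, and the fallback to the primary subtag (e.g. `de` resolving to `german`) is an assumption, not something the thread settles. This is not pandoc's actual implementation.

```python
# Illustrative subset only: BCP 47 tag -> (polyglossia language, options).
# Entries mirror the \IETFlang example; they are not pandoc's real table.
POLYGLOSSIA = {
    "ar": ("arabic", ""),
    "de-DE": ("german", ""),
    "de-AT": ("german", "variant=austrian"),
    "en-US": ("english", ""),
    "en-GB": ("english", "variant=british"),
    "fr-FR": ("french", ""),
}


def to_polyglossia(tag):
    """Return (language, options) for a tag, or None if nothing matches."""
    if tag in POLYGLOSSIA:
        return POLYGLOSSIA[tag]
    # Assumption: fall back to the first entry sharing the primary subtag,
    # dropping any variant options.
    primary = tag.split("-")[0]
    for known, (name, _opts) in POLYGLOSSIA.items():
        if known.split("-")[0] == primary:
            return (name, "")
    return None


def wrap(tag, text):
    """Wrap text in a polyglossia \\text<lang> command; unknown tags pass through."""
    entry = to_polyglossia(tag)
    if entry is None:
        return text
    name, opts = entry
    if opts:
        return "\\text%s[%s]{%s}" % (name, opts, text)
    return "\\text%s{%s}" % (name, text)
```

As in the LaTeX version, an unknown tag leaves the text unchanged rather than raising an error.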
If I recall correctly, in Context you use
|
It would be better to not pollute the templates with too much redefining and we already have the BCP47-to-polyglossia/babel functions in the LaTeX writer. But the problem with the other LaTeX engines is bothersome. Maybe we'll just have to decide that when people start writing |
I agree. But what else can we do if we want pandoc to create engine-agnostic LaTeX documents?
Fine with me. And if I understand it correctly, LuaLaTeX works with polyglossia, too. So we’d just map from |
I tend to favour the approach of just outputting the polyglossia commands. The only major downside I can think of is if someone does |
Thoughts anyone? Also, should the |
+++ mb21 [Oct 10 15 03:55 ]:
I'm torn. I like the idea of emitting the commands So that makes me incline towards the idea of defining new Arguments against this: (a) it clutters up the I think we can eliminate concern (a) by generating the
Maybe we could use the following approach: set a default |
@jgm: Sounds all good to me. |
Well, we have support for the main language with babel, but not multilingual. But yeah... if we emit @nickbart1980 do you know which way has the easier/simpler LaTeX definitions? |
I’m not sure how to define polyglossia wrappers for the babel commands though. What would work, relatively easily and transparently, is defining our own commands, e.g. The latex writer could put all required definitions into one pandoc variable, so they won’t clutter up the template (though they will of course appear in the document itself). |
I've started on this... @jgm is there a way to extract all lang attributes from both |
It's annoying. If I'd just made Attr a newtype, we could So for now I think you need to do two queries. But it
+++ mb21 [Oct 14 15 13:46 ]:
|
Also collect lang and dir attributes on spans and divs to set the lang, otherlangs and dir variables if they aren’t set already. See jgm#895.
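To illustrate what "collect lang and dir attributes on spans and divs" means, here is a minimal Python sketch that walks pandoc's JSON AST (as produced by `pandoc -t json`) instead of using the Haskell `query` function discussed above. It assumes the usual JSON encoding in which a `Span` or `Div` node carries `[attr, contents]` and `attr` is `[identifier, classes, key-value pairs]`. The function name `collect_attrs` is hypothetical, and the real writer does this in Haskell.

```python
def collect_attrs(node, keys=("lang", "dir")):
    """Collect the distinct values of the given attributes on Span and Div nodes."""
    found = {key: [] for key in keys}

    def walk(n):
        if isinstance(n, dict):
            if n.get("t") in ("Span", "Div"):
                # attr is [identifier, [classes], [[key, value], ...]]
                _ident, _classes, keyvals = n["c"][0]
                for key, value in keyvals:
                    if key in found and value not in found[key]:
                        found[key].append(value)
            for child in n.values():
                walk(child)
        elif isinstance(n, list):
            for item in n:
                walk(item)

    walk(node)
    return found


# A tiny hand-written document: a German Div containing an Arabic RTL Span.
doc = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {"t": "Div", "c": [["", [], [["lang", "de"]]], [
            {"t": "Para", "c": [
                {"t": "Str", "c": "Hallo"},
                {"t": "Space"},
                {"t": "Span", "c": [["", [], [["lang", "ar"], ["dir", "rtl"]]],
                                    [{"t": "Str", "c": "x"}]]},
            ]},
        ]]},
    ],
}
```

A single generic walk covers both node types here, which sidesteps the "two queries" issue only because JSON is untyped; it says nothing about how the Haskell code should be structured.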
Thanks jgm, makes sense... About the mapping... I implemented outputting the polyglossia commands, now trying to come up with LaTeX mappings from polyglossia to babel. I have it working for most languages with e.g.:
until I realized that for spanish, babel itself defines a |
For LaTeX, also collect lang and dir attributes on spans and divs to set the lang, otherlangs and dir variables if they aren’t set already. See jgm#895.
FWIW, a 2009 version of the babel manual, http://www.pvv.ntnu.no/~berland/latex/docs/babel.pdf contains references to |
Good tip, I seem to have |
I’m afraid |
Maybe you could use |
@nickbart1980 yeah, although I still think it should be possible to Edit: never mind, got a great answer over there that solves the issue. |
pandoc now emits |
@nickbart1980 thanks for the correction, is in pull #2481 |
For non-Latin scripts, it seems we need to add sensible font defaults and methods for overriding them. I’ve just been looking at xelatex/polyglossia so far: The default font used by xelatex (some form of Computer Modern, it seems), e.g., supports neither Greek nor Arabic - example:
Result:
Adding, e.g.,
to Interestingly enough, the My suggestion is to solve this issue by adding sensible defaults and also introducing pandoc variables like For Arabic, which does not have separate
For other scripts we will probably need separate variables, e.g., We could also try to use, by default, fonts that support more scripts; Times New Roman, e.g., seems to contain both Greek and Arabic (though here, again, an extra definition I’m not sure why |
BTW, my solution to the script problem in XeTeX (I'm still using pandoc 1.12.3.3) is to use the
I'm mentioning this just for reference, since a solution that understands and uses language tags is preferable for many reasons. (Note that I don't have non-latin scripts in the headers, so there aren't proper commands for setting up sans-serif fonts.) If you want to see this in action, have a look at the ttfautohint package (the PDF and the constructed pandoc input files are part of the release tarball only). |
For monolingual documents, setting the mainfont to a font that supports all the characters in the doc should solve the issue, right? And everyone who's serious about typesetting bilingual documents will need to manually select some fonts that work together anyway and can include the necessary
This may not be ideal (maybe we should mention it in the README). But if we were to include something like For ConTeXt it seems we could specify fallback fonts for certain unicode ranges. Unfortunately, while ucharclasses provides the same functionality for Polyglossia, it seems the So, I'm not sure there's much we can do (except mentioning this in the README, and maybe set up something for ConTeXt), although recommendations are welcome. |
I’d agree as far as serious work is concerned, but I’m more than a little worried if casual users trying to use a non-Latin script do not get any output but just an error message. In other words, I’d expect pandoc to generate some at least halfway decent output even if a user does not actively specify any non-Latin fonts at all. Couldn’t we introduce at least one So, whenever a document contains the language tag
would be added to These definitions could then still be overridden by, e.g.,
|
Indeed, that would be nice; however, that is currently not the case either: e.g., Arabic characters require you to specify
I see, indeed if they can be overridden (and they can, just tested) then I'm in favour as well. So I suppose now we need a good list of fonts widely available for lots of languages... |
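The "defaults that can still be overridden" idea can be sketched as follows. This is a rough Python illustration, not pandoc code: `DEFAULT_FONTS` and `font_setup` are hypothetical names, and the fonts listed are placeholders chosen for the example, not recommendations from this thread. The emitted lines use the `\<language>font` family names that polyglossia looks for.

```python
# Placeholder defaults: primary subtag -> (polyglossia name, font, fontspec features).
# The font choices are assumptions for illustration only.
DEFAULT_FONTS = {
    "ar": ("arabic", "Amiri", "Script=Arabic"),
    "el": ("greek", "GFS Didot", "Script=Greek"),
}


def font_setup(lang_tags, overrides=None):
    """Return \\newfontfamily lines for the given tags, honouring user overrides.

    overrides maps variable names such as "greekfont" to a font name.
    """
    overrides = overrides or {}
    lines = []
    for tag in lang_tags:
        primary = tag.split("-")[0].lower()
        if primary not in DEFAULT_FONTS:
            continue  # no default known: emit nothing rather than guess
        name, default_font, features = DEFAULT_FONTS[primary]
        font = overrides.get(name + "font", default_font)
        lines.append("\\newfontfamily\\%sfont[%s]{%s}" % (name, features, font))
    return lines
```

For example, `font_setup(["ar", "el"], {"greekfont": "Cardo"})` keeps the placeholder default for Arabic and uses the user's font for Greek.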
@nickbart1980, the default font in XeTeX and LuaTeX (Latin Modern) is set by fontspec. Perhaps an issue for adding defaults for other languages should be added to its repository, to see whether a broader solution can be found? |
Could we also discuss the syntax for the language attribute?
If we could agree on this, the language special attribute could already be implemented in the elements that allow it. These are mainly titles and code. At least for code, when writing technical documents in languages other than English, it is extremely useful to be able to tag inline code as being written in English. Otherwise, hyphenation for that part will very probably be wrong. And once issue #168 is solved, we would benefit a lot from the special syntax for language attributes in text divisions and spans. So, could we reach an agreement about the syntax for the language attribute? |
@jgm, after issue #168 is fixed, could we discuss this issue? This issue is older than #168. It comes from #162, which I originally reported at https://code.google.com/archive/p/pandoc/issues/201 (more than six years ago). I think that And a comment on the issue: it is about the syntax (or at least, that was my original report). When I reported the original issue at Google Code, you discarded it because it looked like recreating LaTeX in pandoc (see comment 7 there). Well, time flies. And the vast majority of comments in this issue explain how LaTeX (its babel and polyglossia packages) deals with languages. I think we have to set a special language syntax first. |
I have tried to discuss it on the mailing list, but I guess this is the proper place to discuss it (since I got no reply there). Special language syntax is needed to have different document sections in different languages, such as in:
# The US Constitution {:en}
[English text]
# Das deutsche Grundgesetz {:de}
[deutscher Text]
I think it is clear we need that special syntax for the language attribute. It is essential for multilingual documents. |
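To make the proposed `{:en}` notation concrete, here is a small Python sketch that recognises it at the end of an ATX heading. This only illustrates the proposal in this thread; it is not syntax that pandoc is known to implement, and the names `HEADING` and `parse_heading` as well as the exact tag pattern (two or three letters plus optional subtags) are assumptions.

```python
import re

# Hypothetical pattern: "#"s, heading text, then an optional trailing {:tag}.
HEADING = re.compile(
    r"^(#+)\s+(.*?)\s*(?:\{:([A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)\})?\s*$"
)


def parse_heading(line):
    """Split a heading line into level, text, and optional language tag."""
    match = HEADING.match(line)
    if match is None:
        return None  # not an ATX heading
    hashes, text, lang = match.groups()
    return {"level": len(hashes), "text": text, "lang": lang}
```

Applied to the two headings in the example above, this yields `lang` values of `en` and `de`; a heading without the attribute gets `lang` set to `None`.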
I want to echo the use of ucharclasses that @lemzwerg mentioned. Basically, what I was trying to do is modify the default pandoc latex template, together with some new YAML variables, to provide a general method to set up ucharclasses in the YAML front matter. But it doesn't work so far, and would probably require some change in pandoc itself:
Example Preamble
For example, I want this in the preamble:
\usepackage[Latin, Greek, Hebrew]{ucharclasses}
\usepackage{xltxtra,xunicode}
\usepackage{unicode-math}
\newcommand{\latinfont}{\renewcommand\rmdefault{lmr}\renewcommand\sfdefault{lmss}\renewcommand\ttdefault{lmtt}\defaultfontfeatures[\rmfamily,\sffamily]{Ligatures=TeX}}
\setTransitionsForLatin{\latinfont}{}
\newfontfamily\greekfont{Cardo} % Download at http://scholarsfonts.net/cardofnt.html
\setTransitionsForGreek{\greekfont}{}
\newfontfamily\hebrewfont{Cardo} % same as above
\setTransitionsFor{Hebrew}{\hebrewfont\setRTL}{\setLTR}
Modifying Pandoc Template for LaTeX
I modified the default latex template starting from line 85, in the section relevant to lang, like this (basically inserted
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[shorthands=off,$for(babel-otherlangs)$$babel-otherlangs$,$endfor$main=$babel-lang$]{babel}
$if(babel-newcommands)$
$babel-newcommands$
$endif$
\else
$if(ucharclasses)$
\usepackage[Latin,$for(polyglossia-otherlangs)$$polyglossia-otherlangs.name$$sep$,$endfor$]{ucharclasses}
\usepackage{xltxtra,xunicode}
\usepackage{unicode-math}
\newcommand{\latinfont}{\renewcommand\rmdefault{lmr}\renewcommand\sfdefault{lmss}\renewcommand\ttdefault{lmtt}\defaultfontfeatures[\rmfamily,\sffamily]{Ligatures=TeX}}
\setTransitionsForLatin{\latinfont}{}
$for(polyglossia-otherlangs)$
\newfontfamily\$polyglossia-otherlangs.name$font{$$polyglossia-otherlangs.name$font$}
\setTransitionsFor{$polyglossia-otherlangs.name$}{\$polyglossia-otherlangs.name$font}{}
$endfor$
$else$
\usepackage{polyglossia}
\setmainlanguage[$polyglossia-lang.options$]{$polyglossia-lang.name$}
$for(polyglossia-otherlangs)$
\setotherlanguage[$polyglossia-otherlangs.options$]{$polyglossia-otherlangs.name$}
$endfor$
$endif$
\fi
YAML Front Matter
I then put the following in the front matter of the pandoc file:
lang: en
otherlangs: [el,he]
ucharclasses: true
greekfont: Cardo
hebrewfont: Cardo
Generated TeX
The resulting generated TeX file is:
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[shorthands=off,greek,hebrew,main=english]{babel}
\else
\usepackage[Latin,greek,hebrew]{ucharclasses}
\usepackage{xltxtra,xunicode}
\usepackage{unicode-math}
\newcommand{\latinfont}{\renewcommand\rmdefault{lmr}\renewcommand\sfdefault{lmss}\renewcommand\ttdefault{lmtt}\defaultfontfeatures[\rmfamily,\sffamily]{Ligatures=TeX}}
\setTransitionsForLatin{\latinfont}{}
\newfontfamily\greekfont{$polyglossia-otherlangs.name}
\setTransitionsFor{greek}{\greekfont}{}
\newfontfamily\hebrewfont{$polyglossia-otherlangs.name}
\setTransitionsFor{hebrew}{\hebrewfont}{}
\fi
Problems
"Nested" Pandoc Variables?
Comparing [Example Preamble] to [Generated TeX], the What I wanted to do is to use Any idea how to use nested variables?
|
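One way around the nested lookup, sketched here in Python purely as an illustration: resolve the `<name>font` keys outside the template (for instance in a preprocessing step) and hand the template a flat list of name/font records, which an ordinary `$for(...)$` loop with dotted fields can consume, the same way the template above already reads `polyglossia-otherlangs.name`. The function `font_records` and the record layout are hypothetical, and this is a workaround idea, not an answer given in the thread.

```python
def font_records(meta, lang_names):
    """Pair each language name with the font stored under "<name>font", if any."""
    records = []
    for name in lang_names:
        font = meta.get(name + "font")
        if font is not None:
            records.append({"name": name, "font": font})
    return records


# Metadata as in the YAML front matter above.
meta = {
    "lang": "en",
    "otherlangs": ["el", "he"],
    "ucharclasses": True,
    "greekfont": "Cardo",
    "hebrewfont": "Cardo",
}

records = font_records(meta, ["greek", "hebrew"])

# What a loop over such records would be expected to produce.
lines = ["\\newfontfamily\\%sfont{%s}" % (r["name"], r["font"]) for r in records]
```

The point of the design is that the template never has to build a variable name from another variable; it only iterates over data that is already paired up.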
Obsoleted by #3451 |
Hi John,
this comes from #675.
This is about enabling language notation for document parts.
The notation could be (in the spirit of .class and #identifier):
I think a new element RawSpan would also be needed to add language notation in some passages (not all of them, but some).
Many thanks for your excellent work,
Pablo