\documentclass[twoside,letterpaper]{rapport3}

%\nofiles

\usepackage{comment,makeidx}

\usepackage{times}
\renewcommand{\ttdefault}{cmtt}

\usepackage[plainpages=true,pagebackref=true]{hyperref}

\usepackage{german}
% german
\righthyphenmin=3
\mdqoff
\captionsenglish
\makeindex

\usepackage{fancyhdr}
% headers & footers
\pagestyle{fancy}
% foot
\lfoot[\thepage]{\protect\small\protect\it Victor Eijkhout -- \protect\TeX\ by Topic}
\rfoot[{\protect\small\protect\it Victor Eijkhout -- \protect\TeX\ by Topic}]{\thepage}
\cfoot{}
% head
\lhead[\let\\\relax \let\uppercase\relax \leftmark]{\relax}
\chead{}
\rhead[\relax]{\let\\\relax \let\uppercase\relax \rightmark}

\newdimen\tempdima \newdimen\tempdimb

% these are fine
\def\nl{\protect\\}\def\n#1{{\tt #1}}\def\cs#1{{\tt\char`\\#1}}\let\csc\cs
\def\lb{{\tt\char`\{}}\def\rb{{\tt\char`\}}}
\def\gr#1{$\langle$#1$\rangle$}\def\key#1{{\tt#1}}
\def\alt{}\def\altt{}%this way in manstijl
\def\ldash{\unskip\ --\nobreak\ \ignorespaces}
\def\rdash{\unskip\nobreak\ --\ \ignorespaces}
% check these
\def\hex{{\tt"}}
\def\ascii{{\sc ascii}}
\def\ebcdic{{\sc ebcdic}}
\def\IniTeX{Ini\TeX}\def\LamsTeX{LAMS\TeX}\def\VirTeX{Vir\TeX}
\def\AmsTeX{Ams\TeX}
\def\TeXbook{the \TeX\ book}\def\web{{\sc web}}
% needs major thinking
\newenvironment{disp}{\begin{quotation}}{\end{quotation}}
\newenvironment{Disp}{\begin{quotation}}{\end{quotation}}
\newenvironment{tdisp}{\begin{quotation}}{\end{quotation}}
\newenvironment{example}{\begin{quotation}}{\end{quotation}}
\newenvironment{inventory}{\begin{description}}{\end{description}}
\newenvironment{glossinventory}{\begin{description}}{\end{description}}
\def\gram#1{\gr{#1}}%???
%
% index
%
\def\term#1\par{\index{#1}}
\def\howto#1\par{}
\def\cstoidx#1\par{\index{#1@\cs{#1}@}}
\def\csterm#1\par{\cstoidx #1\par\cs{#1}}
\def\csidx#1{\cstoidx #1\par\cs{#1}}

\begin{document}

\def\tmc{\tracingmacros=2 \tracingcommands\tracingmacros}

%%%%%%%%%%%%%%%%%%%
\makeatletter
\def\snugbox{\hbox\bgroup\setbox\z@\vbox\bgroup
    \leftskip\z@
    \bgroup\aftergroup\make@snug
    \let\next=}
\def\make@snug{\par\sn@gify\egroup \box\z@\egroup}
\def\sn@gify
   {\skip\z@=\lastskip \unskip
    \advance\skip\z@\lastskip \unskip
    \unpenalty
    \setbox\z@\lastbox
    \ifvoid\z@ \nointerlineskip \else {\sn@gify} \fi
    \hbox{\unhbox\z@}\nointerlineskip
    \vskip\skip\z@
    }

\def\figfont{\SansSerif \PointSize:8 \Style:roman }

\newdimen\fbh \fbh=60pt % dimension for easy scaling:
\newdimen\fbw \fbw=60pt % height and width of character box

\newdimen\dh \newdimen\dw % height and width of current character box
\newdimen\lh % height of previous character box
\newdimen\lw \lw=.4pt % line weight, instead of default .4pt

\def\hdotfill{\noindent
    \leaders\hbox{\vrule width 1pt height\lw 
                  \kern4pt 
                  \vrule width.5pt height\lw}\hfill\hbox{}
    \par}
\def\hlinefill{\noindent
    \leaders\hbox{\vrule width 5.5pt height\lw         }\hfill\hbox{}
    \par}
\def\stippel{$\qquad\qquad\qquad\qquad$}
\makeatother
%%%%%%%%%%%%%%%%%%%

\begin{comment}
\def\SansSerif{\Typeface:macHelvetica }
\def\SerifFont{\Typeface:macTimes }
\def\SansSerif{\Typeface:bsGillSans }
\def\SerifFont{\Typeface:bsBaskerville }
\end{comment}
\let\SansSerif\relax \def\italic{\it}
\let\SerifFont\relax \def\MainFont{\rm}
\let\PopIndentLevel\relax \let\PushIndentLevel\relax
\let\ToVerso\relax \let\ToRecto\relax

\begin{comment}
\def\stop@command@suffix{stop}
\let\PopListLevel\PopIndentLevel
\let\FlushRight\relax
\let\flushright\FlushRight
\let\SetListIndent\LevelIndent
\def\awp{\ifhmode\vadjust{\penalty-10000 }\else
    \penalty-10000 \fi}
\end{comment}
\let\awp\relax
\let\PopIndentLevel\relax \let\PopListLevel\relax

\showboxdepth=-1

\def\endofchapter{\vfill\noindent}

\title{\TeX\ by Topic, A \TeX nician's Reference}
\date{}
\author{Victor Eijkhout}
\maketitle
  \begin{minipage}[h]{1.0\linewidth}
  Copyright \copyright\  2007 Victor Eijkhout.\\
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
  Texts.  A copy of the license is included in the section entitled "GNU
  Free Documentation License".
\medskip
This document is based on the book \TeX\ by Topic,
copyright 1991--2007 Victor Eijkhout. This book was
printed in~1991 by Addison-Wesley UK, ISBN 0-201-56882-9, and reprinted
in~1993; a PDF version was first made freely available in~2001.
  \end{minipage}

\tableofcontents

\pagebreak
\addcontentsline{toc}{section}{License}
\paragraph*{\bf License}
GNU Free Documentation License

Version 1.2, November 2002

  Copyright \copyright\ 2000,2001,2002  Free Software Foundation, Inc.
  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  Everyone is permitted to copy and distribute verbatim copies
  of this license document, but changing it is not allowed.

0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document "free" in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense. It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does. But this License is not limited to software manuals; it
can be used for any textual work, regardless of subject matter or
whether it is published as a printed book. We recommend this License
principally for works whose purpose is instruction or reference.

1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License. Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein. The "Document", below, refers
to any such manual or work. Any member of the public is a licensee,
and is addressed as "you". You accept the license if you copy, modify
or distribute the work in a way requiring permission under copyright
law.

A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of
the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject. (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.) The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The "Invariant Sections" are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License. If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant. The Document may contain zero
Invariant Sections. If the Document does not identify any Invariant
Sections then there are none.

The "Cover Texts" are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License. A Front-Cover Text may be
at most 5 words, and a Back-Cover Text may be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters. A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple
HTML, PostScript or PDF designed for human modification. Examples of
transparent image formats include PNG, XCF and JPG. Opaque formats
include proprietary formats that can be read and edited only by
proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML, PostScript or PDF produced by some word
processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page. For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language. (Here XYZ stands for a
specific section name mentioned below, such as "Acknowledgements",
"Dedications", "Endorsements", or "History".) To "Preserve the Title"
of such a section when you modify the Document means that it remains a
section "Entitled XYZ" according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document. These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no
other conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover. Both covers must also clearly and legibly identify you
as the publisher of these copies. The front cover must present the
full title with all words of the title equally prominent and visible.
You may add other material on the covers in addition. Copying with
changes limited to the covers, as long as they preserve the title of
the Document and satisfy these conditions, can be treated as verbatim
copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to
give them a chance to provide you with an updated version of the
Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions (which
should, if there were any, be listed in the History section of the
Document). You may use the same title as a previous version if the
original publisher of that version gives permission.

B. List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.

C. State on the Title page the name of the publisher of the Modified
Version, as the publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifications adjacent
to the other copyright notices.

F. Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.

G. Preserve in that license notice the full lists of Invariant
Sections and required Cover Texts given in the Document's license
notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled "History", Preserve its Title, and
add to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page. If there
is no section Entitled "History" in the Document, create one stating
the title, year, authors, and publisher of the Document as given on
its Title Page, then add an item describing the Modified Version as
stated in the previous sentence.

J. Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise the
network locations given in the Document for previous versions it was
based on. These may be placed in the "History" section. You may omit a
network location for a work that was published at least four years
before the Document itself, or if the original publisher of the
version it refers to gives permission.

K. For any section Entitled "Acknowledgements" or "Dedications",
Preserve the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.

L. Preserve all the Invariant Sections of the Document, unaltered in
their text and in their titles. Section numbers or the equivalent are
not considered part of the section titles.

M. Delete any section Entitled "Endorsements". Such a section may not
be included in the Modified Version.

N. Do not retitle any existing section to be Entitled "Endorsements"
or to conflict in title with any Invariant Section.

O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant. To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version. Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History"
in the various original documents, forming one section Entitled
"History"; likewise combine any sections Entitled "Acknowledgements",
and any sections Entitled "Dedications". You must delete all sections
Entitled "Endorsements."

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other
documents released under this License, and replace the individual
copies of this License in the various documents with a single copy
that is included in the collection, provided that you follow the rules
of this License for verbatim copying of each of the documents in all
other respects.

You may extract a single document from such a collection, and
distribute it individually under this License, provided you insert a
copy of this License into the extracted document, and follow this
License in all other respects regarding verbatim copying of that
document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an "aggregate" if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers. In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled "Acknowledgements",
"Dedications", or "History", the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document
except as expressly provided for under this License. Any other attempt
to copy, modify, sublicense or distribute the Document is void, and
will automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the
GNU Free Documentation License from time to time. Such new versions
will be similar in spirit to the present version, but may differ in
detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version
number. If the Document specifies that a particular numbered version
of this License "or any later version" applies to it, you have the
option of following the terms and conditions either of that specified
version or of any later version that has been published (not as a
draft) by the Free Software Foundation. If the Document does not
specify a version number of this License, you may choose any version
ever published (not as a draft) by the Free Software Foundation.

\pagebreak
\paragraph*{\bf Preface}
To the casual observer, \TeX\
is not a state-of-the-art typesetting system.
No flashy multilevel menus and interactive manipulation
of text and graphics dazzle the onlooker.
On a less superficial level, however, \TeX\ is a very sophisticated
program, first of all because of the ingeniousness of its
built-in algorithms for such things as paragraph breaking
and make-up of mathematical formulas, and
second because of its almost complete programmability.
The combination of these factors makes it possible for \TeX\
to realize almost every imaginable layout in a highly automated
fashion.

Unfortunately, it also means that \TeX\ has an
unusually large number of commands and parameters,
and that programming \TeX\ can be far from easy.
Anyone wanting to program in \TeX, and maybe
even the ordinary user, would seem to need two books:
a~tutorial that gives a first glimpse of the many
nuts and bolts of \TeX, and after that
a~systematic, complete reference manual.
This book tries to fulfil the latter function.
A~\TeX er who has already made a start
(using any of a number of introductory books
on the market)
should be able to use this book indefinitely thereafter.

In this volume the universe of \TeX\ is presented as
about forty different subjects, each in a separate
chapter.
Each chapter starts out with a list of control sequences
relevant to the topic of that chapter
and proceeds to treat the 
theory of the topic. 
Most chapters conclude with remarks and examples.

Globally, the chapters are ordered as follows. 
The chapters on basic mechanisms are first,
the chapters on text treatment and mathematics are next,
and finally there are some
chapters on output and aspects of \TeX's connections to
the outside world.
%
The book also contains a glossary of \TeX\
commands, tables,
and indexes by example, by control sequence, and by subject.
For most concepts the subject index refers to
only one page, where most of the information
on that topic can be found, as well as references
to the locations of related information.

This book does not treat any specific \TeX\ macro package.
Any parts of the plain format that are treated are those
parts that belong to the `core' of plain \TeX: they
are also present in, for instance, \LaTeX.
Therefore, most remarks about the plain format
are true for \LaTeX, as well as most other formats.
Putting it differently,
if the text refers to the plain format, this should be taken
as a contrast to pure \IniTeX, not to \LaTeX.
By way of illustration, macros from plain \TeX\ that do not
belong to the core are occasionally explained.

\medskip\noindent
{\bf Acknowledgment}\nl
I am indebted to Barbara Beeton, Karl Berry, and Nico Poppelier,
who read previous versions of this book. Their comments
helped to improve the presentation.
Also I~would like to thank the participants of
the discussion lists \TeX hax, \TeX-nl, and {\tt comp.text.tex}.
Their questions and answers gave me much food for thought.
Finally, any acknowledgement in a book about \TeX\ ought to
include Donald Knuth for inventing \TeX\ in the
first place. This book is no exception.

\begin{flushright}
 Victor Eijkhout\\
 Urbana, Illinois, August 1991\\
 Knoxville, Tennessee, May 2001
\end{flushright}
\pagebreak

\chapter{The Structure of the \TeX\ Processor}

This book treats the various aspects of \TeX\ in chapters
that are concerned with relatively small, well-delineated,
topics. In this chapter, therefore, 
a global picture of the way \TeX\ operates will be given.
Of necessity, many details will be omitted here, but all of
these are treated in later chapters. On the other hand,
the few examples given in this chapter will be repeated
in the appropriate places later on; they are included here
to make this chapter self-contained.

%\point Four \TeX\ processors
\section{Four \TeX\protect\ processors}

The way \TeX\ processes its input can be viewed as
happening on four levels. One might  say that
the \TeX\ processor is split into four separate units,
each one accepting the output of the previous stage, and
delivering the input for the next stage. The input of
the first stage is then the \n{.tex} input file; the output
of the last stage is a \n{.dvi} file.

For many purposes it is most convenient, and most insightful,
to consider these four levels of processing as happening
after one another, each one accepting the {\em completed\/}
output of the previous level. In reality this is not true:
all levels are simultaneously
active, and there is interaction between them.

The four levels, which correspond roughly
to the `eyes', `mouth', `stomach', and `bowels'
in Knuth's original terminology, are as follows.
\begin{enumerate}\item
The input processor. This is the piece of \TeX\ that
accepts input lines from the file system of whatever computer
\TeX\ runs on, and turns them into tokens.
Tokens are the internal objects of \TeX:
there are character tokens that constitute the typeset
text, and control sequence tokens that are commands 
to be processed by the next two levels.
\item The expansion processor. 
Some but not all of the tokens generated in the first level
\ldash macros, conditionals, and a number
of primitive \TeX\ commands \rdash  are subject to expansion.
Expansion is the process that replaces some (sequences of)
tokens by other (or no) tokens.
\item The execution processor. 
Control sequences that are not expandable are executable,
and this execution takes place on the third level of the
\TeX\ processor.

One part of the activity here concerns changes to
\TeX's internal state: assignments (including
macro definitions) are typical activities in this
\awp
category. The other major thing happening on this level
is the construction of horizontal, vertical, and
mathematical lists.
\item The visual processor. 
On this final level
the visual part of \TeX\ processing is performed. Here
horizontal lists are broken into paragraphs, 
vertical lists are broken into pages,
and formulas are built out of math lists. 
The output to the \n{.dvi} file also takes place on this level.
The algorithms working here are not accessible to the user,
but they can be influenced by a number of parameters.
\end{enumerate}
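
As a rough illustration of how the four levels cooperate, consider
the following fragment. The macro name \cs{greet} is hypothetical
and chosen only for this sketch; the comments indicate which level is
mainly responsible for each step, under the simplifying assumption
that the levels act strictly one after another.
\begin{verbatim}
\def\greet{Hi}  % level 1 turns this line into tokens;
                % level 3 executes the unexpandable \def (an assignment)
\greet\ there.  % level 2 replaces the macro \greet by the tokens H, i;
                % level 3 appends the characters to a horizontal list
\tolerance=500  % level 3: assignment to a parameter that
                % influences the line breaking of level 4
\par            % executed on level 3; level 4 breaks the list into lines
\end{verbatim}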

%\point The input processor
\section{The input processor}

The input processor of \TeX\ is that part of \TeX\ that
translates whatever characters it gets from the input file
into tokens. The output of this processor is a stream
of tokens: a token list. Most tokens fall into one of two categories:
character tokens and control sequence tokens. 
The remaining category is that of the parameter tokens;
these will not be treated in this chapter.

%\spoint Character input
\subsection{Character input}

For simple input text, characters are made into
character tokens. However, \TeX\ can ignore input characters:
a row of spaces in the input is usually equivalent to just one
space. Also, \TeX\ itself can insert tokens that do not correspond
to any character in the input, for instance the space token
at the end of the line, or the \cs{par} token after an empty line.
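
The following input lines give a small sketch of these rules;
the words themselves are arbitrary.
\begin{verbatim}
% A run of spaces counts as one space token, and the end
% of a line also contributes a space token:
one     two
three
% The empty line below makes TeX insert a \par token:

four
\end{verbatim}
Typeset, this gives the words `one two three' in one paragraph,
followed by `four' in a new paragraph.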

Not all character tokens signify characters to be typeset.
\altt
Characters fall into sixteen categories \ldash each one
specifying a certain function that a character can have \rdash 
of which only two contain the characters that will be
typeset. The other categories contain such characters 
as~\n{\char`\{}, \n{\char`\}}, 
\n\&, and~\n\#. A~character token can be considered
as a pair of numbers: the character code \ldash typically the \ascii\
code \rdash  and the category code.
It is possible to change
the category code that is associated with a particular
character code.
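
For instance, in plain \TeX\ and \LaTeX\ the character \verb>@>
normally has category code~12 (other). The following sketch
temporarily gives it category code~11 (letter), so that it can
appear in the name of a control sequence; the macro name
\verb>\my@helper> is hypothetical.
\begin{verbatim}
\catcode`\@=11               % category code of @ becomes 11: letter
\def\my@helper{internal text}% @ may now be part of a control word
\catcode`\@=12               % restore category code 12: other
\end{verbatim}
The \LaTeX\ commands \cs{makeatletter} and \cs{makeatother}
perform the same two category code changes.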

When the escape character (by default~\cs{}$\,$) appears in the input,
\TeX's behaviour in forming tokens is more complicated. 
Basically,
\TeX\ builds a control sequence by taking a number of characters
from the input and lumping them together into a single token.

The way in which \TeX's input processor 
reacts to category codes can be described
as a machine that switches between three internal states:
$N$,~new line; $M$,~middle of line; $S$,~skipping spaces.
These states and the transitions between them are treated
in Chapter~\ref{mouth}.

%\spoint Two-level input processing
\subsection{Two-level input processing}

\TeX's input processor is in fact itself a two-level processor.
Because of limitations of the terminal, the editor, or the operating
\awp
system, the user may not be able to input certain desired characters.
Therefore, \TeX\ provides a mechanism, based on two consecutive
superscript characters, for accessing all of the available character
positions. This may be considered
a separate stage of \TeX\ processing, taking place
prior to the three-state machine mentioned above.

For instance, the sequence \verb>^^+> is replaced by~\n{k} because
the \ascii{} codes of \n k and \n + differ by~64. 
Since this replacement takes place before tokens are formed,
writing \verb>\vs^^+ip 5cm> has the same effect as
\verb>\vskip 5cm>. Examples more useful than this exist.

Note that this first stage is a transformation from
characters to characters, without considering category
codes. These come into play only in the second phase
of input processing where characters are converted
to character tokens by coupling the category code
to the character code.

%\point The expansion processor
\section{The expansion processor}

\TeX's  expansion processor accepts a stream of tokens
and, if possible, 
expands the tokens in this stream one by one
until only unexpandable tokens remain.
Macro expansion is the clearest example of this:
if a control sequence is a macro name, it is replaced
(together possibly with parameter tokens) by 
the definition text of the macro.

Input for the expansion processor is provided mainly
by the input processor. The stream of tokens coming
from the first stage of \TeX\ processing is subject
to the expansion process, and the result is a stream
of unexpandable tokens which is fed to the execution processor.

However, the expansion processor comes into play 
also when (among others) an \cs{edef} or \cs{write} is processed.
The token list given to each of these commands is
expanded very much as if it had been
on the top level, instead of being the argument to a command.

%\spoint The process of expansion
\subsection{The process of expansion}

Expanding a token consists of the following steps:
\begin{enumerate}
\item See whether the token is expandable. 
\item If the token is unexpandable, pass it to the token
      list currently being built, and take on the next token. 

\item If the token is expandable, replace it by its expansion.
      For macros without parameters, and a few primitive commands
      such as \cs{jobname}, this is indeed a simple replacement.
      Usually, however, \TeX\ needs to absorb some argument tokens from
      the stream in order to be able to form the replacement
      of the current token.
      For instance, if the token was a macro with parameters,
      sufficiently many tokens need to be absorbed to form
      the arguments corresponding to  these parameters.

\item Go on expanding, starting with the first token of the
      expansion. 
\end{enumerate}
%
Deciding whether a token is expandable is
a simple decision. Macros and active characters, 
conditionals, and a number of primitive \TeX\ commands
\awp
(see the list on page~\pageref{expand:lijst})
are expandable, other tokens are not.
Thus the expansion processor replaces macros by their expansion,
it evaluates conditionals and eliminates any irrelevant parts of 
these, but tokens such as \cs{vskip} and character tokens,
including characters such as dollars and braces, are passed untouched.
%\endinput
%\spoint Special cases: \cs{expandafter}, \cs{noexpand}, and \cs{the}
\subsection{Special cases: \cs{expandafter}, \cs{noexpand}, and \cs{the}}

As stated above,
after a token has been expanded, \TeX\ will start expanding
the resulting tokens. At first sight the \cs{expandafter}
command would seem to be an exception to this rule, because
it expands only one step. What actually happens is that
the sequence \begin{disp}\cs{expandafter}\gr{token$_1$}\gr{token$_2$}\end{disp}
is replaced by 
\begin{disp}\gr{token$_1$}\gr{\italic expansion of token$_2$}\end{disp}
and this replacement is in fact reexamined by the expansion
processor.

Real exceptions do exist, however. If the 
current token is the \cs{noexpand} command, the next
token is considered for the moment to be unexpandable:
it is handled as if it were \cs{relax}, and it is
passed to the token list being built.

For example,
in the macro definition
\begin{verbatim}
\edef\a{\noexpand\b}
\end{verbatim}
the replacement text \verb>\noexpand\b> is expanded at definition 
time. The expansion of \cs{noexpand} is the next token, with
a temporary meaning of \cs{relax}. Thus, when the expansion
processor tackles the next token, the~\cs{b}, it will consider
that to be unexpandable, and just pass it to the token list
being built, which is the replacement text of the macro.
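
The difference can be observed directly. The following is only a sketch,
assuming plain \TeX; the names \cs{x} and~\cs{y} are arbitrary and
do not occur elsewhere in this text.
\begin{verbatim}
\def\b{old}
\edef\x{\b}           % replacement text of \x: old
\edef\y{\noexpand\b}  % replacement text of \y: \b
\def\b{new}
\message{\x, \y}      % writes: old, new
\end{verbatim}
Since \cs{y} contains the unexpanded token~\cs{b}, it follows
later redefinitions of~\cs{b}, whereas \cs{x} does not.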

Another exception is that the tokens
resulting from \cs{the}\gr{token variable}
are not expanded further if this statement occurs
inside an \cs{edef} macro definition.
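
As an illustration (a sketch only, with the arbitrary macro names
\cs{b} and~\cs{x}), compare the two occurrences of \cs{b} in
\begin{verbatim}
\toks0={\b}
\def\b{text}
\edef\x{\the\toks0 \b}
\show\x   % macro:->\b text
\end{verbatim}
The \cs{b} delivered by \verb>\the\toks0> is left alone, while
the \cs{b} that appears directly in the definition is expanded.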

%\spoint Braces in the expansion processor
\subsection{Braces in the expansion processor}

Above, it was said that braces are passed as unexpandable
character tokens. In general this is true. For instance,
the \cs{romannumeral} command is handled by the expansion
processor; when confronted with 
\begin{verbatim}
\romannumeral1\number\count2 3{4 ...
\end{verbatim} 
\TeX\ will expand until the brace is encountered:
if \cs{count2} has the value of zero, the result will be
the roman numeral representation of~\n{103}.

As another example, \begin{verbatim}
\iftrue {\else }\fi
\end{verbatim}
is handled by the expansion processor 
in a way completely analogous to
\begin{disp}\cs{iftrue} {\italic a}\cs{else} {\italic b}\cs{fi}\end{disp}
\awp
The result is a character token, independent of its category.

However, in the context of macro expansion 
the expansion  processor will 
recognize braces. 
First of all, a balanced pair of braces marks off a group of tokens
to be passed as one argument.
If a macro has an argument \begin{verbatim}
\def\macro#1{ ... }
\end{verbatim}
one can call it with a single token, as in
\begin{verbatim}
\macro 1 \macro \$
\end{verbatim}
or with a group of tokens, surrounded by braces
\begin{verbatim}
\macro {abc} \macro {d{ef}g}
\end{verbatim}


Secondly, when the arguments for a macro with
parameters are read, no expressions with unbalanced braces
are accepted. In 
\begin{verbatim}
\def\a#1\stop{ ... }
\end{verbatim}
the argument consists of all
tokens up to the first occurrence of \cs{stop}
that is not in braces: in
\begin{verbatim}
\a bc{d\stop}e\stop
\end{verbatim}
the argument of~\cs{a} is \verb>bc{d\stop}e>.
Only balanced expressions
are accepted here.
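
A small test makes the argument visible. This is a sketch only:
\cs{saved} is a hypothetical helper macro, introduced here merely
to store the argument so that \cs{show} can display it.
\begin{verbatim}
\def\a#1\stop{\def\saved{#1}}
\a bc{d\stop}e\stop
\show\saved   % macro:->bc{d\stop }e
\end{verbatim}
The \cs{stop} inside the braces is part of the argument;
only the second \cs{stop} delimits it.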

%\point The execution processor
\section{The execution processor}

The execution processor builds lists: horizontal, vertical,
and math lists. Corresponding to these lists, it works
in horizontal, vertical, or math mode. Of these three modes
`internal' and `external' variants exist.
In addition to building lists, this part of the \TeX\ processor
also performs mode-independent processing, such as
assignments.

Coming out of the expansion processor is a stream of
unexpandable tokens to be processed by
the execution processor. 
\relax From the point of view of the execution processor, this
stream contains two types of tokens:
\begin{itemize}
\item Tokens signalling an assignment (this includes
      macro definitions), and
      other tokens signalling actions
      that are independent of the mode, such
      as \cs{show} and \cs{aftergroup}.
\item Tokens that build lists:
      characters, boxes, and glue. The way they are handled
      depends on the current mode.
\end{itemize}

Some objects can be used in any mode; for instance boxes
can appear in horizontal, vertical, and math lists.
The effect of such an object will of course still depend on the mode.
Other objects are specific to one mode.
For instance, characters (to be more precise:
character tokens of categories 11 and~12), 
are intimately connected to horizontal mode:
if the execution processor 
is in vertical mode when it encounters a character, it will
switch to horizontal mode.

Not all character tokens signal characters to be typeset:
the execution processor can also encounter math shift
\awp
characters (by default~\n{\char`\$}) and beginning/end of group
characters (by default \n{\char`\{} and~\n{\char`\}}).
Math shift characters let \TeX\ enter or exit
math mode, and braces let it enter or exit a~new level of
grouping.

One control sequence handled by the execution processor 
deserves special mention: \cs{relax}.
This control sequence is not expandable, but executing it
does nothing. Compare the effect of \cs{relax} in
\begin{verbatim}
\count0=1\relax 2
\end{verbatim}
with that of \cs{empty}
defined by \begin{verbatim}
\def\empty{}
\end{verbatim}
in 
\begin{verbatim}
\count0=1\empty 2
\end{verbatim}
In the first case the expansion
process that is forming the number stops at \cs{relax} and
the number {\tt 1} is assigned; in the second case 
\cs{empty} expands to nothing, so {\tt 12} is assigned.

%\point The visual processor
\section{The visual processor}

\TeX's visual processor encompasses those algorithms that
are outside direct user control: paragraph breaking,
alignment, page breaking, math typesetting, and \n{dvi} file
generation. Various parameters control the operation
of these parts of \TeX.

Some of these algorithms return their results in a form that
can be handled by the execution processor. For instance,
a paragraph that has been broken into lines is added to
the main vertical list as a sequence of horizontal boxes
with intermediate glue and penalties. Also, the page breaking
algorithm stores its result in \cs{box255}, so output
routines can dissect it. On the other hand, a math formula
cannot be broken into pieces, and, naturally, 
shipping a box to the \n{dvi} file is irreversible.

%\point Examples
\section{Examples}

%\spoint Skipped spaces
\subsection{Skipped spaces}

Skipped spaces provide an illustration of the view that
\TeX's levels of processing accept the completed input
of the previous level. Consider the commands
\begin{verbatim}
\def\a{\penalty200}
\a 0
\end{verbatim} 
This is {\italic not\/} equivalent to
\begin{verbatim}
\penalty200 0
\end{verbatim} 
\awp
which would place a penalty of \n{200}, and
typeset the digit~\n0. Instead it expands to
\begin{verbatim}
\penalty2000
\end{verbatim}
because the space after \cs{a} is skipped in the
input processor. Later stages of processing then receive
the sequence \begin{verbatim}
\a0
\end{verbatim}

%\spoint Internal quantities and their representations
\subsection{Internal quantities and their representations}

\TeX\ uses various sorts of internal quantities,
such as integers and dimensions. These internal
quantities have an external representation,
which is a string of characters, such as 
\n{4711} or~\n{91.44cm}.

Conversions between the internal value and the external
representation take place on two different levels,
depending on what direction the conversion goes.
A~string of characters is converted to an internal
value in assignments such as
\begin{verbatim}
\pageno=12 \baselineskip=13pt
\end{verbatim}
or statements such as
\begin{verbatim}
\vskip 5.71pt
\end{verbatim}
and all of these statements are handled by the execution
processor.

On the other hand, the conversion of the internal
values into a representation as a string of
characters is handled by the expansion processor.
For instance, \begin{verbatim}
\number\pageno \romannumeral\year
\the\baselineskip
\end{verbatim}
are all processed by expansion.

As a final example, suppose \verb>\count2=45>, and
consider the statement
\begin{verbatim}
\count0=1\number\count2 3
\end{verbatim}
The expansion processor tackles \verb>\number\count2>
to give the characters \n{45}, and the space after
the \n 2 does not end the number being assigned:
it only serves as a delimiter
of the number of the \cs{count} register.
In the next stage of processing, the execution processor
will then see the statement
\begin{verbatim}
\count0=1453
\end{verbatim}
and execute this.
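
This can be checked with a few lines of plain \TeX. The following
is a sketch; the group is added here only to keep the
assignments local, and \cs{message} is used to report the result.
\begin{verbatim}
\begingroup
\count2=45
\count0=1\number\count2 3
\message{\the\count0}% writes: 1453
\endgroup
\end{verbatim}
The end of the third line supplies the space token that terminates
the number \n{1453}.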

%\endinput

%%%% end of input file [bigpic]

%\InputFile:mouth
%%%% this is input file [mouth]
%\tracingmacros=2 \tracingcommands\tracingmacros
%\subject[mouth] Category Codes \nl and Internal States
\endofchapter
\chapter{Category Codes and Internal States}\label{mouth}

When characters are read, 
\TeX\ assigns them
category codes. The reading mechanism has three internal
states, and transitions between these states are effected
by category codes of characters in the input.
This chapter describes how \TeX\ reads its input and
how the category codes of characters influence the
reading behaviour. Spaces and line ends are discussed.

\begin{inventory}
\item [\cs{endlinechar}]  
      The character code of the end-of-line character 
      appended to input lines.
      \IniTeX\ default:~13.
\item [\cs{par}]  
      Command to close off a paragraph and go into vertical mode.
      Is generated by empty lines.

\item [\cs{ignorespaces}]   
      Command that reads and expands until something is
      encountered that is not a \gr{space token}.

\item [\cs{catcode}] 
      Query or set category codes.

\item [\cs{ifcat}]  
      Test whether two characters have the same category code.

\item [\cs{\char32}]
      Control space.
      Insert the same amount of space that a space token would
      when \cs{spacefactor}${}=1000$.

\item [\cs{obeylines}]
      Macro in plain \TeX\ to make line ends significant.

\item [\cs{obeyspaces}]
      Macro in plain \TeX\ to make (most) spaces significant.
\end{inventory}

%\point Introduction
\section{Introduction}

\TeX's input processor scans input lines from a file or terminal, and
makes tokens out of the characters.
The input processor can be viewed as
a simple finite state automaton with three internal states; 
depending on the state its scanning behaviour may differ.
This automaton will be treated here both from the point of view of the
internal states and of the category codes governing the
transitions.

%\point Initial processing
\section{Initial processing}

Input from a file (or from the user terminal, but this
will not be mentioned specifically
most of the time) is handled one line at a time.
Here follows a discussion of what exactly constitutes an input line
for \TeX.

Computer systems differ with respect to 
\term line! input\par\term line! end\par\term machine independence\par
the exact definition of an input
\mdqon
line. The carriage return/""line feed
\mdqoff
\awp
\message{slash-dash}%
sequence terminating a line is most common,
but some systems use just a line feed, and
some systems with fixed record length (block) storage do not have
a line terminator at all. Therefore \TeX\ has its
own way of terminating an input line.

\begin{enumerate}
\item An input line is read from an input file  (minus the
line terminator, if any).
\item Trailing spaces are removed (this is for the systems
with block storage, and it prevents confusion because these
spaces are hard to see in an editor).
\item The \csidx{endlinechar}, by default \gram{return}
(code~13), is appended.
If the value of \cs{endlinechar} is negative
\label{append:elc}%
or more than~255 (this was 127 in versions of \TeX\ older
than version~3; see page~\pageref{2vs3} for more differences),
no character is appended. 
The effect then is the same as
if the line were to end with a comment character.
\end{enumerate}
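
The effect of a negative \cs{endlinechar} can be sketched as follows,
assuming plain \TeX\ and an otherwise unchanged category code setup:
\begin{verbatim}
\begingroup
\endlinechar=-1
abc
def
\endgroup
\end{verbatim}
The line containing the assignment has already received its
end-of-line character, which serves to terminate the number.
No character is appended to the two lines after it, so
\n{abc} and \n{def} are typeset as the single word \n{abcdef}.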


Computers may also differ in the character encoding
(the most common schemes are \ascii{} and \ebcdic{}), so \TeX\
converts the characters that are read from the file to its
own character codes. These codes are then used exclusively,
so that \TeX\ will perform the same on any system.
For more on this, see Chapter~\ref{char}.

%\point Category codes
\section{Category codes}

Each of the 256 character codes (0--255) has an
\term category codes\par
associated category code, though not necessarily always the same one.
There are 16 categories, numbered 0--15. 
When scanning the input, \TeX\
thus forms character-code--category-code pairs.
The input processor sees only these pairs; from them are formed
character tokens, control sequence tokens, and parameter tokens.
These tokens are then passed to \TeX's expansion and execution
processes.

A~character token is a character-code--category-code
pair that is passed unchanged.
A~control sequence token consists of one or more characters
preceded by an escape character; see below.
Parameter tokens are also explained below.

This is the list of the categories, together with a brief
description. More elaborate explanations follow in this and
later chapters.
\begin{enumerate} \message{set counter}%\SetCounter:item=-1
\setcounter{enumi}{-1}
\item\label{ini:esc} Escape character; this signals the start of a control
      sequence. \IniTeX\ makes the backslash \verb-\- (code~92)
      an escape character.
\item Beginning of group; such a character causes \TeX\ to enter a new
      level of grouping. The plain format makes the open brace \verb-{-
\mdqon
      a beginning"-of-group character.
\mdqoff
\item End of group; \TeX\ closes the current level of grouping.
      Plain \TeX\ has  the closing brace \verb-}- as end-of-group
      character.
\item Math shift; this is the opening and closing delimiter for
      math formulas. Plain \TeX\ uses the dollar sign~\verb-$-
      for this.
\item Alignment tab; the column (row) separator in tables
      made with \cs{halign} (\cs{valign}). In plain
      \TeX\ this is the ampersand~\verb-&-.
\item\label{ini:eol} End of line; a character that \TeX\ considers
      to signal the
      end of an input line.
      \IniTeX\ assigns this code to the \gram{return}, that is, code~13.
      Not coincidentally, 13~is also the value that \IniTeX\
      assigns to the \cs{endlinechar} parameter; see above.
\awp
\item Parameter character; this indicates parameters for macros.
      In plain \TeX\ this is the hash sign~\verb-#-.
\item Superscript; this precedes superscript expressions 
      in math mode. It is also used to denote character
      codes that cannot
      be entered in an input file; see below. 
      In plain \TeX\ this is the circumflex~\verb-^-.
\item Subscript; this precedes subscript expressions in math mode.
      In plain \TeX\ the underscore~\verb-_- is used for this.
\item Ignored; characters of this category are removed
      from the input, and have therefore no influence on
      further \TeX\ processing. In plain \TeX\ this is
      the \gr{null} character, that is, code~0.
\item\label{ini:sp} Space; space characters receive special treatment.
      \IniTeX\ assigns this category to the \ascii{} \gr{space}
      character, code~32.
\item\label{ini:let} Letter; in \IniTeX\ only the characters \n{a..z}, \n{A..Z}
      are in this category. Often, macro packages make
      some `secret' character (for instance~\n@) into a letter.
\item\label{ini:other} Other; \IniTeX\ puts everything that is 
      not in the other categories into this category. Thus
      it includes, for instance, digits and punctuation.
\item Active; active characters function as a \TeX\ command,
      without being preceded by an escape character.
      In plain \TeX\ this is only the tie character~\verb-~-,
      which is defined to produce
      an unbreakable space; see page~\pageref{tie}.
\item\label{ini:comm} Comment character; from a comment character onwards,
      \TeX\ considers the rest of an input line to be
      comment and ignores it. In \IniTeX\ the  per cent sign \verb-%-
      is made a comment character.
\item\label{ini:invalid} Invalid character; this category is for characters that
      should not appear in the input. \IniTeX\ assigns the
      \ascii\ \gr{delete} character, code~127, to this category.
\end{enumerate}

The user can change the mapping 
of character codes to category codes
with the \csidx{catcode} command (see Chapter~\ref{gramm}
for the explanation of concepts such as~\gr{equals}):
\begin{disp}\cs{catcode}\gram{number}\gr{equals}\gram{number}.\end{disp}
In such a statement, the first number is often given in the form
\begin{disp}\verb>`>\gr{character}\quad or\quad \verb>`\>\gr{character}\end{disp}
both of which denote the character code of the character
(see pages \pageref{char:code} and~\pageref{int:denotation}).

The plain format defines
\csterm active\par
\begin{verbatim}
\chardef\active=13
\end{verbatim} 
so that one can write statements such as
\begin{verbatim}
\catcode`\{=\active
\end{verbatim}
The \cs{chardef} command is  treated
on pages \pageref{chardef} and~\pageref{num:chardef}.

The \LaTeX\ format has the control sequences
\begin{verbatim}
\def\makeatletter{\catcode`@=11 }
\def\makeatother{\catcode`@=12 }
\end{verbatim}
in order to switch on and off the `secret' character~\n@
(see below).
\awp

The \cs{catcode} command can also be used to query category
codes: in \begin{verbatim}
\count255=\catcode`\{
\end{verbatim}
it yields a number, which can be assigned.

Category codes can be tested by
\begin{disp}\cs{ifcat}\gr{token$_1$}\gr{token$_2$}\end{disp}
\TeX\ expands whatever is after \cs{ifcat} until two 
unexpandable tokens are found; these are then compared
with respect to their category codes. Control sequence
tokens are considered to have category code~16,
which makes them all equal to each other, and unequal to
all character tokens.
Conditionals are treated further in Chapter~\ref{if}.
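
A few tests, given as a sketch and assuming the category codes
of plain \TeX, illustrate this:
\begin{verbatim}
\message{\ifcat a1Y\else N\fi}          % letter and other: writes N
\message{\ifcat abY\else N\fi}          % two letters: writes Y
\message{\ifcat 1.Y\else N\fi}          % two other characters: writes Y
\message{\ifcat\relax\hbox Y\else N\fi} % two control sequences: writes Y
\end{verbatim}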

%\point From characters to tokens
\section{From characters to tokens}

The input processor
of \TeX\ scans input lines from a file or from the
user terminal, and converts the characters in the input
to tokens. There are three types of tokens.
\begin{itemize}\item Character tokens: any character that is
	passed on its own to \TeX's
further levels of processing with an appropriate
category code attached.
\item Control sequence tokens, of which there are two kinds:
	an escape character 
\ldash that is,\message{ldash nobreak?}
a character of category~0 \rdash  followed
by a string of `letters' is
lumped together into a {\em control word}, which is a single token.
An escape character followed by a single character that is not of
category~11, letter, is made into a 
{\em control symbol}\term control! symbol\par.
If the distinction between control word and control symbol is
irrelevant, both are called 
{\em control sequences}\term control! sequence\par.

The control symbol that results from an escape character followed
\csterm \char32\par
by a space character is called 
{\em control space}\term control! space\par.

\item Parameter tokens: a parameter character 
	\ldash that is, a character of category~6, by default~\verb=#= \rdash 
followed by a digit \n{1..9} is replaced by a parameter token.
Parameter tokens are allowed only in the context of
macros (see Chapter~\ref{macro}).

A macro parameter character followed by another macro parameter
character (not necessarily with the same character code)
is replaced by a single character token.
This token has category~6 (macro parameter), and the character
code of the second parameter character.
The most common instance of this is
replacing \n{\#\#} by~\n{\#$_6$}, where the subscript
denotes the category code.

\end{itemize}
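
The doubling of parameter characters is typically needed when one macro
defines another. In the following sketch the names \cs{makeinner}
and \cs{inner} are hypothetical.
\begin{verbatim}
\def\makeinner#1{\def\inner##1{#1 and ##1}}
\makeinner{x}   % now \inner is defined as: #1 -> x and #1
\inner{y}       % expands to: x and y
\end{verbatim}
In the replacement text of \cs{makeinner} the pair \verb>##> stands for
a single parameter character, which becomes part of a parameter token
only when the inner definition is carried out.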

%\point[input:states] The input processor as a finite state automaton
\section{The input processor as a finite state automaton}
\label{input:states}

\TeX's input processor can be considered to be a finite state 
automaton with three internal states,
that is, at any moment in time it is in one of three states,
\term state! internal\par
and after transition to another state there is no memory of the
\awp
previous states. 

%\spoint State {\italic N}: new line
\subsection{State {\italic N}: new line}

State {\italic N} is entered at the beginning of each new input line,
and that is the only time \TeX\ is in this state.
In state~{\italic N} all space tokens (that is, characters of category~10)
are ignored; an end-of-line character is converted
into a \cs{par} token.
All other tokens bring \TeX\ into state~{\italic M}.

%\spoint State {\italic S}: skipping spaces
\subsection{State {\italic S}: skipping spaces}

State {\italic S} is entered, whatever the current state, after a control word or
control space (but after no other control symbol),
or, when in state~{\italic M}, after a space.
In this state all subsequent spaces or end-of-line characters
in this input line are discarded.

%\spoint State {\italic M}: middle of line
\subsection{State {\italic M}: middle of line}

By far the most common state is~{\italic M}, `middle of line'.
It is entered after characters of categories
1--4, 6--8, and 11--13, and after control symbols
other than control space.
An end-of-line character encountered in this state
results in a space token.

\input figflow \message{left align flow diagram}
\vskip12pt plus 1pt minus 4pt\relax %before spoint skip
\begin{tdisp}%\PopIndentLevel
\leavevmode\relax
%\figmouth
\message{fig mouth missing}
\end{tdisp}
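
The three states can be seen at work in a few input lines.
This is a sketch assuming the category codes of plain \TeX;
the comments name the state that matters in each line.
\begin{verbatim}
   ab   c     % N: leading spaces ignored; M: the run after b gives one space
\TeX   nical  % control word, then state S: the spaces are skipped
\$   5        % control symbol, state M: the spaces give one space token
\    x        % control space, then state S: the spaces before x are skipped
\end{verbatim}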


%\point[hathat] Accessing the full character set
\section{Accessing the full character set}
\label{hathat}

Strictly speaking, \TeX's input processor
is not a finite state automaton.
This is because during the scanning of the input line
all trios consisting of two {\sl equal\/} superscript characters 
\term \char94\char94\ replacement\par
(category code~7) and a subsequent character
(with character code~$<128$)
are replaced by a single character with a character
code in the range 0--127,
differing by 64 from that of the original character.

This mechanism can be used, for instance, to access positions in a font
corresponding to character codes that cannot
be input, for instance because they are \ascii{} control characters.
The most obvious examples are the \ascii{} \gr{return}
and \gr{delete} characters; the corresponding 
positions 13 and 127 in a font are
accessible as \verb>^^M> and~\verb>^^?>.
However, since the category of \verb>^^?> is 15, invalid,
that has to be changed before character 127 can be accessed.
\awp

In \TeX3 this mechanism has been 
modified and extended to access 256 characters:
any quadruplet \verb-^^xy- where both \n x and \n y are lowercase
hexadecimal digits \n0--\n9, \n a--\n f, 
is replaced by a character in the
range 0--255, namely the character the number of which is
represented hexadecimally as~\n{xy}.
This imposes a slight restriction on the applicability
of the earlier mechanism: if, for instance, \verb>^^a>
is typed to produce character~33, then a following
\n0--\n9, \n{a}--\n{f} will be misunderstood.
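
Both forms of the notation can be tried out as follows; this is a
sketch that assumes \TeX3 and the category codes of plain \TeX.
\begin{verbatim}
\message{^^+ ^^6b ^^4b}  % writes: k k K
\end{verbatim}
Here \verb>^^+> yields character $43+64=107$, and \verb>^^6b>
and \verb>^^4b> yield the characters with hexadecimal codes
\n{6b} and~\n{4b}, that is, 107 and~75.
By the same rule the input \verb>^^a0> is read as the single
character~160, not as character~33 followed by the digit~\n0.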

While this process makes \TeX's input processor
somewhat more powerful
than a true finite state automaton,
it does not interfere with the rest of
the scanning. Therefore it is conceptually simpler to pretend that
such a replacement of triplets or quadruplets
of characters, starting with~\verb>^^>, is performed in advance. 
In actual practice this is not possible,
because an
input line may assign category code~7 to some 
character other than the circumflex, thereby 
influencing its further processing.


%\point Transitions between internal states
\section{Transitions between internal states}

Let us now discuss the effects on the internal state
of \TeX's input processor when
certain category codes are encountered in the input. 

%\spoint 0: escape character
\subsection{0: escape character}

When an escape character is encountered\term character !escape\par,
\TeX\ starts forming a control sequence token.
Three different types of control sequence can result,
depending on the category code of the character that
follows the escape character.

\begin{itemize}\item
If the character following the escape is of category~11,
letter, then \TeX\ combines the escape,
that character and all following
characters of category~11, into a control word.
After that \TeX\
goes into state~{\italic S}, skipping spaces.
\item
With a character of category~10, space,
a control symbol called control space results, 
and \TeX\ goes into state~{\italic S}.
\item
With a character of any other category code 
a control symbol results, and \TeX\ goes into state~{\italic M},
middle of line.
\end{itemize}

The letters of a control sequence name have to be all on one line;
a control sequence name is not continued on the next line
if the current line ends with a comment sign, or if (by letting
\cs{endlinechar} be outside the range~0--255) 
there is no terminating character.

%\spoint 1--4, 7--8, 11--13: non-blank characters
\subsection{1--4, 7--8, 11--13: non-blank characters}

Characters of category codes 1--4, 7--8, and 11--13 are made
into tokens, and \TeX\ goes into state~{\italic M}.

%\spoint 5: end of line
\subsection{5: end of line}

Upon encountering an end-of-line character, 
\TeX\ discards the rest of the
line, and starts processing the next line,
in state~{\italic N}. If the current state was~{\italic N},
\awp
that is, if the
line so far contained at most spaces, a~\cs{par} token
is inserted; if the state was~{\italic M}, a~space token is inserted,
and in state~{\italic S} nothing is inserted.

Note that by `end-of-line character' a character with category
code~5 is meant. This is not necessarily the \cs{endlinechar},
nor need it appear at the end of the line.
See below for further remarks on line ends.

%\spoint 6: parameter
\subsection{6: parameter}

Parameter characters \ldash usually~\verb=#= \rdash  can be
\term character !parameter\par
followed by either a digit \n{1..9} 
in the context of macro definitions
\altt
or by another parameter character. 
In the first case a `parameter token' results,
in the second case only a single parameter character
is passed on as a character token for further processing.
In either case \TeX\ goes into state~{\italic M}.

A parameter character can also appear on its own in an
alignment preamble (see Chapter~\ref{align}).

%\spoint 7: superscript
\subsection{7: superscript}

A superscript character is handled like most non-blank
characters, except in the case where it is followed
by a  superscript character of the same character code.
The process
that replaces these two characters plus the following character
(possibly two characters in \TeX3) by another character
was described above.

%\spoint 9: ignored character
\subsection{9: ignored character}

Characters of category 9 are ignored; \TeX\ remains in the same state.

%\spoint 10: space
\subsection{10: space}

A token with category code 10 \ldash this is called a \gr{space token},
irrespective of the character code \rdash 
is ignored in states {\italic N} and~{\italic S} 
(and the state does not change); 
in state~{\italic M} \TeX\ goes into state~{\italic S}, inserting
a token that has category~10 and character code~32 
(\ascii{} space)\term character !space\par,
that is, the character code of the space token may change
from the character that was actually input.

%\spoint 14: comment
\subsection{14: comment}

A comment character causes \TeX\ to discard 
the rest of the line, including the comment character.
In particular, the end-of-line character is not seen,
so even if the comment was encountered in state~{\italic M}, no space
token is inserted.
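
This property is commonly used to break macro definitions over several
lines without introducing space tokens. In the following sketch the
macro name \cs{twice} is hypothetical.
\begin{verbatim}
\def\twice#1{%
  #1#1%
}
[\twice{ab}]   % typesets [abab]
\end{verbatim}
Without the comment characters, the end of each of the first two lines
would contribute a space token to the replacement text. The
leading spaces of the second line are ignored in any case, because
they are encountered in state~{\italic N}.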

%\spoint 15: invalid
\subsection{15: invalid}

Invalid characters cause an error message. \TeX\ remains in
the state it was in.
However, in the context of a control symbol an invalid character
is acceptable. Thus \verb>\^^?> does not cause any error messages.
\awp

%\point[cat12] Letters and other characters
\section{Letters and other characters}
\label{cat12}

In most programming languages identifiers can consist
of both letters and digits (and possibly some other
character such as the underscore), but control sequences in \TeX\
are only allowed to be formed out of characters of category~11,
letter. Ordinarily, the digits and punctuation symbols have
category~12, other character.
However, there are contexts where \TeX\ itself
generates a string of characters, all of which have
category code~12, even if that is not their usual
category code.

This happens when the operations 
\cs{string},
\cs{number},
\cs{romannumeral},
\cs{jobname},
\cs{fontname},
\cs{meaning},
and \cs{the}
are used to generate a stream of character tokens.
If any of the characters delivered by such a command
is a space character (that is, character code~32), 
it receives category code~10, space.
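
The difference in category codes can be made visible with \cs{ifx},
which compares macros token by token, category codes included.
The following is only a sketch; the names \cs{a} and~\cs{b} are arbitrary.
\begin{verbatim}
\edef\a{\romannumeral 12 } % x, i, i with category code 12
\def\b{xii}                % x, i, i with category code 11
\ifx\a\b \message{same}\else \message{different}\fi
\end{verbatim}
This writes \n{different} on the terminal, even though both macros
look identical when printed.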

For the extremely rare case where a hexadecimal digit has been
hidden in a control sequence, \TeX\ allows \n A$_{12}$--\n F$_{12}$
to be hexadecimal digits, in addition to the ordinary
\n A$_{11}$--\n F$_{11}$ (here
the subscripts denote the category codes).

For example,
\begin{disp}\verb>\string\end>\quad gives four character tokens\quad
\n{\char92$_{12}$e$_{12}$n$_{12}$d$_{12}$} \end{disp}
Note that
\n{\char92$_{12}$}\term character !escape\par\label{use:escape}
is used in the output only because the
value of \cs{escapechar} is the character code for the
backslash. Another value of \cs{escapechar} leads to another
character in the output of \cs{string}. 
The \cs{string} command is treated further in Chapter~\ref{char}.
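
A sketch of this dependence, using \cs{message} to show the
result on the terminal:
\begin{verbatim}
{\escapechar=`\/ \message{\string\end}} % writes /end
{\escapechar=-1  \message{\string\end}} % writes end
\end{verbatim}
The groups keep the changes to \cs{escapechar} local.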

Spaces can wind up in control sequences:
\begin{disp}\verb>\csname a b\endcsname>\end{disp} gives a control sequence
token in which one of the three characters is a space.
Turning this control sequence token into a string of characters
\begin{disp}\verb>\expandafter\string\csname a b\endcsname>\end{disp}
gives \n{\char92$_{12}$a$_{12}$\char32$_{10}$b$_{12}$}.


As a more practical example, suppose there exists a sequence
of input files \n{file1.tex}, \n{file2.tex}\label{ex:jobnumber},
and we want to
write a macro that finds the number of the input file
that is being processed. One approach would be to write
\begin{verbatim}
\newcount\filenumber  \def\getfilenumber file#1.{\filenumber=#1 }
\expandafter\getfilenumber\jobname.
\end{verbatim}
where the letters \n{file} in the parameter text of the
macro (see Section~\ref{param:text}) absorb that part of the
jobname, leaving the number as the sole parameter.

However, this is slightly incorrect: the letters \n{file} resulting
from the \cs{jobname} command have category code~12, instead of
11 for the ones in the definition of \cs{getfilenumber}.
This can be repaired as follows:
\begin{verbatim}
{\escapechar=-1
 \expandafter\gdef\expandafter\getfilenumber
       \string\file#1.{\filenumber=#1 }
}
\end{verbatim}
\awp
Now the sequence \verb>\string\file> gives the four
letters \n{f$_{12}$i$_{12}$l$_{12}$e$_{12}$}; 
the \cs{expandafter} commands let this be executed prior to
the macro definition;
the backslash is omitted because we put \verb>\escapechar=-1>.
Confining this value to a group makes it necessary to use~\cs{gdef}.
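
Put together, and assuming that the job name really has the form
\n{file} followed by digits, the macro could be used as follows
(a sketch only):
\begin{verbatim}
\newcount\filenumber
{\escapechar=-1
 \expandafter\gdef\expandafter\getfilenumber
       \string\file#1.{\filenumber=#1 }
}
\expandafter\getfilenumber\jobname.
\message{This is file number \the\filenumber}
\end{verbatim}
For a job called \n{file2} this sets \cs{filenumber} to~2;
a job name of any other form makes the call fail with an error,
since the parameter text no longer matches.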


%\global\def\pppar.{\par}
%\point The \lowercase{\n{\char92par}} token
\section{The \lowercase{\n{\char92par}} token}

\TeX\ inserts a \cstoidx par\par\ token into the input after
\term line !empty\par
encountering a character with category code~5,
end of line, in state~{\italic N}.
It is good to realize when exactly this happens:
since \TeX\ leaves state~{\italic N}
when it encounters any token but a space,
a~line giving a \cs{par} can only contain characters
of category~10. In particular, it cannot end with a comment
character. Quite often this fact is used the other way around:
if an empty line is wanted for the layout of the input
one can put a comment sign on that line.
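
For example, in the following input (a sketch; the text is arbitrary)
only the truly empty line ends a paragraph:
\begin{verbatim}
First paragraph.
%
Still the first paragraph.

Second paragraph.
\end{verbatim}
The line containing just the comment character contributes nothing at all:
no \cs{par}, and no space token either.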


Two consecutive empty lines generate two \cs{par} tokens.
For all practical purposes this is equivalent to one \cs{par},
because after the first one \TeX\ enters vertical mode, and
in vertical mode a \cs{par} only
exercises the page builder,
and clears the paragraph shape parameters.

A \cs{par} is also inserted into the input when \TeX\ sees a
\gram{vertical command} in unrestricted horizontal mode.
After the \cs{par} has been read and expanded, the
vertical command is examined anew (see Chapters~\ref{hvmode}
and~\ref{par:end}).

The \cs{par} token may also be inserted by the \cs{end}
command that finishes off the run of \TeX; see Chapter~\ref{output}.

It is important to realize that \TeX\ does what it normally does
when encountering an empty line
(which is ending a paragraph)
only because of the default definition of the \cs{par} token.
By redefining \cs{par} the behaviour
caused by empty lines and vertical commands can be changed completely,
and  interesting special effects can be achieved.
In order to continue to be able  to cause the actions normally
associated with \cs{par}, the synonym \cs{endgraf} is
available in the plain format. See further Chapter~\ref{par:end}.
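
A minimal sketch of such a redefinition, assuming the plain format
(other formats may reset \cs{par} behind the user's back):
\begin{verbatim}
\begingroup
\def\par{\endgraf\message{[paragraph ended]}}
First paragraph.

Second paragraph.

\endgroup
\end{verbatim}
Each empty line now ends the paragraph as usual, through \cs{endgraf},
and in addition writes a message on the terminal.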

The \cs{par} token is not allowed to be part of a macro
argument, unless the macro has been declared to be \cs{long}.
A \cs{par} in the argument of a non-\cs{long} macro
prompts \TeX\ to give a `runaway argument' message.
Control sequences that have been \cs{let} to \cs{par}
(such as \cs{endgraf}) are allowed, however.
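
The following lines sketch the three cases; the macro names are
chosen only for this illustration.
\begin{verbatim}
\def\a#1{[#1]}
\long\def\b#1{[#1]}
\b{one\par two}     % accepted: \b is \long
\a{one\endgraf two} % accepted: \endgraf is not the token \par
\a{one\par two}     % error: Runaway argument?
\end{verbatim}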

%\point Spaces
\section{Spaces}

This section treats some of the aspects of
\term token !space\par
space characters and space tokens in the initial processing
stages of \TeX. The topic of spacing in text typesetting
is treated in Chapter~\ref{space}.


%\spoint Skipped spaces
\subsection{Skipped spaces}

From the discussion of the internal states of \TeX's 
input processor
it is clear that some spaces in the input never reach the
\awp
output; in fact they never get past the input processor.
These are for instance the spaces at the beginning
of an input line, and the spaces following the one
that lets \TeX\ switch to state~{\italic S}.


On the other hand, line ends can generate spaces (which are not
in the input) that may wind up in the output.
There is a third kind of space: the spaces that get past the
input processor,
or are even generated there, but still do not wind up in the
output. These are the \gram{optional spaces} that the 
syntax of \TeX\ allows in various places.

%\spoint Optional spaces
\subsection{Optional spaces}

The syntax of \TeX\ has the concepts of `optional spaces'
\term space! optional \par
and `one optional space':
\begin{disp}\gr{one optional space} $\longrightarrow$
\gr{space token} $|$ \gr{empty}\nl
\gr{optional spaces} $\longrightarrow$
\gr{empty} $|$ \gr{space token}\gr{optional spaces}\end{disp}
In general, \gr{one optional space} is allowed after
numbers and glue specifications, while \gr{optional spaces} are
allowed whenever a space can occur inside a number
(for example, between a minus sign and the digits of the number)
or glue specification (for example, between \n{plus} and \n{1fil}).
Also, the definition of \gr{equals} allows \gr{optional spaces}
before the \n= sign.

Here are some examples of optional spaces.

\begin{itemize} 
\item A number can be delimited by \gr{one optional space}. 
This prevents accidents (see Chapter~\ref{number}), 
and it speeds up processing, as \TeX\ can 
detect more easily where the \gram{number} being read ends.
Note, however, that not every `number' is a \gram{number}:
for instance the {\tt 2} in \cs{magstep2} is not a number,
but the  single token that is the parameter of the
\cs{magstep} macro. Thus a space or line end after this
is significant. Another example is a parameter number,
for example~\n{\#1}: since at most nine parameters are allowed, scanning
one digit after the parameter character suffices.

\item From the grammar of \TeX\ 
it follows that the
keywords \n{fill} and \n{filll}
consist of \n{fil} and
separate {\tt l}$\,$s, each of which is a keyword
(see page~\pageref{keywords} for a more elaborate discussion),
and hence can be followed by optional spaces. 
Therefore forms such as \hbox{\n{fil L l}} are also valid.
This is a potential source of strange accidents.
In most cases, appending a \cs{relax} token prevents
such mishaps.

\item The primitive command \cstoidx ignorespaces\par\ 
may come in handy as the final command in a macro definition.
As it gobbles up
optional spaces, it can be used to prevent spaces following the
closing brace of an argument from winding up in the output
inadvertently. For example, in
\begin{verbatim}
\def\item#1{\par\leavevmode
    \llap{#1\enspace}\ignorespaces}
\item{a/}one line \item{b/} another line \item{c/}
yet another
\end{verbatim} 
the \cs{ignorespaces} prevents spurious
spaces in the second and third item.
An empty line
after \cs{ignorespaces} will still insert a \cs{par}, however.
\end{itemize}
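
The accident with \n{fil} mentioned in the second item can be sketched
as follows (the text after the glue is arbitrary):
\begin{verbatim}
\hskip 0pt plus 1fil L l text
\hskip 0pt plus 1fil\relax L l text
\end{verbatim}
In the first line \TeX\ takes \n{L} and~\n{l} as part of the glue
specification, so the stretch is \n{1filll} and only `text' is typeset;
in the second line the \cs{relax} ends the specification after \n{fil},
and `L l text' is typeset.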
\awp

%\spoint Ignored and obeyed spaces
\subsection{Ignored and obeyed spaces}

After control words, spaces are ignored. This is not an
instance of optional spaces, but it is due to the fact that
\TeX\ goes into state~{\italic S}, skipping spaces, after control
words. Similarly an end-of-line character is skipped
after a control word.

Numbers are delimited by only \gr{one optional space},
but still
\begin{disp}\n{a\char92 count0=3\char32\char32b}\quad gives\quad `ab',\end{disp}
because \TeX\ goes into state~{\italic S} after the first
space token. The second space is therefore skipped 
in the input processor of \TeX; it never becomes a space token.

Spaces are furthermore skipped when \TeX\ is in state~{\italic N},
newline. When \TeX\ is processing in vertical mode,
space tokens (that is, spaces that were not skipped)
are ignored. For example, the space inserted (because of the line end)
after the first box in
\begin{verbatim}
\par
\hbox{a}
\hbox{b}
\end{verbatim}
has no effect.

Both plain \TeX\ and \LaTeX\ define a command \cs{obeyspaces}
\altt
that makes spaces significant: after one space other spaces are no
longer ignored. In both cases the basis is
\altt
\begin{verbatim}
\catcode`\ =13 \def {\space}
\end{verbatim}
However, there is a difference between the two cases:
in plain \TeX\ \begin{verbatim}
\def\space{ }
\end{verbatim}
while in \LaTeX\ \begin{verbatim}
\def\space{\leavevmode{} }
\end{verbatim}
although the macros bear other names there.

The difference between the two macros becomes
apparent in the context of \cs{obeylines}:
each line end is then a \cs{par} command, implying that
each next line is started in vertical mode.
An active space is expanded by the plain macro to a space token, 
which is ignored in vertical mode.
The active spaces in \LaTeX\ will immediately switch to horizontal
mode, so that each space is significant.
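
In plain \TeX\ the effect can be seen with input such as the following
sketch:
\begin{verbatim}
{\obeylines\obeyspaces\tt
if x then
    y
}
\end{verbatim}
The four spaces before \n{y} come at the start of a paragraph,
that is, in vertical mode, so with the plain definition they disappear
and the indentation of the input is lost.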

%\spoint More ignored spaces
\subsection{More ignored spaces}

There are three further places where \TeX\ will ignore space tokens.
\alt
\begin{enumerate}
\item When \TeX\ is looking for
an undelimited macro argument it will accept the
first token (or group) that is not a space. This is treated
in Chapter~\ref{macro}.

\item In math mode space tokens are ignored (see Chapter~\ref{math}).

\item After an alignment tab character spaces are ignored
(see Chapter~\ref{align}).
\end{enumerate}
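
The first two cases can be sketched as follows (the macro \cs{a}
is made up for this example):
\begin{verbatim}
\def\a#1#2{(#1)(#2)}
\a x y    % gives (x)(y): the space before y is not taken as argument
$x y$     % set exactly like $xy$
\end{verbatim}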
\awp

%\spoint \gr{space token}
\subsection{\gr{space token}}

Spaces are anomalous in \TeX.
For instance, the \cs{string} operation 
assigns category code~12 to all
characters except spaces; they receive category~10.
Also, as was said above, \TeX's input processor converts (when in
state~{\italic M}) all tokens with category code~10 into real spaces:
they get character code~32.
Any character token with category~10 is called
\gram{space token}\term space! token\par.
Space tokens with character
code not equal to 32 are called `funny spaces'
\term space !funny\par.

\begin{example} After giving the character \n Q 
the category code of a space character, 
and using it in a definition
\begin{verbatim}
\catcode`Q=10 \def\q{aQb}
\end{verbatim}
we get
\begin{verbatim}
\show\q
macro:->a b
\end{verbatim}
because the input processor
changes the character code of the funny space
in the definition.
\end{example}

Space tokens with character codes other than 32 can be
created using, for instance, \cs{uppercase}.
However, `since the various forms of
space tokens are almost identical in behaviour, there's no
point dwelling on the details'; see~\cite{Knuth:TeXbook}~p.~377.
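
A sketch of how such a funny space might be made, here one with the
character code of the asterisk (the name \cs{funny} is hypothetical):
\begin{verbatim}
{\uccode`\ =`\*
 \uppercase{\gdef\funny{ }}}
\show\funny
\end{verbatim}
Since \cs{uppercase} changes only character codes and leaves category
codes alone, the space token in the definition becomes a token with
category~10 and the character code of~\n{*}, and \cs{show} displays an
asterisk as the replacement text.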


%\spoint Control space
\subsection{Control space}

The `control space' command \verb-\-\n{\char32}
\cstoidx\char32\par\
contributes the amount of space that a \gr{space token} would
when the \verb=\spacefactor= is~1000.
A~control space
is not treated like a space token, or like a macro
expanding to one (which is how \cs{space} is defined in plain \TeX).
For instance, \TeX\ ignores spaces
at the beginning of an input line, but
control space is a \gr{horizontal command}, so it 
makes \TeX\ switch from vertical to horizontal mode
(and insert an indentation box).
See  Chapter~\ref{space} for the space factor, and
Chapter~\ref{hvmode} for horizontal and vertical modes.
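
The difference shows at the start of a paragraph; a sketch, assuming
the plain definition of \cs{space}:
\begin{verbatim}
\par\space x  % space token ignored in vertical mode; x starts the paragraph
\par\ x       % control space starts the paragraph, followed by x
\end{verbatim}
In the second line the paragraph thus begins with the indentation box
and a space, and only then the~`x'.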

%\spoint `\n{\char32}'
\subsection{`\n{\char32}'}

The explicit symbol `\n{\char32}' for a space
is character~32 in the Computer Modern typewriter typeface.
However, switching to \cs{tt} is not sufficient to get
spaces denoted this way, because spaces will still
receive special treatment in the input processor.

One way to
let spaces be typeset by \n{\char32}
is to set \begin{verbatim}
\catcode`\ =12
\end{verbatim}
\TeX\ will then take a space as the instruction to
typeset character number~32. Moreover, subsequent spaces
are not skipped, but also typeset this way: state~{\italic S}
\awp
is only entered after a character with category code~10.
Similarly, spaces after a control sequence are made
visible by changing the category code of the space character.
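
A sketch, assuming a typewriter font that has the visible space in
position~32:
\begin{verbatim}
{\tt \catcode`\ =12 a  b}
\end{verbatim}
The space directly after \n{12} still ends the number, because it is read
before the assignment takes effect; the two spaces between \n{a}
and~\n{b} are each typeset as character~32.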

%\point More about line ends
\section{More about line ends}

\TeX\ accepts lines from an input file, excluding any line
terminator that may be used\term line! end\par.
Because of this, \TeX's behaviour here is not dependent
on the operating system and the line terminator it uses (\key{CR}-\key{LF},
\key{LF}, or none at all for block storage).
From the input line any trailing spaces are removed.
The reason for this is historic; it has to do with 
the block storage mode on \key{IBM} mainframe computers.
For some computer-specific problems with end-of-line
characters, see~\cite{B:ctrl-M}.

A~terminator character is then appended
with a character code of \cs{endlinechar}, 
unless this parameter has a value that
is negative or more than~255. 
Note that this terminator character
need not have category code~5, end of line.

%\spoint Obeylines
\subsection{Obeylines}

Every once in a while it is desirable that the line ends in
\message{Check spurious space obeylines+1}%
\cstoidx obeylines\par\howto Change the meaning of the line end\par
the input correspond to those in the output.
The following piece of code does the trick:
\begin{verbatim}
\catcode`\^^M=13 %
\def^^M{\par}% 
\end{verbatim}
The \cs{endlinechar} character is here made active,
and its meaning becomes \cs{par}.
The comment signs prevent \TeX\ from seeing the terminator of the
\alt
lines of this definition, and expanding it since it is active.

However, it takes some care to embed this code in a macro.
The definition
\begin{verbatim}
\def\obeylines{\catcode`\^^M=13 \def^^M{\par}}
\end{verbatim}
will be misunderstood:
\TeX\ will discard everything
after the second \verb>^^M>, because this has category code~5.
Effectively, this line is then
\begin{verbatim}
\def\obeylines{\catcode`\^^M=13 \def
\end{verbatim}
To remedy this,
the definition itself has to be
performed in a context where \verb>^^M> is an active
character:\begin{verbatim}
{\catcode`\^^M=13 %
 \gdef\obeylines{\catcode`\^^M=13 \def^^M{\par}}%
}
\end{verbatim}
Empty lines in the  input are not taken into account
in this definition: these disappear, because two consecutive \cs{par}
tokens are (in this case) equivalent to one. 
A slightly modified definition for the line end as
\begin{verbatim}
\def^^M{\par\leavevmode}
\end{verbatim}
remedies this:
now every line end forces \TeX\ to start a paragraph. For empty
lines this will then be an empty paragraph.
\awp

%\spoint Changing the \cs{\endlinechar}
\subsection{Changing the \cs{endlinechar}}

Occasionally you may want to change the \cs{endlinechar}, or
the \cs{catcode} of the ordinary line terminator \verb.^^M.,
for instance to obtain special effects such as macros where 
the argument is terminated by the line end.
See page~\pageref{pick:eol} for a worked-out example.

There are  a couple of traps. Consider the following:
\begin{verbatim}
{\catcode`\^^M=12 \endlinechar=`\^^J \catcode`\^^J=5
...
... }
\end{verbatim}
This causes unintended output of both character~13 (\verb-^^M-)
and~10 (\verb-^^J-), which come from the line terminators of the
first and last line respectively.

Terminating the first and  last line with a comment works,
but replacing the first line by the two lines
\begin{verbatim}
{\endlinechar=`\^^J \catcode`\^^J=5
\catcode`\^^M=12
\end{verbatim}
is also a solution.

Of course, in many cases it is not necessary to substitute
another end-of-line character; a~much simpler solution 
is then to put \begin{verbatim}
\endlinechar=-1 
\end{verbatim}
which treats all lines as if they end with a comment.
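
A sketch of the effect (the macro name is arbitrary):
\begin{verbatim}
\begingroup \endlinechar=-1
\gdef\a{one
two}
\endgroup
\end{verbatim}
The replacement text of \cs{a} is \n{onetwo}. The first line still has
its ordinary terminator, since that was appended when the line was read,
before the assignment; it serves as the space that ends the number.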

%\spoint More remarks about the end-of-line character
\subsection{More remarks about the end-of-line character}

The character that \TeX\ appends at the end of an input line
is treated like any other character. Usually one is not aware
of this, as its category code is special, but there are a few
ways to let it be processed in an unusual way.

\begin{example} Terminating an input line with \verb>^^> will
(ordinarily, when \cs{endlinechar} is~13) give `M' in the output, 
which is the 
\ascii{} character with code~13+64.
\end{example}

\begin{example} If \verb>\^^M> has been defined,
terminating an input line with a backslash will execute this command.
The plain format defines
\begin{verbatim}
\def\^^M{\ }
\end{verbatim}
which makes a `control return' equivalent to a control space.
\end{example}

%\point More about the input processor
\section{More about the input processor}

%\spoint The input processor as a separate process
\subsection{The input processor as a separate process}

\TeX's levels of processing are all working at the
\awp
same time and incrementally, but conceptually they can often be
considered to be separate processes that each accept the
completed output of the previous stage. The juggling with
spaces provides a nice illustration for this.

Consider the definition
\begin{verbatim}
\def\DoAssign{\count42=800}
\end{verbatim}
and the call
\begin{verbatim}
\DoAssign 0
\end{verbatim}
The input processor, the part
of \TeX\ that builds tokens, in scanning this call
skips the space before the zero, so the expansion of this
call is \begin{verbatim}
\count42=8000
\end{verbatim}
It would be incorrect to reason
`\cs{DoAssign} is read, then expanded, the space delimits the
number 800, so 800 is assigned and the zero is printed'.
Note that the same would happen if the zero appeared on the next line.
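
One way to guard the macro against this is to end the number
inside the definition, for instance with \cs{relax}:
\begin{verbatim}
\def\DoAssign{\count42=800\relax}
\DoAssign 0
\end{verbatim}
Now 800 is assigned regardless of what follows the call,
and the zero is typeset.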

Another illustration shows that optional spaces appear in a different
stage of processing from that for skipped spaces:
\begin{disp}\verb>\def\c.{\relax}>\nl
     \verb>a\c.>{\tt\char32 b}\end{disp}
expands to
\begin{disp}\n{a\cs{relax}\char32 b}\end{disp}
which gives as output\begin{disp} `a b'\end{disp}
because spaces after the \cs{relax} control sequence are only
skipped when the line is first read, not when it is expanded.
The fragment
\begin{disp} \verb-\def\c.{\ignorespaces}-\nl \verb-a\c. b-\end{disp}
on the other hand, expands to
\begin{disp}\n{a\cs{ignorespaces}\char32 b}\end{disp}
Executing the \cs{ignorespaces} command removes the subsequent
space token, so the output is \begin{disp} `ab'.\end{disp}
In both definitions
the period after \cs{c} is a delimiting token; it is used here
to prevent spaces from being skipped.

%\spoint The input processor not as a separate process
\subsection{The input processor not as a separate process}

Considering the tokenizing of \TeX\ to be a separate process
is a convenient view, but sometimes it leads to confusion.
The line \begin{verbatim}
\catcode`\^^M=13{}
\end{verbatim}
\awp
makes the line end active,
and subsequently gives an `undefined control sequence' error
for the line end of this line itself. Execution of the commands
on the line thus influences the scanning process of that
same line.

By contrast, \begin{verbatim}
\catcode`\^^M=13
\end{verbatim}
does not give an error.
The reason for this is that \TeX\ reads the line end while it is still
scanning the number~13; that is, at a time when the assignment
has not been performed yet.
The line end is then converted to the optional space character
delimiting the number to be assigned.

%\spoint Recursive invocation of the input processor
\subsection{Recursive invocation of the input processor}

Above, the activity of replacing a parameter
character plus a digit by a parameter token was described
as something similar to the lumping together of letters
into  a control sequence token. Reality is somewhat more
complicated than this. \TeX's token scanning mechanism
is invoked both for input from file and for input from
lists of tokens such as a macro definition. Only in the
first case is the terminology of internal states applicable.

Macro parameter characters are treated the same in both
cases, however. If this were not the case it would
not be possible to write things such as
\begin{verbatim}
\def\a{\def\b{\def\c####1{####1}}}
\end{verbatim}
See page \pageref{nest:def} for an explanation of such
nested definitions.
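
Executing the definitions one level at a time shows how the
parameter characters are halved at each step:
\begin{verbatim}
\def\a{\def\b{\def\c####1{####1}}}
\a     % performs \def\b{\def\c##1{##1}}
\b     % performs \def\c#1{#1}
\c{x}  % gives x
\end{verbatim}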

%\point The \verb@- convention
\section{The \n{@} convention}

Anyone who has ever browsed through either the plain format or
the \LaTeX\ format will have noticed that a lot of control sequences
contain an `at' sign:~\verb-@-. These are control sequences that
are meant to be inaccessible to the ordinary user.

Near the beginning of the format files the instruction
\begin{verbatim}
\catcode`@=11
\end{verbatim}
occurs, making the at sign into a letter,
meaning that it can be used in control sequences. Somewhere near the
end of the format definition the at sign is made `other' again:
\begin{verbatim}
\catcode`@=12
\end{verbatim}

Now why is it that users cannot
call a control sequence with an at sign
directly, although they can call macros that contain lots of those
`at-definitions'? The reason is that the control sequences
containing an \n@ are internalized by \TeX\ at definition time,
after which they are a token, not a string of characters. 
Macro expansion then
just inserts such tokens, and at that time the category codes
of the constituent characters do not matter any more.
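
A sketch of the mechanism, with hypothetical macro names and
assuming that \cs{my} itself is undefined:
\begin{verbatim}
\catcode`@=11
\def\my@helper{internal}
\def\usehelper{[\my@helper]}
\catcode`@=12
\usehelper  % gives [internal]
\my@helper  % error: this is \my followed by the characters @helper
\end{verbatim}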

%%%% end of input file [mouth]

%\InputFile:char
%%%% this is input file [char]
%\subject[char] Characters
\endofchapter
\chapter{Characters}\label{char}

Internally, \TeX\ represents characters by their (integer) 
character code. This chapter treats those codes, and the
commands that have access to them.

\begin{inventory}
\item [\cs{char}]
      Explicit denotation of a character to be typeset. 

\item [\cs{chardef}] 
      Define a control sequence to be a synonym for
      a~character code.

\item [\cs{accent}] 
      Command to place accent characters.

\item [\cs{if}]
      Test equality of character codes. 

\item [\cs{ifx}]
      Test equality of both character and category codes.

\item [\cs{let}]
      Define a control sequence to be a synonym of a token.

\item [\cs{uccode}] 
      Query or set
      the character code that is the uppercase variant of a given code.

\item [\cs{lccode}]
      Query or set
      the character code that is the lowercase variant of a given code.

\item [\cs{uppercase}]
      Convert the \gr{general text} argument to its uppercase form.

\item [\cs{lowercase}] 
      Convert the \gr{general text} argument to its lowercase form.

\item [\cs{string}]
      Convert a token to a string of one or more characters.
\item [\cs{escapechar}]
      Number of the character that is to be used 
      for the escape character
      when control sequences are being converted
      into character tokens. \IniTeX\ default:~92~(\cs{}).
\end{inventory}

%\point[char:code] Character codes
\section{Character codes}
\label{char:code}

Conceptually it is easiest to think that \TeX\ works with
\term character! codes\par
characters internally, but in fact
\TeX\ works with integers: the `character codes'. 

The way characters are encoded in a computer may differ
from system to system.
Therefore \TeX\ uses its own scheme of character codes.
Any character that is read from a file (or from the user terminal)
is converted to a character code according to the
character code table.
A~category code is then assigned based on this (see Chapter~\ref{mouth}).
The character code table is based on the 7-bit \ascii{} table
for numbers under~128 (see Chapter~\ref{table}).

There is an explicit conversion between characters
(better:  character tokens)
and  character codes  using the left quote (grave, back quote)
character~\n{`{}}:
at all places where \TeX\ expects a \gram{number} you
can use the left quote followed by a character
token or
a single-character control sequence.
Thus both \verb.\count`a. and \verb.\count`\a. are synonyms
\awp
for \verb.\count97.. See also Chapter~\ref{number}.

The possibility of a single-character control
sequence is necessary in certain cases such as
\begin{disp}\verb>\catcode`\%=11>\quad or\quad \verb>\def\CommentSign{\char`\%}>\end{disp}
which would be misunderstood if the backslash were left out.
For instance \begin{verbatim}
\catcode`%=11
\end{verbatim}
would consider
the \n{=11} to be a comment.
Single-character
control sequences can be formed from characters with any
category code.
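
The conversion can be checked on the terminal; a small sketch, using
\cs{count255} as a scratch register:
\begin{verbatim}
\count255=`a  \message{\the\count255} % writes 97
\count255=`\% \message{\the\count255} % writes 37
\end{verbatim}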

After the conversion to character codes any connection
with external representations has disappeared. Of course,
for most characters  the visible output will `equal' the input
(that is, an `\n{a}' causes an~`a').
There are exceptions, however, even among the common symbols.
In the Computer Modern
roman fonts there are no `less than' and `greater than'
\message{Check <>! Dammit!}%
signs, so the input `\verb.<>.' will give `<>'
(an inverted exclamation mark and an inverted question mark) in the output.
%{\MathRMx<>}

In order to make \TeX\ machine independent at the output
side, the character codes are also used in the \n{dvi} file:
opcodes $n=0\ldots127$ denote simply the instruction `take
character $n$ from the current font'. The complete definition
of the opcodes in a \n{dvi} file can be found in~\cite{Knuth:TeXprogram}.


%\point Control sequences for characters
\section{Control sequences for characters}

There are a number of ways in which a control sequence can denote
a character. The \cs{char} command specifies a character to be
typeset; the \cs{let} command introduces
a synonym for a character token, that is,
the combination of character code and category code.

%\point Denoting characters to be typeset: \cs\char
\section{Denoting characters to be typeset: \protect\cs{char}}

Characters can be denoted numerically by, for example,
\verb.\char98.\cstoidx char\par.
This command tells \TeX\ to add character number~98 of the
current font to the horizontal list currently under construction.

Instead of decimal notation, it is often more convenient to
use octal or hexadecimal notation. For octal the single quote is used:
\verb.\char'142.; hexadecimal uses the double quote: \verb.\char"62..
Note that \verb.\char''62. is incorrect; the process that replaces
two quotes by a double quote works at a later stage of processing
(the visual processor) than number scanning (the execution processor).

Because of the explicit conversion to character codes by the
back quote character it is also possible to get a `b' \ldash provided
that you are using a font organized a bit like the \ascii{} table \rdash
with \verb.\char`b.  or \verb.\char`\b..
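
Under that same assumption about the font, each of the following
five commands typesets a~`b':
\begin{verbatim}
\char98 \char'142 \char"62 \char`b \char`\b
\end{verbatim}
The spaces merely end the numbers, so the output is `bbbbb'.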

The \cs{char} command looks superficially a bit like
the \verb-^^- substitution mechanism (Chapter~\ref{mouth}).
Both mechanisms access characters without directly denoting them.
However, the \verb-^^- mechanism operates in a very early stage of
processing (in the input processor of \TeX,
but before category code
assignment); the \cs{char} command, on the other hand,
comes in the final stages of processing. 
In effect it says `typeset character number
so-and-so'.
\awp

There is a construction to let a control sequence stand
for some character code: the \cstoidx chardef\par\ command.
The syntax of this is \label{chardef}
\begin{disp}\cs{chardef}\gram{control sequence}\gr{equals}\gram{number}, 
\end{disp}
where the number can be an explicit
representation or a counter value, but it can also be
a character code
obtained using the left quote command (see above; 
the full definition of \gr{number} is given in Chapter~\ref{number}). 
In the plain format 
the latter possibility is used in
definitions such as \begin{verbatim}
\chardef\%=`\%
\end{verbatim}
which could have been given equivalently as
\begin{verbatim}
\chardef\%=37
\end{verbatim}
After this command, the control symbol \verb>\%>
used on its own is a synonym for \verb>\char37>,
that is, the command to typeset character~37
(usually the per cent character).

A control sequence that has been defined with a \cs{chardef}
command can also be used as a \gr{number}.
This fact is used in  allocation commands such as 
\cs{newbox} (see Chapters~\ref{number} and~\ref{alloc}).
Tokens defined with \cs{mathchardef} can also be used this
way.
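
A sketch of both uses, with the hypothetical name \cs{percentcode}:
\begin{verbatim}
\chardef\percentcode=`\%
\count255=\percentcode \message{\the\count255} % writes 37
{\tt\percentcode}                              % typesets character 37
\end{verbatim}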

%\spoint Implicit character tokens: \cs{let}
\subsection{Implicit character tokens: \protect\cs{let}}

Another construction defining a control sequence
\term character !implicit\par
to stand for (among other things)
a character is~\cs{let}\cstoidx let\par:
\begin{disp}\cs{let}\gr{control sequence}\gr{equals}\gr{token}\end{disp}
with a character token on the right hand side of the (optional)
equals sign. The result is called an implicit character token.
(See page~\pageref{let} for a further discussion of~\cs{let}.)

In the
plain format there are for instance synonyms for
the open and close brace:
\begin{verbatim}
\let\bgroup={ \let\egroup=}
\end{verbatim}
The resulting control sequences are called `implicit braces'
(see Chapter~\ref{group}).

Assigning characters by \cs{let}
is different from defining control sequences by \cs{chardef}, 
in the sense that \cs{let}
makes the control sequence stand for the combination
of a character code and category code. 

As an example
\begin{verbatim}
\catcode`|=2 % make the bar an end of group
\let\b=|  % make \b a bar character
{\def\m{...}\b \m
\end{verbatim}
gives an `undefined control sequence \cs{m}'
because the \cs{b} closed the group inside which \cs{m}
was defined. On the other hand,
\begin{verbatim}
\let\b=| % make \b a bar character
\catcode`|=2  % make the bar character end of group
{\def\m{...}\b \m
\end{verbatim}
leaves one group open, and it prints a vertical bar
(or whatever is in position 124 of the current font).
The first of these examples
implies that even when the braces have been redefined
(for instance into active characters for macros that
format C code) the beginning-of-group and end-of-group
functionality is available through the control sequences
\cs{bgroup} and~\cs{egroup}.

Here is
another example to show
that implicit character tokens are hard to distinguish
from real character tokens. After the above sequence
\begin{verbatim}
\catcode`|=2 \let\b=|
\end{verbatim}
the tests \begin{verbatim}
\if\b|
\end{verbatim}
and \begin{verbatim}
\ifcat\b}
\end{verbatim}
are both true.

Yet another example can be found in the plain format:
the commands
\begin{verbatim}
\let\sp=^ \let\sb=_ 
\end{verbatim}
allow people without an
underscore or circumflex on their keyboard to 
make sub- and superscripts in mathematics.
For instance:
\begin{disp}\verb>x\sp2\sb{ij}>\quad gives\quad $x\sp2\sb{ij}$\end{disp}
If a person typing in the format itself does not have
these keys, some further tricks are needed:\label{spsb:truc}
\begin{verbatim}
{\lccode`,=94 \lccode`.=95 \catcode`,=7 \catcode`.=8
\lowercase{\global\let\sp=, \global\let\sb=.}}
\end{verbatim}
will do the job; see below for an explanation of lowercase codes.
The \verb>^^> method as it was in \TeX\ version~2
(see page~\pageref{hathat}) cannot be used here,
as it would require typing two characters that can ordinarily
not be input.
With the extension in \TeX\ version~3 it would also be possible
to write \begin{verbatim}
{\catcode`\,=7
\global\let\sp=,,5e \global\let\sb=,,5f}
\end{verbatim}
denoting the codes 94 and 95 hexadecimally.

Finding out just what a control sequence has been defined to be with
\cs{let} can be done using \cs{meaning}:
the sequence \begin{verbatim}
\let\x=3 \meaning\x
\end{verbatim}
gives
`\n{the character 3}'.\awp

%\point Accents
\section{Accents}

Accents can be placed by the
\gr{horizontal command}~\cstoidx accent\par\term accents\par
\label{character}:
\begin{disp}\cs{accent}\gr{8-bit number}\gr{optional assignments}%
     \gr{character}\end{disp}
where \gr{character} is a character of category 11 or~12,
 a~\cs{char}\gr{8-bit number} command,
or a~\cs{chardef} token. If none of these
four types of \gr{character} follows, the accent is taken to be a
\cs{char} command itself; this gives an accent `suspended
in mid-air'. Otherwise the accent is placed
on top of the following character.
Font changes between the accent and the character can be effected
by the \gr{optional assignments}.
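
As a minimal sketch of such a font change, assuming the plain \TeX\
font selectors \cs{tenrm} and \cs{tenbf}, the sequence
\begin{verbatim}
\tenrm \accent"7F \tenbf a
\end{verbatim}
takes the dieresis from the roman font, which is current when the
\cs{accent} command is given, and places it on a boldface~`a'.
The font assignment stays in force after the accented character;
enclosing the whole sequence in a group keeps it local.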

An unpleasant implication of the fact that an \cs{accent} command
has to be followed by a \gr{character} is that it is not
possible to place an accent on a ligature, or
two accents on top of each other.
In some languages, such as Hindi or Vietnamese,
such double accents do occur.
Positioning accents on top of each other is possible,
however, in math mode.

The width of a character with an accent is the same as that of
the unaccented character. \TeX\ assumes that the 
accent as it appears in the font file
is properly positioned for a character that is as high
as the x-height of the font; for characters with other heights
it correspondingly lowers or raises the accent.

No genuine under-accents exist in \TeX. They are
implemented as low-placed over-accents. A~way of handling
them more correctly would be to write a macro that
measures the following character, and raises or drops
the accent accordingly.
The cedilla macro, \cs{c}\cstoidx c\par,
in plain \TeX\ does something along these lines. However,
it does not drop the accent for characters with descenders.

The horizontal positioning of an accent is controlled by
\cs{fontdimen1}, slant per point. Kerns are used
for the horizontal movement. Note that, although they
are inserted automatically, these kerns are classified
as {\italic explicit\/} kerns. Therefore they inhibit hyphenation
in the parts of the word before and after the kern.

As an example of kerning for accents, 
here follows the dump of a horizontal list.
\message{maybe italic correction for extra line}
\begin{verbatim}
\setbox0=\hbox{\it \`l}
\showbox0
\end{verbatim}
gives\begin{verbatim}
\hbox(9.58334+0.0)x2.55554
.\kern -0.61803 (for accent)
.\hbox(6.94444+0.0)x5.11108, shifted -2.6389
..\tenit ^^R
.\kern -4.49306 (for accent)
.\tenit l
\end{verbatim}
Note that the accent is placed first, so afterwards the italic
correction of the last character is still available.
\awp

%\point Testing characters
\section{Testing characters}

Equality of character codes is tested by \cs{if}:
\begin{disp}\cs{if}\gr{token$_1$}\gr{token$_2$}\end{disp}
Tokens following this conditional are expanded until two
unexpandable tokens are left. The condition is then true
if those tokens are character tokens with the same character
code, regardless of category code. 

An unexpandable control
sequence is considered to have character code 256 and
category code~16 (so that it is unequal to anything except
another control sequence), except in the case
where it had been \cs{let} to a non-active character token.
In that case it is considered to have the character code
and category code of that character. This was mentioned above.
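
The following lines sketch these rules; the names \cs{dbl} and
\cs{vb} are hypothetical and serve only as illustration.
\begin{verbatim}
\def\dbl{aa}
\if\dbl yes\else no\fi    % true: gives `yes'
\if a\dbl yes\else no\fi  % true: gives `ayes'
\let\vb=|
\if\vb| yes\else no\fi    % true: gives `yes'
\end{verbatim}
In the first test the expansion of \cs{dbl} supplies both tokens
to be compared. In the second test only its first character is
used in the comparison, so the second `a' is typeset.
In the third test the implicit character \cs{vb} counts as a
vertical bar.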

The test \cs{ifcat} for category codes was mentioned
in Chapter~\ref{mouth}; the test
\begin{disp}\cs{ifx}\gr{token$_1$}\gr{token$_2$}\end{disp}
can be used to test for category code and character code
simultaneously.
The tokens following this test are not expanded.
However, if they are macros, \TeX\
tests their expansions for equality.
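
A sketch of the difference between testing a macro and testing
what it eventually expands to; the macro names are hypothetical.
\begin{verbatim}
\def\one{x} \def\two{x} \def\three{\one}
\ifx\one\two   same\else different\fi  % gives `same'
\ifx\one\three same\else different\fi  % gives `different'
\end{verbatim}
The second test is false because only the first-level expansions
are compared: that of \cs{three} is the token \cs{one},
not the character~`x'.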

Quantities defined by \cs{chardef} can be tested with
\cs{ifnum}:
\begin{verbatim}
\chardef\a=`x \chardef\b=`y \ifnum\a=\b % is false 
\end{verbatim}
based on the fact (see Chapter~\ref{number}) that
\gr{chardef token}s can be used as numbers.

%\point Uppercase and lowercase
\section{Uppercase and lowercase}

%\spoint[uc/lc] Uppercase and lowercase codes
\subsection{Uppercase and lowercase codes}
\label{uc/lc}

To each of the character codes correspond
\term uppercase\par\term lowercase\par
\cstoidx lccode\par\cstoidx uccode\par
an uppercase code and a lowercase code (for still more codes see below).
These can be assigned
by 
\begin{Disp}\cs{uccode}\gram{number}\gr{equals}\gram{number}\end{Disp}
and 
\begin{Disp}\cs{lccode}\gram{number}\gr{equals}\gram{number}.\end{Disp}
In \IniTeX\ codes \verb-`a..`z-, \verb-`A..`Z- have uppercase code
\label{ini:uclc}
\verb-`A..`Z- and lowercase code \verb-`a..`z-.
All other character codes have both uppercase and lowercase
code zero.
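
These values can be inspected with \cs{the}. The following sketch
assumes plain \TeX, which leaves the \IniTeX\ values of these
codes unchanged:
\begin{verbatim}
\message{\the\uccode`a, \the\lccode`A, \the\uccode`1}
\end{verbatim}
This writes `\n{65, 97, 0}' to the terminal: the letters map to each
other, and the digit has uppercase code zero.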

%\spoint[upcase] Uppercase and lowercase commands
\subsection{Uppercase and lowercase commands}
\label{upcase}

The commands \verb-\uppercase{...}- and \verb-\lowercase{...}-
\cstoidx uppercase\par\cstoidx lowercase\par
go through their argument lists, replacing all character 
codes of explicit character tokens
by their uppercase and lowercase code respectively
if these are non-zero,
without changing the category codes. 
\awp

The argument of \cs{uppercase} and \cs{lowercase}
is a \gr{general text}, which is defined as
\begin{Disp} \gr{general text} $\longrightarrow$ \gr{filler}\lb
      \gr{balanced text}\gr{right brace}\end{Disp}
(for the definition of \gr{filler} see Chapter~\ref{gramm})
meaning that the left brace can be implicit, but the closing
right brace must be an explicit character token with category
code~2. \TeX\ performs expansion to find the opening
brace.
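
This expansion can be put to use when the text to be converted
is stored in a macro. A sketch, with a hypothetical
macro \cs{mytitle}:
\begin{verbatim}
\def\mytitle{tex by topic}
\uppercase{\mytitle}              % gives `tex by topic'
\uppercase\expandafter{\mytitle}  % gives `TEX BY TOPIC'
\end{verbatim}
In the first call the argument contains only a control sequence token,
which \cs{uppercase} leaves alone. In the second call the
\cs{expandafter} is expanded while \TeX\ looks for the opening brace,
so the argument consists of character tokens by the time
it is converted.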

Uppercasing and lowercasing are executed in the execution processor;
they are not `macro expansion' activities
like \cs{number} or \cs{string}.
The sequence (attempting to produce~\cs{A})
\begin{verbatim}
\expandafter\csname\uppercase{a}\endcsname
\end{verbatim}
gives an error (\TeX\ inserts an \cs{endcsname} before   the
\cs{uppercase} because \cs{uppercase} is unexpandable), but
\begin{verbatim}
\uppercase{\csname a\endcsname}
\end{verbatim}
works.

As an example of the correct use of \cs{uppercase}, here
is a macro that tests if a character is uppercase:
\begin{verbatim}
\def\ifIsUppercase#1{\uppercase{\if#1}#1}
\end{verbatim}
For letters the same test can be
performed by \verb>\ifnum`#1=\uccode`#1>.
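
A short usage sketch of the macro, assuming the \IniTeX\ values
of the uppercase codes:
\begin{verbatim}
\ifIsUppercase{A}upper\else lower\fi  % gives `upper'
\ifIsUppercase{a}upper\else lower\fi  % gives `lower'
\ifIsUppercase{1}upper\else lower\fi  % gives `upper'
\end{verbatim}
The last line shows a limitation: a character with uppercase code
zero, such as a digit, is left alone by \cs{uppercase},
so the macro reports it as uppercase, whereas the
\cs{ifnum} test is false for such a character.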

Hyphenation of words starting with an uppercase character,
that is, a character not equal to its own \cs{lccode},
is subject to the \cs{uchyph} parameter: if this
is positive, hyphenation of capitalized words is allowed.
See also Chapter~\ref{line:break}.

%\spoint Uppercase and lowercase forms of keywords
\subsection{Uppercase and lowercase forms of keywords}

Each character in \TeX\ keywords, such as \n{pt}, can be
given in uppercase or lowercase form. 
For instance, \n{pT}, \n{Pt}, \n{pt}, and~\n{PT} all have
the same meaning. \TeX\ does not use
the \cs{uccode} and \cs{lccode} tables here to
determine the lowercase form. Instead it
converts uppercase characters to lowercase by adding~32
\ldash the \ascii{} difference between uppercase and lowercase
characters \rdash to their character code. This has some implications
for implementations of \TeX\ for non-roman alphabets;
see page 370 of \TeXbook, \cite{Knuth:TeXbook}.
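
Since keywords are matched without regard to case, the following
lines, given purely as an illustration, are read in the same way as
their lowercase forms:
\begin{verbatim}
\hbox TO 3CM{\hfil x\hfil}
\vskip 12Pt PLUS 1fIl Minus 2pT
\end{verbatim}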

%\spoint Creative use of \cs{uppercase} and \cs{lowercase}
\subsection{Creative use of \cs{uppercase} and \cs{lowercase}}

The fact that \cs{uppercase} and \cs{lowercase} do not change
category codes can sometimes be used to create certain
character-code--category-code combinations that would
otherwise be difficult to produce. See for instance the
explanation of the \cs{newif} macro in Chapter~\ref{if},
and another example on page~\pageref{spsb:truc}.

For a slightly different application, consider the
problem (solved by Rainer Sch\"opf) of,
given a counter \verb-\newcount\mycount-, writing character
number \verb-\mycount- to the terminal.
Here is a solution:
%\begin{verbatim}
%\lccode`a=\mycount \chardef\terminal=16
%\lowercase{\write\terminal{a}}
%\end{verbatim}
\begin{verbatim}
\lccode`a=\mycount \chardef\terminal=16
\end{verbatim}
\awp
\begin{verbatim}
\lowercase{\write\terminal{a}}
\end{verbatim}
The \cs{lowercase} command effectively changes the 
argument of the \cs{write} command from~`\n a'
into the character with code \cs{mycount}.
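
The assignment to \cs{lccode} in this solution stays in force
afterwards. A possible variation, sketched here and not part
of the solution cited above, keeps the change local by opening
a group and closing it from inside the \cs{lowercase} argument;
it also uses \cs{immediate} so that the character is written
at once:
\begin{verbatim}
\newcount\mycount \mycount=`Z
\begingroup \lccode`a=\mycount
  \lowercase{\endgroup \immediate\write16{a}}
\end{verbatim}
This writes `\n{Z}'. The conversion is performed before the
converted tokens are executed, so the \cs{endgroup} can restore
the old lowercase code without affecting the result.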

%\point[codename] Codes of a character
\section{Codes of a character}
\label{codename}

Each character code has a number of \gr{codename}s associated
\term codenames\par
with it. These are integers in various ranges that determine
how the character is treated in various contexts, or
how the occurrence of that character changes the workings
of \TeX\ in certain contexts.

The code names are as follows:
\begin{description}\item [\cs{catcode}]
\gr{4-bit number} (0--15); the category to which a character belongs.
This is treated in Chapter~\ref{mouth}.
\item [\cs{mathcode}]
\gr{15-bit number} (0--\verb-"7FFF-) or \verb-"8000-;
determines how a character is treated
in math mode. See Chapter~\ref{mathchar}.
\item [\cs{delcode}]
\gr{27-bit number} (0--\n{\hex7$\,$FFF$\,$FFF});
determines how a character is treated after
\cs{left} or \cs{right} in math mode.
See page~\pageref{delcodes}.
\item [\cs{sfcode}]
integer; determines how spacing is affected after this character.
See Chapter~\ref{space}.
\item [\cs{lccode}, \cs{uccode}]
\gr{8-bit number} (0--255); lowercase and
uppercase codes \rdash these were treated above.
\end{description}

%\point Converting tokens into character strings
\section{Converting tokens into character strings}

The command \cs{string} takes the next token and expands it
\cstoidx string\par
into a string of separate characters. Thus
\begin{verbatim}
\tt\string\control
\end{verbatim}
will give \cs{control} in the
output, and
\begin{verbatim}
\tt\string$
\end{verbatim}
will give~\verb-$-, but, because the string 
operation comes after the tokenizing,
\begin{verbatim}
\tt\string%
\end{verbatim}
will {\em not\/} give~\verb$%$,
because the comment
sign is removed by \TeX's input processor.
Therefore, this command will `string' the first token on the next line.

The \cs{string} command is executed by the expansion processor, thus
it is expanded unless explicitly inhibited (see Chapter~\ref{expand}).

%\spoint Output of control sequences
\subsection{Output of control sequences}

In the above examples the typewriter font was selected, because
\cstoidx escapechar\par
the Computer Modern roman font does not have a backslash character.
\awp
However,
\TeX\ need not have used the backslash character to display
a control sequence: it uses character number \cs{escapechar}.
This same value is also used when a control sequence is
output with \cs{write}, \cs{message}, or \cs{errmessage},
and it is used in the output of \cs{show}, \cs{showthe} and \cs{meaning}.
If \cs{escapechar} is negative or more than~255,
the escape character is not
output; the default value (set in \IniTeX) is~92, the number
of the backslash character.
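
The effect can be sketched as follows; \cs{foo} is a hypothetical
control sequence that need not be defined, and the groups keep the
changes local.
\begin{verbatim}
{\escapechar=`\/ \message{\string\foo}}  % writes /foo
{\escapechar=-1  \message{\string\foo}}  % writes foo
\end{verbatim}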

For use in a  \cs{write} statement the \cs{string} can 
in some circumstances be
replaced  by \cs{noexpand} (see page~\pageref{expand:write}).

%\spoint Category codes of a \cs{string}
\subsection{Category codes of a \cs{string}}

The characters that are the result of a \cs{string} command have 
category code~12, except for any spaces in 
a stringed control sequence;
they have category code~10. Since inside a control
sequence there are no category codes, 
any spaces resulting from \cs{string} are
of necessity only space {\em characters}, that is,
characters with code~32.
However, \TeX's input processor converts
all space tokens that have a character code other than~32
into character tokens with character code~32, 
so the chances are pretty slim that
`funny spaces' wind up in control sequences.

Other commands with the same behaviour with respect to 
category codes as \cs{string}, are
\cs{number},
\cs{romannumeral}, \cs{jobname}, \cs{fontname}, \cs{meaning},
and \cs{the}.
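
That the result of \cs{string} has category code~12 can be made
visible with \cs{ifx}. In this sketch the macro names are
hypothetical:
\begin{verbatim}
\edef\stringed{\string a}  % the character a, category 12
\def\letter{a}             % the character a, category 11
\ifx\stringed\letter same\else different\fi
\end{verbatim}
The test gives `different': the two macros contain the same
character code, but with different category codes.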




%%%% end of input file [char]

%\InputFile:fontfam
%%%% this is input file [fontfam]
%\subject[font] Fonts
\endofchapter
\chapter{Fonts}\label{font}

In text mode \TeX\ takes characters from a `current font'.
\term fonts\par
This chapter describes how fonts are identified to \TeX,
and what attributes a font can have.

\begin{inventory}
\item [\cs{font}] 
      Declare the identifying control sequence of a font.

\item [\cs{fontname}] 
      The external name of a font.

\item [\cs{nullfont}] 
      Name of an empty font that \TeX\ uses in emergencies.


\item [\cs{hyphenchar}] 
      Number of the hyphen character of a font.

\item [\cs{defaulthyphenchar}] 
      Value of \cs{hyphenchar} when a font is loaded.
      Plain \TeX\ default:~\verb>`\->.

\item [\cs{fontdimen}] 
      Access various parameters of fonts.

\item [\cs{/}]
      Italic correction.

\item [\cs{noboundary}] 
      Omit implicit boundary character.
\end{inventory}

%\point Fonts
\section{Fonts}

In  \TeX\ terminology a font is the set of characters that
is contained in one external font file. 
During processing, \TeX\ decides from
what font a character should be taken. This decision is
taken separately for text mode and math mode.

When \TeX\ is processing ordinary text, characters are taken
from the `current font'. 
External font file names are coupled to  control sequences
by   statements such as
\begin{verbatim}
\font\MyFont=myfont10
\end{verbatim}
which makes \TeX\ load the file \n{myfont10.tfm}.
Switching the current font to the font described in that file
is then done by
\begin{verbatim}
\MyFont
\end{verbatim}
The status of the current font
can be queried: the sequence \begin{verbatim}
\the\font
\end{verbatim}
produces the control sequence for the current font.

Math mode completely ignores the current font. Instead
it looks  at the `current family', which can contain
three fonts: one for text style, one for script style,
and one for scriptscript style. This is treated
in Chapter~\ref{mathchar}.
\awp

See \cite{S} for a consistent terminology of fonts and typefaces.

With `virtual fonts' (see~\cite{K:virt}) it is possible that
what looks like one font to \TeX\ resides in more than
one physical font file.
\alt
See further page~\pageref{virtual:fonts}.

%\point Font declaration
\section{Font declaration}

Somewhere during a run of \TeX\ or \IniTeX\ 
\cstoidx font\par
the coupling between an internal identifying control sequence
and the external file name of a font has to be made.
The syntax of the command for this is
\begin{disp}\cs{font}\gr{control sequence}\gr{equals}%
\gr{file name}\gr{at clause}\end{disp} 
where
\begin{disp}\gr{at clause} $\longrightarrow$ \n{at} \gr{dimen}
$|$ \n{scaled} \gr{number} $|$ \gr{optional spaces}\end{disp}
Font declarations are local to a group.

By the \gr{at clause} the user specifies that some
magnified version of the font is wanted. The \gr{at clause} comes
in two forms: if the font is given \n{scaled}~{\italic f\/}, \TeX\
multiplies all dimensions of that font by~$f/1000$; 
if the font
has a design size~{\italic d\/}\n{pt} and 
the \gr{at clause} is \n{at}~{\italic p\/}\n{pt},
\TeX\ multiplies all font data by~$p/d$.
The presence of an \gr{at clause} makes no difference for
the external font file (the \n{.tfm} file)
that \TeX\ reads for the font; it just multiplies
the font dimensions by a constant.
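
As an illustration, with hypothetical control sequence names and
the font \n{cmr10}, which has a design size of 10~point:
\begin{verbatim}
\font\bigrm=cmr10 scaled 1200  % dimensions times 1200/1000
\font\bigrm=cmr10 at 12pt      % the same size: 12/10
\font\smallrm=cmr10 at 8pt     % dimensions times 8/10
\end{verbatim}
All three declarations make \TeX\ use the same file \n{cmr10.tfm}.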


After such a font declaration, using the defined control sequence
will set the current font to the font of the 
control sequence.

%\spoint Fonts and \n{tfm} files
\subsection{Fonts and \n{tfm} files}

The external file needed for the font is a \n{tfm} 
(\TeX\ font metrics) file,
which is read independently of any \gr{at clause}
in the \cs{font} declaration. If the \n{tfm}
file has been loaded already (for instance by \IniTeX\
when it constructed the format),
an assignment of that font file can be reexecuted
without needing recourse to the \n{tfm} file.

Font design sizes are given in the font metrics files.
The \n{cmr10} font, for instance, has a design size
of 10~point. However, there is not much in the font
that actually has a size of 10~points: the opening and closing
parentheses are two examples, but capital
letters are considerably smaller.

%\spoint Querying the current font and font names
\subsection{Querying the current font and font names}

It was already mentioned above that the control sequence
which set the current font can be retrieved by the
command \verb>\the\font>. This is a special case of
\begin{Disp}\cs{the}\gr{font}\end{Disp} where 
\begin{disp}\gr{font} $\longrightarrow$
\cs{font} $|$ \gr{fontdef token} $|$ \gr{family member}\nl
\gr{family member} $\longrightarrow$ 
\gr{font range}\gr{4-bit number}\nl
\gr{font range} $\longrightarrow$ 
\cs{textfont} $|$ \cs{scriptfont} $|$ \cs{scriptscriptfont}\end{disp}
\awp
A \gr{fontdef token} is a control sequence defined by \cs{font},
or the predefined control sequence \cs{nullfont}.
The concept of \gr{family member} is only 
relevant in math mode.

Also, the 
\cstoidx fontname\par
external name of fonts can be retrieved:
\begin{Disp}\cs{fontname}\gr{font}\end{Disp}
gives a sequence of character tokens of category~12
(but space characters get category~10) that spells the font file
name, plus an \gr{at clause} if applicable.

\begin{example} After
\begin{verbatim}
\font\tenroman=cmr10 \tenroman
\end{verbatim}
the calls
\verb>\the\font> and \verb>\the\tenroman> both give \cs{tenroman}.
The call \verb>\fontname\tenroman> gives \n{cmr10}.
\end{example}
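
A sketch of a case where an \gr{at clause} is reported, using a
hypothetical control sequence \cs{bigrm}:
\begin{verbatim}
\font\bigrm=cmr10 scaled 1200
\message{\fontname\bigrm}
\end{verbatim}
This writes `\n{cmr10 at 12.0pt}': the size is always reported in the
\n{at}~form, even if the font was declared with \n{scaled}.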

%\spoint \cs{nullfont}
\subsection{\cs{nullfont}}

\TeX\ always knows a font that has no characters: the \csidx{nullfont}.
If no font has been specified, or if in math mode a family member
is needed that has not been specified, 
\TeX\ will take its characters from the nullfont.
This control sequence qualifies as a \gr{fontdef token}:
it acts like any other control sequence that stands for a font;
it just does not have an associated \n{tfm} file.

%\point Font information
\section{Font information}

During a run of \TeX\ the main information needed about the
\term \n{tfm} files\par
font consists of the dimensions of the characters.
\TeX\ finds these in the font metrics files, which usually have
extension \n{.tfm}. Such files
contain \begin{itemize} \item global information: the \cs{fontdimen}
parameters, and some other information,
\item dimensions and the italic corrections of characters, and
\altt 
\item ligature and kerning programs for characters.
	\end{itemize}
Also, the design size of a font is specified in the \n{tfm} file;
see above. The definition of the \n{tfm} format can be found
in~\cite{Knuth:TeXprogram}.

%\spoint[font:dims] Font dimensions
\subsection{Font dimensions}
\label{font:dims}

Text fonts need to have at least seven \csidx{fontdimen} parameters
(but \TeX\ will take zero for unspecified parameters);
\term font! dimensions\par
math symbol and math extension fonts have more
(see page~\pageref{fam23:fontdims}).
For text fonts the minimal set of seven comprises the following:
\begin{enumerate} \item the slant per point; this dimension is used
    for the proper horizontal positioning of accents;
\awp
\item the interword space: this is used unless the user
    specifies an explicit \cs{spaceskip};
    see Chapter~\ref{space};
\item interword stretch: the stretch component of the interword
    space;
\item interword shrink: the shrink component of
    the interword space;
\item the x-height: the value of
    the \gr{internal unit} \n{ex}, which is usually about the
    height of the lowercase letter~`x'; 
\item the quad width:
    the value of the \gr{internal unit} \n{em}, which is
    approximately the width of the capital letter~`M'; and
\item the extra space: the space added to the interword space
at the end of sentences (that is, when \cs{spacefactor}${}\geq2000$)
unless the user specifies an explicit \cs{x\-space\-skip}.
\end{enumerate}

Parameters 1 and~5 are purely information about the font
and there is no point in varying them.
The values of other parameters can be changed in order to
adjust spacing; see Chapter~\ref{space} for examples
of changing parameters 2, 3, 4, and~7.

Font dimensions can be altered in a \gr{font assignment},
which is a \gr{global assignment} (see page~\pageref{global:assign}):
\begin{Disp}\cs{fontdimen}\gr{number}\gr{font}\gr{equals}\gr{dimen}
\end{Disp} See above for the definition of \gr{font}.
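
The following sketch assumes the plain \TeX\ font \cs{tenrm};
note that it changes that font for the rest of the run.
\begin{verbatim}
\fontdimen2\tenrm=0.4em          % set the interword space
{\fontdimen7\tenrm=0pt }         % set the extra space, in a group
\message{\the\fontdimen7\tenrm}  % writes 0.0pt
\end{verbatim}
The last line shows that the assignment made inside the group
is still in force after the group has ended.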

%\spoint Kerning
\subsection{Kerning}

Some combinations of characters should be moved closer
\term kerning\par
together than would be the case if their bounding boxes
were to be just abutted. This fine spacing is called kerning,
and a proper kerning is as essential to a font as the
design of the letter shapes.

Consider as an example\message{Kerning!}
\begin{Disp} `Vo' versus the unkerned variant `V\hbox{}o'\end{Disp}

Kerning in \TeX\ is controlled by information in the
\n{tfm} file, and is therefore outside the influence of the
user. The \n{tfm} file can be edited, however (see Chapter~\ref{TeXcomm}).

The \cs{kern} command has (almost) nothing to do with the
phenomenon of kerning; it is explained in Chapter~\ref{glue}.

%\spoint Italic correction
\subsection{Italic correction}

The primitive control symbol \verb-\/- inserts the `italic
\term italic correction\par\cstoidx /\par
correction' of the previous character or ligature.
Such a correction may be necessary owing to the definition
of the `bounding box' of a character. This box always
has vertical sides, and the width of the character as \TeX\
perceives it is the distance between these sides.
However, in order to achieve proper spacing  for slanted or
italic typefaces, characters may very well project outside their
bounding boxes. The italic correction is then needed if
such an overhanging character is followed by a 
character from a non-slanting typeface.
\awp

Compare for instance\message{Visible italic correction!}
\begin{Disp} `{\italic\TeX} has'
to `{\italic\TeX\/} has',
\end{Disp} where the second version was typed as
\begin{verbatim}
{\italic\TeX\/} has
\end{verbatim}

The size of the italic correction of each character
is determined by font information
in the font metrics file; for the Computer Modern fonts it is
approximately half the `overhang' of the characters;
see~\cite{K:partE}.
Italic correction is not the same as \cs{fontdimen1}, slant
per point. That font dimension is used only for positioning
accents on top of characters.

An italic correction can only be inserted if the previous item
processed 
by \TeX\ was a character or ligature. Thus the
following solution for roman text inside an italic passage
does not work:
\begin{verbatim}
{\italic Some text {\/\roman not} emphasized}
\end{verbatim}
The italic correction has no effect here,
because the previous item is glue.
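
A variant that does work, sketched with the same font macros,
puts the italic correction directly after the last italic character:
\begin{verbatim}
{\italic Some text\/ {\roman not} emphasized}
\end{verbatim}
Here the item preceding the \verb-\/- is the character~`t'.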

%\spoint Ligatures
\subsection{Ligatures}

Replacement of character sequences by ligatures is controlled
\term ligatures\par
by information in the \n{tfm} file of a font.
Ligatures are formed from \gr{character} commands:
sequences such as \n{fi} are replaced by `fi' in some fonts.

Other ligatures traditionally in use are
between \n{ff}, \n{ffi}, \n{fl}, and \n{ffl};
in some older works \n{ft} and \n{st} can be found,
and similarly to the \n{fl} ligature \n{fk} and \n{fb}
can also occur.

Ligatures in \TeX\ can be formed between explicit character
tokens, \cs{char} commands, and \gr{chardef token}s.
For example,
the sequence \verb-\char`f\char`i- is replaced by the
`fi' ligature, if such a ligature is part of the font.

Unwanted ligatures can be suppressed in a number of ways:
the unwanted ligature `\hbox{halflife}' can 
for instance be prevented by
\begin{disp} \verb>half{}life>, \verb>half{l}ife>, \verb>half\/life>,
      or \verb>half\hbox{}life>\end{disp}
but the solution using italic correction is not equivalent
to the others.

%\spoint Boundary ligatures
\subsection{Boundary ligatures}

Each word is surrounded by a left and a right
boundary character (\TeX3 only).
This makes phenomena possible
such as the two different sigmas in Greek:
one at the end of a word, and one for every other position.
This can be realized through a ligature with the
boundary character. A~\csidx{noboundary} command immediately
before or after a word suppresses the boundary character
at that place.

In general, the ligature mechanism has become more complicated
with the transition to \TeX\ version~3; see~\cite{K:TeX23}.

%%%% end of input file [fontfam]

%\InputFile:boxes
%%%% this is input file [boxes]
%\tracingmacros=2 \tracingcommands\tracingmacros
%\subject[boxes] Boxes
\endofchapter
\chapter{Boxes}\label{boxes}

The horizontal and vertical boxes of \TeX\ are containers for
\term box\par
pieces of horizontal and vertical lists.
Boxes can be stored in box registers. 
This chapter treats box registers and such
aspects of boxes as their dimensions, and the way their components
are placed relative to each other.

\begin{inventory}
\item [\cs{hbox}] 
      Construct a horizontal box.
\item [\cs{vbox}] 
      Construct a vertical box with reference point of the last item.
\item [\cs{vtop}] 
      Construct a vertical box with reference point of the first item.
\item [\cs{vcenter}] 
      Construct a vertical box vertically centred
      on the math axis; this command can only be used in math mode.

\item [\cs{vsplit}] 
      Split off the top part of a vertical box. 

\item [\cs{box}] 
      Use a box register, emptying it. 

\item [\cs{setbox}] 
      Assign a box to a box register.

\item [\cs{copy}] 
      Use a box register, but retain the contents. 

\item [\cs{ifhbox \cs{ifvbox}}]
\mdqon
      Test whether a box register contains a horizontal/""vertical box.
\mdqoff

\item [\cs{ifvoid}] 
      Test whether a box register is empty.


\item [\cs{newbox}] 
      Allocate a new box register. 

\item [\cs{unhbox \cs{unvbox}}]
      Unpack a box register containing a horizontal/vertical box,
      adding the contents to the current horizontal/vertical list,
      and emptying the register. 

\item [\cs{unhcopy \cs{unvcopy}}]
      The same as \cs{unhbox}$\,$/$\,$\cs{unvbox},
      but do not empty the register. 

\item [\cs{ht \cs{dp} \cs{wd}}]
      Height/depth/width of the box in a box register. 

\item [\cs{boxmaxdepth}] 
      Maximum allowed depth of boxes.
      Plain \TeX\ default:~\cs{maxdimen}.

\item [\cs{splitmaxdepth}]
      Maximum allowed depth of boxes generated by \cs{vsplit}.

\item [\cs{badness}] 
      Badness of the most recently constructed box.

\item [\cs{hfuzz \cs{vfuzz}}]
      Excess size that \TeX\ tolerates before it considers  
\mdqon
      a horizontal/""vertical box overfull.
\mdqoff

\item [\cs{hbadness \cs{vbadness}}]
      Amount of tolerance before \TeX\ reports an underfull 
\mdqon
      or overfull  horizontal/""vertical box.
\mdqoff

\item [\cs{overfullrule}] 
      Width of the rule that is printed to indicate 
      overfull horizontal boxes.

 
\item [\cs{hsize}] 
      Line width used for text typesetting inside a vertical box.
\awp

\item [\cs{vsize}] 
      Height of the page box.


\item [\cs{lastbox}] 
      Register containing the last item added to the current list, 
      if this was a box.

\item [\cs{raise \cs{lower}}]
      Adjust vertical positioning of a box in horizontal mode. 

\item [\cs{moveleft \cs{moveright}}]
      Adjust horizontal positioning of a box in vertical mode. 

\item [\cs{everyhbox \cs{everyvbox}}]
\mdqon
      Token list inserted at the start of a horizontal/""vertical box.
\mdqoff

\end{inventory}

%\point Boxes
\section{Boxes}

In this chapter we shall look at boxes. Boxes are containers
for pieces of horizontal or vertical lists.
Boxes that are needed more than once can be stored in box registers.

When \TeX\ expects a \gr{box}, any of the following forms
is admissible:
\begin{itemize}
\item \cs{hbox}\gr{box specification}\lb\gr{horizontal material}\rb
\item \cs{vbox}\gr{box specification}\lb\gr{vertical material}\rb
\item \cs{vtop}\gr{box specification}\lb\gr{vertical material}\rb
\item \cs{box}\gr{8-bit number}
\item \cs{copy}\gr{8-bit number}
\item \cs{vsplit}\gr{8-bit number}\n{to}\gr{dimen}
\item \cs{lastbox}
\end{itemize}
A \gr{box specification} is defined as\label{box:spec}
\begin{disp}\gr{box specification} $\longrightarrow$ \gr{filler}
\nl\indent$|$ \n{to} \gr{dimen}\gr{filler} 
          $|$ \n{spread} \gr{dimen}\gr{filler}
\end{disp}
An \gr{8-bit number} is a number in the range~0--255.
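
The three forms of \gr{box specification} can be sketched as follows:
\begin{verbatim}
\hbox{abc}                     % natural width
\hbox to 5cm{left\hfil right}  % exactly 5cm wide
\hbox spread 1cm{a\hfil b}     % 1cm wider than natural
\vbox to 2cm{\vfil text\vfil}  % exactly 2cm high
\end{verbatim}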

The braces surrounding box material define a group;
they can be explicit characters
of categories 1 and~2 respectively,
or control sequences \cs{let} to such characters;
see also below.
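
Implicit braces make it possible to open a box in one macro and
close it in another. A minimal sketch, assuming the plain \TeX\
definitions of \cs{bgroup} and \cs{egroup}, with hypothetical
macro names:
\begin{verbatim}
\def\startline{\hbox to \hsize\bgroup}
\def\stopline{\hfil\egroup}
\startline some text\stopline
\end{verbatim}
With explicit braces these definitions would not be possible,
since the braces in each replacement text would be unbalanced.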


A \gr{box} can in general be used in horizontal, vertical,
and math mode, but see below for the \cs{lastbox}.
The connection between
boxes and modes is explored further in Chapter~\ref{hvmode}.

The box produced by \cs{vcenter} \ldash a command that is allowed only in
math mode \rdash  is not a \gr{box}. For instance,
it cannot be assigned with \verb=\setbox=; see further
Chapter~\ref{math}.

The \cs{vsplit} operation is treated in Chapter~\ref{page:break}.

%\point Box registers
\section{Box registers}

There are 256 box registers, numbered 0--255. 
\term box! registers\par
Either a box register is  empty (`void'), or it contains a horizontal
or vertical box.
This section discusses specifically box {\em registers};
the sizes of boxes, and the way material is arranged inside them,
are treated below.
\awp

%\spoint Allocation: \cs{newbox}
\subsection{Allocation: \cs{newbox}}

The plain \TeX\ \csidx{newbox} macro allocates an unused
box register:
\begin{verbatim}
\newbox\MyBox 
\end{verbatim}
after which one can say
\begin{verbatim}
\setbox\MyBox=...
\end{verbatim}
or \begin{verbatim}
\box\MyBox
\end{verbatim}
and so on.
Subsequent calls to this macro give subsequent box numbers;
this way macro collections can allocate their own boxes
without fear of collision with other macros.

The number of the box is assigned by \cs{chardef}
(see Chapter~\ref{alloc}). 
This implies that \cs{MyBox} is equivalent to,
and can be used as, a~\gr{number}.
The control sequence
\altt
\cs{newbox} is an \cs{outer} macro.
Newly allocated box registers are initially empty.


\subsection{Usage: \cs{setbox}, \cs{box}, \cs{copy}}

A~register is filled by assigning a \gr{box}
\cstoidx setbox\par
to it:
\begin{Disp}\verb>\setbox>\gr{number}\gr{equals}\gr{box}\end{Disp}
For example, the \gr{box} can be explicit
\begin{Disp}\verb>\setbox37=\hbox{...}>\quad or\quad \verb>\setbox37=\vbox{...}>
\end{Disp}
or it can be a box register:
\begin{verbatim}
\setbox37=\box38
\end{verbatim}
Usually, box numbers will have been assigned by a \cs{newbox}
command.

The box in a box register is appended
by the commands \cs{box} and~\cs{copy}
to whatever list \TeX\ is building: the call
\begin{verbatim}
\box38
\end{verbatim}
appends box~38.
To save memory space, box registers become empty by using them:
\TeX\ assumes that after you have inserted a box by
calling \csidx{box}$nn$ in some mode, you do not need the
contents of that register any more, and empties it.
In case you {\em do\/} need the contents of
a box register more than once, 
you can \csidx{copy} it. Calling \cs{copy}$nn$ is
equivalent to \cs{box}$nn$ in all respects except that
the register is not cleared.
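
The difference can be sketched as follows, assuming horizontal mode
and box register~0 as scratch register:
\begin{verbatim}
\setbox0=\hbox{abc}
\copy0 \copy0               % gives `abcabc'; box 0 is kept
\box0                       % gives `abc'; box 0 is now void
\ifvoid0 void\else full\fi  % gives `void'
\end{verbatim}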

It is possible to unwrap the contents of a box register
by `unboxing' it using the commands \cs{unhbox} and \cs{unvbox},
and their copying versions \cs{unhcopy} and \cs{unvcopy}.
Whereas a box can be used in any mode, the
unboxing operations can only be used in the appropriate mode,
since in effect they contribute a partial
horizontal or vertical list (see also Chapter~\ref{hvmode}).
See below for more information on unboxing registers.
\awp

%\spoint Testing: \cs{ifvoid}, \cs{ifhbox}, \cs{ifvbox}
\subsection{Testing: \cs{ifvoid}, \cs{ifhbox}, \cs{ifvbox}}

Box
registers can be tested for their contents:
\begin{disp}\cs{ifvoid}\gr{number}\end{disp}
is true if the box register is empty.
Note that an empty, or `void',
box register is not the same as a register containing an empty box.
An empty box is still either a horizontal or a vertical box;
a~void register can be treated as either, for example
with \cs{unhbox} as well as with \cs{unvbox}.

The test
\begin{disp}\cs{ifhbox}\gr{number}\end{disp}
is true if the box register contains a horizontal box;
\begin{disp}\cs{ifvbox}\gr{number}\end{disp}
is true if the box register contains a vertical box.
Both tests are false for void registers.
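
These tests can be combined, as in the following sketch;
the macro name \cs{boxstatus} is made up for this illustration,
and plain \TeX\ is assumed:
\begin{verbatim}
\def\boxstatus#1{%
  \ifvoid#1 \message{box #1 is void}%
  \else \ifhbox#1 \message{box #1 is an hbox}%
  \else \message{box #1 is a vbox}\fi\fi}
\setbox0=\hbox{}  \boxstatus0  % an empty hbox, not void
\setbox0=\vbox{}  \boxstatus0  % an empty vbox, not void
\setbox2=\box0    \boxstatus0  % box 0 was moved to box 2: void
\end{verbatim}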

%\spoint[lastbox] The \cs{lastbox}
\subsection{The \cs{lastbox}}
\label{lastbox}

When \TeX\ has built a partial list, the last box in this
list is accessible as the \csidx{lastbox}. This behaves
like a box register, so you can remove the last box from  the
list by assigning the \cs{lastbox} to some  box register. 
If the last item on the current list is not a box,
the \cs{lastbox} acts like a void box register.
It is not possible to get hold of the last box
in the case of the main vertical list.
The \cs{lastbox} is then always void.

As an example, the statement \begin{verbatim}
{\setbox0=\lastbox}
\end{verbatim}
removes
the last box from the current list, assigning it to box
register~0. Since this assignment occurs inside a group,
the register is restored to its previous contents at the end of the group,
so the removed box is discarded.
At the start of a paragraph this can be used to remove the
indentation box (see Chapter~\ref{par:start}).
Another example of \cs{lastbox} can be found on page~\pageref{varioset}.
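
Removing the last box can be sketched as follows, assuming plain \TeX\
and scratch registers 0 and~2; the second part has to start in
vertical mode:
\begin{verbatim}
% Take the last box off a horizontal list and keep it in box 2.
\setbox0=\hbox{\hbox{A}\hbox{B}\global\setbox2=\lastbox}
% Now box 0 contains only \hbox{A}, and box 2 is \hbox{B}.

% Start a paragraph, then remove its indentation box.
\leavevmode{\setbox0=\lastbox}This paragraph starts flush left.
\end{verbatim}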

Because the \verb-\lastbox- is always void in external vertical mode,
it is not possible to get hold of boxes that have been 
added to the page. However, it is possible to dissect
the page once it is in \cs{box255}, for instance doing
\begin{verbatim}
\vbox{\unvbox255{\setbox0=\lastbox}}
\end{verbatim}
inside the output routine.

If boxes in vertical mode have been shifted by \cs{moveright}
or \cs{moveleft}, or if boxes in horizontal mode have
been raised by \cs{raise} or lowered by \cs{lower},
any information about this displacement is lost when
the \cs{lastbox} is taken from the list.
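
A small sketch of this effect, assuming plain \TeX\ and scratch
registers 0 and~2:
\begin{verbatim}
\setbox0=\hbox{\raise5pt\hbox{A}\global\setbox2=\lastbox}
% Box 2 now holds \hbox{A}; the 5pt shift is not stored with it,
% so \box2 places the A back on the baseline.
\end{verbatim}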
\awp

%\point Natural dimensions of boxes
\section{Natural dimensions of boxes}

%\spoint Dimensions of created horizontal boxes
\subsection{Dimensions of created horizontal boxes}

Inside an \csidx{hbox} all constituents are lined up next to each other,
\term box! dimensions\par
with their reference points on the baseline of the box,
unless they are moved explicitly in the vertical directio