[ LUGOS ] Bringing the Linux Documentation Project to Slovenia

Primoz Peterlin peterlin at biofiz.mf.uni-lj.si
Fri Nov 20 18:17:18 CET 1998


Hello,

A few weeks late, I would like to report on the presentation of Linux
and the LUGOS society at the international conference Where East Meets
West: Technical Communication and Usability (you also had the chance
to hear the four invited speakers, Dr. Saul Carliner, Paula Berger,
Dr. Janice Redish and Dr. Evelyn Williams, at Infos). Roman Maurer,
Marko Samastur, Borut Mrak and I attended the conference as guests of
the company HERMES SoftLab and presented a paper titled "Bringing the
Linux Documentation Project to Slovenia", which summarizes the work of
the society's section for translating Linux into Slovenian. The paper
will appear in the conference proceedings; as a preliminary and, hmm,
exclusive :-) preview, we also offer it to the readers of this list.

Best regards, Primož

--
Primož Peterlin         email: primoz.peterlin at biofiz.mf.uni-lj.si
Inštitut za biofiziko MF, Lipičeva 2, SI-1000 Ljubljana, Slovenia
Fax: +386-61-1315127     WWW: http://sizif.mf.uni-lj.si/~peterlin/

--------------------------------------------------------------------------

Bringing the Linux Documentation Project to Slovenia

Primož Peterlin, Roman Maurer, Marko Samastur and Borut Mrak

(presented at the international conference Where East Meets West:
Technical Communication and Usability, October 29-31, 1998, HERMES
SoftLab, Ljubljana, Slovenia)


Abstract

Linux is presented in the broader context of free software. The idea
of free software is described, and the need for free documentation
matching free software is explained. The Linux Documentation Project
is an example of such free documentation, aiming to provide complete
documentation for the Linux operating system. Among its hundreds of
volunteers there is also a small group from Slovenia. In this paper we
present our work on making the documentation more accessible to users
in our community.

Introduction: Linux and Free Software 

I believe we should start at the very beginning, with the title of our
talk, as it may itself require a short explanation. What is
Linux? Linux is a multi-tasking, multi-user, multi-platform
POSIX-compliant (i.e. Unix-like) operating system. But since we are
not talking about its technical merits right now, we can focus on
another of its aspects: it's a free operating system. As the idea of
free software usually seems unusual at best to people in general, and
in particular to people working in the software industry, we feel the
need to spend a few words on the free software idea itself.

What is Free Software?

What people usually notice first is that using free software doesn't
require paying for its use. But that's not the end of the
story. Microsoft Corp. also doesn't charge (not directly, anyway) for
using Internet Explorer -- the full version is available on their
Web site, and everybody can download and use it. Still, Microsoft
Internet Explorer is not free software, because what I get is only the
compiled binary image of the program, not the source code. Thus,
regardless of whether I am capable of doing it or not, I am deprived
in advance of the possibility to correct any bugs I think I have
spotted in the program, to add some new gadget, or to modify it to run
on some other platform, and to share such a modified version with my
friends. What is more, I am not even allowed to share the unmodified
binary image, since the end-user license only allows me to download it
and use it on my computer.  So we have come to the first conclusion:
freedom doesn't equal zero price. Programs we don't have to pay for
aren't necessarily free, and vice versa: if we buy free software on a
CD-ROM, we usually have to pay a small fee for it.

Is free software, then, simply software that comes along with its
source code? Again, the answer is not that simple. Availability of
source code is indeed required before we can treat some software
as free software. It is an important enough requirement that many
people prefer to use the term "Open-Source Software" instead of
free software, thus emphasizing this requirement, and also making
the idea of free software more palatable to the entrepreneurial
world. Still, the mere availability of source code is not enough.
An example familiar to most students of science and technology in
this country is the Numerical Recipes library. The routines are
available in source code (and actually also with a clearly written
explanation of their operation), yet they come with a very
restrictive license. This license explicitly prohibits me from
redistributing the source code of any routine from the library,
even as part of my own program.

What rights, then, does free software include? In short, you
are allowed to use, copy, redistribute, understand, modify and
improve the program. In more detail, we can divide these rights
into three levels:

1)  The right to read the source code, understand its operation,
    and adapt it to suit your own needs.
2)  The right to copy and redistribute the program.
3)  The right to improve the program, including the right to
    redistribute your improved version of the program.

Linux is perhaps enjoying more than its fair share of publicity
among free software projects, probably because the operating system
kernel, written by Linus Torvalds and collaborators, filled in the
last gap to make a complete working free operating system
possible; yet it's only the tip of the iceberg. Since 1984 the
GNU project, centered around Richard M. Stallman and the Free
Software Foundation, has been aiming to produce an enhanced
free drop-in replacement for the UNIX operating system. The X
Window System, a free multi-platform windowing system developed at
MIT, can also be traced a long way back, as can the networking
software developed at the University of California at Berkeley.
These are only a few major strands of the rich heritage upon
which Linux was built.

With Linux, so it seems, the free software concept has passed the test
of the real world with flying colors. Linux has very successfully
combined the idea of free software with the rising tide of the
Internet in the early 1990s and is today the only non-Microsoft
operating system expanding its user base. People worried about the
lack of support for free software may be surprised to learn that
InfoWorld gave its best technical support award for 1997 to the Linux
user community. Such an unorthodox choice certainly tells a lot
about the enthusiasm and the positive atmosphere in the Linux
community.

Free Software Needs Free Documentation

The criteria for free documents are in essence the same as those for
free software. Or are they? Is it really essential that people have a
general permission to modify, say, this article? I don't believe so.
After all, it contains the authors' personal views and opinions.

Still, free software does need to be matched by free
documentation. Imagine a situation in which a bright, creative
programmer finds a way to enhance an existing free program (events
like this happen daily, and are actually what fuels free software).
Now, being of the well-mannered sort, he -- or she -- would also want
to modify the manual so that it accurately reflects the behavior of
the modified program. A manual that disallows any modifications
doesn't suit the needs of users of free software, as it would require
the authors of even the slightest change in program behavior to write
a new manual from scratch.

So, while there is no question that the license must require
preservation of the original author's copyright notice, the
distribution terms, and the list of authors, it must also allow
modification of the technical content of the manual. There is
general agreement on this among the authors contributing to the
Linux Documentation Project as well.

About the Linux Documentation Project

The Linux Documentation Project (LDP) was started in 1992 with the aim
of producing good, reliable documentation for the Linux operating
system. The documents cover installing, configuring and using
Linux, and are written in a variety of formats: plain text that can be
read anywhere, HTML documents that can be viewed with a browser, man
pages that can be read online or printed, and typeset documents
intended to be printed and read as books.

The LDP is centered around its web page, hosted by SunSite at the
University of North Carolina, USA <http://sunsite.unc.edu/LDP/>,
which is in turn mirrored throughout the world.

There are four basic types of documentation produced by the LDP:

* Guides: entire books on complex topics, e.g. "Linux
  Installation and Getting Started Guide", "Linux System
  Administrator's Guide", "The Linux Network Administrators' Guide".
* HOWTOs and Mini-HOWTOs: articles of moderate length, written
  with a practical, hands-on approach, and focusing on a variety
  of specialized topics, such as using Linux with Chinese glyphs,
  configuring NIS, running an Oracle database server on Linux, or
  programming the SCSI interface.
* man pages: documentation for single programs, file formats,
  and library functions, written in the standard "UNIX Reference
  Manual" format.
* FAQs: lists of frequently asked questions (FAQ) with answers,
  covering various topics.

In addition to those, several documents are only available online
on the LDP web page. These are:

* "Linux Gazette", a monthly on-line newspaper for the Linux
  community, bringing unedited articles and letters from Linux users
  world-wide.
* "The Linux Kernel Hackers' Guide", an interactive, edited
  forum where Linux kernel developers talk about kernel development
  issues.
* Special HOWTOs. A few HOWTO documents rely on features that
  prevent them from being processed in the way required for HOWTO
  documents.

In total, the yield of LDP so far is: 7 guides in various degrees
of completeness from fragmentary to already published, 111 HOWTOs
and 131 mini HOWTOs, a handful of FAQs and 34 issues of Linux
Gazette.

Language    ISO 639   mini-HOWTOs   HOWTOs   guides
            code

German      de                  8       25        1
Greek       el                 18
English     en                104      102        7
Spanish     es                 23       37        4
French      fr                 80       80        1
Croatian    hr                  4        7
Indonesian  id                  2       10
Italian     it                 13       31        2
Japanese    ja                 58       61
Korean      ko                 35       28
Polish      pl                 28       39
Russian     ru                                    1
Slovenian   sl                          10
Swedish     sv                  3       19
Turkish     tr                 23
Chinese     zh                 31       29

Table 1: An overview of the translated documentation in the Linux
Documentation Project.

The documents produced by the LDP are primarily written in English
(an exception are the "national" HOWTOs, dealing with localizing
Linux for a particular locale, which are usually written in the
language understood by its target population). In order to make
Linux documentation available also to people whose native language
is not English, there are, however, several translation projects
running in parallel to the main LDP. So far, the documents of the LDP
have been translated into 15 other languages: Chinese, Croatian,
French, German, Greek, Indonesian, Italian, Japanese, Korean,
Polish, Russian, Slovenian, Spanish, Swedish and Turkish.

The already mentioned "national" HOWTOs currently exist for 10
languages and locales: Danish, Esperanto, Finnish, French, German,
Italian, Polish, Portuguese, Slovenian, and Swedish. In addition,
Linux manual pages have so far been translated into Czech, German,
Italian, Japanese and Spanish.

Technical Issues of the LDP

Starting as a voluntary project, the LDP in the beginning
gladly accepted any contribution, regardless of the format its
authors chose. While the guides' authors usually chose LaTeX, the
authors of shorter texts showed more imagination, their preferred
choices ranging from plain text through HTML (HyperText Markup
Language) to LaTeX. If one adds the already existing documentation
written with troff macro packages and in the GNU project's own
hypertext format, texinfo, one can imagine that the situation was
dangerously close to complete chaos. So it was agreed relatively
early on that a uniform format was needed for the HOWTO
documents, which comprise the bulk of the LDP. The solution sought
had to make it possible to produce various formats from a single
source, both those meant for online reading, like HTML and GNU
info, and those intended to be printed, like LaTeX.
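
The single-source requirement can be illustrated with a minimal
Python sketch. It is not how Linuxdoc-SGML or SGMLtools work
internally; the document model, the function names and the sample
text are all assumptions made up for illustration. The point is only
that one structured source feeds several independent renderers.

```python
from html import escape
import textwrap

# Hypothetical document model standing in for a marked-up source file:
# a title plus a list of (kind, text) blocks.
DOC = {
    "title": "Printing-HOWTO",
    "blocks": [
        ("section", "Introduction"),
        ("para", "How to set up printers & fax machines under Linux."),
    ],
}


def to_html(doc):
    """Render the model for online reading."""
    out = ["<h1>%s</h1>" % escape(doc["title"])]
    for kind, text in doc["blocks"]:
        tag = "h2" if kind == "section" else "p"
        out.append("<%s>%s</%s>" % (tag, escape(text), tag))
    return "\n".join(out)


def to_text(doc, width=60):
    """Render the same model as plain text with underlined headings."""
    out = [doc["title"], "=" * len(doc["title"]), ""]
    for kind, text in doc["blocks"]:
        if kind == "section":
            out += [text, "-" * len(text), ""]
        else:
            out += [textwrap.fill(text, width), ""]
    return "\n".join(out).rstrip() + "\n"


if __name__ == "__main__":
    print(to_html(DOC))
    print(to_text(DOC))
```

Adding another output format means adding another renderer; the
source document stays untouched.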

Considering the requirements, SGML (Standard Generalized Markup
Language; ISO 8879:1986) was chosen as the standard source
format for the documents. Relying on an established standard, it
offers maximal independence of the written material from the tools
used in its preparation. SGML is not a document format itself. It
is a meta-language that allows defining customized markup
languages, each specified by a Document Type Definition (DTD). HTML
is without doubt the best-known such language today. HTML was
actually conceived at CERN with a similar idea in mind, providing the
writer with semantic mark-up tags and offering an opportunity to
organize a distributed technical documentation system. The format
unfortunately strayed away from this goal over the years.
Instead of strengthening its rather frail hypertext structure and
relatively weak semantic markup, the vendors pushed it more and
more towards visual markup, and bloated it with various gadgets
that have little use in technical documentation, but which do
require exceedingly complex browsers. All this made HTML an
unsuitable choice.

The DTD sought had to be a better match for the needs of technical
documentation. It had to be simple enough -- about as simple as
HTML 2.0 -- in order not to turn aspiring authors away from
learning it, and the whole package had to be reasonably easy to
implement. So Matt Welsh, the author of the first version of the
Linuxdoc-SGML package, based it on the QWERTZ DTD by Tom Gordon
and on James Clark's sgmls parser (which was in turn based on
Charles Goldfarb's arcsgml parser). Other people worked on the
back-ends: Magnus Alvestad and Helmut Geyer provided the HTML
support, and Christian Schwarz added the texinfo interface. Further
development of the Linuxdoc-SGML package was conducted by Greg
Hankins and later by Cees de Groot, who succeeded him as
maintainer. In November 1996 the package was renamed to SGMLtools,
in order to emphasize that it is actually a general system for
writing technical documentation and not something specific to
Linux.

The current production version of SGMLtools (1.0.9) expects its
input to conform to the Linuxdoc DTD, a descendant of the QWERTZ
DTD. It allows one to produce LaTeX, PostScript, PDF (Portable
Document Format), HTML, RTF (Rich Text Format), GNU info, LyX (GUI
editor for writing documents in LaTeX and Linuxdoc SGML) and plain
text (via groff) from a single source. On October 18, 1998,
however, SGMLtools 2.0 was released, supporting the much richer
DocBook DTD (v3.0), developed by the Davenport Group, a consortium
of specialists in technical documentation. DocBook, now
maintained by the Organization for the Advancement of Structured
Information Standards (OASIS), is supported by most SGML vendors.
In the one-year transition period, all the documentation will have
to be converted into the new format. The new SGMLtools package of
course contains tools to facilitate the transition. Still, since
the transition requires replacing old-style visual tags with
higher-level semantic ones, the first, automatic pass will have to
be followed by a second, incremental pass, in which the authors
add the semantic features that DocBook supports.

The SGMLtools 2.0 package employs James Clark's SP SGML
parser, a descendant of the sgmls parser by the same author, and
JADE (James' DSSSL Engine), a free implementation of DSSSL (Document
Style Semantics and Specification Language; ISO/IEC 10179:1996),
again written by James Clark. On the presentation side, the DocBook
DTD by the Davenport Group is matched by a set of modular DocBook
style-sheets by Norm Walsh. The new solution offers two
improvements over the old one. First, a vastly richer DTD not only
makes it possible to write longer documents such as the LDP
guides, and thus to unify the Linux documentation pool, but
also means that the LDP has entered the SGML mainstream. A valid
DocBook document can be processed on virtually any SGML text
processing system. Second, DSSSL, being a high-level Scheme-like
language, offers a much easier and much more powerful way to
control the look of SGML documents than was possible with
simple mapping files. The increased complexity of the DTD, however,
also has its drawbacks. One of them is that it has become
increasingly difficult to write documents without a specialized
editor that enables the writer to edit mark-up at the logical, or
semantic, level. While the psgml mode in the Emacs editor does
offer significant help for composing SGML text, it still does not
match the capabilities offered by commercial products like
ArborText Adept. The team developing SGMLtools is thus discussing
plans for a GUI SGML editor.

LDP and Slovenia

Our first contribution to the LDP was the Slovenian-HOWTO (19 pages,
3887 words, 30778 characters), written in 1996, which covers
localization of the Linux system and its adaptation for use with
the Slovenian language. It was only in the first half of 1998,
however, that this solitary first attempt was followed by the
formation of a Slovenian translation team. The members of the team
cooperate by discussing terminology and by proofreading and editing
each other's work, all of which increases the quality of translation.

To date, the team has produced a dozen documents. Most of them
are translations of HOWTO documents:

* Access-HOWTO. A guide to adaptive technologies which can make
  Linux accessible to people with disabilities: the blind, the
  partially sighted, the deaf and the physically disabled. (28 pages,
  9229 words, 69329 characters)
* Installation-HOWTO. Instructions for the newcomers on how to
  obtain and install Linux software. (19 pages, 8482 words, 61559
  characters)
* Kernel-HOWTO. A detailed manual on configuring, re-compiling
  and upgrading the operating system kernel on Intel-based Linux
  systems. (22 pages, 8555 words, 61669 characters)
* PPP-HOWTO. Instructions on how to connect one's Linux PC to a
  PPP (Point-to-Point Protocol) server, how to use PPP to link two
  LANs together, and how to set up a Linux computer as a PPP server.
  (59 pages, 29832 words, 153133 characters)
* Printing-HOWTO. A collection of information on how to set up
  various printers and fax machines to work with Linux. (20 pages,
  7405 words, 54899 characters)
* Printing-Usage-HOWTO. Instructions on how to use the printer
  spooling system on Linux. (11 pages, 3265 words, 23857 characters)
* TeTeX-HOWTO. A guide to installing, setting up and using the
  most popular distribution of the TeX typesetting system on Unix
  workstations. (29 pages, 11374 words, 89032 characters)
* XFree86-HOWTO. Instructions on how to install and configure
  XFree86, a free implementation of the X Window System for the
  Intel platform. (11 pages, 3619 words, 27246 characters)

Along with these, the Linux INFO-SHEET, a short promotional
document providing basic information on Linux, including a brief
introduction, a list of features, hardware requirements for
running it, and a list of relevant resources, was also translated.

In the hope of reducing the number of ever-recurring questions on
local Linux forums, two lists of frequently asked questions (FAQ)
were also translated:

* Linux FAQ, an edited collection of frequently asked questions
  with answers from the international mailing lists. (47 pages,
  16318 words, 128407 characters)
* UNIX FAQ, a similar document devoted to questions relating to
  Unix systems in general, since for most people nowadays Linux is
  their first contact with Unix. (70 pages, 24144 words, 174225
  characters)

Our work has so far been very well received by users. This is
important, as good volunteer work is the best way to
attract new volunteers to join the project.

While the translation is still carried out manually, our
translation team is considering the possibilities of employing
modern methods and aids offered by digital technology.

Towards Computer-Assisted Translation

The growing collection of translated material presents a wealth of
accumulated knowledge that can be utilized to ease further
translation. What we have in mind is computer-assisted translation
(CAT) using a translation memory system. Translation memory
systems build a knowledge base from a set of translated parallel
units of text, which is then readily available to the translator
on demand. The usefulness of such an approach grows with the size
of the knowledge base, which is also why the method only became
popular during the last decade, with the increased availability
of powerful computers and high-volume storage devices. Some of
the known commercial tools in the CAT market are the IBM
Translation Manager, EUROLANG Optimizer, TRADOS Fine Translation
Tools, STAR Transit, ATRIL Déjà Vu and ZERESTRANS Translation
Memory Technology.

The material accumulated so far in the translation of the LDP is
particularly suited for the method. First, it is technical
writing, with a relatively limited vocabulary and numerous
recurring terms, patterns and syntagms. Second, the text is
already marked up using SGML tags, which facilitates segmentation
into parallel units of text (usually paragraphs).
Research projects like MULTEXT-East have already dealt with this
topic. Third, having English as the source language in the
translation process simplifies the situation, since one does not
have to deal with numerous word forms due to declensions and
conjugations.

We therefore consider it feasible to implement, within about a
year, a translation memory system operating on the following
principles:

* Automatic segmentation of the source and target document into
  parallel units of text. The system can easily be extended to
  a more than two-way parallel segmentation, should the need arise.
  For our purposes, including translations into other languages of
  the Slavonic group would probably be most useful.
  Segmentation might need some manual intervention at the beginning
  despite the mark-up, but should improve as the translation base
  grows if statistically weighted matching of the already
  segmented text is employed.
* Full-text indexing of the parallel units of text. After the
  stop-words carrying little semantic content (e.g. "the", "is",
  "are", "and", "in"...) are removed from the text, the rest of
  the text is indexed for faster retrieval later.
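
The two principles can be sketched in a few lines of Python. This is
an illustrative toy under stated assumptions, not the planned system:
it assumes that paragraphs are introduced by <p> tags, that source and
target contain the same number of paragraphs in the same order (so no
statistical alignment is attempted), and the stop-word list, function
names and sample sentences are made up for the example.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "is", "are", "and", "in", "a", "of", "to"}


def segment(sgml):
    """Split marked-up text into paragraph units at <p> tags."""
    units = []
    for part in re.split(r"<p>", sgml, flags=re.IGNORECASE):
        text = re.sub(r"<[^>]+>", " ", part)   # drop any remaining tags
        text = " ".join(text.split())          # normalise whitespace
        if text:
            units.append(text)
    return units


def align(source_units, target_units):
    """Pair units by position; refuse to guess when the counts differ."""
    if len(source_units) != len(target_units):
        raise ValueError("unit counts differ; manual alignment needed")
    return list(zip(source_units, target_units))


def build_index(pairs):
    """Map each non-stop-word of the source side to the units using it."""
    index = defaultdict(set)
    for i, (source, _target) in enumerate(pairs):
        for word in re.findall(r"[a-z0-9]+", source.lower()):
            if word not in STOP_WORDS:
                index[word].add(i)
    return index


def lookup(term, pairs, index):
    """Return all (source, target) units whose source contains term."""
    return [pairs[i] for i in sorted(index.get(term.lower(), ()))]


if __name__ == "__main__":
    english = ("<p>The kernel is the core of the system.\n"
               "<p>Install the <tt>printer</tt> driver.")
    slovenian = ("<p>Jedro je osrednji del sistema.\n"
                 "<p>Namestite gonilnik tiskalnika.")
    pairs = align(segment(english), segment(slovenian))
    index = build_index(pairs)
    for source, target in lookup("kernel", pairs, index):
        print(source, "=>", target)
```

Positional pairing is the simplest possible choice; it fails loudly
when the unit counts differ, which is where manual intervention or
statistically weighted matching would have to take over.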

When using it, a translator could request all instances of some
term, phrase or, using some fuzzy matching, even a whole sentence,
and, if the term already exists in the base, get in return
contextual translations of all instances of the requested term.
Since the described translation tool is meant for interactive
use, integration into popular editors like Emacs or LyX should
also be considered, as well as its use as a Web tool.
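
Fuzzy matching of whole sentences could look roughly like the
following Python sketch. It uses the character-based similarity
ratio from the standard difflib module, which is merely one
convenient stand-in for whatever matching a real translation memory
would use; the memory contents, the function name and the threshold
of 0.6 are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (English unit, Slovenian unit).
MEMORY = [
    ("Install the printer driver.", "Namestite gonilnik tiskalnika."),
    ("Recompile the kernel.", "Ponovno prevedite jedro."),
]


def fuzzy_lookup(sentence, memory, threshold=0.6):
    """Return (score, source, target) for similar units, best first."""
    hits = []
    for source, target in memory:
        score = SequenceMatcher(None, sentence.lower(),
                                source.lower()).ratio()
        if score >= threshold:
            hits.append((score, source, target))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


if __name__ == "__main__":
    for score, source, target in fuzzy_lookup(
            "Install the scanner driver.", MEMORY):
        print("%.2f  %s => %s" % (score, source, target))
```

The translator would then adapt the closest stored translation
instead of starting from scratch; the threshold trades recall against
the number of irrelevant suggestions.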

Conclusion

With an estimated user base of close to 10 million, and being the
only non-Microsoft operating system expanding its user base in both
absolute and relative terms in recent years, Linux is a phenomenon
that cannot go unnoticed. The principles its booming development
is founded upon are in many ways challenging the traditional
perception of software development.

Documentation has always been an inherently weak point of all
volunteer projects. As good documentation is crucial for the success
of a project, the Linux Documentation Project was conceived to remedy
the situation and provide complete documentation for the Linux
operating system. The technical solutions chosen put an emphasis on
the independence of written material from the tools used in its
preparation, the preference for international open standards over
internal "industry" standards, and the possibility to produce
documentation in a variety of formats from a single source. Hence, the
solution employs SGML and DSSSL.

Slovenian speakers form the smallest language group among the dozen
or so nations participating in the project. We consider our
participation to be important both for bringing Linux closer to users
in our local community and for exercising our language's ability to
answer the challenges posed by the increased informational dynamics
of the post-industrial era.

References

DocBook Documentation, http://www.oreilly.com/davenport/

T. Erjavec, N. Ide, D. Tufis (1997): Encoding and Parallel
Alignment of Linguistic Corpora in Six Central and Eastern
European Languages. Presented at the Joint International
Conference of the ACH-ALLC '97, June 1997.

Greg Hankins and Michael K. Johnson (1997): Introduction to the
Linux Documentation Project; in: R. Kiesling (Ed.), Linux, The
Complete Reference, Linux Systems Labs, 1998.

Robert Kiesling, Ed. (1998): Linux, The Complete Reference, 6th
Edition, Linux Systems Labs. ISBN 1-57176-199-3

Linux User Community win the 1997 Product of the Year Award for
Best Technical Support, InfoWorld.
http://www.infoworld.com/cgi-bin/displayTC.pl?/97poy.supp.htm

Organization for the Advancement of Structured Information
Standards,
http://www.oasis-open.org/

Prevodi HOWTO-jev, http://www.lugos.si/delo/slo/HOWTO-sl/

Eric S. Raymond (1997): The Cathedral and the Bazaar,
http://www.earthspace.net/~esr/writings/cathedral-bazaar/

SGMLtools Homepage, http://www.sgmltools.org/

Richard Stallman (1994): Why Software Should Not Have Owners,
http://www.fsf.org/philosophy/why-free.html

Richard Stallman (1997): Free Software and Free Manuals,
http://www.fsf.org/philosophy/free-doc.html

The Linux Documentation Project Homepage,
http://sunsite.unc.edu/LDP/

Špela Vintar (1998): Programi s pomnilnikom prevodov s stališča
morebitnega uporabnika; in: Jezikovne tehnologije za slovenski
jezik, Mednarodna multi-konferenca Informacijska družba - IS'98 /
International Multi-conference Information Society - IS'98,
Ljubljana, Slovenia, October 8, 1998. Institut Jožef Stefan,
Ljubljana, pp. 87-91.



