Conversion of legacy documents and community publishing

Language Science Press uses a LaTeX-based workflow. Authors can use our Word/OpenOffice templates as a start, but many manuscripts predate the publication of our templates. In this blogpost, I will illustrate our principles of community-based publishing using one of these manuscripts.

Case study: A grammar of Mauwake

The Mauwake language is spoken in Papua New Guinea, along the north coast of Madang province. Liisa Berghäll has worked there for over 25 years, and the manuscript of her grammar was finalised around 2010. It was available from the University of Helsinki e-thesis service.

Re-publication of this work with Language Science Press as Open Access allows for a much broader readership, but of course the manuscript has to follow our guidelines. To get there, the following steps had to be taken:

  1. convert the manuscript to *tex
  2. make sure the linguistic content is correct
  3. incorporate suggested changes
  4. proofreading
  5. incorporate proofreaders’ comments
  6. final typesetting

Conversion to *tex

We used our online converter to convert the original document to TeX. Since the Word document was very nicely structured and made extensive use of detailed styles, the conversion was rather easy. Many style macros like \textstyleEmphasizedVernacularWords{some text} showed up, and these could be mapped to LaTeX commands in a straightforward way. One issue was that the layout of some examples and the cross-references were broken. We appealed to our typesetting community for help, and four typesetters volunteered to fix the examples.
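
The style mapping could be sketched roughly as follows. This is a minimal illustration in Python, not the converter's actual code: only \textstyleEmphasizedVernacularWords comes from the manuscript, while its target command (\textit), the second style name and the function name map_styles are assumptions made for the example.

```python
import re

# Hypothetical mapping from converter-generated style macros to LaTeX commands.
# Only \textstyleEmphasizedVernacularWords appears in the post; the target
# \textit and the whole second entry are assumptions for illustration.
STYLE_MAP = {
    "textstyleEmphasizedVernacularWords": "textit",
    "textstyleStrongEmphasis": "textbf",  # hypothetical style name
}

# Match a backslash, one of the known style names, and the opening brace.
_STYLE_RE = re.compile(r"\\(" + "|".join(map(re.escape, STYLE_MAP)) + r")\{")


def map_styles(tex: str) -> str:
    """Replace known converter style macros with their mapped LaTeX commands."""
    return _STYLE_RE.sub(lambda m: "\\" + STYLE_MAP[m.group(1)] + "{", tex)
```

Macros that are not listed in the mapping are left untouched, so unknown styles remain visible in the source and can be handled by hand.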


The document was split into chapters and uploaded to Overleaf. Detailed instructions on what to repair and how to do it were given to the typesetters on Trello cards. When a task was finished, they ticked it off on the corresponding checklist. Jessica Brown, Charles Lam, Constantin Freitag and Benedikt Singpiel helped with this.

Using Trello to keep track of the conversion progress



In winter 2014/15 the author returned to Papua New Guinea. After her return, a full Overleaf version comprising all chapters was prepared for her. There, she could use a graphical interface to edit her manuscript without having to delve too deep into LaTeX. Liisa said:

Liisa Berghäll, author of "A grammar of Mauwake"


I don’t mind learning new things, but I tend to learn computer stuff the hard way, making all the possible, and some impossible, mistakes along the way.

But apparently the process was indeed suitable for a seasoned linguist like Liisa. Of course, not everything was straightforward, but all issues could be solved within a reasonable amount of time.


After the content was finalized, the manuscript was sent out to our community proofreaders. The aim was to have every chapter covered twice. Ten proofreaders volunteered. They used the guidelines available from our website and sent their comments to the coordinator, who forwarded them to the author. Revisions resulting from the proofreaders’ comments were also made in Overleaf.

Language Science Press office

At the LangSci office our main task was to coordinate the process and help with problems. One hands-on thing we did was the creation of vectorized graphics with TikZ. These look much better than the original graphics, use our fonts and are searchable.

The original graph as a hand drawing in Word



The same graph in LaTeX using TikZ


The final stages of a manuscript are always the most stressful, with the manuscript going back and forth for the final touches.

We made good use of the Overleaf git bridge. This means that the LangSci office has access to the book on their own computers, with all necessary tools at hand, but can easily synchronise it with the Overleaf web frontend which the author uses.


The final book


We published the book on September 9 as a free PDF and in softcover and hardcover versions. All these versions are automatically generated from the underlying LaTeX code. Mauwake data are also found in the Languini app, a language quiz for the iPhone. The structured nature of the data allows us to extract the linguistic examples from the source code.
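
To give an idea of what such an extraction might look like, here is a minimal Python sketch. It assumes gb4e-style example markup (\gll for the vernacular and gloss lines, \glt for the translation, \z to close the example); the function name extract_examples and the regular expression are illustrative assumptions, not the tool actually used for the app.

```python
import re

# Assumed gb4e-style markup:
#   \gll <vernacular line> \\ <gloss line> \\ \glt <free translation> ... \z
_EXAMPLE_RE = re.compile(
    r"\\gll\s+(?P<source>.*?)\\\\\s*"      # vernacular line, ended by \\
    r"(?P<gloss>.*?)\\\\\s*"               # gloss line, ended by \\
    r"\\glt\s+(?P<translation>.*?)\s*"     # free translation
    r"(?=\\z|\\ex|\Z)",                    # stop at \z, the next \ex, or end of input
    re.DOTALL,
)


def extract_examples(tex: str) -> list[dict[str, str]]:
    """Return one dict per glossed example found in the LaTeX source."""
    return [
        {name: value.strip() for name, value in match.groupdict().items()}
        for match in _EXAMPLE_RE.finditer(tex)
    ]
```

A regular expression like this only handles simple, consistently marked-up examples; nested or multi-part examples would need a more careful parser.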

This blogpost has shown how a document written before the advent of Language Science Press could be transformed into a great contribution to Studies in Diversity Linguistics, relying on the help of the community in typesetting and proofreading. I hope that it will help to illustrate our workflow and show the power of collaboration. Let’s close with Liisa’s words:

The cooperation was a thoroughly positive experience for me too. The helpfulness, patience and encouraging attitude of all the LangSci people made it easy to ask for directions and other help; and the answer usually came very quickly and the instructions were clear. Special thanks to [Martin], Sebastian and Felix!


The only area where I saw room for improvement is that of the feedback comments. Some proofreaders were clearly very experienced, whereas others probably were just learning – or were hesitant to suggest corrections for other reasons. The latter will need training in order to add to the group of the competent proofreaders. Anyway, it is amazing how you have been able to get together such a large group of linguists who are willing to give of their time and skills for the effort!
