Scholarly Communication Technology Catalogue

Function: Format Conversion


Technologies providing this function
Name Adoption Readiness Status Governance Business Form
Electric Book Manager Electric Book is a Jekyll-based tool for producing print PDF, digital PDF, EPUB, website, and app versions of books from a single markdown, YAML, and HTML-based content source. It was developed by consultancy and service provider Electric Book Works.
Significant TR9 Actively Maintained Not Classified Commercial Vendor
Grobid GROBID (or Grobid) stands for GeneRation Of BIbliographic Data. It is a machine-learning library for extracting, parsing, and re-structuring journal articles in PDF format into structured TEI-encoded documents that can then be transformed to JATS XML.
Not Classified TR9 Actively Maintained Community (ad-hoc) Volunteer Community
le-tex Transpect Transpect is an XProc- and XSLT-based framework and suite of modules for managing, schema checking, and converting from/to XML-based formats such as .docx, IDML, EPUB, HTML, DocBook, TEI and JATS. le-tex Transpect also provides a framework for combining modules into publishing workflows with revision control and custom, cascade-based configuration. le-tex Transpect can run standalone or integrated into publishing workflows. A simple upload interface and an HTTP API are available, as are hosted operation and maintenance agreements for professional use.
Not Classified TR9 Actively Maintained Not Classified Commercial Vendor
Lodel Lodel is the journal publishing software for the French OpenEdition publishing platform. It provides content management and import/conversion to bring word processor documents into an XML-based article production environment.
Significant TR9 Actively Maintained Not Classified Fiscal Sponsorship (Academic Institution)
Manifold Scholarship Manifold is a collaborative, web-based scholarly publishing system designed by the University of Minnesota Press and the CUNY Graduate Center. Manifold provides a dynamic approach to publishing book-length works capable of gathering commentary, annotation, and revisions within the publication. Built to publish long-form digital monographs, Manifold is also used in service of open educational resources, journals, and collaborative scholarly projects. It is currently used by twenty-eight publishers, including the University of Minnesota Press, the City University of New York, and the University of Arizona Press, as well as digital humanities centers and teaching and learning centers.
Significant TR9 Actively Maintained Not Classified Fiscal Sponsorship (Academic Institution)
Open Typesetting Stack Open Typesetting Stack (OTS) is an article conversion/ingest service developed by the Public Knowledge Project to convert word-processor and PDF versions of articles into JATS XML for publication. OTS integrates a host of other parsing and conversion tools (including the machine-learning tool Grobid) and external services to provide the most accurate possible XML without additional user input. This service, and its OJS plugin integration, is intended to decrease the labour involved in production, and to facilitate the creation of archive-friendly and web-native article formats. OTS is in maintenance mode as of this writing.
Limited TR9 Unsupported Community (formal) Fiscal Sponsorship (Academic Institution)
Pandoc Pandoc is a robust, multi-format document conversion tool that can read from and write to a vast number of file formats. Pandoc works with a range of markup formats, markdown, and word-processor files, and it supports integration with tools like LaTeX and reference managers, as well as a host of web-based formats. Several different input and export formats for math are handled, including MathJax, LaTeX, and translation to MathML. Pandoc also includes a powerful system for automatic citations and bibliographies. Pandoc is usable as a command-line tool as well as an integrated library, and is used in several other publishing toolkits.
Ubiquitous TR9 Actively Maintained Not Classified Volunteer Community
Pressbooks Pressbooks is a web-based book editing and production system that exports in multiple formats: ebooks, webbooks, print-ready PDF, and various XML types. The system is built on top of WordPress, but makes significant changes to the admin interface, presentation layer, and export routines for web, ebook, and print formats. Pressbooks is widely used in the open textbook and open educational resources community.
Significant TR9 Actively Maintained Not Classified Commercial Vendor
Readium Readium provides a "set of software building blocks" for the development of standardized EPUB and web publication reader applications for a variety of contexts—browser-based, mobile app, and desktop. Readium is a set of libraries and frameworks, and also a foundation and international community dedicated to ebook implementation standards.
Limited TR9 Actively Maintained Community (formal) Fiscal Sponsorship (Non-profit Organisation)
Stencila Stencila is authoring and editorial development software developed by Code for Science & Society. It provides an integrated word processor, coding (R, Python, and SQL), and spreadsheet interface in the browser, and the resulting interactive document (using the same file format used by the Texture editor, with which Stencila shares code) is shareable and publishable. Stencila's "Converters" module is a Pandoc-based collection of import and export routines. eLife's "Reproducible Document Stack" initiative is based on Stencila.
Limited TR9 Actively Maintained Not Classified Commercial Vendor
XSweet XSweet is a free, open source conversion tool for converting Microsoft Word documents (.docx) into HTML and beyond. Built as a series of XSL (eXtensible Stylesheet Language) transformation steps, it’s designed to be modular and flexible. Use it out of the box, or modify and extend it to meet your needs. XSweet is being developed by Cabbage Tree Labs.
Limited TR9 Actively Maintained Community (ad-hoc) Fiscal Sponsorship (Non-profit Organisation)
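
Example: driving a format conversion from a script
The Pandoc entry above notes that the tool is usable from the command line and handles citations and math output. The following is a minimal sketch of how a publishing script might assemble such a call, not a recommended workflow. The helper names (build_pandoc_command, convert) and the file names are hypothetical. The flags used (-o, --citeproc, --bibliography, --mathml) are standard Pandoc options, with --citeproc assuming Pandoc 2.11 or later. Pandoc infers the input and output formats from the file extensions.

```python
import shutil
import subprocess


def build_pandoc_command(source, target, bibliography=None, mathml=False):
    """Return the argument list for one pandoc conversion.

    Hypothetical helper: pandoc infers formats from the file extensions
    of `source` and `target`.
    """
    cmd = ["pandoc", str(source), "-o", str(target)]
    if bibliography is not None:
        # Resolve citations against the given bibliography file.
        cmd += ["--citeproc", "--bibliography", str(bibliography)]
    if mathml:
        # Render TeX math as MathML in HTML-based output formats.
        cmd.append("--mathml")
    return cmd


def convert(source, target, **options):
    """Run the conversion; assumes a pandoc executable is on PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    subprocess.run(build_pandoc_command(source, target, **options), check=True)


if __name__ == "__main__":
    # Only prints the command, so this runs without pandoc installed.
    print(build_pandoc_command("article.md", "article.html",
                               bibliography="refs.bib", mathml=True))
```

Keeping command construction separate from execution lets the argument list be inspected or logged before any file is touched.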