How to fix table contents turned to “0” in LibreOffice
Posted by Jim DeLaHunt on 30 Apr 2021 at 11:42 pm | Tagged as: robobait, software engineering, technical support, web technology
Well, that was a fright. I was editing a large report with the LibreOffice word processor. I had a table of results. It consisted of a placeholder table — a header and a couple of rows — with Table Styles applied to get the rows formatted right. I pasted in dozens of rows of content from a spreadsheet. I saved the file for the night. The next morning, I opened the document, and to my horror, found that text values in many of the columns had been turned to “0”. This is LibreOffice bug 131025. And here is how I recovered from the error, and got my table contents back. I hopes it helps others who encounter this bug.
The method I used to recover from this bug is unfortunately rather technical. It involves using XSLT, and cracking open the compressed multi-part Open Document format. That is great for me, as a software engineer who is curious about such things. But it isn’t for everyone. For what it’s worth, here it is.
The important thing I learned about this bug is that it corrupts the document in two stages. Each stage is ratcheted in when you save the document. When you paste in the table content from the spreadsheet, LibreOffice formats the cells wrongly. The cells have a text value, but LibreOffice marks them as having a numerical format, and an alternate numerical value of 0. However, this is not visible yet. When you save the document, the incorrect formats and alternate values are committed, but the original text cell values remain. When you open the document next, LibreOffice incorrectly displays the alternate numerical values instead of the original text values. But when you save the document next, LibreOffice discards the original text values. All you are left with is the numerical formats and the numerical values. You can no longer recover the original text values.
If you encounter this bug, the first step is: find a copy of the document as it was saved for the first time after the bug happened, before you opened the document and saw the evidence of the bug. To put it another way, if you open your document and see those “0” cells in your tables, do not save the document under the same name. If you do, you will overwrite the previous version of the document, and the bug will lose your previous text values. Instead, close without saving (discarding changes), or save the document as a different filename (keeping changes outside the tables but losing table contents).
If you have already saved for a second time, see if you kept an earlier revision of the document file. I typically save a revision of my document every day and after every major edit, using filenames with a date suffix. Or see if your backup system kept a copy of the document file after its first save. (You do have an automated backup system which runs every hour or so, don’t you?)
Use the system’s file manager, not LibreOffice, to preserved a backup copy of the corrupted document as of its first save. Then use the system’s file manager to make a second, working copy on which to perform the repairs. If the repairs go wrong, toss that copy out and make another copy from the preserved copy.
Internally in the OpenDocument structure, the essence of the bug is that it stores table cells as something like:
… <table:table-cell table:style-name="Table_5f_7.1.B2" office:value-type="string"><text:p text:style-name="P71">Correct cell contents</text:p></table:table-cell> …
(where “Correct cell contents” is the content I want to have). Instead has the wrong office:value-type
of "float"
, and has an unwanted office:value
of "0"
. It is something like:
… <table:table-cell table:style-name="Table_5f_7.1.B2" office:value-type="float" office:value="0"><text:p text:style-name="P71">Correct cell contents</text:p></table:table-cell> …
At the second save, LibreOffice discards the contents of <table:table-cell>…</table:table-cell>, including the content I want to have. What remains is something like:
… <table:table-cell table:style-name="Table_5f_7.1.B2" office:value-type="float" office:value="0"></table:table-cell> …
So, the method I used to fix this is:
- Open up the working copy of the OpenDocument file, following the instructions in my earlier blog, How to crack open LibreOffice .ODT documents for fun and bug fixing. The result is a directory containing XML and other files.
- Copy the XML file content.xml out of the directory to a place where you can work on it. Name it content_corrupted.xml, or something similar.
- Install the tool xsltproc or similar. See my earlier blog, How to use XSLT to modify XML files inside .ODT documents, for an explanation of this tool, and how to use it with OpenDocument files.
- Apply the XSLT stylesheet below to content_corrupted.xml, as shown by the sample command below. It creates content_repaired.xml .
- Move the file content_repaired.xml back into the OpenDocument directory, naming it content.xml .
- Turn the directory of XML and other files back into an OpenDocument file, following the further instructions in How to crack open LibreOffice .ODT documents for fun and bug fixing.
- Open the repaired OpenDocument file and verify that your table contents are restored.
This is the XSLT stylesheet which performs the repair. Store it in a file named repair.xslt
. (My earlier blog, How to change and delete attributes in an XML file using XSLT, explains aspects of its workings.)
<?xml version="1.0" encoding="UTF-8"?>
<!-- Restore_odt_table_cell_text.xslt
by Jim DeLaHunt (jdlh.com), 2020-12-16. Donated to the public domain (cc0).
This XSLT fixes a LibreOffice bug where Writer table cells containing
text turn into digit 0 when the document is re-opened.
This is Bug 131025 - Writer document with tables lost data in cells (apparently) replacing with 0
<https://bugs.documentfoundation.org/show_bug.cgi?id=131025>
It does this by fixing table cells like
<table:table-cell table:style-name="…" office:value-type="float"
office:value="0">
<text:p text:style-name="P71">desired text</text:p>
</table:table-cell>
by changing to office:value-type="string" and deleting office:value="0".
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">
<xsl:output method="xml" indent="no" encoding="UTF-8" />
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- Create a replacement value for .../table-cell[@value-type] -->
<xsl:template match="/office:document-content/office:body/office:text/table:table/table:table-row/table:table-cell[@office:value-type='float'][@office:value='0'][not(text()='')]/@office:value-type" >
<xsl:copy>
<xsl:attribute name="office:value-type">string</xsl:attribute>
</xsl:copy>
</xsl:template>
<!-- Delete .../table-cell[@value], because this null template generates nothing -->
<xsl:template match="/office:document-content/office:body/office:text/table:table/table:table-row/table:table-cell[@office:value-type='float'][@office:value='0'][not(text()='')]/@office:value" />
</xsl:stylesheet>
Invoke this stylesheet with the following command:
xsltproc -o content_repaired.xml repair.xslt content_corrupted.xml
Whew! There is a lot to this. Sorry about that. But it is maybe better than losing your work. Hopefully, the LibreOffice developers will be able to fix this bug, then noone will have to use this fix again.
I encountered the lost data in cells bug on a Windows 10 machine. Wordpad, provided with earlier versions of Windows, is still available in Windows 10.
I opened the file in Wordpad, which displayed a message “Wordpad does not support all the features of this document’s format. Some content might be missing or displayed improperly.” It worked in my case. I then saved the file to a new name as open document text, opened the new file in LibreOffice Writer, saved it then reopened it. It seems to have recovered. In technical terms I would not like to say what is going on.
LibreOffice Version: 7.0.6.2 (x64)
Build ID: 144abb84a525d8e30c9dbbefa69cbbf2d8d4ae3b
CPU threads: 12; OS: Windows 10.0 Build 19042; UI render: Skia/Vulkan; VCL: win
Locale: en-GB (en_GB); UI: en-US
Calc: CL
For systems other than Windows, there may be an alternative to Wordpad that can provide recovery. Otherwise the file could be transferred to a Windows 10 machine and hopefully recovered there.