Difference between revisions of "GnuCash XML format"

m (+parse precondition)
m (Syntaxhighlight)
 
(20 intermediate revisions by 10 users not shown)
Line 1: Line 1:
'''This article is just descriptive, and neither normative nor authoritative.'''
+
This article collects some notes about the XML file format of GnuCash. It is descriptive, and neither normative nor authoritative.  
  
Beginning with version 1.6, the primary GnuCash storage mechanism is an [[Wikipedia:XML|XML]] file. The file is optionally compressed with [[Wikipedia:gzip|gzip]] (“<u>E</u>dit” menu → “Preferences” → “General” → “Use file compression”).  The schema of the XML document is not presently defined declaratively (e.g. [[Document Type Definition]] or [[XML Schema]]).
+
Beginning with version 1.6, the primary GnuCash storage mechanism is an XML file. The file is optionally compressed with gzip; compression is controlled by the preference at <u>E</u>dit→Preferences→General→Use file compression.
  
As of version 1.8.10, XML files created by GnuCash are missing [http://www.w3.org/TR/REC-xml-names/#ns-decl XML namespace declarations] that are required by some XML processing software (see also [[FAQ#Q: How can I export data?]]). See [http://www.gnucash.org/docs/v1.8/C/gnucash-guide/appendixa_xmlconvert1.html GnuCash Tutorial and Concepts Guide, Appendix A, part 5: Converting XML GnuCash File] for the missing declarations.
+
There is a non-normative RELAX NG schema for the XML file format ([https://github.com/Gnucash/gnucash/blob/maint/libgnucash/doc/xml/gnucash-v2.rnc gnucash-v2.rnc]). There are also DTD schema definitions, but these are outdated and do not define the current format correctly ([https://github.com/Gnucash/gnucash/tree/maint/libgnucash/doc/xml libgnucash/doc/xml]).
 +
 
 +
Many elements in the XML file are identified by Globally Unique Identifiers (GUID). GnuCash includes its own GUID implementation.
  
 
==Character encoding==
 
GnuCash interprets XML documents using a character encoding determined by operating-system–level locale settings, and so does not include an [http://www.w3.org/TR/REC-xml/#NT-EncodingDecl encoding declaration] in the opening [http://www.w3.org/TR/REC-xml/#sec-TextDecl XML text declaration].  (The locale setting constitutes a “higher-level protocol” [http://www.w3.org/TR/REC-xml/#charencoding].)  GnuCash serializes non-[[Wikipedia:ASCII|ASCII]] octets (i.e. those with the high-order bit set) as decimal numeric entity references.
 
 
For example, the UTF-8 encoding of the Cyrillic capital letter “Б” is written as “<code>&amp;#208;&amp;#145;</code>”.  As the following Python script shows, the UTF-8 text should be transcoded to recover the original Unicode text.  (This script uses the [http://4suite.org/ 4Suite] XML library.)
 
 
<pre>#!/usr/bin/python2.4

from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Xml.XPath import Evaluate
from Ft.Xml.XPath.Context import Context

# precondition: foo.xac was created by GnuCash with LANG=en_US.UTF-8
doc = NonvalidatingReader.parseUri('file:///tmp/foo.xac')
context = Context(doc, processorNss={'cd'    : "http://www.gnucash.org/XML/cd",
                                     'book'  : "http://www.gnucash.org/XML/book",
                                     'gnc'   : "http://www.gnucash.org/XML/gnc",
                                     'cmdty' : "http://www.gnucash.org/XML/cmdty",
                                     'trn'   : "http://www.gnucash.org/XML/trn",
                                     'split' : "http://www.gnucash.org/XML/split",
                                     'act'   : "http://www.gnucash.org/XML/act",
                                     'price' : "http://www.gnucash.org/XML/price",
                                     'ts'    : "http://www.gnucash.org/XML/ts",
                                     'slot'  : "http://www.gnucash.org/XML/kvpslot",
                                     'cust'  : "http://www.gnucash.org/XML/cust",
                                     'addr'  : "http://www.gnucash.org/XML/custaddr"})

accountName = Evaluate('/gnc-v2/gnc:book/gnc:account[act:id="0d69c3557f4d9340198bfd151f9e13cb"]/act:name/text()', context=context)[0]

# object of type "unicode":
name_unicode = accountName.data.encode('latin1').decode('utf-8')

# objects of type "str":
name_koi8r = name_unicode.encode('koi8-r')
name_utf8  = name_unicode.encode('utf-8')
name_utf16 = name_unicode.encode('utf-16')

assert name_utf8 == accountName.data.encode('latin1')</pre>
+
Starting with version 1.9.0, GnuCash writes the XML document using UTF-8 encoding and includes the appropriate encoding declaration in the opening XML text declaration.
+
==Validation==
+
The RELAX NG schema file mentioned above can be used to validate an uncompressed GnuCash XML data file. This requires that you:
+
* save your GnuCash data file in uncompressed format
+
* use an XML validator--e.g., [https://github.com/relaxng/jing-trang Jing], which will be used in this example.
+
As stated above, the GnuCash data file is by default stored using gzip compression. You must first save your data file in an uncompressed state. The easiest way to do this is to change the storage preference and save your file. (Remember to reset the preference afterwards.)
+
Then download jing and run the following command:
+
<Syntaxhighlight lang="sh">
+
jing -c path-to-gnucash-v2.rnc path-to-your-datafile.gnucash
+
</Syntaxhighlight>
+
jing will report any validation errors it finds.
+
;Note:The validation should not be considered authoritative, as the schema is not updated or tested very often. Validation errors can therefore be due to errors in the schema just as easily as to errors in the data file.
  
==See also==
+
''Based on information provided by Baptiste Carvello in [https://bugzilla.gnome.org/show_bug.cgi?id=680887 bug 680887].''
* [[List of external software interfaces]]
 
  
 
==External links==
 
* http://qof.sourceforge.net/ - QOF is the object persistence layer used by GnuCash
 
 
* http://gnucashtoqif.sourceforge.net/ - GnuCash XML &rarr; [[Wikipedia:QIF|QIF]] conversion tool
 
 
** [http://gnucashtoqif.sourceforge.net/#mozTocId164261 notes about file format]
 
* [http://edseek.com/archives/2005/08/18/gnucash-export-to-gnumeric-and-csv/ GnuCash export to Gnumeric and CSV], using [[Wikipedia:XSL Transformations|XSLT]]
+
* [http://web.archive.org/web/20070219085556/http://edseek.com/archives/2005/08/18/gnucash-export-to-gnumeric-and-csv/ GnuCash export to Gnumeric and CSV], using [[Wikipedia:XSL Transformations|XSLT]] -- NOTE: With GnuCash 3.2, users can export to CSV directly from the program.
* Relevant mailing list threads
 
** [http://lists.gnucash.org/pipermail/gnucash-devel/2002-March/thread.html#5750], March 2002
 
* [http://bugzilla.gnome.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&product=GnuCash&component=XML+Backend&long_desc_type=allwordssubstr&long_desc=&status_whiteboard_type=allwordssubstr&status_whiteboard=&keywords_type=allwords&keywords=&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=NEEDINFO&bug_status=VERIFIED&emailtype1=substring&email1=&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0=], non-closed, non-resolved GnuCash bug reports pertaining to the XML backend
 

Latest revision as of 21:17, 17 August 2018

This article collects some notes about the XML file format of GnuCash. It is descriptive, and neither normative nor authoritative.

Beginning with version 1.6, the primary GnuCash storage mechanism is an XML file. The file is optionally compressed with gzip; compression is controlled by the preference at Edit→Preferences→General→Use file compression.

There is a non-normative RELAX NG schema for the XML file format (gnucash-v2.rnc). There are also DTD schema definitions, but these are outdated and do not define the current format correctly (libgnucash/doc/xml).

Many elements in the XML file are identified by Globally Unique Identifiers (GUID). GnuCash includes its own GUID implementation.
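
As a rough illustration, the following Python sketch checks whether a string has the shape of a GUID as it appears in a data file. It assumes, based only on sample identifiers such as 0d69c3557f4d9340198bfd151f9e13cb, that a GUID is serialized as 32 hexadecimal digits. The function name looks_like_guid is hypothetical and is not part of GnuCash.

```python
import re

# Assumption: a GUID is written as exactly 32 hexadecimal digits.
_GUID_RE = re.compile(r"[0-9a-fA-F]{32}")


def looks_like_guid(text: str) -> bool:
    """Return True if text has the assumed 32-hex-digit GUID shape."""
    return _GUID_RE.fullmatch(text) is not None
```

A check like this only tests the textual form; it says nothing about whether the identifier actually refers to an object in the file.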

Character encoding

Starting with version 1.9.0, GnuCash writes the XML document using UTF-8 encoding and includes the appropriate encoding declaration in the opening XML text declaration.
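
To see whether a given file carries such a declaration, one can inspect its first bytes. The sketch below is a minimal Python example, not GnuCash code: the function name declared_encoding is hypothetical, and it assumes the data is uncompressed, starts directly with the declaration (no byte-order mark), and uses an ASCII-compatible encoding.

```python
import re

# Matches the encoding pseudo-attribute inside a leading "<?xml ... ?>" declaration.
_DECL_RE = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(head: bytes):
    """Return the declared encoding name, or None if no declaration is found."""
    match = _DECL_RE.match(head)
    return match.group(1).decode("ascii") if match else None
```

Passing the first couple of hundred bytes of the file is enough, since the declaration must come first. A result of None would suggest a file written without an encoding declaration.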

Validation

The RELAX NG schema file mentioned above can be used to validate an uncompressed GnuCash XML data file. This requires that you:

  • save your GnuCash data file in uncompressed format
  • use an XML validator--e.g., Jing, which will be used in this example.

As stated above, the GnuCash data file is by default stored using gzip compression. You must first save your data file in an uncompressed state. The easiest way to do this is to change the storage preference and save your file. (Remember to reset the preference afterwards.)
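
If you would rather not touch the preference, an uncompressed copy can also be produced outside GnuCash. The following Python sketch is one possible way to do that; it assumes the compressed file is an ordinary gzip stream, and the function name write_uncompressed_copy is hypothetical. The original data file is only read, never modified.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of a gzip stream


def write_uncompressed_copy(src_path: str, dst_path: str) -> bool:
    """Copy src_path to dst_path, decompressing it if it is gzip data.

    Returns True if the source was gzip-compressed, False if it was
    already uncompressed and was copied unchanged.
    """
    with open(src_path, "rb") as probe:
        compressed = probe.read(2) == GZIP_MAGIC
    opener = gzip.open if compressed else open
    with opener(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return compressed
```

The copy written to dst_path can then be passed to the validator in place of the original file.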

Then download jing and run the following command:

jing -c path-to-gnucash-v2.rnc path-to-your-datafile.gnucash

jing will report any validation errors it finds.
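
The same command can be driven from a script, for example to validate several files in a row. The Python sketch below is only an illustration: the function names are hypothetical, it assumes a jing executable is available on the PATH, and it assumes that jing signals a successful validation with exit status 0, which you should confirm for your installation.

```python
import subprocess


def jing_command(schema_path: str, data_path: str) -> list:
    """Build the argument list for the jing invocation shown above."""
    # -c selects the RELAX NG compact syntax, matching the .rnc schema file.
    return ["jing", "-c", schema_path, data_path]


def validate(schema_path: str, data_path: str) -> bool:
    """Run jing and return True if it exits with status 0.

    Any messages jing prints are passed through to the terminal unchanged.
    """
    result = subprocess.run(jing_command(schema_path, data_path))
    return result.returncode == 0
```

For instance, validate("gnucash-v2.rnc", "books.xml") would run the check on a hypothetical uncompressed file named books.xml.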

Note
The validation should not be considered authoritative, as the schema is not updated or tested very often. Validation errors can therefore be due to errors in the schema just as easily as to errors in the data file.

Based on information provided by Baptiste Carvello in bug 680887.

External links