Difference between revisions of "GnuCash XML format"

From GnuCash
m (missing hash-bang)
(GnuCash 1.8.5+ able to handle "xmlns"s)

Revision as of 09:14, 24 February 2006

This article is just descriptive, and neither normative nor authoritative.

Beginning with version 1.6, the primary GnuCash storage mechanism is an XML file. The file is optionally compressed with gzip (“Edit” menu → “Preferences” → “General” → “Use file compression”). The schema of the XML document is not presently defined declaratively (e.g. Document Type Definition or XML Schema).
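Because compression is optional, a reader cannot tell from the file name alone whether a given file is gzip-compressed. The following is a minimal sketch (modern Python 3, not the Python 2.4 used elsewhere in this article) that checks for the two-byte gzip magic number and picks the matching opener. The function name `read_gnucash_bytes` is hypothetical and not part of GnuCash.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_gnucash_bytes(path):
    """Return the raw XML bytes of a GnuCash file, compressed or not."""
    # Peek at the first two bytes to decide how to open the file.
    with open(path, "rb") as fh:
        head = fh.read(2)
    opener = gzip.open if head == GZIP_MAGIC else open
    with opener(path, "rb") as fh:
        return fh.read()
```

The result is deliberately left as bytes, since the character encoding of the content depends on the locale in which the file was written (see "Character encoding").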

As of version 1.8.10, XML files created by GnuCash are missing XML namespace declarations that are required by some XML processing software (see also FAQ#Q: How can I export data?). See GnuCash Tutorial and Concepts Guide, Appendix A, part 5: Converting XML GnuCash File for the missing declarations. GnuCash 1.8.5+ is able to read XML files containing these declarations [1].
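As an illustration of the workaround, the sketch below (Python 3) inserts `xmlns:` attributes into the `<gnc-v2>` start tag of a file that lacks them. The prefix-to-URI pairs are copied from the script later in this article and may be incomplete for files that use other GnuCash features; the Guide appendix cited above remains the reference for the full list. The function name `add_namespace_declarations` is hypothetical, and the regular expression assumes a simple start tag with no `>` inside attribute values.

```python
import re

# Prefix-to-URI pairs copied from the script later in this article.
# Assumption: this list may not cover every prefix a given file uses.
GNC_NAMESPACES = {
    "cd": "http://www.gnucash.org/XML/cd",
    "book": "http://www.gnucash.org/XML/book",
    "gnc": "http://www.gnucash.org/XML/gnc",
    "cmdty": "http://www.gnucash.org/XML/cmdty",
    "trn": "http://www.gnucash.org/XML/trn",
    "split": "http://www.gnucash.org/XML/split",
    "act": "http://www.gnucash.org/XML/act",
    "price": "http://www.gnucash.org/XML/price",
    "ts": "http://www.gnucash.org/XML/ts",
    "slot": "http://www.gnucash.org/XML/kvpslot",
    "cust": "http://www.gnucash.org/XML/cust",
    "addr": "http://www.gnucash.org/XML/custaddr",
}


def add_namespace_declarations(xml_text, namespaces=GNC_NAMESPACES):
    """Insert xmlns:* attributes into the <gnc-v2> start tag if it has none."""
    match = re.search(r"<gnc-v2(\s[^>]*)?>", xml_text)
    if match is None:
        raise ValueError("no <gnc-v2> root element found")
    if "xmlns:" in match.group(0):
        return xml_text  # declarations already present; leave the text alone
    decls = "".join(
        '\n     xmlns:%s="%s"' % (prefix, uri)
        for prefix, uri in sorted(namespaces.items())
    )
    start_tag = "<gnc-v2" + decls + (match.group(1) or "") + ">"
    return xml_text[: match.start()] + start_tag + xml_text[match.end():]
```

A namespace-aware parser that rejects the original text with an "unbound prefix" error should accept the output of this function.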

Character encoding

GnuCash interprets XML documents using a character encoding determined by operating-system–level locale settings, and so does not include an encoding declaration in the opening XML text declaration. (The locale setting here constitutes a “higher-level protocol” in W3C vernacular [2].) GnuCash serializes non-ASCII octets (i.e. those with the high-order bit set) as decimal numeric entity references.
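The serialization described above can be imitated in a few lines. This is a sketch in Python 3 of the observable behaviour only, not GnuCash's actual code: the function name `octets_to_entity_refs` is hypothetical, the locale encoding is passed in as a parameter, and escaping of `&`, `<` and `>` is left out for brevity.

```python
def octets_to_entity_refs(text, encoding="utf-8"):
    """Encode text, then write each high-bit octet as a decimal reference."""
    out = []
    for octet in text.encode(encoding):
        if octet >= 0x80:
            # Non-ASCII octet: emit it as a decimal numeric reference.
            out.append("&#%d;" % octet)
        else:
            # ASCII octet: emit the character unchanged.
            out.append(chr(octet))
    return "".join(out)
```

Note that each reference stands for one octet of the encoded text, not for one Unicode character, which is why a two-byte UTF-8 sequence yields two references.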

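For example, the UTF-8 encoding of the Cyrillic capital letter “Б” is written as “&#208;&#145;”. An XML parser reads this as two separate characters (U+00D0 and U+0091), one per octet. As the following Python script shows, those characters must be converted back to octets and the octets decoded as UTF-8 to recover the original Unicode text. (This script uses the 4Suite XML library.)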

#! /usr/bin/python2.4

from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Xml.XPath import Evaluate
from Ft.Xml.XPath.Context import Context

# precondition: foo.xac was created by GnuCash with LANG=en_US.UTF-8
doc = NonvalidatingReader.parseUri('file:///tmp/foo.xac')
context = Context(doc, processorNss={'cd'    : "http://www.gnucash.org/XML/cd",
                                     'book'  : "http://www.gnucash.org/XML/book",
                                     'gnc'   : "http://www.gnucash.org/XML/gnc",
                                     'cmdty' : "http://www.gnucash.org/XML/cmdty",
                                     'trn'   : "http://www.gnucash.org/XML/trn",
                                     'split' : "http://www.gnucash.org/XML/split",
                                     'act'   : "http://www.gnucash.org/XML/act",
                                     'price' : "http://www.gnucash.org/XML/price",
                                     'ts'    : "http://www.gnucash.org/XML/ts",
                                     'slot'  : "http://www.gnucash.org/XML/kvpslot",
                                     'cust'  : "http://www.gnucash.org/XML/cust",
                                     'addr'  : "http://www.gnucash.org/XML/custaddr"})

accountName = Evaluate('/gnc-v2/gnc:book/gnc:account[act:id="0d69c3557f4d9340198bfd151f9e13cb"]/act:name/text()',
                       context=context)[0]

# object of type "str": the original octets from the file, which are UTF-8.
# Encoding as latin1 only maps each code point U+0000..U+00FF back to one byte.
name_raw = accountName.data.encode('latin1')

# object of type "unicode":
name_unicode = name_raw.decode('utf-8')

# objects of type "str": the same name re-encoded in three encodings
name_koi8r = name_unicode.encode('koi8-r')
name_utf8  = name_unicode.encode('utf-8')
name_utf16 = name_unicode.encode('utf-16')

assert name_utf8 == accountName.data.encode('latin1')
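For readers without 4Suite, the same round trip can be sketched with the Python 3 standard library alone. This is an illustrative assumption-laden example, not part of the original script: the helper name `recover_text` is hypothetical, the input is a hand-written fragment rather than a real GnuCash file, and the file is assumed to have been written under a UTF-8 locale.

```python
import xml.etree.ElementTree as ET


def recover_text(element_text, file_encoding="utf-8"):
    """Undo the one-reference-per-octet layer and decode the octets."""
    # Each character reference became one code point in U+0000..U+00FF,
    # so Latin-1 encoding maps the code points back to the original octets.
    octets = element_text.encode("latin-1")
    return octets.decode(file_encoding)


# Hand-written fragment standing in for an <act:name> element.
fragment = "<name>&#208;&#145;</name>"
parsed = ET.fromstring(fragment).text  # two characters: U+00D0, U+0091
name = recover_text(parsed)            # one character: U+0411
```

If the element text contains a character above U+00FF, the Latin-1 step raises `UnicodeEncodeError`, which is a useful signal that the file was not serialized in the octet-per-reference form described here.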

See also

External links