He/C++

From GnuCash
Revision as of 09:51, 6 April 2021 by Avma

Converting Gnucash to C++ from sort-of GObject

The underlying rationale for this page is a long thread on the gnucash-devel mailing list titled "Beyond 2.6". It's worth frequent review, as there are a lot of good ideas as well as valid concerns expressed there.

To summarize, GnuCash has grown over 10 years without much thought given to ongoing design. The old design documents are still in the tree, and though they might be deleted soon, they will always be in git history. They haven't been updated in more than 10 years. The reimplementation in GObject was done with a poor understanding of how GObject's emulation of object orientation works. That's not surprising; it was done in the early days of GObject, and GObject itself is complicated and generally lacking in the "syntactic sugar" that makes real object-oriented languages usable.

There are lots of object oriented languages, though, so why C++?

  • C++ compilers are available on all major platforms except Android, and there's a shim library available there which may work to wrap a C++ library in a Java GUI. While Java is available on all the desktop platforms, it isn't available on iOS.
  • The conversion will take a long time, so interoperability with the existing C code is essential. Only C++ and Objective-C can be interspersed with C one line at a time, but Objective-C is not available on Microsoft platforms using native tools.
  • There are two widely-used C++-based cross-platform GUI libraries, wxWidgets and Qt. Both support all three major desktop platforms and iOS.
cstim's comment: In addition to this reasoning, everyone is invited to have a look at some already existing C++ wrapper objects in src/optional/gtkmm/gncmm, which use the glibmm/gtkmm C++ wrapper library around glib/gtk to present classes that "look like" real C++ classes. Using this sort of wrapper would even make a step-by-step conversion to C++ possible, as long as we accept the dependency on glibmm/glib in the core objects for the time being. See Cutecash for how to compile this part of the code with CMake.

Developer Preparation

C++ is easier to learn and to write than is GObject. Low bar. GObject is a bitch to write, and takes a lot of work to understand. C++ is easy to write, but it still takes some work to understand. Some very strongly recommended (I'd say required, but I don't want to be too scary) reading:

  • Bjarne Stroustrup, The C++ Programming Language, Fourth Edition, Addison-Wesley, 2013.
  • Nicolai Josuttis, The C++ Standard Library, A Tutorial and Reference, Second Edition, Addison-Wesley, 2012
  • Scott Meyers, Effective C++, More Effective C++, Effective STL, Addison-Wesley, 2005, 1996, and 2001 respectively, and Effective Modern C++, O'Reilly 2014.
  • Herb Sutter, Exceptional C++, Addison-Wesley, 1999
  • David Vandevoorde and Nicolai Josuttis, C++ Templates: The Complete Guide, Second Edition, Addison-Wesley, 2018.

If you buy all of them new, it will run well over $200 in the US and probably more in Europe. Meyers's and Sutter's books are widely available used at a fraction of the cost, and understanding them will make you a much better C++ programmer. There's a fair amount of overlap between Stroustrup and Josuttis. Of the two, Josuttis is much more approachable for the working developer; Stroustrup is a bit academic. OTOH, Josuttis's coverage is narrower, being mostly focused on the Standard Library. Earlier editions of either are cheaper, especially used, and are fine, except that they won't cover the recent developments in the language.

Study in particular templates and generic programming. I found it easier to grasp than rewiring my brain from structured to object oriented design, but that was a long and painful process. The seminal work about that was

  • James Coplien, Advanced C++ Programming Styles and Idioms, Addison-Wesley, 1991

which I found utterly impenetrable, but I haven't tried to read it in a long time. The book that really made it popular is

  • Andrei Alexandrescu, Modern C++ Design, Addison-Wesley, 2001

Stroustrup added a large section on templates in the fourth edition of The C++ Programming Language; earlier editions have a chapter that discusses the feature but doesn't go into much detail about what to do with it. The important thing about templates is that they let one push some of the work and overhead of pure OO onto the compiler, making the compiled result faster and smaller. Templates are also very helpful in reducing dependencies between classes, a major problem with GnuCash's current code. Moreover, most of the Standard Library and Boost are written using templates, and using those libraries effectively depends on a good understanding of templates.
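To make the point about pushing work onto the compiler concrete, here is a minimal sketch contrasting run-time and compile-time polymorphism. The names (Decorator, Brackets, Quotes, label_dynamic, label_static) are hypothetical and not part of GnuCash.

```cpp
#include <cassert>
#include <string>

// Run-time selection: callers depend on the abstract base class and every
// call is dispatched through the vtable.
struct Decorator
{
    virtual ~Decorator() = default;
    virtual std::string decorate(const std::string& s) const = 0;
};

struct Brackets : Decorator
{
    std::string decorate(const std::string& s) const override
    {
        return "[" + s + "]";
    }
};

std::string label_dynamic(const Decorator& d, const std::string& name)
{
    return d.decorate(name);  // virtual call, resolved at run time
}

// Compile-time selection: any type with a suitable decorate() member works.
// No common base class, no vtable, and the call can be inlined.
struct Quotes
{
    std::string decorate(const std::string& s) const
    {
        return "\"" + s + "\"";
    }
};

template <typename DecoratorT>
std::string label_static(const DecoratorT& d, const std::string& name)
{
    return d.decorate(name);  // resolved when the template is instantiated
}
```

label_static() doesn't need to know about the Decorator base class at all, which is the dependency-reducing effect described above; the trade-off is that the concrete type has to be known at compile time.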

One other note about templates: they're much better than preprocessor macros. Macros work in C++ just as they do in C, but they're very clumsy compared to templates. Whenever possible, replace macros with const variables, constexpr functions, and templates. Remember that the compiler can't really understand macros, but it does understand templates.
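A small illustration of why macros are clumsy, assuming C++11: the function-like macro below evaluates its argument twice, while the constexpr replacements are type-checked, evaluate their arguments once, and are usable in constant expressions. All names here are made up for the example.

```cpp
#include <cassert>

// C-style macro: no type checking, and an argument may be evaluated twice.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// C++11 replacements: a constexpr variable and a constexpr function template.
constexpr int max_days_in_month = 31;  // instead of #define MAX_DAYS 31

template <typename T>
constexpr T max_of(T a, T b)
{
    return a > b ? a : b;
}

// Both are checked by the compiler and usable at compile time.
static_assert(max_of(3, 7) == 7, "max_of is a constant expression");
static_assert(max_days_in_month == 31, "a typed constant, not text substitution");

// Expands to ((i++) > (-1) ? (i++) : (-1)): i is incremented twice.
int increments_via_macro()
{
    int i = 0;
    int biggest = MAX_MACRO(i++, -1);
    (void)biggest;
    return i;
}

// The argument expression is evaluated exactly once.
int increments_via_template()
{
    int i = 0;
    int biggest = max_of(i++, -1);
    (void)biggest;
    return i;
}
```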

Also familiarize yourself with the Standard Library Algorithms. These are highly optimized implementations of common things you need to do. You're not likely to have time to write better code, so use the algorithms every chance you get.
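As a sketch of the kind of hand-written loops the algorithms replace, the following uses std::accumulate, std::count_if, and std::find_if over a hypothetical SplitRecord type (a stand-in for illustration, not GnuCash's Split).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical, simplified record for illustration only.
struct SplitRecord
{
    std::string account;
    std::int64_t amount;  // in hundredths of the commodity unit
};

// Sum of all amounts, without a hand-written loop.
std::int64_t total(const std::vector<SplitRecord>& splits)
{
    return std::accumulate(splits.begin(), splits.end(), std::int64_t{0},
                           [](std::int64_t sum, const SplitRecord& s)
                           { return sum + s.amount; });
}

// Number of records with a positive amount.
std::ptrdiff_t positive_count(const std::vector<SplitRecord>& splits)
{
    return std::count_if(splits.begin(), splits.end(),
                         [](const SplitRecord& s) { return s.amount > 0; });
}

// First record for the given account, or nullptr if there is none.
const SplitRecord* find_split(const std::vector<SplitRecord>& splits,
                              const std::string& account)
{
    auto it = std::find_if(splits.begin(), splits.end(),
                           [&account](const SplitRecord& s)
                           { return s.account == account; });
    return it == splits.end() ? nullptr : &*it;
}
```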

If you're new to C++ or haven't used it in a long time, the most highly regarded introductory book is Stanley Lippman, Josée Lajoie, and Barbara Moo, C++ Primer, 5th Edition, Addison-Wesley, 2013.

A couple of other excellent reference sites are

C++11/14/17

A greatly improved language standard was released in 2011, with further improvements in 2014 and 2017. Fortunately, most of those features are already implemented in available compilers or in Boost. This modern C++ is vastly more expressive, easier to use, and safer than the previous C++98 standard, so it's what we'll adopt for the project. While the latest compilers support most of C++17, those compilers aren't necessarily available on all supported versions of operating systems and distros, so 3.0 is limited to C++11 and Boost rather than C++14/17 standard features. The language standard for new work leading to 4.0 is under review. Be careful! Most of the guru books from Stroustrup, Meyers, Sutter, and the rest are pushing C++14 features.
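A short sketch of what staying within C++11 looks like in practice. auto, range-based for, lambdas, and std::unique_ptr are all C++11, but std::make_unique only arrived in C++14, so code limited to C++11 has to construct the unique_ptr directly or use a small helper. The helper make_unique_compat and the AccountNode class are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for std::make_unique, which is C++14.
template <typename T, typename... Args>
std::unique_ptr<T> make_unique_compat(Args&&... args)
{
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

// Hypothetical class used only to exercise the C++11 features.
class AccountNode
{
public:
    AccountNode(std::string name, int depth)
        : m_name(std::move(name)), m_depth(depth) {}
    const std::string& name() const { return m_name; }
    int depth() const { return m_depth; }

private:
    std::string m_name;
    int m_depth;
};

std::vector<std::unique_ptr<AccountNode>> make_accounts()
{
    std::vector<std::unique_ptr<AccountNode>> accounts;
    accounts.push_back(make_unique_compat<AccountNode>("Assets", 0));
    accounts.push_back(make_unique_compat<AccountNode>("Checking", 1));
    accounts.push_back(make_unique_compat<AccountNode>("Savings", 1));
    return accounts;  // moved out; unique_ptr is move-only
}

// auto, a lambda, and a range-based for loop: all available in C++11.
int count_at_depth(const std::vector<std::unique_ptr<AccountNode>>& accounts,
                   int depth)
{
    auto matches = [depth](const AccountNode& a) { return a.depth() == depth; };
    int count = 0;
    for (const auto& acc : accounts)
        if (matches(*acc))
            ++count;
    return count;
}
```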

SWIG

GnuCash exposes a fair amount of the core API to the scripting languages Guile (Scheme) and Python via SWIG. While SWIG is quite capable of understanding most modern C++ constructs (see SWIG and C++11), they aren't necessarily translatable into Scheme or Python, so careful thought is necessary when designing core library implementations. It may be necessary to provide C wrappers for some class functions.
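One common shape for such C wrappers, shown here as a sketch rather than GnuCash's actual API: the C++ class stays hidden behind an opaque handle, each wrapper forwards to a member function with as little logic as possible, and exceptions are caught at the boundary so they never reach C or a SWIG-generated binding. Ledger, LedgerHandle, and the ledger_* functions are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical C++ class whose interface (std::string, exceptions)
// C callers can't use directly.
class Ledger
{
public:
    void add(const std::string& memo, long amount)
    {
        if (memo.empty())
            throw std::invalid_argument("memo must not be empty");
        m_entries.emplace_back(memo, amount);
    }

    long balance() const
    {
        long sum = 0;
        for (const auto& e : m_entries)
            sum += e.second;
        return sum;
    }

private:
    std::vector<std::pair<std::string, long>> m_entries;
};

extern "C"
{
struct LedgerHandle;  // opaque: C code only ever sees pointers to it

LedgerHandle* ledger_new(void)
{
    try { return reinterpret_cast<LedgerHandle*>(new Ledger); }
    catch (...) { return nullptr; }  // e.g. std::bad_alloc must not reach C
}

void ledger_free(LedgerHandle* h)
{
    delete reinterpret_cast<Ledger*>(h);
}

// Returns 0 on success, -1 on bad arguments or if the C++ code threw.
int ledger_add(LedgerHandle* h, const char* memo, long amount)
{
    if (!h || !memo)
        return -1;
    try
    {
        reinterpret_cast<Ledger*>(h)->add(memo, amount);
        return 0;
    }
    catch (...)
    {
        return -1;
    }
}

long ledger_balance(const LedgerHandle* h)
{
    return h ? reinterpret_cast<const Ledger*>(h)->balance() : 0;
}
}
```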

Gtk+

GnuCash uses Gtk+ for its graphical user interface. GnuCash doesn't use the C++ wrapper (gtkmm), so any functions that the GUI needs will require C wrappers.
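GTK callbacks are plain C function pointers plus a gpointer user_data argument (as with g_signal_connect()), so a C++ member function can't be connected directly. A usual workaround is a small C-linkage trampoline that casts user_data back to the object. To keep the sketch self-contained, FakeButton, fake_button_connect(), and fake_button_click() stand in for a GTK widget, g_signal_connect(), and the signal emission; they are hypothetical, as is RegisterView.

```cpp
#include <cassert>

// Stand-in for a C GUI API: stores a function pointer and opaque user data.
extern "C"
{
typedef void (*button_cb)(void* user_data);

struct FakeButton
{
    button_cb callback;
    void* user_data;
};

void fake_button_connect(FakeButton* b, button_cb cb, void* user_data)
{
    b->callback = cb;
    b->user_data = user_data;
}

void fake_button_click(FakeButton* b)
{
    if (b->callback)
        b->callback(b->user_data);
}
}

// The C++ object that should react to the click (hypothetical).
class RegisterView
{
public:
    void on_refresh() { ++m_refresh_count; }
    int refresh_count() const { return m_refresh_count; }

private:
    int m_refresh_count = 0;
};

// C-compatible trampoline: recover the object and forward the call.
// on_refresh() must not throw; wrap it in try/catch if it can, because
// exceptions must not propagate into the C library.
extern "C" void register_view_refresh_cb(void* user_data)
{
    static_cast<RegisterView*>(user_data)->on_refresh();
}
```

The object passed as user_data has to outlive the connection, since the C side holds only a raw pointer to it.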

Dependencies

GLib provides a ton of useful cross-platform support functions, macros, and classes. Almost all of them are replaceable with algorithms and containers from the Standard Library. Most of GnuCash's extensible collections use GList but would be better served by std::vector<>, which is vastly more efficient (see the discussion in Talk:C++). The rest of what GLib and GObject provide, along with a couple of things that we did ourselves (GUID, gnc-date, gnc-numeric, and QofSignal), is provided by the Boost libraries. The goal is to have no dependencies other than the Standard Library and Boost except in the GUI, import-export, and backends.
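A minimal sketch of the GList-to-std::vector replacement for a simple case, sorting and de-duplicating account names. The commented C lines show a typical GLib equivalent of the sort for comparison; sorted_unique_names() is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A typical GLib version (C, for comparison only; sorts but doesn't dedupe):
//   GList* names = NULL;
//   names = g_list_prepend(names, g_strdup("Savings"));
//   ...
//   names = g_list_sort(names, (GCompareFunc)g_strcmp0);
//   g_list_free_full(names, g_free);
//
// The vector owns its strings, frees them automatically, and stores them
// contiguously instead of one heap node per element.
std::vector<std::string> sorted_unique_names(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    names.erase(std::unique(names.begin(), names.end()), names.end());
    return names;
}
```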

The Plan

The first phase of the conversion focuses on three source directories, somewhat in order because there's some interdependence: libqof/qof, backend, and engine.

QOF

  1. Make the module compile in C++. Complete
  2. Replace the internals of some utility classes with Boost template classes.
    1. GncGUID with boost::uuid. Complete
    2. GncNumeric to a higher-precision rational-number class. Complete
    3. GncDate to boost::datetime. Complete
These classes are very independent and so in theory can be converted without a lot of side effects, which will help us to gain experience in the process without a lot of complications.
  1. QofSession and QofBackend: QofSession is somewhat dependent on QofBackend, and as much of its conversion as possible has been done. QofBackend is blocked until the backends are converted.
  2. QofId
  3. QofLog, dependent on GLib log. Replace with a C++ logging library.
  4. Other QOF utility classes
These classes are interdependent with each other and with engine:
  1. QofInstance: Has a multi-responsibility smell and needs to be broken up. This is a base class for all of the persistent engine classes.
  2. QofBook: Should really go in engine.
  3. QofQuery: Should become mostly an abstraction for SQL queries.
  4. QofEvent: Replace this and GLib signals and various hooks with a C++ signal/slot solution.
These classes should no longer be needed after the
  1. QofClass, QofObject, QofChoice: A bizarre object system used for the business objects.
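For QofEvent, the plan calls for a C++ signal/slot solution. One existing option is Boost.Signals2; the hand-rolled Signal class below is only a sketch of the idea with hypothetical names, and it omits the thread safety and automatic connection management a real library provides.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Minimal signal: slots are std::function objects keyed by a connection id
// so that they can be disconnected later. Not thread-safe, and a slot must
// not connect or disconnect while emit() is iterating.
template <typename... Args>
class Signal
{
public:
    using Slot = std::function<void(Args...)>;

    std::size_t connect(Slot slot)
    {
        m_slots.emplace(m_next_id, std::move(slot));
        return m_next_id++;
    }

    void disconnect(std::size_t id) { m_slots.erase(id); }

    void emit(Args... args) const
    {
        for (const auto& entry : m_slots)
            entry.second(args...);
    }

private:
    std::map<std::size_t, Slot> m_slots;
    std::size_t m_next_id = 0;
};
```

A hypothetical use: `Signal<const std::string&, int> account_changed;` lets any number of views connect a lambda, and the engine calls `account_changed.emit(name, flags)` without knowing who is listening, which is the decoupling QofEvent and the GLib hooks currently provide.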

Backends

  1. DBI: Depends on SQL. In Progress
  2. SQL: First pass complete. Still some work to do replacing GLib containers and void*.
  3. XML

Engine

[TBD]

Preparation

  1. Get a GitHub account and fork the GnuCash repository so that you can make pull requests.
  2. Set up a local git repository.
  3. Install and configure Google Test and Google Mock. If you're using Linux, install them with the package manager; GnuCash's build system should then detect them by default.
  4. Build the master branch to make sure that you can. Run make/ninja check.

Requirements

  • Pull requests only, unless you have push access. These changes are too big for patches attached to bug reports.
  • Work from a feature branch and rebase it on master immediately before submitting your PR.
  • Work incrementally: Make as small a change as practical, get it to work and pass tests, and commit. Lather, rinse, repeat until done.
  • Preserve all C API that's still needed by code that hasn't yet been converted, implementing it as wrappers around the C++ code. Make the C wrappers as thin as possible.
  • Replace C API calls in any C++-ready code (source file has a .cpp extension).
  • Testing:
    • If it's feasible to use TDD then do so.
    • If there are existing C tests, by all means use them to validate your work; if not, consider writing tests for the C code before you start converting it to C++.
  • Every commit should build and pass make/ninja check.
  • Follow the Coding Standards.
  • N.B.: the standard file streams (std::ifstream, std::ofstream) cannot handle Windows file paths containing Unicode characters! Use the C file functions instead, and to get Unicode handling on Windows either use GLib's gstdio functions (g_fopen and friends) or steal their innards.
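As a sketch of what borrowing gstdio's approach might involve: on Windows, g_fopen() roughly converts the UTF-8 file name and mode to UTF-16 and calls the CRT's _wfopen(); elsewhere it is essentially fopen(). open_utf8() is a hypothetical helper written under that assumption, using the Win32 MultiByteToWideChar() call for the conversion. Where GLib is already a dependency, calling g_fopen() itself is simpler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Opens a file whose name is UTF-8 encoded (hypothetical helper).
std::FILE* open_utf8(const std::string& path, const char* mode)
{
#ifdef _WIN32
    // Convert UTF-8 to UTF-16; returns an empty string on failure.
    auto to_wide = [](const std::string& s) -> std::wstring {
        int len = MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, nullptr, 0);
        if (len <= 0)
            return std::wstring();
        std::wstring w(static_cast<std::size_t>(len), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, &w[0], len);
        w.resize(static_cast<std::size_t>(len - 1));  // drop the trailing NUL
        return w;
    };
    std::wstring wpath = to_wide(path);
    std::wstring wmode = to_wide(mode);
    if (wpath.empty() || wmode.empty())
        return nullptr;
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // On Linux and macOS file names are byte strings, conventionally UTF-8.
    return std::fopen(path.c_str(), mode);
#endif
}
```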

Design Recommendations

  • Don't waste time with API that isn't immediately used. When you find unused C API, don't reimplement it, remove it.
  • Consider templates instead of class hierarchies when subclass selection can be done at compile time.
  • Keep class hierarchies shallow.
  • Minimize dependencies between classes. They make it harder to change class internals and harder to write good tests. Where they're unavoidable make it easy to substitute mock classes for testing.
  • Write unit tests for your C++ code. When possible test the C code with the tests first so that you can be confident that the C++ implementation doesn't make any unintended changes. Don't forget to test failure conditions as well as success conditions.
  • Maximize use of the C++ standard library, especially the algorithms, and Boost, again subject to the Boost version setting in configure.ac.
  • If there's an STL or Boost algorithm that does what you need, use it. Those guys are better programmers than all of us put together. Take advantage of that. That requires some familiarity with what's available, so do study in particular the STL algorithms.
  • Write type-safe code: Use templates instead of void*.
  • Avoid naked pointers when practical; use std::unique_ptr and std::shared_ptr instead. Note that you can't pass these to C, which doesn't know how to dereference them, so for functions passed to C you must use naked pointers (obtained with get()). Naked pointers may also make sense where they can be allocated and freed in the same function, but beware of exceptions and multiple returns. If you find you've written delete more than once, you need a smart pointer. If your function or anything it calls can throw, you need a smart pointer.
  • Use good judgement with the namespaces std and boost. Prefer using-declarations for specific identifiers (e.g. using std::string;) over importing a whole namespace, so that infrequently used identifiers stay tagged with their namespace. This makes code easier to understand by making clear what is local and what is imported.
  • Acquire resources (e.g. new and open) in constructors and release them (delete and close) in destructors. This is called Resource Acquisition Is Initialization, or RAII, and combined with smart pointers it is a fundamental practice for preventing leaks.
  • Keep thread safety and reentrancy in mind: modern processors are mostly multi-core, and modern operating systems and dispatchers will multi-thread programs on their own if possible. In particular, avoid statics, and provide synchronization (std::mutex, std::lock_guard, std::atomic, etc.) when writing to class (i.e. static) members. These are both particular weaknesses of the existing implementation.
  • Exceptions are OK, but you must make sure that they can't pass into C or out of any interface exposed with SWIG.
  • Avoid adding dependencies if possible. If you must, ensure that their licenses are compatible with GnuCash's and that we comply with any license requirements like crediting the copyright holder in documentation. We may not distribute dependencies on Linux and BSD, but we do on Windows and MacOSX.
  • Don't make radical changes to the class hierarchy at this point, but do make changes that improve design flexibility.
  • Be mindful of patterns and use them to inform your design decisions.
  • Overloading in C++ is good; just remember that C has no overloading, so each overload that C code needs must get its own distinctly named wrapper.
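The RAII and smart-pointer recommendations combine naturally when wrapping a C resource. In this sketch a std::unique_ptr with a custom deleter owns a FILE*, C functions receive the naked pointer through get(), and the file is closed on every exit path, including when write() throws. LineWriter and FileCloser are hypothetical names, and plain fopen() is used for brevity despite the Windows caveat under Requirements.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Deleter that lets std::unique_ptr own a C FILE*.
// (unique_ptr never calls it with nullptr; the check is just defensive.)
struct FileCloser
{
    void operator()(std::FILE* f) const
    {
        if (f)
            std::fclose(f);
    }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// The constructor acquires the file; FilePtr releases it when the
// LineWriter is destroyed, so no explicit destructor or delete is needed.
class LineWriter
{
public:
    explicit LineWriter(const std::string& path)
        : m_file(std::fopen(path.c_str(), "w"))
    {
        if (!m_file)
            throw std::runtime_error("cannot open " + path);
    }

    void write(const std::string& line)
    {
        // C functions get the naked pointer via get(); ownership stays here.
        if (std::fputs(line.c_str(), m_file.get()) == EOF ||
            std::fputc('\n', m_file.get()) == EOF)
            throw std::runtime_error("write failed");
    }

private:
    FilePtr m_file;
};
```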