User:Lmat

From GnuCash
Jump to: navigation, search

Bayes GUID Work

Version Behavior
2.7.2 Converts bayes import map to GUID, and sets GNC_FEATURE_GUID_BAYESIAN. Reads and writes only GUIDs. Sets the GUID_BAYESIAN feature when the conversion is run, or anything is added to the import map.
2.6.18 When reading the import map, if a record is found under account name, it will be used, otherwise if an account is found by GUID, it will be used. When writing to the import map, if a slot is not found under the account name but is found under the GUID, the GUID record is used.

The Old Plan

In the next 2.7 release (2.7.3?), if GNC_FEATURE_GUID_BAYESIAN is not set, a conversion will take place to change all the import maps to guid and flat[1]. Only this 2.7 release will set the feature flat.

The next 2.6 release (2.6.19?), when GNC_FEATURE_GUID_BAYESIAN is set, all reads and writes will be done in the new style. If it is not, all reads and writes will be in the old style.

The Problem

If books are created in 2.7.2, then bayes entries are added in 2.6.18, the import map will be mixed. If this is then read by either the new 2.7 release or the new 2.6 release, they will fail to get all information.

The New Plan

The next 2.7 release (2.7.3?) should have a new feature: GNC_FEATURE_FLAT_GUID_BAYESIAN. It will only be set by this and subsequent releases. When it is set, the import maps will be converted.

The next 2.6 release (2.6.19?) will be able to read the feature GNC_FEATURE_FLAT_GUID_BAYESIAN.

If books are created in 2.7.2, then edited in 2.6.18 to make a mixed-import-map situation, then loaded in the new 2.7 release, what should happen?

gjanssens: When the feature flag protection is properly implemented this can't happen. If the books are created with 2.7.3 and the user has created bayesian imap entries in that version, GNC_FEATURE_FLAT_GUID_BAYESIAN will be set. When you then try to open that book with 2.6.18, it will refuse to do so because 2.6.18 doesn't know this feature. So in this case 2.6.18 can't create a mixed-import-map situation any more. If on the other hand 2.7.2 didn't add guid based import maps the feature flag won't be set by 2.7.2. This means gnucash 2.6.18 will be able to open this file as there is no data format conflict. If subsequently the user adds import maps in 2.6.18, they will all be in the old format. A subsequent run of 2.7.2 on that file will then find account name based import maps in in it while GNC_FEATURE_FLAT_GUID_BAYESIAN not set. Hence it will run a conversion of all import maps and set the feature flag. From that point on the file will no longer be opened by 2.6.18 (full circle). Note it's crucial here the feature flag should only be set if there actually are flat/guid based import maps in the data file. Otherwise you'd be preventing 2.6.18 from reading the file even if it didn't have any import maps at all. Yet in that case the file is perfectly readable by 2.6.18 and the user should be allowed to do so.

jralls: Not quite. If the file is created in 2.7.0 - 2.7.2 and the user did an import with bayes matching then GNC_FEATURE_GUID_BAYESIAN (no FLAT) will be set; if the file was created in an earlier version and has bayes import maps with account names then those get converted and GNC_FEATURE_GUID_BAYESIAN is set and the account name entries are converted to guids. Trying to open either file with Gnucash older than 2.6.12 will fail because the feature is unknown. Opening the file with 2.6.12-2.6.18 will read the guids OK and if it needs to write new ones will clear the feature flag so that opening it with 2.7.0-2.7.2 will know to convert the new entries and reset the feature flag.

So 2.7.3 and 2.6.19 will need to handle three possible situations: 1)GNC_FEATURE_GUID_BAYESIAN is not set, where some or all of the entries have account names instead of guids. Those that do need to have the account names changed to guids and all entries need to be flattened. 2) GNC_FEATURE_GUID_BAYESIAN is set, so all of the entries will have guids and will need only to be flattened. 3) GNC_FEATURE_FLAT_GUID_BAYESIAN is set, no conversion required.

2.6.19 should write account names and not-flat if neither flag is set and guids and flat if GNC_FEATURE_FLAT_GUID_BAYESIAN is set. Where GNC_FEATURE_GUID_BAYESIAN is set it could either unset it and write account names and not flat or leave it and write guid and not flat.

Once the GNC_FEATURE_FLAT_GUID_BAYESIAN flag is set by 2.7.3 neither 2.7.0-2.7.2 nor anything older than 2.6.19 will be able to open it for writing.

Thank you for your replies. The implication thus far is that the answer to my present question is "yes", but I would like to make it explicit: is it important that we support data files from unstable versions? Lmat (talk) 01:17, 5 December 2017 (UTC)
Well, your question was "what should happen?" and I think I laid that out for both versions. Whether it's worthwhile to implement all three cases is a bit of a probability game. It's entirely possible that there are zero instances of case 2 or of case 1b, where GNC_FEATURE_GUID_BAYESIAN isn't set but some of the entries are still already in guid form. You can finesse the latter by including instructions that if there's Bayesian data and it's ever been opened by 2.7.[012] that one needs to open it in 2.7.[012] and save before opening it in 2.7.3. Or we can say that since we warned about not using production data that any book that's been opened with 2.7.[012] should be discarded if it has Bayes matching data. Or handle all 3 (4?) cases. All are acceptable options, but we have to pick one and if it't not the last one we have to say what to do in the release notes. John Ralls (talk) 03:46, 5 December 2017 (UTC)
My reasoning silently assumed we don't support data files manipulated by beta versions. If we can with a reasonable effort, that would be the most user-friendly way. It seems to me supporting all 3/4 uses cases adds quite some complexity in both 2.6.19 and the 2.8 series which we have to carry around for quite some time. I haven't looked too deeply in the code though. Second best would be to add some code that detects unsupported situations and tell the user how to resolve it if it's too complicated to write code for all possible conversions. The simplest form is to consider files with import maps manipulated with 2.7.[0-2] as unsupported as this feature only existed in early betas of the 2.8 development cycle. In that case we could even drop GNC_FEATURE_GUID_BAYESIAN from 2.7.3 as well, so anything more recent will refuse to open these files as well. That should be accompanied with a clear release note message though. I'm having a hard time deciding which is best. Perhaps we should poll our user base to get an idea of how many people discarded our advice ? Gjanssens (talk) 09:04, 5 December 2017 (UTC)
Thanks for making that clear. I think it won't be too bad to cover all situations, and that's the way I'll go for now. Lmat (talk) 12:12, 5 December 2017 (UTC)

A Difficult Scenario

  1. A user creates books in 2.7.3 but doesn't create import map data. This will set a run-once conversion flag for flat KVP.
  2. The user opens those books in 2.6.15 and creates import map data.
  3. The user opens the books in 2.7.3. 2.7.3 should convert the books at this point, but how will it know to do so?


Possible Solution?

If there are no import data, the run-once conversion flag could not be set in 2.7.3. That means, every time 2.7.3 opens the books, if it hasn't seen import map data, it will check the import map to see if there are any data. If the user never creates import map data, the user will have to pay this backward compatibility performance tax each time he opens his books. In this case, the run once conversion flag is exactly the same as the GUID_FLAT_BAYES feature flag, so the run once conversion flat is actually superfluous in this scheme. I know startup performance is a touchy subject. Hopefully I'm missing another way.

I always assumed there to be only one flag, not a separate run once conversion flag. If the user never creates import map data, the time to test would be negligible, no ? So in this particular case I don't expect much of a performance hit. But I do agree in general the situation is suboptimal as there may be tests in the future that would be much more time consuming. Perhaps we should have gnucash write its version to the data file as well. That way we always know which version of gnucash has last written to the file and we can make assumptions based on this version. In you scenario above: if you open a data file and it says it was last saved with 2.7.3, you don't have to check import map data because if it is there a previous run in 2.7.3 will have converted it. If on the other hand the version number stored in the file is missing (which means a gnucash version older than 2.6.19 has last touched the file) or less than 2.7.3 you go through to check sequence as described above (with different actions depending on which flags are set). The 2.6.19 code continues to behave as John laid out. I am tempted to have it write its version number into the data file as well, but at first sight there's not much we can deduce from it for that version. It would just be for consistency and starting the new custom. Gjanssens (talk) 10:22, 7 December 2017 (UTC)
I'm strongly hesitant to add more database flags. These things are dashed tricky to get all the details right and grow more complicated exponentially. The thing I hate most about the Possible Solution above is that the people who pay the highest price are the people who don't use the import map. (I'm not sure how much they have to pay, but it does require navigating the entire account structure and looking through their KVPs.) To make sure you only pay if you use the feature, I'm thinking about doing the check on first use of the import map. Any objections or thoughts to that? I'll flesh it out a bit now below. Lmat (talk) 21:41, 8 December 2017 (UTC)

Second Solution

When a file is opened in 2.7.3 nothing is done. Any time the import map is accessed (to find an account or write/update an entry), the database is checked for the GUID_FLAT_BAYES feature flag. If it is not found, the import map data are converted to the guid, flat style, and that feature flag is set. In this case (import map data are present and no GUID_FLAT_BAYES feature flag is set), the data are known not to be flat and either in the pre-2.6.16 style with account names, or in the 2.7.{1,2} style with guids.

Of course, after this flag is set, only the newer 2.6 releases can read and write the data, and earlier versions will refuse based on the feature flag.

This seems workable, modulo pre 2.6.19 vs. pre 2.6.16. But you'd better get on it, the 2.6.19 release is next weekend. John Ralls (talk) 03:36, 9 December 2017 (UTC)
As John already says, this can work. Independent of whether you want to use it for your current conversion, I'm still in favour or adding a "last-written-by-version" as a book kvp because it may help us to centralize all migrations/conversions between book formats and format versions in the future rather than having it spread all throughout the code with only very limited overhead (only conversions between the last-written-by-version and the current version are necessary and opening the file by older versions tend to happen less over time). Even if we don't use it right now it can become a very valuable data file parameter in the future. Whether you want to use it for your current feature flag is your choice. Gjanssens (talk) 10:39, 9 December 2017 (UTC)

  1. Only the bayes import maps are "flat"; the "regular" import maps retain their dimensionality.