Update:Remora Migration
Contents
The Addon Version to Files to Application Versions Problem
'Resolved'
In view of the discussion at Update:Remora_Schema#versions.approval, the migration script copies the Versions tables naively: each row of the old database 'version' table maps to a row in the new database 'versions' table as well as a row in the 'applications_versions' and 'files' tables. This results in rows in the 'versions' table that differ only by their primary key id. Instead, this duplication should be eliminated: effectively identical rows should be consolidated under one primary key. Unfortunately, this means that entries in the 'applications_versions' table must violate their primary key constraints.
Discussion
Consider the following example which illustrates the problems in mapping the old data into the new database.
Old Database 'version' table for addon 300. All three rows appear to refer to the same version of the addon. The first row has min/max appversions that make no sense but that is another issue. There are examples where all the min/max appversions make sense yet still demonstrate the problem. The second and third rows are duplicates of each other, except for their DateAdded and DateUpdated columns (not shown).
Old Database 'version' table:
vID | ID | Version | OSID | AppID | MinAppVer | MaxAppVer | URI | ... |
---|---|---|---|---|---|---|---|---|
906 | 300 | 0.3 | 1 | 1 | 0.9 | 0.10 | ...biofox-0.3-fx.xpi | ... |
1253 | 300 | 0.3 | 1 | 1 | 0.9 | 1.0 | ...biofox-0.3-fx+fx.xpi | ... |
1316 | 300 | 0.3 | 1 | 1 | 0.9 | 1.0 | ...biofox-0.3-fx+fx.xpi | ... |
How do I make sense of these entries?
These enries should make only one new row in the new database 'versions' table:
id | addon_id | version | ... |
---|---|---|---|
906 | 300 | 0.3 | ... |
Since we have two different filenames, we should have two entries in the 'files' table, though there is an issue as to which DateAdded to use from the duplicated rows.
new database 'files' table:
id | version_id | platform_id | filename | ... |
---|---|---|---|---|
m | 906 | 1 | ...biofox-0.3-fx.xpi | ... |
m+1 | 906 | 1 | ...biofox-0.3-fx+fx.xpi | ... |
Unfortunately, this leaves a problem for the new database 'applications_versions' table. These two entries violate the table's unique primary key constraint.
new database 'applications_versions' table:
application_id | version_id | min | max |
---|---|---|---|
1 | 906 | 0.9 | 0.10 |
1 | 906 | 0.9 | 1.0 |
Solutions Proposals
1 -- Schema Modification
REJECTED
Is the 'applications_versions' table not correct for the model? The version ranges seem to apply to files not addon versions. Should the table be replaced with an 'applications_files' table?
2 -- Cull Bad Data
ACCEPTED
The problematic sources here may represent simple bad data. Since addons will me migrated manually, don't migrate the ones that have this problem. As a rule, when encountering multiple entries in the old database for the same addon version and os, take only the latest one.
3 -- Naive Migration
REJECTED
Return to the original naive method of migration. Each row from the old database 'version' table gets mapped to a single row in each of the new database's 'versions', 'files' and 'applications_versions'.
While this eliminates the problem, it does not correctly follow the new model's intent. It leads to multiple version rows that differ only by primary key. The approved column can only refer to a single file rather than all files for a given version.
The Application Version Problem (Resolved)
The decision to convert the applications_versions.min and applications_versions.max columns from strings into actual foreign keys into the appversions table presents a conversion problem. The original min and max version strings in the old database were not closely policed and do not necessarily map to actual application versions.
How do we make a programmatic mapping from fuzzy application version strings to real ones?
Discussion
Addon #1980 has two versions (both designated as version 1.0, but that is a different problem). The first version of the addon is for Firefox 1.0 through 1.6. Because 1.6 does not exist, migrating that version fails because there is no valid foreign key value for FF1.6. The second version of this addon is for Firefox 1.0 through 1.5.1. Well, that version of Firefox doesn't exist either, so that version fails to migrate.
Many addon version entries have faulty version numbers.
Complicating this issue is the incompleteness of the appversions table. This table is populated by finding the unique entries in the original database's applications table. Not all versions of all products are actually in this table. Should this table be made more complete? What would be the source of the complete data?
The Solutions
I'm quite open to having someone give some input into this section.
1 -- Drop Versions
REJECTED
If a version has bad min or max application version strings, drop that version from the migration.
What if an addon then has no version entries at all?
That leaves the clean up the appversions version strings question unanswered...
2 -- A Mapping
ACCEPTED
Since the migration script is a one-off program (only used once and then thrown away) making a complicated and flexible-intelligent-fuzzy-version-conversion-system isn't necessary. There aren't that many distinct erroneous versions. Manually make a map of Max&MinVer strings to the versions available in the appversions table. When encountering a mystery version, look it up and proceed.
The 'appversions' table is originally created from the old database applications table. The version strings found there are translated into rationality using a lookup table. See the first page of the attached spreadsheet.
As the 'versions' table is populated from the old database, the min and max supported versions for any given application is translated into rationality by lookup table. See the second and third pages of the attached spreadsheet to see the translations. If a version is not found in the existing 'appversions' table, it is added.