Fixing cvs2svn branchpoints


Joseph Myers
As mentioned at the Cauldron, I'm looking at finding better branchpoints
for the cases in the GCC repository where cvs2svn messed up identifying
the parent branch and commit on which a branch was based, so that affected
branches can be reparented as part of moving to git, since messed-up
branchpoints are actually confusing in practice when looking at old
branches.

An idiomatic branch in SVN would start with a commit that just copies one
commit of one branch to another branch, with no further changes.  In many
cases it's not possible to achieve that through reparenting because there
is no commit on any parent branch exactly corresponding to the first
commit on the cvs2svn-generated branch.  However, it's still possible to
find a much better approximation than cvs2svn did in some cases.  (There
are also cases where cvs2svn found a good branchpoint, but represented the
branch-creation commit in a superfluously complicated way, replacing lots
of files and subdirectories by copies of different revisions.  That
doesn't really matter for conversion to git, however, since git's data
structures don't say anything about where a particular subdirectory was
copied from, just the tree hash and the parent commit.)

I'm using heuristics to see if a particular branch has a suspicious
branchpoint.  First, if there is a branchpoint tag I take that as the best
estimate of what the tree should look like at the branchpoint commit on
the parent branch; otherwise, I take the first commit on the branch as the
best estimate of that.  Then, I consider a branchpoint not to be
suspicious if the only diffs between the tree at the parent commit and the
tree estimated to start the branch are file deletions, and, if there was
no branchpoint commit, file additions.
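
A minimal sketch of that check, not the actual script: it assumes each
tree is available as a mapping from file path to content hash (a
hypothetical representation), and it reads "no branchpoint commit" as "no
branchpoint tag was available", which is an interpretation rather than
something stated above.  The name is_suspicious is made up for
illustration.

```python
from typing import Dict

# Hypothetical representation: file path -> content hash.
Tree = Dict[str, str]


def is_suspicious(parent: Tree, estimate: Tree, has_branchpoint_tag: bool) -> bool:
    """Return True if the candidate parent looks like a bad branchpoint.

    parent   -- tree at the candidate parent commit
    estimate -- tree estimated to start the branch (branchpoint tag if
                there is one, otherwise the first commit on the branch)
    """
    # Files present in both trees but with different contents.
    modified = [p for p in parent.keys() & estimate.keys()
                if parent[p] != estimate[p]]
    # Files present only in the estimated starting tree.
    added = estimate.keys() - parent.keys()

    # Deletions (files only in the parent) are always tolerated.
    if modified:
        return True
    # Additions are tolerated only when the estimate came from the first
    # commit on the branch (assumption: i.e. no branchpoint tag).
    if added and has_branchpoint_tag:
        return True
    return False
```

Content changes always make the parent suspicious, which matches the
observation that nearly every normal GCC commit touches at least a
ChangeLog file.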

(There are several reasons why the creation of a branch might involve file
deletions.  Some look like CVS glitches where it simply failed to create
the branch in particular ,v files; some may be cases where the person
created the branch only for certain subdirectories, deliberately; some
look like cases where ,v files for separately developed subdirectories,
e.g. libjava, got moved into the GCC CVS repository at some point,
resulting in the appearance of those subdirectories being deleted on
creation of branches before they were moved into place.  File additions at
branch creation look more like an artifact of how cvs2svn handles cases of
a file first added on trunk after a branch was created, then backported to
that branch.)

If the branchpoint is suspicious (54 are, out of 135 branches in /branches
as of r105925, the last cvs2svn-generated commit), I then look for an
alternative non-suspicious branchpoint, which might be either on the same
parent branch currently used, or on a different one chosen by some
heuristics.  Because pretty much all normal GCC commits change file
contents (modifying a ChangeLog file, if nothing else), any candidate
parent that is non-suspicious, and thus does not involve any file content
differences when compared with the branchpoint commit or first commit on
the branch, should be very close to being the right parent commit.

Here is a list of reparentings I suggest for 16 of those 54 branches,
including in particular the cases of egcs_1_00_branch and gcc-3_2-branch
that were noted on IRC to have bad branchpoints at present; some are only
small changes, some are much more major fixes.  I expect I can find
reparentings for some of the rest with more investigation and improved
heuristics or hints for those heuristics, while others may well already be
essentially the right branchpoint despite file content changes being
present in the first commit.  (Two of the rest do have reparentings
suggested by my script, but they need more careful investigation because
of file content mismatches between the branchpoint tags and the first
commit on the branch.)

The first two columns after REPARENT: list the SVN path of the branch, and
the revision number of the first commit on it (the one that should be
reparented).  The next two list the suspicious parent (that is, the branch
and revision from which cvs2svn generated the copy that created the
top-level /branches/whatever directory for the branch, along with further
changes in the commit to fix up files and subdirectories in that copy to
have the right tree contents).  The final two columns list the proposed
new parent branch and revision on that branch.  In all cases, the tree
content is expected to be left as generated by cvs2svn; it's simply the
commit parent that should be changed in git.

REPARENT: /branches/GC_5_0_ALPHA_1 27860 /trunk 27852 /trunk 27855
REPARENT: /branches/csl-3_3_1-branch 70143 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /branches/csl-3_4-linux-branch 90110 /trunk 75991 /branches/gcc-3_4-branch 90109
REPARENT: /branches/csl-3_4_0-hp-branch 80843 /trunk 75991 /branches/gcc-3_4-branch 80842
REPARENT: /branches/csl-sol210-3_4-branch 87927 /trunk 75991 /branches/gcc-3_4-branch 87903
REPARENT: /branches/cygming331 70683 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /branches/cygming332 73014 /trunk 60111 /branches/cygming331 73013
REPARENT: /branches/cygwin-mingw-gcc-3_1-branch 53609 /trunk 50029 /branches/gcc-3_1-branch 53596
REPARENT: /branches/egcs_1_00_branch 16282 /branches/devo_gcc_testsuite 14842 /trunk 16272
REPARENT: /branches/gcc-2_95_2_1-branch 30162 /trunk 26993 /branches/gcc-2_95-branch 30160
REPARENT: /branches/gcc-3_2-branch 55785 /trunk 50029 /branches/gcc-3_1-branch 55783
REPARENT: /branches/gcc-3_3-rhl-branch 66998 /trunk 60111 /branches/gcc-3_3-branch 66832
REPARENT: /branches/gcc-3_4-e500-branch 89417 /trunk 75991 /branches/gcc-3_4-branch 89410
REPARENT: /branches/gcc-3_4-rhl-branch 81014 /trunk 75991 /branches/gcc-3_4-branch 80870
REPARENT: /branches/gcc-4_0-rhl-branch 95664 /trunk 95533 /branches/gcc-4_0-branch 95655
REPARENT: /branches/libgcj-2_95-branch 27730 /branches/CYGNUS 26267 /trunk 27727
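
For anyone wanting to process these lines mechanically, here is a small
sketch of a parser for the six-column format described above.  The
Reparent type and parse_reparent function are names invented for this
example; they are not part of any existing tool.

```python
from typing import NamedTuple


class Reparent(NamedTuple):
    branch: str      # SVN path of the branch
    first_rev: int   # first commit on the branch (the one to reparent)
    old_parent: str  # suspicious parent branch chosen by cvs2svn
    old_rev: int     # revision on the suspicious parent
    new_parent: str  # proposed new parent branch
    new_rev: int     # proposed revision on the new parent


def parse_reparent(line: str) -> Reparent:
    """Parse one 'REPARENT: ...' line into its six columns."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "REPARENT:":
        raise ValueError(f"not a REPARENT line: {line!r}")
    _, branch, first_rev, old_parent, old_rev, new_parent, new_rev = fields
    return Reparent(branch, int(first_rev), old_parent, int(old_rev),
                    new_parent, int(new_rev))
```

Only the parent link is described by each record; the tree content of
the first commit is left exactly as cvs2svn generated it.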

--
Joseph S. Myers
[hidden email]

Re: Fixing cvs2svn branchpoints

Joseph Myers
Here are complete lists of reparentings I think should be done on the
commits that start branches, along with my notes on branches with messy
initial commits but where I don't think any reparenting should be done.  
The REPARENT: lines have the meaning I described in
<https://gcc.gnu.org/ml/gcc/2019-10/msg00127.html>.

Of the 54 branches with suspicious branchpoints, I have 32 with
automatically suggested or verified reparentings, meeting the criteria
given in that message for the new parent not being suspicious, and 7 more
I think should be reparented although the new parent is still suspicious
in some ways (e.g. because of vendor branch issues or non-atomic
branching).  The remaining 15 suspicious cases are ones where I think the
existing branchpoint is the best one available.

Automatically suggested or verified reparentings:

REPARENT: /branches/GC_5_0_ALPHA_1 27860 /trunk 27852 /trunk 27855
REPARENT: /branches/apple-200511-release-branch 105574 /trunk 95082 /branches/apple-local-200502-branch 105446
REPARENT: /branches/apple-gcc_os_35-branch 90607 /branches/tree-ssa-20020619-branch 79740 /branches/apple-ppc-branch 90334
REPARENT: /branches/apple-tiger-release-branch 96595 /branches/tree-ssa-20020619-branch 79740 /branches/apple-ppc-branch 96593
REPARENT: /branches/bje-unsw-branch 97591 /trunk 95529 /branches/gcc-4_0-branch 97590
REPARENT: /branches/bounded-pointers-branch 33333 /trunk 33317 /trunk 33062
REPARENT: /branches/cfg-branch 46945 /trunk 46940 /trunk 46941
REPARENT: /branches/csl-3_3_1-branch 70143 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /branches/csl-3_4-linux-branch 90110 /trunk 75991 /branches/gcc-3_4-branch 90109
REPARENT: /branches/csl-3_4_0-hp-branch 80843 /trunk 75991 /branches/gcc-3_4-branch 80842
REPARENT: /branches/csl-3_4_3-linux-branch 93879 /trunk 72971 /branches/csl-arm-branch 92959
REPARENT: /branches/csl-arm-2004-q3-branch 90934 /trunk 72971 /branches/csl-arm-branch 90933
REPARENT: /branches/csl-gxxpro-3_4-branch 102442 /trunk 72971 /branches/csl-arm-branch 102441
REPARENT: /branches/csl-sol210-3_4-branch 87927 /trunk 75991 /branches/gcc-3_4-branch 87903
REPARENT: /branches/cygming331 70683 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /branches/cygming332 73014 /trunk 60111 /branches/cygming331 73013
REPARENT: /branches/cygwin-mingw-gcc-3_1-branch 53609 /trunk 50029 /branches/gcc-3_1-branch 53596
REPARENT: /branches/cygwin-mingw-gcc-3_2-branch 55799 /trunk 50029 /branches/cygwin-mingw-gcc-3_1-branch 55797
REPARENT: /branches/cygwin-mingw-gcc-3_2_1-branch 59662 /trunk 50029 /branches/cygwin-mingw-gcc-3_2-branch 59368
REPARENT: /branches/cygwin-mingw-v2-branch 60175 /trunk 50029 /branches/gcc-3_2-branch 59267
REPARENT: /branches/egcs_1_00_branch 16282 /branches/devo_gcc_testsuite 14842 /trunk 16272
REPARENT: /branches/gcc-2_95_2_1-branch 30162 /trunk 26993 /branches/gcc-2_95-branch 30160
REPARENT: /branches/gcc-3_2-branch 55785 /trunk 50029 /branches/gcc-3_1-branch 55783
REPARENT: /branches/gcc-3_3-e500-branch 65902 /trunk 60111 /branches/gcc-3_3-branch 65660
REPARENT: /branches/gcc-3_3-rhl-branch 66998 /trunk 60111 /branches/gcc-3_3-branch 66832
REPARENT: /branches/gcc-3_4-e500-branch 89417 /trunk 75991 /branches/gcc-3_4-branch 89410
REPARENT: /branches/gcc-3_4-rhl-branch 81014 /trunk 75991 /branches/gcc-3_4-branch 80870
REPARENT: /branches/gcc-4_0-rhl-branch 95664 /trunk 95533 /branches/gcc-4_0-branch 95655
REPARENT: /branches/gomp-01-branch 62579 /trunk 62499 /branches/tree-ssa-20020619-branch 62392
REPARENT: /branches/libgcj-2_95-branch 27730 /branches/CYGNUS 26267 /trunk 27727
REPARENT: /branches/struct-reorg-branch 87007 /branches/tree-ssa-20020619-branch 77756 /branches/tree-profiling-branch 86038
REPARENT: /branches/tree-cleanup-branch 87819 /trunk 87795 /trunk 87698

In the case of cfg-branch, the reparenting was suggested automatically
based on the branchpoint tag with a note of possible mismatch between
branch and branchpoint tag; that mismatch appears just to be a vendor
branch artifact and the reparenting seems correct.

Manually identified reparentings (not perfect matches, but seem the
best available and better than the existing parents):

REPARENT: /branches/apple-200508-beta-branch 102941 /trunk 95082 /branches/apple-local-200502-branch 102940
REPARENT: /branches/bnw-simple-branch 56621 /trunk 54811 /branches/tree-ssa-20020619-branch 56620
REPARENT: /branches/egcs_gc_branch 19641 /branches/devo_gcc_testsuite 14842 /trunk 19615
REPARENT: /branches/ffixinc-branch 23624 /branches/devo_gcc_testsuite 14842 /trunk 23622
REPARENT: /branches/gcc-3_2-rhl8-branch 57454 /trunk 50029 /branches/gcc-3_2-branch 56747
REPARENT: /branches/gnu-win32-b20-branch 22525 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 22523
REPARENT: /branches/structure-aliasing-branch 87042 /trunk 86982 /trunk 86980

My notes on branches with imperfect parents to leave as-is, as
considered good-enough after analysis (even where the automatic
process suggested a reparenting):

Vendor branch artifacts, as seen in various branches, are generally
cases where a file was first imported on a CVS vendor branch, and then
successive imports were done there before any non-vendor-branch changes
were made.  That means that for a while revisions 1.1.1.1, 1.1.1.2,
etc. were the HEAD revision, as that's how CVS defines HEAD in that
case.  For some reason, in at least some cases where CVS was used to
create branches while that was the case, it created them based on
revision 1.1 of the file, the first revision, rather than the latest
revision on the vendor branch, meaning that the branch creation commit
involves reverting those files to their initially imported versions.
In the cases where I checked the CVS ,v files, cvs2svn was accurately
representing that peculiarity of the CVS history.

Apart from vendor branch artifacts, the main issue detected by my
script as a possible problem with the branchpoint is CVS branching
being non-atomic so the actual branchpoint spans parts of multiple
separate CVS commits.

BAD: /branches/csl-hpux-branch 73668 /trunk 73667
BAD: /branches/gcc-3_4-branch 76005 /trunk 75991
BAD: /branches/gcc-3_5-integration-branch 75824 /trunk 75823
BAD: /branches/libada-branch 72845 /trunk 72843
BAD: /branches/libobjc-branch 76540 /trunk 76539
  [boehm-gc/{libtool.m4,mkinstalldirs,install-sh} vendor branch
   artifacts]

MISMATCH: /branches/csl-arm-branch 73001 /tags/csl-arm-branchpoint 84258
BAD: /branches/csl-arm-branch 73001 /trunk 72977
  [non-atomic branching, branchpoint appears to span commits in range
   72973 to 72977, plus those vendor branch artifacts]

BAD: /branches/egcs_1_1_branch 21136 /trunk 21131
  [various vendor branch artifacts]

BAD: /branches/gomp-20050608-branch 100901 /trunk 100781
  [non-atomic branching, branchpoint has part but not all of r100781
   commit]

MISMATCH: /branches/hammer-3_3-branch 58890 /tags/hammer-3_3-branchpoint 58860
  [the automatically-suggested reparenting from /trunk:58888 to
   /trunk:58859 would be appropriate to the branchpoint tag - however,
   the branchpoint tag does not match the first commit on the branch
   and the existing parent is appropriate to that first commit, so I
   propose no change to this branch]

BAD: /branches/hot-cold-branch 88785 /trunk 88781
  [non-atomic branching, branchpoint has part but not all of r88784
   commit]

BAD: /branches/ia64-fp-model-branch 89947 /trunk 89945
  [non-atomic branching, branchpoint has part but not all of r89946
   commit]

BAD: /branches/pch-branch 49750 /trunk 48845
  [boehm-gc/Makefile.direct vendor branch artifact]

BAD: /branches/pchmerge-branch 45961 /trunk 45960
  [non-atomic branching, no branchpoint tag, branchpoint appears to
   span commits in range 45925 to 45960]

BAD: /branches/premerge-fsf-branch 14640 /trunk 14639
  [only oddity is branchpoint tag adding files not present on trunk]

BAD: /branches/java-gui-branch 77760 /trunk 77730
  [looks like 77730 is right for directories branched initially,
   other directories only branched four months later]

--
Joseph S. Myers
[hidden email]

Re: Fixing cvs2svn branchpoints

Eric S. Raymond
Joseph Myers <[hidden email]>:
> Here are complete lists of reparentings I think should be done on the
> commits that start branches, along with my notes on branches with messy
> initial commits but where I don't think any reparenting should be done.  
> The REPARENT: lines have the meaning I described in
> <https://gcc.gnu.org/ml/gcc/2019-10/msg00127.html>.

Please leave this as an issue on the gcc-conversion bugtracker.

Your timing is interesting.  Happens I got my first full conversion
with the Go port of reposurgeon earlier today.  I'm trying to verify
the conversion against the Subversion repository, but a full checkout
filled a filesystem on the EC2 instance I'm using. Recovery is
underway.

I'll do real benchmarks when I'm not staring at a deadline, but the
Go port is at least 20x faster than the Python was.  That makes
the conversion practical, though it turns out the 128GB on my
desktop machine isn't enough to support it - hence the EC2 instance.

The first full conversion took eight hours.  Turns out the single most
computationally expensive part of the surgery is data-mining ChangeLog
files for commit attributions.  Today I threw massive parallelism at
the problem, that being something far easier to do in Go than in Python
- I think that might cut as much as two hours from the next run.

By going to the cloud I've gotten a larger working-set capacity at the
cost of some memory-access speed.  Didn't want to do that, but
your repo is just too damn big for it to be otherwise, unless somebody
wants to drop cash on me to double the RAM in the Great Beast.

Your pile of requests is tricky but should be doable.

You had previously written:

>There are also cases where cvs2svn found a good branchpoint, but
>represented the branch-creation commit in a superfluously complicated
>way, replacing lots of files and subdirectories by copies of different
>revisions.

Yes, reposurgeon has logic to detect and deal with this automatically.
The assumption it makes is that the branch should root to the most
recent revision that CVS did a copy from. This is simple and seems to
give satisfactory results.

Which reminds me. I found a bunch of "svnmerge-integrated" properties
in the history. Should I treat those as though they were mergeinfo
properties and make branch merges from them?
--
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



Re: Fixing cvs2svn branchpoints

Joseph Myers
On Fri, 1 Nov 2019, Eric S. Raymond wrote:

> Joseph Myers <[hidden email]>:
> > Here are complete lists of reparentings I think should be done on the
> > commits that start branches, along with my notes on branches with messy
> > initial commits but where I don't think any reparenting should be done.  
> > The REPARENT: lines have the meaning I described in
> > <https://gcc.gnu.org/ml/gcc/2019-10/msg00127.html>.
>
> Please leave this as an issue on the gcc-conversion bugtracker.

Done.  <https://gitlab.com/esr/gcc-conversion/issues/1>.

As noted there, I think this ought just to be a single reposurgeon
reparent command for each of those 39 REPARENT lines, but I'm wary of
adding those 39 reparent commands in a merge request without testing, and
don't have any systems with more than 128 GB of memory to hand to test on.

(Incidental note: I'm taking the reparent syntax from the reposurgeon
sources, that command doesn't seem to be documented in reposurgeon.adoc
although it used to be documented in reposurgeon.xml.)

A similar issue may well apply to some tags, since tags and branches are
essentially the same thing in SVN, and I hope to make such checks for tags
as well.

> >There are also cases where cvs2svn found a good branchpoint, but
> >represented the branch-creation commit in a superfluously complicated
> >way, replacing lots of files and subdirectories by copies of different
> >revisions.
>
> Yes, reposurgeon has logic to detect and deal with this automatically.
> The assumption it makes is that the branch should root to the most
> recent revision that CVS did a copy from. This is simple and seems to
> give satisfactory results.

Once we have a full conversion we should extract details from it of the
branch roots reposurgeon identified, for further checks on them.

There are lots of mid-branch commits that also have commit messages of the
form "This commit was manufactured by cvs2svn to create branch 'X'".  
Those mid-branch commits should *not* be turned into merge commits.  The
typical situation resulting in such a mid-branch commit was that a file
(typically a testcase) first created on HEAD then got backported to a
branch, so cvs2svn means that commit created the branch *for that
particular file* (so it's typically part of a cherry-pick, not a merge,
though some CVS-era merges may have created such commits as well).

> Which reminds me. I found a bunch of "svnmerge-integrated" properties
> in the history. Should I treat those as though they were mergeinfo
> properties and make branch merges from them?

I think that's what those properties logically are, so making them into
merges makes sense if that's easy to do.

--
Joseph S. Myers
[hidden email]

Re: Fixing cvs2svn branchpoints

Joseph Myers
And here are corresponding lists of tags where the commit cvs2svn
generated for the tag should be reparented.  The semantics are exactly the
same as for branches (change the parent of that commit without changing
the tree contents).  In many but not all cases, the reparenting may result
in the commit for the tag no longer having any changes compared to its
parent commit.

Automatic:

REPARENT: /tags/egcs_1_0_1_prerelease 17185 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_00_branch 17184
REPARENT: /tags/egcs_1_0_1_release 17283 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_00_branch 17282
REPARENT: /tags/egcs_1_0_2_980309_prerelease 18446 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_00_branch 18445
REPARENT: /tags/egcs_1_0_2_prerelease 18370 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_00_branch 18369
REPARENT: /tags/egcs_1_0_2_release 18616 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_00_branch 18615
REPARENT: /tags/egcs_1_0_3_prerelease 19382 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_00_branch 19381
REPARENT: /tags/egcs_1_0_3_release 19764 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_00_branch 19763
REPARENT: /tags/egcs_1_0_release 16926 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_00_branch 16925
REPARENT: /tags/egcs_1_1_1_pre 23463 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 23462
REPARENT: /tags/egcs_1_1_1_prerelease 23487 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 23486
REPARENT: /tags/egcs_1_1_1_prerelease_2 23595 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 23594
REPARENT: /tags/egcs_1_1_1_prerelease_3 23825 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 23824
REPARENT: /tags/egcs_1_1_1_release 24056 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 24055
REPARENT: /tags/egcs_1_1_2_prerelease_1 25236 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 25235
REPARENT: /tags/egcs_1_1_2_prerelease_2 25405 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 25404
REPARENT: /tags/egcs_1_1_2_prerelease_3 25634 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 25633
REPARENT: /tags/egcs_1_1_2_release 25764 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 25763
REPARENT: /tags/egcs_1_1_prerelease 22098 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 22097
REPARENT: /tags/egcs_1_1_release 22148 /branches/devo_gcc_testsuite 14842 /branches/egcs_1_1_branch 22147
REPARENT: /tags/gcc-2_95-release 28338 /trunk 26993 /branches/gcc-2_95-branch 28337
REPARENT: /tags/gcc-2_95_1-release 28722 /trunk 26993 /branches/gcc-2_95-branch 28721
REPARENT: /tags/gcc-2_95_2-release 30161 /trunk 26993 /branches/gcc-2_95-branch 30160
REPARENT: /tags/gcc-2_95_2_1-release 38099 /trunk 26993 /branches/gcc-2_95_2_1-branch 38098
REPARENT: /tags/gcc-2_95_3 40553 /trunk 26993 /branches/gcc-2_95-branch 40552
REPARENT: /tags/gcc-2_95_3-test1 38596 /trunk 26993 /branches/gcc-2_95-branch 38595
REPARENT: /tags/gcc-2_95_3-test2 38947 /trunk 26993 /branches/gcc-2_95-branch 38946
REPARENT: /tags/gcc-2_95_3-test3 39266 /trunk 26993 /branches/gcc-2_95-branch 39265
REPARENT: /tags/gcc-2_95_3-test4 39882 /trunk 26993 /branches/gcc-2_95-branch 39881
REPARENT: /tags/gcc-2_95_3-test5 40410 /trunk 26993 /branches/gcc-2_95-branch 40409
REPARENT: /tags/gcc-2_95_test 28256 /trunk 26993 /branches/gcc-2_95-branch 28255
REPARENT: /tags/gcc_3_0_1_release 45040 /trunk 39596 /branches/gcc-3_0-branch 45039
REPARENT: /tags/gcc_3_0_2_release 46438 /trunk 39596 /branches/gcc-3_0-branch 46437
REPARENT: /tags/gcc_3_0_3_release 48213 /trunk 39596 /branches/gcc-3_0-branch 48212
REPARENT: /tags/gcc_3_0_4_release 49907 /trunk 39596 /branches/gcc-3_0-branch 49906
REPARENT: /tags/gcc_3_0_release 43431 /trunk 39596 /branches/gcc-3_0-branch 43430
REPARENT: /tags/gcc_3_1_1_release 55766 /trunk 50029 /branches/gcc-3_1-branch 55765
REPARENT: /tags/gcc_3_2_1_release 59268 /trunk 50029 /branches/gcc-3_2-branch 59267
REPARENT: /tags/gcc_3_2_2_release 62431 /trunk 50029 /branches/gcc-3_2-branch 62430
REPARENT: /tags/gcc_3_2_3_release 65932 /trunk 50029 /branches/gcc-3_2-branch 65931
REPARENT: /tags/gcc_3_2_release 56290 /trunk 50029 /branches/gcc-3_2-branch 56289
REPARENT: /tags/gcc_3_3_1_release 70145 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /tags/gcc_3_3_2_release 72569 /trunk 60111 /branches/gcc-3_3-branch 72568
REPARENT: /tags/gcc_3_3_3_release 77826 /trunk 60111 /branches/gcc-3_3-branch 77825
REPARENT: /tags/gcc_3_3_4_release 82514 /trunk 60111 /branches/gcc-3_3-branch 82513
REPARENT: /tags/gcc_3_3_5_release 88340 /trunk 60111 /branches/gcc-3_3-branch 88339
REPARENT: /tags/gcc_3_3_6_release 99150 /trunk 60111 /branches/gcc-3_3-branch 99149
REPARENT: /tags/gcc_3_3_release 66792 /trunk 60111 /branches/gcc-3_3-branch 66791
REPARENT: /tags/gcc_3_4_0_release 80844 /trunk 75991 /branches/gcc-3_4-branch 80842
REPARENT: /tags/gcc_3_4_1_release 83996 /trunk 75991 /branches/gcc-3_4-branch 83995
REPARENT: /tags/gcc_3_4_2_release 87129 /trunk 75991 /branches/gcc-3_4-branch 87128
REPARENT: /tags/gcc_3_4_3_release 90112 /trunk 75991 /branches/gcc-3_4-branch 90109
REPARENT: /tags/gcc_3_4_4_release 99965 /trunk 75991 /branches/gcc-3_4-branch 99964
REPARENT: /tags/gcc_4_0_0_release 98492 /trunk 95529 /branches/gcc-4_0-branch 98491
REPARENT: /tags/gcc_4_0_1_release 101728 /trunk 95529 /branches/gcc-4_0-branch 101726

Manual:

REPARENT: /tags/gcc-2_8_1-RELEASE 18395 /trunk 14564 /branches/premerge-fsf-branch 18394
REPARENT: /tags/gcc_3_1_release 54955 /trunk 50029 /branches/gcc-3_1-branch 53469
REPARENT: /tags/gcc_4_0_2_release 104510 /trunk 95529 /branches/gcc-4_0-branch 104479
REPARENT: /tags/libgcj-2_95-release 28380 /branches/CYGNUS 26267 /branches/libgcj-2_95-branch 28379
REPARENT: /tags/libgcj-2_95_1-release 28802 /branches/CYGNUS 26267 /branches/libgcj-2_95-branch 28801

--
Joseph S. Myers
[hidden email]

Re: Fixing cvs2svn branchpoints

Eric S. Raymond
Joseph Myers <[hidden email]>:
> And here are corresponding lists of tags where the commit cvs2svn
> generated for the tag should be reparented.

Make that issue 2, please.  Also, open an issue 3 about how you want those
mid-branch deletes handled.  I agree that the right thing is just to nuke
them, but I have a lot of plates in the air right now...

Also please open reposurgeon issues about the svnmerge properties and the
missing documentation.  I might get to the svnmerge thing today, it
should be a trivial tweak.

The repository comparison is still grinding.  It has turned up some
content mismatches, fewer than last time, most in trunk/libgo.

The reason for the "fewer" is that the Go version has learned how to
correctly handle a corner case the Python did not - tag/branch delete
followed by a recreation at a different root point.  That's why this
is commented out in the lift script:

# Squash accidental trunk deletion and recreation.
# Should no longer be needed due to branch recoloring.
#<130803.1>,<138077>,<184996.1> squash

I used to have to find defects like that by hand and patch them. Now
there's a recoloring phase where branches and tags with multiple
creations are handled by renaming all but the last such branch in each
clique to a unique nonce name.  This makes all the results from branch
copies come out right, and none of the nonce names are ever visible in
the final conversion.

I'll go dive into the defect analysis now.
--
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



Re: Fixing cvs2svn branchpoints

Joseph Myers
On Sat, 2 Nov 2019, Eric S. Raymond wrote:

> Joseph Myers <[hidden email]>:
> > And here are corresponding lists of tags where the commit cvs2svn
> > generated for the tag should be reparented.
>
> Make that issue 2, please.

Done.

> Also, open an issue 3 about how you want those
> mid-branch deletes handled.  I agree that the right thing is just to nuke
> them, but I have a lot of plates in the air right now...

Which mid-branch deletes?  For the ones by accident (e.g. the deletions of
trunk), where the branch was recreated by copying from the pre-deletion
version of the same branch, nuking the deletes is clearly right.  For the
ones where a branch was deleted then recreated as a copy not from the
deleted version - essentially, rebasing done in SVN - maybe we need
community discussion of the right approach.  (There are two plausible
approaches there - either just discard all the deleted versions that
aren't part of the SVN history of the most recent creation of the branch,
which makes the list of commits in the branch's history in git look
similar to what it looks like in SVN, or treat deletion + recreation in
that case as some kind of merge.)

> Also please open reposurgeon issues about the svnmerge properties

As I understand it, support for that has now been implemented.

> and the missing documentation.

https://gitlab.com/esr/reposurgeon/issues/151 filed - it's a lot more than
just reparent for which documentation appears to have disappeared.

--
Joseph S. Myers
[hidden email]

Re: Fixing cvs2svn branchpoints

Eric S. Raymond
Joseph Myers <[hidden email]>:

> Which mid-branch deletes?  For the ones by accident (e.g. the deletions of
> trunk), where the branch was recreated by copying from the pre-deletion
> version of the same branch, nuking the deletes is clearly right.  For the
> ones where a branch was deleted then recreated as a copy not from the
> deleted version - essentially, rebasing done in SVN - maybe we need
> community discussion of the right approach.  (There are two plausible
> approaches there - either just discard all the deleted versions that
> aren't part of the SVN history of the most recent creation of the branch,
> which makes the list of commits in the branch's history in git look
> similar to what it looks like in SVN, or treat deletion + recreation in
> that case as some kind of merge.)

To get content right, reposurgeon has to run through all nodes looking for
branches with more than one creation.  For each such clique, it has to change
all instances but the last so that the branch has a unique nonce name,
then run forward and patch all copy references to each such branch to use
the nonce name.

Only the last branch in each clique will be visible (and not renamed)
in the git conversion.  But the earlier branches can't simply be
nuked, as they might be (and typically are) referenced by branch
copies done before the final branch in the clique was created.
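
To make the renaming concrete, here is a toy sketch of the idea in
Python.  It is not reposurgeon's code: the Creation record, the recolor
function and the "@nonce" naming scheme are all invented for
illustration, and the only events modeled are branch creations with an
optional copy source.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Optional


class Creation(NamedTuple):
    rev: int                  # revision in which the branch is created
    branch: str               # branch name being created
    copy_from: Optional[str]  # branch it was copied from, if any


def recolor(creations: List[Creation]) -> List[Creation]:
    """Rename all but the last creation of each branch to a nonce name
    and patch copy references to point at the right incarnation."""
    total = Counter(c.branch for c in creations)
    seen: Counter = Counter()
    # Original branch name -> name of the incarnation live right now.
    current: Dict[str, str] = {}
    out = []
    for c in sorted(creations, key=lambda c: c.rev):
        # Resolve the copy source before this creation takes effect, so a
        # branch recreated from itself refers to its previous incarnation.
        source = None
        if c.copy_from is not None:
            source = current.get(c.copy_from, c.copy_from)
        seen[c.branch] += 1
        if seen[c.branch] < total[c.branch]:
            # Hypothetical nonce format; only has to be unique.
            name = f"{c.branch}@nonce{seen[c.branch]}"
        else:
            name = c.branch  # last creation keeps the real name
        current[c.branch] = name
        out.append(Creation(c.rev, name, source))
    return out
```

With creations of foo at revisions 10 and 30 and a copy from foo to bar
at revision 20, the first foo gets a nonce name and bar's copy source
is patched to that nonce name, while a copy made after revision 30
still refers to plain foo.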

This might sound like it will get the special case of a trunk
delete/recreate wrong.  But when git imports a stream it does its own
branch recoloring based on tip resets and parent-child relationships;
we can expect trunk to be (effectively) re-colored back to the root commit.

(This whole mess around branch re-creation is something other
conversion tools don't even try to get right.)

The other case - where you delete a target branch and copy a different
source branch over it - is simpler.  Because branch names in the
git conversion are controlled by the SVN repository pathname (root becomes
master, branches/foo becomes branch foo, etc), this looks exactly like
an ordinary modification of the target branch.

Presently, the fact of the copy is not recorded in the DAG. I could express
it as a git merge link; that wouldn't be difficult.

> > Also please open reposurgeon issues about the svnmerge properties
>
> As I understand it, support for that has now been implemented.

It has, yes.

> > and the missing documentation.
>
> https://gitlab.com/esr/reposurgeon/issues/151 filed - it's a lot more than
> just reparent for which documentation appears to have disappeared.

A large chunk of the section on surgical commands vanished, probably
due to a finger error while I was editing the translation.  I have
restored it.
--
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>