[Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Maxim Kuvyrkov
This patch adds scripts to contrib/ to migrate the full history of GCC's Subversion repository to Git.  My hope is that these scripts will finally allow the GCC project to migrate to Git.

The result of the conversion is at https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with "@rev" suffixes represent branch points.  The conversion is still running, so not all branches may appear right away.

The scripts are not specific to the GCC repo and are usable for other projects.  In particular, they should be able to convert downstream GCC svn repos.

The scripts convert svn history branch by branch.  They rely on git-svn to convert individual branches.  Git-svn is a good tool for converting individual branches.  It is, however, either very slow at converting the entire GCC repo or goes into an infinite loop.

There are 3 scripts:

- svn-git-repo.sh: top level script to convert entire repo or a part of it (e.g., branches/),
- svn-list-branches.sh: helper script to output branches and their parents in bottom-up order,
- svn-git-branch.sh: helper script to convert a single branch.
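The bottom-up order produced by svn-list-branches.sh amounts to a topological sort: a branch can only be converted once the branch it was copied from has been converted. As a minimal sketch (not the actual script), assuming a hypothetical input of one "PARENT CHILD" pair per line, the standard POSIX `tsort` utility already produces such an order:

```shell
# Order SVN branch paths so that every parent precedes its children.
# Input (hypothetical format): one "PARENT CHILD" pair per line, where
# PARENT is the path the CHILD branch was copied from. Blank lines and
# lines starting with '#' are ignored.
order_branches() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' | tsort
}

# Example with hypothetical branch names; because the pairs form a single
# chain, the output is: trunk, branches/example-branch,
# branches/vendor/example-branch.
printf '%s\n' \
  'branches/example-branch branches/vendor/example-branch' \
  'trunk branches/example-branch' | order_branches
```

`tsort` treats each pair as "first must come before second", so siblings with no ordering between them may come out in any order; only parent-before-child is guaranteed.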

Whenever possible, svn-git-branch.sh uses existing git branches as caches.
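Treating existing git branches as caches is also what lets an interrupted run resume. A minimal sketch of that loop, assuming (hypothetically) that each converted SVN path ends up as the git branch `refs/heads/<path>`, and with `convert_branch` as a placeholder for the per-branch step that svn-git-branch.sh performs:

```shell
# Hypothetical placeholder for the real per-branch conversion
# (svn-git-branch.sh in the patch); here it only reports what it would do.
convert_branch() {
  echo "convert $2"
}

# Read branch paths (parents first) on stdin and convert only those that
# do not yet exist as git branches in REPO, so a restart resumes cheaply.
convert_missing() (
  repo=$1
  while IFS= read -r branch; do
    if git -C "$repo" show-ref --verify --quiet "refs/heads/$branch"; then
      echo "skip $branch"
    else
      # </dev/null keeps the conversion step from consuming the branch list.
      convert_branch "$repo" "$branch" </dev/null || exit 1
    fi
  done
)
```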

What are your questions and comments?

The attached is a cleaned-up version, which hasn't been fully tested yet; typos and other silly mistakes are likely.  OK to commit after testing?

--
Maxim Kuvyrkov
www.linaro.org



Attachment: 0001-Contrib-SVN-Git-conversion-scripts.patch (9K)

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Segher Boessenkool
On Tue, May 14, 2019 at 07:11:18PM +0300, Maxim Kuvyrkov wrote:
> This patch adds scripts to contrib/ to migrate the full history of
> GCC's Subversion repository to Git.  My hope is that these scripts will
> finally allow the GCC project to migrate to Git.

Thank you for doing this.

> The result of the conversion is at
> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with
> "@rev" suffixes represent branch points.  The conversion is still
> running, so not all branches may appear right away.

What exactly is a branch point here?  Why is it useful to have tags
at branch points?  Why did you make branches instead of tags?


Only very lightly tested so far, but it looks promising.


Segher

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Maxim Kuvyrkov
> On May 15, 2019, at 12:20 AM, Segher Boessenkool <[hidden email]> wrote:

>
> On Tue, May 14, 2019 at 07:11:18PM +0300, Maxim Kuvyrkov wrote:
>> This patch adds scripts to contrib/ to migrate the full history of
>> GCC's Subversion repository to Git.  My hope is that these scripts will
>> finally allow the GCC project to migrate to Git.
>
> Thank you for doing this.
>
>> The result of the conversion is at
>> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with
>> "@rev" suffixes represent branch points.  The conversion is still
>> running, so not all branches may appear right away.
>
> What exactly is a branch point here?
A branch point corresponds to the parent branch's revision at the fork.
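In Subversion terms, the branch point is the copy source recorded when the branch was created, which `svn log -v` prints on the changed-path line as `(from /PATH:REV)`. A minimal sketch that turns such lines into the `PATH@REV` form used for the "@rev" names; the function name is hypothetical, and the exact output format should be checked against the svn version in use:

```shell
# Read `svn log -v` output on stdin and print "BRANCH PARENT@REV" for every
# added path that was copied from somewhere. For example, the line
#    A /branches/foo (from /trunk:1234)
# becomes: branches/foo trunk@1234
branch_points() {
  sed -n 's|^ *A /\(.*\) (from /\(.*\):\([0-9][0-9]*\))$|\1 \2@\3|p'
}
```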

>  Why is it useful to have tags
> at branch points?

This is to speed up git-svn, which creates and uses such entries internally.  We need them for the conversion's internals; I deleted them from the GitHub copy to avoid clutter.

>  Why did you make branches instead of tags?

For simplicity; they are internals, after all.

>
> Only very lightly tested so far, but it looks promising.
>
>
> Segher

I've fixed several cleanup bugs.  Updated patch attached.

--
Maxim Kuvyrkov
www.linaro.org



Attachment: 0001-Contrib-SVN-Git-conversion-scripts.patch (9K)

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Richard Biener
In reply to this post by Maxim Kuvyrkov
On Tue, May 14, 2019 at 6:11 PM Maxim Kuvyrkov
<[hidden email]> wrote:

>
> This patch adds scripts to contrib/ to migrate the full history of GCC's Subversion repository to Git.  My hope is that these scripts will finally allow the GCC project to migrate to Git.
>
> The result of the conversion is at https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with "@rev" suffixes represent branch points.  The conversion is still running, so not all branches may appear right away.
>
> The scripts are not specific to the GCC repo and are usable for other projects.  In particular, they should be able to convert downstream GCC svn repos.
>
> The scripts convert svn history branch by branch.  They rely on git-svn to convert individual branches.  Git-svn is a good tool for converting individual branches.  It is, however, either very slow at converting the entire GCC repo or goes into an infinite loop.
>
> There are 3 scripts:
>
> - svn-git-repo.sh: top level script to convert entire repo or a part of it (e.g., branches/),
> - svn-list-branches.sh: helper script to output branches and their parents in bottom-up order,
> - svn-git-branch.sh: helper script to convert a single branch.
>
> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
>
> What are your questions and comments?

Any comments on how it deals with "errors" like removing trunk, which
happened a few times?
(Not sure what other "errors" Eric was referring to that reposurgeon "deals" with...)

I suppose it converts only the history of branches that were not deleted?

For the official converted repo, do we really want all (old) development
branches to be in the main git repo?  I suppose we could create a read-only
git repo from the state of the whole repository at the point of conversion
(and also keep the SVN in read-only mode), just to make migration of
content we want easy in the future?

> The attached is a cleaned-up version, which hasn't been fully tested yet; typos and other silly mistakes are likely.  OK to commit after testing?

Thanks for taking up this ball!

Richard.

> --
> Maxim Kuvyrkov
> www.linaro.org
>
>

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Maxim Kuvyrkov
> On May 15, 2019, at 2:19 PM, Richard Biener <[hidden email]> wrote:
>
> On Tue, May 14, 2019 at 6:11 PM Maxim Kuvyrkov
> <[hidden email]> wrote:
>>
>> This patch adds scripts to contrib/ to migrate the full history of GCC's Subversion repository to Git.  My hope is that these scripts will finally allow the GCC project to migrate to Git.
>>
>> The result of the conversion is at https://github.com/maxim-kuvyrkov/gcc/branches/all . Branches with "@rev" suffixes represent branch points.  The conversion is still running, so not all branches may appear right away.
>>
>> The scripts are not specific to the GCC repo and are usable for other projects.  In particular, they should be able to convert downstream GCC svn repos.
>>
>> The scripts convert svn history branch by branch.  They rely on git-svn to convert individual branches.  Git-svn is a good tool for converting individual branches.  It is, however, either very slow at converting the entire GCC repo or goes into an infinite loop.
>>
>> There are 3 scripts:
>>
>> - svn-git-repo.sh: top level script to convert entire repo or a part of it (e.g., branches/),
>> - svn-list-branches.sh: helper script to output branches and their parents in bottom-up order,
>> - svn-git-branch.sh: helper script to convert a single branch.
>>
>> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
>>
>> What are your questions and comments?
>
> Any comments on how it deals with "errors" like removing trunk, which
> happened a few times?
> (Not sure what other "errors" Eric was referring to that reposurgeon "deals" with...)

Stock git-svn can deal with deleted parents; e.g., for the first deletion of trunk, git-svn treats trunk@180802 as a /generic/ parent path for trunk, and happily follows its history.

>
> I suppose it converts only the history of branches that were not deleted?

The scripts can convert the history of deleted and moved branches.  E.g., branches/gcc-3_2-rhl8-branch was moved (which in svn is a copy followed by a delete) to branches/redhat/gcc-3_2-rhl8-branch around revision 95470, so one would need to point the scripts to branches/gcc-3_2-rhl8-branch@95470 to convert its history.  Something like:

./svn-git-repo.sh --repo $HOME/gcc-branches --svnpath branches/gcc-3_2-rhl8-branch@95470
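The `@95470` suffix is Subversion's peg-revision syntax: it names whatever lived at that path in revision 95470, which is how a branch that was later moved or deleted can still be addressed. A minimal sketch of how a wrapper might split such an argument, with a hypothetical function name (the real svn-git-repo.sh may handle this differently):

```shell
# Split an SVN "PATH@REV" argument into "PATH REV"; without a peg
# revision, default to HEAD. Fails on a non-numeric revision.
split_peg() (
  case $1 in
    *@*)
      path=${1%@*}     # drop the shortest "@..." suffix
      rev=${1##*@}     # keep what follows the last '@'
      case $rev in
        ''|*[!0-9]*) echo "bad revision in '$1'" >&2; exit 1 ;;
      esac
      printf '%s %s\n' "$path" "$rev" ;;
    *)
      printf '%s HEAD\n' "$1" ;;
  esac
)
```

For example, `split_peg branches/gcc-3_2-rhl8-branch@95470` prints `branches/gcc-3_2-rhl8-branch 95470`.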

>
> For the official converted repo, do we really want all (old) development
> branches to be in the main git repo?  I suppose we could create a read-only
> git repo from the state of the whole repository at the point of conversion
> (and also keep the SVN in read-only mode), just to make migration of
> content we want easy in the future?

Having a single full repo is simpler than having the main repo and the full one with all the history.  So, unless the full repo is twice the size of the main one, let's keep all the branches.

We can also give a shout to representatives of Red Hat, Google, and others to voluntarily remove their old maintenance branches from the repo, and, possibly, stash them somewhere on GitHub.

>
>> The attached is a cleaned-up version, which hasn't been fully tested yet; typos and other silly mistakes are likely.  OK to commit after testing?
>
> Thanks for taking up this ball!

--
Maxim Kuvyrkov
www.linaro.org





Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Eric Gallager
On 5/15/19, Maxim Kuvyrkov <[hidden email]> wrote:

>> On May 15, 2019, at 2:19 PM, Richard Biener <[hidden email]>
>> wrote:
>>
>> On Tue, May 14, 2019 at 6:11 PM Maxim Kuvyrkov
>> <[hidden email]> wrote:
>>>
>>> This patch adds scripts to contrib/ to migrate the full history of
>>> GCC's Subversion repository to Git.  My hope is that these scripts will
>>> finally allow the GCC project to migrate to Git.
>>>
>>> The result of the conversion is at
>>> https://github.com/maxim-kuvyrkov/gcc/branches/all . Branches with "@rev"
>>> suffixes represent branch points.  The conversion is still running, so
>>> not all branches may appear right away.
>>>
>>> The scripts are not specific to the GCC repo and are usable for other
>>> projects.  In particular, they should be able to convert downstream GCC
>>> svn repos.
>>>
>>> The scripts convert svn history branch by branch.  They rely on git-svn
>>> to convert individual branches.  Git-svn is a good tool for converting
>>> individual branches.  It is, however, either very slow at converting the
>>> entire GCC repo or goes into an infinite loop.
>>>
>>> There are 3 scripts:
>>>
>>> - svn-git-repo.sh: top level script to convert entire repo or a part of
>>> it (e.g., branches/),
>>> - svn-list-branches.sh: helper script to output branches and their
>>> parents in bottom-up order,
>>> - svn-git-branch.sh: helper script to convert a single branch.
>>>
>>> Whenever possible, svn-git-branch.sh uses existing git branches as
>>> caches.
>>>
>>> What are your questions and comments?
>>
>> Any comments on how it deals with "errors" like removing trunk, which
>> happened a few times?
>> (Not sure what other "errors" Eric was referring to that reposurgeon "deals" with...)
>
> Stock git-svn can deal with deleted parents; e.g., for the first deletion of
> trunk, git-svn treats trunk@180802 as a /generic/ parent path for trunk, and
> happily follows its history.
>
>>
>> I suppose it converts only the history of branches that were not deleted?
>
> The scripts can convert the history of deleted and moved branches.  E.g.,
> branches/gcc-3_2-rhl8-branch was moved (which in svn is a copy followed by
> a delete) to branches/redhat/gcc-3_2-rhl8-branch around revision 95470, so
> one would need to point the scripts to branches/gcc-3_2-rhl8-branch@95470
> to convert its history.  Something like:
>
> ./svn-git-repo.sh --repo $HOME/gcc-branches --svnpath branches/gcc-3_2-rhl8-branch@95470
>
>>
>> For the official converted repo, do we really want all (old) development
>> branches to be in the main git repo?  I suppose we could create a read-only
>> git repo from the state of the whole repository at the point of conversion
>> (and also keep the SVN in read-only mode), just to make migration of
>> content we want easy in the future?
>
> Having a single full repo is simpler than having the main repo and the full
> one with all the history.  So, unless the full repo is twice the size of
> the main one, let's keep all the branches.
>
> We can also give a shout to representatives of Red Hat, Google, and others
> to voluntarily remove their old maintenance branches from the repo, and,
> possibly, stash them somewhere on GitHub.
>
>>
>>> The attached is a cleaned-up version, which hasn't been fully tested yet;
>>> typos and other silly mistakes are likely.  OK to commit after testing?
>>
>> Thanks for taking up this ball!
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>

Wasn't Eric S. Raymond working on his own conversion of the GCC repo
from SVN to Git? Whatever happened to his?

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Segher Boessenkool
In reply to this post by Maxim Kuvyrkov
On Wed, May 15, 2019 at 11:34:34AM +0300, Maxim Kuvyrkov wrote:

> > On May 15, 2019, at 12:20 AM, Segher Boessenkool <[hidden email]> wrote:
> > On Tue, May 14, 2019 at 07:11:18PM +0300, Maxim Kuvyrkov wrote:
> >> This patch adds scripts to contrib/ to migrate the full history of
> >> GCC's Subversion repository to Git.  My hope is that these scripts will
> >> finally allow the GCC project to migrate to Git.
> >
> > Thank you for doing this.
> >
> >> The result of the conversion is at
> >> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with
> >> "@rev" suffixes represent branch points.  The conversion is still
> >> running, so not all branches may appear right away.
> >
> > What exactly is a branch point here?
>
> A branch point corresponds to the parent branch's revision at the fork.
>
> >  Why is it useful to have tags
> > at branch points?
>
> This is to speed up git-svn, which creates and uses such entries internally.  We need them for the conversion's internals; I deleted them from the GitHub copy to avoid clutter.

Ah!  Great.  Looks better now :-)

Has it finished conversion yet?  I don't see all branches.


Segher

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Paul Koning
In reply to this post by Eric Gallager


> On May 15, 2019, at 2:42 PM, Eric Gallager <[hidden email]> wrote:
>
>> ...
>
> Wasn't Eric S. Raymond working on his own conversion of the GCC repo
> from SVN to Git? Whatever happened to his?

Yes, and from what I recall he found that doing it fully correctly is an extremely hard task.  It might be a good idea to ask him to comment.

        paul


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Maxim Kuvyrkov
In reply to this post by Segher Boessenkool
> On May 15, 2019, at 9:47 PM, Segher Boessenkool <[hidden email]> wrote:
>
> On Wed, May 15, 2019 at 11:34:34AM +0300, Maxim Kuvyrkov wrote:
>>> On May 15, 2019, at 12:20 AM, Segher Boessenkool <[hidden email]> wrote:
>>> On Tue, May 14, 2019 at 07:11:18PM +0300, Maxim Kuvyrkov wrote:
>>>> This patch adds scripts to contrib/ to migrate the full history of
>>>> GCC's Subversion repository to Git.  My hope is that these scripts will
>>>> finally allow the GCC project to migrate to Git.
>>>
>>> Thank you for doing this.
>>>
>>>> The result of the conversion is at
>>>> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with
>>>> "@rev" suffixes represent branch points.  The conversion is still
>>>> running, so not all branches may appear right away.
>>>
>>> What exactly is a branch point here?
>>
>> A branch point corresponds to the parent branch's revision at the fork.
>>
>>> Why is it useful to have tags
>>> at branch points?
>>
>> This is to speed up git-svn, which creates and uses such entries internally.  We need them for the conversion's internals; I deleted them from the GitHub copy to avoid clutter.
>
> Ah!  Great.  Looks better now :-)
>
> Has it finished conversion yet?  I don't see all branches.

Still running.  I had to restart it a few times to fix bugs in the corner cases and to speed it up.  Luckily, the scripts seem to be able to pick up where they left off, so restarts are relatively cheap.

For those interested in fixes and changes between script versions, I'm uploading updated patches to https://review.linaro.org/#/c/toolchain/gcc/+/31416/ .

--
Maxim Kuvyrkov
www.linaro.org


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Maxim Kuvyrkov
In reply to this post by Paul Koning
> On May 16, 2019, at 3:33 AM, Paul Koning <[hidden email]> wrote:
>
>
>
>> On May 15, 2019, at 2:42 PM, Eric Gallager <[hidden email]> wrote:
>>
>>> ...
>>
>> Wasn't Eric S. Raymond working on his own conversion of the GCC repo
>> from SVN to Git? Whatever happened to his?
>
> Yes, and from what I recall he found that doing it fully correctly is an extremely hard task.  It might be a good idea to ask him to comment.

That's a good suggestion; thanks, Paul.

Hi Eric,

The svn->git conversion scripts I'm testing work on individual svn branches, and I would appreciate a list of svn branches in GCC's repo that caused problems.  It would be best to double-check conversion of these branches for any artifacts.

Regards,

--
Maxim Kuvyrkov
www.linaro.org


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Jeff Law
In reply to this post by Richard Biener
On 5/15/19 5:19 AM, Richard Biener wrote:
>
> For the official converted repo, do we really want all (old) development
> branches to be in the main git repo?  I suppose we could create a read-only
> git repo from the state of the whole repository at the point of conversion
> (and also keep the SVN in read-only mode), just to make migration of
> content we want easy in the future?
I've always assumed we'd keep the old SVN tree read-only for historical
purposes.  I strongly suspect that, ignoring release branches,
non-active branches just aren't terribly interesting.


Jeff

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Maxim Kuvyrkov
> On May 16, 2019, at 7:22 PM, Jeff Law <[hidden email]> wrote:
>
> On 5/15/19 5:19 AM, Richard Biener wrote:
>>
>> For the official converted repo, do we really want all (old) development
>> branches to be in the main git repo?  I suppose we could create a read-only
>> git repo from the state of the whole repository at the point of conversion
>> (and also keep the SVN in read-only mode), just to make migration of
>> content we want easy in the future?
> I've always assumed we'd keep the old SVN tree read-only for historical
> purposes.  I strongly suspect that, ignoring release branches,
> non-active branches just aren't terribly interesting.

Let's avoid mixing the two discussions: (1) converting svn repo to git (and getting community consensus to switch to git) and (2) deciding on which branches to keep in the new repo.

With git, we can always split away unneeded history by removing unnecessary branches and tags and re-packing the repo.  We can equally easily bring that history back if we change our minds.
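In a local clone, that split can be sketched with plain git commands: delete the unwanted refs, expire the reflogs that still point at the old commits, and let `git gc` prune whatever is no longer reachable. A minimal sketch; the function name is hypothetical, and on a shared server the same refs would also have to be deleted there:

```shell
# Delete the given refs (e.g. refs/heads/old-dev) from the clone at REPO,
# then expire reflogs and gc so the now-unreachable objects are pruned.
drop_refs_and_repack() (
  repo=$1
  shift
  for ref in "$@"; do
    git -C "$repo" update-ref -d "$ref" || exit 1
  done
  # Reflog entries would otherwise keep the old commits alive.
  git -C "$repo" reflog expire --expire=now --all &&
    git -C "$repo" gc --prune=now --quiet
)
```

Bringing such history back is then a matter of fetching the refs again from an archive copy, e.g. `git fetch <archive-url> refs/heads/old-dev:refs/heads/old-dev`, where the archive location is hypothetical.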

--
Maxim Kuvyrkov
www.linaro.org


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Ramana Radhakrishnan [on liliput]
On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
<[hidden email]> wrote:

>
> > On May 16, 2019, at 7:22 PM, Jeff Law <[hidden email]> wrote:
> >
> > On 5/15/19 5:19 AM, Richard Biener wrote:
> >>
> > >> For the official converted repo, do we really want all (old) development
> > >> branches to be in the main git repo?  I suppose we could create a read-only
> > >> git repo from the state of the whole repository at the point of conversion
> > >> (and also keep the SVN in read-only mode), just to make migration of
> > >> content we want easy in the future?
> > I've always assumed we'd keep the old SVN tree read-only for historical
> > purposes.  I strongly suspect that, ignoring release branches,
> > non-active branches just aren't terribly interesting.
>
> Let's avoid mixing the two discussions: (1) converting svn repo to git (and getting community consensus to switch to git) and (2) deciding on which branches to keep in the new repo.
>

I'm hoping that there is still community consensus to switch to git.

Personally speaking, a +1 to switch to git.

regards
Ramana

> With git, we can always split away unneeded history by removing unnecessary branches and tags and re-packing the repo.  We can equally easily bring that history back if we change our minds.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Jeff Law
On 5/16/19 12:36 PM, Ramana Radhakrishnan wrote:

> On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
> <[hidden email]> wrote:
>>
>>> On May 16, 2019, at 7:22 PM, Jeff Law <[hidden email]> wrote:
>>>
>>> On 5/15/19 5:19 AM, Richard Biener wrote:
>>>>
>>>> For the official converted repo, do we really want all (old) development
>>>> branches to be in the main git repo?  I suppose we could create a read-only
>>>> git repo from the state of the whole repository at the point of conversion
>>>> (and also keep the SVN in read-only mode), just to make migration of
>>>> content we want easy in the future?
>>> I've always assumed we'd keep the old SVN tree read-only for historical
>>> purposes.  I strongly suspect that, ignoring release branches,
>>> non-active branches just aren't terribly interesting.
>>
>> Let's avoid mixing the two discussions: (1) converting svn repo to git (and getting community consensus to switch to git) and (2) deciding on which branches to keep in the new repo.
>>
>
> I'm hoping that there is still community consensus to switch to git.
>
> Personally speaking, a +1 to switch to git.
Absolutely +1 for converting as well.

jeff

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Jonathan Wakely
On 16/05/19 13:07 -0600, Jeff Law wrote:

>On 5/16/19 12:36 PM, Ramana Radhakrishnan wrote:
>> On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
>> <[hidden email]> wrote:
>>>
>>>> On May 16, 2019, at 7:22 PM, Jeff Law <[hidden email]> wrote:
>>>>
>>>> On 5/15/19 5:19 AM, Richard Biener wrote:
>>>>>
>>>>> For the official converted repo, do we really want all (old) development
>>>>> branches to be in the main git repo?  I suppose we could create a read-only
>>>>> git repo from the state of the whole repository at the point of conversion
>>>>> (and also keep the SVN in read-only mode), just to make migration of
>>>>> content we want easy in the future?
>>>> I've always assumed we'd keep the old SVN tree read-only for historical
>>>> purposes.  I strongly suspect that, ignoring release branches,
>>>> non-active branches just aren't terribly interesting.
>>>
>>> Let's avoid mixing the two discussions: (1) converting svn repo to git (and getting community consensus to switch to git) and (2) deciding on which branches to keep in the new repo.
>>>
>>
>> I'm hoping that there is still community consensus to switch to git.
>>
>> Personally speaking, a +1 to switch to git.
>Absolutely +1 for converting as well.

Yes please!

Thanks for working on this, Maxim.



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Joseph Myers
In reply to this post by Maxim Kuvyrkov
On Tue, 14 May 2019, Maxim Kuvyrkov wrote:

> The scripts convert svn history branch by branch.  They rely on git-svn
> to convert individual branches.  Git-svn is a good tool for converting
> individual branches.  It is, however, either very slow at converting the
> entire GCC repo or goes into an infinite loop.

I think git-svn is in fact a bad tool for repository conversion when the
history is nontrivial (for the reasons that have been discussed at length
in the past), and we should convert with reposurgeon.

ESR, can you give an update on the status of the conversion with
reposurgeon?  You said "another serious attack on the repository
conversion is probably about two months out" in
<https://gcc.gnu.org/ml/gcc/2018-12/msg00112.html>.  Is it on target to be
done by the time of the GNU Tools Cauldron in Montreal in September?

And, could you bring git://thyrsus.com/repositories/gcc-conversion.git up
to date with changes since Jan 2018, or push the latest version of that
repository to some other public hosting location?  That repository
represents what I consider the collaboratively built consensus on such
things as the desired author map (including handling of the ambiguous
author name), which directories represent branches and tags, and what tags
should be kept or removed - but building up such a consensus and keeping
it up to date over time (for new committers etc.) requires that the public
repository actually reflects the latest version of the conversion
machinery, day by day as the consensus develops.  Review of that
repository will be important for reviewing the details of whether the
conversion is being done as desired - the details of the machinery will
help suggest things to spot-check in a converted repository.

--
Joseph S. Myers
[hidden email]

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Joseph Myers
In reply to this post by Maxim Kuvyrkov
On Thu, 16 May 2019, Maxim Kuvyrkov wrote:

> Let's avoid mixing the two discussions: (1) converting svn repo to git
> (and getting community consensus to switch to git) and (2) deciding on
> which branches to keep in the new repo.
>
> With git, we can always split away unneeded history by removing
> unnecessary branches and tags and re-packing the repo.  We can equally
> easily bring that history back if we change our minds.

A prerequisite of a move to git is to have policies on branch deletion /
force-pushes, and hook implementations that ensure those policies are
followed (as well as implementing what's agreed on commit messages,
Bugzilla updates, etc.).  There has of course been a lot of past
discussion of those that someone will need to find, read and describe the
issues and conclusions from.  I think there was a view that branch
deletion and force-pushes should be limited to a particular namespace for
user branches.
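Such a policy is typically enforced with a server-side `update` hook, which git runs once per pushed ref with the ref name, the old object name and the new object name as arguments (an all-zeros name means the ref is being created or deleted). A minimal sketch of the check, assuming for illustration only that user branches live under `refs/users/`:

```shell
# Policy check for a server-side "update" hook, which git runs once per
# pushed ref as: hooks/update <refname> <old-object> <new-object>.
# Assumption for illustration: personal branches live under refs/users/.
check_ref_update() (
  refname=$1 old=$2 new=$3
  case $refname in
    refs/users/*) exit 0 ;;            # deletion and force-push allowed
  esac
  case $new in
    *[!0]*) ;;                          # not all zeros: ref is kept
    *) echo "deleting $refname is not allowed" >&2; exit 1 ;;
  esac
  case $old in
    *[!0]*) ;;
    *) exit 0 ;;                        # all zeros: a new ref is created
  esac
  # A fast-forward means the old tip is an ancestor of the new one.
  if git merge-base --is-ancestor "$old" "$new"; then
    exit 0
  fi
  echo "non-fast-forward update of $refname is not allowed" >&2
  exit 1
)

# In hooks/update itself: check_ref_update "$1" "$2" "$3" || exit 1
```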

(I support a move to git, but not one using git-svn, and only one that
properly takes into account the large amount of work previously done on
author maps, understanding the repository peculiarities and how to
correctly identify exactly which directories are branches or tags, fixing
cases where there are both a branch and tag of the same name, identifying
which tags to remove and which to keep, etc.)

--
Joseph S. Myers
[hidden email]

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Richard Sandiford
Joseph Myers <[hidden email]> writes:

> On Thu, 16 May 2019, Maxim Kuvyrkov wrote:
>
>> Let's avoid mixing the two discussions: (1) converting svn repo to git
>> (and getting community consensus to switch to git) and (2) deciding on
>> which branches to keep in the new repo.
>>
>> With git, we can always split away unneeded history by removing
>> unnecessary branches and tags and re-packing the repo.  We can equally
>> easily bring that history back if we change our minds.
>
> A prerequisite of a move to git is to have policies on branch deletion /
> force-pushes, and hook implementations that ensure those policies are
> followed (as well as implementing what's agreed on commit messages,
> Bugzilla updates, etc.).  There has of course been a lot of past
> discussion of those that someone will need to find, read and describe the
> issues and conclusions from.  I think there was a view that branch
> deletion and force-pushes should be limited to a particular namespace for
> user branches.

We're not starting from scratch on that, though.  The public git
(semi-)mirror has been going for a long time, so IMO we should just
inherit the policies for that.  (Like you say, forced pushes are
restricted to the user namespace.)  Policies can evolve over time :-)

Agreeing on a format for commit messages would be good, but IMO it's
an improvement separate from the repo discussion.  We don't have an agreed
format for SVN commit messages either, and although it's not ideal,
it doesn't make SVN unworkable.  The same would be true for git.
Whatever policy we come up with can't apply retrospectively anyway,
so the full git history is always going to have a mixture of styles.

And I think that's the major downside of putting so many barriers
in the way of the conversion.  Switching to git without new commit
message guidelines might not be perfect, but if we'd done it two years
ago anyway, people would have been committing (mostly) git-friendly
commits since then, even if the messages weren't very consistent.
Whereas at the moment, many commit messages are neither git-friendly
nor consistent.  And that's going to continue to be the case until
we switch.

So although the intention of these requirements seems to be to make the
final git history as good as it can be, I think in practice it's having
the opposite effect.

> (I support a move to git, but not one using git-svn, and only one that
> properly takes into account the large amount of work previously done on
> author maps, understanding the repository peculiarities and how to
> correctly identify exactly which directories are branches or tags, fixing
> cases where there are both a branch and tag of the same name, identifying
> which tags to remove and which to keep, etc.)

But the discussion upthread seemed to be that having the very old stuff
in git wasn't necessarily that important anyway.

FWIW, I've been using the "official" git-svn based mirror for at least
the last five years, only using SVN to actually commit.  And I've never
needed any of the above during that time.

E.g. having proper author names seems like a nice-to-have rather than
a requirement.  A lot of the effort spent on compiling that list seemed
to be getting names and email addresses for people who haven't contributed
to gcc for a long time (in some cases 20 years or more).  It's interesting
historical data, but in almost all cases, the email addresses used are
going to be defunct anyway.

It would be a really neat project to create a GCC git repo that goes
far back in time and gives the closest illusion possible that git had
been used all that time.  And personally I'd be very interested in
seeing that.  But its main use would be as a historical artefact,
to show how a long-running software project evolved over time.

I think the focus for the development git repo should be on what's
needed for day-to-day work, and like I say, the git-svn mirror we
have now is in practice a good enough conversion for that.  If we can
do better then great.  But I think we're in serious danger of making the
best the enemy of the good here.

The big advantage of Maxim's approach is that it's a public script in
our own repo that anyone can contribute to.  So if there are specific
tweaks people want to make, there's now the opportunity to do that.

So FWIW, my vote would be for having a window to allow people to tweak
the script if they want to, then make the switch.

Thanks,
Richard

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Martin Liška
In reply to this post by Jonathan Wakely
On 5/17/19 12:04 AM, Jonathan Wakely wrote:

> On 16/05/19 13:07 -0600, Jeff Law wrote:
>> On 5/16/19 12:36 PM, Ramana Radhakrishnan wrote:
>>> On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
>>> <[hidden email]> wrote:
>>>>
>>>>> On May 16, 2019, at 7:22 PM, Jeff Law <[hidden email]> wrote:
>>>>>
>>>>> On 5/15/19 5:19 AM, Richard Biener wrote:
>>>>>>
>>>>>> For the official converted repo, do we really want all (old) development
>>>>>> branches to be in the main git repo?  I suppose we could create a read-only
>>>>>> git repo from the state of the whole repository at the point of conversion
>>>>>> (and also keep the SVN in read-only mode), just to make migration of
>>>>>> content we want easy in the future?
>>>>> I've always assumed we'd keep the old SVN tree read-only for historical
>>>>> purposes.  I strongly suspect that, ignoring release branches,
>>>>> non-active branches just aren't terribly interesting.
>>>>
>>>> Let's avoid mixing the two discussions: (1) converting svn repo to git (and getting community consensus to switch to git) and (2) deciding on which branches to keep in the new repo.
>>>>
>>>
>>> I'm hoping that there is still community consensus to switch to git.
>>>
>>> Personally speaking, a +1 to switch to git.
>> Absolutely +1 for converting as well.
>
> Yes please!
>
> Thanks for working on this, Maxim.
>
>

I fully support that, and thank you, Maxim, for working on it!

Martin

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Martin Liška
In reply to this post by Joseph Myers
On 5/17/19 1:06 AM, Joseph Myers wrote:
> That repository
> represents what I consider the collaboratively built consensus on such
> things as the desired author map (including handling of the ambiguous
> author name), which directories represent branches and tags, and what tags
> should be kept or removed - but building up such a consensus and keeping

About the map: I agree with Richard that we should take a best-effort approach
and not try to fully reconstruct the history of people who have switched email
addresses multiple times. I cloned git://thyrsus.com/repositories/gcc-conversion.git
and made a cleanup:

- for logins with duplicate emails, I chose the latest one used on the gcc-patches mailing list
- comments were removed
- a few entries contained a timezone, which I stripped

Final version of the map can be seen here:
https://github.com/marxin/gcc-git-conversion/blob/cleanup/gcc.map

@Maxim: would it be possible to update your script so that it uses
--authors-file=gcc.map?
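git-svn's `--authors-file` expects one `login = Full Name <email>` mapping per line. A minimal sketch of the mechanical part of the cleanup described above (dropping comments and trailing timezone fields), plus a format check. It assumes a timezone is a single whitespace-separated token after the closing `>`, and it does not attempt the choice between duplicate addresses, which was made against gcc-patches:

```shell
# Drop comment and blank lines, and strip a trailing field (e.g. a
# timezone) after the closing '>' of each author entry.
normalize_authors() {
  sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' \
      -e 's/>[[:space:]]*[^[:space:]>]*[[:space:]]*$/>/'
}

# Succeed only if every line looks like "login = Full Name <email>";
# offending lines are printed with their line numbers.
check_authors() {
  ! grep -n -v -E '^[^=[:space:]]+ = [^<>]+ <[^<>]+>$'
}
```

The resulting file can then be handed to `git svn` as `--authors-file=gcc.map` (or `-A gcc.map`).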

Is it desired for the transition to use the author map? Do we want it?

Martin
