.. Mehdi Amini | d6afe38 | 2016-10-12 23:02:02 +0000

==============================
Moving LLVM Projects to GitHub
==============================

.. contents:: Table of Contents
  :depth: 4
  :local:

Introduction
============

This is a proposal to move our current revision control system from our own
hosted Subversion to GitHub. Below are the financial and technical arguments as
to why we are proposing such a move and how people (and validation
infrastructure) will continue to work with a Git-based LLVM.

There will be a survey pointing at this document which we'll use to gauge the
community's reaction and, if we collectively decide to move, the time-frame. Be
sure to make your view count.

Additionally, we will discuss this during a BoF at the next US LLVM Developer
meeting (http://llvm.org/devmtg/2016-11/).

What This Proposal is *Not* About
=================================

Changing the development policy.

This proposal relates only to moving the hosting of our source-code repository
from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
using GitHub's issue tracker, pull-requests, or code-review.

Contributors will continue to earn commit access on demand under the Developer
Policy, except that a GitHub account will be required instead of SVN
username/password-hash.

Why Git, and Why GitHub?
========================

Why Move At All?
----------------

This discussion began because we currently host our own Subversion server
and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
provides limited support, but there is only so much it can do.

Volunteers are not sysadmins themselves, but compiler engineers who happen
to know a thing or two about hosting servers. We also don't have 24/7 support,
and we sometimes wake up to see that continuous integration is broken because
the SVN server is either down or unresponsive.

We should take advantage of one of the services out there (GitHub, GitLab,
and BitBucket, among others) that offer better service (24/7 stability, disk
space, Git server, code browsing, forking facilities, etc.) for free.

Why Git?
--------

Many new coders nowadays start with Git, and a lot of people have never used
SVN, CVS, or anything else. Websites like GitHub have changed the landscape
of open source contributions, reducing the cost of first contribution and
fostering collaboration.

Git is also the version control system many LLVM developers use. Despite the
sources being stored in an SVN server, these developers are already using Git
through the Git-SVN integration.

Git allows you to:

* Commit, squash, merge, and fork locally without touching the remote server.
* Maintain local branches, enabling multiple threads of development.
* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
* Inspect the repository history (blame, log, bisect) without Internet access.
* Maintain remote forks and branches on Git hosting services and
  integrate back to the main repository.

In addition, because Git seems to be replacing many OSS projects' version
control systems, there are many tools that are built over Git.
Future tooling may support Git first (if not only).

Why GitHub?
-----------

GitHub, like GitLab and BitBucket, provides free code hosting for open source
projects. Any of these could replace the code-hosting infrastructure that we
have today.

These services also have a dedicated team to monitor, migrate, improve and
distribute the contents of the repositories depending on region and load.

GitHub has one important advantage over GitLab and
BitBucket: it offers read-write **SVN** access to the repository
(https://github.com/blog/626-announcing-svn-support).
This would enable people to continue working post-migration as though our code
were still canonically in an SVN repository.

In addition, there are already multiple LLVM mirrors on GitHub, indicating that
part of our community has already settled there.

On Managing Revision Numbers with Git
-------------------------------------

The current SVN repository hosts all the LLVM sub-projects alongside each other.
A single revision number (e.g. r123456) thus identifies a consistent version of
all LLVM sub-projects.

Git does not use sequential integer revision numbers but instead uses a hash to
identify each commit. (Linus mentioned that the lack of such a revision number
is "the only real design mistake" in Git [TorvaldRevNum]_.)

The loss of a sequential integer revision number has been a sticking point in
past discussions about Git:

- "The 'branch' I most care about is mainline, and losing the ability to say
  'fixed in r1234' (with some sort of monotonically increasing number) would
  be a tragic loss." [LattnerRevNum]_
- "I like those results sorted by time and the chronology should be obvious, but
  timestamps are incredibly cumbersome and make it difficult to verify that a
  given checkout matches a given set of results." [TrickRevNum]_
- "There is still the major regression with unreadable version numbers.
  Given the amount of Bugzilla traffic with 'Fixed in...', that's a
  non-trivial issue." [JSonnRevNum]_
- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_

However, Git can emulate this increasing revision number:
``git rev-list --count <commit-hash>``. This identifier is unique only
within a single branch, but this means the tuple `(num, branch-name)` uniquely
identifies a commit.

We can thus use this revision number to ensure that e.g. `clang -v` reports a
user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
the objections raised above with respect to this aspect of Git.
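
As a minimal sketch of this emulation (the throwaway repository, the
``<branch>-<num>`` format, and the variable names are illustrative assumptions,
not part of the proposal), the following counts the commits reachable from
``HEAD`` and pairs the count with the branch name:

```shell
set -eu
# Identity used for the demo commits only.
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

# Throwaway repository with three commits on a branch named "master".
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
for n in 1 2 3; do
  git -C "$repo" commit -q --allow-empty -m "commit $n"
done

# Number of commits reachable from HEAD: the emulated revision number.
num=$(git -C "$repo" rev-list --count HEAD)
branch=$(git -C "$repo" rev-parse --abbrev-ref HEAD)

# The (num, branch-name) tuple rendered as a user-friendly identifier.
rev_id="$branch-$num"
echo "$rev_id"
```

As long as the branch history stays linear and is never rewritten, this count
only grows as commits land, so the pair behaves much like an SVN revision
number for that branch.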

What About Branches and Merges?
-------------------------------

In contrast to SVN, Git makes branching easy. Git's commit history is
represented as a DAG, a departure from SVN's linear history. However, we propose
to forbid merge commits in our canonical Git repository.

Unfortunately, GitHub does not support server-side hooks to enforce such a
policy. We must rely on the community to avoid pushing merge commits.

GitHub offers a feature called `Status Checks`: a branch protected by
`status checks` requires commits to be whitelisted before the push can happen.
We could supply a pre-push hook on the client side that would run and check the
history, before whitelisting the commit being pushed [statuschecks]_.
However, this solution would be somewhat fragile (how do you update a script
installed on every developer machine?) and prevents SVN access to the
repository.
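
The history check that such a client-side hook would perform can be sketched as
follows. This is only an illustration under stated assumptions: the temporary
repository, the ``topic`` branch, and the ``base`` variable are hypothetical,
and a real pre-push hook would take the range from the refs Git passes to it
rather than from a hard-coded base commit.

```shell
set -eu
# Identity used for the demo commits only.
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

# Throwaway repository whose master branch contains one merge commit.
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)
git checkout -q -b topic
git commit -q --allow-empty -m "topic work"
git checkout -q master
git commit -q --allow-empty -m "mainline work"
git merge -q --no-ff --no-edit topic

# Count the merge commits in the range that is about to be pushed.
merges=$(git rev-list --merges --count "$base"..HEAD)
if [ "$merges" -gt 0 ]; then
  echo "rejected: $merges merge commit(s) in the pushed range"
fi
```

A count of zero would mean the pushed range is linear and could be whitelisted.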

What About Commit Emails?
-------------------------

We will need a new bot to send emails for each commit. This proposal leaves the
email format unchanged besides the commit URL.

Straw Man Migration Plan
========================

Step #1: Before The Move
------------------------

1. Update docs to mention the move, so people are aware of what is going on.
2. Set up a read-only version of the GitHub project, mirroring our current SVN
   repository.
3. Add the required bots to implement the commit emails, as well as the
   umbrella repository update (if the multirepo is selected) or the read-only
   Git views for the sub-projects (if the monorepo is selected).

Step #2: Git Move
-----------------

4. Update the buildbots to pick up updates and commits from the GitHub
   repository. Not all bots have to migrate at this point, but it'll help
   provide infrastructure testing.
5. Update Phabricator to pick up commits from the GitHub repository.
6. LNT and llvmlab have to be updated: they rely on a unique, monotonically
   increasing integer across branches [MatthewsRevNum]_.
7. Instruct downstream integrators to pick up commits from the GitHub
   repository.
8. Review and prepare an update for the LLVM documentation.

Until this point nothing has changed for developers; it will just
boil down to a lot of work for buildbot and other infrastructure
owners.

The migration will pause here until all dependencies have cleared, and all
problems have been solved.

Step #3: Write Access Move
--------------------------

9. Collect developers' GitHub account information, and add them to the project.
10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
11. Update the documentation.
12. Mirror Git to SVN.

Step #4: Post Move
------------------

13. Archive the SVN repository.
14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
    point to GitHub instead.

One or Multiple Repositories?
=============================

There are two major variants for how to structure our Git repository: The
"multirepo" and the "monorepo".

Multirepo Variant
-----------------

This variant recommends moving each LLVM sub-project to a separate Git
repository. This mimics the existing official read-only Git repositories
(e.g., http://llvm.org/git/compiler-rt.git), and creates new canonical
repositories for each sub-project.

This will allow the individual sub-projects to remain distinct: a
developer interested only in compiler-rt can check out only this repository,
build it, and work in isolation from the other sub-projects.

A key need is to be able to check out multiple projects (e.g. lldb+clang+llvm or
clang+llvm+libcxx) at a specific revision.

A tuple of revisions (one entry per repository) accurately describes the state
across the sub-projects.
For example, a given version of clang would be
*<LLVM-12345, clang-5432, libcxx-123, etc.>*.

Umbrella Repository
^^^^^^^^^^^^^^^^^^^

To make this more convenient, a separate *umbrella* repository will be
provided. This repository will be used for the sole purpose of understanding
the sequence in which commits were pushed to the different repositories and to
provide a single revision number.

This umbrella repository will be read-only and continuously updated
to record the above tuple. The proposed form to record this is to use Git
[submodules]_, possibly along with a set of scripts to help check out a
specific revision of the LLVM distribution.

A regular LLVM developer does not need to interact with the umbrella repository
-- the individual repositories can be checked out independently -- but you would
need to use the umbrella repository to bisect multiple sub-projects at the same
time, or to check out old revisions of LLVM with another sub-project at a
consistent state.

This umbrella repository will be updated automatically by a bot (running on
notice from a webhook on every push, and periodically) on a per-commit basis: a
single commit in the umbrella repository would match a single commit in a
sub-project.
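
One bot step could look roughly like the sketch below, which records a
sub-project commit in the umbrella as a submodule-style entry (a gitlink). The
repository names, the commit message format, and the use of
``git update-index --cacheinfo`` instead of a full ``git submodule add`` are
assumptions made to keep the example self-contained; a real umbrella would also
carry a ``.gitmodules`` file describing where each sub-project lives.

```shell
set -eu
# Identity used for the demo commits only.
export GIT_AUTHOR_NAME=bot GIT_AUTHOR_EMAIL=bot@example.com
export GIT_COMMITTER_NAME=bot GIT_COMMITTER_EMAIL=bot@example.com

work=$(mktemp -d)

# Stand-in sub-project repository with a single new commit.
git -c init.defaultBranch=master init -q "$work/llvm"
git -C "$work/llvm" commit -q --allow-empty -m "llvm: some change"
sub_sha=$(git -C "$work/llvm" rev-parse HEAD)

# The umbrella records that commit as a gitlink (mode 160000) at path "llvm":
# one umbrella commit per sub-project commit.
git -c init.defaultBranch=master init -q "$work/umbrella"
cd "$work/umbrella"
git update-index --add --cacheinfo 160000,"$sub_sha",llvm
git commit -q -m "Update llvm to $sub_sha"

# Read back the recorded sub-project commit and the umbrella revision number.
recorded=$(git ls-tree HEAD llvm | awk '{print $3}')
umbrella_rev=$(git rev-list --count HEAD)
echo "umbrella r$umbrella_rev -> llvm $recorded"
```

The umbrella's own commit count then plays the role of the single revision
number, while each entry pins one sub-project to an exact commit.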

Living Downstream
^^^^^^^^^^^^^^^^^

Downstream SVN users can use the read/write SVN bridges with the following
caveats:

* Be prepared for a one-time change to the upstream revision numbers.
* The upstream sub-project revision numbers will no longer be in sync.

Downstream Git users can continue without any major changes, with the minor
change of upstreaming using `git push` instead of `git svn dcommit`.

Git users also have the option of adopting an umbrella repository downstream.
The tooling for the upstream umbrella can easily be reused for downstream needs,
incorporating extra sub-projects and branching in parallel with sub-project
branches.

Multirepo Preview
^^^^^^^^^^^^^^^^^

As a preview (disclaimer: this is a rough prototype, not polished and not
representative of the final solution), you can look at the following:

* Repository: https://github.com/llvm-beanz/llvm-submodules
* Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/

Concerns
^^^^^^^^

* Because GitHub does not allow server-side hooks, and because there is no
  "push timestamp" in Git, the umbrella repository sequence isn't totally
  exact: commits from different repositories pushed around the same time can
  appear in different orders. However, we don't expect it to be the common case
  or to cause serious issues in practice.
* You can't have a single cross-project commit that would update both LLVM and
  other sub-projects (something that can be achieved now). It would be possible
  to establish a protocol whereby users add a special token to their commit
  messages that causes the umbrella repo's updater bot to group all of them
  into a single revision.
* Another option is to group commits that were pushed closely enough together
  in the umbrella repository. This has the advantage of allowing cross-project
  commits, and is less sensitive to mis-ordering commits. However, this has the
  potential to group unrelated commits together, especially if the bot goes
  down and needs to catch up.
* This variant relies on heavier tooling. But the current prototype shows that
  it is not out-of-reach.
* Submodules don't have a good reputation and complicate the command line.
  However, in the proposed setup, a regular developer will seldom interact with
  submodules directly, and certainly never update them.
* Refactoring across projects is not friendly: taking some functions from clang
  to make them part of a utility in libSupport wouldn't carry the history of the
  code in the llvm repo, preventing recursively applying `git blame` for
  instance. However, this is not very different from how most people are
  interacting with the repository today, by splitting such a change into
  multiple commits.

Workflows
^^^^^^^^^

* :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
* :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-multicheckout-nocommit>`.
* :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-multicheckout-multicommit>`.
* :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
* :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-multi-branching>`.
* :ref:`Bisecting <workflow-multi-bisecting>`.

Monorepo Variant
----------------

This variant recommends moving all LLVM sub-projects to a single Git repository,
similar to https://github.com/llvm-project/llvm-project.
This would mimic an export of the current SVN repository, with each sub-project
having its own top-level directory.
Not all sub-projects are used for building toolchains. In practice, www/
and test-suite/ will probably stay out of the monorepo.

Putting all sub-projects in a single checkout makes cross-project refactoring
naturally simple:

* New sub-projects can be trivially split out for better reuse and/or layering
  (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
  dependency on LLVM).
* Changing an API in LLVM and upgrading the sub-projects will always be done in
  a single commit, designing away a common source of temporary build breakage.
* Moving code across sub-projects (during refactoring for instance) in a single
  commit enables accurate `git blame` when tracking code change history.
* Tooling based on `git grep` works natively across sub-projects, making it
  easier to find refactoring opportunities across projects (for example reusing
  a data structure initially in LLDB by moving it into libSupport).
* Having all the sources present encourages maintaining the other sub-projects
  when changing an API.

Finally, the monorepo maintains the property of the existing SVN repository that
the sub-projects move synchronously, and a single revision number (or commit
hash) identifies the state of the development across all projects.

.. _build_single_project:

Building a single sub-project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Nobody will be forced to build unnecessary projects. The exact structure
is TBD, but making it trivial to configure builds for a single sub-project
(or a subset of sub-projects) is a hard requirement.

As an example, it could look like the following::

  mkdir build && cd build
  # Configure only LLVM (default)
  cmake path/to/monorepo
  # Configure LLVM and lld
  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
  # Configure LLVM and clang
  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang

.. _git-svn-mirror:

Read/write sub-project mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With the Monorepo, the existing single-subproject mirrors (e.g.
http://llvm.org/git/compiler-rt.git) with git-svn read-write access would
continue to be maintained: developers would continue to be able to use the
existing single-subproject git repositories as they do today, with *no changes
to workflow*. Everything (git fetch, git svn dcommit, etc.) could continue to
work identically to how it works today. The monorepo can be set up such that the
SVN revision number matches the SVN revision in the GitHub SVN bridge.

Living Downstream
^^^^^^^^^^^^^^^^^

Downstream SVN users can use the read/write SVN bridge. The SVN revision
number can be preserved in the monorepo, minimizing the impact.

Downstream Git users can continue without any major changes, by using the
git-svn mirrors on top of the SVN bridge.

Git users can also work upstream with monorepo even if their downstream
fork has split repositories. They can apply patches in the appropriate
subdirectories of the monorepo using, e.g., `git am --directory=...`, or
plain `diff` and `patch`.
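
A minimal sketch of the ``git am --directory`` route follows. The two temporary
repositories, the ``README.txt`` file, and the commit messages are stand-ins
invented for the example; only the ``--directory`` option itself comes from the
text above.

```shell
set -eu
# Identity used for the demo commits only.
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)

# Stand-in for a downstream split repository that holds only clang.
git -c init.defaultBranch=master init -q "$work/clang"
cd "$work/clang"
echo "base" > README.txt
git add README.txt
git commit -q -m "clang: initial import"
echo "downstream fix" >> README.txt
git commit -q -am "clang: downstream fix"
git format-patch -1 --stdout > "$work/fix.patch"

# Stand-in for the monorepo, where clang lives in a subdirectory.
git -c init.defaultBranch=master init -q "$work/monorepo"
cd "$work/monorepo"
mkdir clang
echo "base" > clang/README.txt
git add clang/README.txt
git commit -q -m "monorepo: initial import"

# Apply the split-repo patch with every path prefixed by clang/.
git am -q --directory=clang "$work/fix.patch"
last_line=$(tail -n 1 clang/README.txt)
commit_count=$(git rev-list --count HEAD)
```

The patch keeps its author and message; only the paths are rewritten, so the
same approach should work for a series of patches exported with
``git format-patch``.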

Alternatively, Git users can migrate their own fork to the monorepo. As a
demonstration, we've migrated the "CHERI" fork to the monorepo in two ways:

* Using a script that rewrites history (including merges) so that it looks
  like the fork always lived in the monorepo [LebarCHERI]_. The upside of
  this is when you check out an old revision, you get a copy of all llvm
  sub-projects at a consistent revision. (For instance, if it's a clang
  fork, when you check out an old revision you'll get a consistent version
  of llvm proper.) The downside is that this changes the fork's commit
  hashes.

* Merging the fork into the monorepo [AminiCHERI]_. This preserves the
  fork's commit hashes, but when you check out an old commit you only get
  the one sub-project.

Monorepo Preview
^^^^^^^^^^^^^^^^

As a preview (disclaimer: this is a rough prototype, not polished and not
representative of the final solution), you can look at the following:

* Full Repository: https://github.com/joker-eph/llvm-project
* Single sub-project view with *SVN write access* to the full repo:
  https://github.com/joker-eph/compiler-rt

Concerns
^^^^^^^^

* Using the monolithic repository may add overhead for those contributing to a
  standalone sub-project, particularly on runtimes like libcxx and compiler-rt
  that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
  1GB for the monorepo), and the commit rate of LLVM may cause more frequent
  `git push` collisions when upstreaming. Affected contributors can continue to
  use the SVN bridge or the single-subproject Git mirrors with git-svn for
  read-write.
* Using the monolithic repository may add overhead for those *integrating* a
  standalone sub-project, even if they aren't contributing to it, due to the
  same disk space concern as the point above. The availability of the
  sub-project Git mirror addresses this, even without SVN access.
* Preservation of the existing read/write SVN-based workflows relies on the
  GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
  into GitHub and could restrict future workflow changes.

Workflows
^^^^^^^^^

* :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
* :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-monocheckout-nocommit>`.
* :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
* :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
* :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
* :ref:`Bisecting <workflow-mono-bisecting>`.

Multi/Mono Hybrid Variant
-------------------------

This variant recommends moving only the LLVM sub-projects that are *rev-locked*
to LLVM into a monorepo (clang, lld, lldb, ...), following the multirepo
proposal for the rest. While neither variant recommends combining sub-projects
like www/ and test-suite/ (which are completely standalone), this goes further
and keeps sub-projects like libcxx and compiler-rt in their own distinct
repositories.

Concerns
^^^^^^^^

* This has most disadvantages of multirepo and monorepo, without bringing many
  of the advantages.
* Downstream users have to upgrade to the monorepo structure, but only
  partially. So they will keep the infrastructure to integrate the other
  separate sub-projects.
* All projects that use LIT for testing are effectively rev-locked to LLVM.
  Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang.
  It's not clear where to draw the lines.


Workflow Before/After
=====================

This section goes through a few examples of workflows, intended to illustrate
how end-users or developers would interact with the repository for
various use-cases.

.. _workflow-checkout-commit:

Checkout/Clone a Single Project, without Commit Access
------------------------------------------------------

Except for the URL, nothing changes. The possibilities today are::

  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
  # or with Git
  git clone http://llvm.org/git/llvm.git

After the move to GitHub, you would do either::

  git clone https://github.com/llvm-project/llvm.git
  # or using the GitHub svn native bridge
  svn co https://github.com/llvm-project/llvm/trunk

The above works for both the monorepo and the multirepo, as we'll maintain the
existing read-only views of the individual sub-projects.

Checkout/Clone a Single Project, with Commit Access
---------------------------------------------------

Currently
^^^^^^^^^

::

  # direct SVN checkout
  svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
  # or using the read-only Git view, with git-svn
  git clone http://llvm.org/git/llvm.git
  cd llvm
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l  # -l avoids fetching ahead of the git mirror.

Commits are performed using `svn commit` or with the sequence `git commit` and
`git svn dcommit`.

.. _workflow-multicheckout-nocommit:

Multirepo Variant
^^^^^^^^^^^^^^^^^

With the multirepo variant, nothing changes but the URL, and commits can be
performed using `svn commit` or `git commit` and `git push`::

  git clone https://github.com/llvm/llvm.git llvm
  # or using the GitHub svn native bridge
  svn co https://github.com/llvm/llvm/trunk/ llvm

.. _workflow-monocheckout-nocommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

With the monorepo variant, there are a few options, depending on your
constraints. First, you could just clone the full repository::

  git clone https://github.com/llvm/llvm-projects.git llvm
  # or using the GitHub svn native bridge
  svn co https://github.com/llvm/llvm-projects/trunk/ llvm

At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
:ref:`doesn't imply you have to build all of them <build_single_project>`. You
can still build only compiler-rt for instance. In this respect it is no
different from someone who would check out all the projects with SVN today.

You can commit as normal using `git commit` and `git push` or `svn commit`, and
read the history for a single project (`git log libcxx` for example).
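
To illustrate the per-project history, here is a small sketch on a throwaway
repository laid out like the monorepo. The directory contents and commit
messages are made up for the example; the point is only that a path argument
restricts ``git log`` to commits touching that sub-project.

```shell
set -eu
# Identity used for the demo commits only.
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

# Throwaway repository with one top-level directory per sub-project.
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
mkdir llvm libcxx
echo a > llvm/a.txt
git add llvm
git commit -q -m "llvm: add a"
echo b > libcxx/b.txt
git add libcxx
git commit -q -m "libcxx: add b"
echo c >> llvm/a.txt
git commit -q -am "llvm: extend a"

# Only the commits that touched libcxx/ are listed.
libcxx_log=$(git log --format=%s -- libcxx)
echo "$libcxx_log"
```

The ``--`` separator is optional here but avoids ambiguity when a directory
name could also be read as a branch or tag name.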

Secondly, there are a few options to avoid checking out all the sources.

**Using the GitHub SVN bridge**

The GitHub SVN native bridge allows checking out a subdirectory directly::

  svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...

This checks out only compiler-rt and provides commit access using `svn commit`,
in the same way as it would do today.

**Using a Subproject Git Mirror**

You can use *git-svn* and one of the sub-project mirrors::

  # Clone from the single read-only Git repo
  git clone http://llvm.org/git/llvm.git
  cd llvm
  # Configure the SVN remote and initialize the svn metadata
  git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l

In this case the repository contains only a single sub-project, and commits can
be made using `git svn dcommit`, again exactly as we do today.

**Using Sparse Checkouts**

You can hide the other directories using a Git sparse checkout::

  git config core.sparseCheckout true
  echo /compiler-rt > .git/info/sparse-checkout
  git read-tree -mu HEAD

The data for all sub-projects is still in your `.git` directory, but in your
checkout, you only see `compiler-rt`.
Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
usual.

Note that when you fetch you'll likely pull in changes to sub-projects you don't
care about. If you are using sparse checkout, the files from other projects
won't appear on your disk. The only effect is that your commit hash changes.

You can check whether the changes in the last fetch are relevant to your commit
by running::

  git log origin/master@{1}..origin/master -- libcxx

This command can be hidden in a script so that `git llvmpush` would perform all
these steps, fail only if such a dependent change exists, and show immediately
the change that prevented the push. An immediate repeat of the command would
(almost) certainly result in a successful push.
Note that today with SVN or git-svn, this step is not possible since the
"rebase" implicitly happens while committing (unless a conflict occurs).
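
The core check of such a script could be sketched as below. ``git llvmpush``
does not exist; the repositories, file names, and the ``relevant`` and
``unrelated`` variables are assumptions for illustration. The sketch fetches
twice so that ``origin/master@{1}`` has a previous value to refer to.

```shell
set -eu
# Identity used for the demo commits only.
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)

# "upstream" plays the role of the canonical monorepo.
git -c init.defaultBranch=master init -q "$work/upstream"
cd "$work/upstream"
mkdir llvm libcxx
echo a > llvm/a.txt
echo b > libcxx/b.txt
git add .
git commit -q -m "initial import"

# A developer clone that only cares about libcxx.
git clone -q "$work/upstream" "$work/dev"

# First upstream change and fetch: gives origin/master a reflog entry.
echo b2 >> libcxx/b.txt
git commit -q -am "libcxx: first change"
git -C "$work/dev" fetch -q origin

# Second upstream change touches llvm only; fetch it as well.
echo a2 >> llvm/a.txt
git commit -q -am "llvm: unrelated change"
git -C "$work/dev" fetch -q origin

# Did the last fetch bring in anything that touches libcxx?
cd "$work/dev"
relevant=$(git log --format=%s 'origin/master@{1}..origin/master' -- libcxx)
unrelated=$(git log --format=%s 'origin/master@{1}..origin/master' -- llvm)
if [ -z "$relevant" ]; then
  echo "no libcxx changes in the last fetch; rebase and push can proceed"
else
  echo "libcxx changed upstream: $relevant"
fi
```

A wrapper built around this check would stop only when ``relevant`` is
non-empty, which matches the "fail only if such a dependent change exists"
behaviour described above.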

Checkout/Clone Multiple Projects, with Commit Access
----------------------------------------------------

Let's look at how to assemble llvm+clang+libcxx at a given revision.

Currently
^^^^^^^^^

::

  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
  cd llvm/tools
  svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
  cd ../projects
  svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION

Or using git-svn::

  git clone http://llvm.org/git/llvm.git
  cd llvm/
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd tools
  git clone http://llvm.org/git/clang.git
  cd clang/
  git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd ../../projects/
  git clone http://llvm.org/git/libcxx.git
  cd libcxx
  git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`

Note that the list would be longer with more sub-projects.

.. _workflow-multicheckout-multicommit:

Multirepo Variant
^^^^^^^^^^^^^^^^^

With the multirepo variant, the umbrella repository will be used. This is
where the mapping from a single revision number to the individual repositories'
revisions is stored::

  git clone https://github.com/llvm-beanz/llvm-submodules
  cd llvm-submodules
  git checkout $REVISION
  git submodule init
  git submodule update clang llvm libcxx
  # the list of sub-projects is optional, `git submodule update` would get them all.

At this point the clang, llvm, and libcxx individual repositories are cloned
and stored alongside each other. There are CMake flags to describe the directory
structure; alternatively, you can just symlink `clang` to `llvm/tools/clang`,
etc.

Another option is to check out repositories based on the commit timestamp::

  git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
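
To see how the timestamp lookup resolves, the sketch below builds a throwaway
repository with two commits at fixed dates and asks `git rev-list` for the
newest commit before the cutoff. The dates, messages, and commit identity are
invented for the demonstration, and `HEAD` is used in place of `master` so the
example does not depend on the default branch name.

```shell
#!/bin/sh
# Sketch: resolve "the newest commit before a timestamp" on a throwaway repo.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Create an empty commit whose author and committer dates are both $1.
commit_at() {
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
        git -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$2"
}
commit_at "2009-07-01 12:00:00 +0000" "early commit"
early=$(git rev-parse HEAD)
commit_at "2009-08-01 12:00:00 +0000" "late commit"

# Same query as in the text above; --before filters on the committer date.
picked=$(git rev-list -n 1 --before="2009-07-27 13:37" HEAD)
git log -1 --format=%s "$picked"
```

The query selects the July commit because it is the most recent one whose
committer date precedes the cutoff.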

.. _workflow-monocheckout-multicommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

The repository natively contains the source for every sub-project at the right
revision, which makes this straightforward::

  git clone https://github.com/llvm/llvm-projects.git llvm-projects
  cd llvm-projects
  git checkout $REVISION

As before, at this point clang, llvm, and libcxx are stored in directories
alongside each other.

.. _workflow-cross-repo-commit:

Commit an API Change in LLVM and Update the Sub-projects
--------------------------------------------------------

Today this is possible, though uncommon (or at least undocumented), for
Subversion users and for git-svn users. For example, few Git users try to update
LLD or Clang in the same commit as they change an LLVM API.

The multirepo variant does not address this: one would have to commit and push
separately in every individual repository. It would be possible to establish a
protocol whereby users add a special token to their commit messages that causes
the umbrella repo's updater bot to group all of them into a single revision.
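
The proposal does not define such a token. Purely as an illustration, the
sketch below assumes a hypothetical `Cross-Repo-Change: <id>` line in the
commit message and shows how a bot could read it from each repository to decide
which commits belong together. The token name, the identifier, and the helper
`group_token` are all invented for this example.

```shell
#!/bin/sh
# Sketch: read a hypothetical grouping token from commit messages.
set -e

# Print the token of commit $2 in repository $1 (prints nothing if absent).
group_token() {
    git -C "$1" log -1 --format=%B "$2" | sed -n 's/^Cross-Repo-Change: *//p'
}

# Two throwaway repositories stand in for the llvm and clang repositories.
demo=$(mktemp -d)
for repo in llvm clang; do
    git init -q "$demo/$repo"
    git -C "$demo/$repo" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "Update for new API

Cross-Repo-Change: api-rename-42"
done

# Commits carrying the same token would be grouped into one umbrella revision.
for repo in llvm clang; do
    echo "$repo: $(group_token "$demo/$repo" HEAD)"
done
```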

The monorepo variant handles this natively.

Branching/Stashing/Updating for Local Development or Experiments
----------------------------------------------------------------

Currently
^^^^^^^^^

SVN does not allow this use case, but developers who are currently using
git-svn can do it. Let's look at what this means in practice when dealing with
multiple sub-projects.

To update the repository to tip of trunk::

  git pull
  cd tools/clang
  git pull
  cd ../../projects/libcxx
  git pull

To create a new branch::

  git checkout -b MyBranch
  cd tools/clang
  git checkout -b MyBranch
  cd ../../projects/libcxx
  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch
  cd tools/clang
  git checkout AnotherBranch
  cd ../../projects/libcxx
  git checkout AnotherBranch

.. _workflow-multi-branching:

Multirepo Variant
^^^^^^^^^^^^^^^^^

The multirepo works the same as the current Git workflow: every command needs
to be applied to each of the individual repositories.
However, the umbrella repository makes this easy using `git submodule foreach`
to replicate a command on all the individual repositories (or submodules
in this case):

To create a new branch::

  git submodule foreach git checkout -b MyBranch

To switch branches::

  git submodule foreach git checkout AnotherBranch

.. _workflow-mono-branching:

Monorepo Variant
^^^^^^^^^^^^^^^^

Regular Git commands are sufficient, because everything is in a single
repository:

To update the repository to tip of trunk::

  git pull

To create a new branch::

  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch

Bisecting
---------

Assume a developer is looking for a bug in clang (or lld, or lldb, ...).

Currently
^^^^^^^^^

SVN does not have built-in bisection support, but the single revision across
sub-projects makes it possible to script one.

Using the existing Git read-only view of the repositories, it is possible to use
the native Git bisection script over the llvm repository, and use some scripting
to synchronize the clang repository to match the llvm revision.

.. _workflow-multi-bisecting:

Multirepo Variant
^^^^^^^^^^^^^^^^^

With the multi-repositories variant, the cross-repository synchronization is
achieved using the umbrella repository. This repository contains only
submodules for the other sub-projects. The native Git bisection can be used on
the umbrella repository directly. A subtlety is that the bisect script itself
needs to make sure the submodules are updated accordingly.

For example, to find which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, one should be able to simply do::

  git bisect start release_39 release_38
  git bisect run ./bisect_script.sh

With the `bisect_script.sh` script being::

  #!/bin/sh
  cd "$UMBRELLA_DIRECTORY"
  git submodule update llvm clang libcxx #....
  cd "$BUILD_DIR"

  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
                            # to "skip" the current commit

  ./bin/clang some_crash_test.cpp

When the `git bisect run` command returns, the umbrella repository is set to
the state where the regression is introduced. The commit diff in the umbrella
indicates which submodule was updated, and the last commit in this sub-project
is the one that the bisect found.

.. _workflow-mono-bisecting:

Monorepo Variant
^^^^^^^^^^^^^^^^

Bisecting on the monorepo is straightforward, and very similar to the above,
except that the bisection script does not need to include the
`git submodule update` step.

The same example, finding which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, will look like::

  git bisect start release_39 release_38
  git bisect run ./bisect_script.sh

With the `bisect_script.sh` script being::

  #!/bin/sh
  cd "$BUILD_DIR"

  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
                            # to "skip" the current commit

  ./bin/clang some_crash_test.cpp

Also, since the monorepo handles commits that update multiple projects, you're
less likely to encounter a build failure where a commit changes an API in LLVM
and another later one "fixes" the build in clang.
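
The interplay between `git bisect run` and the skip code 125 can be observed
without building LLVM. The sketch below uses a throwaway repository in which
two marker files, `build_status` and `behaviour`, stand in for the `ninja`
build and the crash test; these files, the commit messages, and the commit
identity are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: "git bisect run" with a skipped commit, on a throwaway repository.
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"

# $1 = build status (ok|broken), $2 = runtime behaviour (pass|crash)
commit() {
    echo "$1" > build_status
    echo "$2" > behaviour
    git add .
    git -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "build=$1 run=$2"
}
commit ok pass;     good=$(git rev-parse HEAD)
commit ok pass
commit broken pass  # does not "build": the script will skip it
commit ok pass
commit ok crash;    culprit=$(git rev-parse HEAD)
commit ok crash;    bad=$(git rev-parse HEAD)

cat > "$demo/bisect_script.sh" <<'EOF'
#!/bin/sh
# Stand-in for "ninja clang || exit 125": skip commits that do not build.
grep -q ok build_status || exit 125
# Stand-in for the crash test: a non-zero exit marks the commit as bad.
grep -q pass behaviour
EOF
chmod +x "$demo/bisect_script.sh"

git bisect start "$bad" "$good" >/dev/null
git bisect run "$demo/bisect_script.sh" >/dev/null
found=$(git rev-parse refs/bisect/bad)   # first bad commit found by bisect
git log -1 --format='first bad commit: %s' "$found"
git bisect reset >/dev/null 2>&1
```

The unbuildable commit is skipped rather than marked good or bad, and the
bisection still converges on the first commit whose test fails.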


References
==========

.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
.. [TorvaldRevNum] Linus Torvalds, http://git.661346.n2.nabble.com/Git-commit-generation-numbers-td6584414.html
.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
.. [submodules] Git submodules, https://git-scm.com/book/en/v2/Git-Tools-Submodules
.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/
.. [LebarCHERI] Port *CHERI* to a single repository rewriting history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html
.. [AminiCHERI] Port *CHERI* to a single repository preserving history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html