Cjku/git
Outline
- Commit graphs
- git directory - ref and object store
- git-add and git-commit - manipulate the index and commit tree
- git-reset and git-checkout - manipulate refs, the index and working directory
- git-merge and git-rebase
- Working with remote
- Extend git-commands
git directory
file system
Figure 1 depicts the tree structure of the git directory right after a git-clone command.
```
# Clone from a repository which has a README.md file and a single commit only
$ git clone https://github.com/CJKu/SingleCommit.git
```
class diagram
- Is it possible that a commit does not refer to another commit object?
- yes, the root commit object
- Is it possible that a commit does not refer to a tree?
- No, unless you step on a mine
- Is it possible that two commits point to the same tree?
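- Yes. Normally they are not adjacent, because git-commit refuses to record a commit without changes; with --allow-empty even a commit and its parent can share one tree.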
- Is it possible that HEAD is empty?
- Yes, after git-init and before the first commit
- Is it possible that a tree does not refer to any blobs?
- Not in practice. No content, no tree: git stores content rather than files, so an empty directory is never recorded. (A tree may still hold only sub-trees.)
- Can I git-add a directory?
- Only its content. "git add <dir>" stages the files below the directory; an empty directory cannot be added.
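- Is it possible that a tree points to a commit?
- Not inside one history: it would break the acyclic graph and does not make sense. The one exception is a submodule entry (mode 160000), which records a commit of another repository.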
- Is it possible that HEAD refers to a commit directly instead of a branch name?
- Yes, that is the detached HEAD state.
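A few of the answers above can be checked in a throwaway repository. This is a minimal sketch, assuming git is installed; the temporary directory, the identity and the variable names are made up for illustration. It shows that HEAD cannot be resolved before the first commit, that the root commit has no parent, and that --allow-empty lets two commits share one tree.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Before the first commit, HEAD names a branch that does not exist yet.
if git rev-parse --verify -q HEAD >/dev/null; then
  head_state=born
else
  head_state=unborn
fi

echo a > a.txt
git add a.txt
git commit -q -m "first"
# --allow-empty records a commit without any content change.
git commit -q --allow-empty -m "second, same content"

tree_first=$(git rev-parse 'HEAD~1^{tree}')
tree_second=$(git rev-parse 'HEAD^{tree}')
# With --parents, rev-list prints a commit followed by its parents;
# for the root commit only its own SHA1 is printed.
root_line=$(git rev-list --parents -n 1 HEAD~1)
root_sha=$(git rev-parse HEAD~1)

echo "HEAD before first commit: $head_state"
echo "tree of first commit:  $tree_first"
echo "tree of second commit: $tree_second"
```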
object type
Table 1 roughly describes the four object types. I will explain them in more detail in the next section; keep this table as a reference.
Object types | Description |
---|---|
Blobs | Binary large object. Hold only the content of a file, nothing more. |
Trees | Similar to a file system directory. A tree object holds path names and identifiers (SHA1) of blobs and sub-tree objects. |
Commits | A snapshot of the content at one point in history. It consists of the parent commit(s), if any, the names of the author and committer, the commit message, and the identifier (SHA1) of the root tree. |
Tags | Human readable name of an object. |
Table 1. Object types.
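The "content only" property of blobs is easy to observe. The sketch below, which assumes git is installed and uses made-up file names, writes two files with identical content into the object store and shows that they end up as the same blob.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q

echo hello > one.txt
echo hello > two.txt

# hash-object -w writes a blob into the object store and prints its SHA1.
blob_one=$(git hash-object -w one.txt)
blob_two=$(git hash-object -w two.txt)

# The blob knows nothing about file names, so both files map to one object.
blob_type=$(git cat-file -t "$blob_one")
blob_content=$(git cat-file -p "$blob_one")

echo "one.txt -> $blob_one"
echo "two.txt -> $blob_two"
echo "type: $blob_type, content: $blob_content"
```

The file names only appear later, in the tree object that refers to the blob.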
object relation
To begin, let's create a repository named sample_1 with a single commit, using the following steps.
```
###############################################
## Create a simple commit graphs
###############################################
$ mkdir sample_1 && cd sample_1
# Initiate git directory.
$ git init
$ echo a > a.txt && echo b > b.txt
$ mkdir sub_dir && echo c > sub_dir/c.txt
# push a.txt/b.txt/c.txt into the index.
$ git add --all
# from the index to a new commit object.
$ git commit -m "add a.txt, b.txt, c.txt"
[master (root-commit) 73b1c13] add a.txt, b.txt, c.txt
 3 files changed, 3 insertions(+)
 create mode 100644 a.txt
 create mode 100644 b.txt
 create mode 100644 sub_dir/c.txt
$ tree .git # or "find .git"
17 directories, 24 files
```
Figure 2 depicts the outline of the object store. The tree structure on your machine should be exactly the same as in figure 2, except for the name of the commit object: the committer, the author and the timestamps in your environment differ from mine, which leads to a different SHA1.
When I say the name of something, I always mean the SHA1 of that object. In the git directory it is also the path of a loose object in the file system: the first two hex digits name a directory under .git/objects and the remaining 38 name the file.
The next step is to figure out how many objects were created and how they relate to each other.
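To see how an object name maps to a file, the following sketch writes one blob and derives where git stores it. It assumes git is installed and that the object is still loose (not yet packed); the variable names are illustrative only.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q

echo a > a.txt
sha=$(git hash-object -w a.txt)

# Loose objects live in .git/objects/<first two hex digits>/<rest of the name>.
dir_part=$(printf '%s' "$sha" | cut -c1-2)
file_part=$(printf '%s' "$sha" | cut -c3-)
object_path=".git/objects/$dir_part/$file_part"

echo "object name: $sha"
ls -l "$object_path"
```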
```
#######################################################
## Look into commit/tree/blob objects in object store.
#######################################################
# HEAD(symref) refers to current branch, which is master
$ cat .git/HEAD
ref: refs/heads/master
# master(ref) refers to the latest commit object
$ cat .git/refs/heads/master
ed0a8533886a3ca5810046506fec786648e89971 # SHA1 of the commit object
# Alternatively, you may use rev-parse command to get SHA1 of the referee, which give you the same result.
$ git rev-parse HEAD
ed0a8533886a3ca5810046506fec786648e89971
$ git rev-parse master
ed0a8533886a3ca5810046506fec786648e89971
# Read the type of object{ed0a8}. Generally, a branch points to a commit object.
$ git cat-file -t ed0a8
commit
# Read the content of commit{ed0a8}.
$ git cat-file -p ed0a8
tree 92e349e2e492b65e8137a9b435c5360b51f16f24
author CJKu <cjcool.tw@gmail.com> 1414401560 +0800
committer CJKu <cjcool.tw@gmail.com> 1414401560 +0800

add a.txt, b.txt, c.txt
# commit{ed0a8} refers to tree{92e34}. Read content of tree{92e34}.
$ git cat-file -t 92e34
tree
$ git cat-file -p 92e34
100644 blob 78981922613b2afb6025042ff6bd878ac1994e85 a.txt
100644 blob 61780798228d17af2d34fce4cfbdf35556832472 b.txt
040000 tree cf67e9ef3a0fc6d858423fc177f2fbbe985a6f17 sub_dir
$ git cat-file -p cf67e
100644 blob f2ad6c76f0115a6ba5b00456a849810e7ec0af20 c.txt
# Or, use ls-tree -r to dump all blobs in tree.
$ git ls-tree -r 92e349e
100644 blob 78981922613b2afb6025042ff6bd878ac1994e85 a.txt
100644 blob 61780798228d17af2d34fce4cfbdf35556832472 b.txt
100644 blob f2ad6c76f0115a6ba5b00456a849810e7ec0af20 sub_dir/c.txt
```
git-add and git-commit
Linus Torvalds argued on the Git mailing list that you can’t grasp and fully appreciate the power of Git without first understanding the purpose of the index. - Version Control with Git, 2nd Edition, By: Jon Loeliger; Matthew McCullough, Publisher: O'Reilly Media, Inc.
So it is extremely important to grok what the index is. To prevent confusion, note that when a git user says
- Cache a file
- Put a file in the index
- Stage a file
all these statements mean the same thing: they ran the git-add command.
The index (sometimes referred to as the "staging area") holds what you propose for the next commit. It does not contain file content itself, only path names and the SHA1 of the blobs that git-add has already written into the object store. When you run git-commit, Git reads the index rather than your working directory to discover what to commit.
Here is a sample that uses git-add and git-commit to modify the index and the tree objects. Figures 3 and 4 show how the git directory changes accordingly.
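That a commit is built from the index and not from the working directory can be demonstrated in a few lines. This is a minimal sketch, assuming git is installed; the identity and file content are placeholders. The file is edited again after git-add, and the commit still records the staged version.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "staged" > a.txt
git add a.txt
# Second column of "ls-files --stage" is the SHA1 of the staged blob.
staged_blob=$(git ls-files --stage a.txt | cut -d' ' -f2)

# Edit the file again, but do not run git-add this time.
echo "not staged" >> a.txt

git commit -q -m "commit what is staged"
committed_blob=$(git rev-parse HEAD:a.txt)
worktree_blob=$(git hash-object a.txt)

echo "staged:    $staged_blob"
echo "committed: $committed_blob"
echo "worktree:  $worktree_blob"
```

The committed blob matches the staged one, while the working directory version hashes to a different name.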
```
$ mkdir git_sample_cache
$ cd git_sample_cache
$ git init
# Cache a.txt and b.txt and use ls-files to peer into the index
$ echo a > a.txt && git add a.txt
$ echo b > b.txt && git add b.txt
$ git ls-files --stage
100644 78981922613b2afb6025042ff6bd878ac1994e85 0 a.txt
100644 61780798228d17af2d34fce4cfbdf35556832472 0 b.txt
# Submit the change.
$ git commit -m "add"
[master (root-commit) 9d262d6] add
 2 files changed, 2 insertions(+)
 create mode 100644 a.txt
 create mode 100644 b.txt
# Look into master-commit-tree chain.
$ git rev-parse HEAD
9d262d6394b25573fb3287b5cd019281bc8dc3b8
$ git cat-file -t 9d262d6
commit
$ git cat-file -p 9d262d6
tree f4b354863caa9cea99b95422c9dab70465757d87
author CJKu <cjcool.tw@gmail.com> 1414482450 +0800
committer CJKu <cjcool.tw@gmail.com> 1414482450 +0800

add
$ git cat-file -t f4b35486
tree
$ git cat-file -p f4b354
100644 blob 78981922613b2afb6025042ff6bd878ac1994e85 a.txt
100644 blob 61780798228d17af2d34fce4cfbdf35556832472 b.txt
# Modify content of a.txt and cache it.
$ echo "add a line" >> a.txt
$ git add a.txt
# Print index.
$ git ls-files --stage
100644 2915b75977f7d84d291f3329ce1cc251743a7c54 0 a.txt
100644 61780798228d17af2d34fce4cfbdf35556832472 0 b.txt
$ git commit -m "modify"
[master 0492bef] modify
 1 file changed, 1 insertion(+)
# Dig into the newly created tree.
$ git cat-file -t 0492bef
commit
$ git cat-file -p 0492bef
tree d3de1600ad651843b4659fc896c8686f76841824
parent 9d262d6394b25573fb3287b5cd019281bc8dc3b8
author CJKu <cjcool.tw@gmail.com> 1414483295 +0800
committer CJKu <cjcool.tw@gmail.com> 1414483295 +0800

modify
$ git cat-file -t d3de160
tree
$ git cat-file -p d3de160
100644 blob 2915b75977f7d84d291f3329ce1cc251743a7c54 a.txt
100644 blob 61780798228d17af2d34fce4cfbdf35556832472 b.txt
```
In the previous example, git-add copies changes from the working directory into the index, and git-commit updates the commit graph from the index. Basically, that is what you need to know at a high level. However, git-commit does many things: it generates a new tree object and a commit object, adds an edge to the commit graph, and moves the branch ref. So, if you want to know more details...
```
###############################################
## git-commit ~= write-tree + commit-tree + reset
###############################################
# There are two commit objects in store
$ git log --graph --pretty=oneline --abbrev-commit
* 43cf743 modify
* 9d262d6 add
# Reset to the first commit{"add"}, roll back index but don't touch working directory.
$ git reset --mixed HEAD^
$ git log --graph --pretty=oneline --abbrev-commit
* 9d262d6 add
# Try to delete orphan objects (the reflog may still keep them alive for a while).
$ git gc
# Add a.txt into index again.
$ git add a.txt
# Generate a tree object according to index
$ git write-tree
d3de1600ad651843b4659fc896c8686f76841824
# Confirm HEAD still refers to commit{SHA1: 9d262}
$ git log --graph --pretty=oneline --abbrev-commit
* 9d262d6 add
# Create a new commit object associated with tree{SHA1: d3de1} and set parent as commit{SHA1: 9d262}
$ echo -n "modify" | git commit-tree -p 9d262 d3de1
43cf743105b49941565f2295db6cf7aae6f2cdc1
# Move HEAD to this new commit{SHA1: 43cf7}
$ git reset --soft 43cf7
# Done! Check log
$ git log --graph --pretty=oneline --abbrev-commit
* 43cf743 modify
* 9d262d6 add
```
When you use git-checkout to change the current branch, the index is recreated. In fact, you may also destroy the index manually and recreate it. The next two examples show how git regenerates the index from the root tree of a specific commit object.
```
###############################################
## reconstruct index via git-read-tree(plumbing)
###############################################
# Take a look at the index's content before wiping it out.
$ git ls-files -s
100644 4331a357983b7acf195679e68e543405ef86cc15 0 README.md
# Oops... delete the index accidentally.
$ rm .git/index
# get the identifier of commit object.
$ git rev-parse HEAD
46fbc5468dab716b1baf2141b89780586a00556f
$ git cat-file -p 46fbc54
tree 57a161d717cf98e78a5eea9a30f313b464fc0429
author CJKu <cku@mozilla.com> 1414423500 +0800
committer CJKu <cku@mozilla.com> 1414423500 +0800

Initial commit
# git-read-tree - Reads tree information into the index
# you can read-tree from a commit object, or the root tree object
# Let's read from a commit object (a <tree-ish>) first.
$ git read-tree 46fbc546
# Hooray! The index file is back!
$ git ls-files -s
100644 4331a357983b7acf195679e68e543405ef86cc15 0 README.md
# Read again from the root tree node.
$ rm .git/index
$ git read-tree 57a161d7
$ git ls-files -s
100644 4331a357983b7acf195679e68e543405ef86cc15 0 README.md
```
In practice, using the read-tree plumbing command directly is not encouraged. The porcelain command git-reset is the better choice.
```
###############################################
## reconstruct index via git-reset(porcelain)
###############################################
# Kill the index file again.
$ rm .git/index
# git reset [--mixed | --soft | --hard] [<commit>]
# --hard reset HEAD, index and working tree
# --mixed reset HEAD and index
$ git reset --mixed HEAD
$ git reset --hard HEAD
```
git-reset and git-checkout
git-checkout and git-reset are a bit confusing at times, at least for me. The functionality of the two commands overlaps:
- Both of them are able to move HEAD.
- Either of them can discard what you changed in the working directory.
At a high level, you may also notice some differences between them:
- You can reset HEAD to a commit (a commit-ish) or a branch (also a commit-ish). You can check out a commit or a branch as well, though we usually git-checkout a branch.
- git-checkout changes which branch is current, while git-reset moves the current branch to another commit and leaves HEAD pointing at the same branch.
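The last difference can be observed with git symbolic-ref, which prints the branch that HEAD names. The sketch below assumes git is installed; the branch name topic and the identity are made up for illustration.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo 1 > f.txt && git add f.txt && git commit -q -m "c1"
echo 2 > f.txt && git commit -q -am "c2"
git branch topic HEAD~1                    # topic points at c1

start_branch=$(git symbolic-ref --short HEAD)

# reset moves the branch that HEAD names; HEAD keeps naming the same branch.
git reset -q --hard topic
after_reset=$(git symbolic-ref --short HEAD)
moved_to=$(git rev-parse "$start_branch")

# checkout leaves the branches alone and makes HEAD name another branch.
git checkout -q topic
after_checkout=$(git symbolic-ref --short HEAD)

echo "start: $start_branch, after reset: $after_reset, after checkout: $after_checkout"
```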
OK, let's take a look at reset first:
```
$ man git-reset
$ git reset -h
usage: git reset [--mixed | --soft | --hard] [<commit>]
   or: git reset <tree-ish> [--] <paths>...
```
The definition of <tree-ish> in the git manual page [git-manual-page]:
- <object> indicates the object name for any type of object.
- <blob> indicates a blob object name.
- <tree> indicates a tree object name.
- <commit> indicates a commit object name.
- <tree-ish> indicates a tree, commit or tag object name. A command that takes a <tree-ish> argument ultimately wants to operate on a <tree> object but automatically dereferences <commit> and <tag> objects that point at a <tree>.
- <commit-ish> indicates a commit or tag object name. A command that takes a <commit-ish> argument ultimately wants to operate on a <commit> object but automatically dereferences <tag> objects that point at a <commit>.
Here are some valid examples of using git-reset command
$ git reset master -- README.txt
$ git reset master^ -- README.txt
$ git reset HEAD -- README.txt
$ git reset v1.0 -- README.txt # v1.0 is a tag name
$ git reset 0a11f0 -- README.txt # 0a11f0 is the name of a commit object
$ git reset 0b7123 -- README.txt # 0b7123 is the name of a tree object
$ git reset refs/heads/master -- README.txt
In short, a tree-ish is anything that leads to a specific tree object. If you give something (an object name, a refname, a tag, etc.) to git, and git can resolve it to a unique tree object, then it is a tree-ish. In the following diagram, only the name of a blob is not a tree-ish, since git cannot reach any tree object from a blob.
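One way to test whether a name is a tree-ish is to ask git rev-parse to peel it to a tree with the ^{tree} suffix. This is a small sketch, assuming git is installed; the tag name and identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo a > a.txt && git add a.txt && git commit -q -m "add a.txt"
git tag v1.0

# HEAD, a tag and a raw commit name all peel to the same root tree.
tree_from_head=$(git rev-parse 'HEAD^{tree}')
tree_from_tag=$(git rev-parse 'v1.0^{tree}')
commit_sha=$(git rev-parse HEAD)
tree_from_commit=$(git rev-parse "$commit_sha^{tree}")

# A blob leads to no tree, so peeling it fails.
blob_sha=$(git rev-parse HEAD:a.txt)
if git rev-parse --verify -q "$blob_sha^{tree}" >/dev/null 2>&1; then
  blob_is_treeish=yes
else
  blob_is_treeish=no
fi

echo "root tree: $tree_from_head"
echo "blob is a tree-ish: $blob_is_treeish"
```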
There are two forms: reset HEAD to a <commit>, and reset paths from a <tree-ish>. In the first form you have three options (--merge and --keep are ignored here):
git reset [--mixed | --soft | --hard] [<commit>]
Depending on how hard you want to reset, a different scope of change applies:
- soft option: change the commit that the current branch refers to.
- the index file and the working directory stay untouched.
- scenario: squash the changes of several commits into one commit so that people think you are smart.
- mixed option: everything soft does, plus regenerate the index from the new tree.
- the working directory stays untouched.
- scenario: slice a single big commit into several small commits to speed up the review process.
- hard option: everything mixed does, plus overwrite all modified tracked files in the working directory. (Note: only tracked and dirty files are overwritten by a hard reset. Unchanged and untracked files are out of scope.)
- scenario: after working overtime you did lots of stupid things, and you don't want to face what you have done...
- scenario: restore the state from before a merge that went wrong.
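The three scopes can be compared side by side. The sketch below is an illustration under assumptions (git installed, made-up file and identity): it resets from a second commit back to the first one with each option and records what the index and the working directory look like afterwards.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo v1 > f.txt && git add f.txt && git commit -q -m "c1"
echo v2 > f.txt && git commit -q -am "c2"
c2=$(git rev-parse HEAD)

# --soft: only the branch moves; index and file still hold v2.
git reset -q --soft HEAD~1
soft_staged=$(git diff --cached --name-only)   # f.txt is staged against c1
soft_file=$(cat f.txt)

# Back to c2, then --mixed: branch and index move; the file still holds v2.
git reset -q --soft "$c2"
git reset -q --mixed HEAD~1
mixed_staged=$(git diff --cached --name-only)  # nothing staged any more
mixed_file=$(cat f.txt)

# Back to c2, then --hard: branch, index and file all go back to v1.
git reset -q --hard "$c2"
git reset -q --hard HEAD~1
hard_file=$(cat f.txt)

echo "soft:  staged='$soft_staged' file=$soft_file"
echo "mixed: staged='$mixed_staged' file=$mixed_file"
echo "hard:  file=$hard_file"
```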
Another usage of reset is to reset paths:
git reset [-q] <tree-ish> [--] <paths>
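Use it to restore the index entries of the given paths from a specific revision; it is the opposite of git-add. The working directory and the branch ref have nothing to do with this usage. To recover the file in the working directory as well, check it out from the same revision.
```
$ echo "Jerry" > a.txt
$ git add a.txt
$ git commit -m "initial commit"
$ echo "is cool" >> a.txt
$ git add a.txt
$ cat a.txt
Jerry
is cool
# Oops... that is definitely a typo. He is not cool at all.
# Path reset only unstages the change; a.txt on disk is unchanged.
$ git reset HEAD -- a.txt
# Recover the file in the working directory from HEAD.
$ git checkout HEAD -- a.txt
$ cat a.txt
Jerry
```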
Now, move forward to checkout command:
$ git checkout -h
usage: git checkout [options] <branch>
or: git checkout [options] [<branch>] -- <file>...
or: git checkout [options] [<commit>]
No wonder, there are also two public forms of git-checkout: check out HEAD to a branch, or check out files from a branch. I also appended a hidden usage, the last line, after the public forms. Let's talk about the first form: "git checkout [options] <branch>"
- (Different from git-reset) It changes what HEAD refers to (the branch ref).
- (Different from git-reset) There is no choice of hardness in git-checkout.
- (__"Similar"__ to git-reset --hard) git creates the index from the root tree of the newly assigned commit and updates the working directory. Almost the same as git-reset --hard, huh? Yes, but only if you check out from a clean working directory:
- git-reset --hard never aborts, while git-checkout may.
- git-reset --hard compares the working directory with the new tree and replaces all modified tracked files in the working directory with the content of the new tree. git-checkout behaves differently: if your changes in the working directory do not cause any conflict during the checkout, git will not erase them; if they do cause a conflict, the checkout aborts. Please see the next section for more details.
At the file/paths level, the two commands differ in scope: git-reset with paths updates only the index, while git-checkout with paths updates both the index and the working directory.
git reset <tree-ish> [--] <paths>...
git checkout [options] [<branch>] -- <file>...
One thing I want to bring up here: the usage text of git-checkout is misleading. Here is the correct version:
git checkout [options] [<tree-ish>] -- <file>...
Scott Chacon and Ben Straub wrote a fantastic article about reset that deserves your time: http://git-scm.com/blog/2011/07/11/reset.html
checkout abort
I devote a separate section to git-checkout aborting, since it is somewhat annoying for git newcomers. Briefly speaking, if git thinks you may lose local changes in the working directory, it aborts your checkout command.
Longer version.
When doing a checkout, Git checks whether your working directory is dirty. If so, it checks whether the checkout would result in any conflicts, and if so it aborts the checkout with a message:
```
$ git checkout new_branch
error: Your local changes to the following files would be overwritten by checkout:
        some_file
Please, commit your changes or stash them before you can switch branches.
Aborting
```
If there are no conflicts:
```
$ git checkout new_branch
A/D/M   some_file
Switched to branch 'new_branch'
```
What is this so-called "conflict"? At least three factors define the logic of the "conflict test":
Factor A: the file you change/add/remove exists in the new branch
Factor B: the file you change/add/remove is the same in the two branches
Factor C: whether you cache the change
- Modify a file:
- A[Yes] B[Yes] C[Yes] Pass. After checkout done, keep change in working directory, clear change in the cache.
- A[Yes] B[Yes] C[No] Pass. After checkout done, keep change in working directory.
- A[Yes] B[No] C[Yes] Abort.
- A[Yes] B[No] C[No] Abort.
- A[No] B[N/A] C[Yes] Abort.
- A[No] B[N/A] C[No] Abort.
- Add a file:
- A[Yes] B[Yes] C[Yes] Pass. After checkout done, keep change in working directory, clear change in the cache.
- A[Yes] B[Yes] C[No] Abort.
- A[Yes] B[No] C[Yes] Abort.
- A[Yes] B[No] C[No] Abort.
- A[No] B[N/A] C[Yes] Pass. After checkout done, keep change in working directory, T.T keep change in the cache.
- A[No] B[N/A] C[No] Pass. After checkout done, keep change in working directory.
- Delete a file:
- A[Yes] B[Yes] C[Yes] Pass. After checkout done, keep change in working directory, T.T keep change in the cache.
- A[Yes] B[Yes] C[No] Pass. After checkout done, keep change in working directory.
- A[Yes] B[No] C[Yes] Abort.
- A[Yes] B[No] C[No] Pass. After checkout done, clear change in working directory.
- A[No] B[N/A] C[Yes] Pass. After checkout done, keep change in working directory, clear change in the cache.
- A[No] B[N/A] C[No] Pass. After checkout done, keep change in working directory.
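Two of the "Modify a file" cases above can be reproduced with a short script. This is a sketch under assumptions (git installed; the file names same.txt and diff.txt, the branch name other and the identity are invented for the demo). same.txt is identical in both branches, diff.txt is not.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo base > same.txt
echo base > diff.txt
git add . && git commit -q -m "base"
first=$(git symbolic-ref --short HEAD)

git checkout -q -b other
echo changed > diff.txt && git commit -q -am "change diff.txt"
git checkout -q "$first"

# Modify, A[Yes] B[Yes] C[No]: the checkout passes and the edit survives.
echo local >> same.txt
if git checkout -q other; then case_same=pass; else case_same=abort; fi
kept_line=$(tail -n 1 same.txt)

# Go back and drop the local edit.
git checkout -q "$first"
git checkout -q -- same.txt

# Modify, A[Yes] B[No] C[No]: the checkout aborts to protect the edit.
echo local >> diff.txt
if git checkout -q other 2>/dev/null; then case_diff=pass; else case_diff=abort; fi
still_on=$(git symbolic-ref --short HEAD)

echo "same.txt case: $case_same (last line kept: $kept_line)"
echo "diff.txt case: $case_diff (still on $still_on)"
```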
10 out of 18 cases pass the conflict test. 2 of these 10 cases keep your change in the cache. In 1 of these 10 cases, you still lose the change in the working directory immediately after checkout, very 囧 ... If you check out from a dirty branch with no conflict, many unexpected symptoms appear, such as committing an unexpected file that was added on another branch. My suggestion is to create a git alias that checks the status right before git-checkout. Here is an example:
https://github.com/CJKu/git_utils
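The idea of checking the status first can be sketched as a small shell function. This is not the alias from the repository above; checkout_if_clean is a hypothetical helper name, and the demo repository, branch and file names are made up. It refuses to switch whenever "git status --porcelain" reports anything, including untracked files.

```shell
# Hypothetical helper: run git-checkout only when status is clean.
checkout_if_clean() {
  if [ -n "$(git status --porcelain)" ]; then
    echo "working directory or index is dirty; commit or stash first" >&2
    return 1
  fi
  git checkout "$@"
}

# Demo in a throwaway repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo base > f.txt && git add f.txt && git commit -q -m "base"
git branch other

echo scratch > scratch.txt                 # an untracked file makes status dirty
if checkout_if_clean -q other 2>/dev/null; then dirty_result=switched; else dirty_result=refused; fi

rm scratch.txt                             # clean again
if checkout_if_clean -q other; then clean_result=switched; else clean_result=refused; fi

echo "dirty: $dirty_result, clean: $clean_result"
```

This is stricter than git itself, which is the point: no local change is ever carried silently into another branch.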
So, how do you resolve an abort? There are different strategies for different situations:
- Force checkout: if you don't care about losing changes at all, use "git checkout -f <branch>" bravely.
- Hard reset: almost the same as the previous one. Do a hard reset, then check out again.
- Checkout merge: if you want to bring what you changed in the working directory into the new branch, choose this one. With the --merge option, a three-way merge between the current branch, your working directory, and the new branch is done, and you will be on the new branch. And for sure, you have to fix conflicts, if any, on the new branch.
- Commit change: store everything you changed in the object store with a new commit, then check out again.
- Stash: you need to rush to another branch to fix something urgent and want to keep the context of the current working directory and the index. Use "git stash save" and "git stash pop".
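The stash strategy can be walked through end to end. The sketch below assumes git is installed and uses invented names (branch hotfix, file f.txt, the identity); it stashes a change that would otherwise make the checkout abort, visits the other branch, and restores the change afterwards. It calls plain "git stash", which saves the working directory and the index like "git stash save" does.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo base > f.txt && git add f.txt && git commit -q -m "base"
first=$(git symbolic-ref --short HEAD)
git checkout -q -b hotfix
echo fixed > f.txt && git commit -q -am "hotfix"
git checkout -q "$first"

# A local edit to a file that differs between the branches blocks checkout.
echo "work in progress" >> f.txt
if git checkout -q hotfix 2>/dev/null; then before_stash=pass; else before_stash=abort; fi

# Put the edit aside, switch, do the urgent work, and come back.
git stash -q
if git checkout -q hotfix; then after_stash=pass; else after_stash=abort; fi
git checkout -q "$first"

# Bring the edit back into the working directory.
git stash pop -q
restored_line=$(tail -n 1 f.txt)
stashes_left=$(git stash list)

echo "before stash: $before_stash, after stash: $after_stash"
echo "restored: $restored_line"
```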
Recipes/ Troubles
List the troubles you met while using git. Why does this VCS drive you crazy? :)
Questions
Please list questions that you want to ask here.
- Repo
- Stash
- Rerere
- Replace
Reference
- Version Control with Git, 2nd Edition, By: Jon Loeliger; Matthew McCullough, Publisher: O'Reilly Media, Inc.
- Git Recipes: A Problem-Solution Approach, By: Włodzimierz Gajda, Publisher: Apress
- https://github.com/git/git/blob/master/Documentation/gittutorial.txt
- https://github.com/git/git/blob/master/Documentation/gittutorial2.txt
- https://github.com/git/git/blob/master/Documentation/gitcore-tutorial.txt
- http://felipec.wordpress.com/2011/01/16/mercurial-vs-git-its-all-in-the-branches/
- http://git-scm.com/blog/2011/07/11/reset.html
- http://stackoverflow.com/questions/4044368/what-does-tree-ish-mean-in-git
- http://git-scm.com/blog/2010/03/08/rerere.html
- http://www.gitguys.com/topics/whats-the-deal-with-the-git-index/
- http://alblue.bandlem.com/2011/10/git-tip-of-week-understanding-index.html
- Directed acyclic graph, http://en.wikipedia.org/wiki/Directed_acyclic_graph
- tree-ish and commit-ish, https://www.kernel.org/pub/software/scm/git/docs/gitrevisions.html#_specifying_revisions
- tree-ish and commit-ish, https://www.kernel.org/pub/software/scm/git/docs/
- checkout conflict, http://stackoverflow.com/questions/22609566/how-to-force-git-to-abort-a-checkout-if-working-directory-is-not-clean-i-e-dis
- checkout conflict request, http://git.661346.n2.nabble.com/Git-feature-request-Option-to-force-Git-to-abort-a-checkout-if-working-directory-is-dirty-i-e-disreg-td7606840.html