Cjku/git

Outline

Commit graphs
git directory - ref and object store
git-add and git-commit - manipulate the index and commit tree
git-reset and git-checkout - manipulate refs, the index and working directory
git-merge and git-rebase
Working with remote
Extend git-commands

git directory

file system

Figure 1 depicts the directory tree of the git directory right after a git-clone command.

 # Clone from a repository which has a README.md file and single commit only
 $ git clone https://github.com/CJKu/SingleCommit.git

Figure 1. git directory

class diagram

Figure 2. git class diagram

Is it possible that a commit does not refer to another commit object?
- yes, the root commit object
Is it possible that a commit does not refer to a tree?
- No, unless you step on a mine
Is it possible that two commits point to the same tree?
- Yes. Adjacent commits normally have different trees, but a commit created with "git commit --allow-empty" points to the same tree as its parent.
Is it possible that HEAD is empty?
- Yes, after git-init and before the first commit
Is it possible that a tree does not refer to any blobs?
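- Normally no: no content, no tree. git stores content instead of files, so a directory without content is not recorded. A tree may hold only sub-trees, though, and the root tree of an empty first commit (git commit --allow-empty in a fresh repository) is empty.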
Can I git-add a directory?
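- Yes, git-add walks the directory and stages the files in it. An empty directory cannot be added, because there is no content to store.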
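Is it possible that a tree points to a commit?
- Not to a commit of its own history; that would break the acyclic graph and does not make sense at all. The only exception is a submodule entry (mode 160000), which records a commit of another repository.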
Is it possible that HEAD refers to a commit directly instead of a branch name?
- Yes, that is the detached HEAD state.

object type

Table 1 briefly describes the four object types. I will explain them in more detail in the next section; just keep this table as a reference.

Object types	Description
Blobs	Binary large object. Holds only the content of a file, nothing more.
Trees	Similar to a file system directory. A tree object holds path names and identifiers (SHA1) of blobs and sub-tree objects.
Commits	A snapshot of the content at one point in history. It consists of the parent commit (if any), the names of the author and committer, the commit message, and the identifier (SHA1) of the new root tree.
Tags	A human-readable name for an object.

Table 1. Object types.

object relation

To begin, let's create a repository named sample_1 with a single commit, using the following steps.

 ###############################################
 ## Create a simple commit graph
 ###############################################
 $ mkdir sample_1 && cd sample_1
 # Initialize the git directory.
 $ git init
 $ echo a > a.txt && echo b > b.txt
 $ mkdir sub_dir && echo c > sub_dir/c.txt
 # push a.txt/b.txt/c.txt into the index.
 $ git add --all
 # from the index to a new commit object.
 $ git commit -m "add a.txt, b.txt, c.txt"
 [master (root-commit) 73b1c13] add a.txt, b.txt, c.txt
  3 files changed, 3 insertions(+)
  create mode 100644 a.txt
  create mode 100644 b.txt
  create mode 100644 sub_dir/c.txt
 $ tree .git # or "find .git"
 17 directories, 24 files

Figure 2. object relation

Figure 2 depicts the outline of the object store. The tree layout on your machine should be exactly the same as in Figure 2, except for the name of the commit object. That is because the committer, the author and the timestamps in your environment differ from mine, which leads to a different SHA1.

Note: the name of an object <tree-ish>

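When I say the name of an object, I always mean the SHA1 of that object. In the git directory it also determines where a loose object is stored in the file system: the first two hex digits are the directory name under .git/objects, and the remaining 38 are the file name.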
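
As a small sketch of how a name maps to a file in the object store, the script below writes one blob and checks where it lands. It assumes a Git that uses the default SHA-1 object format; the temporary directory and the variable names are only for illustration.

```shell
#!/bin/sh
# Sketch: an object's name is its SHA1, and a loose object is stored at
# .git/objects/<first 2 hex digits>/<remaining 38 hex digits>.
set -e
repo=$(mktemp -d)            # throwaway directory, illustration only
cd "$repo"
git init -q

# -w writes the blob into the object store and prints its name.
name=$(echo a | git hash-object -w --stdin)
echo "blob name: $name"

dir=$(echo "$name" | cut -c1-2)
file=$(echo "$name" | cut -c3-)
ls ".git/objects/$dir/$file"

# The type and the content can be read back through the name.
git cat-file -t "$name"
git cat-file -p "$name"
```

The name printed here is the same one that appears for a.txt in the listings of this page, because a blob's name depends only on its content.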

The next step is to figure out how many objects were created and how they relate to each other.

 #######################################################
 ## Look into commit/tree/blob objects in object store.
 #######################################################
 # HEAD(symref) refers to current branch, which is master
 $ cat .git/HEAD
 ref: refs/heads/master
 # master(ref) refers to the latest commit object
 $ cat .git/refs/heads/master
 ed0a8533886a3ca5810046506fec786648e89971 # SHA1 of the commit object
 
 # Alternatively, use the rev-parse command to get the SHA1 that a ref points to; it gives the same result.
 $ git rev-parse HEAD
 ed0a8533886a3ca5810046506fec786648e89971
 $ git rev-parse master
 ed0a8533886a3ca5810046506fec786648e89971
 
 # Read the type of object{ed0a8}. Generally, a branch points to a commit object.
 $ git cat-file -t ed0a8
 commit
 
 # Read the content of commit{ed0a8}.
 $ git cat-file -p ed0a8
 tree 92e349e2e492b65e8137a9b435c5360b51f16f24
 author CJKu <cjcool.tw@gmail.com> 1414401560 +0800
 committer CJKu <cjcool.tw@gmail.com> 1414401560 +0800
 
 add a.txt, b.txt, c.txt
 
 # commit{ed0a8} refers to tree{92e34}. Read content of tree{92e34}.
 $ git cat-file -t 92e34
 tree
 $ git cat-file -p 92e34
 100644 blob 78981922613b2afb6025042ff6bd878ac1994e85    a.txt
 100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt
 040000 tree cf67e9ef3a0fc6d858423fc177f2fbbe985a6f17    sub_dir
 $ git cat-file -p cf67e
 100644 blob f2ad6c76f0115a6ba5b00456a849810e7ec0af20    c.txt
 
 # Or, use ls-tree -r to dump all blobs in tree.
 $ git ls-tree -r 92e349e
 100644 blob 78981922613b2afb6025042ff6bd878ac1994e85    a.txt
 100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt
 100644 blob f2ad6c76f0115a6ba5b00456a849810e7ec0af20    sub_dir/c.txt
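
To answer the "how many objects" question without reading the listing by hand, here is a minimal sketch that rebuilds the same layout in a temporary directory and counts the reachable objects by type. The identity set with git config is a placeholder, and the repository path is arbitrary.

```shell
#!/bin/sh
# Sketch: rebuild the sample_1 layout and count the objects of each type.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo a > a.txt && echo b > b.txt
mkdir sub_dir && echo c > sub_dir/c.txt
git add --all
git commit -q -m "add a.txt, b.txt, c.txt"

# List every object reachable from HEAD, then ask for the type of each one.
types=$(git rev-list --objects HEAD | cut -d' ' -f1 |
        while read -r name; do git cat-file -t "$name"; done | sort | uniq -c)
echo "$types"
```

For this layout it reports one commit, two trees (the root tree and sub_dir) and three blobs.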

git-add and git-commit

  Linus Torvalds argued on the Git mailing list that you can’t grasp and fully appreciate the power of 
  Git without first understanding the purpose of the index.
   - Version Control with Git, 2nd Edition, By: Jon Loeliger; Matthew McCullough, Publisher: O'Reilly Media, Inc.

So, it's extremely important to grok what the index is. To prevent confusion, note that when a git user says

Cache a file
Put a file in the index
Stage a file

all these statements mean the same thing: the user ran the git-add command.

The index (sometimes referred to as the "staging area") lists what you propose for the next commit. It does not contain file content itself: each entry records a path, a mode and the SHA1 of a blob, and git-add writes that blob into the object store. When you run git-commit, Git checks the index rather than your working directory to discover what to commit.
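
A quick way to see that git-commit reads the index and not the working directory is to edit a file again after staging it. The sketch below does that in a temporary repository; the file content and the placeholder identity are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: git-commit records what is in the index, not what is on disk.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo staged > a.txt
git add a.txt                 # the blob for "staged" is written now
echo unstaged >> a.txt        # later edit, never added

git ls-files --stage          # index entry: mode, blob SHA1, stage, path
git commit -q -m "commit from the index"

committed=$(git show HEAD:a.txt)   # content recorded by the commit
echo "committed: $committed"
echo "on disk:"
cat a.txt
```

The commit holds only the staged line, while the file on disk still carries the extra line.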

Here is a sample of using the git-add and git-commit commands to modify the index and the tree objects. Figures 3 and 4 show how the git directory changes accordingly.

 $ mkdir git_sample_cache
 $ cd git_sample_cache
 $ git init
 
 # Cache a.txt and b.txt, then use ls-files to peek into the index
 $ echo a > a.txt && git add a.txt
 $ echo b > b.txt && git add b.txt
 $ git ls-files --stage
 100644 78981922613b2afb6025042ff6bd878ac1994e85 0       a.txt
 100644 61780798228d17af2d34fce4cfbdf35556832472 0       b.txt
 
 # Submit the change.
 $ git commit -m "add"
 [master (root-commit) 9d262d6] add
  2 files changed, 2 insertions(+)
  create mode 100644 a.txt
  create mode 100644 b.txt
 
 # Look into master-commit-tree chain. 
 $ git rev-parse HEAD
 9d262d6394b25573fb3287b5cd019281bc8dc3b8
 $ git cat-file -t 9d262d6
 commit
 $ git cat-file -p 9d262d6
 tree f4b354863caa9cea99b95422c9dab70465757d87
 author CJKu <cjcool.tw@gmail.com> 1414482450 +0800
 committer CJKu <cjcool.tw@gmail.com> 1414482450 +0800
 
 add
 $ git cat-file -t f4b35486
 tree
 $ git cat-file -p f4b354
 100644 blob 78981922613b2afb6025042ff6bd878ac1994e85    a.txt
 100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt
 
 # Modify content of a.txt and cache it. 
 $ echo "add a line" >> a.txt
 $ git add a.txt
 
 # Print index. 
 $ git ls-files --stage
 100644 2915b75977f7d84d291f3329ce1cc251743a7c54 0       a.txt
 100644 61780798228d17af2d34fce4cfbdf35556832472 0       b.txt
 
 $ git commit -m "modify"
 [master 0492bef] modify
  1 file changed, 1 insertion(+)
 
 # Dig into the newly created tree.
 $ git cat-file -t 0492bef
 commit
 $ git cat-file -p 0492bef
 tree d3de1600ad651843b4659fc896c8686f76841824
 parent 9d262d6394b25573fb3287b5cd019281bc8dc3b8
 author CJKu <cjcool.tw@gmail.com> 1414483295 +0800
 committer CJKu <cjcool.tw@gmail.com> 1414483295 +0800
 
 modify
 $ git cat-file -t d3de160
 tree
 $ git cat-file -p d3de160
 100644 blob 2915b75977f7d84d291f3329ce1cc251743a7c54    a.txt
 100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt

Figure 3. cache - add files.

Figure 4. cache - modify a file.

In the previous example, by using git-add, we updated the index with the changes from the working directory; by using git-commit, we updated the commit graph from the index. Basically, that's what you need to know at a high level. However, git-commit does many things, such as generating a new tree object and a commit object, adding edges to the commit graph, and moving refs. So, if you want to know more details...

 ###############################################
 ## git-commit ~= write-tree + commit-tree + reset
 ###############################################
 # There are two commit objects in store
 $ git log --graph --pretty=oneline --abbrev-commit
 * 0492bef modify
 * 9d262d6 add
 # Reset to the first commit{"add"}, roll back index but don't touch working directory.
 $ git reset --mixed HEAD^
 $ git log --graph --pretty=oneline --abbrev-commit
 * 9d262d6 add
 # Optional housekeeping. The old commit is still referenced by the reflog, so gc does not delete it yet.
 $ git gc
 # Add a.txt into index again.
 $ git add a.txt
 # Generate a tree object according to index
 $ git write-tree
 d3de1600ad651843b4659fc896c8686f76841824
 # Confirm HEAD still refers to commit{SHA1: 9d262}
 $ git log --graph --pretty=oneline --abbrev-commit
 * 9d262d6 add
 # Create a new commit object associate with tree{SHA1: d3de1} and set parent as commit{SHA1: 9d262}
 $ echo -n "modify" | git commit-tree -p 9d262 d3de1
 43cf743105b49941565f2295db6cf7aae6f2cdc1
 # Move HEAD to this new commit{SHA1: 43cf7}
 $ git reset --soft 43cf7
 # Done! Check log
 $ git log --graph --pretty=oneline --abbrev-commit
 * 43cf743 modify
 * 9d262d6 add
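
The same decomposition can be sketched from an empty repository, using the plumbing command git update-ref instead of git reset --soft to move the branch. This is a variant for illustration, not a description of what git-commit does internally; the identity and file names are placeholders.

```shell
#!/bin/sh
# Sketch: build two commits with plumbing only (write-tree, commit-tree, update-ref).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo a > a.txt
git add a.txt
tree=$(git write-tree)                          # tree object from the index
commit=$(git commit-tree "$tree" -m "add")      # root commit, so no -p
git update-ref HEAD "$commit"                   # move the current branch

echo "add a line" >> a.txt
git add a.txt
tree2=$(git write-tree)
commit2=$(git commit-tree "$tree2" -p "$commit" -m "modify")
git update-ref HEAD "$commit2"

git log --graph --pretty=oneline --abbrev-commit
```

git update-ref HEAD follows the symbolic ref, so the branch that HEAD names is the ref that actually moves.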

When you use the git-checkout command to change the current branch, the index is recreated. In fact, if you want, you may also destroy the index manually and recreate it. The next two examples explain how git regenerates the index from the root tree of a specific commit object.

 ###############################################
 ## reconstruct index via git-read-tree(plumbing)
 ###############################################  
 # Take a look at the index's content before wiping it out.
 $ git ls-files -s 
 100644 4331a357983b7acf195679e68e543405ef86cc15 0       README.md
 
 # Oops... delete the index accidentally.
 $ rm .git/index
 
 # get the identifier of commit object.
 $ git rev-parse HEAD
 46fbc5468dab716b1baf2141b89780586a00556f
 $ git cat-file -p 46fbc54
 tree 57a161d717cf98e78a5eea9a30f313b464fc0429
 author CJKu <cku@mozilla.com> 1414423500 +0800
 committer CJKu <cku@mozilla.com> 1414423500 +0800
 
 Initial commit
 
 # git-read-tree - Reads tree information into the index
 # you can read-tree from a commit object, or the root tree object
 # Let's read from a commit object first<tree-ish>.
 $ git read-tree 46fbc546  
 
 # Hooray! index file returns back!
 $ git ls-files -s 
 100644 4331a357983b7acf195679e68e543405ef86cc15 0       README.md
 
 # Read again from the root tree node.
 $ rm .git/index
 $ git read-tree 57a161d7 
 $ git ls-files -s 
 100644 4331a357983b7acf195679e68e543405ef86cc15 0       README.md

In practice, using the read-tree plumbing command is not encouraged. The porcelain command reset is a better choice.

 ###############################################
 ## reconstruct index via git-reset(porcelain) 
 ############################################### 
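 # Delete the index file again.
 $ rm .git/index
 
 # git reset [--mixed | --soft | --hard] [<commit>]
 # --soft    reset HEAD only; the index is not rebuilt
 # --mixed   reset HEAD and index
 # --hard    reset HEAD, index and working tree
 # Either of the following rebuilds the index from HEAD.
 $ git reset --mixed HEAD  
 $ git reset --hard HEAD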

git-reset and git-checkout

git-checkout and git-reset are a bit confusing at times, at least for me. The functionality of these two commands overlaps:

Both of them are able to move HEAD.
You can recover what you changed in the working directory with either of them.

At a high level, you may also notice some differences between them:

You can reset HEAD to a commit (a commit-ish) or a branch (also a commit-ish); you can check out a commit or a branch as well, though we usually use git-checkout with a branch.
git-checkout changes which branch is the current one, while git-reset keeps the current branch and moves it to another commit.

Figure N. reset and checkout at commit-ish level

OK, let's take a look at reset first:

 $ man git-reset
 $ git reset -h
 usage: git reset [--mixed | --soft | --hard] [<commit>]
    or: git reset <tree-ish> [--] <paths>...

Note: what is <tree-ish>

The definition of <tree-ish> at [git-manual-page]:

<object> indicates the object name for any type of object.
<blob> indicates a blob object name.
<tree> indicates a tree object name.
<commit> indicates a commit object name.
<tree-ish> indicates a tree, commit or tag object name. A command that takes a <tree-ish> argument ultimately wants to operate on a <tree> object but automatically dereferences <commit> and <tag> objects that point at a <tree>.
<commit-ish> indicates a commit or tag object name. A command that takes a <commit-ish> argument ultimately wants to operate on a <commit> object but automatically dereferences <tag> objects that point at a <commit>.

Here are some valid examples of using git-reset command

$ git reset master -- README.txt

$ git reset master^ -- README.txt

$ git reset HEAD -- README.txt

$ git reset v1.0 -- README.txt # v1.0 is a tag name

$ git reset 0a11f0 -- README.txt # 0a11f0 is the name of a commit object

$ git reset 0b7123 -- README.txt # 0b7123 is the name of a tree object

$ git reset refs/heads/master -- README.txt

In short, a tree-ish is anything that leads to a specific tree object. If you give something (an object name, a refname, a tag, etc.) to git, and git can resolve it to a unique tree object, then it is a tree-ish. In the following diagram, only the name of a blob is not a tree-ish, since git cannot reach any tree object from a blob.

Figure N. tree-ish
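
The resolution described above can be observed with git rev-parse and the ^{tree} suffix, which asks git to peel a name until it reaches a tree. The following is a sketch in a temporary repository; the tag name v1.0 and the file README.txt mirror the examples above, and the identity is a placeholder.

```shell
#!/bin/sh
# Sketch: several different names resolve to the same tree, a blob does not.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo hello > README.txt
git add README.txt
git commit -q -m "initial commit"
git tag -a v1.0 -m "release 1.0"            # annotated tag object

branch=$(git symbolic-ref --short HEAD)     # whatever the default branch is called
for name in HEAD "$branch" "refs/heads/$branch" v1.0 "$(git rev-parse HEAD)"; do
    echo "$name -> $(git rev-parse --verify "$name^{tree}")"
done

blob=$(git rev-parse HEAD:README.txt)
if git rev-parse --verify --quiet "$blob^{tree}" >/dev/null 2>&1; then
    echo "blob is a tree-ish"
else
    echo "blob is not a tree-ish"
fi
```

Every name in the loop prints the same tree SHA1, and only the blob fails to resolve.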

There are two forms: reset HEAD to a <commit>, and reset paths from a <tree-ish>. In the first form, you have three options (--merge and --keep are ignored here):

 git reset [--mixed | --soft | --hard] [<commit>]

Depending on how hard you want the reset to be, a different scope of change applies:

soft option: change the referee of the current branch.
- the index file and the working directory stay untouched.
- scenario: squash the changes of several commits into one commit so that people think you are smart.
mixed option: the soft behavior, plus regenerate the index according to the new tree.
- the working directory stays untouched.
- scenario: slice a single big commit into several small commits to speed up the review process.
hard option: the mixed behavior, plus overwrite all modified tracked files in the working directory. (Note: only tracked and dirty files are overwritten by a hard reset. Unchanged and untracked files are out of scope.)
- scenario: working overtime, you did lots of stupid things and don't want to face what you have done...
- scenario: recover the context before a stale merge.
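
The three scopes can be compared side by side. The sketch below makes two commits (content v1, then v2), resets to the first one with each option, and reports which version HEAD, the index and the working directory hold afterwards. The helper function state and the placeholder identity are inventions for this demonstration.

```shell
#!/bin/sh
# Sketch: what --soft, --mixed and --hard each change.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo v1 > a.txt && git add a.txt && git commit -q -m "v1"
echo v2 > a.txt && git add a.txt && git commit -q -m "v2"
second=$(git rev-parse HEAD)

# Report which version of a.txt HEAD, the index and the working directory hold.
state() {
    echo "HEAD=$(git show HEAD:a.txt) index=$(git show :a.txt) worktree=$(cat a.txt)"
}

git reset -q --soft HEAD^
soft=$(state)
git reset -q --hard "$second"     # put everything back to v2

git reset -q --mixed HEAD^
mixed=$(state)
git reset -q --hard "$second"     # put everything back to v2

git reset -q --hard HEAD^
hard=$(state)

echo "soft:  $soft"
echo "mixed: $mixed"
echo "hard:  $hard"
```

Under these assumptions the soft line reads HEAD=v1 index=v2 worktree=v2, the mixed line HEAD=v1 index=v1 worktree=v2, and the hard line HEAD=v1 index=v1 worktree=v1.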

Another usage of reset is to reset paths:

  git reset [-q] <tree-ish> [--] <paths>

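Use it to restore the index entries of the given paths from a specific revision, in other words to unstage a change. The branch ref and the working directory are not touched by this usage; to recover the file in the working directory as well, follow up with git checkout <tree-ish> -- <paths>.

 $ echo "Jerry" > a.txt
 $ git add a.txt
 $ git commit -m "initial commit"
 $ echo "is cool" >> a.txt
 $ git add a.txt
 # Oops... that is definitely a typo. He is not cool at all.
 # Unstage: the index entry of a.txt goes back to HEAD's version.
 $ git reset HEAD -- a.txt
 Unstaged changes after reset:
 M       a.txt
 # The working directory is untouched.
 $ cat a.txt
 Jerry
 is cool
 # Restore the working directory copy as well.
 $ git checkout HEAD -- a.txt
 $ cat a.txt
 Jerry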

Now, move forward to checkout command:

 $ git checkout -h
 usage: git checkout [options] <branch>
    or: git checkout [options] [<branch>] -- <file>...
    or: git checkout [options] [<commit>]

Unsurprisingly, git-checkout also has two public forms: check out a branch, or check out files from a branch. I also appended a hidden usage (the last line, the one that takes a <commit>) after the public forms. Let's talk about the first form: "git checkout [options] <branch>"

(Different from git-reset) It changes the referee (branch ref) of HEAD.
(Different from git-reset) There is no hardness choice in git-checkout.
(__"Similar"__ to git-reset-hard) git recreates the index according to the root tree of the newly assigned commit, and updates the working directory. Almost the same as git-reset-hard, huh? Yes, but only if you check out from a clean working directory:
- git-reset-hard never aborts, while git-checkout may.
- git-reset-hard compares the working directory with the new tree and replaces all modified tracked files in the working directory with the content of the new tree. git-checkout behaves differently: if your change in the working directory does not cause any conflict in the checkout process, git will not erase the change; if the change does cause a conflict, the checkout aborts. Please see the next section for more details.
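
Here is a minimal sketch of that difference for the simplest case, where the edited file is identical in both branches. The branch name other, the file content and the identity are placeholders.

```shell
#!/bin/sh
# Sketch: checkout keeps a non-conflicting local edit, reset --hard discards it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo base > a.txt && git add a.txt && git commit -q -m "base"
git branch other              # other points to the same commit

echo "local edit" >> a.txt    # uncommitted change; a.txt is the same in both branches
git checkout -q other         # passes: nothing would be overwritten
after_checkout=$(cat a.txt)

git reset -q --hard HEAD      # never aborts; the local edit is gone
after_reset=$(cat a.txt)

echo "after checkout:"; echo "$after_checkout"
echo "after reset --hard:"; echo "$after_reset"
```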

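At the file/paths level, there is one major difference between git-checkout and git-reset: git-reset updates only the index entries of the given paths, while git-checkout updates both the index and the files in the working directory.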

 git reset <tree-ish> [--] <paths>...
 git checkout [options] [<branch>] -- <file>...

One thing that I want to bring up here is that the usage text of git-checkout is misleading. Here is the correct version:

 git checkout [options] [<tree-ish>] -- <file>...
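
To back up the corrected usage line, the sketch below passes the name of a tree object, not a branch, to git-checkout. The variable old_tree and the file contents are only illustrative.

```shell
#!/bin/sh
# Sketch: git checkout <tree-ish> -- <file> accepts a plain tree object name.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo v1 > a.txt && git add a.txt && git commit -q -m "v1"
echo v2 > a.txt && git add a.txt && git commit -q -m "v2"

old_tree=$(git rev-parse "HEAD~1^{tree}")   # root tree of the first commit
git cat-file -t "$old_tree"                 # a tree, not a commit or branch

git checkout "$old_tree" -- a.txt
echo "worktree: $(cat a.txt)"
echo "index:    $(git show :a.txt)"
echo "HEAD:     $(git show HEAD:a.txt)"
```

Both the working directory and the index go back to v1, while HEAD still records v2.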

Scott Chacon/Ben Straub wrote a fantastic article about reset that deserves your time: http://git-scm.com/blog/2011/07/11/reset.html

checkout abort

I use a separate section to discuss git-checkout aborting, since it is somewhat annoying to git newcomers. Briefly speaking, if git thinks you may lose local changes in the working directory, it aborts your checkout command.

Longer version.

 When doing a checkout then Git will check if your working directory is dirty, and if so check 
 if the checkout will result in any conflicts, and if so abort the checkout with a message:
 $ git checkout new_branch
 error: Your local changes to the following files would be overwritten by
 checkout:
        some_file
 Please, commit your changes or stash them before you can switch branches.
 Aborting
 
 If no conflicts then:
 $ git checkout new_branch
 A/D/M    some_file
 Switched to branch 'new_branch'
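
The abort case can be reproduced with a file that differs between two branches and also carries a local edit. This is a sketch in a temporary repository; new_branch and some_file follow the names used above, everything else is a placeholder.

```shell
#!/bin/sh
# Sketch: checkout aborts when a locally modified file differs between the branches.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo one > some_file && git add some_file && git commit -q -m "one"
start=$(git symbolic-ref --short HEAD)      # name of the initial branch

git checkout -q -b new_branch
echo two > some_file && git commit -q -a -m "two"
git checkout -q "$start"

echo "local edit" >> some_file              # dirty file that differs in new_branch
if git checkout new_branch 2>/dev/null; then
    result=switched
else
    result=aborted
fi
echo "$result, still on $(git symbolic-ref --short HEAD)"
```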

What is a so-called "conflict"? At least three factors define the logic of the "conflict test":

 Factor A: the file, you change/ add/ remove, exists in the new branch
 Factor B: the file, you change/ add/ remove, is the same in two branches
 Factor C: whether you cache the change

Modify a file:
- A[Yes] B[Yes] C[Yes] Pass. After checkout done, keep change in working directory, clear change in the cache.
- A[Yes] B[Yes] C[No] Pass. After checkout done, keep change in working directory.
- A[Yes] B[No] C[Yes] Abort.
- A[Yes] B[No] C[No] Abort.
- A[No] B[N/A] C[Yes] Abort.
- A[No] B[N/A] C[No] Abort.
Add a file:
- A[Yes] B[Yes] C[Yes] Pass. After checkout done, keep change in working directory, clear change in the cache.
- A[Yes] B[Yes] C[No] Abort.
- A[Yes] B[No] C[Yes] Abort.
- A[Yes] B[No] C[No] Abort.
- A[No] B[N/A] C[Yes] Pass. After checkout done, keep change in working directory, T.T keep change in the cache.
- A[No] B[N/A] C[No] Pass. After checkout done, keep change in working directory.
Delete a file:
- A[Yes] B[Yes] C[Yes] Pass. After checkout done, keep change in working directory, T.T keep change in the cache.
- A[Yes] B[Yes] C[No] Pass. After checkout done, keep change in working directory.
- A[Yes] B[No] C[Yes] Abort.
- A[Yes] B[No] C[No] Pass. After checkout done, clear change in working directory.
- A[No] B[N/A] C[Yes] Pass. After checkout done, keep change in working directory, clear change in the cache.
- A[No] B[N/A] C[No] Pass. After checkout done, keep change in working directory.

10 out of 18 cases pass the conflict test. 2 of these 10 cases keep your change in the cache. In 1 of these 10 cases, you still lose the change in the working directory immediately after checkout, which is very embarrassing. If you check out from a dirty working directory with no conflict, many unexpected symptoms may appear, such as committing an unexpected file which was added on another branch. My suggestion is to create a git alias that checks the status right before git-checkout. Here is an example:

 https://github.com/CJKu/git_utils
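
For readers who only want the idea, here is a minimal sketch of such a guard as a shell function. It is a hypothetical helper written for this page, not the content of the repository linked above; the function name checkout_if_clean is made up.

```shell
#!/bin/sh
# Sketch: refuse to check out unless "git status" reports nothing.
set -e

# Hypothetical helper: passes its arguments to git checkout only when clean.
checkout_if_clean() {
    if [ -n "$(git status --porcelain)" ]; then
        echo "working directory or index is dirty, refusing to checkout" >&2
        return 1
    fi
    git checkout "$@"
}

# Demonstration in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"
echo base > a.txt && git add a.txt && git commit -q -m "base"
git branch other

echo dirty >> a.txt
if checkout_if_clean -q other 2>/dev/null; then first=switched; else first=refused; fi

git checkout -q -- a.txt                    # drop the local edit
if checkout_if_clean -q other; then second=switched; else second=refused; fi

echo "dirty: $first, clean: $second"
```

Note that git status --porcelain also lists untracked files, so this guard is stricter than git-checkout itself.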

So, how to resolve an abort? Different strategies for different situations:

Force checkout: if you don't care about losing changes at all, use "git checkout -f <branch>" bravely.
Hard reset: almost the same as the previous one. Do a hard reset and check out again.
Checkout merge: if you want to bring what you changed in the working directory into the new branch, choose this one. With the --merge option, a three-way merge between the current branch, your working directory, and the new branch is done, and you will be on the new branch. And for sure, you have to fix conflicts, if any, on the new branch.
Commit change: save everything you changed into the object store with a new commit, then check out again.
Stash: you need to rush into another branch to fix something urgent, and you want to keep the context of the current working directory, the index and the tree. Use "git stash save" (deprecated in newer Git in favor of "git stash push") and "git stash pop".
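
The stash strategy can be sketched as follows, reusing a setup in which a plain checkout would abort. Plain "git stash" is used here, which saves the local changes the same way; branch and file names are placeholders.

```shell
#!/bin/sh
# Sketch: stash a conflicting local edit, visit another branch, come back, pop.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo one > some_file && git add some_file && git commit -q -m "one"
start=$(git symbolic-ref --short HEAD)
git checkout -q -b new_branch
echo two > some_file && git commit -q -a -m "two"
git checkout -q "$start"

echo "local edit" >> some_file              # a checkout of new_branch would abort now
git stash -q                                # save the edit, clean the working directory
git checkout -q new_branch                  # passes
visited=$(git symbolic-ref --short HEAD)
# ... fix the urgent thing here ...
git checkout -q "$start"
git stash pop -q                            # bring the edit back

echo "visited $visited, back on $(git symbolic-ref --short HEAD)"
cat some_file
```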

Recipes/ Troubles

List the troubles that you have met while using git. Why does this VCS drive you crazy? :)

Questions

Please list questions that you want to ask here.

Repo
Stash
Rerere
Replace

Reference

Version Control with Git, 2nd Edition, By: Jon Loeliger; Matthew McCullough, Publisher: O'Reilly Media, Inc.
Git Recipes: A Problem-Solution Approach, By: Włodzimierz Gajda, Publisher: Apress
https://github.com/git/git/blob/master/Documentation/gittutorial.txt
https://github.com/git/git/blob/master/Documentation/gittutorial2.txt
https://github.com/git/git/blob/master/Documentation/gitcore-tutorial.txt
http://felipec.wordpress.com/2011/01/16/mercurial-vs-git-its-all-in-the-branches/
http://git-scm.com/blog/2011/07/11/reset.html
http://stackoverflow.com/questions/4044368/what-does-tree-ish-mean-in-git
http://git-scm.com/blog/2010/03/08/rerere.html
http://www.gitguys.com/topics/whats-the-deal-with-the-git-index/
http://alblue.bandlem.com/2011/10/git-tip-of-week-understanding-index.html
Directed acyclic graph, http://en.wikipedia.org/wiki/Directed_acyclic_graph
tree-ish and commit-ish, https://www.kernel.org/pub/software/scm/git/docs/gitrevisions.html#_specifying_revisions
tree-ish and commit-ish, https://www.kernel.org/pub/software/scm/git/docs/
checkout conflict, http://stackoverflow.com/questions/22609566/how-to-force-git-to-abort-a-checkout-if-working-directory-is-not-clean-i-e-dis
checkout conflict request, http://git.661346.n2.nabble.com/Git-feature-request-Option-to-force-Git-to-abort-a-checkout-if-working-directory-is-dirty-i-e-disreg-td7606840.html
