A hash \(h(x)\) is a fixed length value derived from some data \(x\).
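The "fixed length" property is easy to check. This sketch hashes a one-byte input and a 100 kB input with SHA-1, which is git's default hash function. It assumes a shell that has `sha1sum` from GNU coreutils. On macOS, `shasum` is the closest equivalent.

```shell
#!/usr/bin/env bash
# SHA-1 maps inputs of any length to a 160-bit value (40 hex digits).
# Assumption: sha1sum from GNU coreutils is on the PATH.
short_hash=$(printf 'a' | sha1sum | cut -d' ' -f1)
long_hash=$(head -c 100000 /dev/zero | sha1sum | cut -d' ' -f1)

echo "h(a)            = $short_hash"
echo "h(100000 zeros) = $long_hash"
echo "lengths: ${#short_hash} and ${#long_hash}"   # 40 and 40
```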
(*) It is very unlikely that different values hash to the same value. If this does happen, it is called a hash collision.
The probability of a collision grows with the number of stored objects, i.e. with the number of changed files and commits.
Rule of thumb: don't create more than \(10^{10}\) (ten US billion) files with \(10^{10}\) commits per repository.
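The birthday approximation gives a rough check of this rule of thumb. It assumes two things. First, SHA-1 behaves like an ideal random function with \(N = 2^{160}\) possible values. Second, the rule means roughly \(n = 10^{10} \cdot 10^{10} = 10^{20}\) stored objects. That second point is our interpretation; the source does not state it.

\[
p_{\text{collision}} \approx \frac{n^2}{2N} = \frac{\left(10^{20}\right)^2}{2 \cdot 2^{160}} = \frac{10^{40}}{2^{161}} \approx \frac{10^{40}}{2.9 \cdot 10^{48}} \approx 3.4 \cdot 10^{-9}
\]

Even at that scale, an accidental collision remains very unlikely.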
Git basics
Think of git as a filesystem with extra dimensions or, if you like math, as a directed acyclic graph.
The following pages show how git implements the "file system" it uses for its magic. Files, directories and commits are all handled in the same way!
Content-addressed storage is a very simple database:
| PRO | CON |
|---|---|
| simple | no query method besides the hash |
| data is not duplicated | |
sha1(data) == hash
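A toy content-addressed store built from plain files shows the PRO/CON table and the invariant sha1(data) == hash in action. This is only a sketch and not git's on-disk format, because git prepends a header and compresses its objects. The names `cas_put`, `cas_get` and the temporary store directory are hypothetical, chosen for illustration.

```shell
#!/usr/bin/env bash
# Toy content-addressed store: each object lives in $store/<sha1 of its content>.
# cas_put / cas_get are hypothetical helpers for this sketch, not git commands.
store=$(mktemp -d)

# cas_put: read data from stdin, store it under its SHA-1, print the hash.
cas_put() {
  local tmp hash
  tmp=$(mktemp)
  cat >"$tmp"
  hash=$(sha1sum <"$tmp" | cut -d' ' -f1)
  # Identical content gets the identical file name, so it is stored only once.
  mv "$tmp" "$store/$hash"
  echo "$hash"
}

# cas_get: the only query is "give me the data for this hash".
cas_get() {
  cat "$store/$1"
}

h1=$(echo "Please read me" | cas_put)
h2=$(echo "Please read me" | cas_put)   # same content, same hash
cas_get "$h1"                           # Please read me
ls "$store" | wc -l                     # one object, not two
```

Storing the same text twice produces the same hash and a single file. That is the "data not duplicated" advantage. The only way to look anything up is by its hash.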
This is the empty object store. It is located in the .git/objects/ directory.
We can store the content of README.txt, but not the file name. Git calls this a blob.
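git does not hash the bare file content. A blob's ID is the SHA-1 of a short header followed by the content. The header is `blob`, a space, the size in bytes and a NUL byte. The sketch below compares `git hash-object` with a hand-rolled computation. It assumes that git and GNU `sha1sum` are installed and that git uses the default SHA-1 object format. The file name never enters the hash.

```shell
#!/usr/bin/env bash
# Compare git's blob id with a manual SHA-1 over "blob <size>\0<content>".
# Assumptions: git and GNU sha1sum are installed; $work is a throwaway directory.
work=$(mktemp -d)
printf 'Please read me\n' >"$work/README.txt"

# 1) Let git compute the id (without -w nothing is written; no repo needed).
git_id=$(git hash-object "$work/README.txt")

# 2) Same id by hand: header with the size in bytes, a NUL byte, then the raw content.
size=$(wc -c <"$work/README.txt")
manual_id=$({ printf 'blob %d\0' "$size"; cat "$work/README.txt"; } | sha1sum | cut -d' ' -f1)

echo "git hash-object: $git_id"
echo "manual sha1:     $manual_id"
```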
A directory (tree in git-speak) is a special file that contains file names and links to their content via the hash.
E.g. the entry B 0xafde README.txt is the README: the blob with hash 0xafde, stored under the name README.txt.
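To see a real tree object, this sketch stages README.txt in a throwaway repository. `git write-tree` then turns the index into a tree, which needs neither a commit nor a configured identity. `git cat-file -p` prints each entry as mode, type, hash and name, which is git's version of B 0xafde README.txt. The scratch directory `$demo` is an assumption of the sketch.

```shell
#!/usr/bin/env bash
# Build a tree object from the index and print its entries.
demo=$(mktemp -d)                  # throwaway repository (assumption)
cd "$demo" && git init -q
printf 'Please read me\n' >README.txt
git add README.txt

tree_id=$(git write-tree)          # hash of the new tree object
git cat-file -t "$tree_id"         # prints: tree
git cat-file -p "$tree_id"         # 100644 blob <blob hash><TAB>README.txt
```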
A commit (circle) points to a tree (the files) and carries metadata, e.g. a commit message.
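A commit object can also be created directly with the plumbing command `git commit-tree` and inspected with `git cat-file -p`. The first line of the output names the tree. The identity values below are made-up examples, passed through git's standard author and committer environment variables.

```shell
#!/usr/bin/env bash
# Create a commit object for a tree and print its raw content.
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com        # example identity
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com  # (assumed values)
demo=$(mktemp -d)
cd "$demo" && git init -q
printf 'Please read me\n' >README.txt
git add README.txt
tree_id=$(git write-tree)

commit_id=$(git commit-tree -m "1st commit" "$tree_id")   # no -p: a root commit
git cat-file -p "$commit_id"   # tree <tree_id>, author, committer, blank line, message
```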
What happens if we change README.txt (B 0xdead README.txt)?
Changing README.txt also changes the tree! Its hash changed from 0x4711 to 0x0815.
This is very important: changing a file changes its hash. That changes the content of the parent tree, which in turn changes the hash of the parent tree. The change propagates through every tree hash up to the top-level tree.
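The chain reaction is visible with a nested file. This sketch uses a throwaway repository and a hypothetical `docs/` directory. It records the hashes of the `docs/` subtree and of the root tree, edits `docs/README.txt`, and then compares the hashes again.

```shell
#!/usr/bin/env bash
# Changing one nested file changes every tree hash up to the root.
demo=$(mktemp -d)                     # throwaway repository (assumption)
cd "$demo" && git init -q
mkdir docs                            # hypothetical subdirectory
printf 'Please read me\n' >docs/README.txt
git add docs/README.txt
root_before=$(git write-tree)
docs_before=$(git rev-parse "${root_before}:docs")

printf 'Please read me carefully\n' >docs/README.txt   # change the file ...
git add docs/README.txt
root_after=$(git write-tree)
docs_after=$(git rev-parse "${root_after}:docs")

echo "docs/ tree: $docs_before -> $docs_after"   # ... which changes the subtree ...
echo "root tree:  $root_before -> $root_after"   # ... and the root tree
```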
Starting out from the filesystem, let's have a look at how a branch can be constructed.
In order to do so, we need to answer a very important question:
How does git know which commit is the current commit?
Git finds the parent commit
Let's recapitulate: git heavily relies on content-addressed storage, i.e. get(hash).
In order to know the current commit, we need to look at the paperwork.
Create a fresh repository:
mkdir -p "${repo}" && cd "${repo}"
git init
Initialized empty Git repository in /tmp/git-from-the-inside/branches/.git/
and commit something:
echo "Please read me" >README.txt
git add README.txt
git commit -m"1st commit"
[master (root-commit) 50ad332] 1st commit
1 file changed, 1 insertion(+)
create mode 100644 README.txt
The value 50ad332 in the first line of the output is the (abbreviated) ID of the new commit.
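The full commit ID has 40 hex digits, but git usually prints only a unique prefix. In a small repository that prefix is typically 7 digits long. Here is a sketch in a throwaway repository, using assumed example identity values:

```shell
#!/usr/bin/env bash
# Compare the full commit id with its abbreviated form.
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com        # example identity
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com  # (assumed values)
demo=$(mktemp -d)
cd "$demo" && git init -q
printf 'Please read me\n' >README.txt
git add README.txt
git commit -q -m "1st commit"

full=$(git rev-parse HEAD)            # 40 hex digits
short=$(git rev-parse --short HEAD)   # unique prefix of the full id
echo "$short -> $full"
```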
.git/
Question: How does git know which commit is the current commit?
Answer: The .git/ directory provides additional context:
tree -L 1 -hF "${repo}/.git"
/tmp/git-from-the-inside/branches/.git
|-- [ 11] COMMIT_EDITMSG
|-- [ 23] HEAD
|-- [4.0K] branches/
|-- [ 143] config
|-- [ 73] description
|-- [4.0K] hooks/
|-- [ 145] index
|-- [4.0K] info/
|-- [4.0K] logs/
|-- [4.0K] objects/
`-- [4.0K] refs/
6 directories, 5 files
.git/HEAD
- Tells git what the current commit is
.git/refs/.. and .git/branches/..
- later…
.git/HEAD
Let's see what .git/HEAD contains.
cat .git/HEAD
ref: refs/heads/master
What does git make of ref: refs/heads/master?
git rev-parse refs/heads/master
50ad33284a01b5c440ffa1c1ac0b848100943039
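The lookup that `git rev-parse` performs here can be approximated by hand. Read .git/HEAD, strip the `ref: ` prefix, then read the branch file under .git/. The sketch assumes the branch is stored as a loose ref file. Git may instead keep refs in .git/packed-refs, and this sketch ignores that case. `resolve_head` is a hypothetical helper name, not a git command.

```shell
#!/usr/bin/env bash
# Follow .git/HEAD by hand and compare with git rev-parse.
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com        # example identity
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com  # (assumed values)
demo=$(mktemp -d)
cd "$demo" && git init -q
git commit -q --allow-empty -m "1st commit"

# resolve_head (hypothetical): assumes loose refs; ignores .git/packed-refs.
resolve_head() {
  head_content=$(cat .git/HEAD)
  case "$head_content" in
    "ref: "*) cat ".git/${head_content#ref: }" ;;  # symbolic ref: read the branch file
    *)        echo "$head_content" ;;              # detached HEAD: holds the commit id
  esac
}

resolve_head          # reads .git/HEAD, then .git/refs/heads/<branch>
git rev-parse HEAD    # same commit id
```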
What is the last commit?
git log
commit 50ad33284a01b5c440ffa1c1ac0b848100943039
Author: Alice <alice@neuhalfen.name>
Date: Fri Feb 19 08:39:36 2021 +0000
1st commit
.git/HEAD, part II
Question: How does git know which commit is the current commit?
Answer: HEAD points to the current branch. The branch resolves to the current commit.
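Creating and switching branches therefore only rewrites .git/HEAD, while the commits stay untouched. Here is a sketch in a throwaway repository, using assumed example identity values:

```shell
#!/usr/bin/env bash
# Checking out a new branch only changes what .git/HEAD points to.
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com        # example identity
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com  # (assumed values)
demo=$(mktemp -d)
cd "$demo" && git init -q
git commit -q --allow-empty -m "1st commit"

git checkout -q -b devel
cat .git/HEAD                  # ref: refs/heads/devel
git symbolic-ref HEAD          # refs/heads/devel
git rev-parse HEAD devel       # the same commit id, printed twice
```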
git commit -m'2nd commit' --allow-empty
[master ded185a] 2nd commit
git commit -m'3rd commit - only master' --allow-empty >/dev/null
git checkout -b devel HEAD^1 # Start "devel" from "2nd commit"
git commit -m'Commit 1 on devel' --allow-empty >/dev/null # Ignore output
git commit -m'Commit 2 on devel' --allow-empty >/dev/null
git log --oneline # since we have "devel" checked out, this shows the "devel" branch
84f83f3 Commit 2 on devel
4983875 Commit 1 on devel
ded185a 2nd commit
50ad332 1st commit
# --topo-order: Sort by graph layout, not date.
# --decorate: Print out the ref names of any commits that are shown.
git log devel --oneline --decorate --topo-order
84f83f3 (HEAD -> devel) Commit 2 on devel
4983875 Commit 1 on devel
ded185a 2nd commit
50ad332 1st commit
A merge commit has more than one parent and includes the commits of multiple branches.
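The parents of a merge commit can be listed directly. This sketch builds two diverging branches in a throwaway repository, merges them with `--no-edit` (the flag equivalent of `GIT_MERGE_AUTOEDIT=no`) and prints the parent lines. The identity values are assumptions. The default branch name depends on git's configuration, so the script reads it instead of hard-coding `master`.

```shell
#!/usr/bin/env bash
# Merge two diverged branches and inspect the parents of the merge commit.
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com        # example identity
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com  # (assumed values)
demo=$(mktemp -d)
cd "$demo" && git init -q
git commit -q --allow-empty -m "1st commit"
main=$(git symbolic-ref --short HEAD)    # "master" or "main", depending on config

git checkout -q -b devel
git commit -q --allow-empty -m "Commit 1 on devel"
git checkout -q "$main"
git commit -q --allow-empty -m "2nd commit - only $main"

git merge -q --no-edit devel             # histories diverged: creates a merge commit
git cat-file -p HEAD | grep '^parent '   # two "parent <hash>" lines
git rev-list --parents -n 1 HEAD         # <merge> <1st parent> <2nd parent>
```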
git checkout master
# GIT_MERGE_AUTOEDIT=no uses the automatically created commit message
GIT_MERGE_AUTOEDIT=no git merge devel
Already up to date!
Merge made by the 'recursive' strategy.
git log --oneline --decorate --topo-order
6904d70 (HEAD -> master) Merge branch 'devel'
84f83f3 (devel) Commit 2 on devel
4983875 Commit 1 on devel
471953e 3rd commit - only master
ded185a 2nd commit
50ad332 1st commit
git checkout devel
git commit --allow-empty -m"Hotfix on devel"
git checkout master
GIT_MERGE_AUTOEDIT=no git merge devel
[devel f0dd7ab] Hotfix on devel
Already up to date!
Merge made by the 'recursive' strategy.
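Rebasing "transplants" commits: it replays them on top of another branch, which creates new commits with new hashes. This can be a better way to integrate changes than a merge.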
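The transplant is visible in the commit IDs. This sketch works in a throwaway repository with assumed identity values and hypothetical file names. It records the ID of a devel commit, rebases devel onto a newer main branch, and checks that the commit now has a new ID and sits on top of main.

```shell
#!/usr/bin/env bash
# Rebase "devel" onto a newer main branch; the replayed commit gets a new id.
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com        # example identity
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com  # (assumed values)
demo=$(mktemp -d)
cd "$demo" && git init -q
printf 'base\n' >base.txt
git add base.txt && git commit -q -m "master: 1st commit"
main=$(git symbolic-ref --short HEAD)    # "master" or "main", depending on config

git checkout -q -b devel
printf 'devel\n' >change_devel
git add change_devel && git commit -q -m "devel: 1st commit"
before=$(git rev-parse devel)

git checkout -q "$main"
printf 'master\n' >change_master
git add change_master && git commit -q -m "master: 2nd commit"

git checkout -q devel
git rebase -q "$main"                    # replay "devel: 1st commit" on top of main
after=$(git rev-parse devel)

echo "devel: 1st commit: $before -> $after"   # new parent, therefore a new hash
git merge-base --is-ancestor "$main" devel && echo "$main is now an ancestor of devel"
```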
git checkout devel
git rebase master
First, rewinding head to replay your work on top of it...
Applying: devel: 1st commit
Applying: devel: 2nd commit
git log devel --oneline --decorate
10a0e31 (HEAD -> devel) devel: 2nd commit
b98f451 devel: 1st commit
20f6bee (master) master: 3rd commit
12ed95b master: 2nd commit
95f0cef master: 1st commit
git checkout master
GIT_MERGE_AUTOEDIT=no git merge devel
Updating 20f6bee..10a0e31
Fast-forward
change_devel | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 change_devel
push -f