A hash \(h(x)\) is a fixed-length value derived from some data \(x\).
(*) It is very unlikely that different values hash to the same value. If this happens, it is called a hash collision.
The probability of a collision grows with the number of changed files and commits.
Rule of thumb: Don't create more than \(10^{10}\) (ten US-billion) files with \(10^{10}\) commits per repository.
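The rule of thumb can be sanity-checked with the birthday bound. The sketch below assumes that SHA-1's 160-bit output behaves like a uniformly random function, and it reads the rule as roughly \(10^{10} \times 10^{10} = 10^{20}\) stored objects. Both are assumptions made for illustration; the function name is hypothetical.

```python
import math

def collision_probability(n_objects: int, hash_bits: int = 160) -> float:
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / (2 * 2^bits))."""
    space = 2 ** hash_bits
    exponent = n_objects * (n_objects - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)

for n in (10**6, 10**10, 10**20):
    print(f"{n:.0e} objects -> p = {collision_probability(n):.3e}")
```

Under these assumptions even \(10^{20}\) objects give a collision probability in the order of \(10^{-9}\).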
Git basics
Think of git as a filesystem with extra dimensions, or, if you like math, a directed acyclic graph.
The following pages show how git implements the "file system" used for its magic. Files, directories and commits are all handled in the same way!
Content-addressed storage is a very simple database.
| PRO | CON |
|---|---|
| - simple | - no query method besides hash |
| - data not duplicated | |
sha1(data) == hash
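A minimal in-memory sketch of such a store may help. The class and method names are hypothetical and not taken from git; the only thing carried over from the slide is the invariant `sha1(data) == hash`.

```python
import hashlib

class ContentAddressedStore:
    """A toy store: the key of every entry is the SHA-1 of its value."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data  # storing identical data again changes nothing
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # The invariant of the store: sha1(data) == hash
        assert hashlib.sha1(data).hexdigest() == key
        return data

store = ContentAddressedStore()
key = store.put(b"Please read me\n")
same_key = store.put(b"Please read me\n")
print(key == same_key, len(store._objects))
```

Both entries of the table show up here: identical data is stored once, and the only way to get data back is `get(hash)`.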
This is the empty object store. It is located in the .git/ directory.
We can store the content of README.txt but not the file name. Git calls this a blob.
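As a sketch of how a blob gets its ID: in a repository using the default SHA-1 object format, git hashes a short header (`blob`, the content length, a NUL byte) followed by the content. The function name below is hypothetical.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the object ID of a blob with this content (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))               # ID of the empty blob
print(git_blob_id(b"hello world\n"))  # same value as: echo 'hello world' | git hash-object --stdin
```

The file name never enters the hash, which is why two files with the same content share one blob.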
A directory (tree in git-speak) is a special file that contains file names and links to the content via the hash. E.g. B 0xafde README.txt is the README.
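The following sketch builds the body of a tree object for a flat directory. It assumes the SHA-1 object format and only handles plain files; real git also sorts sub-directory names in a slightly different way, which is left out here. The helper names are hypothetical.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """Hash '<kind> <length>' + NUL + body, as git does for its objects."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries) -> bytes:
    """entries: iterable of (mode, name, hex_object_id) for plain files."""
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        # One entry: "<mode> <name>", a NUL byte, then the 20 raw hash bytes.
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_id)
    return body

readme_blob = git_object_id("blob", b"Please read me\n")
tree = tree_body([("100644", "README.txt", readme_blob)])
print(git_object_id("tree", tree))
print(git_object_id("tree", tree_body([])))  # ID of the empty tree
```

The tree stores the name `README.txt` next to the blob's hash, so the name lives in the directory and the content lives in the blob.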
A commit (circle) points to a tree (the files) and has e.g. a commit message.
What happens when README.txt changes (B 0xdead README.txt)?
Changing README.txt also changes the tree! The hash changed from 0x4711 to 0x0815.
This is very important: Changing a file changes its hash. This changes the content of the parent tree, which changes the hash of the parent tree, and so on for every tree hash up to the top.
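The propagation can be shown with a toy model. This is not git's real object format: files are `bytes`, directories are nested dictionaries, and the `blob:` and `tree:` prefixes are made up for this illustration.

```python
import copy
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def hash_node(node) -> str:
    """Hash a file (bytes) or a directory (dict mapping name -> node)."""
    if isinstance(node, bytes):
        return sha1_hex(b"blob:" + node)
    # A directory's hash covers the names and the hashes of its children.
    lines = [f"{hash_node(child)} {name}" for name, child in sorted(node.items())]
    return sha1_hex(("tree:" + "\n".join(lines)).encode("utf-8"))

before = {
    "README.txt": b"Please read me\n",
    "src": {"main.c": b"int main(void) { return 0; }\n",
            "docs": {"notes.txt": b"v1\n"}},
}
after = copy.deepcopy(before)
after["src"]["docs"]["notes.txt"] = b"v2\n"  # change one deeply nested file

for path in ([], ["src"], ["src", "docs"], ["README.txt"], ["src", "main.c"]):
    old, new = before, after
    for part in path:
        old, new = old[part], new[part]
    status = "changed" if hash_node(old) != hash_node(new) else "unchanged"
    print("/".join(path) or "(root)", status)
```

Every directory on the path from the changed file to the root gets a new hash, while untouched siblings keep theirs.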
Starting out from the filesystem, let's have a look at how a branch can be constructed.
In order to do so, we need to answer a very important question:
How does git know which commit is the current commit?
git finds the parent commit
Let's recapitulate:
Git heavily relies on content-addressed storage
get(hash)
In order to know the current commit, we need to look at the paperwork.
Create a fresh repository:
mkdir -p "${repo}" && cd "${repo}"
git init
Initialized empty Git repository in /tmp/git-from-the-inside/branches/.git/
and commit something:
echo "Please read me" >README.txt
git add README.txt
git commit -m"1st commit"
[master (root-commit) 9d1704e] 1st commit
1 file changed, 1 insertion(+)
create mode 100644 README.txt
The value 9d1704e in the first line of the output is the (abbreviated) ID of the new commit.
.git/
Question: How does git know which commit is the current commit?
Answer: The .git/ directory provides additional context:
tree -L 1 -hF "${repo}/.git"
/tmp/git-from-the-inside/branches/.git
|-- [ 11] COMMIT_EDITMSG
|-- [ 23] HEAD
|-- [4.0K] branches/
|-- [ 143] config
|-- [ 73] description
|-- [4.0K] hooks/
|-- [ 145] index
|-- [4.0K] info/
|-- [4.0K] logs/
|-- [4.0K] objects/
`-- [4.0K] refs/
6 directories, 5 files
.git/HEAD - Tells git what the current commit is
.git/refs/.. and .git/branches/.. - later…
.git/HEAD
Let's see what .git/HEAD contains.
cat .git/HEAD
ref: refs/heads/master
What does git make of ref: refs/heads/master?
git rev-parse refs/heads/master
9d1704e1300fe89c47cca353569e0377609fcc7f
What is the last commit?
git log
commit 9d1704e1300fe89c47cca353569e0377609fcc7f
Author: Alice <alice@neuhalfen.name>
Date: Wed Apr 28 06:45:54 2021 +0000
1st commit
.git/HEAD part II
Question: How does git know which commit is the current commit?
Answer: HEAD points to the current branch. The branch resolves to the current commit.
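The two-step lookup can be sketched in a few lines. This is a simplification that only reads loose ref files; real git also consults `packed-refs` and other places. The function name is hypothetical, and the demo runs against a hand-made directory rather than a real repository.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit ID (loose refs only, no packed-refs)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref_name = head[len("ref: "):]  # e.g. refs/heads/master
        return (git_dir / ref_name).read_text().strip()
    return head  # detached HEAD: the file holds a commit ID directly

# Demo against a directory that mimics the layout of .git/ shown earlier.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text(
        "9d1704e1300fe89c47cca353569e0377609fcc7f\n")
    resolved = resolve_head(git_dir)
    print(resolved)
```

Creating a commit on a branch therefore only has to rewrite the branch's ref; HEAD itself keeps pointing at the branch.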
git commit -m'2nd commit' --allow-empty
[master dcddac0] 2nd commit
git checkout -b devel HEAD^1 # Start "devel" from "2nd commit"
git commit -m'Commit 1 on devel' --allow-empty >/dev/null # Ignore output
git commit -m'Commit 2 on devel' --allow-empty >/dev/null
git log --oneline # since we have "devel" checked out, this shows the "devel" branch
cf44312 Commit 2 on devel
5542c3f Commit 1 on devel
dcddac0 2nd commit
9d1704e 1st commit
# --topo-order: Sort by graph layout, not date.
# --decorate: Print out the ref names of any commits that are shown.
git log devel --oneline --decorate --topo-order
cf44312 (HEAD -> devel) Commit 2 on devel
5542c3f Commit 1 on devel
dcddac0 2nd commit
9d1704e 1st commit
A merge commit has more than one parent and includes the commits of multiple branches.
git checkout master
# GIT_MERGE_AUTOEDIT=no uses the automatically created commit message
GIT_MERGE_AUTOEDIT=no git merge devel
Already up to date!
Merge made by the 'recursive' strategy.
git log --oneline --decorate --topo-order
3cbe7ef (HEAD -> master) Merge branch 'devel'
cf44312 (devel) Commit 2 on devel
5542c3f Commit 1 on devel
3fd43b5 3rd commit - only master
dcddac0 2nd commit
9d1704e 1st commit
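The history above can be modelled as a small graph of parent links. The parent relations below are an assumption read off the log output (the log itself only shows a linearized order), and the helper name is hypothetical.

```python
# commit -> list of parents; a merge commit is simply a commit with two parents
parents = {
    "9d1704e": [],
    "dcddac0": ["9d1704e"],
    "3fd43b5": ["dcddac0"],             # only on master
    "5542c3f": ["dcddac0"],             # first commit on devel
    "cf44312": ["5542c3f"],
    "3cbe7ef": ["3fd43b5", "cf44312"],  # Merge branch 'devel'
}

def ancestors(commit: str) -> set:
    """All commits reachable from `commit` by following parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

print(sorted(ancestors("cf44312")))  # what devel contains
print(sorted(ancestors("3cbe7ef")))  # what master contains after the merge
print([c for c, p in parents.items() if len(p) > 1])  # the merge commits
```

After the merge, master reaches every commit of devel, while devel still does not reach the commit that exists only on master.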
git checkout devel
git commit --allow-empty -m"Hotfix on devel"
git checkout master
GIT_MERGE_AUTOEDIT=no git merge devel
[devel 03a47b3] Hotfix on devel
Already up to date!
Merge made by the 'recursive' strategy.
Rebasing "transplants" commits and can be a better way to merge.
git checkout devel
git rebase master
First, rewinding head to replay your work on top of it...
Applying: devel: 1st commit
Applying: devel: 2nd commit
git log devel --oneline --decorate
02b45a9 (HEAD -> devel) devel: 2nd commit
81e0bb3 devel: 1st commit
b170544 (master) master: 3rd commit
09f2508 master: 2nd commit
603c85a master: 1st commit
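Why the rebased commits get new IDs can be shown with a toy model in which a commit ID is a hash over the parent ID and the message only. Real commits also cover the tree, author and dates. The function names are hypothetical, and the assumption that devel originally branched off the first master commit is made up for the demo.

```python
import hashlib

def commit_id(parent: str, message: str) -> str:
    """Toy commit ID: a hash over the parent ID and the message."""
    return hashlib.sha1(f"{parent}\n{message}".encode("utf-8")).hexdigest()

def build_chain(base: str, messages) -> list:
    """Create one commit per message, each on top of the previous one."""
    ids, parent = [], base
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids

root = commit_id("", "master: 1st commit")
master = build_chain(root, ["master: 2nd commit", "master: 3rd commit"])
devel_messages = ["devel: 1st commit", "devel: 2nd commit"]

devel_before = build_chain(root, devel_messages)       # branched off the root
devel_after = build_chain(master[-1], devel_messages)  # replayed on top of master

for old, new, message in zip(devel_before, devel_after, devel_messages):
    print(f"{old[:7]} -> {new[:7]}  {message}")
```

The messages stay the same, but because the parent is part of the hashed content, every replayed commit is a new object with a new ID.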
git checkout master
GIT_MERGE_AUTOEDIT=no git merge devel
Updating b170544..02b45a9
Fast-forward
change_devel | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 change_devel
push -f
A remote repository is just a normal repository
The default remote is usually called origin
Branches of origin are named origin/... . E.g. origin/master
master follows origin/master
Why is git push -f considered a bad idea? And when would you need it?
git rebase can change history
git reset HEAD^
git push -f
Act I: Bob added a feature!
Act II: But my history is ugly!
Act III: It's just a little bit of history rewriting
Act IV: Drama foreshadowing
Act V: Explanations are needed
Rewriting history (push -f) can get messy
Commit IDs (0abef12...) are hashes used to build the DAG
git in the browser. Highly recommended!
git WTFs
PRO git book. More than you will ever want to know
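import hashlib

last_block_hash, my_transactions = "00" * 32, "alice->bob:1"  # placeholder inputs

def build_block(last_block_hash, fiddle, transactions):  # simplified stand-in
    return f"{last_block_hash}|{fiddle}|{transactions}".encode()

fiddle = 0  # A cryptographer/BitCoiner calls this the "nonce"
while True:
    my_block = build_block(last_block_hash, fiddle, my_transactions)
    my_block_hash = hashlib.sha256(my_block).hexdigest()  # hex string, unlike built-in hash()
    if my_block_hash.startswith("0" * 19):
        break
    else:
        fiddle = fiddle + 1  # Bad luck - No new block
print("Yay, I found a new Block!")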
To find a hash that starts with 19 zeroes in hexadecimal you need a lot of luck. Or, on average, about \(16^{19} \approx 7.6 \times 10^{22}\) guesses.