Skip to content

Latest commit

 

History

History
337 lines (233 loc) · 8.39 KB

File metadata and controls

337 lines (233 loc) · 8.39 KB

An Introduction to git

Andreas Poehlmann
2024-03-22

About me

__name__ = "Andreas Poehlmann"
__background__ = [
    {"degree": "Physics", "lvl": "MSc"),
    {"degree": "Neuroscience", "lvl": "PhD"),
    {"job": "Software Engineering"},
]
__position__ = "Senior Machine Learning Data Engineer"
__employer__ = "MLCS - Pfizer"

follow me: https://github.com/ap--

get this talk: https://github.com/ap--/teaching-git-aidd-mar-2024

terminalpoint: https://github.com/ap--/terminalpoint


Before we start

(1) Who has used git before?
  • yes: "F" in chat please
(2) Who has used GitHub before?
  • yes: "GH" in chat please
(3) Who does NOT have a GH account?

What is git?

A distributed revision control system, started 2005 by Linus Torvalds.

from git's readme:

GIT - the stupid content tracker
"git" can mean anything, depending on your mood.
  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronounciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker" you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks

Why use version control?

You need a tool to manage changes to files:

$ ls ~/OneDrive/reports/
my_report.txt
my_report_version_2.txt
my_report_version_3.txt
my_report_version_4.txt
my_report_version_4-2024-03-22.txt
my_report_version_4-2024-03-22_2.txt
my_report_version_4-2024-03-22_2_final.txt
my_report_version_4-2024-03-22_2_final_final.txt
my_report_version_4-2024-03-22_2_final_final_AAAAHHHH.txt
history: my_report.txt [v1]---[v2]---[v3]---[...]--->

Why use git?

  • It's now by far the most popular vcs
  • Speed Optimized for performance
  • Data Integrity Your project history is ensured via checksums
  • Distributed When collaborating on a project git allows to work simultaneously on different features without interference
|       ()------()
|      /          \
| ()--()---()-----()---()-->
|           \          /
|            ()--()--()

How does git work internally?


Let's create a new repository

git init my_project
cd my_project

And let's check out the .git directory it created!


Working Directory, Staging Area, and Repository

How do start tracking the history of our files?

via -> via ->
Working git add Staging
Staging git commit Repository

check history with git log


Blobs, Trees, and Commits (and Tags)

Let's check out what's in the .git directory!

Blobs

  • A blob is the content of the file added to git with:
    • a header
    • the content of the file
    • and it's zlib compressed
    • and it's filename is the sha1 hash of the blob content

demo commands:

# stored in .git/objects/ab/cd1234...
$ git add my_file.txt          # add a file as a blob
$ git show abcd1234...         # contents
$ git cat-file -p abcd1234...  # contents
$ git cat-file -t abcd1234...  # type

Trees

  • Stores the filename of blobs (files) and trees (folders)
    • a header
    • a list of blobs and trees
    • and it's zlib compressed
    • and it's filename is the sha1 hash of the tree content

demo commands:

# stored in .git/objects/ab/cd1234...
$ git write-tree              # create a tree from staging
$ git show abcd1234...        # pretty print content
$ git cat-file -p abcd1234... # actual content

Commits

  • A commit is a snapshot of the project at a certain point in time
    • a header
    • a tree
    • an (optional) parent commit
    • a commit message
    • and it's zlib compressed
    • and it's filename is the sha1 hash of the commit content

demo commands:

# stored in .git/objects/ab/cd1234...
$ git commit -m "initial commit"  # create a commit
$ git show abcd1234...            # pretty print content
$ git cat-file -p abcd1234...     # actual content
$ git log                         # show history

Branches and HEAD

|       ()------(bugfix/issue-123)
|      /                  
| ()--()---()------------()---()--(main)
|           \          
|            ()--()--(feat/awesome)

Branches

|       ()------(bugfix/issue-123)
|      /                  
| ()--()---()------------()---()--(main)
|           \          
|            ()--()--(feat/awesome)
  • Branches are pointers to commits!
  • The history is embedded in the commit itself
  • main (or previously master) is the default branch by convention

HEAD

  • HEAD usually points to a branch and moves when a new commit is made to the branch
  • detached HEAD means HEAD is pointing to a commit or tag directly

demo commands:

$ cat .git/HEAD
# try `git checkout <commit-sha>` to enter "detached HEAD" state

Combining / changing histories

git merge and git rebase are the two ways to combine histories


Merging

  • Merging is the process of combining two (or more) branches into one
  • A merge commit is created that has two (or more) parents
|       ()------(bugfix)
|      /                \   
| ()--()---()------------(main)
|

demo commands:

$ git checkout main
$ git merge bugfix   # I prefer --no-ff

fun fact: biggest octopus merge in the kernel has 66 parents!


Rebasing

  • Rebasing is the process of moving a branch to a new base commit
  • And / or changing the history of a branch
  • The history is rewritten and the branch is linearized
|       ()------(bugfix)
|      /                
| ()--()---()---(main)


| ()--()---()---(main)---()---(bugfix)  # linear history

demo commands:

$ git rebase main     # best practice: `--interactive`

Remotes

  • remotes are pointers to other "repositories"
  • origin is the default remote name by convention
  • git fetch gathers changes from the remote
  • git pull fetches and merges changes from the remote

demo commands:

# copy the entire repository to /some/other/path
$ git remote add origin /some/other/path  # pretend it's a remote
$ git fetch --all

--set-upstream-to: provide information about updates on the remote


Andreas's recommendations for a cli git workflow

  • git add -i (for interactive) is a game changer!
  • pre-commit hooks are your friend!

demo commands:

$ brew install pre-commit   # <- best to install globally
$ pre-commit install        # run this in your repository
$ pre-commit sample-config > .pre-commit-config.yaml
$ git add -i

github.com/ap--

Thank you for your attention!

Questions?