22 comments
  • tyingq 2y

    Somewhat in the same space, I saw an interesting Stack Overflow post[1] about a generic way to retrieve a single file from a remote git repository. Apparently, since git version 1.7.9.5, you can do this:

      git archive --remote=ssh://host/pathto/repo.git HEAD README.md | tar xO
    
    Support apparently depends on how you're running git, though: the remote has to serve `git upload-archive`, and some server-side options may need to be enabled.

    [1] https://stackoverflow.com/questions/1125476/retrieve-a-singl...
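`git archive` writes a tar stream, and `tar xO` extracts the named member to stdout instead of to disk. A rough Python equivalent, assuming the archive bytes have already been captured (for example from `git archive --remote=... HEAD README.md`), uses the standard `tarfile` module. The helper name `extract_member` is hypothetical, and the demo builds a tar in memory as a stand-in for real `git archive` output.

```python
import io
import tarfile


def extract_member(tar_bytes: bytes, name: str) -> bytes:
    """Return the contents of one file from an uncompressed tar stream,
    roughly what `tar xO <name>` prints to stdout."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        member = tar.getmember(name)  # raises KeyError if the path is absent
        f = tar.extractfile(member)   # None for directories and other non-file members
        if f is None:
            raise ValueError(f"{name} is not a regular file")
        return f.read()


if __name__ == "__main__":
    # Build a small tar in memory to stand in for `git archive` output.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"# Example README\n"
        info = tarfile.TarInfo("README.md")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    print(extract_member(buf.getvalue(), "README.md").decode(), end="")
```

The pax global header that `git archive` adds (it records the commit ID) is handled by `tarfile` transparently, so only the named path matters.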

    • matheust 2y

      Interesting! I didn't know about the --remote flag and this usage. I would probably have done something like:

      git clone --filter=tree:0 --depth=1 --sparse --no-checkout <repo_url> repo && cd repo && git checkout HEAD -- <desired_file>

      (But that would still end up fetching a few more objects than just the desired file.)

      • eru 2y

        The other comment above would probably also need to transmit a few more objects. At least if you are starting from a commit hash (instead of HEAD) and don't trust the server.

        (I would assume git doesn't trust the server, and will verify that the chain of hashes works out.)

  • bsimpson 2y

    He's effectively writing a minimal git client in Python.

    "Fun" is a good reason for this sort of exploration, but it also has its practical uses. For instance, I use `isomorphic-git` (a git client written in JavaScript) to include a comment with the relevant commit hash in my bundled JavaScript. I can run it on any machine with Node, without opening a shell or trying to find where/if git is installed locally.
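The same idea, getting the current commit hash without shelling out to git, is small enough to sketch in Python by reading files under `.git` directly. This is a hypothetical helper, not part of isomorphic-git, and it assumes a regular `.git` directory using the default "files" ref storage (no linked worktrees, no reftable, no nested symbolic refs).

```python
from pathlib import Path


def head_commit(git_dir: Path) -> str:
    """Resolve HEAD to a commit hash by reading files under .git directly.

    Handles a symbolic HEAD ("ref: refs/heads/main"), a detached HEAD,
    loose refs, and refs stored in packed-refs.
    """
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD already holds the hash
    ref = head[len("ref: "):]
    loose = git_dir / ref
    if loose.is_file():
        return loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip the header comment and "^<hash>" peeled-tag lines.
            if line.startswith(("#", "^")):
                continue
            sha, _, name = line.partition(" ")
            if name == ref:
                return sha
    raise LookupError(f"cannot resolve {ref}")
```

Run from a repository root, `head_commit(Path(".git"))` returns the same hash as `git rev-parse HEAD` in the simple cases above; a branch with no commits yet raises `LookupError`.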

    • Something1234 2y

      Alright, I'm really curious about the use case for this. Is it for debugging assistance or something else?

      • bsimpson 2y

        They get deployed into a Colab cell or Google Drive. It's nice to be able to see what version is there without running anything.

        • Something1234 2y

          There's a publicly available API to update a Colab cell? That's wild. I thought for sure it would be locked down.

          • bsimpson 2y

            The best solution I've found is to paste the whole thing into a cell with this header:

                #@title Give your cell a headline
                %%js_module
            
            where js_module is defined here:

                # Copyright 2023 Google LLC.
                # SPDX-License-Identifier: Apache-2.0
                import time

                from IPython.core.magic import register_cell_magic
                from IPython.display import HTML, display

                @register_cell_magic
                def js_module(line, cell):
                  """Run the cell as a <script type="module">: inline JS, or a URL to load."""
                  cell = cell.strip()
                  if cell.startswith('http'):
                    # Split off an optional #fragment so the cache-busting query goes before it.
                    src, _, fragment = cell.partition('#')
                    query = '?{}'.format(time.time()) if 'cachebust' in line else ''
                    fragment = '#' + fragment if fragment else ''
                    display(HTML(
                      "<script type='module' src='{}{}{}'></script>".format(src, query, fragment)
                    ))
                  else:
                    # Inline JavaScript: emit the cell body directly.
                    display(HTML("<script type='module'>{}</script>".format(cell)))
            
            The #@title lets you collapse the cell, so the JavaScript isn't shown. Colab has a strong philosophy of ensuring the user can see everything they're running, so I don't think there's a better way to encapsulate the bundled JS.

  • absoluteunit1 2y

    As a side note, this is why I love Hacker News. Just in the last two days I’ve come across a couple of posts like this (this one outlining some git internals, another explaining a language server, etc.).

    Blog posts are becoming my favourite form of knowledge transfer.

    I’ve been meaning to start one myself, and posts like this are making me move it up my to-do list.

  • semiquaver 2y

    If this post interests you, James Coglan’s Building Git is an extremely thorough book-length treatment of the same concept. Highly recommended for learning git internals in depth.

    https://shop.jcoglan.com/building-git/

  • chx 2y

    This sort of work is a lot of fun.

    A few months ago I figured out how to modify remote refs in a similar fashion, although I wrote it in Go. https://stackoverflow.com/a/77210784/308851

  • seodisparate 2y

    One could use `printf 'first\0second'` in the Bash shell to make a "string" with a null byte embedded in it, but you can't store that in a Bash variable:

        $ TEST_VAR="$(printf 'first\0second')"
        bash: warning: command substitution: ignored null byte in input
    
    I'm not familiar with handling null characters in Bash this way, but I think there might be a way to do it.

    • tyingq 2y

      Shells and their aversion to null characters were my first introduction to Perl way back when. Tcl, at the time, couldn't handle null characters either. Various Awk implementations had different issues with them. Python didn't exist yet, and C was too tedious for many things.

      One option with shells is a pair of set/get functions that encode/decode to base64, hex, etc. It feels pretty clunky, though.

      • cryptonector 2y

        I'm sure you know this, so I'm just being pedantic here, but it's not really the shells that have an aversion to the null character so much as the exec() system call and the main() convention, which require C strings for program names, program arguments, and environment variables. Since shells are thin layers above exec() and the environment, they kinda have to use C strings too. Sure, nothing stops a shell from using counted byte strings and allowing nulls in non-exported variable values, but because shells evolved in a system so deeply based on C and C strings... they don't.
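The exec() constraint described above shows up from other languages too. In this small Python sketch (the helper `run_with_arg` is hypothetical), the argument list for a child process has to become C strings, so an argument containing a NUL is refused with a `ValueError` instead of being silently truncated.

```python
import subprocess
import sys


def run_with_arg(arg: str) -> str:
    """Pass one argument to a child Python process and return what it echoes back."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1], end='')", arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_with_arg("first second"))  # an ordinary argument passes through intact
    try:
        run_with_arg("first\0second")
    except ValueError as exc:
        # argv entries are C strings, so an embedded NUL cannot be represented.
        print("rejected:", exc)
```

Within a single Python process, `b"first\0second"` is an ordinary bytes value; the limitation only appears at the process boundary, which is the point being made about shells.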

        • tyingq 2y

          That's the path Tcl initially took, but they added counted byte strings later when it became a barrier. And zsh appears to support them relatively well.

          • cryptonector 2y

            Once you get past merely putting together pipelines of commands and grow the language much beyond that, then yeah, you end up needing a language that allows nulls embedded in strings.

    • olddustytrail 2y

      You don't actually need to store it in a variable. Do it the Bash way and write it to a temporary file. You can then take the SHA-1 of that file, zlib-compress it, and copy it to the appropriate directory and filename.
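For comparison, here is the same write-a-loose-object procedure in Python, where bytes with an embedded NUL are no problem. This is a minimal sketch assuming a SHA-1 repository; `write_blob` is a hypothetical name, and real git also writes the file atomically via a temporary file and marks it read-only, which is skipped here.

```python
import hashlib
import zlib
from pathlib import Path


def write_blob(git_dir: Path, data: bytes) -> str:
    """Store `data` as a loose blob object and return its SHA-1 object ID."""
    # The hashed bytes are the header "blob <size>\0" followed by the content;
    # that embedded NUL is exactly what a Bash variable cannot hold.
    store = b"blob %d\0" % len(data) + data
    oid = hashlib.sha1(store).hexdigest()
    # Loose objects live at objects/<first 2 hex chars>/<remaining 38>.
    path = git_dir / "objects" / oid[:2] / oid[2:]
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return oid
```

For `b"hello\n"` this returns `ce013625030ba8dba906f756967f9e9ca394464a`, the same ID `git hash-object` reports for a file containing `hello` and a newline.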

  • olivergregory 2y

    Reminds me of this simplified git-like project[0] once featured here. I learned a lot about git from this project.

    [0]: https://www.leshenko.net/p/ugit/

  • lucasoshiro 2y

    Nice write-up! The other articles about Git on this site are also interesting!

  • zubairq 2y

    When I read this I thought it was a different backend for git. That's not such an absurd idea; I do this with my own low-code tool. I store the commit history in IPFS, using the IPFS content ID as the code identifier, with each commit pointing to its parent commit, and it works quite well:
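A toy version of the content-addressed history described above: each commit's ID is a hash of its content plus its parent's ID, so changing any earlier commit changes every later ID. This sketch uses SHA-256 over canonical JSON as a stand-in for an IPFS content ID; it is not how yazz or IPFS actually compute identifiers, and the `Store` class is hypothetical.

```python
import hashlib
import json
from typing import Optional


class Store:
    """In-memory content-addressed store: objects are keyed by the hash of their bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, obj: dict) -> str:
        # Canonical JSON (sorted keys) so equal content always yields the same ID.
        raw = json.dumps(obj, sort_keys=True).encode()
        oid = hashlib.sha256(raw).hexdigest()
        self.objects[oid] = raw
        return oid

    def get(self, oid: str) -> dict:
        raw = self.objects[oid]
        # Re-hash on read so tampered content is detected.
        if hashlib.sha256(raw).hexdigest() != oid:
            raise ValueError(f"object {oid} does not match its ID")
        return json.loads(raw)

    def commit(self, code: str, parent: Optional[str]) -> str:
        return self.put({"code": code, "parent": parent})

    def history(self, tip: str) -> list[str]:
        """Follow parent pointers from `tip` back to the root commit."""
        chain = []
        oid: Optional[str] = tip
        while oid is not None:
            chain.append(oid)
            oid = self.get(oid)["parent"]
        return chain
```

Usage: `first = store.commit("print('v1')", None)`, then `second = store.commit("print('v2')", first)`; `store.history(second)` walks back as `[second, first]`, mirroring how a git commit points at its parent.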

    https://github.com/yazz/yazz

  • palata 2y

    A nice small tutorial explaining the basics of Git under the hood.

  • tonymet 2y

    good example of how a good data structure leads to a simple algorithm (thanks Knuth!)