Using GitHub and Google Drive with Google Colab to manage code and data

Google Colab is an incredible resource, bringing GPU capabilities to anyone with just an internet connection (and it doesn’t even have to be a fast one!). It’s an out-of-the-box, ready-to-use Jupyter Notebook environment with pretty much every module you’ll want. However, it has one teeny ‘obstacle’: how do you get your data into the Colab virtual machine?
In this tutorial I cover the basic methods of getting data in (Google Drive and public Git repositories) and then pulling from a more complicated source (a private Git repository). You might already know about ‘mounting’ your Google Drive to access your cloud files directly, or about cloning a public GitHub repository. But what if you’re working on a collaborative project based on a private repository? We’ve got that covered too!
Google Drive
Drop this into a *code cell* and run it. Colab will ask you to authorise access to your Google Drive: depending on the Colab version, you either approve a pop-up or follow a link and paste the authorisation code it gives you into the box below the cell. Once that is done, your Drive is mounted at /content/gdrive.
from google.colab import drive
drive.mount('/content/gdrive')
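Once mounted, your Drive files sit in a folder under the mount point. The sketch below finds that folder, assuming it is named `MyDrive` (recent Colab versions) or `My Drive` (older ones). `find_drive_root` is a hypothetical helper, not part of `google.colab`, and the data path in the usage comment is a made-up example.

```python
from pathlib import Path
from typing import Optional


def find_drive_root(mount_point: str = "/content/gdrive") -> Optional[Path]:
    """Return the folder holding your Drive files, or None if it is not found.

    Hypothetical helper: the folder name is an assumption and has appeared
    as both 'MyDrive' and 'My Drive' in different Colab versions.
    """
    for name in ("MyDrive", "My Drive"):
        candidate = Path(mount_point) / name
        if candidate.is_dir():
            return candidate
    return None


# Usage in Colab, after drive.mount('/content/gdrive'):
# root = find_drive_root()
# data_file = root / "datasets" / "train.csv"  # hypothetical path
```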
Public Git Repository
This is probably the simplest method. Prefixing a line with ! tells Colab to run it as a shell command, so you can call git directly to clone a GitHub repository into a folder of your choice.
!git clone https://github.com/username/repository.git /content/foldername
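If you prefer to stay in Python, for instance to clone conditionally or from inside a function, the same `!git clone` call can be sketched with the standard `subprocess` module. `clone_command` and `clone` are hypothetical helper names, and the URL and folder are the placeholders used above.

```python
import subprocess
from typing import List, Optional


def clone_command(url: str, dest: Optional[str] = None) -> List[str]:
    """Build the argument list for `git clone URL [DEST]`."""
    cmd = ["git", "clone", url]
    if dest is not None:
        cmd.append(dest)  # optional target folder, as in the cell above
    return cmd


def clone(url: str, dest: Optional[str] = None) -> None:
    """Run git clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(clone_command(url, dest), check=True)


# Usage in Colab (placeholders from the cell above):
# clone("https://github.com/username/repository.git", "/content/foldername")
```

Passing a list rather than a single string avoids shell quoting issues if a folder name contains spaces.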
a) Private Git Repository — Password
(NOT recommended) This method no longer works with GitHub, which stopped accepting account passwords for Git operations on 13th August 2021.
It is included for completeness, mainly to show the easy way of doing it, but I certainly do not recommend it (otherwise I wouldn’t have bothered to write the next bit!). It leaves your password in plain text in the notebook, and I don’t like exposing my password.
!git clone https://username:password@github.com/username/repository.git
b) Private Git Repository — SSH key
(Recommended)
This is much more hands-on, but once set up, it’s effortless. It requires generating a pair of *SSH keys*: a public key that you register with GitHub and a private key that you keep on your virtual machine. You may be familiar with the SSH protocol for remote (terminal) access to servers, clusters or desktops; GitHub does not give shell access, but uses SSH keys as an alternative way to authenticate.
We start off by generating the keys, which will be saved in /root/.ssh/. The public key is added to your private repository as a deploy key, and the private key is the one you take note of so you can reuse it in future sessions. Finally, we add GitHub to our known hosts. When we connect over SSH, GitHub matches the key we present against the registered deploy key, and we can then access the private repository it belongs to.
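For cloning, the practical difference is the URL form: SSH clones use `git@github.com:username/repo.git` rather than `https://github.com/username/repo.git`. A small sketch of converting one to the other; `https_to_ssh` is a hypothetical helper and only handles plain github.com HTTPS URLs.

```python
from urllib.parse import urlparse


def https_to_ssh(url: str) -> str:
    """Convert https://github.com/user/repo(.git) to git@github.com:user/repo.git."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != "github.com":
        raise ValueError(f"not a github.com HTTPS URL: {url}")
    path = parsed.path.strip("/")  # e.g. 'username/privaterepo.git'
    if not path.endswith(".git"):
        path += ".git"
    return f"git@github.com:{path}"


print(https_to_ssh("https://github.com/username/privaterepo.git"))
# git@github.com:username/privaterepo.git
```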
First of all, run the cell below and press Enter whenever you are prompted; this accepts the default file location and an empty passphrase.
!ssh-keygen -t rsa -b 4096 -C "username@github.com"
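If you would rather not answer the prompts at all, `ssh-keygen` accepts the output file via `-f` and an empty passphrase via `-N ""`, so it runs without waiting for input. A sketch that builds and runs that command from Python; `keygen_command` and `generate_key` are hypothetical helpers, and the path and comment mirror the placeholders above. Like pressing Enter, this leaves the private key without a passphrase.

```python
import os
import subprocess
from typing import List


def keygen_command(key_path: str = "/root/.ssh/id_rsa",
                   comment: str = "username@github.com") -> List[str]:
    """Build an ssh-keygen call equivalent to pressing Enter at every prompt."""
    return [
        "ssh-keygen",
        "-t", "rsa",      # key type, as in the cell above
        "-b", "4096",     # key size in bits
        "-C", comment,    # free-text label stored in the public key
        "-f", key_path,   # private key path; the public key gets '.pub' appended
        "-N", "",         # empty passphrase, same as pressing Enter
    ]


def generate_key(key_path: str = "/root/.ssh/id_rsa",
                 comment: str = "username@github.com") -> None:
    """Create the key pair; ssh-keygen still asks before overwriting, so call once."""
    os.makedirs(os.path.dirname(key_path), mode=0o700, exist_ok=True)
    subprocess.run(keygen_command(key_path, comment), check=True)


# Usage in Colab: generate_key()
```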
You won’t be able to see the generated keys in Colab’s file explorer, but shell commands can access them. Here we call ls (list), which should show the key pair: the private key id_rsa and the public key id_rsa.pub.
!ls /root/.ssh/
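The same check in Python can serve as a guard before the later steps. A minimal sketch, with `key_pair_exists` as a hypothetical helper and the default directory taken from the cell above.

```python
from pathlib import Path


def key_pair_exists(ssh_dir: str = "/root/.ssh", name: str = "id_rsa") -> bool:
    """True if both the private key and its .pub counterpart are present."""
    private = Path(ssh_dir) / name
    public = Path(ssh_dir) / f"{name}.pub"
    return private.is_file() and public.is_file()


# Usage in Colab: key_pair_exists() should be True once ssh-keygen has run.
```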
Next, we display the public key using cat; copy everything this cell outputs.
Now switch over to GitHub, open your private repository and go to Settings > Deploy keys. Click Add deploy key, paste the key, give it a name and click Add key. Deploy keys are read-only unless you tick Allow write access, which you only need if you want to push from Colab.
!cat /root/.ssh/id_rsa.pub
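Before pasting, it can help to confirm you copied the public half (the `.pub` file) and not the private key. A sketch, assuming an RSA key as generated above; `read_public_key` is a hypothetical helper that rejects anything not starting with `ssh-rsa `.

```python
from pathlib import Path


def read_public_key(path: str = "/root/.ssh/id_rsa.pub") -> str:
    """Return the single-line public key after checking it looks like an RSA public key."""
    text = Path(path).read_text().strip()
    if not text.startswith("ssh-rsa "):
        # A private key starts with '-----BEGIN'; never paste that into GitHub.
        raise ValueError(f"{path} does not look like an RSA public key")
    return text


# Usage in Colab: print(read_public_key()) and paste the output into Add deploy key.
```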
Now display your private key. Copy it and keep it somewhere safe: you will paste it into a cell to regain access in future sessions.
!cat /root/.ssh/id_rsa
Here we add GitHub to your known hosts, test authentication and clone the repository, provided the steps above have been followed correctly. chmod changes file access permissions; SSH refuses to use a private key that other users can read, so id_rsa must be restricted to its owner.
!ssh-keyscan github.com >> /root/.ssh/known_hosts
!chmod 644 /root/.ssh/known_hosts
!chmod 600 /root/.ssh/id_rsa
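The two `chmod` calls can also be made from Python with `os.chmod`. Mode `600` lets only the owner read and write the private key, while `644` leaves `known_hosts` readable by everyone but writable only by the owner. A sketch; `secure_ssh_files` and `mode_of` are hypothetical helpers using the paths from the cell above.

```python
import os
import stat


def secure_ssh_files(ssh_dir: str = "/root/.ssh") -> None:
    """Apply the same permissions as `chmod 600 id_rsa` and `chmod 644 known_hosts`."""
    os.chmod(os.path.join(ssh_dir, "id_rsa"), 0o600)       # owner read/write only
    os.chmod(os.path.join(ssh_dir, "known_hosts"), 0o644)  # owner rw, others read


def mode_of(path: str) -> str:
    """Return a file's permission bits as an octal string, e.g. '600'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")
```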
!ssh -T git@github.com
!git clone git@github.com:username/privaterepo.git /content/foldername
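`ssh -T git@github.com` is only a connection test: GitHub replies with a greeting saying you have successfully authenticated but that it does not provide shell access, and ssh exits with status 1 even on success, so the exit code alone is not a useful check. A sketch that looks for the greeting instead; `greeting_ok` and `github_ssh_ok` are hypothetical helpers, and matching on the phrase "successfully authenticated" is an assumption about GitHub's wording. `BatchMode=yes` makes ssh fail rather than wait for input it cannot get.

```python
import subprocess


def greeting_ok(output: str) -> bool:
    """True if ssh's output contains GitHub's authentication greeting."""
    return "successfully authenticated" in output


def github_ssh_ok(host: str = "git@github.com") -> bool:
    """Run `ssh -T` and check its output rather than its exit status."""
    result = subprocess.run(
        ["ssh", "-T", "-o", "BatchMode=yes", host],
        capture_output=True,
        text=True,
    )
    # The greeting usually arrives on stderr; check both streams to be safe.
    return greeting_ok(result.stdout + result.stderr)


# Usage in Colab, before cloning:
# assert github_ssh_ok(), "SSH authentication to GitHub failed"
```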
Future imports
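Colab virtual machines are reset between sessions, so the keys above will be gone the next time you connect. To clone the repository again, paste the private key you displayed earlier into the cell below, replacing the placeholder line between the BEGIN and END markers, and make sure the last line points to your repository. This cell recreates the private key file needed to authenticate and clone your repository. Keep the notebook private, since anyone who can read the key can read the repository.
This should keep working for the private repository you set it up for, as long as the key is not removed from its deploy keys.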
# Recent versions of ssh-keygen write '-----BEGIN OPENSSH PRIVATE KEY-----';
# paste your key exactly as it was printed, header and footer included.
key = '''-----BEGIN RSA PRIVATE KEY-----
(paste the lines of your private key here; this placeholder is NOT A REAL KEY)
-----END RSA PRIVATE KEY-----
'''
!mkdir -p /root/.ssh
with open(r'/root/.ssh/id_rsa', 'w', encoding='utf8') as fh:
    fh.write(key)
# known_hosts is also lost between sessions, so add GitHub again
!ssh-keyscan github.com >> /root/.ssh/known_hosts
!chmod 600 /root/.ssh/id_rsa
!ssh -T git@github.com
!git clone git@github.com:username/privaterepo.git /content/foldername
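The key-writing part of that cell can also be done entirely in Python. A minimal sketch, assuming the key text is held in `key` as above; `restore_private_key` is a hypothetical helper. It makes sure the file ends with a newline (OpenSSH can reject a key file whose last line is cut off, reporting an invalid format) and creates it with `600` permissions from the start. The `ssh-keyscan` and `git clone` steps are still needed afterwards.

```python
import os


def restore_private_key(key: str, ssh_dir: str = "/root/.ssh", name: str = "id_rsa") -> str:
    """Write a pasted private key to ssh_dir/name with owner-only permissions."""
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)  # same as `mkdir -p`
    path = os.path.join(ssh_dir, name)
    text = key.strip() + "\n"  # drop stray blank lines, keep exactly one trailing newline
    # Create the file as 0600 so it is never readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf8") as fh:
        fh.write(text)
    os.chmod(path, 0o600)  # also fixes the mode if the file already existed
    return path


# Usage in Colab: restore_private_key(key)
```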
And… that’s it! You’re good to go. Below is a list of resources I came across while doing this. I felt a guide specific to the Google Colaboratory environment would be useful, but they provide more detail should you need it.
Resources
- https://developer.github.com/v3/guides/managing-deploy-keys/#deploy-keys
- https://help.github.com/en/enterprise/2.17/user/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
- https://stackoverflow.com/a/51441990
- https://stackoverflow.com/a/49933595
Originally published on the 4th November 2019 at medium.com