Life is short. You don't have time to do everything manually. Automation helps.
In this lesson, we are going to automate parts of Google Cloud to save YOU time. In particular, we're going to build on our knowledge from the previous lesson in order to set up buckets in Cloud Storage, create subdirectories in those buckets, and programmatically delete buckets.
A bucket is a container used to store data. In Google Cloud documentation, you might see a bucket referenced by a Uniform Resource Identifier (URI) such as gs://my-bucket.
test_create_bucket: How to create a bucket on the cloud and then delete that storage container
test_create_path: How to add an empty subdirectory to a bucket
test_create: How to create a bucket and a subdirectory from a Cloud Storage URI (e.g. gs://my-bucket/)
test_create_bucket
If you run programs that store data on the cloud, you want to make sure that your bucket exists, and that your program can communicate with the cloud. It seems basic, but it is helpful to implement these processes programmatically so that your programs don’t have errors when they are running.
Let's debug a unit test to better understand how to automatically create buckets.
PREPARE TO WALK THROUGH THE CODE
Set breakpoints in tests/test_gcp_storage.py in your code editor
First, prepare to run the unit test by setting breakpoints at notable areas in the code.
# Excerpt from tests/test_gcp_storage.py
class TestStorage(unittest.TestCase):

    def setUp(self):
        # Place breakpoint at the line below
        infer_credential_set()
        self.creds = confirm_credentials()
        self.project_name = self.creds.google.project_name
        self.bucket_name = self.creds.google.project_name + \
            "_whiteowl_test_bucket"

    def test_create_bucket(self):
        # Place breakpoint at the line below
        Storage().create_bucket(self.bucket_name,
                                self.creds.google.bucket_location)
        time.sleep(3)
        self.assertTrue(Storage().is_bucket(self.bucket_name))
        # Place breakpoint at the line below
        Storage().delete_bucket(self.bucket_name)
        time.sleep(3)
        self.assertFalse(Storage().is_bucket(self.bucket_name))
Set additional breakpoints
Place breakpoints as indicated in the code comments below:
# Code excerpts from feeds/util/gcp/storage.py
class Storage():

    class __OnlyOneStorage:
        def __init__(self):
            # Place breakpoint at the line below
            confirm_credentials()
            self.project = os.environ["GCLOUD_PROJECT"]
            self.storage_client = storage.Client()
            return

    instance = None

    def __init__(self):
        # Place breakpoint at the line below
        if not Storage.instance:
            Storage.instance = Storage.__OnlyOneStorage()
        self.storage_client = Storage.instance.storage_client
        self.project = Storage.instance.project

    def create_bucket(self,
                      bucket_name,
                      bucket_location="us-central1",
                      storage_class="STANDARD"):
        try:
            # Place breakpoint at the line below
            client = self.storage_client
            ...

    def is_bucket(self, bucket_name):
        try:
            # Place breakpoint at the line below
            client = self.storage_client
            ...
VISUALLY CONFIRM A "CLEAN SLATE"
Go to https://console.cloud.google.com/storage. Confirm storage is empty.
It is helpful to know what you currently have in Cloud Storage before you start adding items.
For the new project that you just created, you should not see any buckets and the result should look like the picture below:
WALK THROUGH THE CODE
Now that the preparation is done, we're ready to walk through the code and solidify the concepts behind automating storage.
Start the debugging process
First, start the unit test under the debugger. In PyCharm, right-click the green arrow next to test_create_bucket and select Debug.
In PyCharm, you can run the unit test by right clicking on the green arrow.
Confirm that you can advance from where you are in the debugging process to the next breakpoint.
In PyCharm, this is done by pressing the F9 key. If you press F9 after reaching the first breakpoint in setUp, you should advance to the first line in test_create_bucket.
Examine Storage initialization to understand how the code uses only one instance of the google-cloud-storage Client.
Press F9 again. If you have set up your breakpoints correctly, you will now be on the first line of Storage.__init__().
# Code excerpt from feeds/util/gcp/storage.py
class Storage():

    class __OnlyOneStorage:
        ...

    def __init__(self):
        if not Storage.instance:
            Storage.instance = Storage.__OnlyOneStorage()
        ...
This Storage utility class is set up so that every caller shares a single client instance, which prevents you from simultaneously making conflicting calls to create and delete a bucket.
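To see the pattern in isolation, here is a minimal, standalone sketch of the same singleton idiom; a plain object() stands in for the real storage.Client() so the snippet runs without any cloud setup:

```python
class Storage:
    class __OnlyOneStorage:
        """Created at most once; holds the shared (stand-in) client."""
        def __init__(self):
            self.storage_client = object()  # stand-in for storage.Client()

    instance = None  # shared by every Storage() constructed later

    def __init__(self):
        # Only the first construction pays the cost of building a client;
        # every later Storage() reuses the same __OnlyOneStorage instance.
        if not Storage.instance:
            Storage.instance = Storage.__OnlyOneStorage()
        self.storage_client = Storage.instance.storage_client

a, b = Storage(), Storage()
print(a.storage_client is b.storage_client)  # True: one client is shared
```

However many times Storage() is constructed during the test, only one client object is ever created.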
Pressing F9 again should take you to the start of the create_bucket function.
Step through the code (repeatedly hitting F8 in PyCharm) in order to create the bucket.
As you press F8 repeatedly, a bucket is created for the project. It is assigned either the bucket_location that you pass in or a default location of "us-central1". After the bucket is configured, the Client is called to create the bucket on the cloud.
Storage location should be close to the consumer of the data. If you do not live close to the us-central1 region, change this default to a region that is closer to you.
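As a quick illustration of how that default region argument behaves, here is a hedged sketch; FakeClient below is a stand-in for google.cloud.storage.Client (whose real create_bucket also accepts a location keyword), so the snippet runs without credentials:

```python
class FakeClient:
    """Stand-in for google.cloud.storage.Client so this runs offline."""
    def create_bucket(self, bucket_name, location):
        # The real client would make an API call to Cloud Storage here.
        return {"name": bucket_name, "location": location}

def create_bucket(client, bucket_name, bucket_location="us-central1"):
    # If the caller does not pass a region, the default is used.
    return client.create_bucket(bucket_name, location=bucket_location)

client = FakeClient()
print(create_bucket(client, "demo_bucket"))
# {'name': 'demo_bucket', 'location': 'us-central1'}
print(create_bucket(client, "demo_bucket", "europe-west1"))
# {'name': 'demo_bucket', 'location': 'europe-west1'}
```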
View the created bucket using the Google Cloud Console
After the bucket is created, the unit test will execute the sleep function shown below:
# Code from tests/test_gcp_storage.py
def test_create_bucket(self):
    """
    Test confirms that you can create a bucket
    on the cloud with the JSON key file.
    Results can be visually verified at
    https://console.cloud.google.com/storage.
    """
    # create bucket in default region (e.g. "us-central1")
    Storage().create_bucket(self.bucket_name,
                            self.creds.google.bucket_location)
    time.sleep(3)
    self.assertTrue(Storage().is_bucket(self.bucket_name))
    Storage().delete_bucket(self.bucket_name)
    time.sleep(3)
    self.assertFalse(Storage().is_bucket(self.bucket_name))
After the bucket is created, it will show up in the Google Cloud Console.
Use Python to confirm that the bucket exists
At this point, if you continue through the code (F9 in PyCharm), you will see the implementation of the is_bucket function that uses the google-cloud-storage client to determine if a bucket exists.
As we continue to step through the code, the last major piece of the unit test that we want to examine is the function that deletes a bucket.
It is important to have automation in place that removes resources that are no longer in use; this is how you keep the fees that Google Cloud charges for its services under control.
The delete_bucket function mirrors create_bucket: it looks the bucket up by name and removes it.
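A hedged sketch of what delete_bucket likely does, based on the google-cloud-storage API (Client.get_bucket followed by Bucket.delete); the Fake* classes are stand-ins added here so the snippet runs without cloud credentials:

```python
class FakeBucket:
    """Stand-in for google.cloud.storage.Bucket."""
    def __init__(self, store, name):
        self._store, self.name = store, name

    def delete(self):
        # Cloud Storage refuses to delete a bucket that still holds objects.
        self._store.pop(self.name)

class FakeClient:
    """Stand-in for google.cloud.storage.Client."""
    def __init__(self):
        self._buckets = {"demo_bucket": []}

    def get_bucket(self, name):
        return FakeBucket(self._buckets, name)

def delete_bucket(client, bucket_name):
    """Mirrors the walkthrough: look the bucket up by name, then delete it."""
    client.get_bucket(bucket_name).delete()

client = FakeClient()
delete_bucket(client, "demo_bucket")
print("demo_bucket" in client._buckets)  # False
```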
At some point you will likely generate data that you want to store. When that happens, it helps to set up subdirectories so that your data stays organized.
In this section, we are going to walk through a test that shows how to build a subdirectory programmatically.
PREPARE TO WALK THROUGH THE CODE
Set breakpoints in a unit test so that you can move through code quickly
Let's go ahead and set some breakpoints in test_create_path.
# Code from tests/test_gcp_storage.py
def test_create_path(self):
    """
    Test the creation of a path in a bucket on the cloud
    :return:
    """
    Storage().create_bucket(self.bucket_name)
    time.sleep(3)
    # Place breakpoint below to confirm bucket is created
    self.assertTrue(Storage().is_bucket(self.bucket_name))
    result = Storage().create_path(self.bucket_name,
                                   "sample/dir/structure/")
    # Place breakpoint below to confirm subdir is created
    self.assertTrue(result)
    # Bucket can only be deleted if it is empty
    result = Storage().delete_full_path_all_contents(
        self.bucket_name, "sample/dir/structure/")
    # Place breakpoint below to confirm subdir is deleted
    self.assertTrue(result)
    Storage().delete_bucket(self.bucket_name)
    time.sleep(3)
    self.assertFalse(Storage().is_bucket(self.bucket_name))
A few things are going on here:
The first breakpoint confirms that a bucket has been created on the cloud.
We will need to implement a create_path function that returns True if a subdirectory is created in a bucket, and False if the subdirectory could not be created. The associated breakpoint confirms that this function is working.
We need a delete_full_path_all_contents function because we can't release the unused bucket unless it is empty. The third breakpoint confirms the successful deletion of the subdirectory.
In this exercise, we really want to understand what is required to create and delete a subdirectory, so let's go ahead and place the breakpoints indicated in the comments below:
# Code excerpt - feeds/util/gcp/storage.py
def create_path(self, bucket_name, path):
    try:
        # Set breakpoint on the line below
        gcs_client = self.storage_client
        ...
        return True
    except Exception:
        return False
# Code from feeds/util/gcp/storage.py
def delete_full_path_all_contents(self,
                                  bucket_name,
                                  path):
    try:
        # Set breakpoint on the line below
        gcs_client = self.storage_client
        ...
        return True
    except Exception as e:
        # Any failure is reported as False
        return False
WALK THROUGH THE CODE
In the previous exercise, we confirmed that we can create buckets and programmatically see that they exist. The first part of this test recreates that work.
Create a subdirectory programmatically
Google Cloud treats everything in a bucket as an object.
This means that if we want to create a subdirectory, we really want to create an empty object.
# Code from feeds/util/gcp/storage.py
def create_path(self, bucket_name, path):
    """
    path: some/folder/name/ , MUST have the trailing slash
    Return: bool - True if successful and False otherwise.
    """
    try:
        gcs_client = self.storage_client
        bucket = gcs_client.get_bucket(bucket_name)
        blob = bucket.blob(path)
        blob.upload_from_string(
            '',
            content_type='application/x-www-form-urlencoded;charset=UTF-8')
        return True
    except Exception:
        return False
In the above code, we use a bucket that we created at the beginning of the test to construct an object.
The google-cloud-storage library refers to these objects as blobs, and as a result, we create a placeholder variable called “blob” using the following syntax:
blob = bucket.blob(path)
Then, we upload an empty string (a “0 byte object”) to create a subdirectory in the cloud.
If you pause the code at this point, go into https://console.cloud.google.com/storage,
and “drill-down” into the bucket, you will see that the directory has been created.
Remove a subdirectory programmatically
Now, we are going to do the reverse, and remove the subdirectory. In short, you are going to use Python to remove ALL objects from a bucket before you delete the bucket.
Because delete_full_path_all_contents REMOVES all objects from a bucket, you want to only use this function when you are doing final cleanup of resources.
# Code from feeds/util/gcp/storage.py
def delete_full_path_all_contents(self,
                                  bucket_name,
                                  path):
    """
    path: some/folder/name/ - MUST have the trailing slash
    Return: bool - True if successful and False otherwise.
    """
    try:
        gcs_client = self.storage_client
        bucket = gcs_client.get_bucket(bucket_name)
        blob_name_iterator = gcs_client.list_blobs(
            bucket, prefix=path)
        # The following code works with data at a small scale:
        # it pulls blob names into memory, up to what fits.
        # delete_blobs needs a concrete list, not a lazy
        # iterator that has no length.
        blob_name_list = list(blob_name_iterator)
        bucket.delete_blobs(blob_name_list)
        # You might have to traverse up to delete the tree
        return True
    except Exception as e:
        # Any failure is reported as False
        return False
In the code above, we get an iterator over the blobs under the given path, materialize it into a list, and then delete every blob we found, INCLUDING the 0 byte placeholder for the path.
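Since the comment notes that this approach only works at a small scale, one common refinement is to delete in fixed-size pages instead of materializing every name at once. Here is a sketch of just the batching logic (pure Python, no cloud calls; with a real bucket you would call bucket.delete_blobs(batch) once per batch):

```python
def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch

# With a real client, only `size` blob references are ever held in
# memory at once, instead of the whole listing.
names = (f"sample/dir/structure/file_{i}" for i in range(5))
for batch in chunks(names, 2):
    print(len(batch))  # 2, 2, 1
```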
If you now continue with the remainder of test_create_path, the test will complete successfully, and when you go to https://console.cloud.google.com/storage, you will see that the bucket has been deleted.
test_create
Finally, it is worth examining a test covering a helper function that can create the bucket and the subdirectory in one line of code.
PREPARE TO WALK THROUGH THE CODE
Set breakpoints for easy code navigation
Go ahead and set some breakpoints in the create and the delete helper functions.
# Code Excerpt from feeds/util/gcp/storage.py
def create(self, path):
    """
    This function creates the bucket and the full path specified.
    Path: A string such as gs://{bucket-name}/path/in/bucket/
    The path must have a / at the end.
    Return: bool - Return True if successful and False otherwise
    """
    try:
        # Set a breakpoint on the line below
        pattern = re.compile(
            r"(gs:\/\/)(?P<bucket>[a-zA-Z0-9_\- ]+?)(\/)(?P<folder>.*)")
        ...
        # Set a breakpoint on the line below
        return True
    except Exception:
        return False
# Code Excerpt from feeds/util/gcp/storage.py
def delete(self, path):
    """
    This function deletes the bucket and the full path specified.
    Path: A URI (e.g. gs://{bucket-name}/path/in/bucket/ )
    Return: bool - Return True if successful and False otherwise
    """
    try:
        # Set a breakpoint on the line below
        pattern = re.compile(
            r"(gs:\/\/)(?P<bucket>[a-zA-Z0-9_\- ]+?)(\/)(?P<folder>.*)")
        ...
        # Set a breakpoint on the line below
        return True
    except Exception:
        return False
In both cases, we are simply setting breakpoints at the top of each function so that we can go through these functions “line by line.”
We are also going to set a breakpoint at the end of the function so that we can pause the code and visually inspect results in the console.
WALK THROUGH THE CODE
Google Cloud typically uses Uniform Resource Identifiers (URIs) to uniquely identify resources. For Cloud Storage, these URIs start with a gs:// prefix.
The unit test that we're examining is going to create and delete buckets and subdirectories using this URI format.
# Code from tests/test_gcp_storage.py
def test_create(self):
    storage = Storage()
    # The bucket name that you use will be determined by the
    # project name that you changed in the credentials file

    # STEP 1 - Use a URI to create a bucket
    result = storage.create(f"gs://{self.bucket_name}/")
    time.sleep(3)
    self.assertTrue(result)

    # STEP 2 - Use a URI to delete a bucket
    delete_result = storage.delete(f"gs://{self.bucket_name}/")
    time.sleep(3)
    self.assertTrue(delete_result)

    # STEP 3 - Use a URI to create a bucket and a subdirectory
    result = storage.create(
        f"gs://{self.bucket_name}/nested/file_structure/")
    time.sleep(3)
    self.assertTrue(result)

    # STEP 4 - Use a URI to delete a bucket that has an existing path
    # Use the following to clean up unused resources.
    # This will delete the bucket as well as the path
    delete_result = storage.delete(
        f"gs://{self.bucket_name}/nested/file_structure/")
    time.sleep(3)
    self.assertTrue(delete_result)
Examine code to see how a Cloud Storage URI can be used to create a bucket with or without a subdirectory
After starting to debug the unit test above, you will "step through" code that creates a bucket using just gs://{self.bucket_name}/ or that creates a bucket and a subdirectory using a string similar to gs://my-bucket/my/sub/directory/.
def create(self, path):
    try:
        # Code block that uses regex
        pattern = re.compile(
            r"(gs:\/\/)(?P<bucket>[a-zA-Z0-9_\- ]+?)(\/)(?P<folder>.*)")
        m = pattern.search(path)
        bucket_name = m.group('bucket')
        folder = m.group('folder')
        # Now, use code that we have already constructed to determine
        # if the bucket already exists
        bucket_exists = self.is_bucket(bucket_name)
        # If the bucket does not exist, create it
        if not bucket_exists:
            self.create_bucket(bucket_name)
        # If a folder was specified, create the corresponding path
        if len(folder) > 0:
            self.create_path(bucket_name, folder)
        # Set a breakpoint on the line below
        # When you reach this breakpoint, visually inspect the console
        return True
    except Exception:
        return False
A couple things to note about this create function:
The first code block uses regular expressions to identify a bucket and an optional subdirectory (which is called ‘folder’ in the above code).
At the time of this writing, Pythex is one website that you can use to experiment with regular expressions and understand them more deeply.
Once the regular expression “figures out” the name of the bucket and the optional name of the subdirectory, then we simply reuse code discussed earlier in this lesson to create resources within Google Cloud.
One of the breakpoints that we set is on the line that occurs right before you return out of the function.
If you are debugging this function, it is helpful to go to https://console.cloud.google.com/storage to visually confirm that the buckets and subdirectories are being created.
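To build intuition for the pattern before debugging, you can exercise it directly in a few lines of Python. The snippet below applies the bucket/folder regular expression to the two URI shapes that test_create uses:

```python
import re

# Capture the bucket name and the (possibly empty) folder path
# from a gs:// URI, as create() and delete() do.
pattern = re.compile(
    r"(gs:\/\/)(?P<bucket>[a-zA-Z0-9_\- ]+?)(\/)(?P<folder>.*)")

for uri in ("gs://my-bucket/", "gs://my-bucket/my/sub/directory/"):
    m = pattern.search(uri)
    print(m.group("bucket"), repr(m.group("folder")))
# my-bucket ''
# my-bucket 'my/sub/directory/'
```

Note that the lazy quantifier stops the bucket group at the first slash, so everything after it (which may be empty) lands in the folder group.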
Examine code to see how to delete a bucket with or without a subdirectory
def delete(self, path):
    """
    This function deletes the bucket and the full path specified.
    Path: A URI - gs://{bucket-name}/path/in/bucket/
    Return: bool - Return True if successful and False otherwise
    """
    try:
        pattern = re.compile(
            r"(gs:\/\/)(?P<bucket>[a-zA-Z0-9_\- ]+?)(\/)(?P<folder>.*)")
        m = pattern.search(path)
        bucket_name = m.group('bucket')
        folder = m.group('folder')
        if len(folder) > 0:
            self.delete_full_path_all_contents(bucket_name, folder)
        bucket_exists = self.is_bucket(bucket_name)
        if bucket_exists:
            self.delete_bucket(bucket_name)
        return True
    except Exception:
        return False
A couple things to note here:
This code is extremely similar to the create function in how the bucket and folder are identified from the URI.
If a subdirectory is detected as part of the URI, the folder is deleted from the bucket.
If the bucket identified in the URI exists on the cloud, then the bucket is also deleted.
CONGRATULATIONS
You should now have a better understanding of how to use Python to create buckets and subdirectories in Google Cloud.