Code Review for Project by learningsomethingnew · Pull Request #1 · Data4Democracy/zip-code-to-county

learningsomethingnew · 2017-07-30T21:34:23Z

WonderingStar to perform code review & LearningSomethingNew to update code on feedback.

WanderingStar

Overall comments:

Good division of work into bite-sized functions
Unfortunately, many of the functions are single-purpose because they hard-code application logic in them. Think about whether you can factor things like the names of the directories out to make more generally useful functions that you could reuse elsewhere

Most of the things I comment on below are stylistic or for greater intelligibility of the code. However the use of \ for directory separators means that this code doesn't work on non-Windows OSes (and fails in a way that gives the runner very little information about what went wrong)

WanderingStar · 2017-07-30T21:36:09Z

python-code/code.py

+
+def verify_dirs_exist():
+    #This notebook requires a few directories
+    dirs = ["download", "download\csv", "download\excel"]


Are these directories intended to be literally named download\csv and download\excel? Or are csv and excel intended to be subdirectories under download? On non-Windows OSes, \ is not the directory separator character, so this does the former. You can use os.path.join('download', 'csv') to work cross-platform.

If the excel and csv directories are meant to be subdirectories of download, you don't need to make download separately. os.makedirs operates recursively.

WanderingStar · 2017-07-30T21:36:58Z

python-code/code.py

+        full_path = os.path.join(curpath, d) # join cwd with proposed d
+        create_dir_if_not_exists(full_path)
+
+def create_dir_if_not_exists(full_path):


os.makedirs already does the file existence check, essentially.

try: os.makedirs(full_path) print(f"Created directory {full_path}") except FileExistsError: print(f"Directory {full_path} already existed")

The try/except version is slightly preferable because in the if version, there's a tiny chance that someone could create the directory between when you do the if and when you do the makedirs.

WanderingStar · 2017-07-30T21:38:31Z

python-code/code.py

+import glob
+import datetime as dt
+
+def verify_dirs_exist():


The naming of this function is a bit misleading. It doesn't just verify that they exist, it actually creates them if they don't exist. I think it should be called ensure_dirs_exist.

I would also make this function take in the list of directories to create, so that it's re-usable.

WanderingStar · 2017-07-30T21:41:00Z

python-code/code.py

+    else:
+        print("Directory ", full_path, " already existed")
+
+def generate_file_name_from_url(url):


A comment with a sample URL and the file name you're generating from it would make this much easier to understand.

WanderingStar · 2017-07-30T21:47:39Z

python-code/code.py

+        print("Directory ", full_path, " already existed")
+
+def generate_file_name_from_url(url):
+    month_year = url.split("\\")[-1].split("_")[-1].split(".")[0]


Does url.split("\\") do anything for these URLs? It looks like a sample url is https://www.huduser.gov/portal/datasets/usps/ZIP_COUNTY_032010.xlsx, which doesn't contain the \ character.

This is one of the (rare?) cases where a regular expression might make things clearer...

m = re.search('ZIP_COUNTY_(\d\d)(\d\d\d\d)\.xlsx', url) month, year = m.groups()

WanderingStar · 2017-07-30T22:13:45Z

python-code/code.py

+            #open and get the excel dataframe
+            excel_df = process_excel_file(full_file_path)
+            #merge the excel file with the fips data
+            merged_df = fips_df.merge(excel_df)


It would be better if this function did not refer to fits_df, which is defined outside of the function. That should be passed in as an argument or defined within the function.

The same is true for cur_year

WanderingStar · 2017-07-30T22:17:15Z

python-code/code.py

+            print(csv_path)
+            try:
+                merged_df.to_csv(csv_path, encoding='utf-8', index=False)
+            except:


This except is too broad. When I first tried to run this, the code did nothing because it was trying to write to a directory that didn't exist. It didn't show me the error because this clause swallowed it. If you're expecting a particular error, you should catch it as specifically as possible.

WanderingStar · 2017-07-30T22:17:45Z

python-code/code.py

+                merged_df.to_csv(csv_path, encoding='utf-8', index=False)
+            except:
+                #once we get to a Q that hasn't happened yet, we'll get an XLDRerror
+                print("Operation has completed")


Which operation has completed? This message could be more descriptive. "All files have been downloaded and processed"?

WanderingStar · 2017-07-30T22:33:35Z

python-code/code.py

+            try:
+                merged_df.to_csv(csv_path, encoding='utf-8', index=False)
+            except:
+                #once we get to a Q that hasn't happened yet, we'll get an XLDRerror


Is there a more straightforward way to detect this? Check to see if the input is empty? Check if year > now.year or year == now.year and month > now.month?

WanderingStar · 2017-07-30T22:35:02Z

python-code/code.py

+import os
+import time
+import pandas as pd
+import glob


I don't think you use glob in this code.

WanderingStar · 2017-07-30T23:36:38Z

Oh, one thing I forgot to mention. Your requirements.txt needs xlrd

converted the jupyter notebook to a .py file for review.

5632dce

WanderingStar requested changes Jul 30, 2017

View reviewed changes

Conversation

learningsomethingnew commented Jul 30, 2017

Uh oh!

WanderingStar left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WanderingStar commented Jul 30, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants