Raw Data
The Supreme Court has its own Website, from which one can download all of its Slips. I downloaded the Slips. Read the Slips. And as I did, I recorded whatever information I thought would be pertinent for an End of Year Analysis, which is what this project is all about.
Collated Data
The Supreme Court Slips are text based (in a Natural Language sort of way), which is not overly conducive to analysis. So at the start of every Webpage in my write-up, I noted the following information:
R-
DATE: 2019-
DOCKET: 17-
NAME: v.
WORTHY: True, False
OPINION: {Court, Concurring, Dissenting}
AUTHOR: Per Curiam
JOINING: Roberts, Thomas, Ginsburg, Breyer, Alito, Sotomayor, Kagan, Gorsuch, Kavanaugh
GOOD: {Yes, No}
PAGES: #
Since the same Slip (the same Court Decision) may include multiple Opinions, that section was repeated as necessary.
Of course, what is shown above is what a Web Browser displays. The incoming source code (i.e. the raw html page that the browser receives) looks like the following:
<code class="summary_analysis">
R-<br>
DATE: 2019-<br>
DOCKET: 17-<br>
NAME: v. <br>
WORTHY: True, False<br>
<br class="opinion">
OPINION: {Court, Concurring, Dissenting}<br>
AUTHOR: Per Curiam<br>
JOINING: Roberts, Thomas, Ginsburg, Breyer, Alito, Sotomayor, Kagan, Gorsuch, Kavanaugh<br>
GOOD: {Yes, No}<br>
PAGES: #<br>
</code>
And this might not look any better than what one can get from a Supreme Court Slip. But it does have the advantage of being consistent. And that consistency allowed me to write a Python Script that extracted the data from the Raw HTML and condensed it into a simple (well, relatively simple) Python Object.
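To give a feel for why the consistency matters, here is a minimal sketch of that kind of extraction. It is not the actual script. The function names (parse_fields, parse_summary), the shape of the resulting dictionary, and every value in SAMPLE_HTML are made up for illustration; only the tag layout and the field labels come from the template above.

```python
import re

# Placeholder page fragment. The layout follows the template above,
# but every value (case name, docket, authors, pages) is invented.
SAMPLE_HTML = """<code class="summary_analysis">
R-001<br>
DATE: 2019-01-08<br>
DOCKET: 17-0000<br>
NAME: Example v. Sample<br>
WORTHY: True<br>
<br class="opinion">
OPINION: Court<br>
AUTHOR: Kagan<br>
JOINING: Roberts, Thomas<br>
GOOD: Yes<br>
PAGES: 12<br>
<br class="opinion">
OPINION: Dissenting<br>
AUTHOR: Alito<br>
JOINING: <br>
GOOD: No<br>
PAGES: 4<br>
</code>"""

BLOCK_RE = re.compile(r'<code class="summary_analysis">(.*?)</code>', re.DOTALL)
OPINION_SPLIT_RE = re.compile(r'<br class="opinion">')
PLAIN_BR_RE = re.compile(r"<br\s*/?>")


def parse_fields(chunk):
    """Turn 'KEY: value<br>' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in PLAIN_BR_RE.split(chunk):
        line = line.strip()
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def parse_summary(html):
    """Return one record per Slip, with a list of its Opinions (assumed layout)."""
    match = BLOCK_RE.search(html)
    if match is None:
        return None
    # Everything before the first opinion marker describes the Slip itself.
    header, *opinion_chunks = OPINION_SPLIT_RE.split(match.group(1))
    record = parse_fields(header)
    opinions = [parse_fields(chunk) for chunk in opinion_chunks]
    for opinion in opinions:
        # Split the comma separated Justices into a list (empty if nobody joined).
        joined = opinion.get("JOINING", "")
        opinion["JOINING"] = [name.strip() for name in joined.split(",") if name.strip()]
    record["OPINIONS"] = opinions
    return record


record = parse_summary(SAMPLE_HTML)
print(record["NAME"], len(record["OPINIONS"]))
```

Note that the leading R- line has no colon, so this sketch simply drops it; a real extractor would presumably want to keep it.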
Python
There's not much to say about 2018_judges_html_extract.txt:
- It's a Python Script saved as a .txt file, so it's easier on the server.
  - It will need to be renamed as a .py file to work.
  - And if you know how to do that, you probably realize there's not much point.
- It's a highly specific one-shot script.
- The Input Data is the html pages from my 2018 Term Write-Ups.
  - These need to be put in a ./html directory.
  - Similarly, if an ./input directory does not exist, the program will crash.
2018_judges_html_extract outputs three files, all of which contain the same data in different formats:
- 2018_judges_text.txt: Most Human Readable
- 2018_judges_json.txt: A Popular Format
- 2018_judges_pickle.txt: Easiest For Me
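For anyone curious what "same data, different formats" can look like in practice, here is a rough sketch using only the standard library. The save_all function, the use of pprint for the human readable file, and the placeholder record are my assumptions for illustration; the real script may well have written its files differently.

```python
import json
import pickle
import pprint
import tempfile
from pathlib import Path


def save_all(records, out_dir):
    """Write the same records as text, JSON, and pickle (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    text_path = out_dir / "2018_judges_text.txt"
    json_path = out_dir / "2018_judges_json.txt"
    pickle_path = out_dir / "2018_judges_pickle.txt"

    # Human readable: a pretty-printed dump of the Python object.
    text_path.write_text(pprint.pformat(records), encoding="utf-8")
    # JSON: readable by nearly any language.
    json_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    # Pickle: binary, Python only, but restores the object as it was.
    with pickle_path.open("wb") as fh:
        pickle.dump(records, fh)

    return text_path, json_path, pickle_path


# Placeholder data; the values are invented.
records = [
    {"DOCKET": "17-0000", "WORTHY": True,
     "OPINIONS": [{"AUTHOR": "Kagan", "PAGES": 12}]},
]

with tempfile.TemporaryDirectory() as tmp:
    text_path, json_path, pickle_path = save_all(records, tmp)
    from_json = json.loads(json_path.read_text(encoding="utf-8"))
    with pickle_path.open("rb") as fh:
        from_pickle = pickle.load(fh)

print(from_json == records, from_pickle == records)
```

The pickle file is the least portable of the three, but it needs no parsing step on the way back in, which is likely why it is the easiest to work with from Python.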
And that's that.
From here on, I will only be using 2018_judges_pickle.txt. It will be the input for all future analysis.
In fact, the other data formats and even the extraction script itself are so specialized they serve no further purpose. And as such, I have already removed the working copies from my file system.
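As a rough idea of how a pickle file feeds later analysis, the sketch below loads one and tallies Opinions per Author. The record layout (a list of dictionaries, each with an OPINIONS list) is an assumption, as are the function names and the placeholder data; the sketch writes its own small file to a temporary directory so that it can run on its own. Keep in mind that pickle.load should only ever be used on files you trust, since unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path


def load_records(path):
    """Load the pickled records (only do this with a file you created yourself)."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def opinions_per_author(records):
    """Count how many Opinions each Author wrote (assumed record layout)."""
    return Counter(
        opinion["AUTHOR"]
        for record in records
        for opinion in record["OPINIONS"]
    )


# Placeholder records; the dockets and authorship here are invented.
placeholder_records = [
    {"DOCKET": "17-0000", "OPINIONS": [{"AUTHOR": "Kagan"}, {"AUTHOR": "Alito"}]},
    {"DOCKET": "17-0001", "OPINIONS": [{"AUTHOR": "Kagan"}]},
]

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "2018_judges_pickle.txt"
    with pickle_path.open("wb") as fh:
        pickle.dump(placeholder_records, fh)
    tally = opinions_per_author(load_records(pickle_path))

print(tally.most_common())
```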