Climate coding challenge, Part 6¶

Getting your own data

There are more Earth Observation data online than any one person could ever look at¶

NASA’s Earth Observing System Data and Information System (EOSDIS) alone manages over 9PB of data. 1 PB is roughly 100 times the entire Library of Congress (a good approximation of all the books available in the US). It’s all available to you once you learn how to download what you want.

Here we’re using the NOAA National Centers for Environmental Information (NCEI) Access Data Service application progamming interface (API) to request data from their web servers. We will be using data collected as part of the Global Historical Climatology Network daily (GHCNd) from their Climate Data Online library program at NOAA.

For this example we’re requesting daily summary data in Boulder, CO (station ID USC00050848).

** Your task:**

  1. Research the Global Historical Climatology Network - Daily data source.
  2. In the cell below, write a 2-3 sentence description of the data source.
  3. Include a citation of the data (HINT: See the ‘Data Citation’ tab on the GHCNd overview page).

Your description should include:

  • who takes the data
  • where the data were taken
  • what the maximum temperature units are
  • how the data are collected

YOUR DATA DESCRIPTION AND CITATION HERE 🛎️

Access NCEI GHCNd Data from the internet using its API 🖥️ 📡 🖥️¶

The cell below contains the URL for the data you will use in this part of the notebook. We created this URL by generating what is called an API endpoint using the NCEI API documentation.

Note

An application programming interface (API) is a way for two or more computer programs or components to communicate with each other. It is a type of software interface, offering a service to other pieces of software (Wikipedia).

First things first – you will need to import the pandas library to access NCEI data through its URL:

In [16]:
# Import required packages
import pandas as pd
import numpy as np

Your task:

  1. Pick an expressive variable name for the URL.
  2. Reformat the URL so that it adheres to the 79-character PEP-8 line limit. You should see two vertical lines in each cell - don’t let your code go past the second line.
  3. At the end of the cell where you define your url variable, call your variable (type out its name) so it can be tested.
In [17]:
Boulder_data = ('https://www.ncei.noaa.gov/access/services/data/v1'
                '?dataset=daily-summaries'
                '&dataTypes=TOBS,PRCP'
                '&stations=USC00050848'
                '&units=standard'
                '&startDate=1901-10-01'
                '&endDate=2023-09-30')
Boulder_data
Out[17]:
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00050848&units=standard&startDate=1901-10-01&endDate=2023-09-30'

Download and get started working with NCEI data¶

Just like you did with the practice data, go ahead and use pandas to import data from your API URL into Python. If you didn’t do it already, you should import the pandas library at the top of this notebook so that others who want to use your code can find it easily.

In [18]:
# Import data into Python from NCEI API
Boulder_df = pd.read_csv(
    Boulder_data,
    index_col='DATE', 
    parse_dates=True,
    na_values=['NaN'])
Boulder_df
Out[18]:
STATION PRCP TOBS
DATE
1901-10-01 USC00050848 0.0 62.0
1901-10-02 USC00050848 0.0 60.0
1901-10-03 USC00050848 0.0 57.0
1901-10-04 USC00050848 0.0 53.0
1901-10-05 USC00050848 0.0 49.0
... ... ... ...
2023-09-26 USC00050848 0.0 74.0
2023-09-27 USC00050848 0.0 69.0
2023-09-28 USC00050848 0.0 73.0
2023-09-29 USC00050848 0.0 66.0
2023-09-30 USC00050848 0.0 78.0

43862 rows × 3 columns

In [19]:
#Choose a different location
DC_data = ('https://www.ncei.noaa.gov/access/services/data/v1'
                '?dataset=daily-summaries'
                '&dataTypes=TOBS,PRCP'
                '&stations=USC00186350'
                '&units=standard'
                '&startDate=1950-10-01'
                '&endDate=2023-09-30')
DC_data
Out[19]:
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00186350&units=standard&startDate=1950-10-01&endDate=2023-09-30'
In [11]:
# Import data into Python from NCEI API
DC_df = pd.read_csv(
    DC_data,
    index_col='DATE', 
    parse_dates=True,
    na_values=['NaN'])
DC_df
Out[11]:
STATION PRCP TOBS
DATE
1950-10-01 USC00186350 0.00 80.0
1950-10-02 USC00186350 0.00 81.0
1950-10-03 USC00186350 0.00 81.0
1950-10-04 USC00186350 0.27 61.0
1950-10-05 USC00186350 0.00 62.0
... ... ... ...
2023-09-26 USC00186350 0.07 58.0
2023-09-27 USC00186350 0.04 58.0
2023-09-28 USC00186350 0.00 53.0
2023-09-29 USC00186350 0.01 61.0
2023-09-30 USC00186350 0.00 66.0

25895 rows × 3 columns

In [14]:
# Plot data
DC_df.plot(
    y='TOBS',
    title='Washington, DC Temperature',
    xlabel='Year',
    ylabel='Temp ($^\circ$F)',
    legend = False)
Out[14]:
<Axes: title={'center': 'Washington, DC Temperature'}, xlabel='Year', ylabel='Temp ($^\\circ$F)'>
No description has been provided for this image
In [20]:
# Resample to annual
ann_DC_df = DC_df.resample('YS').agg({'PRCP': np.sum, 'TOBS': np.mean})
ann_DC_df
/tmp/ipykernel_30674/2078979801.py:2: FutureWarning: The provided callable <function sum at 0x73932fd37250> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass 'sum' instead.
  ann_DC_df = DC_df.resample('YS').agg({'PRCP': np.sum, 'TOBS': np.mean})
/tmp/ipykernel_30674/2078979801.py:2: FutureWarning: The provided callable <function mean at 0x73932fd581f0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass 'mean' instead.
  ann_DC_df = DC_df.resample('YS').agg({'PRCP': np.sum, 'TOBS': np.mean})
Out[20]:
PRCP TOBS
DATE
1950-01-01 9.05 53.363636
1951-01-01 39.90 64.176966
1952-01-01 48.09 61.715190
1953-01-01 20.73 67.351695
1954-01-01 32.56 64.362573
... ... ...
2019-01-01 43.92 54.866667
2020-01-01 63.66 55.389831
2021-01-01 42.55 55.108635
2022-01-01 43.04 53.752941
2023-01-01 27.35 56.803922

74 rows × 2 columns

In [22]:
# Plot the annual data
ann_DC_df.plot(
    y='PRCP',
    title='Washington, DC Average Annual Precip',
    xlabel='Year',
    ylabel='Precip (in)',
    legend = False)
Out[22]:
<Axes: title={'center': 'Washington, DC Average Annual Precip'}, xlabel='Year', ylabel='Precip (in)'>
No description has been provided for this image