Return to JUNTO

# JUNTO Practice: Data Analysis, Historical Crop Yields (Part 2)

Discussed on November 10, 2020.

Analyze this dataset and create a short report:

Please submit an abstract (~200 words) with your report.

Click to see:

# Crop Data 2

## 0. Abstract

What differentiates high-yielding countries from low-yielding ones?

Last time, we broke out the yield plots (an individual lat-long combination) by Percentile Rank among all yields. We looked at the top countries in the Percentile Rank brackets to see if we could infer anything about what kinds of countries were better or worse crop producers. We saw that there was variation among crops, with some crops showing higher yields for higher GDP countries and others showing the opposite relationship.

Now, we seek to better understand what really sets the countries apart in order to understand why some crops benefit from industrialization. We expand on this by pulling country-time level data metrics from https://databank.worldbank.org/home.aspx and exploring correlations with yield and percentile rank.

We find the external data is useful in telling a more detailed story about industrialization and crop yields. The external data helps undertand what underlying processes actually contribute to higher yields. Without the external data, GDP is too blunt of a tool to see these differences.

Further work should address the concerns we outline below and expand on statistical modeling on some of these metrics.

## 1. Concerns and Further Work

### 1.0 Country Data, Multicollinearity, and Confounding Variables

Tying the data to yields via country creates a modeling problem. While the individual metrics may appear to be numeric and varied, in aggregate, each country can be represented as a linear combination of their respective metrics. This creates problems for linear regression methods, and also casts doubt on how to interpret correlations between factors. I.e. a correlation with Factor 1 may represent a direct relationship with crop yields, or simply a correlation with country which has the characteristics of Factor 1.

While this creates challenges for modeling, it does not prevent us from asking the question “what are the characteristics of Country X that yields more for country Y?” We just can’t say anything about causation.

Further work should seek to mitigate this. Possibly by itegrating time, or creating new clustered groups that are independent from countries.

### 1.1 Time Is On My Side

In order to match the aggregation used by the previous work, we mostly ignored the time component by rolling up all time periods into an average yield for a given lat-long plot. We did the same for the country metrics. Information is lost with this step, and further work should attempt to integrate both time and geography to paint a more complete picture about the nature of crop yields.

### 1.2 Big Trouble in Big China

As before, percentile rank is calculated on the entire set of lat-long plots. The problem with this is that larger countries may be overrepresented across percentiles because of their sheer size, skewing results. For example, China regularly appears across all percentile groups; this isn’t a surprise because China is a large countrye with a varied geography likely to produce a spread of yields.

## 2. The Dirty Work

### 2.0 Listicles: Top 100 Country Metrics I Arbitrarily Chose

I chose data from the World Bank(https://databank.worldbank.org/home.aspx). The metrics were chosen based on what I thought was interesting. Given the volume of data available, there is plenty of room for future exploration with different data sources.

In our case, the volume of data helped inspire the direction of this work, forcing me to narrow down what data is interesting.

``````world_data_list.columns
# Data supplied by https://databank.worldbank.org/home.aspx
``````
``````Index(['Access to electricity (% of population)',
'Access to electricity, rural (% of rural population)',
'Access to electricity, urban (% of urban population)',
'Adjusted net enrollment rate, primary (% of primary school age children)',
'Agricultural irrigated land (% of total agricultural land)',
'Agricultural land (% of land area)', 'Agricultural land (sq. km)',
'Agricultural machinery, tractors',
'Agricultural machinery, tractors per 100 sq. km of arable land',
'Agricultural methane emissions (% of total)',
'Agricultural methane emissions (thousand metric tons of CO2 equivalent)',
'Agricultural nitrous oxide emissions (% of total)',
'Agricultural nitrous oxide emissions (thousand metric tons of CO2 equivalent)',
'Agricultural raw materials exports (% of merchandise exports)',
'Agricultural raw materials imports (% of merchandise imports)',
'Air transport, freight (million ton-km)',
'Arable land (% of land area)', 'Arable land (hectares per person)',
'Arable land (hectares)',
'Average precipitation in depth (mm per year)',
'CO2 emissions (kg per 2010 US\$ of GDP)',
'CO2 emissions (kg per 2017 PPP \$ of GDP)',
'CO2 emissions (kg per PPP \$ of GDP)', 'CO2 emissions (kt)',
'CO2 emissions (metric tons per capita)',
'CO2 emissions from manufacturing industries and construction (% of total fuel combustion)',
'CO2 emissions from transport (% of total fuel combustion)',
'Central government debt, total (% of GDP)',
'Cereal production (metric tons)', 'Cereal yield (kg per hectare)',
'Consumer price index (2010 = 100)',
'Crop production index (2004-2006 = 100)',
'Current account balance (% of GDP)',
'Current account balance (BoP, current US\$)',
'Customs and other import duties (% of tax revenue)',
'Electric power consumption (kWh per capita)', 'Expense (% of GDP)',
'Expense (current LCU)',
'Fertilizer consumption (% of fertilizer production)',
'Fertilizer consumption (kilograms per hectare of arable land)',
'Fixed telephone subscriptions',
'Fixed telephone subscriptions (per 100 people)',
'Food exports (% of merchandise exports)',
'Food imports (% of merchandise imports)',
'Food production index (2004-2006 = 100)',
'Forest area (% of land area)', 'Forest area (sq. km)',
'Fossil fuel energy consumption (% of total)',
'Fuel exports (% of merchandise exports)',
'Fuel imports (% of merchandise imports)',
'Gini index (World Bank estimate)', 'Income share held by fourth 20%',
'Income share held by highest 10%', 'Income share held by highest 20%',
'Income share held by lowest 10%', 'Income share held by lowest 20%',
'Income share held by second 20%', 'Income share held by third 20%',
'Individuals using the Internet (% of population)',
'Inflation, GDP deflator (annual %)',
'Inflation, GDP deflator: linked series (annual %)',
'Inflation, consumer prices (annual %)', 'Land area (sq. km)',
'Land area where elevation is below 5 meters (% of total land area)',
'Land under cereal production (hectares)', 'Lending interest rate (%)',
'Market capitalization of listed domestic companies (current US\$)',
'Methane emissions (% change from 1990)',
'Methane emissions (kt of CO2 equivalent)',
'Patent applications, nonresidents', 'Patent applications, residents',
'Population, total',
'Quality of port infrastructure, WEF (1=extremely underdeveloped to 7=well developed and efficient by international standards)',
'Rail lines (total route-km)',
'Railways, goods transported (million ton-km)',
'Railways, passengers carried (million passenger-km)',
'Researchers in R&D (per million people)', 'Rural land area (sq. km)',
'Rural land area where elevation is below 5 meters (% of total land area)',
'Rural land area where elevation is below 5 meters (sq. km)',
'Rural population', 'Rural population (% of total population)',
'Rural population growth (annual %)',
'Rural population living in areas where elevation is below 5 meters (% of total population)',
'Tax revenue (% of GDP)', 'Taxes on exports (% of tax revenue)',
'Taxes on international trade (% of revenue)',
'Terrestrial and marine protected areas (% of total territorial area)',
'Terrestrial protected areas (% of total land area)',
'Total greenhouse gas emissions (kt of CO2 equivalent)',
'Urban land area (sq. km)',
'Urban land area where elevation is below 5 meters (% of total land area)',
'Urban land area where elevation is below 5 meters (sq. km)',
'Urban population', 'Urban population (% of total population)',
'Urban population growth (annual %)',
'Urban population living in areas where elevation is below 5 meters (% of total population)'],
dtype='object', name='Series Name')
``````

### 2.1

We take our geographic data and roll it up to the country level. Then we calculate a correlation matrix and take the top 10 country metrics that correlate with mean yields.

After finding the most correlated columns, we take all the lat-long plots (i.e. we do not groupby country), and we group them by percentile rank. We plot the table to show that the metrics do maintain their correlation across the percentiles.

We plot these charts for each crop, allowing us to tell a story for each crop.

``````top_col_sets = []
cm_pos = sns.light_palette("green", as_cmap=True)
cm_neg = sns.light_palette(
"red", as_cmap=True, reverse=True
)

for crop in crops:
print(f"\n=========\n{crop}\n=========\n")

"""
# examining distribution of yields
figs,axs = plt.subplots(2,1,sharex=True,figsize=(20,10))

(yields_by_geo_dfs[crop]
.mean_yield
.apply(np.log10)
.plot.hist(title="Mean yields(log)",ax=axs[0],bins=20)
)
(yields_by_geo_dfs[crop]
.mean_yield
.plot
.hist(title="Mean yields",ax=axs[1],bins=20)
)

plt.show()
"""
# looking for top correlates among our metrics
metrics_cols = [
i
for i in yields_by_geo_dfs[crop].columns
if i not in "percentile_class" and "Cereal" not in i
]
crops_by_country = (
yields_by_geo_dfs[crop][metrics_cols]
.groupby("iso_a3")
.mean()
)

crop_corr = crops_by_country.corr().iloc[1:]

"""sns.heatmap(crop_corr,
xticklabels=sb_corr.columns,
yticklabels=sb_corr.columns)"""
corr_thresh = 0.3
mask = crop_corr["mean_yield"].abs() > corr_thresh
top_corr_inds = (
crop_corr["mean_yield"]
.loc[mask]
.abs()
.sort_values(ascending=False)
.head(10)
.index
)
top_corr_values = crop_corr["mean_yield"].loc[
top_corr_inds
]

top_corr_values.plot.barh(figsize=(20, 10), fontsize=20)
plt.show()

top_col_sets.append(set(top_corr_inds.tolist()))

# Looking at percentile_class
top_corr_inds_list = top_corr_inds.tolist()
top_corr_inds_list.reverse()

positive_cols = [
c
for c in top_corr_inds_list
if top_corr_values[c] > 0
]
negative_cols = [
c
for c in top_corr_inds_list
if top_corr_values[c] < 0
]

metrics_by_pr = (
yields_by_geo_dfs[crop]
.groupby("percentile_class")
.mean()[["mean_yield"] + top_corr_inds_list]
)

display(
metrics_by_pr.transpose()
.style.background_gradient(
cmap=cm_pos,
subset=pd.IndexSlice[
["mean_yield"] + positive_cols, :
],
axis=1,
)
.background_gradient(
cmap=cm_neg,
subset=pd.IndexSlice[negative_cols, :],
axis=1,
)
)
"""
# Modeling
n_model_cols = 2
top_cols = False

if top_cols:
model_cols = top_corr_inds[-n_model_cols:]
else:
model_cols = random.choices(top_corr_inds,k=n_model_cols)

X = yields_by_geo_dfs[crop][model_cols]
y = yields_by_geo_dfs[crop]["mean_yield"]

mean_yield_lmodel = sm.OLS(y,X,missing='drop',hasconst=True)
result = mean_yield_lmodel.fit()

print(f"MSE for {crop} model:",result.mse_total)
print(f"Standard Deviation for {crop} model:",y.std())
print(result.summary())
"""
``````
``````
=========
maize_major
=========
``````
percentile_class 0 1 2 3 4
mean_yield 0.592356 1.564861 3.496307 6.679924 13.734310
Income share held by third 20% 13.619052 14.094869 15.015676 15.417159 16.170740
Agricultural machinery, tractors per 100 sq. km of arable land 80.050802 55.430307 93.072368 209.270874 981.151219
lat 0.779182 4.696533 19.573285 29.359489 42.512500
Access to electricity (% of population) 40.624679 45.090158 82.270251 95.167721 99.597070
Access to electricity, rural (% of rural population) 27.304298 32.855920 76.887651 93.573158 99.128826
Urban population growth (annual %) 4.205305 3.897089 2.955311 2.814845 1.648404
Electric power consumption (kWh per capita) 854.053313 875.572700 1865.735656 2379.465473 3584.012619
Individuals using the Internet (% of population) 7.282272 8.269198 16.786779 22.074695 28.611557
Researchers in R&D (per million people) 267.544297 244.235693 773.632564 1176.247780 1766.765528
Fixed telephone subscriptions (per 100 people) 4.050600 3.773479 10.744100 15.550867 28.995657
``````
=========
soybean
=========
``````
percentile_class 0 1 2 3 4
mean_yield 0.788951 1.639987 2.345444 3.071726 4.827471
Lending interest rate (%) 11.138837 8.324481 8.434298 9.461449 12.827564
CO2 emissions from transport (% of total fuel combustion) 15.293679 8.109633 8.434483 9.829745 16.973631
Fuel exports (% of merchandise exports) 19.698995 5.994777 6.064009 6.303387 7.916197
Arable land (hectares per person) 0.171237 0.120684 0.130597 0.151387 0.260377
Taxes on international trade (% of revenue) 8.463240 4.589914 3.984755 4.347143 2.775755
Electric power consumption (kWh per capita) 1175.188909 1407.539600 1621.888600 1973.892509 4312.567223
Expense (current LCU) 5256469028254.278320 4686867863730.831055 3330644652508.867188 2480628120253.042480 482373687236.632446
Tax revenue (% of GDP) 11.313044 10.684264 11.384108 13.231229 23.633708
Expense (% of GDP) 18.429210 20.122115 23.822303 26.693116 33.017359
``````
=========
rice
=========
``````
percentile_class 0 1 2 3 4
mean_yield 0.935897 2.406612 4.313348 7.542025 12.998670
Income share held by third 20% 14.681113 14.803341 15.198031 15.305737 15.240720
Access to electricity (% of population) 45.864454 57.986399 79.710170 91.778798 93.860171
Adjusted net enrollment rate, primary (% of primary school age children) 73.375721 81.880451 90.008541 93.144190 92.703290
Researchers in R&D (per million people) 219.088665 381.852483 733.729399 960.684628 1076.399917
Urban population growth (annual %) 3.991512 3.634623 3.274958 3.391018 3.408660
Access to electricity, rural (% of rural population) 35.180937 48.939374 73.548574 89.413661 92.222032
Electric power consumption (kWh per capita) 498.877711 801.926860 1328.999935 1556.894479 2017.888472
Rural population growth (annual %) 1.587362 1.147698 0.290760 -0.462446 -0.413433
Individuals using the Internet (% of population) 6.970590 9.599750 14.816496 18.813903 20.517214
Fixed telephone subscriptions (per 100 people) 3.718424 4.317866 8.821257 11.737639 13.795124
``````
=========
rice_second
=========
``````
percentile_class 0 1 2 3 4
mean_yield 0.495623 2.354802 5.776738 9.722909 12.972619
Forest area (sq. km) 167747.740688 287374.041026 803351.706042 1454904.125231 1640273.265236
CO2 emissions (kg per 2017 PPP \$ of GDP) 0.173595 0.201124 0.348299 0.714498 0.789174
CO2 emissions (kg per PPP \$ of GDP) 0.213343 0.248324 0.414826 0.840366 0.924186
Market capitalization of listed domestic companies (current US\$) 152631776211.559784 400479979991.687805 1486586138998.351562 3859454048978.813477 4264154942928.778320
Patent applications, nonresidents 1919.332596 4014.474351 17926.039673 48646.329836 55955.436106
Patent applications, residents 461.742019 1869.908870 68140.067104 217713.650051 252359.439916
Rural land area where elevation is below 5 meters (sq. km) 8360.623341 18592.774299 44402.563471 69689.341156 75311.808007
CO2 emissions (kg per 2010 US\$ of GDP) 0.470778 0.608688 0.964552 1.803189 1.957212
CO2 emissions from manufacturing industries and construction (% of total fuel combustion) 13.279388 18.890531 27.977784 33.104289 33.684439
Taxes on exports (% of tax revenue) 1.112040 0.835889 -8.304417 -24.142773 -25.224244
``````
=========
maize_second
=========
``````
percentile_class 0 1 2 3
mean_yield 0.631479 1.101083 1.801461 2.791181
Fixed telephone subscriptions (per 100 people) 0.329384 0.298429 0.739953 1.201236
Agricultural nitrous oxide emissions (% of total) 61.925051 53.406660 71.644598 62.115542
Access to electricity (% of population) 22.321011 22.452637 40.429321 54.341885
Railways, goods transported (million ton-km) 573.152763 433.440309 743.441704 1140.684975
Methane emissions (% change from 1990) 2.447300 -3.328620 19.114687 30.990442
Agricultural methane emissions (% of total) 46.735139 38.196258 43.203511 34.143230
Average precipitation in depth (mm per year) 1189.500000 1306.481928 1239.469325 1405.955056
CO2 emissions (metric tons per capita) 0.218554 0.205518 0.701570 1.613084
Electric power consumption (kWh per capita) 114.981324 115.216630 195.715393 383.576433
Fertilizer consumption (% of fertilizer production) 322.609966 316.737609 357.392386 357.392386
``````
=========
rice_major
=========
``````
percentile_class 0 1 2 3 4
mean_yield 0.951237 2.458035 4.231717 7.223358 12.986398
Income share held by third 20% 14.666686 14.814449 15.152415 15.339024 15.248445
Access to electricity (% of population) 45.395807 59.451673 79.670136 91.111486 92.954605
Adjusted net enrollment rate, primary (% of primary school age children) 73.711620 81.142516 89.787228 93.063050 92.557435
Researchers in R&D (per million people) 235.750445 365.889818 668.255414 995.584467 1065.651683
Urban population growth (annual %) 3.946648 3.737227 3.351102 3.338890 3.426379
Access to electricity, rural (% of rural population) 35.735062 49.086575 73.438285 88.612698 91.158421
Electric power consumption (kWh per capita) 538.445389 754.034621 1220.737731 1608.275922 2025.649150
Rural population growth (annual %) 1.608235 1.080201 0.271220 -0.431068 -0.363604
Individuals using the Internet (% of population) 6.872543 10.045160 14.339164 18.951563 20.351186
Fixed telephone subscriptions (per 100 people) 3.966128 4.212658 8.283567 11.986666 13.696573
``````
=========
wheat
=========
``````
percentile_class 0 1 2 3 4
mean_yield 0.506366 1.702720 3.553375 6.531177 11.842349
Fossil fuel energy consumption (% of total) 81.120018 74.994030 77.992119 65.483996 45.348982
Electric power consumption (kWh per capita) 3050.152924 3007.906089 3085.679617 3034.329050 980.168079
CO2 emissions (kg per 2017 PPP \$ of GDP) 0.571563 0.518335 0.506201 0.447019 0.473240
CO2 emissions (kg per 2010 US\$ of GDP) 1.247140 1.175806 1.282768 1.027570 1.183025
Fuel exports (% of merchandise exports) 37.855791 19.836241 10.280509 8.849995 4.831284
Average precipitation in depth (mm per year) 471.551174 653.715802 712.572745 860.988685 774.483366
Terrestrial protected areas (% of total land area) 9.056012 12.017347 16.215313 19.973694 19.641635
Terrestrial and marine protected areas (% of total territorial area) 9.699308 12.517975 15.454703 18.791450 18.996231
Forest area (% of land area) 10.522849 17.900772 23.755045 28.779749 27.334238
``````
=========
wheat_winter
=========
``````
percentile_class 0 1 2 3 4
mean_yield 0.547509 1.769330 3.479169 6.052648 10.143167
Taxes on international trade (% of revenue) 9.601313 8.214604 5.147834 5.568284 4.996082
Individuals using the Internet (% of population) 13.402568 16.283826 23.999291 27.788992 23.037379
Customs and other import duties (% of tax revenue) 17.511811 12.499375 7.661093 8.673798 8.297058
Researchers in R&D (per million people) 521.221876 685.849185 1187.190664 1881.717183 1418.515805
Terrestrial protected areas (% of total land area) 7.659199 10.343171 18.740654 21.454271 21.364614
Fixed telephone subscriptions (per 100 people) 9.526607 10.437566 16.060387 23.011040 18.844587
Terrestrial and marine protected areas (% of total territorial area) 6.845986 9.222490 16.529489 19.771846 20.472229
Average precipitation in depth (mm per year) 514.973547 714.585729 765.819104 942.033024 936.851429
Electric power consumption (kWh per capita) 1440.640343 1863.992327 2504.619674 3574.804830 2509.923154
Forest area (% of land area) 10.093498 17.635377 27.594366 34.539210 34.141302
``````
=========
wheat_spring
=========
``````
percentile_class 0 1 2 3 4
mean_yield 0.483316 1.622768 3.585138 8.011703 13.527698
Income share held by lowest 10% 2.971000 2.769359 2.921584 2.690314 2.725081
Access to electricity, rural (% of rural population) 90.155599 88.362529 91.898084 74.307826 27.460144
Income share held by fourth 20% 22.165894 22.232218 22.361242 21.773560 20.937038
Income share held by lowest 20% 7.381283 7.025003 7.379500 6.599536 6.560988
lat 24.927911 17.820053 26.676357 32.453340 0.144942
Gini index (World Bank estimate) 35.222547 36.164093 34.473186 38.637675 40.438264
Income share held by highest 10% 27.504606 27.937561 26.790070 30.139639 32.268456
Income share held by third 20% 15.879847 15.768626 16.213255 15.196842 14.522152
Income share held by second 20% 11.679072 11.462021 11.932645 10.766121 10.348148
Income share held by highest 20% 42.890836 43.506083 42.108930 45.634286 47.611437
``````
=========
maize
=========
``````
percentile_class 0 1 2 3 4
mean_yield 0.599288 1.581045 3.509866 6.682521 13.734310
Income share held by third 20% 13.614513 14.093985 15.038031 15.416953 16.170740
lat 0.690326 4.884267 19.640476 29.398837 42.512500
Agricultural machinery, tractors per 100 sq. km of arable land 79.657236 55.061393 94.302815 209.574884 981.151219
Access to electricity (% of population) 40.574729 45.018935 82.919112 95.181296 99.597070
Access to electricity, rural (% of rural population) 27.348178 32.753088 77.525433 93.594916 99.128826
Urban population growth (annual %) 4.203101 3.897683 2.947758 2.813349 1.648404
Electric power consumption (kWh per capita) 861.331634 860.185594 1884.788776 2382.177496 3584.012619
Individuals using the Internet (% of population) 7.304090 8.208503 16.936062 22.090383 28.611557
Researchers in R&D (per million people) 270.847740 240.989832 777.537122 1177.623224 1766.765528
Fixed telephone subscriptions (per 100 people) 4.079436 3.731866 10.839374 15.565544 28.995657

### 2.2 Top Metrics

We denote the top metrics columns across all the crops.

``````set.union(*top_col_sets)
``````
``````{'Access to electricity (% of population)',
'Access to electricity, rural (% of rural population)',
'Adjusted net enrollment rate, primary (% of primary school age children)',
'Agricultural machinery, tractors per 100 sq. km of arable land',
'Agricultural methane emissions (% of total)',
'Agricultural nitrous oxide emissions (% of total)',
'Arable land (hectares per person)',
'Average precipitation in depth (mm per year)',
'CO2 emissions (kg per 2010 US\$ of GDP)',
'CO2 emissions (kg per 2017 PPP \$ of GDP)',
'CO2 emissions (kg per PPP \$ of GDP)',
'CO2 emissions (metric tons per capita)',
'CO2 emissions from manufacturing industries and construction (% of total fuel combustion)',
'CO2 emissions from transport (% of total fuel combustion)',
'Cereal yield (kg per hectare)',
'Customs and other import duties (% of tax revenue)',
'Electric power consumption (kWh per capita)',
'Expense (% of GDP)',
'Expense (current LCU)',
'Fertilizer consumption (% of fertilizer production)',
'Fixed telephone subscriptions (per 100 people)',
'Forest area (% of land area)',
'Fossil fuel energy consumption (% of total)',
'Fuel exports (% of merchandise exports)',
'Gini index (World Bank estimate)',
'Income share held by fourth 20%',
'Income share held by highest 10%',
'Income share held by highest 20%',
'Income share held by lowest 10%',
'Income share held by lowest 20%',
'Income share held by second 20%',
'Income share held by third 20%',
'Individuals using the Internet (% of population)',
'Lending interest rate (%)',
'Market capitalization of listed domestic companies (current US\$)',
'Methane emissions (% change from 1990)',
'Patent applications, nonresidents',
'Patent applications, residents',
'Railways, goods transported (million ton-km)',
'Researchers in R&D (per million people)',
'Rural land area where elevation is below 5 meters (sq. km)',
'Rural population growth (annual %)',
'Tax revenue (% of GDP)',
'Taxes on exports (% of tax revenue)',
'Taxes on international trade (% of revenue)',
'Terrestrial and marine protected areas (% of total territorial area)',
'Terrestrial protected areas (% of total land area)',
'Urban population growth (annual %)',
'lat'}
``````

## 3. Closing Thoughts

We see that, for many crops, the main factors driving higher mean yields are related to indicators for technological progress.

In order to better understand the industrialization factor in crop yields, future work should look at total yield rather than mean yield. This would help clarify if countries with lower yields make up for it with larger overall production.

### Daniel Bassett

``````from shutil import unpack_archive

unpack_archive("gdhy_v1.2_v1.3_20190128.zip")
``````

# Crop Yield Analysis

by Daniel Bassett

## Abstract

Based on data[1] from Pangaea, Data Publisher for Earth and Environmental Science, we are going to run analysis on global historical crop yields from 1981-2016. We will be testing the thesis: crop yields are best grown in a specified range away from latitudinal 0. We first test this with several crops in the year 1990, before testing over the span of multiple years using only wheat. In conclusion, we do find that each crop ‘prefers’ a certain latitude away from 0. In particular, we find that wheat yields the best between 45-50 degrees away from 0.

``````import netCDF4
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
``````

## 1.0: Scope

Because this is a large dataset with several different crops, and a large time interval, we are going to narrow our scope. A good starting place is to focus on the year 1990. Subsequently, we will add in the next years in the time sequence, namely, 1991-1995. We will be assessing all four of the available crops, wheat, soybean, rice, maize, during this period. After cleaning our data to fit the scope, we will continue on to our next section wherein some basic analysis of these variables will be tested.

``````rice_path = "rice/yield_1990.nc4"
wheat_path = "wheat/yield_1990.nc4"
maize_path = "maize/yield_1990.nc4"
soy_path = "soybean/yield_1990.nc4"
``````
``````ncr = netCDF4.Dataset(rice_path, "r")
ncw = netCDF4.Dataset(wheat_path, "r")
ncm = netCDF4.Dataset(maize_path, "r")
ncs = netCDF4.Dataset(soy_path, "r")
``````
``````print(ncr.variables.keys(), ncw.variables.keys())
``````
``````dict_keys(['lon', 'lat', 'var']) dict_keys(['lon', 'lat', 'var'])
``````

As we expect, the crops seem to all have the same variables.

``````print(ncr["lon"], ncr["var"])
``````
``````<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
units: degrees_east
long_name: Longitude
unlimited dimensions:
current shape = (720,)
filling on, default _FillValue of 9.969209968386869e+36 used <class 'netCDF4._netCDF4.Variable'>
float32 var(lat, lon)
_FillValue: -999000000.0
unlimited dimensions:
current shape = (360, 720)
filling on
``````
``````for dim in ncr.dimensions.items():
print(dim)
``````
``````('lon', <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 720)
('lat', <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 360)
``````
``````rlon = ncr["lon"]
rice_lon = rlon[:]
mlon = ncm["lon"]
maize_lon = mlon[:]
wlon = ncw["lon"]
wheat_lon = wlon[:]
slon = ncs["lon"]
soy_lon = slon[:]
``````
``````print(rice_lon, maize_lon)
``````
``````[2.5000e-01 7.5000e-01 1.2500e+00 1.7500e+00 2.2500e+00 2.7500e+00
3.2500e+00 3.7500e+00 4.2500e+00 4.7500e+00 5.2500e+00 5.7500e+00
6.2500e+00 6.7500e+00 7.2500e+00 7.7500e+00 8.2500e+00 8.7500e+00
9.2500e+00 9.7500e+00 1.0250e+01 1.0750e+01 1.1250e+01 1.1750e+01
1.2250e+01 1.2750e+01 1.3250e+01 1.3750e+01 1.4250e+01 1.4750e+01
1.5250e+01 1.5750e+01 1.6250e+01 1.6750e+01 1.7250e+01 1.7750e+01
1.8250e+01 1.8750e+01 1.9250e+01 1.9750e+01 2.0250e+01 2.0750e+01
2.1250e+01 2.1750e+01 2.2250e+01 2.2750e+01 2.3250e+01 2.3750e+01
2.4250e+01 2.4750e+01 2.5250e+01 2.5750e+01 2.6250e+01 2.6750e+01
2.7250e+01 2.7750e+01 2.8250e+01 2.8750e+01 2.9250e+01 2.9750e+01
3.0250e+01 3.0750e+01 3.1250e+01 3.1750e+01 3.2250e+01 3.2750e+01
3.3250e+01 3.3750e+01 3.4250e+01 3.4750e+01 3.5250e+01 3.5750e+01
3.6250e+01 3.6750e+01 3.7250e+01 3.7750e+01 3.8250e+01 3.8750e+01
3.9250e+01 3.9750e+01 4.0250e+01 4.0750e+01 4.1250e+01 4.1750e+01
4.2250e+01 4.2750e+01 4.3250e+01 4.3750e+01 4.4250e+01 4.4750e+01
4.5250e+01 4.5750e+01 4.6250e+01 4.6750e+01 4.7250e+01 4.7750e+01
4.8250e+01 4.8750e+01 4.9250e+01 4.9750e+01 5.0250e+01 5.0750e+01
5.1250e+01 5.1750e+01 5.2250e+01 5.2750e+01 5.3250e+01 5.3750e+01
5.4250e+01 5.4750e+01 5.5250e+01 5.5750e+01 5.6250e+01 5.6750e+01
5.7250e+01 5.7750e+01 5.8250e+01 5.8750e+01 5.9250e+01 5.9750e+01
6.0250e+01 6.0750e+01 6.1250e+01 6.1750e+01 6.2250e+01 6.2750e+01
6.3250e+01 6.3750e+01 6.4250e+01 6.4750e+01 6.5250e+01 6.5750e+01
6.6250e+01 6.6750e+01 6.7250e+01 6.7750e+01 6.8250e+01 6.8750e+01
6.9250e+01 6.9750e+01 7.0250e+01 7.0750e+01 7.1250e+01 7.1750e+01
7.2250e+01 7.2750e+01 7.3250e+01 7.3750e+01 7.4250e+01 7.4750e+01
7.5250e+01 7.5750e+01 7.6250e+01 7.6750e+01 7.7250e+01 7.7750e+01
7.8250e+01 7.8750e+01 7.9250e+01 7.9750e+01 8.0250e+01 8.0750e+01
8.1250e+01 8.1750e+01 8.2250e+01 8.2750e+01 8.3250e+01 8.3750e+01
8.4250e+01 8.4750e+01 8.5250e+01 8.5750e+01 8.6250e+01 8.6750e+01
8.7250e+01 8.7750e+01 8.8250e+01 8.8750e+01 8.9250e+01 8.9750e+01
9.0250e+01 9.0750e+01 9.1250e+01 9.1750e+01 9.2250e+01 9.2750e+01
9.3250e+01 9.3750e+01 9.4250e+01 9.4750e+01 9.5250e+01 9.5750e+01
9.6250e+01 9.6750e+01 9.7250e+01 9.7750e+01 9.8250e+01 9.8750e+01
9.9250e+01 9.9750e+01 1.0025e+02 1.0075e+02 1.0125e+02 1.0175e+02
1.0225e+02 1.0275e+02 1.0325e+02 1.0375e+02 1.0425e+02 1.0475e+02
1.0525e+02 1.0575e+02 1.0625e+02 1.0675e+02 1.0725e+02 1.0775e+02
1.0825e+02 1.0875e+02 1.0925e+02 1.0975e+02 1.1025e+02 1.1075e+02
1.1125e+02 1.1175e+02 1.1225e+02 1.1275e+02 1.1325e+02 1.1375e+02
1.1425e+02 1.1475e+02 1.1525e+02 1.1575e+02 1.1625e+02 1.1675e+02
1.1725e+02 1.1775e+02 1.1825e+02 1.1875e+02 1.1925e+02 1.1975e+02
1.2025e+02 1.2075e+02 1.2125e+02 1.2175e+02 1.2225e+02 1.2275e+02
1.2325e+02 1.2375e+02 1.2425e+02 1.2475e+02 1.2525e+02 1.2575e+02
1.2625e+02 1.2675e+02 1.2725e+02 1.2775e+02 1.2825e+02 1.2875e+02
1.2925e+02 1.2975e+02 1.3025e+02 1.3075e+02 1.3125e+02 1.3175e+02
1.3225e+02 1.3275e+02 1.3325e+02 1.3375e+02 1.3425e+02 1.3475e+02
1.3525e+02 1.3575e+02 1.3625e+02 1.3675e+02 1.3725e+02 1.3775e+02
1.3825e+02 1.3875e+02 1.3925e+02 1.3975e+02 1.4025e+02 1.4075e+02
1.4125e+02 1.4175e+02 1.4225e+02 1.4275e+02 1.4325e+02 1.4375e+02
1.4425e+02 1.4475e+02 1.4525e+02 1.4575e+02 1.4625e+02 1.4675e+02
1.4725e+02 1.4775e+02 1.4825e+02 1.4875e+02 1.4925e+02 1.4975e+02
1.5025e+02 1.5075e+02 1.5125e+02 1.5175e+02 1.5225e+02 1.5275e+02
1.5325e+02 1.5375e+02 1.5425e+02 1.5475e+02 1.5525e+02 1.5575e+02
1.5625e+02 1.5675e+02 1.5725e+02 1.5775e+02 1.5825e+02 1.5875e+02
1.5925e+02 1.5975e+02 1.6025e+02 1.6075e+02 1.6125e+02 1.6175e+02
1.6225e+02 1.6275e+02 1.6325e+02 1.6375e+02 1.6425e+02 1.6475e+02
1.6525e+02 1.6575e+02 1.6625e+02 1.6675e+02 1.6725e+02 1.6775e+02
1.6825e+02 1.6875e+02 1.6925e+02 1.6975e+02 1.7025e+02 1.7075e+02
1.7125e+02 1.7175e+02 1.7225e+02 1.7275e+02 1.7325e+02 1.7375e+02
1.7425e+02 1.7475e+02 1.7525e+02 1.7575e+02 1.7625e+02 1.7675e+02
1.7725e+02 1.7775e+02 1.7825e+02 1.7875e+02 1.7925e+02 1.7975e+02
1.8025e+02 1.8075e+02 1.8125e+02 1.8175e+02 1.8225e+02 1.8275e+02
1.8325e+02 1.8375e+02 1.8425e+02 1.8475e+02 1.8525e+02 1.8575e+02
1.8625e+02 1.8675e+02 1.8725e+02 1.8775e+02 1.8825e+02 1.8875e+02
1.8925e+02 1.8975e+02 1.9025e+02 1.9075e+02 1.9125e+02 1.9175e+02
1.9225e+02 1.9275e+02 1.9325e+02 1.9375e+02 1.9425e+02 1.9475e+02
1.9525e+02 1.9575e+02 1.9625e+02 1.9675e+02 1.9725e+02 1.9775e+02
1.9825e+02 1.9875e+02 1.9925e+02 1.9975e+02 2.0025e+02 2.0075e+02
2.0125e+02 2.0175e+02 2.0225e+02 2.0275e+02 2.0325e+02 2.0375e+02
2.0425e+02 2.0475e+02 2.0525e+02 2.0575e+02 2.0625e+02 2.0675e+02
2.0725e+02 2.0775e+02 2.0825e+02 2.0875e+02 2.0925e+02 2.0975e+02
2.1025e+02 2.1075e+02 2.1125e+02 2.1175e+02 2.1225e+02 2.1275e+02
2.1325e+02 2.1375e+02 2.1425e+02 2.1475e+02 2.1525e+02 2.1575e+02
2.1625e+02 2.1675e+02 2.1725e+02 2.1775e+02 2.1825e+02 2.1875e+02
2.1925e+02 2.1975e+02 2.2025e+02 2.2075e+02 2.2125e+02 2.2175e+02
2.2225e+02 2.2275e+02 2.2325e+02 2.2375e+02 2.2425e+02 2.2475e+02
2.2525e+02 2.2575e+02 2.2625e+02 2.2675e+02 2.2725e+02 2.2775e+02
2.2825e+02 2.2875e+02 2.2925e+02 2.2975e+02 2.3025e+02 2.3075e+02
2.3125e+02 2.3175e+02 2.3225e+02 2.3275e+02 2.3325e+02 2.3375e+02
2.3425e+02 2.3475e+02 2.3525e+02 2.3575e+02 2.3625e+02 2.3675e+02
2.3725e+02 2.3775e+02 2.3825e+02 2.3875e+02 2.3925e+02 2.3975e+02
2.4025e+02 2.4075e+02 2.4125e+02 2.4175e+02 2.4225e+02 2.4275e+02
2.4325e+02 2.4375e+02 2.4425e+02 2.4475e+02 2.4525e+02 2.4575e+02
2.4625e+02 2.4675e+02 2.4725e+02 2.4775e+02 2.4825e+02 2.4875e+02
2.4925e+02 2.4975e+02 2.5025e+02 2.5075e+02 2.5125e+02 2.5175e+02
2.5225e+02 2.5275e+02 2.5325e+02 2.5375e+02 2.5425e+02 2.5475e+02
2.5525e+02 2.5575e+02 2.5625e+02 2.5675e+02 2.5725e+02 2.5775e+02
2.5825e+02 2.5875e+02 2.5925e+02 2.5975e+02 2.6025e+02 2.6075e+02
2.6125e+02 2.6175e+02 2.6225e+02 2.6275e+02 2.6325e+02 2.6375e+02
2.6425e+02 2.6475e+02 2.6525e+02 2.6575e+02 2.6625e+02 2.6675e+02
2.6725e+02 2.6775e+02 2.6825e+02 2.6875e+02 2.6925e+02 2.6975e+02
2.7025e+02 2.7075e+02 2.7125e+02 2.7175e+02 2.7225e+02 2.7275e+02
2.7325e+02 2.7375e+02 2.7425e+02 2.7475e+02 2.7525e+02 2.7575e+02
2.7625e+02 2.7675e+02 2.7725e+02 2.7775e+02 2.7825e+02 2.7875e+02
2.7925e+02 2.7975e+02 2.8025e+02 2.8075e+02 2.8125e+02 2.8175e+02
2.8225e+02 2.8275e+02 2.8325e+02 2.8375e+02 2.8425e+02 2.8475e+02
2.8525e+02 2.8575e+02 2.8625e+02 2.8675e+02 2.8725e+02 2.8775e+02
2.8825e+02 2.8875e+02 2.8925e+02 2.8975e+02 2.9025e+02 2.9075e+02
2.9125e+02 2.9175e+02 2.9225e+02 2.9275e+02 2.9325e+02 2.9375e+02
2.9425e+02 2.9475e+02 2.9525e+02 2.9575e+02 2.9625e+02 2.9675e+02
2.9725e+02 2.9775e+02 2.9825e+02 2.9875e+02 2.9925e+02 2.9975e+02
3.0025e+02 3.0075e+02 3.0125e+02 3.0175e+02 3.0225e+02 3.0275e+02
3.0325e+02 3.0375e+02 3.0425e+02 3.0475e+02 3.0525e+02 3.0575e+02
3.0625e+02 3.0675e+02 3.0725e+02 3.0775e+02 3.0825e+02 3.0875e+02
3.0925e+02 3.0975e+02 3.1025e+02 3.1075e+02 3.1125e+02 3.1175e+02
3.1225e+02 3.1275e+02 3.1325e+02 3.1375e+02 3.1425e+02 3.1475e+02
3.1525e+02 3.1575e+02 3.1625e+02 3.1675e+02 3.1725e+02 3.1775e+02
3.1825e+02 3.1875e+02 3.1925e+02 3.1975e+02 3.2025e+02 3.2075e+02
3.2125e+02 3.2175e+02 3.2225e+02 3.2275e+02 3.2325e+02 3.2375e+02
3.2425e+02 3.2475e+02 3.2525e+02 3.2575e+02 3.2625e+02 3.2675e+02
3.2725e+02 3.2775e+02 3.2825e+02 3.2875e+02 3.2925e+02 3.2975e+02
3.3025e+02 3.3075e+02 3.3125e+02 3.3175e+02 3.3225e+02 3.3275e+02
3.3325e+02 3.3375e+02 3.3425e+02 3.3475e+02 3.3525e+02 3.3575e+02
3.3625e+02 3.3675e+02 3.3725e+02 3.3775e+02 3.3825e+02 3.3875e+02
3.3925e+02 3.3975e+02 3.4025e+02 3.4075e+02 3.4125e+02 3.4175e+02
3.4225e+02 3.4275e+02 3.4325e+02 3.4375e+02 3.4425e+02 3.4475e+02
3.4525e+02 3.4575e+02 3.4625e+02 3.4675e+02 3.4725e+02 3.4775e+02
3.4825e+02 3.4875e+02 3.4925e+02 3.4975e+02 3.5025e+02 3.5075e+02
3.5125e+02 3.5175e+02 3.5225e+02 3.5275e+02 3.5325e+02 3.5375e+02
3.5425e+02 3.5475e+02 3.5525e+02 3.5575e+02 3.5625e+02 3.5675e+02
3.5725e+02 3.5775e+02 3.5825e+02 3.5875e+02 3.5925e+02 3.5975e+02] [2.5000e-01 7.5000e-01 1.2500e+00 1.7500e+00 2.2500e+00 2.7500e+00
3.2500e+00 3.7500e+00 4.2500e+00 4.7500e+00 5.2500e+00 5.7500e+00
6.2500e+00 6.7500e+00 7.2500e+00 7.7500e+00 8.2500e+00 8.7500e+00
9.2500e+00 9.7500e+00 1.0250e+01 1.0750e+01 1.1250e+01 1.1750e+01
1.2250e+01 1.2750e+01 1.3250e+01 1.3750e+01 1.4250e+01 1.4750e+01
1.5250e+01 1.5750e+01 1.6250e+01 1.6750e+01 1.7250e+01 1.7750e+01
1.8250e+01 1.8750e+01 1.9250e+01 1.9750e+01 2.0250e+01 2.0750e+01
2.1250e+01 2.1750e+01 2.2250e+01 2.2750e+01 2.3250e+01 2.3750e+01
2.4250e+01 2.4750e+01 2.5250e+01 2.5750e+01 2.6250e+01 2.6750e+01
2.7250e+01 2.7750e+01 2.8250e+01 2.8750e+01 2.9250e+01 2.9750e+01
3.0250e+01 3.0750e+01 3.1250e+01 3.1750e+01 3.2250e+01 3.2750e+01
3.3250e+01 3.3750e+01 3.4250e+01 3.4750e+01 3.5250e+01 3.5750e+01
3.6250e+01 3.6750e+01 3.7250e+01 3.7750e+01 3.8250e+01 3.8750e+01
3.9250e+01 3.9750e+01 4.0250e+01 4.0750e+01 4.1250e+01 4.1750e+01
4.2250e+01 4.2750e+01 4.3250e+01 4.3750e+01 4.4250e+01 4.4750e+01
4.5250e+01 4.5750e+01 4.6250e+01 4.6750e+01 4.7250e+01 4.7750e+01
4.8250e+01 4.8750e+01 4.9250e+01 4.9750e+01 5.0250e+01 5.0750e+01
5.1250e+01 5.1750e+01 5.2250e+01 5.2750e+01 5.3250e+01 5.3750e+01
5.4250e+01 5.4750e+01 5.5250e+01 5.5750e+01 5.6250e+01 5.6750e+01
5.7250e+01 5.7750e+01 5.8250e+01 5.8750e+01 5.9250e+01 5.9750e+01
6.0250e+01 6.0750e+01 6.1250e+01 6.1750e+01 6.2250e+01 6.2750e+01
6.3250e+01 6.3750e+01 6.4250e+01 6.4750e+01 6.5250e+01 6.5750e+01
6.6250e+01 6.6750e+01 6.7250e+01 6.7750e+01 6.8250e+01 6.8750e+01
6.9250e+01 6.9750e+01 7.0250e+01 7.0750e+01 7.1250e+01 7.1750e+01
7.2250e+01 7.2750e+01 7.3250e+01 7.3750e+01 7.4250e+01 7.4750e+01
7.5250e+01 7.5750e+01 7.6250e+01 7.6750e+01 7.7250e+01 7.7750e+01
7.8250e+01 7.8750e+01 7.9250e+01 7.9750e+01 8.0250e+01 8.0750e+01
8.1250e+01 8.1750e+01 8.2250e+01 8.2750e+01 8.3250e+01 8.3750e+01
8.4250e+01 8.4750e+01 8.5250e+01 8.5750e+01 8.6250e+01 8.6750e+01
8.7250e+01 8.7750e+01 8.8250e+01 8.8750e+01 8.9250e+01 8.9750e+01
9.0250e+01 9.0750e+01 9.1250e+01 9.1750e+01 9.2250e+01 9.2750e+01
9.3250e+01 9.3750e+01 9.4250e+01 9.4750e+01 9.5250e+01 9.5750e+01
9.6250e+01 9.6750e+01 9.7250e+01 9.7750e+01 9.8250e+01 9.8750e+01
9.9250e+01 9.9750e+01 1.0025e+02 1.0075e+02 1.0125e+02 1.0175e+02
1.0225e+02 1.0275e+02 1.0325e+02 1.0375e+02 1.0425e+02 1.0475e+02
1.0525e+02 1.0575e+02 1.0625e+02 1.0675e+02 1.0725e+02 1.0775e+02
1.0825e+02 1.0875e+02 1.0925e+02 1.0975e+02 1.1025e+02 1.1075e+02
1.1125e+02 1.1175e+02 1.1225e+02 1.1275e+02 1.1325e+02 1.1375e+02
1.1425e+02 1.1475e+02 1.1525e+02 1.1575e+02 1.1625e+02 1.1675e+02
1.1725e+02 1.1775e+02 1.1825e+02 1.1875e+02 1.1925e+02 1.1975e+02
1.2025e+02 1.2075e+02 1.2125e+02 1.2175e+02 1.2225e+02 1.2275e+02
1.2325e+02 1.2375e+02 1.2425e+02 1.2475e+02 1.2525e+02 1.2575e+02
1.2625e+02 1.2675e+02 1.2725e+02 1.2775e+02 1.2825e+02 1.2875e+02
1.2925e+02 1.2975e+02 1.3025e+02 1.3075e+02 1.3125e+02 1.3175e+02
1.3225e+02 1.3275e+02 1.3325e+02 1.3375e+02 1.3425e+02 1.3475e+02
1.3525e+02 1.3575e+02 1.3625e+02 1.3675e+02 1.3725e+02 1.3775e+02
1.3825e+02 1.3875e+02 1.3925e+02 1.3975e+02 1.4025e+02 1.4075e+02
1.4125e+02 1.4175e+02 1.4225e+02 1.4275e+02 1.4325e+02 1.4375e+02
1.4425e+02 1.4475e+02 1.4525e+02 1.4575e+02 1.4625e+02 1.4675e+02
1.4725e+02 1.4775e+02 1.4825e+02 1.4875e+02 1.4925e+02 1.4975e+02
1.5025e+02 1.5075e+02 1.5125e+02 1.5175e+02 1.5225e+02 1.5275e+02
1.5325e+02 1.5375e+02 1.5425e+02 1.5475e+02 1.5525e+02 1.5575e+02
1.5625e+02 1.5675e+02 1.5725e+02 1.5775e+02 1.5825e+02 1.5875e+02
1.5925e+02 1.5975e+02 1.6025e+02 1.6075e+02 1.6125e+02 1.6175e+02
1.6225e+02 1.6275e+02 1.6325e+02 1.6375e+02 1.6425e+02 1.6475e+02
1.6525e+02 1.6575e+02 1.6625e+02 1.6675e+02 1.6725e+02 1.6775e+02
1.6825e+02 1.6875e+02 1.6925e+02 1.6975e+02 1.7025e+02 1.7075e+02
1.7125e+02 1.7175e+02 1.7225e+02 1.7275e+02 1.7325e+02 1.7375e+02
1.7425e+02 1.7475e+02 1.7525e+02 1.7575e+02 1.7625e+02 1.7675e+02
1.7725e+02 1.7775e+02 1.7825e+02 1.7875e+02 1.7925e+02 1.7975e+02
1.8025e+02 1.8075e+02 1.8125e+02 1.8175e+02 1.8225e+02 1.8275e+02
1.8325e+02 1.8375e+02 1.8425e+02 1.8475e+02 1.8525e+02 1.8575e+02
1.8625e+02 1.8675e+02 1.8725e+02 1.8775e+02 1.8825e+02 1.8875e+02
1.8925e+02 1.8975e+02 1.9025e+02 1.9075e+02 1.9125e+02 1.9175e+02
1.9225e+02 1.9275e+02 1.9325e+02 1.9375e+02 1.9425e+02 1.9475e+02
1.9525e+02 1.9575e+02 1.9625e+02 1.9675e+02 1.9725e+02 1.9775e+02
1.9825e+02 1.9875e+02 1.9925e+02 1.9975e+02 2.0025e+02 2.0075e+02
2.0125e+02 2.0175e+02 2.0225e+02 2.0275e+02 2.0325e+02 2.0375e+02
2.0425e+02 2.0475e+02 2.0525e+02 2.0575e+02 2.0625e+02 2.0675e+02
2.0725e+02 2.0775e+02 2.0825e+02 2.0875e+02 2.0925e+02 2.0975e+02
2.1025e+02 2.1075e+02 2.1125e+02 2.1175e+02 2.1225e+02 2.1275e+02
2.1325e+02 2.1375e+02 2.1425e+02 2.1475e+02 2.1525e+02 2.1575e+02
2.1625e+02 2.1675e+02 2.1725e+02 2.1775e+02 2.1825e+02 2.1875e+02
2.1925e+02 2.1975e+02 2.2025e+02 2.2075e+02 2.2125e+02 2.2175e+02
2.2225e+02 2.2275e+02 2.2325e+02 2.2375e+02 2.2425e+02 2.2475e+02
2.2525e+02 2.2575e+02 2.2625e+02 2.2675e+02 2.2725e+02 2.2775e+02
2.2825e+02 2.2875e+02 2.2925e+02 2.2975e+02 2.3025e+02 2.3075e+02
2.3125e+02 2.3175e+02 2.3225e+02 2.3275e+02 2.3325e+02 2.3375e+02
2.3425e+02 2.3475e+02 2.3525e+02 2.3575e+02 2.3625e+02 2.3675e+02
2.3725e+02 2.3775e+02 2.3825e+02 2.3875e+02 2.3925e+02 2.3975e+02
2.4025e+02 2.4075e+02 2.4125e+02 2.4175e+02 2.4225e+02 2.4275e+02
2.4325e+02 2.4375e+02 2.4425e+02 2.4475e+02 2.4525e+02 2.4575e+02
2.4625e+02 2.4675e+02 2.4725e+02 2.4775e+02 2.4825e+02 2.4875e+02
2.4925e+02 2.4975e+02 2.5025e+02 2.5075e+02 2.5125e+02 2.5175e+02
2.5225e+02 2.5275e+02 2.5325e+02 2.5375e+02 2.5425e+02 2.5475e+02
2.5525e+02 2.5575e+02 2.5625e+02 2.5675e+02 2.5725e+02 2.5775e+02
2.5825e+02 2.5875e+02 2.5925e+02 2.5975e+02 2.6025e+02 2.6075e+02
2.6125e+02 2.6175e+02 2.6225e+02 2.6275e+02 2.6325e+02 2.6375e+02
2.6425e+02 2.6475e+02 2.6525e+02 2.6575e+02 2.6625e+02 2.6675e+02
2.6725e+02 2.6775e+02 2.6825e+02 2.6875e+02 2.6925e+02 2.6975e+02
2.7025e+02 2.7075e+02 2.7125e+02 2.7175e+02 2.7225e+02 2.7275e+02
2.7325e+02 2.7375e+02 2.7425e+02 2.7475e+02 2.7525e+02 2.7575e+02
2.7625e+02 2.7675e+02 2.7725e+02 2.7775e+02 2.7825e+02 2.7875e+02
2.7925e+02 2.7975e+02 2.8025e+02 2.8075e+02 2.8125e+02 2.8175e+02
2.8225e+02 2.8275e+02 2.8325e+02 2.8375e+02 2.8425e+02 2.8475e+02
2.8525e+02 2.8575e+02 2.8625e+02 2.8675e+02 2.8725e+02 2.8775e+02
2.8825e+02 2.8875e+02 2.8925e+02 2.8975e+02 2.9025e+02 2.9075e+02
2.9125e+02 2.9175e+02 2.9225e+02 2.9275e+02 2.9325e+02 2.9375e+02
2.9425e+02 2.9475e+02 2.9525e+02 2.9575e+02 2.9625e+02 2.9675e+02
2.9725e+02 2.9775e+02 2.9825e+02 2.9875e+02 2.9925e+02 2.9975e+02
3.0025e+02 3.0075e+02 3.0125e+02 3.0175e+02 3.0225e+02 3.0275e+02
3.0325e+02 3.0375e+02 3.0425e+02 3.0475e+02 3.0525e+02 3.0575e+02
3.0625e+02 3.0675e+02 3.0725e+02 3.0775e+02 3.0825e+02 3.0875e+02
3.0925e+02 3.0975e+02 3.1025e+02 3.1075e+02 3.1125e+02 3.1175e+02
3.1225e+02 3.1275e+02 3.1325e+02 3.1375e+02 3.1425e+02 3.1475e+02
3.1525e+02 3.1575e+02 3.1625e+02 3.1675e+02 3.1725e+02 3.1775e+02
3.1825e+02 3.1875e+02 3.1925e+02 3.1975e+02 3.2025e+02 3.2075e+02
3.2125e+02 3.2175e+02 3.2225e+02 3.2275e+02 3.2325e+02 3.2375e+02
3.2425e+02 3.2475e+02 3.2525e+02 3.2575e+02 3.2625e+02 3.2675e+02
3.2725e+02 3.2775e+02 3.2825e+02 3.2875e+02 3.2925e+02 3.2975e+02
3.3025e+02 3.3075e+02 3.3125e+02 3.3175e+02 3.3225e+02 3.3275e+02
3.3325e+02 3.3375e+02 3.3425e+02 3.3475e+02 3.3525e+02 3.3575e+02
3.3625e+02 3.3675e+02 3.3725e+02 3.3775e+02 3.3825e+02 3.3875e+02
3.3925e+02 3.3975e+02 3.4025e+02 3.4075e+02 3.4125e+02 3.4175e+02
3.4225e+02 3.4275e+02 3.4325e+02 3.4375e+02 3.4425e+02 3.4475e+02
3.4525e+02 3.4575e+02 3.4625e+02 3.4675e+02 3.4725e+02 3.4775e+02
3.4825e+02 3.4875e+02 3.4925e+02 3.4975e+02 3.5025e+02 3.5075e+02
3.5125e+02 3.5175e+02 3.5225e+02 3.5275e+02 3.5325e+02 3.5375e+02
3.5425e+02 3.5475e+02 3.5525e+02 3.5575e+02 3.5625e+02 3.5675e+02
3.5725e+02 3.5775e+02 3.5825e+02 3.5875e+02 3.5925e+02 3.5975e+02]
``````
``````rvar = ncr["var"]
rice_var = rvar[:]
print(rice_var)
rlat = ncr["lat"]
rice_lat = rlat[:]
print(rice_lat)
``````
``````[[-- -- -- ... -- -- --]
[-- -- -- ... -- -- --]
[-- -- -- ... -- -- --]
...
[-- -- -- ... -- -- --]
[-- -- -- ... -- -- --]
[-- -- -- ... -- -- --]]
[-89.75 -89.25 -88.75 -88.25 -87.75 -87.25 -86.75 -86.25 -85.75 -85.25
-84.75 -84.25 -83.75 -83.25 -82.75 -82.25 -81.75 -81.25 -80.75 -80.25
-79.75 -79.25 -78.75 -78.25 -77.75 -77.25 -76.75 -76.25 -75.75 -75.25
-74.75 -74.25 -73.75 -73.25 -72.75 -72.25 -71.75 -71.25 -70.75 -70.25
-69.75 -69.25 -68.75 -68.25 -67.75 -67.25 -66.75 -66.25 -65.75 -65.25
-64.75 -64.25 -63.75 -63.25 -62.75 -62.25 -61.75 -61.25 -60.75 -60.25
-59.75 -59.25 -58.75 -58.25 -57.75 -57.25 -56.75 -56.25 -55.75 -55.25
-54.75 -54.25 -53.75 -53.25 -52.75 -52.25 -51.75 -51.25 -50.75 -50.25
-49.75 -49.25 -48.75 -48.25 -47.75 -47.25 -46.75 -46.25 -45.75 -45.25
-44.75 -44.25 -43.75 -43.25 -42.75 -42.25 -41.75 -41.25 -40.75 -40.25
-39.75 -39.25 -38.75 -38.25 -37.75 -37.25 -36.75 -36.25 -35.75 -35.25
-34.75 -34.25 -33.75 -33.25 -32.75 -32.25 -31.75 -31.25 -30.75 -30.25
-29.75 -29.25 -28.75 -28.25 -27.75 -27.25 -26.75 -26.25 -25.75 -25.25
-24.75 -24.25 -23.75 -23.25 -22.75 -22.25 -21.75 -21.25 -20.75 -20.25
-19.75 -19.25 -18.75 -18.25 -17.75 -17.25 -16.75 -16.25 -15.75 -15.25
-14.75 -14.25 -13.75 -13.25 -12.75 -12.25 -11.75 -11.25 -10.75 -10.25
-9.75  -9.25  -8.75  -8.25  -7.75  -7.25  -6.75  -6.25  -5.75  -5.25
-4.75  -4.25  -3.75  -3.25  -2.75  -2.25  -1.75  -1.25  -0.75  -0.25
0.25   0.75   1.25   1.75   2.25   2.75   3.25   3.75   4.25   4.75
5.25   5.75   6.25   6.75   7.25   7.75   8.25   8.75   9.25   9.75
10.25  10.75  11.25  11.75  12.25  12.75  13.25  13.75  14.25  14.75
15.25  15.75  16.25  16.75  17.25  17.75  18.25  18.75  19.25  19.75
20.25  20.75  21.25  21.75  22.25  22.75  23.25  23.75  24.25  24.75
25.25  25.75  26.25  26.75  27.25  27.75  28.25  28.75  29.25  29.75
30.25  30.75  31.25  31.75  32.25  32.75  33.25  33.75  34.25  34.75
35.25  35.75  36.25  36.75  37.25  37.75  38.25  38.75  39.25  39.75
40.25  40.75  41.25  41.75  42.25  42.75  43.25  43.75  44.25  44.75
45.25  45.75  46.25  46.75  47.25  47.75  48.25  48.75  49.25  49.75
50.25  50.75  51.25  51.75  52.25  52.75  53.25  53.75  54.25  54.75
55.25  55.75  56.25  56.75  57.25  57.75  58.25  58.75  59.25  59.75
60.25  60.75  61.25  61.75  62.25  62.75  63.25  63.75  64.25  64.75
65.25  65.75  66.25  66.75  67.25  67.75  68.25  68.75  69.25  69.75
70.25  70.75  71.25  71.75  72.25  72.75  73.25  73.75  74.25  74.75
75.25  75.75  76.25  76.75  77.25  77.75  78.25  78.75  79.25  79.75
80.25  80.75  81.25  81.75  82.25  82.75  83.25  83.75  84.25  84.75
85.25  85.75  86.25  86.75  87.25  87.75  88.25  88.75  89.25  89.75]
``````
``````from statistics import mean

rice_lat.mean()
rice_var.mean()
``````
``````3.7061101112565447
``````

### 1.1: Basic Plotting

Now that we have our variables organized, we are going to do some basic plotting with the 1990 data. We will first plot the rice yield along with it’s lon/lat coordinate. Then we will add in the other variables. In the next section, we will do a time series analysis using mulitple years and the same variables.

``````rice_varmod = rice_var[:, 0]
rice_lonmod = rice_lon[:360]
rice_varmod.shape, rice_lat.shape, rice_lonmod.shape
``````
``````((360,), (360,), (360,))
``````
``````from mpl_toolkits.mplot3d import Axes3D

x = rice_lonmod
y = rice_lat
z = rice_varmod

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection="3d")
ax.scatter(x, y, z)
plt.show()
``````

There appears to be some bifurcation in the data which could signal some type of clustering depending on the geography. In addition, there is a slight error in this visual as I had to slice the datasets in order for them to have the same dimensions. I’m going to try to reorganize these datasets more realistically and according to actual location. But first let’s plot the yields individually with latitude and longtitude separately.

``````plt.scatter(y, z)
plt.title("Rice: Latitude vs. Yield")
plt.xlabel("Latitude")
plt.ylabel("Yield")
plt.show()
``````

## 1.2: Findings

From our basic plotting, it seems there is higher yield at more extreme values of latitude. We shall next see what happens with yield of the other crops as we move latitudinally.

Therefore, in Section 2, we will incorporate the other yields before adding in our time series analysis in Section 3. We will be able to further test our thesis: testing yields as we move away from latitudinal 0.

## 2.0: Adding More Crops

We are now going to add the other crops, wheat, soybean and maize, to our analysis. We hope to not only see the same correlation, but perhaps a trend of which crops respond more to latitudinal change.

``````wvar = ncw["var"]
wheat_var = wvar[:]

wlat = ncw["lat"]
wheat_lat = wlat[:]

wheat_varmod = wheat_var[:, 0]

x = wheat_lat
y = wheat_varmod

plt.scatter(x, y)
plt.title("Wheat: Latitude vs. Yield")
plt.xlabel("Latitude")
plt.ylabel("Yield")
plt.show()
``````

Very Interesting. Wheat seems to show more consistent change in yield with latitudinal change. In addition, at lower latitude, the yield for wheat appears nonexistent.

``````mvar = ncm["var"]
maize_var = mvar[:]

mlat = ncm["lat"]
maize_lat = mlat[:]

maize_varmod = maize_var[:, 0]

x = maize_lat
y = maize_varmod

plt.scatter(x, y)
plt.title("Maize: Latitude vs. Yield")
plt.xlabel("Latitude")
plt.ylabel("Yield")
plt.show()
``````

We see similar clustering with maize as we do with rice. Perhaps they grow well in similar climates, unlike wheat, which seems to prefer a very specific climate window.

## 2.1: Combination

We are going to place all of the crops on the same scatterplot. After assessing the findings, we will proceed to add time series into the data.

``````wheat_plot = plt.scatter(wheat_lat, wheat_varmod, color="b")
rice_plot = plt.scatter(rice_lat, rice_varmod, color="r")
maize_plot = plt.scatter(maize_lat, maize_varmod, color="g")
plt.title("Crops: Latitude vs. Yield")
plt.xlabel("Latitude")
plt.ylabel("Yield")
plt.legend(
(wheat_plot, rice_plot, maize_plot),
("Wheat", "Rice", "Maize"),
loc="upper left",
fontsize=8,
)
``````
``````<matplotlib.legend.Legend at 0x27da1e01f10>
``````

While none of the crops appear to grow well at latitudes closer to 0, wheat doesn’t even appear to be attempted at any latitude plus or minus 30. Wheat yield also appears to have the most linear correlation at increasing absolute latitudes, with rice representing almost exponential growth with latitudinal change.

## 2.2: Time Series with Single Variable

To keep things simple for now, we will use a single variable to do a time series analysis of yields with regards to latitude. We will use the wheat yield, as it appears to show the most consistent change. The period being assessed is 1990-1995.

``````wheat91_path = "wheat/yield_1991.nc4"
wheat92_path = "wheat/yield_1992.nc4"
wheat93_path = "wheat/yield_1993.nc4"
wheat94_path = "wheat/yield_1994.nc4"
wheat95_path = "wheat/yield_1995.nc4"
``````
``````ncw91 = netCDF4.Dataset(wheat91_path, "r")
ncw92 = netCDF4.Dataset(wheat92_path, "r")
ncw93 = netCDF4.Dataset(wheat93_path, "r")
ncw94 = netCDF4.Dataset(wheat94_path, "r")
ncw95 = netCDF4.Dataset(wheat95_path, "r")
``````
``````wvar1 = ncw91["var"]
wheat_var1 = wvar1[:]

wlat1 = ncw91["lat"]
wheat_lat1 = wlat1[:]

wheat_varmod1 = wheat_var1[:, 0]

wvar2 = ncw92["var"]
wheat_var2 = wvar2[:]

wlat2 = ncw92["lat"]
wheat_lat2 = wlat2[:]

wheat_varmod2 = wheat_var2[:, 0]

wvar3 = ncw93["var"]
wheat_var3 = wvar3[:]

wlat3 = ncw93["lat"]
wheat_lat3 = wlat3[:]

wheat_varmod3 = wheat_var3[:, 0]

wvar4 = ncw94["var"]
wheat_var4 = wvar4[:]

wlat4 = ncw94["lat"]
wheat_lat4 = wlat4[:]

wheat_varmod4 = wheat_var4[:, 0]

wvar5 = ncw95["var"]
wheat_var5 = wvar5[:]

wlat5 = ncw95["lat"]
wheat_lat5 = wlat5[:]

wheat_varmod5 = wheat_var5[:, 0]
``````
``````wheat_plot = plt.scatter(wheat_lat, wheat_varmod, color="b")
wheat_plot1 = plt.scatter(
wheat_lat1, wheat_varmod1, color="r"
)
wheat_plot2 = plt.scatter(
wheat_lat2, wheat_varmod2, color="g"
)
wheat_plot3 = plt.scatter(
wheat_lat3, wheat_varmod3, color="m"
)
wheat_plot4 = plt.scatter(
wheat_lat4, wheat_varmod4, color="y"
)
wheat_plot5 = plt.scatter(
wheat_lat5, wheat_varmod5, color="k"
)
plt.title("Wheat 1990-1995: Latitude vs. Yield")
plt.xlabel("Latitude")
plt.ylabel("Yield")
plt.legend(
(
wheat_plot,
wheat_plot1,
wheat_plot2,
wheat_plot3,
wheat_plot4,
wheat_plot5,
),
(
"Wheat 1990",
"Wheat 1991",
"Wheat 1992",
"Wheat 1993",
"Wheat 1994",
"Wheat 1995",
),
loc="upper left",
fontsize=8,
)
``````
``````<matplotlib.legend.Legend at 0x27da0b22190>
``````

It appears wheat grows best between 45-50 degrees from 0.

## 2.3: Regression for Wheat 1990-1995

Let’s execute a regression line for these wheat values.

``````# import seaborn as sns
``````
``````# sns.regplot(x=["wheat_lat", "wheat_lat1", "wheat_lat2", "wheat_lat3", "wheat_lat4", "wheat_lat5"],
# y=["wheat_varmod", "wheat_varmod1", "wheat_varmod2", "wheat_varmod3", "wheat_varmod4", "wheat_varmod5"])
``````

### Sources

1. Iizumi, Toshichika (2019): Global dataset of historical yields v1.2 and v1.3 aligned version. PANGAEA, https://doi.org/10.1594/PANGAEA.909132, Supplement to: Iizumi, Toshichika; Sakai, T (2020): The global dataset of historical yields for major crops 1981–2016. Scientific Data, 7(1), https://doi.org/10.1038/s41597-020-0433-7