GREENHOUSE

Using Regression Models to Grow the Best Crops

Contributors: Shonda Kuiper, Anna Olsen, Ginger Rowell, DASIL, Shreyas Agrawal '24

Fun Fact:

Several statistical techniques were originally developed to address agricultural research questions.

Part 2A: Introduction

In Part 1 of this lab, we used the Greenhouse game to collect data and reviewed rules for simple linear regression. Recall that the Greenhouse game was developed so that the yields (amount of crops produced) follow models that reflect actual crop growth in the United States. Each test plot represents 1/10th of an acre of land and yields are the number of bushels produced on that plot. In this part of the lab, we will build and compare several regression models using sample data.

We will start by comparing linear, quadratic, and cubic models.

The linear least-squares equation has the form: Ŷ = b₀ + b₁X

The quadratic least-squares equation has the form: Ŷ = b₀ + b₁X + b₂X²

The cubic least-squares equation has the form: Ŷ = b₀ + b₁X + b₂X² + b₃X³
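To make the three equations concrete, the following is a minimal Python sketch that fits each form by least squares and reports R-squared. It is not part of the Greenhouse Models app. The water and yield numbers are invented for illustration, and the helper name fit_polynomial is hypothetical.

```python
import numpy as np

# Hypothetical data: water applied (X) and yield in bushels (Y) for eight test plots.
# These values are made up for illustration; they are NOT Greenhouse game data.
water = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0])
yield_bu = np.array([3.1, 6.8, 9.9, 12.2, 13.1, 12.8, 11.0, 8.3])


def fit_polynomial(x, y, degree):
    """Fit a least-squares polynomial; return (coefficients b0, b1, ..., R-squared)."""
    # np.polyfit lists the highest power first, so reverse to get b0, b1, b2, ...
    coefs = np.polyfit(x, y, degree)[::-1]
    fitted = sum(b * x**k for k, b in enumerate(coefs))
    ss_res = np.sum((y - fitted) ** 2)    # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation
    return coefs, 1.0 - ss_res / ss_tot


results = {
    name: fit_polynomial(water, yield_bu, degree)
    for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]
}

for name, (coefs, r2) in results.items():
    terms = ", ".join(f"b{k} = {b:.4f}" for k, b in enumerate(coefs))
    print(f"{name:9s} R-squared = {r2:.4f}   {terms}")
```

Because each model contains every term of the one before it, adding a term cannot lower R-squared on the same data. How much it rises depends on whether the extra term captures real curvature.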

We will then consider the use of models with interaction terms.

A least-squares equation with two explanatory variables and no interaction term has the form: Ŷ = b₀ + b₁X₁ + b₂X₂

A least-squares equation with an interaction term has the form: Ŷ = b₀ + b₁X₁ + b₂X₂ + b₃X₁X₂
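As a rough illustration of the two equations, the sketch below fits both by least squares with NumPy. It assumes X₁ is the amount of water and X₂ is an indicator variable (1 for tomato, 0 for bean). The data values and the helper name least_squares are hypothetical and do not come from the Greenhouse game.

```python
import numpy as np

# Hypothetical data (NOT Greenhouse game data).
# X1 = water applied; X2 = crop indicator (0 = bean, 1 = tomato).
water = np.array([2.0, 4.0, 6.0, 8.0, 2.0, 4.0, 6.0, 8.0])
is_tomato = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
yield_bu = np.array([5.0, 4.6, 4.1, 3.5, 6.1, 8.0, 10.2, 11.9])


def least_squares(columns, y):
    """Return least-squares coefficients b0, b1, ... for the given predictor columns."""
    # The leading column of ones gives the intercept b0.
    design = np.column_stack([np.ones_like(y)] + columns)
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


# Model without interaction: Y-hat = b0 + b1*X1 + b2*X2
b_no_int = least_squares([water, is_tomato], yield_bu)

# Model with interaction: Y-hat = b0 + b1*X1 + b2*X2 + b3*X1*X2
b_int = least_squares([water, is_tomato, water * is_tomato], yield_bu)

# With a 0/1 indicator, the interaction term lets each crop have its own slope.
slope_bean = b_int[1]                # X2 = 0, so the slope is b1
slope_tomato = b_int[1] + b_int[3]   # X2 = 1, so the slope is b1 + b3

print("no interaction:  ", np.round(b_no_int, 4))
print("with interaction:", np.round(b_int, 4))
print("water slope for bean:", round(slope_bean, 4), " for tomato:", round(slope_tomato, 4))
```

Without the interaction term, both crops are forced to share a single water slope b₁. With it, b₁ describes only the group where X₂ = 0, and b₃ is the adjustment to that slope for the other group.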

Part 2B: Exploring Regression Models

We will use Settings A (at left) in the Greenhouse Models app to gain a better visual understanding of the more advanced regression models shown above. Check your understanding with the Exploring Regression Models questions below.

Settings A

Group ID: sample1

PlayerID: [leave blank]

Level: Challenge

X-axis Variable: Water

Y-axis Variable: Profit

Select Crop: Corn

Facet: None

Statistical Model: Linear, Quadratic, Cubic (try all 3)

Ignore other items, such as the Remove Interaction Terms, Nitrate Levels, and X-axis Limits.

Settings B

Group ID: sample2

PlayerID: [leave blank]

Level: Challenge

X-axis Variable: Water

Y-axis Variable: Yield

Select Crop: Bean, Tomato

Facet: None

Statistical Model: Linear

Ignore other items, such as the Remove Interaction Terms, Nitrate Levels, and X-axis Limits.

Instructor's Note: Go to faculty resources to access student data.

Part 2D: Greenhouse Challenge: Drawing Conclusions from Data

1. Use Settings A for the following (make sure only sample1 is selected):

a) Explain why you would expect the quadratic model to have a higher R-squared value than the linear model.

b) Explain why you would expect the quadratic model to have an R-squared value similar to that of the cubic model.

c) When we used only sample1 data, the linear model was somewhat effective. When both sample1 and sample2 data are used for the corn crop, the coefficients for the linear model change. Explain why the R-squared value dropped so much when both sample1 and sample2 data are used.

d) Give a possible explanation as to why the p-value for X² under the cubic model is so different from the p-value for X² under the quadratic model. How does this help to explain why these individual p-values should not be used when trying to determine whether or not a term is important in our model?

2. Use Settings B for the following (make sure only sample2 is selected):

a) Explain how the interaction term (i.e., Water*Tomatoes in this example) modifies our predictions.

b) Explain why the interaction term should be included if you are interested in accurately predicting crop yields.

c) Notice that the linear coefficient is negative (-0.30347*X) in the model with an interaction. However, it is positive (0.01632*X) in the model without an interaction. When there are multiple terms in a regression model, does it appear that the direction of a coefficient is meaningful? In other words, explain why we should be hesitant to say that the model shows “adding water will increase yields” or “adding water will decrease yields”.