Using Regression Models to Grow the Best Crops

Contributors: Shonda Kuiper, Anna Olsen, Ginger Rowell, DASIL, Shreyas Agrawal '24












Fun Fact:

Several statistical techniques were originally developed to address agricultural research questions.




Part 2A: Introduction

In Part 1 of this lab, we used the Greenhouse game to collect data and reviewed rules for simple linear regression. Recall that the Greenhouse game was developed so that the yields (amount of crops produced) follow models that reflect actual crop growth in the United States. Each test plot represents 1/10th of an acre of land and yields are the number of bushels produced on that plot. In this part of the lab, we will build and compare several regression models using sample data.

We will start by comparing linear, quadratic, and cubic models.

  • The linear least-squares equation has the form: Ŷ = b0 + b1X
  • The quadratic least-squares equation has the form: Ŷ = b0 + b1X + b2X²
  • The cubic least-squares equation has the form: Ŷ = b0 + b1X + b2X² + b3X³

We will then consider the use of models with interaction terms.

  • A least-squares equation with two explanatory variables and no interaction term has the form: Ŷ = b0 + b1X1 + b2X2
  • A least-squares equation with an interaction term has the form: Ŷ = b0 + b1X1 + b2X2 + b3X1X2
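
As a rough illustration of the linear, quadratic, and cubic forms listed above, the sketch below fits all three to a small synthetic data set with NumPy and reports each R-squared value. The data, the variable names, and the helper fit_polynomial are invented for this example and are not output from the Greenhouse game.

```python
import numpy as np

# Synthetic data, invented for illustration only (not Greenhouse game output):
# a response that rises with water and then falls off, plus random noise.
rng = np.random.default_rng(0)
water = np.linspace(0.0, 10.0, 40)
response = 5.0 + 8.0 * water - 0.8 * water**2 + rng.normal(0.0, 2.0, water.size)


def fit_polynomial(x, y, degree):
    """Least-squares fit of Y-hat = b0 + b1*X + ... + bd*X^d.

    Returns the coefficients (b0 first) and the R-squared value.
    """
    # Columns of the design matrix are 1, X, X^2, ..., X^degree.
    design = np.vander(x, degree + 1, increasing=True)
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coefs, 1.0 - ss_res / ss_tot


r_squared = {}
for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
    coefs, r2 = fit_polynomial(water, response, degree)
    r_squared[name] = r2
    print(f"{name:9s} R^2 = {r2:.3f}  coefficients = {np.round(coefs, 3)}")
```

Because each model contains every term of the one before it, the R-squared value cannot decrease as terms are added. How much it increases depends on the shape of the data.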



    Part 2B: Exploring Regression Models

    We will use Settings A (listed below) in the Greenhouse Models app to gain a better visual understanding of the more advanced regression models shown above. Check your understanding with the Exploring Regression Models questions below.

    Settings A

  • Group ID:   sample1
  • PlayerID:   [leave blank]
  • Level:   Challenge
  • X-axis Variable:   Water
  • Y-axis Variable:   Profit
  • Select Crop:   Corn
  • Facet:   None
  • Statistical Model:   Linear, Quadratic, Cubic (try all 3)
  • Ignore other items, such as the Remove Interaction Terms, Nitrate Levels, and X-axis Limits.

    Settings B

  • Group ID:   sample2
  • PlayerID:   [leave blank]
  • Level:   Challenge
  • X-axis Variable:   Water
  • Y-axis Variable:   Yield
  • Select Crop:   Bean, Tomato
  • Facet:   None
  • Statistical Model:   Linear
  • Ignore other items, such as the Remove Interaction Terms, Nitrate Levels, and X-axis Limits.

    Instructor's Note: Go to the faculty resources to access student data.



    Part 2D: Greenhouse Challenge: Drawing Conclusions from Data


    1. Use Settings A for the following (make sure only sample1 is selected):

    a) Explain why you would expect the quadratic model to have a higher R-squared value than the linear model.

    b) Explain why you would expect the quadratic model to have an R-squared value similar to that of the cubic model.

    c) When we used only the sample1 data, the linear model was somewhat effective. When both the sample1 and sample2 data are used for the corn crop, the coefficients of the linear model change. Explain why the R-squared value drops so much when both sample1 and sample2 data are used.

    d) Give a possible explanation as to why the p-value for X² under the cubic model is so different from the p-value under the quadratic model. How does this help to explain why these individual p-values should not be used when trying to determine whether or not a term is important in our model?
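
    One way to start exploring question 1d is to look at how strongly the polynomial terms are related to one another. The sketch below uses hypothetical water levels (not Greenhouse game data) and computes the correlations among X, X², and X³ with NumPy.

```python
import numpy as np

# Hypothetical water levels, chosen only for illustration
# (not Greenhouse game data).
water = np.linspace(1.0, 10.0, 30)

# Columns are X, X^2, and X^3, the explanatory terms of the cubic model.
powers = np.column_stack([water, water**2, water**3])

# Pairwise correlations between the columns (rowvar=False treats each
# column as one variable).
correlations = np.corrcoef(powers, rowvar=False)
print(np.round(correlations, 3))
```

    When explanatory terms are this closely related, the coefficient and p-value for any one term can shift substantially depending on which other terms are in the model.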


    2. Use Settings B for the following (make sure only sample2 is selected):

    a) Explain how the interaction term (i.e., Water*Tomatoes in this example) modifies our predictions.

    b) Explain why the interaction term should be included if you are interested in accurately predicting crop yields.

    c) Notice that the linear coefficient is negative (-0.30347*X) in the model with an interaction. However, it is positive (0.01632*X) in the model without an interaction. When there are multiple terms in a regression model, does it appear that the direction of a coefficient is meaningful? In other words, explain why we should be hesitant to say that the model shows “adding water will increase yields” or “adding water will decrease yields”.
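
    The effect of an interaction term can be sketched with a small synthetic example. The code below assumes two hypothetical crops coded with an indicator variable (0 = bean, 1 = tomato) and fits the model with and without the X1X2 term. All numbers are invented for illustration and do not come from the Greenhouse game.

```python
import numpy as np

# Synthetic data, invented for illustration only (not Greenhouse game output).
# Two hypothetical crops whose yields respond to water in opposite directions.
rng = np.random.default_rng(1)
water = np.tile(np.linspace(0.0, 10.0, 20), 2)
is_tomato = np.repeat([0.0, 1.0], 20)  # 0 = bean, 1 = tomato
crop_yield = (
    30.0 - 0.5 * water                      # bean line
    + is_tomato * (-10.0 + 1.5 * water)     # tomato shift and extra slope
    + rng.normal(0.0, 1.0, water.size)      # random noise
)


def least_squares(design, y):
    """Return the least-squares coefficients for the given design matrix."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


ones = np.ones_like(water)

# Y-hat = b0 + b1*X1 + b2*X2
no_interaction = least_squares(
    np.column_stack([ones, water, is_tomato]), crop_yield
)

# Y-hat = b0 + b1*X1 + b2*X2 + b3*X1*X2
with_interaction = least_squares(
    np.column_stack([ones, water, is_tomato, water * is_tomato]), crop_yield
)

b0, b1, b2, b3 = with_interaction
bean_slope = b1          # slope of water when the indicator is 0
tomato_slope = b1 + b3   # slope of water when the indicator is 1

print("water coefficient without interaction:", round(float(no_interaction[1]), 3))
print("water coefficient with interaction:   ", round(float(b1), 3))
print("bean slope:", round(float(bean_slope), 3))
print("tomato slope:", round(float(tomato_slope), 3))
```

    In the model with the interaction, the coefficient on water describes only the crop coded 0, and the slope for the other crop is b1 + b3. Without the interaction, a single water coefficient has to serve both crops at once.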

    Part 2E: Data Literacy Breakdown

      3. Take a few minutes to read the one-page Nature article.
        a) What are the two explanatory variables?
        b) What is the response variable for this data?
        c) What was the authors’ reasoning for focusing on the simple linear regression technique and not the other techniques mentioned?
        d) Do the authors’ arguments seem reasonable? Give a brief explanation as to why the model created by the authors does not appropriately address their research question.
        e) In Part 1 of the Greenhouse game, a linear model looked like a good fit, but it was not helpful in identifying the optimum Yield or the optimum Profit. How is this related to the errors made in the Nature article?



    Continue to Part 3





    Questions?


    If you have any questions or comments, please email us at DASIL@grinnell.edu

    Dataspace is supported by the Grinnell College Innovation Fund and was developed by Grinnell College faculty and students. Copyright © 2021. All rights reserved.

    This page was last updated on November 11th, 2024.