Final Project: Proposal

Please prepare a short proposal on your final project idea for next Thursday, Nov 2. The proposal should include:

  • Title & description of the project
  • Your name & partner’s name
  • A description of the data required, and how it will be obtained (e.g. URL/DOI to data source)
  • Three questions / analysis tasks you will perform on the data, in the spirit of the assignments we have been doing

You may choose to work with your partner or independently on the final project. Please state clearly in your proposal which option you have chosen.

Replicating the results of an existing study and exploring the impact of alternative assumptions in the data preparation, the choice of statistical methods, etc. can provide an excellent template for an analysis (you’ll see more of this in units 3 & 4).

Please create your proposal in a markdown file called in the root directory of the final project repo.

Preliminary Rubric (additional areas will be added)

Project questions must illustrate all of the following tasks:

  • Some form of data access / reading into R
  • Data tidying / preparation
  • Initial data visualization
  • Use of GitHub
  • Reproducible execution using Travis
  • RMarkdown writeup, with final submission as a nicely formatted PDF document that includes code and results.
  • Overall clean and clear presentation of repository, code, and explanations.

and at least three of the following skills (this list may be modified/extended):

  • Use of at least 5 dplyr verbs / functions
  • Writing / working with custom R functions
  • Creating an R package for functions used in the analysis
  • Interaction with an API
  • Use of regular expressions
  • Use of an external relational database
  • Preparing processed data for archiving / publication
  • Parsing extensible data formats (JSON, XML)
  • Use of spatial vector data (sf package) and visualization of spatial data
  • Use of purrr package functions for iteration
  • Manipulation of dates or strings
  • Unique challenges you encounter, such as particularly messy data / non-standard formats, or a special emphasis on visualization or presentation of results