Plan

  • What’s the first thing to do when you are ready to enter data?
  • Plan a data structure follows tidy data rules
  • Ideally plan before you collect data and match datasheets to entry format

Where to enter data

  • Spreadsheet

  • Text file

  • Database

  • Form (web or GUI databases)

  • All of these are reasonable options

  • When used properly spreadsheets or forms for databases can provide additional protection against bad data being entered

Quality Assurance

  • Stopping bad data from ever being entered

  • Set rules about what values can be entered into a column

  • Let’s create a data table with information on the date of sampling, the plot being sampled, and Species ID and mass of each individual small mammal that we catch.

  • Header row: Date, Plot, SpeciesID, Mass

  • To setup a quality assurance rule let’s select the Plots column

  • Data -> Data Validation

  • Plot numbers are integers so choose the data type: Whole numbers (only that type can be entered)

  • Set limitations

    • Use Data to set the type of limitation: between
    • There are 24 plots so use boxes to provide limit the possible values to between 1 and 24
  • Add a useful error message using Error Alert

    • “Plot numbers must be whole numbers 1-24”
  • Now enter some plot data

  • The numbers 1-24 are entered normally

  • If we try to enter an invalid number, e.g., 222, we get an error

  • We can also limit decimal numbers

  • Select Mass column

  • Data -> Data Validation -> Allow = Decimal

  • Masses have to be positive, so set Minimum = 0

  • The largest mammal that can fit in the small traps used in this study is 300 g, so set Maximum = 300

  • Limit entries to a list of choices

  • This allows us to control data entry for fields like Species ID

  • Select SpeciesID column

  • Data -> Data Validation -> Allow = List

  • Enter list values in Sources: DM, DO, DS, PP, PM

  • Trying to enter a value not on the list results in an error

  • This also generates a drop-down menu, so we can select the SpeciesID instead of typing it in

Care with Spreadsheets

  • Be careful of data conversion issues from spreadsheets

Show posts/papers on dates and gene names

Entering Dates

  • E.g., if we enter the date 2020-02-26 and hit Enter it will get converted to 2/26/2020
  • One solution to this is to tell Excel that the dates are text
  • Select the Date column
  • Home -> Number -> Dropdown -> Text
  • This will ensure that dates will remain in the form you entered them and will export properly for analysis in R or other languages

Optional Exercise 2 - Data Entry Validation in Excel.

Quality Control

  • Looking for bad data that has already been entered
  • Sort
  • Graph
  • Check for realistic ranges of values